Last updated: Apr 9, 2024
Use the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method will remove all non-alphanumeric characters from the string by replacing them with empty strings.
```python
import re

my_str = 'bobby !hadz@ com 123'

# Remove all non-alphanumeric characters from string
new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 'bobbyhadzcom123'

# -----------------------------------------------

# Remove all non-alphanumeric characters from string,
# preserving whitespace
new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 'bobby hadz com 123'
```
If you need to remove the non-alphabetic characters from a string instead, see the section on removing non-alphabetic characters later in this article.
The example uses the re.sub() method to remove all non-alphanumeric characters from a string.
The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.
If the pattern isn't found, the string is returned as is.
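A small sketch of both points, using only the standard re module (the sample strings are made up for illustration). Because Python strings are immutable, re.sub() always hands back a result string and leaves its input untouched:

```python
import re

original = 'bobby123'

# Nothing in the string matches the pattern, so re.sub() returns an equal string
unchanged = re.sub(r'[\W_]', '', original)
print(unchanged)  # 'bobby123'

# re.sub() builds a new string; the input string itself is never modified
text = 'a!b@c'
cleaned = re.sub(r'[\W_]', '', text)
print(text)     # 'a!b@c'
print(cleaned)  # 'abc'
```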
The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.
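The \W (capital W) special character matches any character that is not a word character. Because the underscore counts as a word character, the set [\W_] adds _ explicitly so that underscores are removed as well.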
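To see why the underscore is added to the set, here is a minimal comparison of \W on its own and [\W_] (the sample string is made up for illustration):

```python
import re

sample = 'bobby_hadz!com'

# \W alone keeps the underscore, because "_" counts as a word character
only_w = re.sub(r'\W', '', sample)
print(only_w)  # 'bobby_hadzcom'

# Adding "_" to the set removes underscores as well
w_and_underscore = re.sub(r'[\W_]', '', sample)
print(w_and_underscore)  # 'bobbyhadzcom'
```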
If you want to preserve the whitespace and remove all non-alphanumeric characters, use the following regular expression.
```python
import re

my_str = 'bobby !hadz@ com 123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 'bobby hadz com 123'
```
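The caret ^ at the beginning of the set means "NOT". In other words, the set matches every character that is NOT a Unicode word character (a letter, a digit or an underscore) or a whitespace character.

The \w character is the opposite of the \W character and matches Unicode word characters, which include all Unicode alphanumeric characters (as defined by str.isalnum()) as well as the underscore.

The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].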
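Since both \w and \s are Unicode-aware, the [^\w\s] pattern keeps non-ASCII letters and every kind of whitespace, not just spaces. A short sketch with a made-up sample string:

```python
import re

# "ß" and "π" are non-ASCII letters; "\t" and "\n" are whitespace
sample = 'Straße!\tπ?\n123'

# Only the punctuation ("!" and "?") is removed
cleaned = re.sub(r'[^\w\s]', '', sample)
print(repr(cleaned))  # 'Straße\tπ\n123'
```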
If you ever need help reading or writing a regular expression, consult the Regular Expression Syntax section of the official re module documentation.
The page contains a list of all of the special characters with many useful examples.
If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.
```python
import re

my_str = 'bobby !hadz@ com 123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 'bobby hadz com 123'

result = " ".join(new_str.split())
print(result)  # 'bobby hadz com 123'
```
The str.split() method, called without arguments, splits the string on one or more consecutive whitespace characters, and we join the resulting list of strings with a single space separator.

Alternatively, you can use a generator expression.
This is a three-step process:

1. Use a generator expression to iterate over the string.
2. Use the str.isalnum() method to check if each character is alphanumeric.
3. Use the str.join() method to join the alphanumeric characters.

```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 'bobbyhadzcom123'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 'bobby hadz com 123'
```
We used a generator expression to iterate over the string.

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric and return the result.
The str.isalnum() method returns True if all characters in the string are alphanumeric and the string contains at least one character; otherwise, the method returns False.
```python
print('A'.isalnum())  # True
print('!'.isalnum())  # False
print('5'.isalnum())  # True
```
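Note that str.isalnum() is Unicode-aware, so it accepts more than the ASCII range [a-zA-Z0-9]. The sketch below shows this and adds a hypothetical ASCII-only variant that combines it with str.isascii() (available since Python 3.7); the sample string is made up for illustration:

```python
# str.isalnum() accepts non-ASCII letters and digits
print('ß'.isalnum())  # True
print('²'.isalnum())  # True (superscript two counts as a digit)
print(''.isalnum())   # False (empty string)

# Hypothetical variant: keep only ASCII letters and digits
my_str = 'Straße² 123!'
ascii_only = ''.join(
    char for char in my_str if char.isascii() and char.isalnum()
)
print(ascii_only)  # 'Strae123'
```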
The generator expression yields only the alphanumeric characters.
The last step is to use the str.join() method to join the alphanumeric characters into a string.
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 'bobbyhadzcom123'
```
The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
The string the method is called on is used as the separator between the elements.
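A minimal sketch of how the separator works, using a made-up list of single-character strings. Every element must be a string, otherwise str.join() raises a TypeError:

```python
chars = ['b', 'o', 'b']

# The string join() is called on becomes the separator
dashed = '-'.join(chars)
plain = ''.join(chars)
print(dashed)  # 'b-o-b'
print(plain)   # 'bob'

# Non-string elements are not converted automatically
try:
    ''.join([1, 2, 3])
except TypeError as exc:
    print('TypeError:', exc)
```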
We called the join() method on an empty string to join the alphanumeric characters without a separator.

If you want to remove the non-alphanumeric characters and preserve the spaces, use the boolean or operator.
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(
    char for char in my_str if char.isalnum() or char == ' '
)
print(new_str)  # 'bobby hadz com 123'
```
We used the boolean or operator, so a character is yielded by the generator expression if at least one of the conditions is met: the character has to be alphanumeric or it has to be a space.

Note that char == ' ' only keeps regular spaces, not tabs or newlines.
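If you want to keep every whitespace character, like the \s version of the regex does, one option is to swap the equality check for str.isspace(). A sketch with a made-up string that contains a tab and a newline:

```python
my_str = 'bobby !hadz@\tcom\n123'

# char == ' ' keeps only regular spaces, so the tab and newline are dropped
spaces_only = ''.join(
    char for char in my_str if char.isalnum() or char == ' '
)
print(repr(spaces_only))  # 'bobby hadzcom123'

# str.isspace() keeps every whitespace character
all_whitespace = ''.join(
    char for char in my_str if char.isalnum() or char.isspace()
)
print(repr(all_whitespace))  # 'bobby hadz\tcom\n123'
```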
You can also use the filter() function to remove all non-alphanumeric characters from a string.
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(filter(str.isalnum, my_str))
print(new_str)  # bobbyhadzcom123
```
The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.
We passed the str.isalnum method to filter(), so the method gets called with each character in the string.
The filter() function returns a filter object (an iterator) that yields only the characters for which the str.isalnum() method returned True.
The last step is to use the str.join() method to join the filter object into a string.
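Because the filter object is a lazy iterator, it can only be consumed once. A short sketch illustrating this:

```python
my_str = 'bobby !hadz@ com 123'

# filter() returns a lazy iterator, not a string or a list
chars = filter(str.isalnum, my_str)
print(type(chars).__name__)  # 'filter'

# Joining consumes the iterator...
first = ''.join(chars)
print(first)  # 'bobbyhadzcom123'

# ...so a second pass over the same iterator yields nothing
second = ''.join(chars)
print(repr(second))  # ''
```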
The re.sub() method can also be used to remove all non-alphabetic characters from a string.
```python
import re

my_str = 'bobby! hadz@ com'

# Remove all non-alphabetic characters from string (re.sub())
new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 'bobbyhadzcom'

# -----------------------------------------------------

# Remove all non-alphabetic characters from string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 'bobby hadz com'
```
The example uses the re.sub() method to remove all non-alphabetic characters from a string.

The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.
If the pattern isn't found, the string is returned as is.
The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.
The caret ^ at the beginning of the set means "NOT". In other words, the set matches all characters that are NOT letters.

The a-z and A-Z ranges match lowercase and uppercase ASCII letters only, so non-ASCII letters such as é or ß are removed as well.
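A sketch of that ASCII limitation, plus one alternative that is not from the examples above: removing non-word characters, digits and underscores with [\W\d_], which leaves Unicode letters in place. The sample string is made up for illustration:

```python
import re

my_str = 'Straße 123!'

# [^a-zA-Z] only keeps ASCII letters, so "ß" is removed too
ascii_letters = re.sub(r'[^a-zA-Z]', '', my_str)
print(ascii_letters)  # 'Strae'

# Alternative: drop non-word characters, digits and underscores,
# which leaves the (Unicode) letters
unicode_letters = re.sub(r'[\W\d_]', '', my_str)
print(unicode_letters)  # 'Straße'
```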
If you need to remove all non-alphabetic characters and preserve the whitespace, use the following regular expression.
```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 'bobby hadz com'
```
The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].
In its entirety, the regular expression matches every character that is neither an ASCII letter nor a whitespace character, and re.sub() removes those characters.
If you ever need help reading or writing a regular expression, consult the Regular Expression Syntax section of the official re module documentation.
The page contains a list of all of the special characters with many useful examples.
If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.
```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 'bobby hadz com'

result = ' '.join(new_str.split())
print(result)  # 'bobby hadz com'
```
The str.split() method, called without arguments, splits the string on one or more consecutive whitespace characters, and we join the resulting list of strings with a single space separator.

Alternatively, you can use a generator expression.
This is a three-step process:

1. Use a generator expression to iterate over the string.
2. Use the str.isalpha() method to check if each character is alphabetic.
3. Use the str.join() method to join the alphabetic characters.

```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str if char.isalpha()
)
print(new_str)  # 'bobbyhadzcom'

new_str = ''.join(
    char for char in my_str if char.isalpha() or char == ' '
)
print(new_str)  # 'bobby hadz com'
```
We used a generator expression to iterate over the string.

On each iteration, we use the str.isalpha() method to check if the current character is alphabetic and return the result.
The str.isalpha() method returns True if all characters in the string are alphabetic and there is at least one character; otherwise, the method returns False.
```python
print('H'.isalpha())  # True
print('@'.isalpha())  # False
```
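A few more calls that show the "all characters" and "at least one character" rules, and that str.isalpha() also accepts non-ASCII letters (the sample strings are made up for illustration):

```python
print('bobby'.isalpha())   # True
print('bobby1'.isalpha())  # False (contains a digit)
print('bob by'.isalpha())  # False (contains a space)
print('Straße'.isalpha())  # True (non-ASCII letters count)
print(''.isalpha())        # False (needs at least one character)
```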
The generator expression yields only the alphabetic characters.
```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str if char.isalpha()
)
print(new_str)  # 'bobbyhadzcom'
```
The last step is to use the str.join() method to join the alphabetic characters into a string.
The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
The string the method is called on is used as the separator between the elements.
We called the join() method on an empty string to join the alphabetic characters without a separator.

If you want to remove the non-alphabetic characters and preserve the spaces, use the boolean or operator.
```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str if char.isalpha() or char == ' '
)
print(new_str)  # 'bobby hadz com'
```
We used the boolean or operator, so a character is yielded by the generator expression if at least one of the conditions is met: the character has to be alphabetic or it has to be a space.
You can also use the filter() function to remove all non-alphabetic characters from a string. This is a three-step process:

1. Pass the str.isalpha() method and the string to the filter() function.
2. The str.isalpha() method will filter out all non-letter characters.
3. Use the str.join() method to join the result into a string.

```python
a_string = 'bobby123hadz456.com'

only_letters = ''.join(
    filter(
        str.isalpha,
        a_string
    )
)
print(only_letters)  # bobbyhadzcom
```
The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.
We passed the str.isalpha method to the filter() function.
The str.isalpha() method gets called with each character in the string and returns True if the character is a letter.

The last step is to use the str.join() method to join all matching characters into a string.
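If you want the filter() approach to keep whitespace as well, one option is to pass a small predicate instead of str.isalpha. The lambda below is a hypothetical helper, not part of the original example; the split()/join() step from earlier then collapses any leftover runs of spaces:

```python
a_string = 'bobby! hadz@ com 123'

# Hypothetical predicate: keep letters and any whitespace character
only_letters_and_spaces = ''.join(
    filter(lambda char: char.isalpha() or char.isspace(), a_string)
)
print(repr(only_letters_and_spaces))  # 'bobby hadz com '

# Collapse runs of whitespace and drop the trailing space
collapsed = ' '.join(only_letters_and_spaces.split())
print(collapsed)  # 'bobby hadz com'
```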
Which approach you pick is a matter of personal preference. I'd use the str.isalpha() method with a generator expression because the approach is quite direct and intuitive.
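One detail worth checking before you choose: for ASCII-only input the three approaches give the same result, but the [^a-zA-Z] regex drops non-ASCII letters while str.isalpha() keeps them. A quick comparison sketch; the helper function names are made up for illustration:

```python
import re


def with_regex(text: str) -> str:
    # ASCII letter ranges only
    return re.sub(r'[^a-zA-Z]', '', text)


def with_generator(text: str) -> str:
    # Unicode-aware str.isalpha()
    return ''.join(char for char in text if char.isalpha())


def with_filter(text: str) -> str:
    # Same check as the generator, via filter()
    return ''.join(filter(str.isalpha, text))


for sample in ('bobby! hadz@ com', 'Straße 123'):
    results = {
        func.__name__: func(sample)
        for func in (with_regex, with_generator, with_filter)
    }
    print(sample, results)
```

For 'bobby! hadz@ com' all three return 'bobbyhadzcom'; for 'Straße 123' the regex version returns 'Strae' while the other two keep the ß.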