Last updated: Apr 10, 2024
Reading time · 5 min
Use the re.sub() method to remove all special characters except for space from a string.
The re.sub() method removes all special characters except for space by replacing them with empty strings.
```python
import re

a_string = 'b!o@b#b$y% h^a&d*z( c.o,m'

new_string = re.sub(r'[^a-zA-Z0-9\s]+', '', a_string)
print(new_string)  # 👇️ 'bobby hadz com'
```
The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.
The first argument we passed to the re.sub() method is a regular expression.
The square brackets [] are used to indicate a set of characters.
The a-z and A-Z ranges match lowercase and uppercase ASCII letters.
The 0-9 range matches the digits.
The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].
The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are not letters, digits or whitespace and replace them with empty strings (remove them).
You can add more characters between the square brackets if you want to keep them.
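As a minimal sketch of keeping extra characters, the example below also preserves underscores and hyphens. The sample string and the choice of extra characters are assumptions made for illustration only.

```python
import re

# Hypothetical sample string that contains an underscore and a hyphen
a_string = 'b!o@b_b-y% h^a&d*z( c.o,m'

# The underscore and hyphen are added to the negated set, so they are kept.
# The hyphen is placed last so it is treated literally, not as a range.
new_string = re.sub(r'[^a-zA-Z0-9\s_-]+', '', a_string)

print(new_string)  # bob_b-y hadz com
```

Placing the hyphen at the end of the set (or escaping it as \-) avoids it being read as a range such as a-z.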
If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.
The page contains a list of all of the special characters with many useful examples.
To remove the special characters except for space from a text, file or multiline string:
1. Use the str.splitlines() method to split the text into a list of lines.
2. Use a for loop to iterate over the list.
3. Use the re.sub() method to remove the special characters except for space from each line.

```python
import re

text = """b!o@b#b%y h^a*d&z
dot c.o,m
"""

a_list = []

for line in text.splitlines():
    updated_line = re.sub(r'[^a-zA-Z0-9\s]+', '', line)
    print(updated_line)
    a_list.append(updated_line)

print(a_list)  # 👇️ ['bobby hadz', 'dot com']

updated_text = '\n'.join(a_list)

# bobby hadz
# dot com
print(updated_text)
```
The str.splitlines() method splits the string on newline characters and returns a list containing the lines in the string.
```python
multiline_str = """\
bobby
hadz
com"""

lines = multiline_str.splitlines()
print(lines)  # 👇️ ['bobby', 'hadz', 'com']
```
The last step is to use the str.join() method to join the list of updated lines with a newline (\n) character separator.
The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
The string the method is called on is used as the separator between the elements.
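The same steps can be applied to the contents of a file. The sketch below is one possible way to do it: the clean_lines helper and the example.txt file name are hypothetical, and a temporary file is written first only so that the example runs on its own.

```python
import re
import tempfile
from pathlib import Path


def clean_lines(text):
    """Remove everything except letters, digits and whitespace from each line."""
    return [re.sub(r'[^a-zA-Z0-9\s]+', '', line) for line in text.splitlines()]


# Create a small sample file in a temporary directory (illustration only)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'example.txt'
    path.write_text('b!o@b#b%y h^a*d&z\ndot c.o,m\n', encoding='utf-8')

    # Read the file back and clean it line by line
    cleaned = clean_lines(path.read_text(encoding='utf-8'))

print(cleaned)  # ['bobby hadz', 'dot com']

# Join the cleaned lines with a newline separator
updated_text = '\n'.join(cleaned)
print(updated_text)
```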
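Alternatively, you can use the str.isalnum() method.
This is a three-step process:
1. Use a generator expression to iterate over the string.
2. Use the str.isalnum() and str.isspace() methods to check if each character is alphanumeric or whitespace.
3. Use the str.join() method to join the matching characters into a string.

```python
a_string = 'b!o@b#b$y% h^a&d*z( c.o,m'

new_string = ''.join(
    char for char in a_string
    if char.isalnum() or char.isspace()
)

print(new_string)  # 👇️ bobby hadz com
```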
We used a generator expression to iterate over the string.
On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric.
The str.isalnum() method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise, the method returns False.
```python
print('bobby123'.isalnum())  # 👇️ True

# 👇️ contains space
print('bobby hadz'.isalnum())  # 👇️ False
```
We used the boolean OR operator to keep the whitespace characters as well.
The str.isspace() method returns True if there are only whitespace characters in the string and there is at least one character, otherwise False is returned.
```python
print(' '.isspace())  # 👇️ True
print(''.isspace())  # 👇️ False
print('b'.isspace())  # 👇️ False
```
The last step is to use the str.join() method to join the remaining characters into a string.
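One difference between the two approaches is worth keeping in mind: str.isalnum() also returns True for non-ASCII letters and digits, whereas the [a-zA-Z0-9] set only matches ASCII ones. The short comparison below illustrates this with a made-up sample string.

```python
import re

# Hypothetical sample string containing a non-ASCII letter (é)
a_string = 'caf\u00e9 #1!'

# The ASCII-only set treats é as a special character and removes it
with_regex = re.sub(r'[^a-zA-Z0-9\s]+', '', a_string)

# str.isalnum() considers é alphanumeric, so it is kept
with_isalnum = ''.join(
    char for char in a_string
    if char.isalnum() or char.isspace()
)

print(with_regex)    # caf 1
print(with_isalnum)  # café 1
```

Which behavior is preferable depends on whether accented and other non-ASCII letters count as special characters in your use case.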
Use the re.split() method to split a string on all special characters.
The re.split() method takes a pattern and a string and splits the string on each occurrence of the pattern.
```python
import re

my_str = "hello<one!two>three.four!five'six"

my_list = re.split(r'[`!@#$%^&*()_+\-=\[\]{};\':"\\|,.<>\/?~]', my_str)

# 👇️ ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
print(my_list)
```
We used the re.split() method to split a string on all occurrences of a special character.
The square brackets are used to indicate a set of characters.
Make sure that all characters you consider special characters are in the set.
You can add or remove characters according to your use case.
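If typing out the set by hand is error-prone, one alternative is to build it from string.punctuation. This is a different technique from the hand-written set above, and it assumes that "special character" means ASCII punctuation; re.escape() is used so that characters like ], ^ and - are treated literally inside the set.

```python
import re
import string

my_str = "hello<one!two>three.four!five'six"

# Assumption: every ASCII punctuation character counts as "special"
pattern = '[' + re.escape(string.punctuation) + ']'

my_list = re.split(pattern, my_str)

# ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
print(my_list)
```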
Alternatively, you can use a regular expression that matches any character that is not a letter, a digit or a space.
```python
import re

my_str = "hello<one!two>three.four!five'six"

my_list = re.split(r'[^a-zA-Z0-9\s]', my_str)

# 👇️ ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
print(my_list)
```
The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT lowercase letters a-z, uppercase letters A-Z, digits 0-9 or whitespace \s characters.
You can tweak the regular expression according to your use case. This section of the docs has information regarding what each special character does.
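One tweak that often comes up: when special characters appear next to each other or at the end of the string, re.split() returns empty strings. The sketch below, using a made-up sample string, shows the effect and one possible way to handle it by adding a + quantifier and filtering the result.

```python
import re

# Hypothetical sample with consecutive and trailing special characters
my_str = 'hello!!one, two?'

# Splitting on each single special character leaves empty strings
single = re.split(r'[^a-zA-Z0-9\s]', my_str)
print(single)  # ['hello', '', 'one', ' two', '']

# The + quantifier treats a run of special characters as one separator
grouped = re.split(r'[^a-zA-Z0-9\s]+', my_str)
print(grouped)  # ['hello', 'one', ' two', '']

# Strip surrounding whitespace and drop the parts that end up empty
words = [part.strip() for part in grouped if part.strip()]
print(words)  # ['hello', 'one', 'two']
```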
You can learn more about the related topics by checking out the following tutorials: