Remove non-alphanumeric characters from a Python string


Borislav Hadzhiev

Last updated: Aug 19, 2022



Use the re.sub() function to remove all non-alphanumeric characters from a string, e.g. new_str = re.sub(r'[\W_]', '', my_str). The re.sub() function replaces every character that is not a letter or a digit with an empty string, which removes it from the result.

main.py
```python
import re

my_str = 'one !two@ three'

# ✅ Remove all non-alphanumeric characters from string (re.sub())
new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 👉️ 'onetwothree'

# ✅ Remove all non-alphanumeric characters from string, preserving whitespace
new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'one two three'

# --------------------------------

# ✅ Remove all non-alphanumeric characters from string (generator expression)
new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'onetwothree'

# ✅ Remove all non-alphanumeric characters from string, preserving whitespace
new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'one two three'
```

The first example uses the re.sub() function to remove all non-alphanumeric characters from a string.

The re.sub() function returns a new string that is obtained by replacing every occurrence of the pattern with the provided replacement.

main.py
```python
import re

my_str = 'one !two@ three'

new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 👉️ 'onetwothree'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'one two three'
```

If the pattern isn't found, the string is returned as is.
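A quick sketch of both behaviors: a string with nothing to replace comes back unchanged, and because Python strings are immutable, re.sub() never modifies the string you pass in. The variable names below are just for illustration.

```python
import re

# Nothing in this string matches [\W_], so the result equals the input
unchanged = re.sub(r'[\W_]', '', 'onetwothree')
print(unchanged)  # 👉️ 'onetwothree'

# re.sub() returns a new string; the original is left untouched
original = 'one !two@ three'
cleaned = re.sub(r'[\W_]', '', original)
print(original)  # 👉️ 'one !two@ three'
print(cleaned)  # 👉️ 'onetwothree'
```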

The first argument we passed to the re.sub() function is a regular expression.

The square brackets [] are used to indicate a set of characters.

The \W (capital W) special sequence matches any character that is not a word character. Because the underscore counts as a word character, \W on its own would leave underscores in the string, which is why the set also contains _.

We remove all non-alphanumeric characters by replacing each match with an empty string.
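To see why the underscore is added to the set, compare \W on its own with [\W_] on a string that contains an underscore (the sample string is made up for this example):

```python
import re

my_str = 'one_two! three'

# \W alone does NOT match _, because the underscore is a word character
with_w_only = re.sub(r'\W', '', my_str)
print(with_w_only)  # 👉️ 'one_twothree'

# Adding _ to the set removes underscores as well
with_underscore = re.sub(r'[\W_]', '', my_str)
print(with_underscore)  # 👉️ 'onetwothree'
```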

If you want to preserve the whitespace and remove all non-alphanumeric characters, use the following regular expression.

main.py
```python
import re

my_str = 'one !two@ three'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'one two three'
```

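The caret ^ at the beginning of the set means "NOT". In other words, the set matches every character that is NOT a word character (a letter, digit or underscore) and NOT a whitespace character. Because underscores are word characters, this pattern keeps them; use r'[^\w\s]|_' if you also need to remove underscores.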

The \w special sequence is the opposite of \W and matches:

  • characters that can be part of a word in any language
  • numbers
  • the underscore character
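Here is a short sketch of what "any language" and "the underscore" mean in practice for the whitespace-preserving pattern. The sample string is made up, and the re.ASCII flag is shown as an option for when you only want to keep ASCII letters and digits.

```python
import re

my_str = 'café_au lait!'

# \w matches Unicode letters such as é, and it also matches _
kept = re.sub(r'[^\w\s]', '', my_str)
print(kept)  # 👉️ 'café_au lait'

# Alternation with _ removes underscores while still preserving whitespace
no_underscore = re.sub(r'[^\w\s]|_', '', my_str)
print(no_underscore)  # 👉️ 'caféau lait'

# With re.ASCII, \w only matches [a-zA-Z0-9_], so é is removed too
ascii_only = re.sub(r'[^\w\s]|_', '', my_str, flags=re.ASCII)
print(ascii_only)  # 👉️ 'cafau lait'
```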

The \s special sequence matches Unicode whitespace characters, such as [ \t\n\r\f\v].
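Since \s covers more than the space character, tabs and newlines also survive the [^\w\s] pattern. A minimal sketch with a made-up string:

```python
import re

my_str = 'one!\ttwo@\nthree'

# The tab and the newline are whitespace, so they are not removed
new_str = re.sub(r'[^\w\s]', '', my_str)
print(repr(new_str))  # 👉️ 'one\ttwo\nthree'
```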

If you ever need help reading or writing a regular expression, consult the Regular Expression Syntax section of the official re module docs.

It lists all of the special characters and includes many useful examples.

Removing characters that sit between spaces can leave multiple spaces next to one another. If that happens, you might have to replace each run of consecutive spaces with a single space.

main.py
```python
import re

my_str = 'one ! two @ three'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'one  two  three'

# Collapse runs of whitespace into a single space
result = " ".join(new_str.split())
print(result)  # 👉️ 'one two three'
```

When called without arguments, the str.split() method splits the string on runs of one or more whitespace characters and ignores leading and trailing whitespace. We then join the resulting list of strings with a single space separator.
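A small sketch of that split-and-join behavior on a made-up string with extra spaces, a tab and surrounding whitespace. The re.sub(r'\s+', ' ', ...) line is an alternative approach that gives the same result here, not the one used above.

```python
import re

messy = '  one   two \t three  '

# split() with no arguments splits on runs of whitespace and drops empty parts
parts = messy.split()
print(parts)  # 👉️ ['one', 'two', 'three']

joined = ' '.join(parts)
print(joined)  # 👉️ 'one two three'

# Alternative: replace each whitespace run with one space, then strip the ends
via_regex = re.sub(r'\s+', ' ', messy).strip()
print(via_regex)  # 👉️ 'one two three'
```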

Alternatively, you can use a generator expression.

To remove all non-alphanumeric characters from a string:

  1. Use a generator expression to iterate over the string.
  2. Use the str.isalnum() method to check if each character is alphanumeric.
  3. Use the str.join() method to join the alphanumeric characters.
main.py
```python
my_str = 'one !two@ three'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'onetwothree'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'one two three'
```

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.
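A rough illustration of how the generator expression behaves: it produces matching characters lazily, one at a time, and it can only be consumed once. Since str.join() accepts any iterable of strings, a list comprehension would work as well.

```python
my_str = 'one !two@ three'

# Values are produced on demand, only for characters that pass the condition
gen = (char for char in my_str if char.isalnum())
first = next(gen)
print(first)  # 👉️ 'o'

# join() consumes whatever is left in the generator
rest = ''.join(gen)
print(rest)  # 👉️ 'netwothree'

# A list comprehension builds the whole list first, then join() uses it
from_list = ''.join([char for char in my_str if char.isalnum()])
print(from_list)  # 👉️ 'onetwothree'
```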

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric, and the character is only kept if the check returns True.

The str.isalnum method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise the method returns False.

main.py
```python
print('A'.isalnum())  # 👉️ True
print('!'.isalnum())  # 👉️ False
```
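A few more cases worth keeping in mind, following the definition above: the empty string is not alphanumeric, a single non-alphanumeric character makes the whole string fail, the underscore does not count, and non-ASCII letters do.

```python
print(''.isalnum())  # 👉️ False (no characters at all)
print('abc123'.isalnum())  # 👉️ True
print('abc 123'.isalnum())  # 👉️ False (the space is not alphanumeric)
print('_'.isalnum())  # 👉️ False
print('é'.isalnum())  # 👉️ True (Unicode letters count)
```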

The generator expression yields only the alphanumeric characters.

The last step is to use the str.join() method to join the alphanumeric characters into a string.

main.py
```python
my_str = 'one !two@ three'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'onetwothree'
```

The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphanumeric characters without a separator.
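To make the separator rule concrete, here is a sketch that joins the same characters with and without a separator. It also shows that join() only accepts strings, so other values have to be converted first.

```python
chars = ['o', 'n', 'e']

# The string that join() is called on is placed between the elements
no_separator = ''.join(chars)
dashed = '-'.join(chars)
print(no_separator)  # 👉️ 'one'
print(dashed)  # 👉️ 'o-n-e'

# Non-string values must be converted with str() before joining
digits = ''.join(str(n) for n in [1, 2, 3])
print(digits)  # 👉️ '123'
```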

If you want to remove the non-alphanumeric characters and preserve the whitespace, use the boolean or operator.

main.py
```python
my_str = 'one !two@ three'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'one two three'
```

We used the boolean or operator, so a character is kept if at least one of the conditions is met: it has to be alphanumeric or it has to be a space.
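Note that char == ' ' only keeps regular spaces. If you want to keep tabs and newlines too, one option is str.isspace(), which matches every whitespace character. A sketch comparing the two on a made-up string:

```python
my_str = 'one !two@\tthree\n'

# Only regular spaces are kept, so the tab and newline are dropped
spaces_only = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(repr(spaces_only))  # 👉️ 'one twothree'

# isspace() keeps every whitespace character, including \t and \n
all_whitespace = ''.join(
    char for char in my_str if char.isalnum() or char.isspace()
)
print(repr(all_whitespace))  # 👉️ 'one two\tthree\n'
```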

If your string contains multiple spaces next to one another, you might have to replace one or more spaces with a single space.

main.py
```python
my_str = 'one ! two @ three'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'one  two  three'

# Collapse runs of whitespace into a single space
result = " ".join(new_str.split())
print(result)  # 👉️ 'one two three'
```
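If you need this in several places, the regex approaches from this article can be wrapped in a small helper. The function name remove_non_alphanumeric and its keep_whitespace parameter are hypothetical, chosen only for this sketch. It uses r'[^\w\s]|_' so that underscores are removed in both modes.

```python
import re


def remove_non_alphanumeric(text: str, keep_whitespace: bool = False) -> str:
    """Hypothetical helper: keep only letters and digits.

    With keep_whitespace=True, whitespace survives and every run of
    whitespace (including tabs and newlines) becomes a single space,
    with leading and trailing whitespace removed.
    """
    if keep_whitespace:
        # Drop characters that are neither word nor whitespace, plus underscores
        cleaned = re.sub(r'[^\w\s]|_', '', text)
        return ' '.join(cleaned.split())
    # Drop everything that is not a letter or a digit
    return re.sub(r'[\W_]', '', text)


print(remove_non_alphanumeric('one ! two @ three'))  # 👉️ 'onetwothree'
print(remove_non_alphanumeric('one ! two @ three', keep_whitespace=True))  # 👉️ 'one two three'
```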