Remove non-alphanumeric characters from a Python string

Borislav Hadzhiev

Last updated: Apr 9, 2024


# Table of Contents

  1. Remove non-alphanumeric characters from a Python string
  2. Remove all non-alphabetic characters from String in Python

# Remove non-alphanumeric characters from a Python string

Use the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method will remove all non-alphanumeric characters from the string by replacing them with empty strings.

main.py
```python
import re

my_str = 'bobby !hadz@ com 123'

# ✅ Remove all non-alphanumeric characters from string
new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom123'

# -----------------------------------------------

# ✅ Remove all non-alphanumeric characters from string,
# preserving whitespace
new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'
```
The code for this article is available on GitHub

If you need to remove the non-alphabetic characters from a string, see the section "Remove all non-alphabetic characters from String in Python" below.

The example uses the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

If the pattern isn't found, the string is returned as is.
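As a small sketch of that behavior (the sample strings here are made up for illustration), the following shows re.sub() returning an equal string when nothing matches. It also uses re.subn(), which returns the new string together with the number of replacements made.

```python
import re

original = 'bobbyhadz123'

# No character matches [\W_], so there is nothing to replace
unchanged = re.sub(r'[\W_]', '', original)
print(unchanged)  # 👉️ 'bobbyhadz123'
print(unchanged == original)  # 👉️ True

# re.subn() also reports how many replacements were made
cleaned, count = re.subn(r'[\W_]', '', 'bobby !hadz@')
print(cleaned)  # 👉️ 'bobbyhadz'
print(count)  # 👉️ 3
```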

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The \W (capital W) special character matches any character that is not a word character. The underscore counts as a word character, which is why the set also lists _ explicitly.

We remove all non-alphanumeric characters by replacing each with an empty string.
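To see why the underscore is listed in the set, here is a minimal comparison of \W on its own and [\W_]. The sample string is an assumption chosen to contain an underscore.

```python
import re

my_str = 'bobby_hadz !com'

# \W on its own leaves the underscore in place
only_non_word = re.sub(r'\W', '', my_str)
print(only_non_word)  # 👉️ 'bobby_hadzcom'

# [\W_] removes the underscore as well
non_word_and_underscore = re.sub(r'[\W_]', '', my_str)
print(non_word_and_underscore)  # 👉️ 'bobbyhadzcom'
```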

# Remove non-alphanumeric characters but preserve the whitespace

If you want to preserve the whitespace and remove all non-alphanumeric characters, use the following regular expression.

main.py
```python
import re

my_str = 'bobby !hadz@ com 123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'
```

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT word characters (letters, digits, underscores) or whitespace. Note that underscores are word characters, so this pattern keeps them.

The \w character is the opposite of the \W character and matches:

  • characters that can be part of a word in any language
  • numbers
  • the underscore character

The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].
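Because \w and \s are Unicode-aware for string patterns, accented letters are kept by default. The sketch below, using a made-up sample string, contrasts the default behavior with the re.ASCII flag, which restricts \w to [a-zA-Z0-9_].

```python
import re

my_str = 'café! 123'

# By default, \w matches Unicode word characters such as 'é'
unicode_result = re.sub(r'[^\w\s]', '', my_str)
print(unicode_result)  # 👉️ 'café 123'

# With re.ASCII, \w only matches [a-zA-Z0-9_], so 'é' is removed
ascii_result = re.sub(r'[^\w\s]', '', my_str, flags=re.ASCII)
print(ascii_result)  # 👉️ 'caf 123'
```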

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

main.py
```python
import re

my_str = 'bobby ! hadz @ com 123'

# Removing the symbols leaves two spaces next to one another
new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby  hadz  com 123'

result = " ".join(new_str.split())
print(result)  # 👉️ 'bobby hadz com 123'
```

The str.split() method, when called without arguments, splits the string on one or more whitespace characters, and we join the resulting list of strings with a single space separator.
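Collapsing runs of whitespace can also be done with a second regular expression. This is only a sketch of one alternative; the sample string is assumed, and the split/join approach gives the same result here.

```python
import re

spaced = 'bobby  hadz   com 123'

# Replace each run of whitespace with a single space
collapsed = re.sub(r'\s+', ' ', spaced).strip()
print(collapsed)  # 👉️ 'bobby hadz com 123'

# Same result as the split/join approach
print(collapsed == ' '.join(spaced.split()))  # 👉️ True
```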

Alternatively, you can use a generator expression.

# Remove non-alphanumeric characters from a string using a generator expression

This is a three-step process:

  1. Use a generator expression to iterate over the string.
  2. Use the str.isalnum() method to check if each character is alphanumeric.
  3. Use the str.join() method to join the alphanumeric characters.

main.py
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'bobby hadz com 123'
```

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric and return the result.

The str.isalnum() method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise, the method returns False.

main.py
```python
print('A'.isalnum())  # 👉️ True
print('!'.isalnum())  # 👉️ False
print('5'.isalnum())  # 👉️ True
```
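The "all characters" and "at least one character" parts of that rule are easier to see with longer and empty strings. A short sketch with assumed sample values:

```python
samples = ['bobby123', 'bobby 123', '', 'é', '_']

for sample in samples:
    print(repr(sample), sample.isalnum())

# 👉️ 'bobby123' True
# 👉️ 'bobby 123' False  (the space is not alphanumeric)
# 👉️ '' False           (empty string)
# 👉️ 'é' True           (Unicode letters count)
# 👉️ '_' False          (the underscore is not alphanumeric)
```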

The generator object contains only alphanumeric characters.

The last step is to use the str.join() method to join the alphanumeric characters into a string.

main.py
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'
```

The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphanumeric characters without a separator.
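The role of the separator can be illustrated with a tiny list. This is just a sketch, and the values are made up.

```python
chars = ['a', 'b', 'c']

# The string join() is called on goes between the elements
dashed = '-'.join(chars)
print(dashed)  # 👉️ 'a-b-c'

# An empty separator simply concatenates the elements
plain = ''.join(chars)
print(plain)  # 👉️ 'abc'

# join() only accepts strings, so convert other values first
digits = ''.join(str(number) for number in [1, 2, 3])
print(digits)  # 👉️ '123'
```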

If you want to remove the non-alphanumeric characters and preserve the whitespace, use the boolean or operator.

main.py
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(
    char for char in my_str
    if char.isalnum() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com 123'
```

We used the boolean or operator, so for the character to be added to the generator object, one of the conditions has to be met.

The character has to be alphanumeric or it has to be a space.
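Comparing against ' ' keeps only the space character. If tabs and newlines should survive as well, one option is str.isspace(), sketched here with an assumed sample string.

```python
my_str = 'bobby!\thadz@\ncom 123'

# char.isspace() keeps spaces, tabs and newlines
all_whitespace = ''.join(
    char for char in my_str
    if char.isalnum() or char.isspace()
)
print(repr(all_whitespace))  # 👉️ 'bobby\thadz\ncom 123'

# char == ' ' keeps only the space character
spaces_only = ''.join(
    char for char in my_str
    if char.isalnum() or char == ' '
)
print(repr(spaces_only))  # 👉️ 'bobbyhadzcom 123'
```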

# Remove non-alphanumeric characters from a string using filter()

You can also use the filter() function to remove all non-alphanumeric characters from a string.

main.py
```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(filter(str.isalnum, my_str))
print(new_str)  # 👉️ bobbyhadzcom123
```

The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

We passed the str.isalnum method to filter() so the method gets called with each character in the string.

The filter() function returns an iterator that yields only the characters for which the str.isalnum() method returned True.

The last step is to use the str.join() method to join the filter object into a string.
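To preserve whitespace with filter(), the condition can be wrapped in a lambda. This is a sketch of one possible variation rather than part of the original example.

```python
my_str = 'bobby !hadz@ com 123'

# Keep a character if it is alphanumeric or whitespace
new_str = ''.join(
    filter(lambda char: char.isalnum() or char.isspace(), my_str)
)
print(new_str)  # 👉️ 'bobby hadz com 123'
```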

# Remove all non-alphabetic characters from String in Python

The re.sub() method can also be used to remove all non-alphabetic characters from a string.

main.py
```python
import re

my_str = 'bobby! hadz@ com'

# ✅ Remove all non-alphabetic characters from string (re.sub())
new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

# -----------------------------------------------------

# ✅ Remove all non-alphabetic characters from string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```

The example uses the re.sub() method to remove all non-alphabetic characters from a string.

The re.sub method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

main.py
```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```

If the pattern isn't found, the string is returned as is.

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT letters.

The a-z and A-Z characters represent the lowercase and uppercase ASCII letter ranges.
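Since these ranges only cover ASCII letters, accented letters are removed too. The sketch below shows this with an assumed sample string, along with one possible Unicode-aware alternative: removing non-word characters, digits and underscores via [\W\d_]. Treat the alternative as an approximation, since a few numeric characters that \d does not match may remain.

```python
import re

my_str = 'café! résumé@ 123'

# [^a-zA-Z] also strips the accented 'é'
ascii_letters = re.sub(r'[^a-zA-Z]', '', my_str)
print(ascii_letters)  # 👉️ 'cafrsum'

# Removing non-word characters, digits and underscores keeps Unicode letters
unicode_letters = re.sub(r'[\W\d_]', '', my_str)
print(unicode_letters)  # 👉️ 'caférésumé'
```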

# Remove all non-alphabetic characters, but preserve the whitespace

If you need to remove all non-alphabetic characters and preserve the whitespace, use the following regular expression.

main.py
```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```

The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].

In its entirety, the regular expression matches every character that is neither an ASCII letter nor a whitespace character.

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

main.py
```python
import re

my_str = 'bobby ! hadz @ com'

# Removing the symbols leaves two spaces next to one another
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby  hadz  com'

result = ' '.join(new_str.split())
print(result)  # 👉️ 'bobby hadz com'
```

The str.split() method, when called without arguments, splits the string on one or more whitespace characters, and we join the resulting list of strings with a single space separator.

Alternatively, you can use a generator expression.

# Remove all non-alphabetic characters from String using generator expression

This is a three-step process:

  1. Use a generator expression to iterate over the string.
  2. Use the str.isalpha() method to check if each character is alphabetic.
  3. Use the str.join() method to join the alphabetic characters.

main.py
```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str
    if char.isalpha()
)
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = ''.join(
    char for char in my_str
    if char.isalpha() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com'
```

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalpha() method to check if the current character is alphabetic and we return the result.

The str.isalpha method returns True if all characters in the string are alphabetic and there is at least one character, otherwise, the method returns False.

main.py
```python
print('H'.isalpha())  # 👉️ True
print('@'.isalpha())  # 👉️ False
```
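For comparison, here is a short sketch of how str.isalpha() differs from str.isalnum() on a few assumed sample values. Digits are the main difference, and Unicode letters such as 'é' count as alphabetic.

```python
samples = ['H', '5', 'é', '@', '']

alpha_results = [sample.isalpha() for sample in samples]
alnum_results = [sample.isalnum() for sample in samples]

print(alpha_results)  # 👉️ [True, False, True, False, False]
print(alnum_results)  # 👉️ [True, True, True, False, False]
```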

The generator object contains only alphabetic characters.

main.py
```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str
    if char.isalpha()
)
print(new_str)  # 👉️ 'bobbyhadzcom'
```

The last step is to use the str.join() method to join the alphabetic characters into a string.

The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphabetic characters without a separator.

If you want to remove the non-alphabetic characters and preserve the whitespace, use the boolean or operator.

main.py
```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str
    if char.isalpha() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com'
```

We used the boolean or operator, so for the character to be added to the generator object, one of the conditions has to be met.

The character has to be alphabetic or it has to be a space.

# Remove all non-alphabetic characters from String using filter()

This is a three-step process:

  1. Pass the str.isalpha() method and the string to the filter() function.
  2. The str.isalpha() method will filter out all non-letter characters.
  3. Use the str.join() method to join the result into a string.

main.py
```python
a_string = 'bobby123hadz456.com'

only_letters = ''.join(
    filter(
        str.isalpha,
        a_string
    )
)
print(only_letters)  # 👉️ bobbyhadzcom
```

The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.
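The iterator that filter() returns is lazy and can only be consumed once, which the following sketch illustrates with a made-up string.

```python
letters = filter(str.isalpha, 'a1b2c3')

# filter() returns an iterator, not a list or a string
print(type(letters).__name__)  # 👉️ 'filter'

first_pass = list(letters)
print(first_pass)  # 👉️ ['a', 'b', 'c']

# The iterator is exhausted after the first pass
second_pass = list(letters)
print(second_pass)  # 👉️ []
```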

We passed the str.isalpha() method to the filter() function.

The str.isalpha() method gets called with each character in the string and returns True if the character is a letter.

The last step is to use the str.join() method to join all matching characters into a string.

Which approach you pick is a matter of personal preference. I'd use the str.isalpha() method with a generator expression because the approach is quite direct and intuitive.
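If you use the generator expression approach in several places, it could be wrapped in a small helper. The function below is a hypothetical sketch; the name keep_characters and its parameters are assumptions made for illustration.

```python
def keep_characters(text, letters_only=False, keep_whitespace=False):
    """Hypothetical helper: drop unwanted characters from text."""
    # Pick the check based on whether digits should be kept
    is_wanted = str.isalpha if letters_only else str.isalnum

    return ''.join(
        char for char in text
        if is_wanted(char) or (keep_whitespace and char.isspace())
    )


alnum_only = keep_characters('bobby !hadz@ com 123')
print(alnum_only)  # 👉️ 'bobbyhadzcom123'

alnum_spaces = keep_characters('bobby !hadz@ com 123', keep_whitespace=True)
print(alnum_spaces)  # 👉️ 'bobby hadz com 123'

letters = keep_characters('bobby123hadz456.com', letters_only=True)
print(letters)  # 👉️ 'bobbyhadzcom'
```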


Copyright © 2024 Borislav Hadzhiev