Replace all non-alphanumeric characters in a Python string

Borislav Hadzhiev

Last updated: Aug 19, 2022

Use the re.sub() function to replace all non-alphanumeric characters in a string, e.g. new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str). The re.sub() function returns a new string in which every non-alphanumeric character is replaced by the provided replacement.

main.py
import re

my_str = 'apple, kiwi, banana'

# ✅ Replace all non-alphanumeric characters in string (re.sub())
new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Replace one or more consecutive non-alphanumeric characters with a single character
new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

# ✅ Replace all non-alphanumeric characters in string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

# ----------------------------------------------------------------

# ✅ Replace all non-alphanumeric characters in string (generator expression)
new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Preserve spaces
new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

The first example uses the re.sub() function to replace all non-alphanumeric characters in a string.

The re.sub() function returns a new string obtained by replacing all non-overlapping occurrences of the pattern with the provided replacement.

main.py
import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'
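
As a quick sketch of that behaviour, using the article's my_str: the original string is left untouched, and re.sub() also accepts an optional count argument (0, the default, replaces every match). The first_only variable name is only illustrative.

```python
import re

my_str = 'apple, kiwi, banana'

# re.sub() builds and returns a new string; Python strings are immutable,
# so my_str itself is never modified
new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'
print(my_str)   # 👉️ 'apple, kiwi, banana'

# The optional count argument caps the number of replacements (0 means "all")
first_only = re.sub(r'[^a-zA-Z0-9]', '|', my_str, count=1)
print(first_only)  # 👉️ 'apple| kiwi, banana'
```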

If the pattern isn't found, the string is returned as is.
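
A minimal check of the no-match case. It also uses re.subn(), a sibling function in the same re module that returns the new string together with the number of substitutions made. The no_match input is an illustrative value.

```python
import re

# Illustrative input that contains only alphanumeric characters
no_match = 'applekiwi'

# Nothing matches the pattern, so the input comes back unchanged
result = re.sub(r'[^a-zA-Z0-9]', '|', no_match)
print(result)  # 👉️ 'applekiwi'

# re.subn() returns a (new_string, number_of_substitutions) tuple
result, count = re.subn(r'[^a-zA-Z0-9]', '|', no_match)
print(count)  # 👉️ 0
```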

The first argument we passed to the re.sub() function is a regular expression.

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set negates it. In other words, the set matches any character that is NOT a lowercase letter, an uppercase letter or a digit.
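
To see what the caret changes, here is a small comparison on an illustrative string, text = 'ab1', using a lowercase-only set for brevity:

```python
import re

# Illustrative input: two lowercase letters and a digit
text = 'ab1'

# [a-z] matches any lowercase ASCII letter
in_set = re.sub(r'[a-z]', '*', text)
print(in_set)  # 👉️ '**1'

# [^a-z] negates the set: it matches any character that is NOT a lowercase letter
not_in_set = re.sub(r'[^a-z]', '*', text)
print(not_in_set)  # 👉️ 'ab*'
```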

The a-z, A-Z and 0-9 ranges match the lowercase ASCII letters, the uppercase ASCII letters and the digits 0 to 9, respectively.
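
One consequence worth knowing: the explicit ranges in [^a-zA-Z0-9] only cover ASCII letters and digits, so accented letters such as é are treated as non-alphanumeric. The str.isalnum() method, used in the generator-expression approach later in this article, is Unicode-aware. A short comparison on an illustrative input:

```python
import re

# Illustrative input containing non-ASCII letters
my_str = 'café, naïve'

# The ASCII-only ranges treat 'é' and 'ï' as non-alphanumeric
ascii_only = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(ascii_only)  # 👉️ 'caf|||na|ve'

# str.isalnum() is Unicode-aware, so accented letters are kept
unicode_aware = ''.join(char if char.isalnum() else '|' for char in my_str)
print(unicode_aware)  # 👉️ 'café||naïve'
```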

If you need to replace multiple consecutive non-alphanumeric characters with a single replacement string, add a plus + after the character set.

main.py
import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

The plus + quantifier matches the preceding item (here, any character that is neither a letter nor a digit) one or more times, so each run of such characters is replaced by a single |.
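
Here is a sketch comparing both patterns on an illustrative input with longer runs of punctuation and whitespace:

```python
import re

# Illustrative input with runs of consecutive non-alphanumeric characters
my_str = 'a,,, b!!c'

# Without +, every non-alphanumeric character gets its own '|'
one_per_char = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(one_per_char)  # 👉️ 'a||||b||c'

# With +, each run of consecutive non-alphanumeric characters becomes a single '|'
one_per_run = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(one_per_run)  # 👉️ 'a|b|c'
```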

We used a pipe | as the replacement character in the examples, but you can use any other replacement string.
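
For example, a hyphen produces slug-like output, and an empty string removes the matched characters altogether. The sample string below is only an illustration:

```python
import re

# Illustrative input
my_str = 'Hello, World!'

# Use a hyphen as the replacement
hyphenated = re.sub(r'[^a-zA-Z0-9]+', '-', my_str)
print(hyphenated)  # 👉️ 'Hello-World-'

# Use an empty string to remove the matched characters entirely
removed = re.sub(r'[^a-zA-Z0-9]+', '', my_str)
print(removed)  # 👉️ 'HelloWorld'
```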

If you need to replace all non-alphanumeric characters in a string and preserve the whitespace, use the following regular expression.

main.py
import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

The \s special sequence matches Unicode whitespace characters, which include [ \t\n\r\f\v]. Adding it to the negated set excludes whitespace from the replacement.
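
A quick check, on an illustrative string that contains a tab and a newline, that \s inside the negated set keeps all whitespace rather than just spaces (repr() is used so the tab and newline are visible):

```python
import re

# Illustrative input containing a tab and a newline
my_str = 'a,\tb;\nc'

# \s inside the negated set keeps tabs and newlines, not just spaces
new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(repr(new_str))  # 👉️ 'a|\tb|\nc'
```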

If you ever need help reading or writing a regular expression, consult the Regular Expression Syntax section of the official re module documentation.

The page contains a list of all of the special characters with many useful examples.

Alternatively, you can use a generator expression.

To replace all non-alphanumeric characters in a string:

  1. Use a generator expression to iterate over the string.
  2. Return the character if it's alphanumeric, otherwise return the replacement.
  3. Use the join() method to join the characters into a string.
main.py
my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Preserve spaces
new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'
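
The same three steps can be wrapped in a small reusable function. This is a sketch: the name replace_non_alphanumeric and its replacement parameter are hypothetical, not part of any library.

```python
def replace_non_alphanumeric(text, replacement='|'):
    """Hypothetical helper (not a library function) wrapping the steps above."""
    # 1. Iterate over the string with a generator expression
    # 2. Keep alphanumeric characters, swap everything else for `replacement`
    # 3. Join the results back into a single string
    return ''.join(char if char.isalnum() else replacement for char in text)


print(replace_non_alphanumeric('apple, kiwi, banana'))       # 👉️ 'apple||kiwi||banana'
print(replace_non_alphanumeric('apple, kiwi, banana', '_'))  # 👉️ 'apple__kiwi__banana'
```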

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.
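
Both uses can be shown on the article's my_str; the upper-casing operation is just an illustrative choice:

```python
my_str = 'apple, kiwi, banana'

# Performing an operation on every element: upper-case each character
shouted = ''.join(char.upper() for char in my_str)
print(shouted)  # 👉️ 'APPLE, KIWI, BANANA'

# Selecting a subset with an `if` clause: keep only alphanumeric characters
letters_only = ''.join(char for char in my_str if char.isalnum())
print(letters_only)  # 👉️ 'applekiwibanana'
```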

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric.

The str.isalnum method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise the method returns False.

main.py
print('C'.isalnum())  # 👉️ True
print('^'.isalnum())  # 👉️ False
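
A few more edge cases follow from that definition. These are illustrative inputs; note that the empty string returns False and that non-ASCII letters count as alphanumeric:

```python
# Each call checks every character of the (illustrative) string
empty = ''.isalnum()
mixed = 'abc123'.isalnum()
with_space = 'abc 123'.isalnum()
underscore = '_'.isalnum()
accented = 'é'.isalnum()

print(empty)       # 👉️ False (the string must contain at least one character)
print(mixed)       # 👉️ True
print(with_space)  # 👉️ False (the space is not alphanumeric)
print(underscore)  # 👉️ False
print(accented)    # 👉️ True (non-ASCII letters count too)
```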

If the character is alphanumeric, we return it; otherwise, we return the replacement string.

The last step is to join the characters produced by the generator expression into a string.

main.py
my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the characters without a separator.
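
A short illustration of the separator behaviour, using an illustrative list of characters. Note that join() only accepts strings, so other types must be converted first (e.g. with str()):

```python
# Illustrative list of single-character strings
chars = ['a', 'b', 'c']

# The string join() is called on becomes the separator
dashed = '-'.join(chars)
print(dashed)  # 👉️ 'a-b-c'

# An empty separator concatenates the elements directly
joined = ''.join(chars)
print(joined)  # 👉️ 'abc'

# Every element must be a string; anything else raises a TypeError
try:
    ''.join([1, 2, 3])
except TypeError as exc:
    print('TypeError:', exc)
```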

If you need to preserve spaces, use the boolean or operator to also keep the space character.

main.py
my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

We used the boolean or operator, so a character is kept if at least one of the conditions is met: it is alphanumeric or it is a space.

Any other character, including tabs and newlines, is replaced with the replacement string.
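
Comparing against ' ' only keeps plain spaces; tabs and newlines are still replaced. If you want behaviour closer to the \s regex version, one option (an alternative, not the approach used above) is str.isspace(). A sketch on an illustrative input:

```python
# Illustrative input containing a tab and a space
my_str = 'a,\tb c'

# char == ' ' keeps only plain spaces; the tab is replaced
spaces_only = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(repr(spaces_only))  # 👉️ 'a||b c'

# str.isspace() keeps every whitespace character, including tabs and newlines,
# which is closer to how \s behaves in the regex version
all_whitespace = ''.join(char if char.isalnum() or char.isspace() else '|' for char in my_str)
print(repr(all_whitespace))  # 👉️ 'a|\tb c'
```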
