Remove special characters except Space from String in Python

Borislav Hadzhiev

Last updated: Apr 10, 2024
5 min


# Table of Contents

  1. Remove special characters except Space from String in Python
  2. Remove special characters except Space from text, file or multiline string
  3. Remove special characters except Space from String using str.isalnum()
  4. Split a string on all special characters in Python

# Remove special characters except Space from String in Python

Use the re.sub() function to remove all special characters except for spaces from a string.

The call replaces every character that is not a letter, a digit or whitespace with an empty string, which removes it.

main.py

    import re

    a_string = 'b!o@b#b$y% h^a&d*z( c.o,m'

    new_string = re.sub(r'[^a-zA-Z0-9\s]+', '', a_string)
    print(new_string)  # 👉️ 'bobby hadz com'


The code for this article is available on GitHub

The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

The first argument we passed to the re.sub() method is a regular expression.

The square [] brackets are used to indicate a set of characters.

The a-z and A-Z ranges match lowercase and uppercase ASCII letters.

The 0-9 range matches the digits from 0 to 9.

The \s character class matches Unicode whitespace characters, including [ \t\n\r\f\v], so tabs and newlines are kept as well as spaces.

The caret ^ at the beginning of the set means "NOT". In other words, match all non-letters, non-digits and non-spaces and replace them with empty strings (remove them).
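Because \s matches every whitespace character, tabs and newlines survive the substitution too. The sketch below assumes you want to keep only the literal space character, in which case a plain space can go in the set instead of \s. The sample string is made up for illustration.

```python
import re

# Made-up sample containing a tab, a space and a trailing newline.
a_string = 'bob\tby! had@z\n'

# \s keeps every whitespace character, including the tab and the newline.
keep_whitespace = re.sub(r'[^a-zA-Z0-9\s]+', '', a_string)
print(repr(keep_whitespace))  # 'bob\tby hadz\n'

# A literal space in the set keeps only the space character.
keep_space_only = re.sub(r'[^a-zA-Z0-9 ]+', '', a_string)
print(repr(keep_space_only))  # 'bobby hadz'
```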

You can add more characters between the square brackets if you want to keep them.
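As a minimal sketch of that idea, the example below also keeps underscores and hyphens. The input string is an assumption chosen for illustration. The hyphen is placed last in the set so that it is read as a literal character rather than as part of a range.

```python
import re

# Hypothetical input that contains an underscore and a hyphen we want to keep.
a_string = 'bobby_hadz! first-name@ c.o,m'

# The trailing hyphen is a literal hyphen, not a range operator.
new_string = re.sub(r'[^a-zA-Z0-9\s_-]+', '', a_string)
print(new_string)  # bobby_hadz first-name com
```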

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

# Remove special characters except Space from text, file or multiline string

To remove the special characters except for space from a text, file or multiline string:

  1. Use the str.splitlines() method to split the text into a list of lines.
  2. Use a for loop to iterate over the list.
  3. Use the re.sub() method to remove the special characters except for space from each line.
main.py

    import re

    text = """b!o@b#b%y h^a*d&z
    dot c.o,m
    """

    a_list = []

    for line in text.splitlines():
        updated_line = re.sub(r'[^a-zA-Z0-9\s]+', '', line)
        print(updated_line)
        a_list.append(updated_line)

    print(a_list)  # 👉️ ['bobby hadz', 'dot com']

    updated_text = '\n'.join(a_list)

    # bobby hadz
    # dot com
    print(updated_text)
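The same steps can be applied to the contents of a file. The following is only a sketch: the helper name `remove_special_chars_from_file` and the file name `example.txt` are hypothetical, and a temporary directory is used so that the example runs on its own.

```python
import re
import tempfile
from pathlib import Path


def remove_special_chars_from_file(path):
    """Return the file's text with special characters removed from each line.

    Hypothetical helper written for this example.
    """
    cleaned_lines = []
    with open(path, encoding='utf-8') as f:
        for line in f.read().splitlines():
            cleaned_lines.append(re.sub(r'[^a-zA-Z0-9\s]+', '', line))
    return '\n'.join(cleaned_lines)


# Write a small sample file into a temporary directory, then clean it.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = Path(tmp_dir) / 'example.txt'
    example_path.write_text('b!o@b#b%y h^a*d&z\ndot c.o,m\n', encoding='utf-8')
    result = remove_special_chars_from_file(example_path)

# bobby hadz
# dot com
print(result)
```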


The str.splitlines() method splits the string on newline characters and returns a list containing the lines in the string.

main.py

    multiline_str = """\
    bobby
    hadz
    com"""

    lines = multiline_str.splitlines()
    print(lines)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the str.join() method to join the list of updated lines with a newline (\n) character separator.

The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.
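A short illustration of how the separator works with str.join(). The list contents are sample values chosen for this example. Note that every element must be a string, so other types need to be converted first.

```python
lines = ['bobby hadz', 'dot com']

# The string the method is called on goes between the elements.
joined_with_newline = '\n'.join(lines)
print(repr(joined_with_newline))  # 'bobby hadz\ndot com'

joined_with_dash = '-'.join(lines)
print(joined_with_dash)  # bobby hadz-dot com

# Non-string elements have to be converted to strings before joining.
numbers = [1, 2, 3]
joined_numbers = ', '.join(str(number) for number in numbers)
print(joined_numbers)  # 1, 2, 3
```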

Alternatively, you can use the str.isalnum() method.

# Remove special characters except Space from String using str.isalnum()

This is a three-step process:

  1. Use a generator expression to iterate over the string.
  2. Check if each character is an alphanumeric character or a space.
  3. Use the str.join() method to join the matching characters into a string.
main.py

    a_string = 'b!o@b#b$y% h^a&d*z( c.o,m'

    new_string = ''.join(char for char in a_string if char.isalnum() or char.isspace())
    print(new_string)  # 👉️ bobby hadz com


We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric.

The str.isalnum() method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise, the method returns False.

main.py

    print('bobby123'.isalnum())  # 👉️ True

    # 👇️ contains space
    print('bobby hadz'.isalnum())  # 👉️ False

We used the boolean OR operator to keep the whitespace characters as well.

The str.isspace() method returns True if there are only whitespace characters in the string and there is at least one character, otherwise False is returned.

main.py

    print(' '.isspace())  # 👉️ True
    print(''.isspace())  # 👉️ False
    print('b'.isspace())  # 👉️ False

The last step is to use the str.join() method to join the remaining characters into a string.
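The two approaches are not fully interchangeable. str.isalnum() treats letters from any alphabet as alphanumeric, whereas the a-zA-Z0-9 set only covers ASCII. The sketch below wraps the generator expression in a function and compares the two. The function name `keep_alnum_and_spaces` and the sample string are made up for illustration.

```python
import re


def keep_alnum_and_spaces(text):
    """Keep alphanumeric and whitespace characters (hypothetical helper)."""
    return ''.join(char for char in text if char.isalnum() or char.isspace())


# 'café_münchen 2024!' written with escapes so the accented letters are unambiguous.
a_string = 'caf\u00e9_m\u00fcnchen 2024!'

# str.isalnum() keeps the accented letters.
with_isalnum = keep_alnum_and_spaces(a_string)
print(with_isalnum)  # cafémünchen 2024

# The ASCII-only character set removes them.
with_regex = re.sub(r'[^a-zA-Z0-9\s]+', '', a_string)
print(with_regex)  # cafmnchen 2024
```

Which behavior is preferable depends on whether non-ASCII letters count as special characters in your use case.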

# Split a string on all special characters in Python

Use the re.split() method to split a string on all special characters.

The re.split() method takes a pattern and a string and splits the string on each occurrence of the pattern.

main.py

    import re

    my_str = "hello<one!two>three.four!five'six"

    my_list = re.split(r'[`!@#$%^&*()_+\-=\[\]{};\':"\\|,.<>\/?~]', my_str)

    # 👇️ ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
    print(my_list)

We used the re.split() method to split a string on all occurrences of a special character.

The square brackets are used to indicate a set of characters.

Make sure that all characters you consider special characters are in the set.

You can add or remove characters according to your use case.
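If writing the escapes by hand is error-prone, one possible alternative is to build the set from string.punctuation and let re.escape() handle the escaping. This is a sketch, not the approach used in the example above, and it assumes that the ASCII punctuation characters are the ones you consider special.

```python
import re
import string

my_str = "hello<one!two>three.four!five'six"

# string.punctuation holds the ASCII punctuation characters.
# re.escape() backslash-escapes the ones that are special in a regex,
# so the set can be built without hand-written escapes.
pattern = '[' + re.escape(string.punctuation) + ']'

my_list = re.split(pattern, my_str)
print(my_list)  # ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
```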

Alternatively, you can use a regular expression that matches any character that is not a letter, a digit or a space.

main.py

    import re

    my_str = "hello<one!two>three.four!five'six"

    my_list = re.split(r'[^a-zA-Z0-9\s]', my_str)

    # 👇️ ['hello', 'one', 'two', 'three', 'four', 'five', 'six']
    print(my_list)

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT lowercase letters a-z, uppercase letters A-Z, digits 0-9 or whitespace \s characters.

You can add any characters that you don't want to match between the square brackets of the regular expression.

You can tweak the regular expression according to your use case. The regular expression syntax section of the official docs describes what each special character does.
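One tweak that is often needed: when special characters appear next to each other, a single-character pattern produces empty strings in the result. The sketch below shows the effect on a made-up input, then one way to avoid it by adding a + quantifier and filtering the parts.

```python
import re

# Made-up input with consecutive special characters.
my_str = 'hello!!one, two...three'

# Splitting on each special character leaves empty strings between neighbors.
single = re.split(r'[^a-zA-Z0-9\s]', my_str)
print(single)  # ['hello', '', 'one', ' two', '', '', 'three']

# The + quantifier treats a run of special characters as one separator.
grouped = re.split(r'[^a-zA-Z0-9\s]+', my_str)
print(grouped)  # ['hello', 'one', ' two', 'three']

# Strip surrounding whitespace and drop any parts that end up empty.
cleaned = [part.strip() for part in grouped if part.strip()]
print(cleaned)  # ['hello', 'one', 'two', 'three']
```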


Copyright © 2024 Borislav Hadzhiev