Using a Regex to remove whitespace from String in Python

Borislav Hadzhiev

Last updated: Aug 20, 2022

Use the re.sub() function to remove all whitespace from a string using a regex, e.g. result = re.sub(r'\s+', '', my_str). The call replaces every run of whitespace characters with an empty string and returns the result as a new string.

main.py
import re

my_str = 'a b c d e f'

# ✅ Remove whitespace from string using regex
result = re.sub(r'\s+', '', my_str)
print(repr(result))  # 👉️ 'abcdef'

# -------------------------------------------------

# ✅ Remove leading whitespace from multiline string using regex
# Each line starts with two spaces; the first line also ends with one
my_str = """\
  a b 
  c d"""

result = re.sub(r'^\s+', '', my_str, flags=re.MULTILINE)
print(repr(result))  # 👉️ 'a b \nc d'

# -------------------------------------------------

# ✅ Replace multiple, consecutive spaces with single space using regex
my_str = 'a  b   c  d   e  f'

result = re.sub(r'\s+', ' ', my_str)
print(repr(result))  # 👉️ 'a b c d e f'

The first example uses the re.sub() function to remove all whitespace from a string.

The re.sub() function returns a new string that is obtained by replacing the non-overlapping occurrences of the pattern with the provided replacement.

main.py
import re

my_str = 'a b c d e f'

result = re.sub(r'\s+', '', my_str)
print(repr(result))  # 👉️ 'abcdef'

If the pattern isn't found, the string is returned as is.
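Both points can be checked with a short sketch. The variable names original, cleaned and untouched are only for illustration.

```python
import re

original = 'a b c'

# re.sub() builds a new string; Python strings are immutable,
# so `original` keeps its value.
cleaned = re.sub(r'\s+', '', original)
print(repr(cleaned))   # 'abc'
print(repr(original))  # 'a b c'

# The input below has no whitespace, so the pattern never matches
# and the result is equal to the input.
untouched = re.sub(r'\s+', '', 'abc')
print(repr(untouched))  # 'abc'
```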

The first argument we passed to the re.sub method is a regular expression.
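The pattern can be passed as a string or as a compiled pattern object. The following is a minimal sketch, assuming you want to reuse the same pattern in several places. The names pattern_text and whitespace are made up for this example.

```python
import re

# A raw string keeps the backslash in \s intact for the regex engine.
pattern_text = r'\s+'

# re.compile() returns a reusable pattern object with its own sub() method.
whitespace = re.compile(pattern_text)

from_string = re.sub(pattern_text, '', 'a b c')
from_compiled = whitespace.sub('', 'a b c')

print(repr(from_string))    # 'abc'
print(repr(from_compiled))  # 'abc'
```

Both calls produce the same result, so compiling is a matter of convenience when the pattern is used many times.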

The \s special sequence matches any Unicode whitespace character, which includes [ \t\n\r\f\v] and others such as the non-breaking space.
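To see the Unicode behavior in practice, here is a small sketch. The sample string is invented for the example, and the re.ASCII flag is shown only as one way to limit \s to the six ASCII whitespace characters.

```python
import re

# \u00a0 is a no-break space and \u2003 is an em space.
text = 'a\u00a0b\u2003c\td'

# By default (str patterns), \s covers Unicode whitespace.
unicode_result = re.sub(r'\s+', '', text)
print(repr(unicode_result))  # 'abcd'

# With re.ASCII, \s matches only [ \t\n\r\f\v], so just the tab is removed.
ascii_result = re.sub(r'\s+', '', text, flags=re.ASCII)
print(repr(ascii_result))  # 'a\xa0b\u2003cd'
```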

The plus + quantifier matches the preceding element (here, \s) 1 or more times, so a run of consecutive whitespace characters is matched as a single unit.

We remove the whitespace from the string by replacing occurrences of whitespace characters with empty strings.
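When the replacement is an empty string, \s and \s+ give the same output and differ only in how many replacements are made. The sketch below uses re.subn(), which returns the new string together with the number of substitutions. The names single and run are only for illustration.

```python
import re

# One space, one tab and one newline between 'a' and 'b'.
my_str = 'a \t\nb'

# \s matches one whitespace character at a time: 3 replacements.
single = re.subn(r'\s', '', my_str)
print(single)  # ('ab', 3)

# \s+ matches the whole run at once: 1 replacement.
run = re.subn(r'\s+', '', my_str)
print(run)  # ('ab', 1)
```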

If you need to remove the leading whitespace from each line of a multiline string using a regex, set the flags keyword argument to re.MULTILINE.

main.py
import re

# Each line starts with two spaces; the first line also ends with one
my_str = """\
  a b 
  c d"""

result = re.sub(r'^\s+', '', my_str, flags=re.MULTILINE)
print(repr(result))  # 👉️ 'a b \nc d'

By default, the caret ^ matches only at the start of the string.

The re.MULTILINE flag changes the behavior of the ^ character so that it also matches at the beginning of each line, immediately after each newline.
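The difference the flag makes can be sketched as follows. The sample text and variable names are invented for the example. The last call uses [ \t] instead of \s, which is one possible choice if blank lines should be kept, because \s also matches the newline of an empty line.

```python
import re

# Three indented lines with a blank line before the last one.
text = '  a\n  b\n\n  c'

# Without the flag, ^ anchors only at the very start of the string.
start_only = re.sub(r'^\s+', '', text)
print(repr(start_only))  # 'a\n  b\n\n  c'

# With re.MULTILINE, ^ also anchors right after each newline.
# The blank line is swallowed because \s matches '\n' as well.
every_line = re.sub(r'^\s+', '', text, flags=re.MULTILINE)
print(repr(every_line))  # 'a\nb\nc'

# [ \t]+ matches only spaces and tabs, so the blank line survives.
keep_blank = re.sub(r'^[ \t]+', '', text, flags=re.MULTILINE)
print(repr(keep_blank))  # 'a\nb\n\nc'
```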

If you need to replace multiple, consecutive spaces with a single space, use a string containing a space as the replacement.

main.py
import re

my_str = 'a  b   c  d   e  f'

result = re.sub(r'\s+', ' ', my_str)
print(repr(result))  # 👉️ 'a b c d e f'

Because \s+ matches a whole run of whitespace characters (tabs and newlines included, not only spaces), each run is replaced with a single space.
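Here is a short sketch of how this behaves on mixed whitespace, together with two variations you might consider. The sample strings are invented, and the str.split() version and the ' {2,}' pattern are alternatives shown for comparison, not part of the example above.

```python
import re

text = ' a \t b\n\nc  '

# \s+ treats tabs and newlines like spaces, so every run collapses.
collapsed = re.sub(r'\s+', ' ', text)
print(repr(collapsed))  # ' a b c '

# Leading and trailing runs become single spaces; strip() drops them.
print(repr(collapsed.strip()))  # 'a b c'

# str.split() with no arguments splits on runs of whitespace and
# ignores leading and trailing whitespace.
joined = ' '.join(text.split())
print(repr(joined))  # 'a b c'

# To touch only literal spaces, match ' ' instead of \s.
spaces_only = re.sub(r' {2,}', ' ', 'a  b\tc   d')
print(repr(spaces_only))  # 'a b\tc d'
```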
