Split a String and remove the Whitespace in Python

Borislav Hadzhiev

Last updated: Apr 8, 2024


# Table of Contents

  1. Split string and remove whitespace in Python
  2. Split string and remove whitespace using map()
  3. Split string and remove whitespace using re.split()

# Split string and remove whitespace in Python

To split a string and remove whitespace:

  1. Use the str.split() method to split the string into a list.
  2. Use a list comprehension to iterate over the list.
  3. On each iteration, use the str.strip() method to remove the leading and trailing whitespace.
main.py
my_str = 'bobby, hadz, com'
my_list = [word.strip() for word in my_str.split(',')]
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']


The code for this article is available on GitHub

We used the str.split() method to split the string on each occurrence of a comma.

The str.split() method splits the string into a list of substrings using a delimiter.

The method takes the following 2 parameters:

| Name | Description |
| --- | --- |
| separator | Split the string into substrings on each occurrence of the separator |
| maxsplit | At most maxsplit splits are done (optional) |
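The article's examples only pass the separator, so here is a minimal sketch of how the optional maxsplit argument changes the result. The variable names are chosen for illustration only.

```python
my_str = 'bobby, hadz, com'

# With maxsplit=1, only the first comma is used as a split point,
# so the rest of the string stays in one piece.
parts = my_str.split(',', 1)
print(parts)  # ['bobby', ' hadz, com']

# Stripping each piece works the same way as before.
cleaned = [part.strip() for part in parts]
print(cleaned)  # ['bobby', 'hadz, com']
```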

The only argument we passed to split() is the separator we want to split on.

main.py
my_str = 'bobby, hadz, com'
l = my_str.split(',')
print(l)  # 👉️ ['bobby', ' hadz', ' com']

The next step is to use a list comprehension to iterate over the list of strings.

main.py
my_str = 'bobby, hadz, com'
my_list = [word.strip() for word in my_str.split(',')]
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

List comprehensions are used to perform some operation for every element, or select a subset of elements that meet a condition.
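The "select a subset" case can matter here when the input contains empty fields. The following is a small sketch, assuming an input string with a blank field and a trailing comma (this input is made up for illustration):

```python
my_str = 'bobby, , hadz, com,'

# Strip every piece first. Blank fields become empty strings.
stripped = [word.strip() for word in my_str.split(',')]
print(stripped)  # ['bobby', '', 'hadz', 'com', '']

# Then keep only the pieces that are not empty.
non_empty = [word for word in stripped if word]
print(non_empty)  # ['bobby', 'hadz', 'com']
```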

On each iteration, we call the str.strip() method to remove any leading or trailing whitespace from the string.

main.py
example = ' bobbyhadz.com '
print(repr(example.strip()))  # 👉️ 'bobbyhadz.com'

The str.strip() method returns a copy of the string with the leading and trailing whitespace removed.
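Note that only the ends of the string are affected. As a quick illustration (the sample string is an assumption, not from the article), whitespace inside the string is left alone, and the related str.lstrip() and str.rstrip() methods trim one side only:

```python
example = '  bobby   hadz \n'

# strip() trims both ends but keeps the inner spaces.
stripped = example.strip()
print(repr(stripped))  # 'bobby   hadz'

# lstrip() and rstrip() remove whitespace from one side only.
left_only = example.lstrip()
right_only = example.rstrip()
print(repr(left_only))   # 'bobby   hadz \n'
print(repr(right_only))  # '  bobby   hadz'
```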

# Split string and remove whitespace using map()

This is a three-step process:

  1. Call the str.split() method on the string to get a list of strings.
  2. Pass the str.strip method and the list to the map() function.
  3. The map function will call the str.strip method on each string in the list.
main.py
my_str = 'bobby, hadz, com'
my_list = list(map(str.strip, my_str.split(',')))
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']


The map() function takes a function and an iterable as arguments and calls the function with each item of the iterable.

We used the str.strip method as the function, so the map function is going to call the str.strip() method on each item in the list.

The map function returns a map object (not a list). If you need to convert the value to a list, pass it to the list() class.
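To make the difference concrete, here is a short sketch showing that the map object is an iterator: it can be converted to a list once, after which it is exhausted.

```python
my_str = 'bobby, hadz, com'

result = map(str.strip, my_str.split(','))
type_name = type(result).__name__
print(type_name)  # map

# Converting to a list consumes the iterator.
as_list = list(result)
print(as_list)  # ['bobby', 'hadz', 'com']

# A second pass yields nothing because the iterator is exhausted.
second_pass = list(result)
print(second_pass)  # []
```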

# Split string and remove whitespace using re.split()

You can also use a regular expression to split a string and remove the whitespace.

main.py
import re

my_str = 'bobby, hadz, com'
pattern = re.compile(r'^\s+|\s*,\s*|\s+$')
my_list = [word for word in pattern.split(my_str) if word]
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']


The pattern in the example is formatted as '^\s+|\s*{your_split_char}\s*|\s+$'.

Here is an example that uses an underscore as the split character.

main.py
import re

my_str = 'bobby_ hadz_ com'
pattern = re.compile(r'^\s+|\s*_\s*|\s+$')
my_list = [word for word in pattern.split(my_str) if word]
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The re.compile() method compiles a regular expression pattern into an object.

We called the split() method on the compiled pattern object, which works like re.split(): it splits the string on each match of the regular expression.
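As a quick check of that equivalence, the sketch below runs the same regular expression through a compiled pattern and through the module-level re.split() function. The variable names are illustrative.

```python
import re

my_str = 'bobby, hadz, com'
regex = r'^\s+|\s*,\s*|\s+$'

# Compile once and call split() on the pattern object...
pattern = re.compile(regex)
from_compiled = pattern.split(my_str)

# ...or pass the regex string straight to re.split().
from_module = re.split(regex, my_str)

print(from_compiled)  # ['bobby', 'hadz', 'com']
print(from_compiled == from_module)  # True
```

Compiling is mainly worthwhile when the same pattern is reused many times.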

The caret ^ matches the start of the string and the dollar sign $ matches the end of the string.

main.py
import re

my_str = 'bobby, hadz, com'
pattern = re.compile(r'^\s+|\s*,\s*|\s+$')
my_list = [word for word in pattern.split(my_str) if word]
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The \s character class matches Unicode whitespace characters, which include [ \t\n\r\f\v].

The plus + matches the preceding character (whitespace) 1 or more times.

The pipe | special character means OR, e.g. X|Y matches X or Y.

The asterisk * matches the preceding regular expression (whitespace) zero or more times.

In its entirety, the regular expression splits on leading whitespace, on trailing whitespace, or on a comma together with any whitespace around it.
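This also explains the `if word` filter in the list comprehension. When the string starts or ends with whitespace, the split happens at the very edge of the string and leaves an empty string on the outer side. A minimal sketch, using a made-up input with padding on both ends:

```python
import re

my_str = '  bobby, hadz , com  '
pattern = re.compile(r'^\s+|\s*,\s*|\s+$')

# Splitting at the leading and trailing whitespace leaves
# an empty string at each end of the result.
raw = pattern.split(my_str)
print(raw)  # ['', 'bobby', 'hadz', 'com', '']

# Filtering on truthiness drops those empty strings.
my_list = [word for word in raw if word]
print(my_list)  # ['bobby', 'hadz', 'com']
```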

Make sure to replace the comma in the regex with the character you need to split on.
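If the split character has a special meaning in regular expressions (for example `.` or `|`), it needs to be escaped first. The sketch below wraps the pattern in a helper; `split_and_strip` is a hypothetical name introduced here, not something from the article or the standard library, and it relies on re.escape() to handle the escaping.

```python
import re


def split_and_strip(text, sep):
    """Hypothetical helper: split text on sep and drop surrounding whitespace."""
    # re.escape() keeps characters such as '.' or '|' from being
    # interpreted as regex syntax.
    pattern = re.compile(rf'^\s+|\s*{re.escape(sep)}\s*|\s+$')
    return [word for word in pattern.split(text) if word]


print(split_and_strip('bobby | hadz | com', '|'))  # ['bobby', 'hadz', 'com']
print(split_and_strip(' bobby. hadz .com ', '.'))  # ['bobby', 'hadz', 'com']
```

The helper assumes a non-empty separator.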



Copyright © 2024 Borislav Hadzhiev