Split a string by whitespace in Python

Borislav Hadzhiev

Last updated: Jun 24, 2022

Use the str.split() method without an argument to split a string by whitespace, e.g. my_list = my_str.split(). When the str.split() method is called without an argument, it considers consecutive whitespace characters as a single separator.

main.py
my_str = 'a b \nc d \r\ne'

my_list = my_str.split()

print(my_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']

We used the str.split() method to split a string by whitespace.

The str.split() method splits the string into a list of substrings using a delimiter.

The method takes the following 2 parameters:

| Name | Description |
| --- | --- |
| sep | The separator. Split the string into substrings on each occurrence of the separator (optional) |
| maxsplit | At most maxsplit splits are done (optional) |

When the str.split() method is called without a separator, it considers consecutive whitespace characters as a single separator.
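The maxsplit parameter from the table can be illustrated with a minimal sketch. The sample string and variable names here are made up for the illustration.

```python
my_str = 'one two three four'

# maxsplit=1 performs at most one split, so the rest of the
# string stays together as the last element
first_and_rest = my_str.split(maxsplit=1)
print(first_and_rest)  # 👉️ ['one', 'two three four']

# maxsplit=2 performs at most two splits
two_splits = my_str.split(maxsplit=2)
print(two_splits)  # 👉️ ['one', 'two', 'three four']
```

Passing maxsplit as a keyword argument lets us limit the number of splits while still splitting on whitespace.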

If the string has leading or trailing whitespace, the list won't contain empty string elements.

main.py
my_str = ' a b \nc d \r\ne '

my_list = my_str.split()

print(my_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']

This is different from passing a string containing a space as the separator to the split() method.

main.py
my_str = '  a  b \nc d  \r\ne  '

my_list = my_str.split(' ')

print(my_list)  # 👉️ ['', '', 'a', '', 'b', '\nc', 'd', '', '\r\ne', '', '']

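When we pass a separator to the split() method, consecutive separators are not grouped together. Each occurrence delimits a substring, so two adjacent separators produce an empty string between them.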

The list in the example has both leading and trailing empty string items because the string starts and ends with a space.

This approach also doesn't split on all whitespace characters. It only splits on spaces, so characters such as \t, \n and \r\n stay inside the substrings.
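To make the difference concrete, here is a short sketch that splits the same string both ways. The sample string is made up for the illustration.

```python
my_str = 'a\tb\nc d'

# an explicit ' ' separator leaves the tab and the newline untouched
by_space = my_str.split(' ')
print(by_space)  # 👉️ ['a\tb\nc', 'd']

# no separator: tabs, newlines and spaces all act as separators
by_whitespace = my_str.split()
print(by_whitespace)  # 👉️ ['a', 'b', 'c', 'd']
```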

If we call the split() method without an argument on an empty string or on one that contains only whitespace characters, we get an empty list.

main.py
my_str = ' \n \t \r\n '

my_list = my_str.split()

print(my_list)  # 👉️ []

This behavior is also different from passing a string containing a space as the separator to the split() method.

main.py
my_str = ''

my_list = my_str.split(' ')

print(my_list)  # 👉️ ['']

Calling the split() method with a separator on an empty string returns a list containing an empty string element.

You can also use a regular expression to split a string by whitespace.

Use the re.split() method to split a string by whitespace, e.g. my_list = re.split(r'\s+', my_str). With this pattern, the re.split() method splits the string on each run of one or more whitespace characters and returns a list containing the results.

main.py
import re

my_str = 'a b \nc d \r\ne'

my_list = re.split(r'\s+', my_str)

print(my_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']

The re.split method takes a pattern and a string and splits the string on each occurrence of the pattern.

The \s special sequence matches Unicode whitespace characters, including [ \t\n\r\f\v].

The plus + is used to match the preceding character (whitespace) 1 or more times.

In its entirety, the regular expression matches one or more whitespace characters.
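The effect of the plus can be seen by comparing the two patterns on the same input. This is a small sketch with a made-up sample string.

```python
import re

my_str = 'a  b\t\nc'

# without the +, every single whitespace character is a separator,
# so adjacent whitespace characters produce empty strings
single = re.split(r'\s', my_str)
print(single)  # 👉️ ['a', '', 'b', '', 'c']

# with the +, each run of whitespace counts as one separator
runs = re.split(r'\s+', my_str)
print(runs)  # 👉️ ['a', 'b', 'c']
```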

When using this approach, you get empty string elements if your string starts or ends with whitespace.

main.py
import re

my_str = ' a b \nc d \r\ne '

my_list = re.split(r'\s+', my_str)

print(my_list)  # 👉️ ['', 'a', 'b', 'c', 'd', 'e', '']

You can use the filter() function to remove any empty strings from the list.

main.py
import re

my_str = ' a b \nc d \r\ne '

my_list = list(filter(None, re.split(r'\s+', my_str)))

print(my_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']

The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

If you pass None for the function argument, all falsy elements of the iterable are removed.

All values that are not truthy are considered falsy. The falsy values in Python are:

  • constants defined to be falsy: None and False.
  • 0 (zero) of any numeric type
  • empty sequences and collections: "" (empty string), () (empty tuple), [] (empty list), {} (empty dictionary), set() (empty set), range(0) (empty range).
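As a quick illustration of the list above, the following sketch passes a list of mixed values (chosen only for the example) through filter() with None as the function argument.

```python
values = [0, 1, '', 'a', None, [], [1], False, 0.0, 'b']

# filter(None, ...) keeps only the truthy elements
truthy_values = list(filter(None, values))
print(truthy_values)  # 👉️ [1, 'a', [1], 'b']
```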

Note that the filter() function returns a filter object, so we have to use the list() class to convert the filter object to a list.
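If you'd rather not use filter(), two alternatives can be sketched as follows. These are suggestions rather than part of the approach shown so far: a list comprehension that drops the empty strings, and a call to str.strip() before splitting so that no empty strings appear at the edges.

```python
import re

my_str = ' a b \nc d \r\ne '

# keep only the non-empty strings from the re.split() result
my_list = [part for part in re.split(r'\s+', my_str) if part]
print(my_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']

# strip leading and trailing whitespace first, then split
stripped_list = re.split(r'\s+', my_str.strip())
print(stripped_list)  # 👉️ ['a', 'b', 'c', 'd', 'e']
```

Note that the strip() variant still returns [''] for an empty or whitespace-only string, whereas the list comprehension returns an empty list.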
