Borislav Hadzhiev
Last updated: Jun 25, 2022
Use the str.split() method without an argument to split a string by an unknown number of spaces, e.g. my_list = my_str.split(). When the str.split() method is called without an argument, it treats consecutive whitespace characters as a single separator.
    my_str = 'one two three four'

    my_list = my_str.split()
    print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
We used the str.split() method to split a string by an unknown number of spaces.
The str.split() method splits the original string into a list of substrings using a delimiter.
The method takes the following 2 parameters:
Name | Description |
---|---|
separator | Split the string into substrings on each occurrence of the separator (optional) |
maxsplit | At most maxsplit splits are done (optional) |
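As a minimal sketch of how the two parameters interact, the snippet below passes None as the separator (which is the default) together with maxsplit. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample string with irregular spacing between words.
my_str = 'one  two   three    four'

# separator=None (the default) splits on runs of whitespace.
# maxsplit=1 performs at most one split, so the rest of the string is kept as-is.
first_split = my_str.split(None, 1)
print(first_split)  # ['one', 'two   three    four']

# maxsplit can also be passed as a keyword argument.
limited = my_str.split(maxsplit=2)
print(limited)  # ['one', 'two', 'three    four']
```

Note that the unsplit remainder keeps its original internal spacing.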
When the str.split() method is called without a separator, it treats consecutive whitespace characters as a single separator.

If the string starts or ends with whitespace, the list won't contain empty string elements.
    my_str = ' one two three four '

    my_list = my_str.split()
    print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
If we call the split() method without an argument on an empty string or on one that contains only whitespace characters, we get an empty list.

    my_str = ' '

    my_list = my_str.split()
    print(my_list)  # 👉️ []
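For contrast, here is a small sketch of what happens when a single space is passed explicitly as the separator. In that case every space counts as its own separator, so repeated spaces produce empty strings. The sample string and variable names are illustrative only.

```python
# Hypothetical sample string: two spaces, then three spaces.
my_str = 'one  two   three'

# No argument: runs of whitespace are treated as one separator.
default_split = my_str.split()
print(default_split)  # ['one', 'two', 'three']

# Explicit ' ' separator: each space splits, leaving empty strings behind.
space_split = my_str.split(' ')
print(space_split)  # ['one', '', 'two', '', '', 'three']

# The two forms also differ on an empty string.
empty_default = ''.split()
empty_space = ''.split(' ')
print(empty_default)  # []
print(empty_space)    # ['']
```

This is why calling split() with no argument is usually the simpler choice when the amount of whitespace is unknown.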
You can also use a regular expression to split a string by an unknown number of spaces.
Use the re.split() method to split a string by an unknown number of spaces, e.g. my_list = re.split(r'\s+', my_str). The re.split() method will split the string on each occurrence of one or more whitespace characters and will return a list containing the results.
    import re

    my_str = 'one two three four'

    my_list = re.split(r'\s+', my_str)
    print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
The re.split method takes a pattern and a string and splits the string on each occurrence of the pattern.
The \s character matches Unicode whitespace characters, including [ \t\n\r\f\v].

The plus + is used to match the preceding character (whitespace) 1 or more times.
In its entirety, the regular expression matches one or more whitespace characters.
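To make the pattern concrete, the sketch below uses re.findall() to show what \s+ actually matches in a string that mixes spaces, a tab and newlines. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample string mixing spaces, a tab and two newlines.
my_str = 'one \t two\n\nthree'

# Each match is one full run of whitespace, not a single character.
separators = re.findall(r'\s+', my_str)
print(separators)  # [' \t ', '\n\n']

# Splitting on those runs leaves only the words.
parts = re.split(r'\s+', my_str)
print(parts)  # ['one', 'two', 'three']
```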
When using this approach, you get empty string elements if your string starts or ends with whitespace.
    import re

    my_str = ' one two three four '

    my_list = re.split(r'\s+', my_str)
    print(my_list)  # 👉️ ['', 'one', 'two', 'three', 'four', '']
You can use the filter() function to remove any empty strings from the list.
    import re

    my_str = ' one two three four '

    my_list = list(filter(None, re.split(r'\s+', my_str)))
    print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.
If you pass None for the function argument, all falsy elements of the iterable are removed.

All values that are not truthy are considered falsy. The falsy values in Python are:
- constants defined to be falsy: None and False.
- 0 (zero) of any numeric type.
- empty sequences and collections: "" (empty string), () (empty tuple), [] (empty list), {} (empty dictionary), set() (empty set), range(0) (empty range).

Note that the filter() function returns a filter object, so we have to use the list() class to convert the filter object to a list.
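The snippet below is a small sketch of both points: filter(None, ...) drops every falsy element, and the result has to be wrapped in list() to get a list. It also shows a list comprehension as an equivalent alternative, which is a swapped-in technique rather than something the approach above requires. The variable names are illustrative.

```python
import re

# Hypothetical sample string with leading and trailing spaces.
my_str = ' one two  three four '

pieces = re.split(r'\s+', my_str)
print(pieces)  # ['', 'one', 'two', 'three', 'four', '']

# filter() returns a lazy filter object, so convert it with list().
filtered = filter(None, pieces)
as_list = list(filtered)
print(as_list)  # ['one', 'two', 'three', 'four']

# A list comprehension keeps the truthy (non-empty) strings in the same way.
via_comprehension = [piece for piece in pieces if piece]
print(via_comprehension)  # ['one', 'two', 'three', 'four']

# With None as the function, every falsy value is removed, not only ''.
mixed = list(filter(None, [0, '', 'a', None, [], 'b']))
print(mixed)  # ['a', 'b']
```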