Split a string into a list of words in Python

Borislav Hadzhiev

Last updated: Aug 31, 2022

Use the str.split() method to split a string into a list of words, e.g. my_list = my_str.split(). When called without arguments, str.split() splits the string on runs of one or more whitespace characters, ignores leading and trailing whitespace and returns a list containing the words.

main.py
import re
my_str = 'one two three four'
# ✅ split string into list of words (str.split())
my_list = my_str.split()
print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
# ---------------------------------------------------
my_str = 'one,two,three,four'
my_list = my_str.split(',')
print(my_list)  # 👉️ ['one', 'two', 'three', 'four']
# ---------------------------------------------------
# ✅ split string with multiple delimiters into list of words (re.findall())
my_str = 'one two, three four. five'
my_list = re.findall(r'[\w]+', my_str)
print(my_list)  # 👉️ ['one', 'two', 'three', 'four', 'five']

The first example uses the str.split() method to split a string into a list of words.

main.py
my_str = 'one two three four'
my_list = my_str.split()
print(my_list)  # 👉️ ['one', 'two', 'three', 'four']

The str.split() method splits the string into a list of substrings using a delimiter.

The method takes the following 2 parameters:

Name | Description
sep | The separator; the string is split into substrings on each occurrence of it (optional)
maxsplit | At most maxsplit splits are done (optional)

main.py
my_str = 'one,two,three,four'
my_list = my_str.split(',')
print(my_list)  # 👉️ ['one', 'two', 'three', 'four']

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.

If the separator is not found in the string, the returned list contains the whole string as its only element.
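
The behaviors described above can be checked with a short sketch. The variable names (limited, collapsed, with_empty, not_found) are made up for illustration.

```python
my_str = 'one two three four'

# maxsplit=1 performs at most one split, so the rest of the string stays intact
limited = my_str.split(' ', 1)
print(limited)  # ['one', 'two three four']

# With no separator, a run of whitespace counts as a single delimiter and
# leading/trailing whitespace is ignored
collapsed = '  one   two\tthree\n'.split()
print(collapsed)  # ['one', 'two', 'three']

# With an explicit separator, consecutive delimiters produce empty strings
with_empty = 'one,,two'.split(',')
print(with_empty)  # ['one', '', 'two']

# Separator not found: the whole string is the only element
not_found = my_str.split(',')
print(not_found)  # ['one two three four']
```

Note the difference between the second and third calls: only the no-argument form collapses repeated delimiters.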

If you need to split a string based on multiple delimiters into a list of words, use the re.findall() method.

Split a string into a list of words using re.findall()

Use the re.findall() method to get a list of the words in a string, e.g. my_list = re.findall(r'[\w]+', my_str). Rather than splitting on delimiters, re.findall() finds every run of word characters in the string and returns a list containing those matches.

main.py
import re
my_str = 'one two, three four. five'
my_list = re.findall(r'[\w]+', my_str)
print(my_list)  # 👉️ ['one', 'two', 'three', 'four', 'five']

The re.findall method takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string.

The first argument we passed to the re.findall() method is a regular expression.

The square brackets [] are used to indicate a set of characters. Here the set contains only \w, so [\w]+ behaves the same as \w+.

The \w character class matches Unicode word characters: letters from most languages, digits and the underscore. It doesn't match apostrophes or hyphens.

The plus + causes the regular expression to match 1 or more repetitions of the preceding element (here, the set of word characters).

The re.findall() method returns a list containing the words in the string.
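
To see what counts as a word character, here is a small sketch on a made-up sentence. The second pattern, which adds the apostrophe and hyphen to the set, is one possible adjustment and not part of the original example.

```python
import re

my_str = "don't stop, snake_case is well-known in 2022"

# \w matches letters, digits and underscores, so apostrophes and hyphens
# end a match and split the word in two
words = re.findall(r'\w+', my_str)
print(words)
# ['don', 't', 'stop', 'snake_case', 'is', 'well', 'known', 'in', '2022']

# Adding ' and - to the set keeps contractions and hyphenated words together
# (a hyphen placed last inside the brackets is treated literally)
words_extended = re.findall(r"[\w'-]+", my_str)
print(words_extended)
# ["don't", 'stop', 'snake_case', 'is', 'well-known', 'in', '2022']
```

Which pattern is appropriate depends on what you consider a word in your data.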

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If you'd rather not use a regular expression, you can use the str.replace() method to remove specific characters from the string before splitting.

Split a string into a list of words using str.replace()

To split a string into a list of words:

  1. Use the str.replace() method to remove any punctuation from the string.
  2. Use the str.split() method to split the string on one or more whitespace characters.
  3. The str.split() method will return a list containing the words.

main.py
my_str = 'one two, three four. five'
my_list = my_str.replace(',', '').replace('.', '').split()
print(my_list)  # 👉️ ['one', 'two', 'three', 'four', 'five']

We used the str.replace() method to remove the punctuation before splitting the string on whitespace characters.

The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.

The method takes the following parameters:

Name | Description
old | The substring we want to replace in the string
new | The replacement for each occurrence of old
count | Only the first count occurrences are replaced (optional)

The str.replace() method doesn't change the original string. Strings are immutable in Python.
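
A brief sketch of the count parameter and of the fact that the original string is left untouched. The sample string and variable names are illustrative only.

```python
my_str = 'one, two, three.'

# count=1 replaces only the first occurrence of the comma
first_only = my_str.replace(',', '', 1)
print(first_only)  # one two, three.

# Without count, every occurrence is replaced
all_removed = my_str.replace(',', '')
print(all_removed)  # one two three.

# str.replace() returned new strings; the original is unchanged
print(my_str)  # one, two, three.
```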

We used an empty string for the replacement because we want to remove the specified characters.

You can chain as many calls to the str.replace() method as necessary.

The last step is to use the str.split() method to split the string into a list of words.
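
When the set of characters to remove grows, a long chain of str.replace() calls gets hard to read. One alternative, which is a different technique from the chained calls shown above, is to build a deletion table with str.maketrans() and apply it with str.translate(). The characters listed in chars_to_remove are an assumption for this sketch.

```python
my_str = 'one two, three four. five!'

# Hypothetical set of characters to strip out; adjust to your data
chars_to_remove = ',.!?;:'

# The third argument to str.maketrans() lists characters to delete
table = str.maketrans('', '', chars_to_remove)

my_list = my_str.translate(table).split()
print(my_list)  # ['one', 'two', 'three', 'four', 'five']
```

This removes all listed characters in a single pass before the string is split on whitespace.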

If you need to remove leading and trailing punctuation from each word when splitting the string, use the str.strip() method on each word.

main.py
import string
# 👇️ !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(string.punctuation)
my_str = 'one two, three four. five'
my_list = [word.strip(string.punctuation) for word in my_str.split()]
print(my_list)  # 👉️ ['one', 'two', 'three', 'four', 'five']

We used the str.strip() method to strip the leading and trailing punctuation characters from each word.

The string.punctuation constant is a string that contains the ASCII punctuation characters.

We used a list comprehension to iterate over the list of words and called the str.strip() method on each word.

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.

The str.strip method returns a copy of the string with the specified leading and trailing characters removed.
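
Because str.strip() only touches the ends of each word, punctuation inside a word is kept, and a token made up entirely of punctuation becomes an empty string. The sketch below uses a made-up sentence to show both effects, plus one way to drop the empty strings.

```python
import string

my_str = "well-known: don't stop - ever!"

# Interior hyphens and apostrophes survive; the lone '-' becomes ''
words = [word.strip(string.punctuation) for word in my_str.split()]
print(words)  # ['well-known', "don't", 'stop', '', 'ever']

# A condition in the list comprehension filters out the empty strings
non_empty = [word for word in words if word]
print(non_empty)  # ['well-known', "don't", 'stop', 'ever']
```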
