Split string into list of words with multiple delimiters in Python


Borislav Hadzhiev

Last updated: Aug 31, 2022


Split string into list of words with multiple delimiters in Python

Use the re.findall() method to split a string into a list of words with multiple delimiters, e.g. my_list = re.findall(r'[\w]+', my_str). Rather than splitting on the delimiters, re.findall() collects every run of word characters in the string and returns the matches as a list, so any character that isn't part of a word effectively acts as a delimiter.

main.py
```python
import re

# ✅ split string into list of words with multiple delimiters (re.findall())

my_str = 'apple banana, kiwi # melon. mango'

my_list = re.findall(r'[\w]+', my_str)
print(my_list)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']

# ---------------------------------------

# ✅ split string into list of words with multiple delimiters (str.replace())

my_list = my_str.replace(',', '').replace('#', '').replace('.', '').split()
print(my_list)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
```

The first example uses the re.findall() method to split a string into a list of words with multiple delimiters.

The re.findall method takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string.

The first argument we passed to the re.findall() method is a regular expression.

main.py
```python
import re

my_str = 'apple banana, kiwi # melon. mango'

my_list = re.findall(r'[\w]+', my_str)
print(my_list)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
```

The square [] brackets are used to indicate a set of characters.

The \w special character matches Unicode word characters. This includes most characters that can be part of a word in any language, as well as digits and the underscore.
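As a quick check of what \w covers, the sketch below runs the same kind of pattern over an illustrative string that contains an underscore, digits and accented letters. Passing the re.ASCII flag restricts \w to ASCII letters, digits and the underscore, so accented letters start acting as delimiters.

```python
import re

text = 'café_au lait 42, naïve!'

# By default \w matches Unicode letters, digits and the underscore
unicode_words = re.findall(r'\w+', text)
print(unicode_words)  # 👉️ ['café_au', 'lait', '42', 'naïve']

# With re.ASCII, \w only matches [a-zA-Z0-9_], so 'é' and 'ï' split the words
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)
print(ascii_words)  # 👉️ ['caf', '_au', 'lait', '42', 'na', 've']
```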

The plus + causes the regular expression to match 1 or more repetitions of the preceding item (here, the set of word characters).
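To see what the + contributes, compare the pattern with and without it on a short sample string. Without the quantifier, every match is a single character; with it, each match is a whole run of word characters.

```python
import re

my_str = 'apple banana, kiwi'

# Without +, each match is a single word character
single_chars = re.findall(r'\w', my_str)
print(single_chars[:5])  # 👉️ ['a', 'p', 'p', 'l', 'e']

# With +, each match is a run of consecutive word characters
words = re.findall(r'\w+', my_str)
print(words)  # 👉️ ['apple', 'banana', 'kiwi']
```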

The re.findall() method returns a list containing the words in the string.
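Because the square brackets define a set that here contains only \w, [\w]+ matches exactly what \w+ matches. The brackets become useful when you add more characters to the set. The sketch below adds an apostrophe so that contractions stay in one piece; the sample sentence and the extra character are illustrative choices, not part of the original example.

```python
import re

my_str = "don't stop, it's fine"

# [\w]+ and \w+ are equivalent: the set contains only word characters
print(re.findall(r'[\w]+', my_str) == re.findall(r'\w+', my_str))  # 👉️ True

# Adding ' to the set keeps contractions together
with_apostrophes = re.findall(r"[\w']+", my_str)
print(with_apostrophes)  # 👉️ ["don't", 'stop', "it's", 'fine']
```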

If you ever need help reading or writing a regular expression, consult the Regular Expression Syntax section of the official re module documentation.

The page contains a list of all of the special characters with many useful examples.

If you need a more flexible approach, use the str.replace() method to remove all delimiters but one before using the str.split() method.

Split string into list of words with multiple delimiters using str.replace()

To split a string into a list of words with multiple delimiters:

  1. Use the str.replace() method to remove all of the delimiters but whitespace.
  2. Use the str.split() method to split the string on whitespace characters.
  3. The str.split() method will return a list containing the words.
main.py
```python
my_str = 'apple banana, kiwi # melon. mango'

my_list = my_str.replace(',', '').replace('#', '').replace('.', '').split()
print(my_list)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
```

We used the str.replace() method to remove the punctuation before splitting the string on whitespace characters.

The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.

The method takes the following parameters:

| Name | Description |
| --- | --- |
| old | The substring we want to replace in the string |
| new | The replacement for each occurrence of old |
| count | Only the first count occurrences are replaced (optional) |
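For example, passing a count of 1 limits the replacement to the first occurrence (the sample string is illustrative). count is passed positionally here, which works on every Python 3 version.

```python
my_str = 'apple, banana, kiwi'

# Replace only the first occurrence of ', '
result = my_str.replace(', ', ' ', 1)
print(result)  # 👉️ apple banana, kiwi
```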

The str.replace() method doesn't change the original string. Strings are immutable in Python.
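In practice, this means you have to store (or chain) the return value; the variable you called replace() on keeps its old value.

```python
my_str = 'apple, banana'

new_str = my_str.replace(',', '')

# replace() returned a new string; the original is unchanged
print(my_str)   # 👉️ apple, banana
print(new_str)  # 👉️ apple banana
```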

We used an empty string for the replacement because we want to remove the specified characters.

You can chain as many calls to the str.replace() method as necessary.
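When the list of delimiters is long, a loop can stand in for the chained calls. The sketch below uses a hypothetical helper named split_on_delimiters and, unlike the chained example in this section, replaces each delimiter with a space instead of an empty string, so that words separated only by punctuation (e.g. kiwi,melon) are not glued together.

```python
def split_on_delimiters(text, delimiters):
    """Hypothetical helper: split text on whitespace and any of the given delimiters."""
    for delimiter in delimiters:
        # Replace with a space (not '') so 'banana,kiwi' becomes 'banana kiwi'
        text = text.replace(delimiter, ' ')
    # split() with no argument collapses runs of whitespace
    return text.split()


my_str = 'apple banana,kiwi # melon.mango'

print(split_on_delimiters(my_str, [',', '#', '.']))
# 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']

# With '' as the replacement, delimiters without surrounding spaces merge words
print(my_str.replace(',', '').replace('#', '').replace('.', '').split())
# 👉️ ['apple', 'bananakiwi', 'melonmango']
```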

The last step is to use the str.split() method to split the string into a list of words.

The str.split() method splits the string into a list of substrings using a delimiter.

The method takes the following 2 parameters:

| Name | Description |
| --- | --- |
| sep | Split the string into substrings on each occurrence of the separator (optional) |
| maxsplit | At most maxsplit splits are done (optional) |

When no separator is passed to the str.split() method, it splits the input string on runs of one or more whitespace characters, and leading or trailing whitespace does not produce empty strings in the result.
main.py
```python
my_str = 'apple banana kiwi'

print(my_str.split())  # 👉️ ['apple', 'banana', 'kiwi']
```
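The difference between passing no separator and passing a single space shows up when words are separated by more than one space. The sketch below uses an illustrative string with a double space and also shows the effect of maxsplit.

```python
my_str = 'apple  banana kiwi'  # two spaces after 'apple'

# No separator: runs of whitespace count as a single separator
default_split = my_str.split()
print(default_split)  # 👉️ ['apple', 'banana', 'kiwi']

# Explicit ' ' separator: consecutive spaces produce an empty string
space_split = my_str.split(' ')
print(space_split)  # 👉️ ['apple', '', 'banana', 'kiwi']

# maxsplit=1: at most one split, the rest of the string stays together
limited_split = my_str.split(maxsplit=1)
print(limited_split)  # 👉️ ['apple', 'banana kiwi']
```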

If the separator is not found in the string, a list containing only 1 element is returned.
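A short check of that case, plus one edge case worth knowing: an empty string gives [''] when you pass a separator but [] when you don't.

```python
my_str = 'apple'

# ',' does not occur, so the whole string is the only element
not_found = my_str.split(',')
print(not_found)  # 👉️ ['apple']

# Edge case: an empty string behaves differently with and without a separator
empty_with_sep = ''.split(',')
empty_without_sep = ''.split()
print(empty_with_sep)     # 👉️ ['']
print(empty_without_sep)  # 👉️ []
```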

Alternatively, you can use the re.split() method.

Split string into list of words with multiple delimiters using re.split()

Use the re.split() method to split a string into a list of words with multiple delimiters, e.g. re.split(r'\W+', my_str). The re.split() method will split the string into a list of words based on the provided delimiters.

main.py
```python
import re

my_str = 'apple banana, kiwi # melon. mango!'

my_list = [item for item in re.split(r'\W+', my_str) if item]
print(my_list)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
```

The re.split method splits a string on all occurrences of the provided pattern.
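Beyond the pattern and the string, re.split() accepts a maxsplit argument, and if the pattern contains a capturing group, the matched delimiters are included in the result. The sketch below shows both behaviors on an illustrative string.

```python
import re

my_str = 'apple, banana; kiwi'

# Capturing group: the delimiters themselves are kept in the result
kept = re.split(r'(\W+)', my_str)
print(kept)  # 👉️ ['apple', ', ', 'banana', '; ', 'kiwi']

# maxsplit=1: split only on the first match of the pattern
limited = re.split(r'\W+', my_str, maxsplit=1)
print(limited)  # 👉️ ['apple', 'banana; kiwi']
```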

The first argument we passed to the method is a regular expression.

The \W (capital W) special character matches any character that is not a word character.

The plus + causes the regular expression to match 1 or more repetitions of the preceding character (any non-word characters).

We end up splitting the string on all occurrences of non-word characters.
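Since \W+ treats every non-word character as a delimiter, one alternative (not the approach used above) is a character class that lists only the delimiters you care about. Building that class from a Python list with re.escape() is an assumption about how you keep your delimiters; re.escape() guards against characters such as ] or - that are special inside a character class.

```python
import re

my_str = 'apple banana, kiwi # melon. mango'

# Split only on commas, hashes, periods and whitespace (runs count as one)
explicit = re.split(r'[,#.\s]+', my_str)
print(explicit)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']

# Build the same kind of character class from a list of delimiters
delimiters = [',', '#', '.']
pattern = '[' + ''.join(re.escape(d) for d in delimiters) + r'\s]+'
built = re.split(pattern, my_str)
print(built)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
```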

We used a list comprehension to remove any empty strings from the result.

You might get empty strings in the list if the string starts or ends with punctuation (or any other non-word character).

main.py
```python
import re

my_str = '.apple banana, kiwi # melon. mango!'

my_list = re.split(r'\W+', my_str)
print(my_list)  # 👉️ ['', 'apple', 'banana', 'kiwi', 'melon', 'mango', '']
```

The list comprehension takes care of removing the empty strings from the list.

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.
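An equivalent alternative to the list comprehension is filter() with None as the function, which keeps only truthy items. Because an empty string is falsy, this drops the empty strings as well.

```python
import re

my_str = '.apple banana, kiwi # melon. mango!'

parts = re.split(r'\W+', my_str)

# List comprehension: keep only non-empty strings
with_comprehension = [item for item in parts if item]

# filter(None, ...) keeps truthy items, which also drops ''
with_filter = list(filter(None, parts))

print(with_comprehension)  # 👉️ ['apple', 'banana', 'kiwi', 'melon', 'mango']
print(with_comprehension == with_filter)  # 👉️ True
```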
