Split a string on punctuation marks in Python


Borislav Hadzhiev

Thu Jun 23 2022 · 2 min read



Use the re.split() function to split a string on punctuation marks, e.g. my_list = re.split('[,.!?]', my_str). re.split() splits the string at every match of the specified pattern and returns the pieces as a list.

main.py
import re

my_str = """One, Two Three. Four! Five? I'm!"""

# split on a comma, dot, exclamation mark or question mark
my_list = re.split('[,.!?]', my_str)

# 👇️ ['One', ' Two Three', ' Four', ' Five', " I'm", '']
print(my_list)

The re.split() function takes a pattern and a string and splits the string at each match of the pattern. The last item in the list is an empty string because the input ends with a delimiter (!).

Notice that some of the items in the list start with a space. If you need to split on the spaces as well, add a space between the square brackets of the regular expression.

main.py
import re

my_str = """One, Two Three. Four! Five? I'm!"""

# the set now also contains a space
my_list = re.split('[ ,.!?]', my_str)

# 👇️ ['One', '', 'Two', 'Three', '', 'Four', '', 'Five', '', "I'm", '']
print(my_list)

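Now our regex matches spaces as well. The list contains empty strings because two adjacent delimiters, such as a comma followed by a space, have nothing between them. If you need to remove the empty strings from the list, use the filter() function.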

main.py
import re

my_str = """One, Two Three. Four! Five? I'm!"""

# filter(None, ...) drops the empty strings
my_list = list(filter(None, re.split('[ ,.!?]', my_str)))

# 👇️ ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
print(my_list)

The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

If you pass None for the function argument, all falsy elements of the iterable are removed.
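As a small sketch of what passing None to filter() does, the snippet below compares it with a list comprehension that keeps only the truthy items. The variable names parts, filtered and comprehension are made up for this illustration.

```python
import re

my_str = "One, Two Three. Four! Five? I'm!"
parts = re.split('[ ,.!?]', my_str)

# filter(None, ...) keeps only truthy items, so empty strings are dropped
filtered = list(filter(None, parts))

# a list comprehension that does the same job
comprehension = [part for part in parts if part]

print(filtered == comprehension)  # True
print(comprehension)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
```

Both approaches produce the same list, so picking one is mostly a matter of taste.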

The square brackets [] are used to indicate a set of characters.

The set of characters in the example includes a comma (,), a dot (.), an exclamation mark (!) and a question mark (?).
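A set matches exactly one character. As an optional variation (not part of the original examples), you could follow the set with the + quantifier so that a run of consecutive delimiters counts as a single split point. A minimal sketch:

```python
import re

my_str = "One, Two Three. Four! Five? I'm!"

# '+' means "one or more", so ', ' is treated as one delimiter
plus_parts = re.split('[ ,.!?]+', my_str)

print(plus_parts)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm", '']
```

The trailing empty string is still there because the string ends with a delimiter, so you may still want to remove empty strings afterwards.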

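You can add any other punctuation marks between the square brackets, e.g. a colon (:), a semicolon (;), brackets or parentheses. Note that a closing square bracket (]) and a backslash (\) must be escaped with a backslash when they appear inside the set.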

main.py
import re

my_str = """One, Two: Three;. Four! Five? I'm!"""

# the set also contains a colon and a semicolon
my_list = list(filter(None, re.split('[ :;,.!?]', my_str)))

# 👇️ ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
print(my_list)
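Brackets and parentheses can be added to the set in the same way. The sketch below uses a made-up sample string and a raw string for the pattern. The closing square bracket is escaped so that it is not read as the end of the set, and the opening one is escaped for symmetry.

```python
import re

sample = "One (Two) [Three] {Four}"

# parentheses and braces need no escaping inside a set,
# but ']' does, otherwise it would close the set early
bracket_pattern = r'[ ()\[\]{}]'

bracket_parts = list(filter(None, re.split(bracket_pattern, sample)))

print(bracket_parts)  # ['One', 'Two', 'Three', 'Four']
```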

Note that the filter() function returns a filter object (an iterator), not a list. This is why the examples pass the filter object to the list() class.
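If you would rather not list every punctuation mark by hand, one possible approach is to build the set from string.punctuation, which contains the ASCII punctuation characters. This is a sketch of an alternative rather than the method used above. re.escape() is used so that characters such as ] and \ are safe inside the set, and the names all_parts and no_apostrophe are invented for the example.

```python
import re
import string

my_str = "One, Two: Three;. Four! Five? I'm!"

# every ASCII punctuation character plus a space
pattern = f'[{re.escape(string.punctuation)} ]'
all_parts = list(filter(None, re.split(pattern, my_str)))
print(all_parts)  # ['One', 'Two', 'Three', 'Four', 'Five', 'I', 'm']

# the apostrophe is punctuation too, so "I'm" was split;
# drop it from the set to keep contractions together
no_apostrophe = string.punctuation.replace("'", "")
pattern = f'[{re.escape(no_apostrophe)} ]'
kept_parts = list(filter(None, re.split(pattern, my_str)))
print(kept_parts)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
```

Keep in mind that string.punctuation only covers ASCII, so characters such as curly quotes are not included.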
