Borislav Hadzhiev
Thu Jun 23 2022
Use the re.split() method to split a string on punctuation marks, e.g. my_list = re.split('[,.!?]', my_str). The re.split() method splits a string on all occurrences of the specified pattern.
import re

my_str = """One, Two Three. Four! Five? I'm!"""

my_list = re.split('[,.!?]', my_str)

# 👇️ ['One', ' Two Three', ' Four', ' Five', " I'm", '']
print(my_list)
The re.split method takes a pattern and a string and splits the string on each occurrence of the pattern.
Notice that some of the items in the list contain leading spaces. To split on spaces as well, add a space between the square brackets of the regular expression.
import re

my_str = """One, Two Three. Four! Five? I'm!"""

my_list = re.split('[ ,.!?]', my_str)

# 👇️ ['One', '', 'Two', 'Three', '', 'Four', '', 'Five', '', "I'm", '']
print(my_list)
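Adding a space to the set also splits 'Two Three' into two items. If you only want to trim the surrounding whitespace and keep multi-word items together, one possible alternative (a sketch, not something the examples above require) is to keep the original pattern and call str.strip() on each item. The variable name stripped_items is only illustrative.

```python
import re

my_str = "One, Two Three. Four! Five? I'm!"

# Split on punctuation only, then trim whitespace from each piece.
# 'Two Three' stays a single item because spaces are not delimiters here.
stripped_items = [item.strip() for item in re.split('[,.!?]', my_str)]

print(stripped_items)  # ['One', 'Two Three', 'Four', 'Five', "I'm", '']
```

The trailing empty string is still present because the string ends with a punctuation mark.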
Now our regex matches spaces as well, so every punctuation mark that is followed by a space produces an empty string. If you need to remove the empty strings from the list, use the filter() function.
import re

my_str = """One, Two Three. Four! Five? I'm!"""

my_list = list(filter(None, re.split('[ ,.!?]', my_str)))

# 👇️ ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
print(my_list)
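Another way to reduce the number of empty strings, shown here as a minimal sketch rather than a replacement for filter(), is to add a + quantifier after the set so that a run of consecutive delimiters counts as one match. The names parts and words are illustrative.

```python
import re

my_str = "One, Two Three. Four! Five? I'm!"

# The + quantifier treats a run such as ", " as a single delimiter.
parts = re.split('[ ,.!?]+', my_str)
print(parts)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm", '']

# The string ends with a delimiter, so one trailing empty string remains
# and still has to be dropped.
words = [part for part in parts if part]
print(words)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
```

The quantifier removes the empty strings in the middle of the list, but a string that starts or ends with a delimiter still yields an empty item at that end.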
The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value. If you pass None for the function argument, all falsy elements of the iterable are removed.

The square brackets [] are used to indicate a set of characters. The set of characters in the example includes a comma (,), a dot (.), an exclamation mark (!) and a question mark (?).
You can add any other punctuation marks between the square brackets, e.g. a colon (:), a semicolon (;), brackets or parentheses. Inside a set, a closing square bracket has to be escaped with a backslash (\]).
import re

my_str = """One, Two: Three;. Four! Five? I'm!"""

my_list = list(filter(None, re.split('[ :;,.!?]', my_str)))

# 👇️ ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
print(my_list)
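If you would rather not type every punctuation mark by hand, one possible approach is to build the set from string.punctuation and let re.escape() handle characters such as ] and \ that are special inside square brackets. This is a sketch under one assumption that is not part of the examples above: the apostrophe is removed from the delimiters so that contractions like I'm stay intact. The names delimiters and pattern are illustrative.

```python
import re
import string

# Assumption: every ASCII punctuation character except the apostrophe
# is a delimiter, so "I'm" is not split into "I" and "m".
delimiters = string.punctuation.replace("'", "")

# re.escape() escapes characters that are special in a regex, including
# ']' and '\', which would otherwise break the character set.
# \s adds whitespace, and + collapses runs of delimiters into one match.
pattern = "[" + re.escape(delimiters) + r"\s]+"

my_str = "One, Two: (Three); [Four]! Five? I'm!"

my_list = [part for part in re.split(pattern, my_str) if part]

print(my_list)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]
```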
Note that the filter() function returns a filter object (not a list). If you need to convert the filter object to a list, pass it to the list() class.
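The short sketch below illustrates the point about filter objects. It shows that the object is a one-shot iterator and that a list comprehension with a truthiness check gives the same result as filter(None, ...). The variable names are illustrative.

```python
import re

my_str = "One, Two Three. Four! Five? I'm!"
parts = re.split('[ ,.!?]', my_str)

filtered = filter(None, parts)
print(type(filtered).__name__)  # filter

# Converting to a list consumes the iterator.
as_list = list(filtered)
print(as_list)  # ['One', 'Two', 'Three', 'Four', 'Five', "I'm"]

# A second pass over the same filter object yields nothing.
second_pass = list(filtered)
print(second_pass)  # []

# Equivalent list comprehension: keep only truthy (non-empty) strings.
comprehension = [part for part in parts if part]
print(comprehension == as_list)  # True
```

The list comprehension returns a list directly, so no extra list() call is needed.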