Last updated: Apr 9, 2024
Reading time · 3 min
Use the re.findall() method to extract strings between quotes.
The re.findall() method matches the provided pattern against the string and returns a list of the strings found between the quotes.
```python
import re

my_str = 'Bobby "Hadz" Com "ABC"'

my_list = re.findall(r'"([^"]*)"', my_str)

print(my_list)  # 👇️ ['Hadz', 'ABC']

print(my_list[0])  # 👇️ Hadz
print(my_list[1])  # 👇️ ABC
```
The example extracts the strings between double quotes.
If you need to extract the strings between single quotes, use the following code sample instead.
```python
import re

my_str = "Bobby 'Hadz' Com 'ABC'"

my_list = re.findall(r"'([^']*)'", my_str)

print(my_list)  # 👇️ ['Hadz', 'ABC']
```
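The re.findall() method takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string. Because the pattern contains exactly one capturing group, each item in the list is the text matched by that group, so the quotes themselves are left out.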
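To see how capturing groups change what re.findall() returns, here is a small comparison on the same string. The second pattern is the one from the example above; the first and third patterns are illustrative variations, not part of the original approach.

```python
import re

text = 'Bobby "Hadz" Com "ABC"'

# No capturing group: each item is the full match, quotes included.
full_matches = re.findall(r'"[^"]*"', text)

# One capturing group: each item is only the group's text.
group_matches = re.findall(r'"([^"]*)"', text)

# Two capturing groups: each item is a tuple with one string per group.
tuple_matches = re.findall(r'(\w+) "([^"]*)"', text)

print(full_matches)   # ['"Hadz"', '"ABC"']
print(group_matches)  # ['Hadz', 'ABC']
print(tuple_matches)  # [('Bobby', 'Hadz'), ('Com', 'ABC')]
```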
Let's look at the regular expression in the first example.
The parentheses () in the regular expression match whatever is inside them and indicate the start and end of a group.
The group's contents can still be retrieved after the match, which is what re.findall() returns here.
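As a minimal sketch, you can also retrieve a group's contents yourself with re.search() and re.finditer() from the standard re module, instead of re.findall(). Both produce Match objects, and group(1) returns the text captured by the first pair of parentheses.

```python
import re

text = 'Bobby "Hadz" Com "ABC"'
pattern = r'"([^"]*)"'

# re.search returns a Match object for the first match only (or None).
match = re.search(pattern, text)
if match:
    print(match.group(0))  # "Hadz"  (the whole match, quotes included)
    print(match.group(1))  # Hadz    (only the contents of group 1)

# re.finditer yields a Match object for every non-overlapping match.
contents = [m.group(1) for m in re.finditer(pattern, text)]
print(contents)  # ['Hadz', 'ABC']
```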
The square brackets [] are used to indicate a set of characters.
The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT a double quote.
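The negated set is what keeps each quoted string separate. As an illustration (the greedy and lazy patterns below are not from the original example), compare it with a greedy .* and a lazy .*? on the same string.

```python
import re

text = 'Bobby "Hadz" Com "ABC"'

# Greedy .* runs to the LAST double quote and swallows the text in between.
greedy = re.findall(r'"(.*)"', text)

# Lazy .*? stops at the first closing quote it can find.
lazy = re.findall(r'"(.*?)"', text)

# [^"]* can never cross a double quote, so each pair is matched separately.
negated = re.findall(r'"([^"]*)"', text)

print(greedy)   # ['Hadz" Com "ABC']
print(lazy)     # ['Hadz', 'ABC']
print(negated)  # ['Hadz', 'ABC']
```

One difference between the last two: by default . does not match a newline, while [^"] does, so the negated set also handles quoted strings that span lines.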
The asterisk * matches the preceding regular expression (anything but double quotes) zero or more times.
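Because * allows zero repetitions, an empty pair of quotes produces an empty string in the result. The sketch below uses a made-up sample string to show this. It also shows why simply swapping * for + is not a safe way to skip empty matches: the closing quote of "" gets treated as an opening quote.

```python
import re

text = 'Empty "" and full "ABC"'

# * allows zero characters, so the empty pair of quotes yields ''.
matches = re.findall(r'"([^"]*)"', text)
print(matches)  # ['', 'ABC']

# A safer way to drop empty strings is to filter them out after matching.
non_empty = [s for s in matches if s]
print(non_empty)  # ['ABC']

# With +, the second quote of "" pairs with the opening quote of "ABC".
plus = re.findall(r'"([^"]+)"', text)
print(plus)  # [' and full ']
```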
In its entirety, the regular expression matches zero or more characters that are not double quotes and are inside of double quotes.
You can also use this approach to extract strings from between single quotes.
All we had to do was wrap the group in single quotes instead of double quotes and place a single quote in the negated set of characters.
In its entirety, the regex matches zero or more characters that are not single quotes and are inside of single quotes.
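If a string may contain both quote styles, one option is a backreference: capture the opening quote in a group and require the same character to close the match. This is a different technique from the two patterns above, shown only as a sketch, and the helper name extract_quoted is hypothetical.

```python
import re


def extract_quoted(s):
    """Hypothetical helper: return the text between matching quotes."""
    # Group 1 captures the opening quote; \1 requires the same quote to close.
    # .*? is lazy, so it stops at the first matching closing quote.
    # re.DOTALL lets . match newlines, like the [^"]* set does.
    pairs = re.findall(r'(["\'])(.*?)\1', s, re.DOTALL)
    # With two groups, findall returns (quote, content) tuples.
    return [content for _quote, content in pairs]


print(extract_quoted('Bobby "Hadz" Com \'ABC\''))  # ['Hadz', 'ABC']
print(extract_quoted('He said "it\'s fine"'))      # ["it's fine"]
```

Because the closing quote must match the opening one, an apostrophe inside a double-quoted string does not end the match.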
You can also use the str.split() method to extract strings between quotes.
```python
my_str = 'Bobby "Hadz" Com "ABC"'

my_list = my_str.split('"')[1::2]

print(my_list)  # 👇️ ['Hadz', 'ABC']
```
The str.split() method splits the string into a list of substrings using a delimiter.
The method takes the following 2 parameters:
Name | Description |
---|---|
separator | Split the string into substrings on each occurrence of the separator |
maxsplit | At most maxsplit splits are done (optional) |
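Both parameters are optional. A short sketch on the same string shows maxsplit limiting the number of splits, and what happens when the separator is omitted (Python then splits on runs of whitespace).

```python
my_str = 'Bobby "Hadz" Com "ABC"'

# maxsplit=2: stop after two splits; the remainder stays in the last item.
limited = my_str.split('"', maxsplit=2)
print(limited)  # ['Bobby ', 'Hadz', ' Com "ABC"']

# No separator: split on runs of whitespace instead of a specific character.
words = my_str.split()
print(words)  # ['Bobby', '"Hadz"', 'Com', '"ABC"']
```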
```python
my_str = 'Bobby "Hadz" Com "ABC"'

# 👇️ ['Bobby ', 'Hadz', ' Com ', 'ABC', '']
print(my_str.split('"'))
```
We split the string on each occurrence of a double quote and used list slicing.
The syntax for list slicing is a_list[start:stop:step].
The start index is inclusive and the stop index is exclusive (up to, but not including). If the start index is omitted, it is considered to be 0; if the stop index is omitted, the slice goes to the end of the list.
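Applied to the list that split('"') produces for the example string, the start and stop rules look like this:

```python
parts = ['Bobby ', 'Hadz', ' Com ', 'ABC', '']

print(parts[1:3])  # ['Hadz', ' Com ']  (index 3 is excluded)
print(parts[:2])   # ['Bobby ', 'Hadz']  (start defaults to 0)
print(parts[3:])   # ['ABC', '']  (stop defaults to the end)
```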
Python indexes are zero-based, so the first item in a list has an index of 0, and the last item has an index of -1 or len(a_list) - 1.
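A quick check of these indexing rules on a throwaway list (a_list is just an illustrative name):

```python
a_list = ['a', 'b', 'c', 'd']

print(a_list[0])                # a  (first item)
print(a_list[-1])               # d  (last item, counting from the end)
print(a_list[len(a_list) - 1])  # d  (last item, counting from the start)
```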
The slice [1::2] starts at the second list item (index 1) and selects every second item.
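A step of 2 splits the list into alternating halves. On the result of split('"') for the example string, even indexes hold the text outside the quotes and odd indexes hold the text inside them:

```python
parts = ['Bobby ', 'Hadz', ' Com ', 'ABC', '']

outside = parts[::2]   # indexes 0, 2, 4: text outside the quotes
inside = parts[1::2]   # indexes 1, 3: text inside the quotes

print(outside)  # ['Bobby ', ' Com ', '']
print(inside)   # ['Hadz', 'ABC']
```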
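We start at the second list item to exclude the element before the first double quote. Because the string is split on every double quote, the text inside each pair of quotes lands at an odd index (1, 3, 5, ...), provided every opening quote has a matching closing quote.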
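One caveat worth knowing: the slicing trick assumes that every opening quote is closed. With an unbalanced string (a made-up example below), the split approach still returns the trailing fragment, while the regex from the first example only returns complete quoted strings.

```python
import re

# The last double quote is never closed.
text = 'a "b" c "d'

split_result = text.split('"')[1::2]
regex_result = re.findall(r'"([^"]*)"', text)

print(split_result)  # ['b', 'd']
print(regex_result)  # ['b']
```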