Extract strings between quotes in Python

Borislav Hadzhiev

Last updated: Jun 26, 2022


Use the re.findall() method to extract strings between quotes, e.g. my_list = re.findall(r'"([^"]*)"', my_str). The re.findall method will match the provided pattern in the string and will return a list containing the strings between the quotes.

main.py
import re

# ✅ extract string between double quotes
my_str = 'One "Two" Three "Four"'

my_list = re.findall(r'"([^"]*)"', my_str)
print(my_list)  # 👉️ ['Two', 'Four']
print(my_list[0])  # 👉️ Two
print(my_list[1])  # 👉️ Four

# ---------------------------------------------------

# ✅ extract string between single quotes
my_str_2 = "One 'Two' Three 'Four'"

my_list_2 = re.findall(r"'([^']*)'", my_str_2)
print(my_list_2)  # 👉️ ['Two', 'Four']

The first example in the code snippet extracts strings between double quotes, and the second extracts strings between single quotes.

The re.findall method takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string.
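As a small illustration of "non-overlapping", the sketch below uses made-up input strings (they are not from the example above). Once a closing quote has been consumed by one match, it cannot also serve as the opening quote of the next one, and a string without quoted text gives an empty list rather than None.

```python
import re

pattern = r'"([^"]*)"'

# Matches are found left to right and never share characters,
# so the text between the 2nd and 3rd quote ("b") is skipped.
adjacent = re.findall(pattern, '"a"b"c"')
print(adjacent)  # ['a', 'c']

# No quoted text at all: the result is an empty list.
nothing = re.findall(pattern, 'no quotes here')
print(nothing)  # []
```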

Let's look at the regular expression in the first example.

main.py
import re

# ✅ extract string between double quotes
my_str = 'One "Two" Three "Four"'

my_list = re.findall(r'"([^"]*)"', my_str)
print(my_list)  # 👉️ ['Two', 'Four']
print(my_list[0])  # 👉️ Two
print(my_list[1])  # 👉️ Four

The regex starts and ends with double quotes because we want to match anything that is inside of double quotes in the string.

The parentheses () in the regular expression match whatever is inside and indicate the start and end of a group.

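When the pattern contains a single group, re.findall returns the group's contents rather than the whole match, which is why the surrounding quotes are not part of the results.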
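To see what the group changes, the following sketch runs the same pattern with and without the parentheses. The variable names with_group and without_group are only illustrative.

```python
import re

my_str = 'One "Two" Three "Four"'

# With a group, findall returns only what the group captured.
with_group = re.findall(r'"([^"]*)"', my_str)
print(with_group)  # ['Two', 'Four']

# Without a group, findall returns the whole match, quotes included.
without_group = re.findall(r'"[^"]*"', my_str)
print(without_group)  # ['"Two"', '"Four"']
```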

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT a double quote.

The asterisk * matches the preceding regular expression (anything but double quotes) zero or more times.

In its entirety, the regular expression matches zero or more characters that are not double quotes and are inside of double quotes.
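Because the asterisk allows zero characters, an empty pair of quotes also matches and produces an empty string. The sketch below uses a made-up input to show this, and one possible way to drop the empty results afterwards.

```python
import re

text = 'empty "" and "full"'

# "*" allows zero characters, so "" yields an empty string.
matches = re.findall(r'"([^"]*)"', text)
print(matches)  # ['', 'full']

# One option: filter the empty strings out after matching.
non_empty = [m for m in matches if m]
print(non_empty)  # ['full']

# Swapping "*" for "+" is not equivalent: the scan re-pairs the quotes
# and captures the text that sits between the two quoted parts.
plus_variant = re.findall(r'"([^"]+)"', text)
print(plus_variant)  # [' and ']
```

Filtering after the match keeps the quote pairing intact, which is why it is used here instead of the plus quantifier.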

You can also use this approach to extract strings from between single quotes.

main.py
import re

my_str_2 = "One 'Two' Three 'Four'"

my_list_2 = re.findall(r"'([^']*)'", my_str_2)
print(my_list_2)  # 👉️ ['Two', 'Four']
print(my_list_2[0])  # 👉️ Two
print(my_list_2[1])  # 👉️ Four

All we had to do is wrap the group in single quotes instead of double quotes and place a single quote in the set of characters.

In its entirety, the regex matches zero or more characters that are not single quotes and are inside of single quotes.
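If a string mixes both kinds of quotes, the two patterns can be combined. The following is a minimal sketch of one possible approach, not something covered above: it captures the opening quote in a first group and uses the backreference \1 to require the same closing quote. It switches to re.finditer because re.findall would return tuples once the pattern has two groups. The input string is made up for illustration.

```python
import re

text = """One "Two" Three 'Four' and "it's" done"""

# Group 1 captures the opening quote; \1 requires the same closing quote.
# Group 2 lazily captures the text in between.
pattern = re.compile(r'''(["'])(.*?)\1''')

both = [match.group(2) for match in pattern.finditer(text)]
print(both)  # ['Two', 'Four', "it's"]
```

Note that the apostrophe in "it's" is kept because the closing quote has to be the same character as the opening one.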
