ValueError: pattern contains no capture groups [Solved]

Borislav Hadzhiev

Last updated: Apr 13, 2024
4 min


# Table of Contents

  1. ValueError: pattern contains no capture groups
  2. Use parentheses to specify capture groups
  3. Using a named capture group to name the DataFrame column
  4. Getting the result of calling str.extract() as a Series

# ValueError: pattern contains no capture groups [Solved]

The Pandas "ValueError: pattern contains no capture groups" error occurs when you call the Series.str.extract() method with a regular expression that doesn't contain any capture groups.

To solve the error, use parentheses to specify at least one capture group inside your regular expression.

Here is an example of how the error occurs.

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

# ⛔️ ValueError: pattern contains no capture groups
new_df = df['name'].str.extract(r'[a-zA-Z]+\d')
print(new_df)


The issue in the code sample is that we didn't use any capture groups with the str.extract() method.

# Use parentheses to specify capture groups

You have to use parentheses to specify at least one capture group when calling str.extract().

Each capture group in the regex pattern is a separate DataFrame column in the output.

Here is an example that uses 2 capture groups when calling str.extract().

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')
print(new_df)

Running the code sample produces the following output.

shell
       0  1
0  Alice  9
1  Bobby  8
2   Carl  7
3    Dan  6
4  Ethan  5


Notice that we have 2 capture groups (2 sets of parentheses).

main.py
new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')

#        0  1
# 0  Alice  9
# 1  Bobby  8
# 2   Carl  7
# 3    Dan  6
# 4  Ethan  5
print(new_df)

The first column in the resulting DataFrame contains the values from the first capture group and the next column contains the values from the second capture group.

The square brackets [] are used to indicate a set of characters.

main.py
new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')

The a-z and A-Z ranges match lowercase and uppercase letters, respectively.

The + quantifier matches the preceding character set one or more times.

The \d special character matches any digit in the range of 0 to 9.
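
If you want to see how the pattern splits a value before involving Pandas, you can test it with Python's built-in re module. Here is a minimal sketch that applies the same pattern to a single string (this snippet is not part of the original example).

main.py
import re

# the same pattern that is passed to str.extract()
match = re.match(r'([a-zA-Z]+)(\d)', 'Alice9')

print(match.group(1))  # Alice
print(match.group(2))  # 9
print(match.groups())  # ('Alice', '9')

Each set of parentheses becomes one group of the match, just like each capture group becomes one column in the DataFrame returned by str.extract().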

If you only need to get the name of each person, you would only use one capture group ().

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'([a-zA-Z]+)\d')

#        0
# 0  Alice
# 1  Bobby
# 2   Carl
# 3    Dan
# 4  Ethan
print(new_df)



Similarly, if you only need to get the digit after each name, you would wrap the \d special character in parentheses.

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'[a-zA-Z]+(\d)')

#    0
# 0  9
# 1  8
# 2  7
# 3  6
# 4  5
print(new_df)


# Using a named capture group to name the DataFrame column

If you need to name the DataFrame columns in the output, use named capture groups.

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'(?P<first_name>[a-zA-Z]+)\d')

#   first_name
# 0      Alice
# 1      Bobby
# 2       Carl
# 3        Dan
# 4      Ethan
print(new_df)



The syntax for a named capture group is (?P<GROUP_NAME>pattern).

We set the name of the capture group to first_name in the example.

We could also repeat the process to name the capture group of the digit after each name.

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(
    r'(?P<first_name>[a-zA-Z]+)(?P<id>\d)'
)

#   first_name id
# 0      Alice  9
# 1      Bobby  8
# 2       Carl  7
# 3        Dan  6
# 4      Ethan  5
print(new_df)


We named the first capture group first_name and the second id.
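
A nice side effect of naming the groups is that the group names become the column labels, so you can access the resulting columns by those names. The following short sketch assumes new_df was produced by the example above.

main.py
# the named capture groups become the column labels
print(new_df['first_name'])
# 0    Alice
# 1    Bobby
# 2     Carl
# 3      Dan
# 4    Ethan
# Name: first_name, dtype: object

print(new_df['id'])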

# Getting the result of calling str.extract() as a Series

If you need to get the output of the str.extract() method as a Series, set the expand argument to False.

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

series = df['name'].str.extract(
    r'([a-zA-Z]+)\d',
    expand=False
)

print(series)
print('-' * 50)
print(type(series))
print('-' * 50)
print(series[0])

Running the code sample produces the following output.

shell
0    Alice
1    Bobby
2     Carl
3      Dan
4    Ethan
Name: name, dtype: object
--------------------------------------------------
<class 'pandas.core.series.Series'>
--------------------------------------------------
Alice


Notice that we set the expand argument to False when calling str.extract().

main.py
series = df['name'].str.extract(
    r'([a-zA-Z]+)\d',
    expand=False
)

When the expand argument is set to True, the method returns a DataFrame with a separate column for each capture group.

When the argument is set to False (see the sketch after this list):

  • the method returns a Series if there is one capture group.
  • the method returns a DataFrame if there are multiple capture groups.
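
Here is a short sketch, using the same DataFrame, that contrasts the three cases (the type() calls are only there to make the return types visible).

main.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

# one capture group + expand=False -> Series
one_group = df['name'].str.extract(r'([a-zA-Z]+)\d', expand=False)
print(type(one_group))  # <class 'pandas.core.series.Series'>

# multiple capture groups + expand=False -> still a DataFrame
two_groups = df['name'].str.extract(r'([a-zA-Z]+)(\d)', expand=False)
print(type(two_groups))  # <class 'pandas.core.frame.DataFrame'>

# expand=True (the default) -> always a DataFrame
default = df['name'].str.extract(r'([a-zA-Z]+)\d')
print(type(default))  # <class 'pandas.core.frame.DataFrame'>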

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:
