# ValueError: pattern contains no capture groups [Solved]

The Pandas "ValueError: pattern contains no capture groups" occurs when you use the Series.str.extract() method with a regular expression that doesn't contain any capture groups.

To solve the error, wrap the parts of the regular expression that you want to extract in parentheses to create capture groups.

Here is an example of how the error occurs.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

# ⛔️ ValueError: pattern contains no capture groups
new_df = df['name'].str.extract(r'[a-zA-Z]+\d')

print(new_df)

The issue in the code sample is that we didn't use any capture groups with the str.extract() method.
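If the pattern is built dynamically, one way to spot the problem ahead of time is to count its capture groups with Python's built-in re module. The sketch below assumes a hypothetical helper named count_groups (it is not part of pandas). It also shows that a non-capturing group, written as (?:...), doesn't count as a capture group.

```python
import re

import pandas as pd

names = pd.Series(['Alice9', 'Bobby8', 'Carl7'])


def count_groups(pattern):
    """Hypothetical helper: number of capture groups in a regex pattern."""
    return re.compile(pattern).groups


no_groups = r'[a-zA-Z]+\d'
non_capturing = r'(?:[a-zA-Z]+)\d'
one_group = r'([a-zA-Z]+)\d'

print(count_groups(no_groups))      # 0
print(count_groups(non_capturing))  # 0
print(count_groups(one_group))      # 1

# Calling str.extract() with a pattern that has no capture groups
# raises the ValueError, which can be caught like any other exception.
try:
    names.str.extract(no_groups)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```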

# Use parentheses to specify capture groups

You have to use parentheses to specify capture group(s) when calling str.extract.

Each capture group in the regex pattern is a separate DataFrame column in the output.

Here is an example that uses 2 capture groups when calling str.extract().

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')

print(new_df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

       0  1
0  Alice  9
1  Bobby  8
2   Carl  7
3    Dan  6
4  Ethan  5

Notice that we have 2 capture groups (2 sets of parentheses).

main.py

new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')

#        0  1
# 0  Alice  9
# 1  Bobby  8
# 2   Carl  7
# 3    Dan  6
# 4  Ethan  5
print(new_df)

The first column in the resulting DataFrame contains the values from the first capture group and the next column contains the values from the second capture group.
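One related detail: rows that don't match the pattern produce NaN in every output column. A minimal sketch, assuming a made-up Series in which the second value has no trailing digit:

```python
import pandas as pd

# 'Bobby' has no trailing digit, so the pattern doesn't match that row.
names = pd.Series(['Alice9', 'Bobby', 'Carl7'])

extracted = names.str.extract(r'([a-zA-Z]+)(\d)')

# The non-matching row is filled with NaN in both columns.
print(extracted)
```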

The square brackets [] are used to indicate a set of characters.

main.py

new_df = df['name'].str.extract(r'([a-zA-Z]+)(\d)')

The a-z and A-Z characters represent the lowercase and uppercase letter ranges.

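The + character matches the preceding element (here, the character set in square brackets) one or more times.

The \d special character matches any digit in the range of 0 to 9 (with Python string patterns, it also matches other Unicode decimal digits).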
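To see how the pieces of the pattern fit together outside of pandas, the same regular expression can be tried on a single string with Python's built-in re module. This is only an illustrative sketch; the sample strings are made up for the example.

```python
import re

pattern = re.compile(r'([a-zA-Z]+)(\d)')

# One or more letters followed by a single digit.
match = pattern.search('Alice9')

print(match.groups())  # ('Alice', '9')
print(match.group(1))  # Alice
print(match.group(2))  # 9

# No run of letters is followed by a digit here, so there is no match.
no_match = pattern.search('9Alice')
print(no_match)  # None
```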

If you only need to get the name of each person, you would only use one capture group ().

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'([a-zA-Z]+)\d')

#        0
# 0  Alice
# 1  Bobby
# 2   Carl
# 3    Dan
# 4  Ethan
print(new_df)

Similarly, if you only need to get the digit after each name, you would wrap the \d special character in parentheses.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'[a-zA-Z]+(\d)')

#    0
# 0  9
# 1  8
# 2  7
# 3  6
# 4  5
print(new_df)

# Using a named capture group to name the `DataFrame` column

If you need to name the DataFrame columns in the output, use named capture groups.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(r'(?P<first_name>[a-zA-Z]+)\d')

#   first_name
# 0      Alice
# 1      Bobby
# 2       Carl
# 3        Dan
# 4      Ethan
print(new_df)

The syntax for a named capture group is (?P<GROUP_NAME>...), where ... is the pattern to capture.

We set the name of the capture group to first_name in the example.

We could also repeat the process to name the capture group of the digit after each name.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

new_df = df['name'].str.extract(
    r'(?P<first_name>[a-zA-Z]+)(?P<id>\d)'
)

#   first_name id
# 0      Alice  9
# 1      Bobby  8
# 2       Carl  7
# 3        Dan  6
# 4      Ethan  5
print(new_df)

We named the first capture group first_name and the second id.
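Because the output columns take the group names, they can be accessed by name afterwards. The sketch below is one possible follow-up, not something the example above requires: it converts the extracted id values (which are strings) to integers and uses DataFrame.join() to add the new columns next to the original ones. The variable names parts and result are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

parts = df['name'].str.extract(
    r'(?P<first_name>[a-zA-Z]+)(?P<id>\d)'
)

# Extracted values are strings, so convert the digit column explicitly.
# This assumes every row matched; NaN values can't be cast to int.
parts['id'] = parts['id'].astype(int)

# join() aligns on the index and appends the named columns.
result = df.join(parts)

print(result)
```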

# Getting the result of calling `str.extract()` as a `Series`

If you need to get the output of the str.extract() method as a Series, set the expand argument to False.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice9', 'Bobby8', 'Carl7', 'Dan6', 'Ethan5'],
    'salary': [175.1, 180.2, 190.3, 205.4, 210.5],
})

series = df['name'].str.extract(
    r'([a-zA-Z]+)\d',
    expand=False
)

print(series)

print('-' * 50)

print(type(series))

print('-' * 50)

print(series[0])

Running the code sample produces the following output.

shell

0    Alice
1    Bobby
2     Carl
3      Dan
4    Ethan
Name: name, dtype: object
--------------------------------------------------
<class 'pandas.core.series.Series'>
--------------------------------------------------
Alice

Notice that we set the expand argument to False when calling str.extract().

main.py

series = df['name'].str.extract(
    r'([a-zA-Z]+)\d',
    expand=False
)

The expand argument defaults to True, in which case the method returns a DataFrame with a separate column for each capture group.

When the argument is set to False:

- the method returns a Series if there is one capture group.
- the method returns a DataFrame if there are multiple capture groups.
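The combinations described above can be checked with a short sketch that prints the return type for each case. The shortened Series and the variable names are assumptions made for the example.

```python
import pandas as pd

names = pd.Series(['Alice9', 'Bobby8', 'Carl7'], name='name')

one_group = r'([a-zA-Z]+)\d'
two_groups = r'([a-zA-Z]+)(\d)'

results = {
    'one group, expand=True': names.str.extract(one_group, expand=True),
    'one group, expand=False': names.str.extract(one_group, expand=False),
    'two groups, expand=True': names.str.extract(two_groups, expand=True),
    'two groups, expand=False': names.str.extract(two_groups, expand=False),
}

# Only the "one group, expand=False" case returns a Series.
for label, result in results.items():
    print(label, '->', type(result).__name__)
```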
