Filter rows in a Pandas DataFrame using Regex

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
5 min

banner

# Table of Contents

  1. Filter rows in a Pandas DataFrame using Regex
  2. Getting the rows with the values that DON'T match the regex
  3. Filtering rows with a built-in method
  4. Selecting the rows that contain a certain substring using Regex
  5. Selecting the rows that end with a certain substring using Regex

# Filter rows in a Pandas DataFrame using Regex

To filter rows in a Pandas DataFrame using a regex:

  1. Use the str.contains() method to test if a regex matches each value in a specific column.
  2. Use bracket notation to filter the rows by the Series of boolean values.
main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'^Al' starting_with = df[df['name'].str.contains(regex)] # name sales salary # 0 Alice 1 175.1 # 1 Alex 3 180.2 print(starting_with)

filter rows in pandas dataframe using regex

The code for this article is available on GitHub

The code sample selects the rows in the name column that start with "Al".

The str.contains() method tests if a regex matches each value in the given column.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'^Al' # 0 True # 1 True # 2 False # 3 False # 4 False # Name: name, dtype: bool print(df['name'].str.contains(regex))
The code for this article is available on GitHub

The last step is to use bracket notation to select the matching rows.

main.py
regex = r'^Al' starting_with = df[df['name'].str.contains(regex)]

By default, the str.contains method is case-sensitive.

If you want to make it case-insensitive, set the case argument to False.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'^al' starting_with = df[df['name'].str.contains(regex, case=False)] # name sales salary # 0 Alice 1 175.1 # 1 Alex 3 180.2 print(starting_with)

filter dataframe rows by regex case insensitive

The code for this article is available on GitHub

When the case argument is set to False, the str.contains method becomes case-insensitive.

You can also set the flags argument when calling the method.

main.py
import re import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'^al' starting_with = df[df['name'].str.contains( regex, case=False, flags=re.IGNORECASE)] # name sales salary # 0 Alice 1 175.1 # 1 Alex 3 180.2 print(starting_with)

The flags argument represents flags that are passed to the re module.

# Getting the rows with the values that DON'T match the regex

If you need to get the rows with the values that don't match the regex, use the tilde ~ operator.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'^Al' starting_with = df[~df['name'].str.contains(regex)] # name sales salary # 2 Bobby 5 190.3 # 3 Tony 7 205.4 # 4 Ethan 7 210.5 print(starting_with)

getting rows with values not matching regex in dataframe

The code for this article is available on GitHub

The code sample uses the tilde ~ operator to invert the boolean values in the Series before filtering the DataFrame based on the regex.

# Filtering rows with a built-in method

In some cases, you might be able to use a built-in method to filter the rows (instead of a regex).

For example, if you need to select the rows that start with a specific substring, you can use the str.startswith() method.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bobby', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) starting_with = df[df['name'].str.startswith('Al')] # name sales salary # 0 Alice 1 175.1 # 1 Alex 3 180.2 print(starting_with)

using built in method instead of regex when filtering

The code for this article is available on GitHub

There are many built-in methods on the str attribute that you might be able to use instead of a regex.

# Selecting the rows that contain a certain substring using Regex

You can also use the str.contains() method if you need to select the rows that contain a certain substring using a regex.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'.*on.*' containing = df[df['name'].str.contains(regex)] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(containing)

select rows containing substring using regex

The code for this article is available on GitHub

The code sample selects the rows with values that contain the substring "on".

If you need to perform the test in a case-insensitive manner, set the case or flags argument, depending on whether you use a regular expression.

main.py
import re import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'.*ON.*' containing = df[df['name'].str.contains( regex, case=False, flags=re.IGNORECASE)] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(containing)

You could also select the rows that contain a given substring without a regular expression.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Alex', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) substring = 'on' containing = df[df['name'].str.contains(substring, case=False, regex=False)] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(containing)

select rows containing substring without regex

The code for this article is available on GitHub

Notice that we set the regex argument to False.

If the regex argument is set to False, the supplied pattern (the first argument) is treated as a literal string.

By default, the pattern argument is treated as a regular expression.

# Selecting the rows that end with a certain substring using Regex

The str.contains() method can also be used if you need to select the rows that end with a certain substring using a regex.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Yoni', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'y$' ending_with = df[df['name'].str.contains(regex)] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(ending_with)

select rows ending with substring using regex

The code for this article is available on GitHub

The code sample selects the rows whose name values end with y.

If you need to perform the test in a case-insensitive manner, set the case or flags arguments when calling str.contains().

main.py
import re import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Yoni', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) regex = r'Y$' ending_with = df[df['name'].str.contains( regex, case=False, flags=re.IGNORECASE)] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(ending_with)

filter rows ending with substring using regex ignoring case

You can also select the rows that end with a certain substring using the built-in str.endswith() method instead of using a regex.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Yoni', 'Bony', 'Tony', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) ending_with = df[df['name'].str.endswith('y')] # name sales salary # 2 Bony 5 190.3 # 3 Tony 7 205.4 print(ending_with)

select rows ending with substring using endswith method

The code for this article is available on GitHub

I've also written articles on:

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.