Pandas: How to Filter a DataFrame by value counts

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: How to Filter a DataFrame by value counts
  2. Pandas: How to Filter a DataFrame by value counts using transform()
  3. Pandas: How to Filter a DataFrame by value counts using DataFrame.map

# Pandas: How to Filter a DataFrame by value counts

To filter a Pandas DataFrame by value counts:

  1. Use the groupby() method to group the DataFrame by the specific column.
  2. Use the filter() method to filter the DataFrame by value counts.
main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df.groupby('sales').filter(lambda x: len(x) > 1) print(result)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

filter dataframe by value counts in pandas

We used the DataFrame.groupby() method to group the DataFrame by the sales column.

main.py
result = df.groupby('sales').filter(lambda x: len(x) > 1)

The next step is to use the DataFrameGroupBy.filter() method to filter out the elements from the group that don't satisfy the condition.

Only values in the sales column that occur more than once get included in the resulting DataFrame.

# Pandas: How to Filter a DataFrame by value counts using transform()

You can also use the DataFrame.transform method to filter a DataFrame by value counts.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df.groupby('sales')['sales'].transform('size') > 1] print(result)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

filter dataframe by value counts using transform

The DataFrame.transform() method calls the supplied function and produces a DataFrame with the same axis shape as the DataFrame the method was called on.

main.py
result = df[df.groupby('sales')['sales'].transform('size') > 1]

We used the size function to get an integer that represents the number of elements in the object and compared it to 1.

# Pandas: How to Filter a DataFrame by value counts using DataFrame.map

You can also use the DataFrame.map() method to filter a DataFrame by value counts.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df['sales'].map(df['sales'].value_counts()) > 1] print(result)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

filter dataframe by value counts using map

The DataFrame.map() method applies a function to a DataFrame element-wise.

main.py
result = df[df['sales'].map(df['sales'].value_counts()) > 1]

We used the DataFrame.value_counts() method to return a Series containing counts of unique rows in the DataFrame.

The same approach can be used if you need to add a column with the counts.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) df['sales_count'] = df['sales'] df['sales_count'] = df['sales_count'].map(dict(df['sales'].value_counts())) print(df) print('-' * 50) print(df[df.sales_count > 1])
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary sales_count 0 Alice 1 175.1 1 1 Bobby 3 180.2 1 2 Carl 5 190.3 1 3 Dan 7 205.4 2 4 Ethan 7 210.5 2 -------------------------------------------------- name sales salary sales_count 3 Dan 7 205.4 2 4 Ethan 7 210.5 2

The new sales_count column stores the value counts for the sales column.

I've also written articles on:

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.