Last updated: Apr 12, 2024
Reading time·4 min

To filter a Pandas DataFrame by value counts:
groupby() method to group the DataFrame by the specific column.filter() method to filter the DataFrame by value counts.import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df.groupby('sales').filter(lambda x: len(x) > 1) print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

We used the
DataFrame.groupby()
method to group the DataFrame by the sales column.
result = df.groupby('sales').filter(lambda x: len(x) > 1)
The next step is to use the DataFrameGroupBy.filter() method to filter out the elements from the group that don't satisfy the condition.
Only values in the sales column that occur more than once get included in the
resulting DataFrame.
transform()You can also use the
DataFrame.transform
method to filter a DataFrame by value counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df.groupby('sales')['sales'].transform('size') > 1] print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

The DataFrame.transform() method calls the supplied function and produces a
DataFrame with the same axis shape as the DataFrame the method was called
on.
result = df[df.groupby('sales')['sales'].transform('size') > 1]
We used the size function to get an integer that represents the number of
elements in the object and compared it to 1.
DataFrame.mapYou can also use the
DataFrame.map()
method to filter a DataFrame by value counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df['sales'].map(df['sales'].value_counts()) > 1] print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5

The DataFrame.map() method applies a function to a DataFrame element-wise.
result = df[df['sales'].map(df['sales'].value_counts()) > 1]
We used the
DataFrame.value_counts()
method to return a Series containing counts of unique rows in the DataFrame.
The same approach can be used if you need to add a column with the counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) df['sales_count'] = df['sales'] df['sales_count'] = df['sales_count'].map(dict(df['sales'].value_counts())) print(df) print('-' * 50) print(df[df.sales_count > 1])
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary sales_count 0 Alice 1 175.1 1 1 Bobby 3 180.2 1 2 Carl 5 190.3 1 3 Dan 7 205.4 2 4 Ethan 7 210.5 2 -------------------------------------------------- name sales salary sales_count 3 Dan 7 205.4 2 4 Ethan 7 210.5 2
The new sales_count column stores the value counts for the sales column.
I've also written articles on:
You can learn more about the related topics by checking out the following tutorials:
pd.read_json()