Last updated: Apr 12, 2024
Reading time·4 min
To filter a Pandas DataFrame by value counts:
groupby()
method to group the DataFrame
by the specific column.filter()
method to filter the DataFrame
by value counts.import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df.groupby('sales').filter(lambda x: len(x) > 1) print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5
We used the
DataFrame.groupby()
method to group the DataFrame
by the sales
column.
result = df.groupby('sales').filter(lambda x: len(x) > 1)
The next step is to use the DataFrameGroupBy.filter() method to filter out the elements from the group that don't satisfy the condition.
Only values in the sales
column that occur more than once get included in the
resulting DataFrame
.
transform()
You can also use the
DataFrame.transform
method to filter a DataFrame
by value counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df.groupby('sales')['sales'].transform('size') > 1] print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5
The DataFrame.transform()
method calls the supplied function and produces a
DataFrame
with the same axis shape as the DataFrame
the method was called
on.
result = df[df.groupby('sales')['sales'].transform('size') > 1]
We used the size
function to get an integer that represents the number of
elements in the object
and compared it to 1
.
DataFrame.map
You can also use the
DataFrame.map()
method to filter a DataFrame
by value counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) result = df[df['sales'].map(df['sales'].value_counts()) > 1] print(result)
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary 3 Dan 7 205.4 4 Ethan 7 210.5
The DataFrame.map()
method applies a function to a DataFrame
element-wise.
result = df[df['sales'].map(df['sales'].value_counts()) > 1]
We used the
DataFrame.value_counts()
method to return a Series
containing counts of unique rows in the DataFrame
.
The same approach can be used if you need to add a column with the counts.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'], 'sales': [1, 3, 5, 7, 7], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5], }) print(df) print('-' * 50) df['sales_count'] = df['sales'] df['sales_count'] = df['sales_count'].map(dict(df['sales'].value_counts())) print(df) print('-' * 50) print(df[df.sales_count > 1])
Running the code sample produces the following output.
name sales salary 0 Alice 1 175.1 1 Bobby 3 180.2 2 Carl 5 190.3 3 Dan 7 205.4 4 Ethan 7 210.5 -------------------------------------------------- name sales salary sales_count 0 Alice 1 175.1 1 1 Bobby 3 180.2 1 2 Carl 5 190.3 1 3 Dan 7 205.4 2 4 Ethan 7 210.5 2 -------------------------------------------------- name sales salary sales_count 3 Dan 7 205.4 2 4 Ethan 7 210.5 2
The new sales_count
column stores the value counts for the sales
column.
I've also written articles on:
You can learn more about the related topics by checking out the following tutorials:
pd.read_json()