Pandas: Count the unique combinations of two Columns

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: Count the unique combinations of two Columns
  2. Getting the result as a Series
  3. Pandas: Count the unique combinations of two Columns using DataFrame.value_counts()

# Pandas: Count the unique combinations of two Columns

To count the unique combinations of two columns in Pandas:

  1. Use the DataFrame.groupby() method to group the DataFrame by the two columns.
  2. Use the size() method to compute the group sizes.
  3. Reset the index and optionally rename the column.
main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.groupby(['name', 'task']).size().reset_index().rename( columns={0: 'count'} ) print(result)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name task count 0 Alice dev 2 1 Alice test 2 2 Bobby dev 1 3 Bobby test 1

pandas count unique combinations of two columns

We used the DataFrame.groupby() method to group the DataFrame by the name and task columns.

main.py
result = df.groupby(['name', 'task']).size().reset_index().rename( columns={0: 'count'} )

By grouping the DataFrame by the two columns, we get the rows where the name and task column values are the same.

The next step is to use the DataFrameGroupBy.size() method to compute the group sizes.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) # name task # Alice dev 2 # test 2 # Bobby dev 1 # test 1 # dtype: int64 print(df.groupby(['name', 'task']).size())

once we have the size of each group, we can reset the index.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) # name task 0 # 0 Alice dev 2 # 1 Alice test 2 # 2 Bobby dev 1 # 3 Bobby test 1 print(df.groupby(['name', 'task']).size().reset_index())

The last step is to rename the 0 column to "count".

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.groupby(['name', 'task']).size().reset_index().rename( columns={0: 'count'} ) # name task count # 0 Alice dev 2 # 1 Alice test 2 # 2 Bobby dev 1 # 3 Bobby test 1 print(result)

The DataFrame.rename() method renames the columns or index labels of a DataFrame.

# Getting the result as a Series

If you set the as_index argument to False in the call to DataFrame.groupby(), you will get the result as a Series with a size column.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.groupby(['name', 'task'], as_index=False).size() # name task size # 0 Alice dev 2 # 1 Alice test 2 # 2 Bobby dev 1 # 3 Bobby test 1 print(result)

getting the result as series with size column

The code for this article is available on GitHub

The as_index argument defaults to True and returns an object with group labels as the index.

When it is set to False, SQL-style grouped output is returned.

# Pandas: Count the unique combinations of two Columns using DataFrame.value_counts()

You can also use the DataFrame.value_counts() method to count the unique combinations of two columns.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.value_counts() print(result)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name task Alice dev 2 test 2 Bobby dev 1 test 1 Name: count, dtype: int64

count unique combination of two columns using value counts

The DataFrame.value_counts() method returns a Series containing counts of unique rows in the DataFrame.

You can use the reset_index() method to get the result as a DataFrame.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.value_counts().reset_index(name='count') # name task count # 0 Alice dev 2 # 1 Alice test 2 # 2 Bobby dev 1 # 3 Bobby test 1 print(result)

getting the result as a dataframe

By default, the value_counts() method returns the values sorted in descending order.

If you want to sort the values in ascending order, set the ascending argument to True.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', 'test', 'dev', 'test', 'dev'], }) result = df.value_counts(ascending=True).reset_index(name='count') # name task count # 0 Bobby dev 1 # 1 Bobby test 1 # 2 Alice dev 2 # 3 Alice test 2 print(result)

return the dataframe sorted in ascending order

The code for this article is available on GitHub

When the ascending argument is set to True, the results are sorted in ascending order.

The argument defaults to False.

By default, the value_counts() method doesn't include counts of rows that contain NA values.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', None, 'dev', 'test', None], }) result = df.value_counts().reset_index(name='count') # name task count # 0 Alice dev 1 # 1 Alice test 1 # 2 Bobby dev 1 # 3 Bobby test 1 print(result)

If you also want to include counts of rows that contain NA values, set the dropna argument to False.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Alice', 'Bobby', 'Alice', 'Alice'], 'task': ['dev', 'test', None, 'dev', 'test', None], }) result = df.value_counts(dropna=False).reset_index(name='count') # name task count # 0 Alice NaN 2 # 1 Alice dev 1 # 2 Alice test 1 # 3 Bobby dev 1 # 4 Bobby test 1 print(result)
The code for this article is available on GitHub

When the dropna argument is set to False, counts of rows that contain NA values are included.

By default, the argument is set to True.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.