How to add a Count Column to a Pandas DataFrame

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
6 min

banner

# Table of Contents

  1. How to add a Count Column to a Pandas DataFrame
  2. Adding a Count Column to a Pandas DataFrame by grouping multiple columns
  3. Adding a Count Column to a Pandas DataFrame by using np.zeros()
  4. Creating a new DataFrame that has a count column

# How to add a Count Column to a Pandas DataFrame

To add a count column to a Pandas DataFrame:

  1. Use the DataFrame.groupby() method to group the DataFrame by a given column.
  2. Use the DataFrame.transform() method to call the count function on each row.
  3. Assign the result to a column using bracket notation.
main.py
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby(['job'])['job'].transform('count') print(df)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2

add count column to dataframe

We used the DataFrame.groupby() method to group the DataFrame by the job column.

main.py
df['job_count'] = df.groupby(['job'])['job'].transform('count')

The method returns a groupby object that contains information about the groups.

The last step is to call the count method on each row using the DataFrame.transform() method.

The DataFrame.count() method counts the non-NA cells for each row.

The output means that:

  • The "job" column has 2 rows with the "frontend" value.
  • The "job" column has 1 row with the "backend" value.
  • The "job" column has 2 rows with the "tester" value.
shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2

# Adding a Count Column to a Pandas DataFrame by grouping multiple columns

In some cases, you might have to group your DataFrame by multiple columns.

You can pass as many column names as necessary to the list you call groupby() with.

main.py
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby( ['job', 'experience'] )['job'].transform('count') print(df)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2

add count column to dataframe

The output means that:

  • There are 2 rows where the job column has a value of frontend and the experience column has a value of 5.
  • There is 1 row where the job column has a value of backend and the experience column has a value of 2.
  • There are 2 rows where the job column has a value of tester and the experience column has a value of 6.

A list of column labels can be passed to group by as many columns as necessary.

# Adding a Count Column to a Pandas DataFrame by using np.zeros()

You can also create a count column that stores dummy initial values and later gets set to the real results.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['count'] = np.zeros(len(df)) grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job count 0 2 backend 1 1 5 frontend 2 2 6 tester 2

add count column to pandas dataframe using np zeros

Make sure you have the numpy module installed to be able to run the code sample.

shell
pip install numpy pip3 install numpy

We used the numpy.zeros() method to create a count column with zeros as initial values.

shell
experience job count 0 5 frontend 0.0 1 5 frontend 0.0 2 2 backend 0.0 3 6 tester 0.0 4 6 tester 0.0

The real values get set from the groupby().count() call.

main.py
grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)

# Creating a new DataFrame that has a count column

If you simply need to create a new DataFrame that has a count column, instead of changing the original DataFrame, call count() on the groupby object.

main.py
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df2 = df.groupby(['job']).count() print(df2)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job backend 1 frontend 2 tester 2

The output means that:

  • The "job" column has 1 row with the "backend" value.
  • The "job" column has 2 rows with the "frontend" value.
  • The "job" column has 2 rows with the "tester" value.

You can also use the value_counts() method.

main.py
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) # experience job # 0 5 frontend # 1 5 frontend # 2 2 backend # 3 6 tester # 4 6 tester print(df) print('-' * 50) df2 = df[['job', 'experience']].value_counts().reset_index(name='count') # job experience count # 0 frontend 5 2 # 1 tester 6 2 # 2 backend 2 1 print(df2)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- job experience count 0 frontend 5 2 1 tester 6 2 2 backend 2 1

creating new dataframe with count column

The output means that:

  • There are 2 rows where the job column has a value of frontend and the experience column has a value of 5.
  • There are 2 rows where the job column has a value of tester and the experience column has a value of 6.
  • There is 1 row where the job column has a value of backend and the experience column has a value of 2.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.