Last updated: Apr 12, 2024
Reading time·6 min

To add a count column to a Pandas DataFrame:
DataFrame.groupby() method to group the DataFrame by a given
column.DataFrame.transform() method to call the count function on each
row.import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby(['job'])['job'].transform('count') print(df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2

We used the
DataFrame.groupby()
method to group the DataFrame by the job column.
df['job_count'] = df.groupby(['job'])['job'].transform('count')
The method returns a groupby object that contains information about the
groups.
The last step is to call the count method on each row using the
DataFrame.transform()
method.
The DataFrame.count() method counts the non-NA cells for each row.
The output means that:
"job" column has 2 rows with the "frontend" value."job" column has 1 row with the "backend" value."job" column has 2 rows with the "tester" value.experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2
In some cases, you might have to group your DataFrame by multiple columns.
You can pass as many column names as necessary to the list you call groupby()
with.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby( ['job', 'experience'] )['job'].transform('count') print(df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2

The output means that:
job column has a value of frontend and the
experience column has a value of 5.job column has a value of backend and the
experience column has a value of 2.job column has a value of tester and the
experience column has a value of 6.A list of column labels can be passed to group by as many columns as necessary.
np.zeros()You can also create a count column that stores dummy initial values and later
gets set to the real results.
import pandas as pd import numpy as np df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['count'] = np.zeros(len(df)) grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job count 0 2 backend 1 1 5 frontend 2 2 6 tester 2

Make sure you
have the numpy module installed to be
able to run the code sample.
pip install numpy pip3 install numpy
We used the numpy.zeros() method to create a count column with zeros as
initial values.
experience job count 0 5 frontend 0.0 1 5 frontend 0.0 2 2 backend 0.0 3 6 tester 0.0 4 6 tester 0.0
The real values get set from the groupby().count() call.
grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)
If you simply need to create a new DataFrame that has a count column, instead
of changing the original DataFrame, call
count() on
the groupby object.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df2 = df.groupby(['job']).count() print(df2)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job backend 1 frontend 2 tester 2
The output means that:
"job" column has 1 row with the "backend" value."job" column has 2 rows with the "frontend" value."job" column has 2 rows with the "tester" value.You can also use the value_counts() method.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) # experience job # 0 5 frontend # 1 5 frontend # 2 2 backend # 3 6 tester # 4 6 tester print(df) print('-' * 50) df2 = df[['job', 'experience']].value_counts().reset_index(name='count') # job experience count # 0 frontend 5 2 # 1 tester 6 2 # 2 backend 2 1 print(df2)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- job experience count 0 frontend 5 2 1 tester 6 2 2 backend 2 1

The output means that:
job column has a value of frontend and the
experience column has a value of 5.job column has a value of tester and the
experience column has a value of 6.job column has a value of backend and the
experience column has a value of 2.You can learn more about the related topics by checking out the following tutorials:
pd.read_json()