Last updated: Apr 12, 2024
Reading time·6 min
To add a count column to a Pandas DataFrame
:
DataFrame.groupby()
method to group the DataFrame by a given
column.DataFrame.transform()
method to call the count
function on each
row.import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby(['job'])['job'].transform('count') print(df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2
We used the
DataFrame.groupby()
method to group the DataFrame
by the job
column.
df['job_count'] = df.groupby(['job'])['job'].transform('count')
The method returns a groupby
object that contains information about the
groups.
The last step is to call the count
method on each row using the
DataFrame.transform()
method.
The DataFrame.count() method counts the non-NA cells for each row.
The output means that:
"job"
column has 2 rows with the "frontend"
value."job"
column has 1 row with the "backend"
value."job"
column has 2 rows with the "tester"
value.experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2
In some cases, you might have to group your DataFrame
by multiple columns.
You can pass as many column names as necessary to the list you call groupby()
with.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['job_count'] = df.groupby( ['job', 'experience'] )['job'].transform('count') print(df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job job_count 0 5 frontend 2 1 5 frontend 2 2 2 backend 1 3 6 tester 2 4 6 tester 2
The output means that:
job
column has a value of frontend
and the
experience
column has a value of 5
.job
column has a value of backend
and the
experience
column has a value of 2
.job
column has a value of tester
and the
experience
column has a value of 6
.A list of column labels can be passed to group by as many columns as necessary.
np.zeros()
You can also create a count
column that stores dummy initial values and later
gets set to the real results.
import pandas as pd import numpy as np df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df['count'] = np.zeros(len(df)) grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job count 0 2 backend 1 1 5 frontend 2 2 6 tester 2
Make sure you
have the numpy
module installed to be
able to run the code sample.
pip install numpy pip3 install numpy
We used the numpy.zeros()
method to create a count
column with zeros as
initial values.
experience job count 0 5 frontend 0.0 1 5 frontend 0.0 2 2 backend 0.0 3 6 tester 0.0 4 6 tester 0.0
The real values get set from the groupby().count()
call.
grp_df = df.groupby(['experience', 'job']).count().reset_index() print(grp_df)
If you simply need to create a new DataFrame
that has a count column, instead
of changing the original DataFrame
, call
count() on
the groupby
object.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) print(df) print('-' * 50) df2 = df.groupby(['job']).count() print(df2)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- experience job backend 1 frontend 2 tester 2
The output means that:
"job"
column has 1 row with the "backend"
value."job"
column has 2 rows with the "frontend"
value."job"
column has 2 rows with the "tester"
value.You can also use the value_counts()
method.
import pandas as pd df = pd.DataFrame({ 'experience': [5, 5, 2, 6, 6], 'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'], }) # experience job # 0 5 frontend # 1 5 frontend # 2 2 backend # 3 6 tester # 4 6 tester print(df) print('-' * 50) df2 = df[['job', 'experience']].value_counts().reset_index(name='count') # job experience count # 0 frontend 5 2 # 1 tester 6 2 # 2 backend 2 1 print(df2)
Running the code sample produces the following output.
experience job 0 5 frontend 1 5 frontend 2 2 backend 3 6 tester 4 6 tester -------------------------------------------------- job experience count 0 frontend 5 2 1 tester 6 2 2 backend 2 1
The output means that:
job
column has a value of frontend
and the
experience
column has a value of 5
.job
column has a value of tester
and the
experience
column has a value of 6
.job
column has a value of backend
and the
experience
column has a value of 2
.You can learn more about the related topics by checking out the following tutorials:
pd.read_json()