How to add a Count Column to a Pandas DataFrame

# Table of Contents

# How to add a Count Column to a Pandas DataFrame

To add a count column to a Pandas DataFrame:

Use the DataFrame.groupby() method to group the DataFrame by a given column.
Use the DataFrame.transform() method to call the count function on each row.
Assign the result to a column using bracket notation.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'experience': [5, 5, 2, 6, 6],
    'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'],
})

print(df)

print('-' * 50)

df['job_count'] = df.groupby(['job'])['job'].transform('count')
print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
   experience       job  job_count
0           5  frontend          2
1           5  frontend          2
2           2   backend          1
3           6    tester          2
4           6    tester          2

add count column to dataframe

We used the DataFrame.groupby() method to group the DataFrame by the job column.

main.py

Copied!
df['job_count'] = df.groupby(['job'])['job'].transform('count')

The method returns a groupby object that contains information about the groups.

The last step is to call the count method on each row using the DataFrame.transform() method.

The DataFrame.count() method counts the non-NA cells for each row.

The output means that:

The "job" column has 2 rows with the "frontend" value.
The "job" column has 1 row with the "backend" value.
The "job" column has 2 rows with the "tester" value.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
   experience       job  job_count
0           5  frontend          2
1           5  frontend          2
2           2   backend          1
3           6    tester          2
4           6    tester          2

# Adding a Count Column to a Pandas DataFrame by grouping multiple columns

In some cases, you might have to group your DataFrame by multiple columns.

You can pass as many column names as necessary to the list you call groupby() with.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'experience': [5, 5, 2, 6, 6],
    'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'],
})

print(df)

print('-' * 50)

df['job_count'] = df.groupby(
    ['job', 'experience']
)['job'].transform('count')

print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
   experience       job  job_count
0           5  frontend          2
1           5  frontend          2
2           2   backend          1
3           6    tester          2
4           6    tester          2

add count column to dataframe

The output means that:

There are 2 rows where the job column has a value of frontend and the experience column has a value of 5.
There is 1 row where the job column has a value of backend and the experience column has a value of 2.
There are 2 rows where the job column has a value of tester and the experience column has a value of 6.

A list of column labels can be passed to group by as many columns as necessary.

# Adding a Count Column to a Pandas DataFrame by using `np.zeros()`

You can also create a count column that stores dummy initial values and later gets set to the real results.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'experience': [5, 5, 2, 6, 6],
    'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'],
})

print(df)

print('-' * 50)

df['count'] = np.zeros(len(df))

grp_df = df.groupby(['experience', 'job']).count().reset_index()

print(grp_df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
   experience       job  count
0           2   backend      1
1           5  frontend      2
2           6    tester      2

add count column to pandas dataframe using np zeros

Make sure you have the numpy module installed to be able to run the code sample.

shell

Copied!
pip install numpy

pip3 install numpy

We used the numpy.zeros() method to create a count column with zeros as initial values.

shell

Copied!
   experience       job  count
0           5  frontend    0.0
1           5  frontend    0.0
2           2   backend    0.0
3           6    tester    0.0
4           6    tester    0.0

The real values get set from the groupby().count() call.

main.py

Copied!
grp_df = df.groupby(['experience', 'job']).count().reset_index()

print(grp_df)

# Creating a new DataFrame that has a count column

If you simply need to create a new DataFrame that has a count column, instead of changing the original DataFrame, call count() on the groupby object.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'experience': [5, 5, 2, 6, 6],
    'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'],
})

print(df)

print('-' * 50)

df2 = df.groupby(['job']).count()

print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
          experience
job
backend            1
frontend           2
tester             2

The output means that:

The "job" column has 1 row with the "backend" value.
The "job" column has 2 rows with the "frontend" value.
The "job" column has 2 rows with the "tester" value.

You can also use the value_counts() method.

main.py

Copied!
import pandas as pd


df = pd.DataFrame({
    'experience': [5, 5, 2, 6, 6],
    'job': ['frontend', 'frontend', 'backend', 'tester', 'tester'],
})

#    experience       job
# 0           5  frontend
# 1           5  frontend
# 2           2   backend
# 3           6    tester
# 4           6    tester
print(df)

print('-' * 50)

df2 = df[['job', 'experience']].value_counts().reset_index(name='count')

#         job  experience  count
# 0  frontend           5      2
# 1    tester           6      2
# 2   backend           2      1
print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
   experience       job
0           5  frontend
1           5  frontend
2           2   backend
3           6    tester
4           6    tester
--------------------------------------------------
        job  experience  count
0  frontend           5      2
1    tester           6      2
2   backend           2      1

creating new dataframe with count column

The output means that:

There are 2 rows where the job column has a value of frontend and the experience column has a value of 5.
There are 2 rows where the job column has a value of tester and the experience column has a value of 6.
There is 1 row where the job column has a value of backend and the experience column has a value of 2.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

How to add a Count Column to a Pandas DataFrame

# Table of Contents