Pandas: Find the percentage of Missing values in each Column

# Table of Contents

# Pandas: Find the percentage of Missing values in each Column

To find the percentage of missing values in each column in a Pandas DataFrame:

Use the DataFrame.isnull() method to detect the missing values in the DataFrame.
Sum the missing values, multiply the sum by 100 and divide the result by the length of the DataFrame.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().sum() * 100 / len(df)
print(percent_missing)

print('-' * 50)

df2 = pd.DataFrame({
    'col_name': df.columns,
    'percent_missing': percent_missing
})

print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
name          50.0
experience    75.0
salary        25.0
dtype: float64
--------------------------------------------------
              col_name  percent_missing
name              name             50.0
experience  experience             75.0
salary          salary             25.0

find percentage of missing values in each column

The DataFrame.isnull() method detects the missing values in the DataFrame.

The method returns True for each missing value and False otherwise.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  False        True    True
# 1  False       False   False
# 2   True        True   False
# 3   True        True   False
print(df.isnull())

detect missing values in dataframe

The code for this article is available on GitHub

To get the number of missing values in each column as a percentage, multiply the sum by 100 and divide the result by the DataFrame's length.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().sum() * 100 / len(df)

# name          50.0
# experience    75.0
# salary        25.0
# dtype: float64
print(percent_missing)

get percentage of missing values

# Getting the results in a new DataFrame

If you need to get the percentage of missing values in each column in a DataFrame, use the pd.DataFrame() class.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().sum() * 100 / len(df)

df2 = pd.DataFrame({
    'col_name': df.columns,
    'percent_missing': percent_missing
})

#               col_name  percent_missing
# name              name             50.0
# experience  experience             75.0
# salary          salary             25.0
print(df2)

get percentage of missing values in new dataframe

The code for this article is available on GitHub

# Sorting the new `DataFrame` by the `percent_missing` column

If you need to sort the new DataFrame by the percent_missing column, use the sort_values() method.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().sum() * 100 / len(df)

df2 = pd.DataFrame({
    'col_name': df.columns,
    'percent_missing': percent_missing
})

print(df2)
print('-' * 50)

df2.sort_values('percent_missing', inplace=True)
print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
              col_name  percent_missing
name              name             50.0
experience  experience             75.0
salary          salary             25.0
--------------------------------------------------
              col_name  percent_missing
salary          salary             25.0
name              name             50.0
experience  experience             75.0

sort new dataframe by percent missing column

The DataFrame.sort_values method sorts the values in the DataFrame along a given axis.

# Find the percentage of Missing values in each Column using `mean()`

You can also use the isnull() and mean() methods to find the percentage of missing values in each column in a Pandas DataFrame.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().mean() * 100
print(percent_missing)

print('-' * 50)

df2 = pd.DataFrame({
    'col_name': df.columns,
    'percent_missing': percent_missing
})

print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
name          50.0
experience    75.0
salary        25.0
dtype: float64
--------------------------------------------------
              col_name  percent_missing
name              name             50.0
experience  experience             75.0
salary          salary             25.0

find percentage of missing values using mean

We used the isnull() method to detect the missing values in each column and then called the DataFrame.mean method on the result.

main.py

Copied!
percent_missing = df.isnull().mean() * 100

# name          50.0
# experience    75.0
# salary        25.0
# dtype: float64
print(percent_missing)

The DataFrame.mean method returns the mean of the values over the requested axis.

The last step is to multiply the result by 100 to get the percentage of missing values.

If you need to convert the result to a DataFrame, use the pd.DataFrame class.

main.py

Copied!
df2 = pd.DataFrame({
    'col_name': df.columns,
    'percent_missing': percent_missing
})

#               col_name  percent_missing
# name              name             50.0
# experience  experience             75.0
# salary          salary             25.0
print(df2)

You can also use the mul() method for the multiplication.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().mean().round(4).mul(100)

# name          50.0
# experience    75.0
# salary        25.0
# dtype: float64
print(percent_missing)

The code for this article is available on GitHub

If you need to sort the values, use the sort_values() method and set the ascending argument to True or False.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = df.isnull().mean().round(
    4).mul(100).sort_values(ascending=True)

# salary        25.0
# name          50.0
# experience    75.0
# dtype: float64
print(percent_missing)

The code sample sorts the values in ascending order, but you can also sort them in descending order by setting ascending to False.

You can also pass the result to the pd.DataFrame() class to convert it to a DataFrame.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

percent_missing = pd.DataFrame(
    df.isnull().mean().round(4).mul(100).sort_values(ascending=True),
    columns=['percent_missing']
)

#             percent_missing
# salary                 25.0
# name                   50.0
# experience             75.0
print(percent_missing)

The code for this article is available on GitHub

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Find the percentage of Missing values in each Column

# Table of Contents