Last updated: Apr 12, 2024
Reading time·5 min

To find the percentage of missing values in each column in a Pandas DataFrame:
DataFrame.isnull() method to detect the missing values in the
DataFrame.100 and divide the result by
the length of the DataFrame.import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().sum() * 100 / len(df) print(percent_missing) print('-' * 50) df2 = pd.DataFrame({ 'col_name': df.columns, 'percent_missing': percent_missing }) print(df2)
Running the code sample produces the following output.
name 50.0 experience 75.0 salary 25.0 dtype: float64 -------------------------------------------------- col_name percent_missing name name 50.0 experience experience 75.0 salary salary 25.0

The
DataFrame.isnull()
method detects the missing values in the DataFrame.
The method returns True for each missing value and False otherwise.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) # name experience salary # 0 False True True # 1 False False False # 2 True True False # 3 True True False print(df.isnull())

To get the number of missing values in each column as a percentage, multiply the
sum by 100 and divide the result by the DataFrame's length.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().sum() * 100 / len(df) # name 50.0 # experience 75.0 # salary 25.0 # dtype: float64 print(percent_missing)

If you need to get the percentage of missing values in each column in a
DataFrame, use the pd.DataFrame() class.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().sum() * 100 / len(df) df2 = pd.DataFrame({ 'col_name': df.columns, 'percent_missing': percent_missing }) # col_name percent_missing # name name 50.0 # experience experience 75.0 # salary salary 25.0 print(df2)

DataFrame by the percent_missing columnIf you need to sort the new DataFrame by the percent_missing column, use the
sort_values() method.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().sum() * 100 / len(df) df2 = pd.DataFrame({ 'col_name': df.columns, 'percent_missing': percent_missing }) print(df2) print('-' * 50) df2.sort_values('percent_missing', inplace=True) print(df2)
Running the code sample produces the following output.
col_name percent_missing name name 50.0 experience experience 75.0 salary salary 25.0 -------------------------------------------------- col_name percent_missing salary salary 25.0 name name 50.0 experience experience 75.0

The
DataFrame.sort_values
method sorts the values in the DataFrame along a given axis.
mean()You can also use the isnull() and mean() methods to find the percentage of
missing values in each column in a Pandas DataFrame.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().mean() * 100 print(percent_missing) print('-' * 50) df2 = pd.DataFrame({ 'col_name': df.columns, 'percent_missing': percent_missing }) print(df2)
Running the code sample produces the following output.
name 50.0 experience 75.0 salary 25.0 dtype: float64 -------------------------------------------------- col_name percent_missing name name 50.0 experience experience 75.0 salary salary 25.0

We used the isnull() method to detect the missing values in each column and
then called the DataFrame.mean method on the
result.
percent_missing = df.isnull().mean() * 100 # name 50.0 # experience 75.0 # salary 25.0 # dtype: float64 print(percent_missing)
The DataFrame.mean method returns the mean of the values over the requested
axis.
The last step is to multiply the result by 100 to get the percentage of
missing values.
If you need to convert the result to a DataFrame, use the pd.DataFrame
class.
df2 = pd.DataFrame({ 'col_name': df.columns, 'percent_missing': percent_missing }) # col_name percent_missing # name name 50.0 # experience experience 75.0 # salary salary 25.0 print(df2)
You can also use the mul() method for the multiplication.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().mean().round(4).mul(100) # name 50.0 # experience 75.0 # salary 25.0 # dtype: float64 print(percent_missing)
If you need to sort the values, use the sort_values() method and set the
ascending argument to True or False.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = df.isnull().mean().round( 4).mul(100).sort_values(ascending=True) # salary 25.0 # name 50.0 # experience 75.0 # dtype: float64 print(percent_missing)
The code sample sorts the values in ascending order, but you can also sort them
in descending order by setting ascending to False.
You can also pass the result to the pd.DataFrame() class to convert it to a
DataFrame.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) percent_missing = pd.DataFrame( df.isnull().mean().round(4).mul(100).sort_values(ascending=True), columns=['percent_missing'] ) # percent_missing # salary 25.0 # name 50.0 # experience 75.0 print(percent_missing)
You can learn more about the related topics by checking out the following tutorials: