Count number of non-NaN values in each column of DataFrame

avatar
Borislav Hadzhiev

Last updated: Apr 11, 2024
5 min

banner

# Table of Contents

  1. Count number of non-NaN values in each column of DataFrame
  2. Get the total number of the non-NaN values in the DataFrame
  3. Counting the empty strings as NA values
  4. Count number of non-missing values using notna()

# Count number of non-NaN values in each column of DataFrame

Use the count() method to count the number of non-NaN (or non-missing) values in each column of a DataFrame.

The method counts the non-NA cells for each column or row.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) print(df) print('-' * 50) print(df.count())
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name experience salary 0 Alice NaN NaN 1 Bobby 5.0 180.2 2 None NaN 190.3 3 None NaN 205.4 -------------------------------------------------- name 2 experience 1 salary 3 dtype: int64

count number of non nan values in every column on dataframe

We used the pandas.DataFrame() method to create a DataFrame object.

shell
name experience salary 0 Alice NaN NaN 1 Bobby 5.0 180.2 2 None NaN 190.3 3 None NaN 205.4

You can use the DataFrame.count() method to count the non-NA (or non-missing) cells for each column or row of the DataFrame.

main.py
print(df.count())

The method produces the following output.

shell
name 2 experience 1 salary 3 dtype: int64
  • The name column has 2 non-missing values
  • The experience column has 1 non-missing value
  • The salary column has 3 non-missing values

If you need to get the number of non-NaN (or non-missing) values in each row, set the axis parameter to 1 when calling count().

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) print(df) print('-' * 50) print(df.count()) print('-' * 50) print(df.count(axis=1))
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name experience salary 0 Alice NaN NaN 1 Bobby 5.0 180.2 2 None NaN 190.3 3 None NaN 205.4 -------------------------------------------------- name 2 experience 1 salary 3 dtype: int64 -------------------------------------------------- 0 1 1 3 2 1 3 1 dtype: int64

count number of non nan values is each row of dataframe

  • The row at index 0 (first row) has 1 non-missing value.
  • The row at index 1 (second row) has 3 non-missing values
  • The row at index 2 (third row) has 1 non-missing value.
  • The row at index 3 (fourth row) has 1 non-missing value.

Technically, the DataFrame.count() method counts the non-NA cells for each column or row.

The NA values are:

  • None
  • NaN
  • NaT
  • optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na).

The method takes an optional axis parameter.

By default, the parameter is set to 0, which means that non-NA counts are generated for each column.

If the axis parameter is set to 1, then non-NA counts are generated for each row.

# Get the total number of the non-NaN values in the DataFrame

If you need to get the total number of the non-NaN (or non-missing) values in the DataFrame, use the numpy.sum() method.

First, make sure you have the numpy module installed by running the following command from your terminal.

shell
pip install numpy pip3 install numpy

Now import numpy and use the numpy.sum() method as follows.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) print(df) print('-' * 50) print(np.sum(df.count())) # 👉️ 6

get total number of non missing values in dataframe

The code for this article is available on GitHub

The example shows that the DataFrame contains 6 non-missing values.

The numpy.sum() method returns the sum of the array elements over a given axis.

# Counting the empty strings as NA values

In some cases, you might also want to count the empty strings as NA values.

You can use the DataFrame.replace() method to replace the empty strings with numpy.nan before calling count().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, np.nan], 'experience': [np.nan, 5, '', None], 'salary': [None, 180.2, '', 205.4], }) print(df) print('-' * 50) print(df.replace('', np.nan).count())
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name experience salary 0 Alice NaN None 1 Bobby 5 180.2 2 None 3 NaN None 205.4 -------------------------------------------------- name 2 experience 1 salary 2 dtype: int64

count empty strings as na values

We used the DataFrame.replace() method to replace each empty string with an np.nan value.

We directly called the count() method on the result, so the empty strings are counted as NA values.

# Count number of non-missing values using notna()

You can also use the DataFrame.notna() method to count the number of non-missing values.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) # name experience salary # 0 Alice NaN NaN # 1 Bobby 5.0 180.2 # 2 None NaN 190.3 # 3 None NaN 205.4 print(df) print('-' * 50) # name 2 # experience 1 # salary 3 # dtype: int64 print(df.notna().sum())

count number of not missing values using notna

The code for this article is available on GitHub

The pandas.DataFrame.notna() method detects the non-missing values in the DataFrame.

The method returns a DataFrame mask of boolean values for each element that indicates whether an element is not an NA value.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) # name experience salary # 0 Alice NaN NaN # 1 Bobby 5.0 180.2 # 2 None NaN 190.3 # 3 None NaN 205.4 print(df) print('-' * 50) # name experience salary # 0 True False False # 1 True True True # 2 False False True # 3 False False True print(df.notna())
The code for this article is available on GitHub

You can call the sum() method on the result to get the count of the non-missing values in each column.

main.py
# name 2 # experience 1 # salary 3 # dtype: int64 print(df.notna().sum())

You might also see the DataFrame.notnull() method being used.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', None, None], 'experience': [None, 5, None, None], 'salary': [None, 180.2, 190.3, 205.4], }) # name experience salary # 0 Alice NaN NaN # 1 Bobby 5.0 180.2 # 2 None NaN 190.3 # 3 None NaN 205.4 print(df) print('-' * 50) # name 2 # experience 1 # salary 3 # dtype: int64 print(df.notnull().sum())
The code for this article is available on GitHub

The DataFrame.notnull() method is an alias for the DataFrame.notna() method.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.