Pandas: Find first and last non-NaN values in a DataFrame

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: Find first and last non-NaN values in a DataFrame
  2. Pandas: Find first and last non-NaN values in a DataFrame using idxmax()
  3. Pandas: Find first and last non-NaN values in Series

# Pandas: Find first and last non-NaN values in a DataFrame

Use the Series.first_valid_index() and Series.last_valid_index() to find the first and last non-NaN values in a DataFrame.

The methods return the index of the first or last non-NA value or None if non-NA value is found.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) print(first_non_nan) print('-' * 50) last_non_nan = df.apply(pd.Series.last_valid_index) print(last_non_nan)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
A B C 0 NaN NaN NaN 1 Bobby 20.0 NaN 2 Carl NaN 30.0 3 NaN 50.0 NaN -------------------------------------------------- A 1 B 1 C 2 dtype: int64 -------------------------------------------------- A 2 B 3 C 2 dtype: int64

find first and last non nan values in dataframe

The pandas.Series.first_valid_index() method returns the index of the first non-NA value or None if no non-NA value is found.

Conversely, the pandas.Series.last_valid_index() method returns the index of the last non-NA value or None if no non-NA values is found.

You can then use the index to get the first or last non-NaN values.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) # A 1 # B 1 # C 2 # dtype: int64 print(first_non_nan) print('-' * 50) print(df['C'][2]) # ๐Ÿ‘‰๏ธ 30.0
The code for this article is available on GitHub

The code sample hard-codes the index (2), however, you can access the index directly from the newly created DataFrame.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) # A 1 # B 1 # C 2 # dtype: int64 print(first_non_nan) print('-' * 50) print(df['C'][first_non_nan['C']]) # ๐Ÿ‘‰๏ธ 30.0

access index from newly created dataframe

The first_non_nan() and last_non_nan() methods return None if all values in the column are NaN.

Therefore, you might want to use a try/except statement to handle the potential error.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, np.nan, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) print(first_non_nan) print('-' * 50) try: df['C'][first_non_nan['C']] except KeyError: print('All values in the C column are NaN')

using try except to handle potential error

# Pandas: Find first and last non-NaN values in a DataFrame using idxmax()

You can also use the DataFrame.notna() and the DataFrame.idxmax() method to achieve the same result.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.notna().idxmax() print(first_non_nan) print('-' * 50) last_non_nan = df.notna()[::-1].idxmax() print(last_non_nan)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
A B C 0 NaN NaN NaN 1 Bobby 20.0 NaN 2 Carl NaN 30.0 3 NaN 50.0 NaN -------------------------------------------------- A 1 B 1 C 2 dtype: int64 -------------------------------------------------- A 2 B 3 C 2 dtype: int64

find first and last non nan values in dataframe using idxmax

The DataFrame.notna method detects non-missing values.

main.py
first_non_nan = df.notna().idxmax() print(first_non_nan) last_non_nan = df.notna()[::-1].idxmax() print(last_non_nan)

The DataFrame.idxmax method returns the index of the first occurrence of the max value over the requested axis.

You can access the existing DataFrame using the index of the first or last non-NaN value.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.notna().idxmax() print(first_non_nan) print('-' * 50) print(df['C'][first_non_nan['C']]) # ๐Ÿ‘‰๏ธ 30.0

access first or last non nan value in dataframe by index

The code for this article is available on GitHub

# Pandas: Find first and last non-NaN values in Series

You can also use the first_valid_index() and last_valid_index() methods to find the first and last non-NaN values in a Series.

main.py
import pandas as pd import numpy as np series = pd.Series([np.nan, 5, np.nan, 10, np.nan, 15, np.nan]) first_non_nan = series.first_valid_index() print(first_non_nan) # ๐Ÿ‘‰๏ธ 1 print('-' * 50) print(series[first_non_nan]) # ๐Ÿ‘‰๏ธ 5 print('-' * 50) last_non_nan = series.last_valid_index() print(last_non_nan) # ๐Ÿ‘‰๏ธ 5 print('-' * 50) print(series[last_non_nan]) # ๐Ÿ‘‰๏ธ 15
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
1 -------------------------------------------------- 5.0 -------------------------------------------------- 5 -------------------------------------------------- 15.0

find first and last non nan values in series

We used the first_valid_index and last_valid_index methods to get the indices of the first and last non-NaN values in the Series.

You can then use bracket notation [] to access the Series at the two indices.

main.py
print(series[first_non_nan]) # ๐Ÿ‘‰๏ธ 5 print(series[last_non_nan]) # ๐Ÿ‘‰๏ธ 15

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.

Copyright ยฉ 2024 Borislav Hadzhiev