Pandas: Find first and last non-NaN values in a DataFrame

# Table of Contents

# Pandas: Find first and last non-NaN values in a DataFrame

Use the Series.first_valid_index() and Series.last_valid_index() to find the first and last non-NaN values in a DataFrame.

The methods return the index of the first or last non-NA value or None if non-NA value is found.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, 30, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.apply(pd.Series.first_valid_index)
print(first_non_nan)

print('-' * 50)

last_non_nan = df.apply(pd.Series.last_valid_index)
print(last_non_nan)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
       A     B     C
0    NaN   NaN   NaN
1  Bobby  20.0   NaN
2   Carl   NaN  30.0
3    NaN  50.0   NaN
--------------------------------------------------
A    1
B    1
C    2
dtype: int64
--------------------------------------------------
A    2
B    3
C    2
dtype: int64

find first and last non nan values in dataframe

The pandas.Series.first_valid_index() method returns the index of the first non-NA value or None if no non-NA value is found.

Conversely, the pandas.Series.last_valid_index() method returns the index of the last non-NA value or None if no non-NA values is found.

You can then use the index to get the first or last non-NaN values.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, 30, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.apply(pd.Series.first_valid_index)

# A    1
# B    1
# C    2
# dtype: int64
print(first_non_nan)

print('-' * 50)

print(df['C'][2])  # 👉️ 30.0

The code for this article is available on GitHub

The code sample hard-codes the index (2), however, you can access the index directly from the newly created DataFrame.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, 30, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.apply(pd.Series.first_valid_index)

# A    1
# B    1
# C    2
# dtype: int64
print(first_non_nan)

print('-' * 50)

print(df['C'][first_non_nan['C']])  # 👉️ 30.0

access index from newly created dataframe

The first_non_nan() and last_non_nan() methods return None if all values in the column are NaN.

Therefore, you might want to use a try/except statement to handle the potential error.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, np.nan, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.apply(pd.Series.first_valid_index)

print(first_non_nan)

print('-' * 50)

try:
    df['C'][first_non_nan['C']]
except KeyError:
    print('All values in the C column are NaN')

using try except to handle potential error

# Pandas: Find first and last non-NaN values in a DataFrame using `idxmax()`

You can also use the DataFrame.notna() and the DataFrame.idxmax() method to achieve the same result.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, 30, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.notna().idxmax()
print(first_non_nan)

print('-' * 50)

last_non_nan = df.notna()[::-1].idxmax()
print(last_non_nan)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
       A     B     C
0    NaN   NaN   NaN
1  Bobby  20.0   NaN
2   Carl   NaN  30.0
3    NaN  50.0   NaN
--------------------------------------------------
A    1
B    1
C    2
dtype: int64
--------------------------------------------------
A    2
B    3
C    2
dtype: int64

find first and last non nan values in dataframe using idxmax

The DataFrame.notna method detects non-missing values.

main.py

Copied!
first_non_nan = df.notna().idxmax()
print(first_non_nan)

last_non_nan = df.notna()[::-1].idxmax()
print(last_non_nan)

The DataFrame.idxmax method returns the index of the first occurrence of the max value over the requested axis.

You can access the existing DataFrame using the index of the first or last non-NaN value.

main.py

Copied!
import pandas as pd
import numpy as np


df = pd.DataFrame({
    'A': [np.nan, 'Bobby', 'Carl', np.nan],
    'B': [np.nan, 20, np.nan, 50],
    'C': [np.nan, np.nan, 30, np.nan],
})

print(df)

print('-' * 50)

first_non_nan = df.notna().idxmax()
print(first_non_nan)

print('-' * 50)

print(df['C'][first_non_nan['C']])  # 👉️ 30.0

access first or last non nan value in dataframe by index

The code for this article is available on GitHub

# Pandas: Find first and last non-NaN values in Series

You can also use the first_valid_index() and last_valid_index() methods to find the first and last non-NaN values in a Series.

main.py

Copied!
import pandas as pd
import numpy as np


series = pd.Series([np.nan, 5, np.nan, 10, np.nan, 15, np.nan])

first_non_nan = series.first_valid_index()
print(first_non_nan)  # 👉️ 1

print('-' * 50)

print(series[first_non_nan])  # 👉️ 5

print('-' * 50)

last_non_nan = series.last_valid_index()
print(last_non_nan)  # 👉️ 5

print('-' * 50)

print(series[last_non_nan])  # 👉️ 15

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
1
--------------------------------------------------
5.0
--------------------------------------------------
5
--------------------------------------------------
15.0

find first and last non nan values in series

We used the first_valid_index and last_valid_index methods to get the indices of the first and last non-NaN values in the Series.

You can then use bracket notation [] to access the Series at the two indices.

main.py

Copied!
print(series[first_non_nan])  # 👉️ 5

print(series[last_non_nan])  # 👉️ 15

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Find first and last non-NaN values in a DataFrame

# Table of Contents

# Pandas: Find first and last non-NaN values in a DataFrame

# Pandas: Find first and last non-NaN values in a DataFrame using idxmax()

# Pandas: Find first and last non-NaN values in Series

# Additional Resources

Borislav Hadzhiev

# Pandas: Find first and last non-NaN values in a DataFrame using `idxmax()`