Last updated: Apr 12, 2024
Reading timeยท4 min
Use the Series.first_valid_index()
and Series.last_valid_index()
to find
the first and last non-NaN values in a DataFrame
.
The methods return the index of the first or last non-NA value or None if non-NA value is found.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) print(first_non_nan) print('-' * 50) last_non_nan = df.apply(pd.Series.last_valid_index) print(last_non_nan)
Running the code sample produces the following output.
A B C 0 NaN NaN NaN 1 Bobby 20.0 NaN 2 Carl NaN 30.0 3 NaN 50.0 NaN -------------------------------------------------- A 1 B 1 C 2 dtype: int64 -------------------------------------------------- A 2 B 3 C 2 dtype: int64
The
pandas.Series.first_valid_index()
method returns the index of the first non-NA value or None
if no non-NA value
is found.
Conversely, the
pandas.Series.last_valid_index()
method returns the index of the last non-NA value or None
if no non-NA values
is found.
You can then use the index to get the first or last non-NaN values.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) # A 1 # B 1 # C 2 # dtype: int64 print(first_non_nan) print('-' * 50) print(df['C'][2]) # ๐๏ธ 30.0
The code sample hard-codes the index (2), however, you can access the index
directly from the newly created DataFrame
.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) # A 1 # B 1 # C 2 # dtype: int64 print(first_non_nan) print('-' * 50) print(df['C'][first_non_nan['C']]) # ๐๏ธ 30.0
The first_non_nan()
and last_non_nan()
methods return None
if all values
in the column are NaN.
Therefore, you might want to use a try/except statement to handle the potential error.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, np.nan, np.nan], }) print(df) print('-' * 50) first_non_nan = df.apply(pd.Series.first_valid_index) print(first_non_nan) print('-' * 50) try: df['C'][first_non_nan['C']] except KeyError: print('All values in the C column are NaN')
idxmax()
You can also use the DataFrame.notna() and the DataFrame.idxmax() method to achieve the same result.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.notna().idxmax() print(first_non_nan) print('-' * 50) last_non_nan = df.notna()[::-1].idxmax() print(last_non_nan)
Running the code sample produces the following output.
A B C 0 NaN NaN NaN 1 Bobby 20.0 NaN 2 Carl NaN 30.0 3 NaN 50.0 NaN -------------------------------------------------- A 1 B 1 C 2 dtype: int64 -------------------------------------------------- A 2 B 3 C 2 dtype: int64
The DataFrame.notna method detects non-missing values.
first_non_nan = df.notna().idxmax() print(first_non_nan) last_non_nan = df.notna()[::-1].idxmax() print(last_non_nan)
The DataFrame.idxmax method returns the index of the first occurrence of the max value over the requested axis.
You can access the existing DataFrame
using the index of the first or last
non-NaN value.
import pandas as pd import numpy as np df = pd.DataFrame({ 'A': [np.nan, 'Bobby', 'Carl', np.nan], 'B': [np.nan, 20, np.nan, 50], 'C': [np.nan, np.nan, 30, np.nan], }) print(df) print('-' * 50) first_non_nan = df.notna().idxmax() print(first_non_nan) print('-' * 50) print(df['C'][first_non_nan['C']]) # ๐๏ธ 30.0
You can also use the first_valid_index()
and last_valid_index()
methods to
find the first and last non-NaN values in a Series
.
import pandas as pd import numpy as np series = pd.Series([np.nan, 5, np.nan, 10, np.nan, 15, np.nan]) first_non_nan = series.first_valid_index() print(first_non_nan) # ๐๏ธ 1 print('-' * 50) print(series[first_non_nan]) # ๐๏ธ 5 print('-' * 50) last_non_nan = series.last_valid_index() print(last_non_nan) # ๐๏ธ 5 print('-' * 50) print(series[last_non_nan]) # ๐๏ธ 15
Running the code sample produces the following output.
1 -------------------------------------------------- 5.0 -------------------------------------------------- 5 -------------------------------------------------- 15.0
We used the first_valid_index
and last_valid_index
methods to get the
indices of the first and last non-NaN values in the Series
.
You can then use bracket notation []
to access the Series
at the two
indices.
print(series[first_non_nan]) # ๐๏ธ 5 print(series[last_non_nan]) # ๐๏ธ 15
You can learn more about the related topics by checking out the following tutorials: