Pandas: GroupBy columns with NaN (missing) values

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
3 min

banner

# Table of Contents

  1. Pandas: GroupBy columns with NaN (missing) values
  2. Pandas: GroupBy columns with NaN (missing) values using fillna()
  3. Pandas: GroupBy columns with NaN (missing) values using astype()

# Pandas: GroupBy columns with NaN (missing) values

To GroupBy columns with NaN (missing) values in a Pandas DataFrame:

  1. Call the groupby() method on the DataFrame.
  2. By default, the method will exclude the NaN values from the result.
  3. If you want to include the NaN values in the result, set the dropna argument to False.
main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'ID': [1, 1, 1, 2, 2, 2], 'Animal': ['Cat', 'Cat', np.nan, 'Dog', 'Dog', np.nan], 'Max Speed': [25, 35, 45, 55, 65, 75] }) # ๐Ÿ‘‡๏ธ without NA values print(df.groupby(['Animal']).mean()) print('-' * 50) # ๐Ÿ‘‡๏ธ with NA values print(df.groupby(['Animal'], dropna=False).mean())
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
ID Max Speed Animal Cat 1.0 30.0 Dog 2.0 60.0 -------------------------------------------------- ID Max Speed Animal Cat 1.0 30.0 Dog 2.0 60.0 NaN 1.5 60.0

groupby columns with missing nan values in pandas dataframe

The DataFrame.groupby() method groups the DataFrame by one or more columns.

By default, the dropna argument is set to True.

If the group keys contain NA values and dropna is set to True, the NA values with the row/column are dropped.

main.py
# ๐Ÿ‘‡๏ธ without NA values # ID Max Speed # Animal # Cat 1.0 30.0 # Dog 2.0 60.0 print(df.groupby(['Animal']).mean())

When the dropna argument is set to False, the NA values are treated as a key in the groups.

main.py
# ๐Ÿ‘‡๏ธ with NA values # ID Max Speed # Animal # Cat 1.0 30.0 # Dog 2.0 60.0 # NaN 1.5 60.0 print(df.groupby(['Animal'], dropna=False).mean())

# Pandas: GroupBy columns with NaN (missing) values using fillna()

Alternatively, you can group the columns with NaN (missing) values by using the fillna() method to replace them before calling groupby().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'ID': [1, 1, 1, 2, 2, 2], 'Animal': ['Cat', 'Cat', np.nan, 'Dog', 'Dog', np.nan], 'Max Speed': [25, 35, 45, 55, 65, 75] }) # ID Max Speed # Animal # Cat 1.0 30.0 # Dog 2.0 60.0 # NOT_AVAILABLE 1.5 60.0 print(df.fillna('NOT_AVAILABLE').groupby(['Animal']).mean())

groupby columns with missing nan values using fillna

The code for this article is available on GitHub

The DataFrame.fillna() method fills the NA/NaN values.

The value we called the method with can be any placeholder that suits your use case (it doesn't have to be the "NOT_AVAILABLE" string).

For example, if the column is numeric, you can call the fillna() method with a value such as -1.

Once we've filled in the NA values, we can call the groupby() method without running into any issues.

You can also call the fillna() method separately, before calling groupby().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'ID': [1, 1, 1, 2, 2, 2], 'Animal': ['Cat', 'Cat', np.nan, 'Dog', 'Dog', np.nan], 'Max Speed': [25, 35, 45, 55, 65, 75] }) df.fillna('NOT_AVAILABLE', inplace=True) # ID Max Speed # Animal # Cat 1.0 30.0 # Dog 2.0 60.0 # NOT_AVAILABLE 1.5 60.0 print(df.groupby(['Animal']).mean())

calling fillna method separately before calling groupby

The code for this article is available on GitHub

When the inplace argument is set to True, the fillna() method replaces the NaN values in place (mutates the original DataFrame).

# Pandas: GroupBy columns with NaN (missing) values using astype()

Alternatively, you can use the DataFrame.astype() method to convert the values in the column to strings (including the NaN values) before calling groupby().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'ID': [1, 1, 1, 2, 2, 2], 'Animal': ['Cat', 'Cat', np.nan, 'Dog', 'Dog', np.nan], 'Max Speed': [25, 35, 45, 55, 65, 75] }) df['Animal'] = df['Animal'].astype(str) # ID Max Speed # Animal # Cat 1.0 30.0 # Dog 2.0 60.0 # nan 1.5 60.0 print(df.groupby(['Animal']).mean())

convert values in column to strings before calling groupby

The code for this article is available on GitHub

We used the DataFrame.astype() method to convert all values in the "Animal" column to strings (including the NaN values).

Once the column contains string values, we can call the groupby() method on it.

I've also written an article on how to replace None with NaN in a Pandas DataFrame.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.

Copyright ยฉ 2024 Borislav Hadzhiev