Get the first Row of each Group in a Pandas DataFrame

Borislav Hadzhiev

Last updated: Apr 12, 2024

# Table of Contents

  1. Get the first row of each group in a Pandas DataFrame
  2. Getting the first N rows of each group in a Pandas DataFrame
  3. Get the first row of each group in a Pandas DataFrame by using nth()
  4. Using first() vs using nth(0) to get the first row of each group
  5. Get the first row of each group in a Pandas DataFrame by using drop_duplicates()

# Get the first row of each group in a Pandas DataFrame

To get the first row of each group in a Pandas DataFrame:

  1. Use the DataFrame.groupby() method to group the DataFrame.
  2. Use the DataFrameGroupBy.first method to get the first non-null entry of each column.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#         Max Speed
# Animal
# Cat            25
# Dog            45
print(df.groupby('Animal').first())
```

The code for this article is available on GitHub

The DataFrame.groupby() method groups a DataFrame using one or more columns.

The method returns a DataFrameGroupBy object that contains information about the groups.
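
To make the "information about the groups" concrete, here is a minimal sketch that inspects the grouped object before any aggregation is applied. The variable names `grouped` and `group_rows` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

grouped = df.groupby('Animal')

# Number of distinct groups found in the 'Animal' column
print(grouped.ngroups)

# Map each group label to the row labels that belong to it
group_rows = {label: list(rows) for label, rows in grouped.groups.items()}
print(group_rows)
```

Nothing is computed per group until a method such as first() is called on the grouped object.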

The last step is to call the DataFrameGroupBy.first() method.

The method returns a Series or a DataFrame containing the first non-null entry of each column within each group.

If you need to reset the index, call the DataFrame.reset_index method.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 1    Dog         45
print(df.groupby('Animal').first().reset_index())
```

The DataFrame.reset_index() method moves the current index (here, the Animal group labels) back into a regular column and replaces it with the default integer index.
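
As an alternative sketch, the same shape can usually be obtained by passing `as_index=False` to groupby(), which keeps the grouping column as a regular column so no separate reset is needed. The variable name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

# as_index=False keeps 'Animal' as a column instead of using it as the index
result = df.groupby('Animal', as_index=False).first()

print(result)
```

Which form to use is mostly a matter of taste; chaining reset_index() makes the step explicit, while `as_index=False` avoids the extra call.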

# Getting the first N rows of each group in a Pandas DataFrame

If you need to get the first N rows of each group in a Pandas DataFrame:

  1. Use the DataFrameGroupBy.head() method to return the first N rows of each group.
  2. Optionally reset the index.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 1    Cat         35
# 2    Dog         45
# 3    Dog         55
print(df.groupby('Animal').head(2).reset_index(drop=True))
```

The DataFrameGroupBy.head() method returns the first N rows of each group based on position, keeping the original row order and index.

The only argument the method takes is the number of rows to select from each group (5 by default).

We also called the reset_index method with drop=True in the example.

When the drop argument is set to True, the old index is discarded instead of being inserted as a column, and the index is reset to the default integer index.

The argument defaults to False.
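
The following sketch compares the two settings of `drop` on the same result. The variable names `first_two`, `kept` and `dropped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

first_two = df.groupby('Animal').head(2)

# drop=False (the default): the old index is kept as a new 'index' column
kept = first_two.reset_index()
print(kept.columns.tolist())

# drop=True: the old index is thrown away
dropped = first_two.reset_index(drop=True)
print(dropped.columns.tolist())
```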

However, calling the reset_index() method on the result is optional.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 1    Cat         35
# 3    Dog         45
# 4    Dog         55
print(df.groupby('Animal').head(2))
```
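
Because head() selects rows by position, the order of the rows at the time of grouping decides which rows count as the "first" ones. As a hedged sketch, assuming you want the N largest Max Speed values per group rather than the first N in the original order, you could sort before grouping. The variable name `fastest_two` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

# Sort so that the fastest rows come first, then take 2 rows per group
fastest_two = (
    df.sort_values('Max Speed', ascending=False)
      .groupby('Animal')
      .head(2)
)

print(fastest_two)
```

The result keeps the sorted row order and the original index labels, so reset the index afterwards if you need a clean one.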

# Get the first row of each group in a Pandas DataFrame by using nth()

You can also use the DataFrameGroupBy.nth method to get the first row of each group in a DataFrame.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 3    Dog         45
print(df.groupby('Animal').nth(0))
```

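The DataFrameGroupBy.nth() method returns the Nth row from each group. The output shown here is from pandas 2.0 or later, where nth() keeps the original row index; earlier versions use the group keys as the index.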

The only parameter we passed to the method is the row index to be returned.

Notice that indices are zero-based, so the first row has an index of 0, the second an index of 1, etc.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 1    Cat         35
# 4    Dog         55
print(df.groupby('Animal').nth(1))
```

We used an index of 1, so the second row of each group in the DataFrame is returned.
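
The nth() method also accepts negative indices and lists of indices. The sketch below is a small illustration of both; the variable names `last_rows` and `first_two` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

# A negative index counts from the end of each group
last_rows = df.groupby('Animal').nth(-1)
print(last_rows)

# A list of indices selects several rows per group
first_two = df.groupby('Animal').nth([0, 1])
print(first_two)
```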

# Using first() vs using nth(0) to get the first row of each group

Note that there are some minor differences between using first() and nth(0) to get the first row of each group.

The nth(0) approach returns the first row of each group regardless of what the values in the row are.

On the other hand, the first() method returns the first non-null (or non-NaN) value in each column.

Here is an example that better illustrates how this works.

main.py
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [np.nan, 35, 40, 45, 55, 65]
})

#         Max Speed
# Animal
# Cat          35.0
# Dog          45.0
print(df.groupby('Animal').first())

print('-' * 50)

#   Animal  Max Speed
# 0    Cat        NaN
# 3    Dog       45.0
print(df.groupby('Animal').nth(0))
```

The first row in the Max Speed column is a NaN value.

Calling the first() method returned the first non-NaN value of the Max Speed column in each group.

On the other hand, calling nth(0) simply returned the first row of each group, without checking for NaN.
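
One consequence worth seeing is that first() works column by column, so with several value columns the result for a group can combine values from different rows. The sketch below assumes a hypothetical extra `Weight` column that is not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Max Speed': [np.nan, 35, 45, 55],
    'Weight': [4, np.nan, 20, 25],  # hypothetical column for illustration
})

# For 'Cat', Max Speed comes from the second row and Weight from the first
combined = df.groupby('Animal').first()
print(combined)

# nth(0) returns the actual first row of each group, NaN included
actual_first = df.groupby('Animal').nth(0)
print(actual_first)
```

If you need a row that really exists in the DataFrame, nth(0) is the safer choice; if you need the first available value per column, use first().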

# Get the first row of each group in a Pandas DataFrame by using drop_duplicates()

You can also use the DataFrame.drop_duplicates() method to get the first row of each group in a Pandas DataFrame.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 3    Dog         45
print(df.drop_duplicates('Animal'))
```

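The drop_duplicates() method returns a DataFrame object with the duplicate rows removed. By default, the first occurrence of each duplicate is kept, which is why passing the grouping column leaves the first row of each group.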

The first argument the method takes is a column label or a sequence of labels that should be considered when identifying duplicates.

By default, all of the columns are considered when identifying duplicates.
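
Which occurrence survives is controlled by the `keep` argument of drop_duplicates(). As a small sketch, setting it to 'last' returns the last row of each group instead of the first. The variable name `last_rows` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

# keep='first' is the default; keep='last' retains the final occurrence
last_rows = df.drop_duplicates('Animal', keep='last')

print(last_rows)
```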

If you want to modify the original DataFrame to only contain the first row of each group, set the inplace argument to True.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
})

df.drop_duplicates('Animal', inplace=True)

#   Animal  Max Speed
# 0    Cat         25
# 3    Dog         45
print(df)
```

When the inplace argument is set to True, the method returns None and modifies the original DataFrame rather than creating a new one.
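
A common slip is to assign the result of an in-place call back to the variable, which replaces the DataFrame with None. The sketch below shows the return value and an equivalent version without `inplace`; the names `returned` and `df2` are illustrative.

```python
import pandas as pd

data = {
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 35, 40, 45, 55, 65]
}

df = pd.DataFrame(data)

# With inplace=True the call returns None, so do not assign its result to df
returned = df.drop_duplicates('Animal', inplace=True)
print(returned)
print(len(df))

# Equivalent without inplace: rebind the name to the returned DataFrame
df2 = pd.DataFrame(data)
df2 = df2.drop_duplicates('Animal')
print(df2)
```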
