Pandas: Find common Rows (intersection) between 2 DataFrames

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: Find common Rows (intersection) between 2 DataFrames
  2. Specifying multiple columns in the call to merge()

# Pandas: Find common Rows (intersection) between 2 DataFrames

Use the pandas.merge() method to find the common rows (intersection) between 2 DataFrames.

The method can be passed an on argument that stores a list of column names to join on.

main.py
import pandas as pd df1 = pd.DataFrame({ 'ID': ['a', 'b', 'c'], 'A': ['x', 'y', 'z'], 'B': [1, 2, 3] }) df2 = pd.DataFrame({ 'ID': ['a', 'b', 'Z'], 'A': ['a', 's', 'd'], 'B': [5, 6, 7] }) common_rows = pd.merge(df1, df2, how='inner', on=['ID']) # ID A_x B_x A_y B_y # 0 a x 1 a 5 # 1 b y 2 s 6 print(common_rows)

find common rows intersection between 2 dataframes

The code for this article is available on GitHub

Note: if you need to drop the NA values (if any) after finding the common rows, use the DataFrame.dropna() method.

main.py
common_rows = pd.merge(df1, df2, how='inner', on=['ID']) common_rows.dropna(inplace=True)

The code sample finds the common rows between the two DataFrames based on the ID column.

In other words, if the ID value in df1 matches the ID value in df2, the corresponding row is returned.

Here are the two DataFrames from the example.

main.py
import pandas as pd df1 = pd.DataFrame({ 'ID': ['a', 'b', 'c'], 'A': ['x', 'y', 'z'], 'B': [1, 2, 3] }) # ID A B # 0 a x 1 # 1 b y 2 # 2 c z 3 print(df1) print('-' * 50) df2 = pd.DataFrame({ 'ID': ['a', 'b', 'Z'], 'A': ['a', 's', 'd'], 'B': [5, 6, 7] }) # ID A B # 0 a a 5 # 1 b s 6 # 2 Z d 7 print(df2)

the two dataframes from the example

The code for this article is available on GitHub

We passed the following arguments to the DataFrame.merge() method:

  1. The first DataFrame.
  2. The second DataFrame
  3. The type of merge to be performed. The inner value means "use an intersection of keys from both frames, similar to a SQL inner join and preserve the order of the left keys".
  4. The on argument is a list of column names to join on. These column names must be contained in both DataFrames.
main.py
common_rows = pd.merge(df1, df2, how='inner', on=['ID']) # ID A_x B_x A_y B_y # 0 a x 1 a 5 # 1 b y 2 s 6 print(common_rows)

If the on argument is None or not provided, then the method defaults to the intersection of the columns in both DataFrames.

main.py
import pandas as pd df1 = pd.DataFrame({ 'ID': ['a', 'b', 'c'], 'A': ['x', 'y', 'z'], 'B': [1, 2, 3] }) df2 = pd.DataFrame({ 'ID': ['a', 'b', 'c'], 'A': ['x', 'y', 'z'], 'B': [1, 2, 3] }) common_rows = pd.merge(df1, df2, how='inner') # ID A B # 0 a x 1 # 1 b y 2 # 2 c z 3 print(common_rows)

without passing value for none argument

The code for this article is available on GitHub

# Specifying multiple columns in the call to merge()

You can specify multiple columns in the on list when calling pandas.merge().

main.py
import pandas as pd df1 = pd.DataFrame({ 'ID': ['a', 'b', 'c'], 'A': ['x', 'y', 'z'], 'B': [1, 2, 3] }) print(df1) print('-' * 50) df2 = pd.DataFrame({ 'ID': ['a', 'b', 'Z'], 'A': ['x', 'y', 'z'], 'B': [5, 6, 7] }) print(df2) print('-' * 50) common_rows = pd.merge(df1, df2, how='inner', on=['ID', 'A']) # 👇️ Optionally drop NA values common_rows.dropna(inplace=True) # ID A B_x B_y # 0 a x 1 5 # 1 b y 2 6 print(common_rows)

specify multiple columns when calling merge

The code for this article is available on GitHub

The code sample selects the rows where the ID and A columns in both DataFrames have matching values.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.