Pandas: Find common Rows (intersection) between 2 DataFrames

# Table of Contents

# Pandas: Find common Rows (intersection) between 2 DataFrames

Use the pandas.merge() method to find the common rows (intersection) between 2 DataFrames.

The method can be passed an on argument that stores a list of column names to join on.

main.py

Copied!
import pandas as pd


df1 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'A': ['x', 'y', 'z'],
    'B': [1, 2, 3]
})

df2 = pd.DataFrame({
    'ID': ['a', 'b', 'Z'],
    'A': ['a', 's', 'd'],
    'B': [5, 6, 7]
})


common_rows = pd.merge(df1, df2, how='inner', on=['ID'])

#   ID A_x  B_x A_y  B_y
# 0  a   x    1   a    5
# 1  b   y    2   s    6
print(common_rows)

find common rows intersection between 2 dataframes

The code for this article is available on GitHub

Note: if you need to drop the NA values (if any) after finding the common rows, use the DataFrame.dropna() method.

main.py

Copied!
common_rows = pd.merge(df1, df2, how='inner', on=['ID'])

common_rows.dropna(inplace=True)

The code sample finds the common rows between the two DataFrames based on the ID column.

In other words, if the ID value in df1 matches the ID value in df2, the corresponding row is returned.

Here are the two DataFrames from the example.

main.py

Copied!
import pandas as pd


df1 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'A': ['x', 'y', 'z'],
    'B': [1, 2, 3]
})

#   ID  A  B
# 0  a  x  1
# 1  b  y  2
# 2  c  z  3
print(df1)

print('-' * 50)

df2 = pd.DataFrame({
    'ID': ['a', 'b', 'Z'],
    'A': ['a', 's', 'd'],
    'B': [5, 6, 7]
})

#   ID  A  B
# 0  a  a  5
# 1  b  s  6
# 2  Z  d  7
print(df2)

the two dataframes from the example

The code for this article is available on GitHub

We passed the following arguments to the DataFrame.merge() method:

The first DataFrame.
The second DataFrame
The type of merge to be performed. The inner value means "use an intersection of keys from both frames, similar to a SQL inner join and preserve the order of the left keys".
The on argument is a list of column names to join on. These column names must be contained in both DataFrames.

main.py

Copied!
common_rows = pd.merge(df1, df2, how='inner', on=['ID'])

#   ID A_x  B_x A_y  B_y
# 0  a   x    1   a    5
# 1  b   y    2   s    6
print(common_rows)

If the on argument is None or not provided, then the method defaults to the intersection of the columns in both DataFrames.

main.py

Copied!
import pandas as pd


df1 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'A': ['x', 'y', 'z'],
    'B': [1, 2, 3]
})

df2 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'A': ['x', 'y', 'z'],
    'B': [1, 2, 3]
})


common_rows = pd.merge(df1, df2, how='inner')

#   ID  A  B
# 0  a  x  1
# 1  b  y  2
# 2  c  z  3
print(common_rows)

without passing value for none argument

The code for this article is available on GitHub

# Specifying multiple columns in the call to `merge()`

You can specify multiple columns in the on list when calling pandas.merge().

main.py

Copied!
import pandas as pd


df1 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'A': ['x', 'y', 'z'],
    'B': [1, 2, 3]
})

print(df1)

print('-' * 50)

df2 = pd.DataFrame({
    'ID': ['a', 'b', 'Z'],
    'A': ['x', 'y', 'z'],
    'B': [5, 6, 7]
})

print(df2)

print('-' * 50)


common_rows = pd.merge(df1, df2, how='inner', on=['ID', 'A'])

# 👇️ Optionally drop NA values
common_rows.dropna(inplace=True)

#   ID  A  B_x  B_y
# 0  a  x    1    5
# 1  b  y    2    6
print(common_rows)

specify multiple columns when calling merge

The code for this article is available on GitHub

The code sample selects the rows where the ID and A columns in both DataFrames have matching values.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Find common Rows (intersection) between 2 DataFrames

# Table of Contents