Pandas: Merge only specific DataFrame columns [4 Ways]

# Table of Contents

# Pandas: Merge only specific DataFrame columns

To merge only specific DataFrame columns in Pandas:

Call the merge() method on the first DataFrame.
Select the columns you want to include in the result DataFrame from the second DataFrame.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
    'CFO': ['Q', 'W', 'E', 'R']
})


df3 = df1.merge(
    df2[['year', 'employees']],
    on='year',
    how='left'
)

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)

merge only specific dataframe columns

The code for this article is available on GitHub

The DataFrame.merge() method merges DataFrames with a database-style join.

Notice that we used two sets of square brackets [[]] to select the columns from df2 that we want to include in the resulting DataFrame.

main.py

Copied!
df3 = df1.merge(
    df2[['year', 'employees']],
    on='year',
    how='left'
)

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)

The new DataFrame only has the year, profit and employees columns.

You can select as many columns from df2 as necessary by specifying the column names between the two sets of square brackets.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
    'CFO': ['Q', 'W', 'E', 'R']
})


df3 = df1.merge(
    df2[['year', 'employees', 'CEO']],
    on='year',
    how='left'
)

#    year  profit  employees CEO
# 0  2020    1500         10   A
# 1  2021    2500         15   B
# 2  2022    3500         20   C
# 3  2023    4500         25   D
print(df3)

merge only specific 3 dataframe columns

The code for this article is available on GitHub

The resulting DataFrame now has the year, profit, employees and CEO columns.

In other words, we include all columns from df1 and only the specified columns from d2 in the resulting DataFrame.

# Specifying which columns to include in the merge from both DataFrames

If you want to specify which columns to include in the merge operation from both DataFrames, use two sets of square brackets.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
    'CFO': ['Q', 'W', 'E', 'R']
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
})


df3 = df1[['year', 'profit']].merge(
    df2[['year', 'employees']],
    on='year',
    how='left'
)

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)

specifying which columns to include in merge from both dataframes

The code for this article is available on GitHub

We first used two sets of square brackets [[]] to select the columns from df1 we want to include in the merge operation and then used two sets of square brackets to select the specific columns from df2.

# Pandas: Merge only specific DataFrame columns using `DataFrame.drop()`

In some cases, you might need the columns for the merge operation but might want to remove the columns from the resulting DataFrame.

If that's the case, include the columns in the merge() call and use the DataFrame.drop() method to remove them after merging.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
    'CFO': ['Q', 'W', 'E', 'R']
})


df3 = df1.merge(
    df2[['year', 'employees', 'CEO']],
    on='year',
    how='left'
).drop(columns=['CEO'])

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)

merge specific dataframe columns using drop

The code for this article is available on GitHub

We included the CEO column in the merge() call but wanted to exclude it from the result, so we used the drop() method.

The DataFrame.drop() method drops the specified columns from the DataFrame.

In other words, the method returns the DataFrame without the specified columns.

This approach should only be used when you want to remove columns that you need for the join operation after merging.

# Pandas: Merge only specific DataFrame columns using `DataFrame.loc`

You can also use the DataFrame.loc label-based indexer to select the columns you want to include in the merge operation.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
    'CFO': ['Q', 'W', 'E', 'R']
})


df3 = df1.merge(
    df2.loc[:, ['year', 'employees']],
    on='year',
    how='left'
)

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)

merge only specific dataframe columns using loc

The code for this article is available on GitHub

Make sure to specify the names of the columns you want to include in the resulting DataFrame in the list after the comma.

main.py

Copied!
df3 = df1.merge(
    df2.loc[:, ['year', 'employees']],
    on='year',
    how='left'
)

You can also use the loc indexer to select the specific columns from df1 in case you don't want to include all columns from df1 in the resulting DataFrame.

main.py

Copied!
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
    'CFO': ['Q', 'W', 'E', 'R']
})


df2 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'employees': [10, 15, 20, 25],
    'CEO': ['A', 'B', 'C', 'D'],
})


df3 = df1.loc[:, ['year', 'profit']].merge(
    df2.loc[:, ['year', 'employees']],
    on='year',
    how='left'
)

#    year  profit  employees
# 0  2020    1500         10
# 1  2021    2500         15
# 2  2022    3500         20
# 3  2023    4500         25
print(df3)