Last updated: Apr 12, 2024
Reading time·6 min

To merge only specific DataFrame columns in Pandas:
merge() method on the first DataFrame.DataFrame from the
second DataFrame.import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

The DataFrame.merge() method merges DataFrames with a database-style join.
Notice that we used two sets of square brackets [[]] to select the columns
from df2 that we want to include in the resulting DataFrame.
df3 = df1.merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)
The new DataFrame only has the year, profit and employees columns.
You can select as many columns from df2 as necessary by specifying the column
names between the two sets of square brackets.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees', 'CEO']], on='year', how='left' ) # year profit employees CEO # 0 2020 1500 10 A # 1 2021 2500 15 B # 2 2022 3500 20 C # 3 2023 4500 25 D print(df3)

The resulting DataFrame now has the year, profit, employees and CEO
columns.
In other words, we include all columns from df1 and only the specified columns
from d2 in the resulting DataFrame.
If you want to specify which columns to include in the merge operation from
both DataFrames, use two sets of square brackets.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], 'CFO': ['Q', 'W', 'E', 'R'] }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], }) df3 = df1[['year', 'profit']].merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

We first used two sets of square brackets [[]] to select the columns from
df1 we want to include in the merge operation and then used two sets of square
brackets to select the specific columns from df2.
DataFrame.drop()In some cases, you might need the columns for the merge operation but might
want to remove the columns from the resulting DataFrame.
If that's the case, include the columns in the merge() call and use the
DataFrame.drop() method to remove
them after merging.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees', 'CEO']], on='year', how='left' ).drop(columns=['CEO']) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

We included the CEO column in the merge() call but wanted to exclude it from
the result, so we used the drop() method.
The DataFrame.drop() method drops the
specified columns from the DataFrame.
DataFrame without the specified columns.This approach should only be used when you want to remove columns that you need
for the join operation after merging.
DataFrame.locYou can also use the
DataFrame.loc label-based
indexer to select the columns you want to include in the merge operation.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2.loc[:, ['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

Make sure to specify the names of the columns you want to include in the
resulting DataFrame in the list after the comma.
df3 = df1.merge( df2.loc[:, ['year', 'employees']], on='year', how='left' )
You can also use the loc indexer to select the specific columns from df1 in
case you don't want to include all columns from df1 in the resulting
DataFrame.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], 'CFO': ['Q', 'W', 'E', 'R'] }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], }) df3 = df1.loc[:, ['year', 'profit']].merge( df2.loc[:, ['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

You can learn more about the related topics by checking out the following tutorials: