Pandas: Merge only specific DataFrame columns [4 Ways]

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
6 min

banner

# Table of Contents

  1. Pandas: Merge only specific DataFrame columns
  2. Specifying which columns to include in the merge from both DataFrames
  3. Pandas: Merge only specific DataFrame columns using DataFrame.drop()
  4. Pandas: Merge only specific DataFrame columns using DataFrame.loc

# Pandas: Merge only specific DataFrame columns

To merge only specific DataFrame columns in Pandas:

  1. Call the merge() method on the first DataFrame.
  2. Select the columns you want to include in the result DataFrame from the second DataFrame.
main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

merge only specific dataframe columns

The code for this article is available on GitHub

The DataFrame.merge() method merges DataFrames with a database-style join.

Notice that we used two sets of square brackets [[]] to select the columns from df2 that we want to include in the resulting DataFrame.

main.py
df3 = df1.merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

The new DataFrame only has the year, profit and employees columns.

You can select as many columns from df2 as necessary by specifying the column names between the two sets of square brackets.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees', 'CEO']], on='year', how='left' ) # year profit employees CEO # 0 2020 1500 10 A # 1 2021 2500 15 B # 2 2022 3500 20 C # 3 2023 4500 25 D print(df3)

merge only specific 3 dataframe columns

The code for this article is available on GitHub

The resulting DataFrame now has the year, profit, employees and CEO columns.

In other words, we include all columns from df1 and only the specified columns from d2 in the resulting DataFrame.

# Specifying which columns to include in the merge from both DataFrames

If you want to specify which columns to include in the merge operation from both DataFrames, use two sets of square brackets.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], 'CFO': ['Q', 'W', 'E', 'R'] }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], }) df3 = df1[['year', 'profit']].merge( df2[['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

specifying which columns to include in merge from both dataframes

The code for this article is available on GitHub

We first used two sets of square brackets [[]] to select the columns from df1 we want to include in the merge operation and then used two sets of square brackets to select the specific columns from df2.

# Pandas: Merge only specific DataFrame columns using DataFrame.drop()

In some cases, you might need the columns for the merge operation but might want to remove the columns from the resulting DataFrame.

If that's the case, include the columns in the merge() call and use the DataFrame.drop() method to remove them after merging.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2[['year', 'employees', 'CEO']], on='year', how='left' ).drop(columns=['CEO']) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

merge specific dataframe columns using drop

The code for this article is available on GitHub

We included the CEO column in the merge() call but wanted to exclude it from the result, so we used the drop() method.

The DataFrame.drop() method drops the specified columns from the DataFrame.

In other words, the method returns the DataFrame without the specified columns.

This approach should only be used when you want to remove columns that you need for the join operation after merging.

# Pandas: Merge only specific DataFrame columns using DataFrame.loc

You can also use the DataFrame.loc label-based indexer to select the columns you want to include in the merge operation.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], 'CFO': ['Q', 'W', 'E', 'R'] }) df3 = df1.merge( df2.loc[:, ['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

merge only specific dataframe columns using loc

The code for this article is available on GitHub

Make sure to specify the names of the columns you want to include in the resulting DataFrame in the list after the comma.

main.py
df3 = df1.merge( df2.loc[:, ['year', 'employees']], on='year', how='left' )

You can also use the loc indexer to select the specific columns from df1 in case you don't want to include all columns from df1 in the resulting DataFrame.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], 'CFO': ['Q', 'W', 'E', 'R'] }) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], 'CEO': ['A', 'B', 'C', 'D'], }) df3 = df1.loc[:, ['year', 'profit']].merge( df2.loc[:, ['year', 'employees']], on='year', how='left' ) # year profit employees # 0 2020 1500 10 # 1 2021 2500 15 # 2 2022 3500 20 # 3 2023 4500 25 print(df3)

using loc indexer on both dataframes

The code for this article is available on GitHub

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.