Pandas: How to keep the Index when merging DataFrames

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: How to keep the Index when merging DataFrames
  2. Pandas: Keep the Index when merging DataFrames using an index copy
  3. Pandas: Keep the Index when merging DataFrames using join()

# Pandas: How to keep the Index when merging DataFrames

To keep the index when merging DataFrames in Pandas:

  1. Use the reset_index() to reset the index of the first DataFrame.
  2. Call the merge() method on the result.
  3. Use the set_index() method to set the index on the merged DataFrame.
main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.reset_index().merge(df2, how='left').set_index('index') # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df3)

keep index when merging dataframes in pandas

The code for this article is available on GitHub

By default, when merging DataFrames, the new DataFrame will have a brand-new index, starting at 0.

However, you will often have to change the default behavior.

We first used the reset_index() method to reset the index of the first DataFrame.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) # index year profit # 0 A 2020 1500 # 1 B 2021 2500 # 2 C 2022 3500 # 3 D 2023 4500 print(df1.reset_index())

Once its index is reset, we can merge the two DataFrames.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) # index year profit employees # 0 A 2020 1500 10 # 1 B 2021 2500 15 # 2 C 2022 3500 20 # 3 D 2023 4500 25 print(df1.reset_index().merge(df2, how='left'))
The code for this article is available on GitHub

The last step is to set the index of the merged DataFrame.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df1.reset_index().merge(df2, how='left').set_index('index'))

The DataFrame.set_index() method sets the DataFrame index (row labels) using existing columns.

If you need to remove the duplicates after merging the DataFrames, use the DataFrame.drop_duplicates method.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.reset_index().merge(df2, how='left').set_index('index') df3.drop_duplicates(inplace=True) # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df3)

keep index when merging dataframes and remove duplicates

The code for this article is available on GitHub

# Pandas: Keep the Index when merging DataFrames using an index copy

You can also create a copy of the index column in the first DataFrame before merging to keep the index.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df1['index_copy'] = df1.index df3 = df1.merge(df2, how='left') # year profit index_copy employees # 0 2020 1500 A 10 # 1 2021 2500 B 15 # 2 2022 3500 C 20 # 3 2023 4500 D 25 print(df3)

keep index when merging dataframes using index copy

The code for this article is available on GitHub

The DataFrame.index attribute returns the index (row labels) of the DataFrame.

We used the attribute to add a new "index_copy" column to the first DataFrame

You can then call the merge() method and the "index_copy" column will be contained in the resulting DataFrame.

# Pandas: Keep the Index when merging DataFrames using join()

You can also use the DataFrame.join() method to keep the index when merging DataFrames.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.join(df2.set_index('year'), on='year') print(df3) print(df3.index)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
year profit employees A 2020 1500 10 B 2021 2500 15 C 2022 3500 20 D 2023 4500 25 Index(['A', 'B', 'C', 'D'], dtype='object')

keep index when merging dataframes using join

The DataFrame.join() method joins the columns of two DataFrames, either on the index or on a key column.

Make sure to call the set_index() method and to set the on argument to the column you want to merge on (year in the example).

main.py
df3 = df1.join(df2.set_index('year'), on='year')

I've also written an article on how to Start the Index of a Pandas DataFrame at 1.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.