Last updated: Apr 12, 2024
Reading time·4 min

To keep the index when merging DataFrames in Pandas:
reset_index() to reset the index of the first DataFrame.merge() method on the result.set_index() method to set the index on the merged DataFrame.import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.reset_index().merge(df2, how='left').set_index('index') # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df3)

By default, when merging DataFrames, the new DataFrame will have a brand-new
index, starting at 0.
However, you will often have to change the default behavior.
We first used the
reset_index()
method to reset the index of the first DataFrame.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) # index year profit # 0 A 2020 1500 # 1 B 2021 2500 # 2 C 2022 3500 # 3 D 2023 4500 print(df1.reset_index())
Once its index is reset, we can merge the two DataFrames.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) # index year profit employees # 0 A 2020 1500 10 # 1 B 2021 2500 15 # 2 C 2022 3500 20 # 3 D 2023 4500 25 print(df1.reset_index().merge(df2, how='left'))
The last step is to set the index of the merged DataFrame.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df1.reset_index().merge(df2, how='left').set_index('index'))
The DataFrame.set_index()
method sets the DataFrame index (row labels) using existing columns.
If you need to remove the duplicates after merging the DataFrames, use the DataFrame.drop_duplicates method.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.reset_index().merge(df2, how='left').set_index('index') df3.drop_duplicates(inplace=True) # year profit employees # index # A 2020 1500 10 # B 2021 2500 15 # C 2022 3500 20 # D 2023 4500 25 print(df3)

You can also create a
copy of the index column in
the first DataFrame before merging to keep the index.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df1['index_copy'] = df1.index df3 = df1.merge(df2, how='left') # year profit index_copy employees # 0 2020 1500 A 10 # 1 2021 2500 B 15 # 2 2022 3500 C 20 # 3 2023 4500 D 25 print(df3)

The DataFrame.index attribute returns
the index (row labels) of the DataFrame.
We used the attribute to add a new "index_copy" column to the first
DataFrame
You can then call the merge() method and the "index_copy" column will be
contained in the resulting DataFrame.
join()You can also use the DataFrame.join() method to keep the index when merging DataFrames.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }, index=['A', 'B', 'C', 'D']) df2 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'employees': [10, 15, 20, 25], }) df3 = df1.join(df2.set_index('year'), on='year') print(df3) print(df3.index)
Running the code sample produces the following output.
year profit employees A 2020 1500 10 B 2021 2500 15 C 2022 3500 20 D 2023 4500 25 Index(['A', 'B', 'C', 'D'], dtype='object')

The DataFrame.join() method joins the columns of two DataFrames, either on the
index or on a key column.
Make sure to call the set_index() method and to set the on argument to the
column you want to merge on (year in the example).
df3 = df1.join(df2.set_index('year'), on='year')
I've also written an article on how to Start the Index of a Pandas DataFrame at 1.
You can learn more about the related topics by checking out the following tutorials:
pd.read_json()