Reindexing only valid with uniquely valued Index objects

Borislav Hadzhiev

Last updated: Apr 12, 2024
5 min


# Table of Contents

  1. Reindexing only valid with uniquely valued Index objects
  2. Using the reset_index() method to reset the index of each DataFrame
  3. Removing the rows with duplicate indices
  4. The error also occurs when you have duplicate column names
  5. Removing the duplicate columns before calling pd.concat()

# Reindexing only valid with uniquely valued Index objects

The "pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects" occurs for 2 main reasons:

  1. When your index contains duplicate values before calling pandas.concat(). In this case, you can use the DataFrame.reset_index() method to resolve it.

  2. When you have duplicate column names before calling pandas.concat().

Here is an example of how the error occurs.

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

    # ⛔️ pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
    df3 = pd.concat([df1, df2], axis=1)

    print(df3)


Notice that the index of each DataFrame contains a duplicate value (the label 1 appears twice).

You can also verify that this is the case by accessing the index.is_unique attribute.

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    print(df1.index.is_unique)  # 👉️ False

    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])
    print(df2.index.is_unique)  # 👉️ False

The code for this article is available on GitHub

The index.is_unique attribute returns True if the index has unique values and False otherwise.
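Since repeated labels on either the row index or the column names can trigger the error, it may help to check both before concatenating. The sketch below uses a hypothetical helper, `find_non_unique` (not part of pandas), that reports which axis of a DataFrame has repeated labels.

```python
import pandas as pd


def find_non_unique(df):
    """Hypothetical helper: list the repeated labels on each axis of df."""
    report = {}
    if not df.index.is_unique:
        # duplicated(keep=False) marks every occurrence of a repeated label
        mask = df.index.duplicated(keep=False)
        report['index'] = df.index[mask].unique().tolist()
    if not df.columns.is_unique:
        mask = df.columns.duplicated(keep=False)
        report['columns'] = df.columns[mask].unique().tolist()
    return report


df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
df2 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])

print(find_non_unique(df1))  # {'index': [1]}
print(find_non_unique(df2))  # {'columns': ['A']}
```

An empty dictionary means both axes are unique.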

# Using the reset_index() method to reset the index of each DataFrame

One way to solve the error is to use the DataFrame.reset_index() method.

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

    df1 = df1.reset_index()
    df2 = df2.reset_index()

    df3 = pd.concat([df1, df2], axis=1)

    #    index  A  index  B
    # 0      1  1      0  4
    # 1      0  2      1  5
    # 2      1  3      1  6
    print(df3)


By default, the method keeps the old index as a column in the new DataFrame.

However, you can change this behavior by setting the drop argument to True when calling reset_index().

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

    df1 = df1.reset_index(drop=True)
    df2 = df2.reset_index(drop=True)

    df3 = pd.concat([df1, df2], axis=1)

    #    A  B
    # 0  1  4
    # 1  2  5
    # 2  3  6
    print(df3)


When the drop argument is set to True, the old index is discarded instead of being added as a column, and the DataFrame gets the default integer index.

The argument defaults to False.
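One side effect of keeping the old index is worth checking: when both DataFrames are reset without `drop=True`, each one gains a column named "index", so the concatenated result has two columns with the same name. The sketch below shows this and one possible workaround, renaming the columns first. The names `index_1` and `index_2` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

# Keeping the old index adds an "index" column to each DataFrame
kept = pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
print(kept.columns.tolist())   # ['index', 'A', 'index', 'B']
print(kept.columns.is_unique)  # False

# Renaming the old index columns (illustrative names) keeps them distinct
left = df1.reset_index().rename(columns={'index': 'index_1'})
right = df2.reset_index().rename(columns={'index': 'index_2'})
renamed = pd.concat([left, right], axis=1)
print(renamed.columns.tolist())  # ['index_1', 'A', 'index_2', 'B']
```

If you don't need the old index values at all, `drop=True` avoids the issue entirely.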

If you want to avoid reassigning the variables, set the inplace argument to True.

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

    df1.reset_index(drop=True, inplace=True)
    df2.reset_index(drop=True, inplace=True)

    df3 = pd.concat([df1, df2], axis=1)

    #    A  B
    # 0  1  4
    # 1  2  5
    # 2  3  6
    print(df3)


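The pandas.concat() function has to align the DataFrames on the axis that is not being concatenated: the row index when axis=1 and the column names when axis=0 (the default). The error is raised when those labels contain duplicates and are not identical across the DataFrames.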
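To see which labels matter for a given call, the sketch below concatenates the same two DataFrames along both axes. Stacking rows (the default `axis=0`) aligns on column names and leaves the repeated row labels alone, whereas `axis=1` aligns on the row index and raises. It assumes a pandas version that exposes `pandas.errors.InvalidIndexError`, the exception named in the error message above.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

# axis=0: rows are stacked, so the duplicated row labels are not a problem
stacked = pd.concat([df1, df2])
print(stacked.shape)  # (6, 2)

# axis=1: rows are aligned by index label, which fails with duplicates
try:
    pd.concat([df1, df2], axis=1)
    raised = False
except InvalidIndexError:
    raised = True

print(raised)  # True
```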

# Removing the rows with duplicate indices

If you'd rather just remove the rows with duplicate indices, use the df.loc label-based indexer with df.index.duplicated().

main.py

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 0, 1], columns=['A'], data=[1, 2, 3])
    df2 = pd.DataFrame(index=[0, 1, 1], columns=['B'], data=[4, 5, 6])

    df1 = df1.loc[~df1.index.duplicated(keep='first')]
    df2 = df2.loc[~df2.index.duplicated(keep='first')]

    df3 = pd.concat([df1, df2], axis=1)

    # Each DataFrame keeps only the first row per index label, so df3 has
    # two rows, aligned by label:
    # label 0 -> A=2, B=4
    # label 1 -> A=1, B=5
    print(df3)

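The index.duplicated() method returns a boolean array in which True marks each index label that has already occurred. With keep='first', the first occurrence of a label is marked False and every later occurrence is marked True.

The ~ operator inverts the array, so df.loc keeps only the first row for each label and removes the rest.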
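The `keep` argument decides which occurrence of a repeated label survives. Here is a short sketch of the three options, using the index from the first DataFrame above.

```python
import pandas as pd

idx = pd.Index([1, 0, 1])

# True marks the occurrences that are treated as duplicates
print(idx.duplicated(keep='first').tolist())  # [False, False, True]
print(idx.duplicated(keep='last').tolist())   # [True, False, False]
print(idx.duplicated(keep=False).tolist())    # [True, False, True]

df = pd.DataFrame(index=idx, columns=['A'], data=[1, 2, 3])

# keep='first' retains the first row for label 1, keep='last' the last one
first = df.loc[~df.index.duplicated(keep='first'), 'A'].tolist()
last = df.loc[~df.index.duplicated(keep='last'), 'A'].tolist()
print(first)  # [1, 2]
print(last)   # [2, 3]
```

With `keep=False`, every row whose label is repeated is removed, so only the rows with a label that occurs once remain.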

# The error also occurs when you have duplicate column names

The error also occurs if you have duplicate column names.

Here is an example.

main.py

    import pandas as pd

    df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])
    df2 = pd.DataFrame([[4, 5, 6]], columns=['A', 'B', 'C'])

    # ⛔️ pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
    df3 = pd.concat([df1, df2])

Notice that the first DataFrame has duplicate column names ("A" appears twice).

The concat() function doesn't know which of the two columns named "A" in the first DataFrame should be aligned with the "A" column from the second DataFrame.

One way to solve the error is to pass unique column names when instantiating the DataFrame.

main.py

    import pandas as pd

    df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'C'])
    df2 = pd.DataFrame([[4, 5, 6]], columns=['A', 'B', 'C'])

    df3 = pd.concat([df1, df2])

    #    A  B  C
    # 0  1  2  3
    # 0  4  5  6
    print(df3)


We passed unique column names to pd.DataFrame(), so everything works as expected.

If you need to find the duplicate columns in your DataFrame, use the columns.duplicated() method.

main.py

    import pandas as pd

    df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])

    # 👇️ Index(['A', 'A'], dtype='object')
    print(df1.columns[df1.columns.duplicated(keep=False)])

    df2 = pd.DataFrame([[4, 5, 6]], columns=['A', 'B', 'C'])

    # 👇️ Index([], dtype='object')
    print(df2.columns[df2.columns.duplicated(keep=False)])

As shown in the code sample, the first DataFrame has a duplicate "A" column name, whereas the second DataFrame doesn't have any duplicate column names.
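If you also want to know how many times each name occurs, one option is to count the column labels. This is a small sketch rather than the only way to do it; the variable names are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])

# Count how often each column name occurs
counts = df1.columns.value_counts()

# Keep only the names that occur more than once
repeated = counts[counts > 1]
print(repeated.to_dict())  # {'A': 2}
```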

# Removing the duplicate columns before calling pd.concat()

You can also solve the error by removing the duplicate columns before calling pd.concat().

main.py

    import pandas as pd

    df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])
    print(df1)
    print('-' * 50)

    df1 = df1.loc[:, ~df1.columns.duplicated()].copy()
    print(df1)
    print('-' * 50)

    df2 = pd.DataFrame([[4, 5, 6]], columns=['A', 'B', 'C'])

    df3 = pd.concat([df1, df2])

    #    A  C    B
    # 0  1  3  NaN
    # 0  4  6  5.0
    print(df3)

Running the code sample produces the following output.

shell

       A  A  C
    0  1  2  3
    --------------------------------------------------
       A  C
    0  1  3
    --------------------------------------------------
       A  C    B
    0  1  3  NaN
    0  4  6  5.0

We used the following line to remove the duplicate columns from the DataFrame.

main.py

    df1 = df1.loc[:, ~df1.columns.duplicated()].copy()

Repeat the process if both of your DataFrames contain duplicate columns.

main.py

    df1 = df1.loc[:, ~df1.columns.duplicated()].copy()
    df2 = df2.loc[:, ~df2.columns.duplicated()].copy()
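Removing a duplicate column discards its data (the second "A" column, holding the value 2, is lost in the example above). If you need to keep every column, an alternative is to rename the repeats. The sketch below uses a hypothetical helper, `make_unique_columns` (not part of pandas), that appends a numeric suffix to each repeated name. It assumes the suffixed names don't clash with existing columns.

```python
import pandas as pd


def make_unique_columns(df):
    """Hypothetical helper: rename repeated columns to name_1, name_2, ..."""
    seen = {}
    new_names = []
    for name in df.columns:
        count = seen.get(name, 0)
        # The first occurrence keeps its name; later ones get a suffix.
        # Assumption: a suffixed name such as 'A_1' is not already in use.
        new_names.append(name if count == 0 else f'{name}_{count}')
        seen[name] = count + 1
    result = df.copy()
    result.columns = new_names
    return result


df1 = pd.DataFrame([[1, 2, 3]], columns=['A', 'A', 'C'])
df2 = pd.DataFrame([[4, 5, 6]], columns=['A', 'B', 'C'])

df1 = make_unique_columns(df1)
print(df1.columns.tolist())  # ['A', 'A_1', 'C']

df3 = pd.concat([df1, df2])
print(df3.shape)  # (2, 4)
```

The result has four columns (A, A_1, C and B), with NaN wherever a DataFrame lacks a column.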



Copyright © 2024 Borislav Hadzhiev