You are trying to merge on int64 and object columns [Fixed]

avatar
Borislav Hadzhiev

Last updated: Apr 11, 2024
4 min

banner

# Table of Contents

  1. You are trying to merge on int64 and object columns
  2. If your DataFrames contain None or missing values, use Int64 instead
  3. You might also get the error when using DataFrame.join()

# You are trying to merge on int64 and object columns [Fixed]

The pandas "ValueError: You are trying to merge on int64 and object columns" occurs when you try to merge two DataFrames on a column that has a type of int64 in one DataFrame and type object in the other.

To solve the error, convert the object type to an integer before merging the DataFrames.

you are trying to merge on int64 and object columns

Here is an example of how the error occurs.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ⛔️ ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat df3 = df1.merge(df2, on=['year'], how='left')

The year column has a type of int in the first DataFrame and type of string in the second DataFrame.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) print(df1['year']) print(df2['year'])

trying to merge dataframes on columns of different type

The code for this article is available on GitHub

Trying to merge the two DataFrame on a column that has incompatible types causes the error.

To solve the error, convert the year column to an integer in the second DataFrame before joining.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Convert the year to an integer df2['year'] = df2['year'].astype(int) print(df1['year']) print(df2['year'])

convert the column to an integer

The code for this article is available on GitHub

The code sample uses the DataFrame.astype method to convert the values in the year column to integers.

You can also use dot notation to convert the column to an integer.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Convert the year to an integer (dot notation) df2.year = df2.year.astype(int) print(df1['year']) print(df2['year'])

Once the column is converted to an integer, you can safely merge the DataFrames.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # 1) Convert the year to an integer df2['year'] = df2['year'].astype(int) # 2) Merge the DataFrames df3 = df1.merge(df2, on=['year'], how='left') print(df3)

dataframes merged without error

The code for this article is available on GitHub

Once the year column is converted to an integer, we can safely call the pandas.merge() method.

# If your DataFrames contain None or missing values, use Int64 instead

If your DataFrames might contain None or missing values, use the Int64 type instead.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', None], 'employees': [10, 15, 20, 25], }) # 1) Convert the year to Int64 df2['year'] = df2['year'].astype('Int64') # 2) Merge the DataFrames df3 = df1.merge(df2, on=['year'], how='left') print(df3)

using int64 type when values are missing or none

The code for this article is available on GitHub

We passed the string Int64 instead of the int class to the astype() method.

This is necessary because the second DataFrame contains None values in the year column.

If your DataFrame column contains None values and you try to convert to int, you would get an error.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', None], 'employees': [10, 15, 20, 25], }) # ⛔️ TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType' df2['year'] = df2['year'].astype(int) df3 = df1.merge(df2, on=['year'], how='left') print(df3)

Pandas can represent integer data with possibly missing values, however, the dtype has to be set to Int64.

main.py
# ✅ works as expected df2['year'] = df2['year'].astype('Int64')

You can read more on the topic in this section of the docs.

# You might also get the error when using DataFrame.join()

You might also get the error when using the DataFrame.join() method.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ⛔️ ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat df3 = df1.join(df2, on=['year'], how='left') print(df3)

Notice that the year columns in both DataFrames are of type string, however, using the DataFrame.join() method still causes the error.

To solve the error, use the DataFrame.merge() method instead.

main.py
import pandas as pd df1 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Works as expected df3 = df1.merge(df2, on=['year'], how='left') print(df3)

use merge instead of join

The code for this article is available on GitHub

Replacing the call to DataFrame.join() with DataFrame.merge() resolved the issue.

The df1.join(df2) method always merges via the index of df2.

On the other hand, df1.merge(df2) merges on the column.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.