Last updated: Apr 11, 2024
Reading time·4 min

DataFrame.join()The pandas "ValueError: You are trying to merge on int64 and object columns"
occurs when you try to merge two DataFrames on a column that has a type of
int64 in one DataFrame and type object in the other.
To solve the error, convert the object type to an integer before merging the
DataFrames.

Here is an example of how the error occurs.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ⛔️ ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat df3 = df1.merge(df2, on=['year'], how='left')
The year column has a type of int in the first
DataFrame
and type of string in the second DataFrame.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) print(df1['year']) print(df2['year'])

Trying to merge the two DataFrame on a column that has incompatible types
causes the error.
To solve the error, convert the year column to an integer in the second
DataFrame before joining.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Convert the year to an integer df2['year'] = df2['year'].astype(int) print(df1['year']) print(df2['year'])

The code sample uses the
DataFrame.astype
method to convert the values in the year column to integers.
You can also use dot notation to convert the column to an integer.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Convert the year to an integer (dot notation) df2.year = df2.year.astype(int) print(df1['year']) print(df2['year'])
Once the column is converted to an integer, you can safely merge the DataFrames.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # 1) Convert the year to an integer df2['year'] = df2['year'].astype(int) # 2) Merge the DataFrames df3 = df1.merge(df2, on=['year'], how='left') print(df3)

Once the year column is converted to an integer, we can safely call the
pandas.merge()
method.
If your DataFrames might contain None or missing values, use the Int64 type instead.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', None], 'employees': [10, 15, 20, 25], }) # 1) Convert the year to Int64 df2['year'] = df2['year'].astype('Int64') # 2) Merge the DataFrames df3 = df1.merge(df2, on=['year'], how='left') print(df3)

We passed the string Int64 instead of the int class to the astype()
method.
This is necessary because the second DataFrame contains None values in the
year column.
If your DataFrame column contains None values and you try to convert to int,
you would get an error.
import pandas as pd df1 = pd.DataFrame({ 'year': [2020, 2021, 2022, 2023], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', None], 'employees': [10, 15, 20, 25], }) # ⛔️ TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType' df2['year'] = df2['year'].astype(int) df3 = df1.merge(df2, on=['year'], how='left') print(df3)
Pandas can represent integer data with possibly missing values, however, the
dtype has to be
set to Int64.
# ✅ works as expected df2['year'] = df2['year'].astype('Int64')
You can read more on the topic in this section of the docs.
DataFrame.join()You might also get the error when using the DataFrame.join() method.
import pandas as pd df1 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ⛔️ ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat df3 = df1.join(df2, on=['year'], how='left') print(df3)
Notice that the year columns in both DataFrames are of type string, however,
using the DataFrame.join() method still causes the error.
To solve the error, use the DataFrame.merge() method instead.
import pandas as pd df1 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'profit': [1500, 2500, 3500, 4500], }) df2 = pd.DataFrame({ 'year': ['2020', '2021', '2022', '2023'], 'employees': [10, 15, 20, 25], }) # ✅ Works as expected df3 = df1.merge(df2, on=['year'], how='left') print(df3)

Replacing the call to DataFrame.join() with DataFrame.merge() resolved the
issue.
The df1.join(df2) method always merges via the index of df2.
On the other hand, df1.merge(df2) merges on the column.
You can learn more about the related topics by checking out the following tutorials: