Last updated: Apr 11, 2024
The pandas "ValueError: You are trying to merge on int64 and object columns" occurs when you try to merge two DataFrames on a column that has a type of int64 in one DataFrame and type object in the other.
To solve the error, convert the object type to an integer before merging the DataFrames.
Here is an example of how the error occurs.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # ⛔️ ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
    df3 = df1.merge(df2, on=['year'], how='left')
The year column has a type of int64 in the first DataFrame and a type of object (string) in the second DataFrame. You can confirm this by printing both columns.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    print(df1['year'])
    print(df2['year'])
Trying to merge the two DataFrames on a column that has incompatible types causes the error.
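A quick way to confirm the mismatch before merging is to compare the dtypes of the key column in both DataFrames. The sketch below uses a hypothetical helper named key_dtypes (it is not part of pandas) that simply returns both dtypes so they can be compared.

```python
import pandas as pd


def key_dtypes(left, right, key):
    """Return the dtypes of the `key` column in both DataFrames.

    Hypothetical helper for illustration; not part of pandas.
    """
    return left[key].dtype, right[key].dtype


df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'year': ['2020', '2021', '2022', '2023'],
    'employees': [10, 15, 20, 25],
})

left_dtype, right_dtype = key_dtypes(df1, df2, 'year')

# The key columns should share a dtype before you merge on them
if left_dtype != right_dtype:
    print(f"dtype mismatch on 'year': {left_dtype} vs {right_dtype}")
```

If the check prints a mismatch, convert one of the columns before calling merge().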
To solve the error, convert the year column to an integer in the second DataFrame before joining.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # ✅ Convert the year to an integer
    df2['year'] = df2['year'].astype(int)

    print(df1['year'])
    print(df2['year'])
The code sample uses the Series.astype() method to convert the values in the year column to integers.
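You can also access the column using dot notation when converting it to an integer. This only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute.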
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # ✅ Convert the year to an integer (dot notation)
    df2.year = df2.year.astype(int)

    print(df1['year'])
    print(df2['year'])
Once the column is converted to an integer, you can safely merge the DataFrames.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # 1) Convert the year to an integer
    df2['year'] = df2['year'].astype(int)

    # 2) Merge the DataFrames
    df3 = df1.merge(df2, on=['year'], how='left')

    print(df3)
Because the year column now has the same integer type in both DataFrames, the call to the DataFrame.merge() method succeeds.
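The conversion can also go the other way. If the values in the object column are not all numeric, you could instead convert the integer column to strings so that both key columns share a type. The following is a minimal sketch of that alternative, reusing the same example data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'year': ['2020', '2021', '2022', '2023'],
    'employees': [10, 15, 20, 25],
})

# Convert the integer year column in df1 to strings instead
df1['year'] = df1['year'].astype(str)

# Both key columns now hold strings, so the merge succeeds
df3 = df1.merge(df2, on=['year'], how='left')

print(df3)
```

Note that with this approach the year column in the merged DataFrame holds strings, so numeric operations on it would need another conversion.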
If your DataFrames might contain None or missing values, use the Int64 type instead.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', None],
        'employees': [10, 15, 20, 25],
    })

    # 1) Convert the year to Int64
    df2['year'] = df2['year'].astype('Int64')

    # 2) Merge the DataFrames
    df3 = df1.merge(df2, on=['year'], how='left')

    print(df3)
We passed the string 'Int64' instead of the int class to the astype() method.
This is necessary because the second DataFrame contains None values in the year column.
If your DataFrame column contains None values and you try to convert it to int, you get an error.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': [2020, 2021, 2022, 2023],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', None],
        'employees': [10, 15, 20, 25],
    })

    # ⛔️ TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
    df2['year'] = df2['year'].astype(int)

    df3 = df1.merge(df2, on=['year'], how='left')

    print(df3)
Pandas can represent integer data with possibly missing values. However, the dtype has to be set to Int64 (with a capital I), which is the nullable integer type.
    # ✅ Works as expected
    df2['year'] = df2['year'].astype('Int64')
You can read more on the topic in the "Nullable integer data type" section of the pandas docs.
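If the string column might also hold values that are not valid integers (an assumption, since the example data in this article is clean), one option is to parse it with pandas.to_numeric() and errors='coerce', and then convert the result to Int64. The following is a minimal sketch with made-up data.

```python
import pandas as pd

df2 = pd.DataFrame({
    # Made-up data for illustration: one unparsable string and one None
    'year': ['2020', '2021', 'unknown', None],
    'employees': [10, 15, 20, 25],
})

# errors='coerce' replaces values that cannot be parsed with NaN
parsed = pd.to_numeric(df2['year'], errors='coerce')

# Int64 is the nullable integer dtype, so the NaN values become <NA>
df2['year'] = parsed.astype('Int64')

print(df2['year'].dtype)
print(df2['year'].isna().sum())
```

Keep in mind that coercion silently discards the unparsable values, so it is worth checking how many entries ended up missing before you merge.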
DataFrame.join()
You might also get the error when using the DataFrame.join() method.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # ⛔️ ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
    df3 = df1.join(df2, on=['year'], how='left')

    print(df3)
Notice that the year columns in both DataFrames are of type string. However, using the DataFrame.join() method still causes the error.
To solve the error, use the DataFrame.merge() method instead.
    import pandas as pd

    df1 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'profit': [1500, 2500, 3500, 4500],
    })

    df2 = pd.DataFrame({
        'year': ['2020', '2021', '2022', '2023'],
        'employees': [10, 15, 20, 25],
    })

    # ✅ Works as expected
    df3 = df1.merge(df2, on=['year'], how='left')

    print(df3)
Replacing the call to DataFrame.join() with DataFrame.merge() resolved the issue.
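The df1.join(df2) method always matches against the index of df2. In the example, that index is the default integer index, so the string values in the year column of df1 are compared to int64 labels, which causes the error.
On the other hand, df1.merge(df2, on=['year']) matches on the year column of both DataFrames.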
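If you would rather keep using DataFrame.join(), one option is to move the year column of the second DataFrame into its index first, so that join() compares year values against year values. The following is a minimal sketch of that approach, reusing the same example data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': ['2020', '2021', '2022', '2023'],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'year': ['2020', '2021', '2022', '2023'],
    'employees': [10, 15, 20, 25],
})

# join() matches df1['year'] against the index of the other DataFrame,
# so make 'year' the index of df2 before joining
df3 = df1.join(df2.set_index('year'), on='year', how='left')

print(df3)
```

Because year becomes the index of the second DataFrame, it no longer appears there as a regular column, so the joined result contains a single year column.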