Last updated: Apr 11, 2024
Reading time·4 min
The error "TypeError: Invalid comparison between dtype=datetime64[ns] and
date" occurs when you try to compare a dataframe
date with a native Python
date
type.
Access the dt.date
attributes on the dataframe
date to compare objects of
type date
to solve the error.
Here is an example of how the error occurs.
from datetime import date import pandas as pd df = pd.DataFrame({'date': ['06/12/2023', '07/15/2023', '11/16/2023'], 'value': [2, 3, 4]}) df['date'] = pd.to_datetime(df['date']) print(df) # ⛔️ TypeError: Invalid comparison between dtype=datetime64[ns] and date df2 = df.loc[df['date'] >= date.today()] print(df2)
Running the code sample above produces the following error:
raise InvalidComparison(other) pandas.errors.InvalidComparison: 2023-06-14 During handling of the above exception, another exception occurred: TypeError: Invalid comparison between dtype=datetime64[ns] and date
We are trying to compare a datetime64
object with a native Python date
which
caused the error.
To solve the error, access the dt.date
attribute on the dataframe
dates to
compare native date
Python objects instead.
from datetime import date import pandas as pd df = pd.DataFrame({'date': ['06/12/2023', '07/15/2023', '11/16/2023'], 'value': [2, 3, 4]}) # ✅ Access dt.date attribute df['date'] = pd.to_datetime(df['date']).dt.date print(df) print('-' * 50) df2 = df.loc[df['date'] >= date.today()] print(df2)
I replaced the following line:
# ⛔️ Incorrect df['date'] = pd.to_datetime(df['date'])
With the following line:
df['date'] = pd.to_datetime(df['date']).dt.date
The dt.date
attribute converts the dataframe
series to datetime.date
objects.
dt.date
attribute enables us to compare native Python date
objects instead of datetime64
objects with date
objects.You can also access the dt.date
attribute directly in the comparison.
from datetime import date import pandas as pd df = pd.DataFrame({'date': ['06/12/2023', '07/15/2023', '11/16/2023'], 'value': [2, 3, 4]}) df['date'] = pd.to_datetime(df['date']) print(df) print('-' * 50) # ✅ Access dt.date attribute df2 = df.loc[df['date'].dt.date >= date.today()] print(df2)
The code sample converts the datetime64
object to a date
object directly
when comparing with the current date.
# ✅ Access dt.date attribute df2 = df.loc[df['date'].dt.date >= date.today()]
pandas.Timestamp()
class to solve the errorYou can also use the pandas.Timestamp class to solve the error.
import pandas as pd df = pd.DataFrame({'date': ['06/12/2023', '07/15/2023', '11/16/2023'], 'value': [2, 3, 4]}) df['date'] = pd.to_datetime(df['date']) print(df) print('-' * 50) df2 = df.loc[df['date'] >= pd.Timestamp('today')] print(df2)
Notice that we used the pandas.Timestamp()
class to create a Timestamp
object.
You can compare Timestamp
objects with datetime64
objects, so everything
works as expected.
The pandas.Timestamp()
constructor is a replacement for the Python
datetime.datetime
object.
The constructor takes a value and converts it to a Timestamp
object.
Here are some examples of instantiating the constructor with different arguments.
import pandas as pd # 👇️ 2023-06-24 12:00:00 print(pd.Timestamp('2023-06-24T12')) # 👇️ 2023-08-30 11:02:35.500000 print(pd.Timestamp(1693393355.5, unit='s')) # 👇️ 2023-08-30 04:02:35.500000-07:00 print(pd.Timestamp(1693393355.5, unit='s', tz='US/Pacific')) # 👇️ 2023-09-24 12:00:00 print(pd.Timestamp(2023, 9, 24, 12)) # 👇️ 2023-09-24 12:00:00 print(pd.Timestamp(year=2023, month=9, day=24, hour=12))
The pandas.Timestamp() constructor can be instantiated with different arguments.
As shown in the following code sample, comparing a Timestamp
object to a
datetime64
object works as expected.
import pandas as pd df = pd.DataFrame({'date': ['06/12/2023', '07/15/2023', '11/16/2023'], 'value': [2, 3, 4]}) df['date'] = pd.to_datetime(df['date']) print(df) print('-' * 50) df2 = df.loc[df['date'] >= pd.Timestamp('2023-06-14')] print(df2)
You can also use the pandas.Timestamp.floor()
method to floor the Timestamp
object to the specified frequency.
import pandas as pd # 👇️ 2023-06-24 00:00:00 (day) print(pd.Timestamp('2023-06-24T12').floor(freq='D')) # 👇️ 2023-08-30 11:00:00 (hour) print(pd.Timestamp(1693393355.5, unit='s').floor(freq='H')) # 👇️ 2023-08-30 04:02:00-07:00 (minute) print(pd.Timestamp(1693393355.5, unit='s', tz='US/Pacific').floor(freq='T')) # 👇️ 2023-08-30 11:02:35 (seconds) print(pd.Timestamp(1693393355.5, unit='s').floor(freq='S'))
The floor()
method takes a freq
string that indicates the flooring
resolution.
The examples floor the timestamp to days, hours, minutes and seconds.
Make sure your date is formatted correctly.
For example, if your date strings are formatted as YYYY-DD-MM
, you have to
pass a format
keyword argument to the pandas.to_datetime()
method.
from datetime import date import pandas as pd df = pd.DataFrame({'date': ['2023-12-06', '2023-15-08', '2023-16-11'], 'value': [2, 3, 4]}) # ✅ Pass format argument df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m").dt.date print(df) print('-' * 50) df2 = df.loc[df['date'] >= date.today()] print(df2)
The strings in the date
column are formatted as YYYY-MM-DD
in the example.
This is not a standard format, so we had to pass the format
argument to the
pd.to_datetime
method.
You can view all of the special %
characters in
this table
in the strftime()
and strptime()
docs.
If you run into any issues when specifying the format string, check out the following article.
You can learn more about the related topics by checking out the following tutorials: