Last updated: Apr 11, 2024

The Pandas warning "Columns have mixed types. Specify dtype option on import or set low_memory=False." occurs when your CSV file contains columns with mixed types whose data type cannot be inferred reliably.

To resolve the issue, specify the data type of each column explicitly by supplying the dtype argument.
For example, suppose you have the following employees.csv file.
first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz
Notice that the salary column contains integers and strings.
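If you're not sure which values are causing the mixed types, one quick way to locate them (a sketch, assuming the column is supposed to be numeric; the CSV data is inlined via StringIO to stand in for the employees.csv file) is to coerce the column with pd.to_numeric and inspect the rows that fail to parse:

```python
import pandas as pd
from io import StringIO

# Inline copy of the sample employees.csv used in this article
csv_data = """first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz"""

df = pd.read_csv(StringIO(csv_data), dtype={'salary': str})

# errors='coerce' turns unparseable values into NaN, so the NaN rows
# mark exactly the values that can't be interpreted as numbers
bad = pd.to_numeric(df['salary'], errors='coerce').isna()
print(df.loc[bad, 'salary'].tolist())  # ['abc', 'xyz']
```

Once you know which values are the culprits, you can decide whether to clean them up in the file or handle them with one of the options below.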
The best way of solving the issue where pandas tries to guess the data type of
the column is to explicitly specify it.
import pandas as pd

dtype = {
    'first_name': str,
    'last_name': str,
    'date': str,
    'salary': str
}

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    dtype=dtype
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500
# 1 Bobby Hadz 04/14/1998 abc
# 2 Carl Lemon 06/11/1994 3000
# 3 Dean Berry 06/11/1996 xyz
print(df)

The dtype argument of the
pandas.read_csv() method
is a dictionary of column name -> type.
Here is another example.
import numpy as np

dtype = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
You can use the str or object dtypes to preserve the data as-is and prevent pandas from interpreting the values.
Pandas can only infer the data type of a column once the entire column has been read. In other words, pandas can't finalize the column's dtype until all of the data has been read, which is inefficient for large files.

For example, if you have a 5-million-row employees.csv file with a salary column that contains only numbers, pandas can't know that the column contains only numbers until it has read all 5 million rows.

Note that the salary column in our CSV file doesn't contain only integers.
first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz
So if you try to set the dtype of the column to int, you'd get an error:
import pandas as pd

dtype = {
    'first_name': str,
    'last_name': str,
    'date': str,
    'salary': int  # 👈️ set dtype to int
}

# ⛔️ ValueError: invalid literal for int() with base 10: 'abc'
df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    dtype=dtype
)
Setting the dtype of the column to object

If you just want to silence the warning, set the dtype of the column to object.
import pandas as pd

dtype = {
    'first_name': str,
    'last_name': str,
    'date': str,
    'salary': object
}

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    dtype=dtype
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500
# 1 Bobby Hadz 04/14/1998 abc
# 2 Carl Lemon 06/11/1994 3000
# 3 Dean Berry 06/11/1996 xyz
print(df)
Setting the dtype of the column to object silences the warning but doesn't
make the read_csv() method more memory efficient.
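Reading the column as object also doesn't make the values usable as numbers. If you need numeric data afterwards, one option (a sketch using pd.to_numeric with errors='coerce', which turns unparseable values into NaN; the CSV data is inlined via StringIO to stand in for the employees.csv file) is to convert the column after the import:

```python
import pandas as pd
from io import StringIO

csv_data = """first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz"""

df = pd.read_csv(StringIO(csv_data), dtype={'salary': object})

# Coerce the strings to numbers; 'abc' and 'xyz' become NaN
df['salary'] = pd.to_numeric(df['salary'], errors='coerce')

# Optionally use the nullable Int64 dtype so the valid values stay integers
df['salary'] = df['salary'].astype('Int64')

print(df['salary'].tolist())
```

The nullable Int64 dtype keeps the parseable salaries as integers while representing the bad values as missing, which is often preferable to silently carrying strings around.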
Setting low_memory to False

You can also silence the warning by setting low_memory to False in the call to pandas.read_csv().
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    low_memory=False
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500
# 1 Bobby Hadz 04/14/1998 abc
# 2 Carl Lemon 06/11/1994 3000
# 3 Dean Berry 06/11/1996 xyz
print(df)

By default, the low_memory argument is set to True.
When the argument is set to True, the file is processed in chunks, resulting
in lower memory consumption while parsing, but possibly mixed type inference.
To ensure no mixed types, you can set the low_memory argument to False or
explicitly set the dtype parameter as we did in the previous subheading.
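If memory really is a constraint, you can also keep the chunked parsing but control it yourself via the chunksize parameter, fixing the dtype up front so every chunk parses the column the same way. A minimal sketch (chunksize=2 is artificially small for the 4-row sample, and the CSV data is inlined via StringIO to stand in for the file):

```python
import pandas as pd
from io import StringIO

csv_data = """first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz"""

# chunksize makes read_csv return an iterator of DataFrames; because the
# dtype is fixed, concatenating the chunks can't produce mixed types
chunks = pd.read_csv(StringIO(csv_data), dtype={'salary': str}, chunksize=2)
df = pd.concat(chunks, ignore_index=True)

print(len(df))  # 4
```

This keeps peak memory bounded by the chunk size while still producing a consistently typed result.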
Using the unicode data type instead

You can also silence the warning by using the unicode data type.
import pandas as pd
import numpy as np

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    index_col=False,
    dtype=np.dtype('unicode'),
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500
# 1 Bobby Hadz 04/14/1998 abc
# 2 Carl Lemon 06/11/1994 3000
# 3 Dean Berry 06/11/1996 xyz
print(df)
Make sure you have the NumPy module installed by running the following command.
pip install numpy

# or with pip3
pip3 install numpy
We set the data type to Unicode strings.
When index_col is set to False, pandas won't use the first column as the
index (e.g. when you have a malformed CSV file with delimiters at the end of
each line).
Setting the engine argument to python

Another thing you can try is to set the engine argument to python when calling pandas.read_csv().
import pandas as pd

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    engine='python'
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500
# 1 Bobby Hadz 04/14/1998 abc
# 2 Carl Lemon 06/11/1994 3000
# 3 Dean Berry 06/11/1996 xyz
print(df)
The engine argument is the parser engine that should be used.
The available engines are c, pyarrow and python.
The c and pyarrow engines are faster, but the python engine is more feature-complete.
You can also use converters to resolve the warning.
Suppose you have the following employees.csv file.
first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz
Notice that the salary column contains integers and strings.
Here is how we can use a converter to initialize the non-numeric salary values
to 0.
import pandas as pd
import numpy as np

def converter1(value):
    if not value:
        return 0
    try:
        return np.float64(value)
    except Exception as _e:
        return np.float64(0)

df = pd.read_csv(
    'employees.csv',
    sep=',',
    encoding='utf-8',
    converters={'salary': converter1}
)

# first_name last_name date salary
# 0 Alice Smith 01/21/1995 1500.0
# 1 Bobby Hadz 04/14/1998 0.0
# 2 Carl Lemon 06/11/1994 3000.0
# 3 Dean Berry 06/11/1996 0.0
print(df)
The example only uses a converter for the salary column, however, you can
specify as many converters as necessary in the converters dictionary.
The converter1 function gets called with each field from the salary column.
If the supplied value is empty, we return 0.
The try block tries to convert the value to a NumPy float.
If an error is raised, the except block runs and returns 0.
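Defaulting to 0 makes the bad values indistinguishable from real zero salaries. A variant worth considering (a sketch; the CSV data is inlined via StringIO to stand in for the employees.csv file) is to have the converter return NaN instead, so the unparseable values stay visibly missing:

```python
import pandas as pd
import numpy as np
from io import StringIO

# Hypothetical converter that keeps bad values as NaN instead of 0,
# so they remain distinguishable from genuine zero salaries
def to_float_or_nan(value):
    try:
        return np.float64(value)
    except (TypeError, ValueError):
        return np.nan

csv_data = """first_name,last_name,date,salary
Alice,Smith,01/21/1995,1500
Bobby,Hadz,04/14/1998,abc
Carl,Lemon,06/11/1994,3000
Dean,Berry,06/11/1996,xyz"""

df = pd.read_csv(StringIO(csv_data), converters={'salary': to_float_or_nan})

print(df['salary'].tolist())  # [1500.0, nan, 3000.0, nan]
```

You can then decide later whether to fill the missing values with fillna(0), drop them, or investigate them, rather than baking the choice into the import step.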