Pandas: Changing the column type to Categorical

# Table of Contents

# Pandas: Changing the column type to Categorical

To change the column type to Categorical in Pandas:

Use square bracket notation to select the specific column.
Call the astype() method on the selected column, passing it "category" as a parameter.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

df['name'] = df['name'].astype('category')

print(df)

print('-' * 50)

print(df.dtypes)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

Copied!
    name  experience  salary
0  Alice           1   189.1
1  Bobby           5   180.2
2   Carl           3   190.3
3    Dan           8   205.4
--------------------------------------------------
name          category
experience       int64
salary         float64
dtype: object

change column type to categorical in pandas

We used bracket notation to select the name column and called the astype method on it.

main.py

Copied!
df['name'] = df['name'].astype('category')

The astype() method casts a pandas object to the supplied data type.

You can access the dtypes attribute on the DataFrame to verify that the column has been converted to category.

main.py

Copied!
# name          category
# experience       int64
# salary         float64
# dtype: object
print(df.dtypes)

The dtypes attribute returns the data types in the DataFrame.

To be more precise, a Series with the data type of each column is returned.

# Pandas: Changing the data type of multiple columns to Categorical

You can use the same approach if you need to change the data type of multiple columns to Categorical.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

df[columns] = df[columns].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

changing data type of multiple columns to categorical

The code for this article is available on GitHub

We passed a list of columns between the square brackets [] to change the type of the name and experience columns to category.

You can also specify the column names inline.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})


df[['name', 'experience']] = df[['name', 'experience']].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

However, notice that we have 2 sets of square brackets next to one another.

# Change column type to Categorical using a For loop

In older versions of Pandas, you used to use a for loop to change the type of multiple columns.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

for column in columns:
    df[column] = df[column].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

The code for this article is available on GitHub

However, in recent Pandas versions, iterating over the columns collection is not necessary.

# Change column type to Categorical using a Lambda function

You might also see examples online that use a lambda function.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

df[columns] = df[columns].apply(lambda x: x.astype('category'))

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

The code for this article is available on GitHub

The lambda function gets called with each column name and sets its type to category.

However, using a lambda is not needed in recent Pandas versions, as you can directly specify the list of columns between the square brackets.

main.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

df[columns] = df[columns].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

changing data type of multiple columns to categorical

# Change column type to categorical for all, except some columns

If you need to convert all columns, except for columns that have a specific data type to categorical, use the select_dtypes() method.

mqain.py

Copied!
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = df.select_dtypes(exclude='int').columns.to_list()

df[columns] = df[columns].astype('category')

# id            category
# name          category
# experience       int64
# salary        category
# dtype: object
print(df.dtypes)

The code for this article is available on GitHub

The DataFrame.select_dtypes() method returns a subset of the DataFrame's columns based on the column data types.

We excluded the int column (experience) and converted all other columns to categorical.

I've also written an article on how to get a list of categories or categorical columns.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

You can use the search field on my Home Page to filter through all of my articles.

Pandas: Changing the column type to Categorical

# Table of Contents