Pandas: Changing the column type to Categorical

Borislav Hadzhiev

Last updated: Apr 12, 2024


# Table of Contents

  1. Pandas: Changing the column type to Categorical
  2. Pandas: Changing the data type of multiple columns to Categorical
  3. Change column type to Categorical using a For loop
  4. Change column type to Categorical using a Lambda function
  5. Change column type to categorical for all, except some columns

# Pandas: Changing the column type to Categorical

To change the column type to Categorical in Pandas:

  1. Use square bracket notation to select the specific column.
  2. Call the astype() method on the selected column, passing it "category" as a parameter.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    df['name'] = df['name'].astype('category')

    print(df)
    print('-' * 50)
    print(df.dtypes)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

        name  experience  salary
    0  Alice           1   189.1
    1  Bobby           5   180.2
    2   Carl           3   190.3
    3    Dan           8   205.4
    --------------------------------------------------
    name          category
    experience       int64
    salary         float64
    dtype: object

We used bracket notation to select the name column and called the astype method on it.

main.py

    df['name'] = df['name'].astype('category')

The astype() method casts a pandas object to the supplied data type.
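
To see what the cast produces, the sketch below inspects the converted column through the `.cat` accessor. The variable names `categories` and `codes` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
})

df['name'] = df['name'].astype('category')

# The distinct values, stored once by the categorical column
categories = df['name'].cat.categories.tolist()

# The integer code each row uses to point at its category
codes = df['name'].cat.codes.tolist()

print(categories)
print(codes)
```

Each row stores a small integer code rather than the full string, which is why categorical columns tend to pay off when values repeat often.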

You can access the dtypes attribute on the DataFrame to verify that the column has been converted to category.

main.py

    # name          category
    # experience       int64
    # salary         float64
    # dtype: object
    print(df.dtypes)

The dtypes attribute returns the data types in the DataFrame.

To be more precise, a Series with the data type of each column is returned.
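
Because `dtypes` is a Series, a single column's type can be looked up by label. The following is a minimal sketch of that check; the names `dtypes` and `is_categorical` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})
df['name'] = df['name'].astype('category')

dtypes = df.dtypes

# dtypes is a Series indexed by column name
print(type(dtypes).__name__)

# Look up one column's dtype by its label and test whether it is categorical
is_categorical = isinstance(dtypes['name'], pd.CategoricalDtype)
print(is_categorical)
```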

# Pandas: Changing the data type of multiple columns to Categorical

You can use the same approach if you need to change the data type of multiple columns to Categorical.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    columns = ['name', 'experience']

    df[columns] = df[columns].astype('category')

    # name          category
    # experience    category
    # salary         float64
    # dtype: object
    print(df.dtypes)

We passed a list of columns between the square brackets [] to change the type of the name and experience columns to category.

You can also specify the column names inline.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    df[['name', 'experience']] = df[['name', 'experience']].astype('category')

    # name          category
    # experience    category
    # salary         float64
    # dtype: object
    print(df.dtypes)

However, notice the two sets of square brackets: the outer pair indexes the DataFrame and the inner pair is the list of column names.
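
The difference between one and two sets of brackets can be sketched as follows. The variable names `single` and `multiple` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

# A single label between the brackets selects one column as a Series
single = df['name']

# A list of labels between the brackets selects a DataFrame
multiple = df[['name', 'experience']]

print(type(single).__name__)
print(type(multiple).__name__)
```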

# Change column type to Categorical using a For loop

In older versions of Pandas, a for loop was commonly used to change the type of multiple columns.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    columns = ['name', 'experience']

    for column in columns:
        df[column] = df[column].astype('category')

    # name          category
    # experience    category
    # salary         float64
    # dtype: object
    print(df.dtypes)

However, in recent Pandas versions, iterating over the columns collection is not necessary.
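
Another loop-free option, not covered in the original text, is to pass `DataFrame.astype()` a dictionary that maps column names to types. This is a minimal sketch of that variant, shown as an alternative rather than the article's own approach.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

# Map each column to its target type; unlisted columns are left unchanged
df = df.astype({'name': 'category', 'experience': 'category'})

print(df.dtypes)
```

The dictionary form is handy when different columns need different target types in one call.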

# Change column type to Categorical using a Lambda function

You might also see examples online that use a lambda function.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    columns = ['name', 'experience']

    df[columns] = df[columns].apply(lambda x: x.astype('category'))

    # name          category
    # experience    category
    # salary         float64
    # dtype: object
    print(df.dtypes)

The lambda function gets called with each selected column (a Series) and converts it to category.
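
To check what `apply()` hands to the function, the sketch below swaps the lambda for a named helper that records the type of each argument. The helper `to_category` and the dictionary `received` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

received = {}


def to_category(column):
    # Record the type of the argument, keyed by the column label
    received[column.name] = type(column).__name__
    return column.astype('category')


df[columns] = df[columns].apply(to_category)

print(received)
print(df.dtypes)
```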

However, using a lambda is not needed in recent Pandas versions, as you can directly specify the list of columns between the square brackets.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    columns = ['name', 'experience']

    df[columns] = df[columns].astype('category')

    # name          category
    # experience    category
    # salary         float64
    # dtype: object
    print(df.dtypes)

# Change column type to categorical for all, except some columns

If you need to convert all columns to categorical except those that have a specific data type, use the select_dtypes() method.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'id': ['a', 'b', 'c', 'd'],
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    # Collect the names of all columns that are not of integer type
    columns = df.select_dtypes(exclude='int').columns.to_list()

    df[columns] = df[columns].astype('category')

    # id            category
    # name          category
    # experience       int64
    # salary        category
    # dtype: object
    print(df.dtypes)

The DataFrame.select_dtypes() method returns a subset of the DataFrame's columns based on the column data types.

We excluded the int column (experience) and converted all other columns to categorical.
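
If the float column should also be left alone, one possible variation is to exclude every numeric type with the `'number'` selector. This is a sketch of that variation under the same sample data, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

# 'number' matches both integer and float columns, so only id and name remain
columns = df.select_dtypes(exclude='number').columns.to_list()

df[columns] = df[columns].astype('category')

print(columns)
print(df.dtypes)
```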

I've also written an article on how to get a list of categories or categorical columns.
