
Last updated: Apr 12, 2024
Reading time · 4 min

To get the categorical columns in a DataFrame:
1. Call the select_dtypes() method on the DataFrame.
2. Set the include argument to "category".
3. The method returns a new DataFrame containing only the categorical columns.

    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    print(df.select_dtypes(include=['category']))
    print('-' * 50)
    print(df['name'].cat.categories)
Running the code sample produces the following output.
      id   name
    0  a  Alice
    1  b  Bobby
    2  c   Carl
    3  d    Dan
    --------------------------------------------------
    Index(['Alice', 'Bobby', 'Carl', 'Dan'], dtype='object')

The DataFrame.select_dtypes method returns a subset of a DataFrame's columns based on the column data types.
To only select the categorical
columns, we set the include argument to "category".
    #   id   name
    # 0  a  Alice
    # 1  b  Bobby
    # 2  c   Carl
    # 3  d    Dan
    print(df.select_dtypes(include=['category']))
The include argument can be set to a selection of dtypes or strings to be
included.
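If you only need the names of the categorical columns rather than the sub-DataFrame, one option is to read the columns attribute of the result. The following is a minimal sketch that reuses the sample DataFrame from this article; the variable name categorical_column_names is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': pd.Categorical(['a', 'b', 'c', 'd']),
    'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

# Select the categorical columns, then keep only their names as a list.
categorical_column_names = df.select_dtypes(
    include=['category']
).columns.tolist()

# ['id', 'name']
print(categorical_column_names)
```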
You can also specify multiple dtypes in the include list.

    print(df.select_dtypes(include=['category', 'object']))
If the columns you're looking for don't get listed, try adding the object type
as shown in the code sample.
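To illustrate why adding the object type can help, here is a small sketch with a made-up DataFrame (the column names city, grade and score are assumptions for illustration only). The city column is explicitly stored with the object dtype, so it is skipped when only "category" is included.

```python
import pandas as pd

# Hypothetical data: 'city' holds strings with the object dtype,
# 'grade' is a real categorical column, 'score' is numeric.
df2 = pd.DataFrame({
    'city': pd.Series(['Oslo', 'Rome', 'Lima'], dtype='object'),
    'grade': pd.Categorical(['A', 'B', 'A']),
    'score': [90, 75, 88],
})

only_category = df2.select_dtypes(include=['category'])
category_and_object = df2.select_dtypes(include=['category', 'object'])

# ['grade']
print(only_category.columns.tolist())

# ['city', 'grade']
print(category_and_object.columns.tolist())
```

The selected columns keep the order they have in the original DataFrame.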
There is also an exclude argument that does the opposite.
    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    #   id   name
    # 0  a  Alice
    # 1  b  Bobby
    # 2  c   Carl
    # 3  d    Dan
    print(df.select_dtypes(exclude=['number', 'bool_', 'object_']))

If you need to get a list of the categories in a Category column:
1. Access the cat.categories attribute on the selected column.

    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    # Index(['a', 'b', 'c', 'd'], dtype='object')
    print(df['id'].cat.categories)

    print('-' * 50)

    # Index(['Alice', 'Bobby', 'Carl', 'Dan'], dtype='object')
    print(df['name'].cat.categories)

The cat.categories attribute holds the categories of the given categorical column.
The attribute is an Index object, so if you want to get the result as a
list, call the tolist() method on it.
    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    # ['a', 'b', 'c', 'd']
    print(df['id'].cat.categories.tolist())

    print('-' * 50)

    # ['Alice', 'Bobby', 'Carl', 'Dan']
    print(df['name'].cat.categories.tolist())

The Index.tolist() method returns a list of the values in the index.
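Keep in mind that cat.categories lists every declared category, including ones that no row currently uses. The sketch below uses a made-up Series (the name sizes and its values are assumptions for illustration) to show the difference, and uses cat.remove_unused_categories() as one way to keep only the categories that actually occur.

```python
import pandas as pd

# Hypothetical column: 'large' is declared as a category but never used.
sizes = pd.Series(
    pd.Categorical(
        ['small', 'medium', 'small'],
        categories=['small', 'medium', 'large'],
    )
)

declared = sizes.cat.categories.tolist()
used = sizes.cat.remove_unused_categories().cat.categories.tolist()

# ['small', 'medium', 'large']
print(declared)

# ['small', 'medium']
print(used)
```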
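If none of your numeric columns hold categorical data, you can also get the
categorical columns using _get_numeric_data().
Note that the leading underscore marks _get_numeric_data() as a private pandas method, and that this approach treats every non-numeric column as categorical.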
    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    numeric_columns = df._get_numeric_data().columns

    # Index(['experience', 'salary'], dtype='object')
    print(numeric_columns)

    categorical_columns = list(set(df.columns) - set(numeric_columns))

    # ['name', 'id'] (sets are unordered, so the order may vary)
    print(categorical_columns)
We used the _get_numeric_data() method to get all numeric columns in the
DataFrame.
The last step is to subtract the numeric columns from all of the DataFrame's columns and convert the result to a list.
We used the set() constructor to convert the Index objects to set objects so that we could use the subtraction (-) operator.
We could've achieved the same result by using the set.difference() method.
    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    numeric_columns = df._get_numeric_data().columns

    # Index(['experience', 'salary'], dtype='object')
    print(numeric_columns)

    categorical_columns = list(set(df.columns).difference(numeric_columns))

    # ['name', 'id'] (sets are unordered, so the order may vary)
    print(categorical_columns)
The difference() method returns a new set with elements in the set that are
not in the provided iterable.
In other words, set(list2).difference(list1) returns a new set that contains
the items in list2 that are not in list1.
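Because sets are unordered, the set-based approach does not guarantee the original column order. If the order matters, a list comprehension is one alternative. The sketch below is an assumption-laden variation, not the approach used above: it swaps in select_dtypes(include='number') as a public way to pick the numeric columns of this sample DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'id': pd.Categorical(['a', 'b', 'c', 'd']),
    'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

# Assumption: the public select_dtypes() call stands in for the
# private _get_numeric_data() method in this sample.
numeric_columns = df.select_dtypes(include='number').columns

# Iterate over df.columns so the original column order is preserved.
categorical_columns = [
    column for column in df.columns
    if column not in numeric_columns
]

# ['id', 'name']
print(categorical_columns)
```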
If you need to check if a specific DataFrame column is categorical:
1. Get the dtype name of the column by accessing the dtype.name attribute.
2. Compare the name to the string "category".

    import pandas as pd

    df = pd.DataFrame({
        'id': pd.Categorical(['a', 'b', 'c', 'd']),
        'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
        'experience': [1, 5, 3, 8],
        'salary': [189.1, 180.2, 190.3, 205.4],
    })

    if df['name'].dtype.name == 'category':
        # this runs
        print('The column is categorical')
    else:
        print('The column is NOT categorical')
The dtype attribute returns a dtype object rather than a string.
We accessed the name attribute on the object to get the data type
name as a plain string.
The last step is to compare the returned value with the string "category".
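As an alternative to comparing names, you could check the type of the dtype object itself. The following is a minimal sketch, assuming pd.CategoricalDtype is available at the top level of your pandas version; the helper name is_categorical is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'id': pd.Categorical(['a', 'b', 'c', 'd']),
    'name': pd.Categorical(['Alice', 'Bobby', 'Carl', 'Dan']),
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})


def is_categorical(column: pd.Series) -> bool:
    """Hypothetical helper: True if the column has a categorical dtype."""
    return isinstance(column.dtype, pd.CategoricalDtype)


print(is_categorical(df['name']))    # True
print(is_categorical(df['salary']))  # False
```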
I've also written an article on how to change the type of a column to Categorical.
You can learn more about the related topics by checking out the following tutorials: