Pandas: Drop columns if Name contains a given String

Borislav Hadzhiev

Last updated: Apr 12, 2024


# Table of Contents

  1. Pandas: Drop columns if Name contains a given String
  2. Pandas: Drop columns if Name contains one of multiple Strings
  3. Drop columns if Name contains a given string using str.contains()
  4. Drop columns if Name contains a given string in a case-insensitive manner
  5. Keeping only the columns whose name contains a given string

# Pandas: Drop columns if Name contains a given String

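To drop the columns in a DataFrame whose name contains a given string:

  1. Use the DataFrame.filter() method with the regex parameter to select the columns whose name matches.
  2. Pass the labels of the matching columns to the DataFrame.drop() method with axis=1.
  3. Set inplace to True to drop the columns from the original DataFrame.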
main.py
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

print(df)
print('-' * 50)

df.drop(list(df.filter(regex='name')), axis=1, inplace=True)

print(df)
```
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
```text
  first_name last_name  salary  experience
0      Alice     Smith   175.1           5
1      Bobby      Hadz   180.2          10
2       Carl     Lemon   190.3          15
--------------------------------------------------
   salary  experience
0   175.1           5
1   180.2          10
2   190.3          15
```


We used the DataFrame.filter() method to select all DataFrame columns whose name contains the string "name".

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

#   first_name last_name
# 0      Alice     Smith
# 1      Bobby      Hadz
# 2       Carl     Lemon
print(df.filter(regex='name'))

# ['first_name', 'last_name']
print(list(df.filter(regex='name')))
```

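Once you select the matching DataFrame columns, you can get their names by passing the filtered DataFrame to the list() class. This works because iterating over a DataFrame yields its column labels.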
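DataFrame.filter() keeps the labels for which re.search(regex, label) finds a match, so the pattern can match anywhere in the column name. The minimal sketch below reuses the sample DataFrame from above; the anchored pattern '^first' is only an illustrative choice, showing how to restrict where the match may occur.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15],
})

# Iterating over a DataFrame yields its column labels,
# so list() on the filtered DataFrame returns the matching names
matching = list(df.filter(regex='name'))
print(matching)  # ['first_name', 'last_name']

# .columns.tolist() is an equivalent, more explicit spelling
print(df.filter(regex='name').columns.tolist())  # ['first_name', 'last_name']

# The regex is searched anywhere in the label, so use anchors to restrict it
print(list(df.filter(regex='^first')))  # ['first_name']
```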

The DataFrame.drop() method is then used to drop the columns based on their labels.

main.py
```python
df.drop(list(df.filter(regex='name')), axis=1, inplace=True)
```
The axis argument is set to 1 so that the drop() method drops labels from the columns. With the default axis=0, it would try to drop rows by index label instead.
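For reference, a short sketch (using the same sample data) of three equivalent ways to tell drop() to operate on columns: axis=1, axis='columns', and the columns keyword argument.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15],
})

to_drop = list(df.filter(regex='name'))

# All three calls drop the same columns and return new DataFrames
a = df.drop(to_drop, axis=1)
b = df.drop(to_drop, axis='columns')
c = df.drop(columns=to_drop)

print(a.equals(b) and b.equals(c))  # True
print(list(c))  # ['salary', 'experience']
```

The columns keyword makes the intent explicit without having to remember which axis number means columns.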

When the inplace argument is set to True, the columns are dropped from the original DataFrame in place and None is returned.
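The sketch below contrasts the two styles on the sample data: without inplace, drop() returns a new DataFrame and the original is left unchanged; with inplace=True, the original is modified and the call returns None.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15],
})

# Without inplace, a new DataFrame is returned and df is untouched
trimmed = df.drop(columns=list(df.filter(regex='name')))
print(list(df))       # ['first_name', 'last_name', 'salary', 'experience']
print(list(trimmed))  # ['salary', 'experience']

# With inplace=True, df itself is modified and the call returns None
result = df.drop(columns=list(df.filter(regex='name')), inplace=True)
print(result)    # None
print(list(df))  # ['salary', 'experience']
```

Because the in-place call returns None, avoid writing df = df.drop(..., inplace=True), which would replace your DataFrame with None.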

# Pandas: Drop columns if Name contains one of multiple Strings

The same approach can be used to drop the columns of a DataFrame whose name contains at least one of multiple strings.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

print(df)
print('-' * 50)

df.drop(list(df.filter(regex='name|salary')), axis=1, inplace=True)

print(df)
```

Running the code sample produces the following output.

shell
```text
  first_name last_name  salary  experience
0      Alice     Smith   175.1           5
1      Bobby      Hadz   180.2          10
2       Carl     Lemon   190.3          15
--------------------------------------------------
   experience
0           5
1          10
2          15
```


Notice that we used the pipe | character in the regular expression.

The example drops the DataFrame columns whose name contains the strings "name" or "salary".

main.py
```python
df.drop(list(df.filter(regex='name|salary')), axis=1, inplace=True)
```

The regular expression "A|B|C" matches if "A", "B" or "C" is found anywhere in the column name.

In other words, a column is removed if its name contains "A", "B" or "C".

You can use the pipe | character to separate as many strings as necessary.
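When the strings come from a list, the pattern can be built with '|'.join(). Wrapping each string in re.escape() is an extra precaution for strings that contain regex metacharacters such as "."; the column names and the to_remove list below are hypothetical, chosen to show the difference.

```python
import re

import pandas as pd

# Hypothetical columns; "salary.usd" contains a regex metacharacter (".")
df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby'],
    'salary.usd': [175.1, 180.2],
    'salaryXusd': [1, 2],
    'experience': [5, 10],
})

to_remove = ['name', 'salary.usd']

# Escape each string so "." matches a literal dot, then join them with "|"
pattern = '|'.join(re.escape(s) for s in to_remove)
print(pattern)  # name|salary\.usd

df = df.drop(columns=list(df.filter(regex=pattern)))
print(list(df))  # ['salaryXusd', 'experience']
```

Without re.escape(), the unescaped "." would match any character, so "salaryXusd" would be dropped as well.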

# Drop columns if Name contains a given string using str.contains()

You can also call the str.contains() method on df.columns to drop the columns whose name contains a given string.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

print(df)

bool_list = df.columns.str.contains('name')
print('-' * 50)
print(bool_list)

df = df.loc[:, ~df.columns.str.contains('name')]
print('-' * 50)
print(df)
```

Running the code sample produces the following output.

shell
```text
  first_name last_name  salary  experience
0      Alice     Smith   175.1           5
1      Bobby      Hadz   180.2          10
2       Carl     Lemon   190.3          15
--------------------------------------------------
[ True  True False False]
--------------------------------------------------
   salary  experience
0   175.1           5
1   180.2          10
2   190.3          15
```


We used the DataFrame.loc indexer to select columns with a boolean array.

main.py
```python
# 👇️ [ True  True False False]
bool_list = df.columns.str.contains('name')

df = df.loc[:, ~df.columns.str.contains('name')]
```

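The boolean array stores a True value for each column whose name contains the given string. The tilde (~) operator inverts the array, so df.loc keeps only the columns whose name does not contain the string.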

The str.contains() method tests whether a pattern or regular expression is contained within each string of a Series or Index. By default, the pattern is treated as a regular expression.
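If the string you are looking for contains regex metacharacters, one option is to pass regex=False so that str.contains() does a literal substring check. The column names in this sketch are hypothetical.

```python
import pandas as pd

# Hypothetical columns; "." is a regex metacharacter
df = pd.DataFrame({
    'price.usd': [1.0, 2.0],
    'priceXusd': [3.0, 4.0],
    'qty': [5, 6],
})

# Default regex=True: "." matches any character, so both price columns match
print(df.columns.str.contains('price.usd'))

# regex=False: literal substring match, so only "price.usd" matches
mask = df.columns.str.contains('price.usd', regex=False)
print(mask)

# Invert the mask to drop the matching column
df = df.loc[:, ~mask]
print(list(df))  # ['priceXusd', 'qty']
```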

# Drop columns if Name contains a given string in a case-insensitive manner

If you need to drop the columns whose name contains a given string in a case-insensitive manner, set the case argument to False.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'FIRST_NAME': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

print(df)

bool_list = df.columns.str.contains('name', case=False)
print('-' * 50)
print(bool_list)

df = df.loc[:, ~df.columns.str.contains('name', case=False)]
print('-' * 50)
print(df)
```

Running the code sample produces the following output.

shell
```text
  FIRST_NAME last_name  salary  experience
0      Alice     Smith   175.1           5
1      Bobby      Hadz   180.2          10
2       Carl     Lemon   190.3          15
--------------------------------------------------
[ True  True False False]
--------------------------------------------------
   salary  experience
0   175.1           5
1   180.2          10
2   190.3          15
```


Notice that we set the case argument to False when calling str.contains().

main.py
```python
bool_list = df.columns.str.contains('name', case=False)

df = df.loc[:, ~df.columns.str.contains('name', case=False)]
```

When the case argument is set to False, the method checks if the pattern is contained within the string in a case-insensitive manner.

By default, the case argument is set to True.
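DataFrame.filter() has no case argument. One option, sketched below on the same sample data, is to start the regex with Python's inline (?i) flag, which makes the whole pattern case-insensitive.

```python
import pandas as pd

df = pd.DataFrame({
    'FIRST_NAME': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15],
})

# (?i) at the start of the pattern turns on case-insensitive matching
to_drop = list(df.filter(regex='(?i)name'))
print(to_drop)  # ['FIRST_NAME', 'last_name']

df = df.drop(columns=to_drop)
print(list(df))  # ['salary', 'experience']
```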

# Keeping only the columns whose name contains a given string

If you only want to keep the DataFrame columns whose name contains a given string, use the DataFrame.filter() method.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15]
})

print(df)

df = df.filter(like='name', axis=1)
print('-' * 50)
print(df)
```

Running the code sample produces the following output.

shell
```text
  first_name last_name  salary  experience
0      Alice     Smith   175.1           5
1      Bobby      Hadz   180.2          10
2       Carl     Lemon   190.3          15
--------------------------------------------------
  first_name last_name
0      Alice     Smith
1      Bobby      Hadz
2       Carl     Lemon
```


We set the like argument to the substring we want to check for.

main.py
```python
df = df.filter(like='name', axis=1)
```

The method call keeps the labels on the column axis for which 'name' in label is True, i.e. the columns whose name contains the given string. Unlike the regex argument, like performs a plain substring check.
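The same selection can be made with a boolean mask and df.loc, as in the str.contains() section, but without the ~ operator, since here the matching columns are the ones to keep. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'last_name': ['Smith', 'Hadz', 'Lemon'],
    'salary': [175.1, 180.2, 190.3],
    'experience': [5, 10, 15],
})

# filter(like=...) keeps the columns whose label contains the substring
kept_filter = df.filter(like='name', axis=1)

# Equivalent boolean-mask selection; regex=False mirrors like's literal match
kept_loc = df.loc[:, df.columns.str.contains('name', regex=False)]

print(kept_filter.equals(kept_loc))  # True
print(list(kept_filter))  # ['first_name', 'last_name']
```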
