Replace whole String if it contains Substring in Pandas

Borislav Hadzhiev

Last updated: Jul 8, 2023

# Table of Contents

  1. Replace whole String if it contains Substring in Pandas
  2. Replace whole String if it contains Substring in Pandas ignoring the case
  3. Replace whole string if it contains substring in Pandas using Regex
  4. Replace whole string if it contains substring in Pandas using apply()

# Replace whole String if it contains Substring in Pandas

To replace whole strings if they contain a substring in Pandas:

  1. Use the str.contains() method to check if each string contains a substring.
  2. Assign a new value to the matching strings.
main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['dev', 'web dev', 'accountant', 'dev']
    })

    print(df)

    # Overwrite the whole value wherever the job contains "dev".
    df.loc[df['job'].str.contains('dev'), 'job'] = 'developer'

    print('-' * 50)
    print(df)

Running the code sample produces the following output.

shell

        name         job
    0  Alice         dev
    1  Bobby     web dev
    2   Carl  accountant
    3    Dan         dev
    --------------------------------------------------
        name         job
    0  Alice   developer
    1  Bobby   developer
    2   Carl  accountant
    3    Dan   developer

The DataFrame.loc indexer enables us to access a group of rows and columns by label(s) or a boolean array.
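As a minimal sketch of the two selection styles mentioned above, the snippet below selects the same rows once by label and once by a boolean list. The variable names `by_label` and `by_mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'job': ['dev', 'web dev', 'accountant', 'dev']
})

# Select rows by their index labels, and a single column by its label.
by_label = df.loc[[0, 3], 'name']

# Select rows with a boolean list (one entry per row) instead.
by_mask = df.loc[[True, False, False, True], 'name']

print(by_label.tolist())  # ['Alice', 'Dan']
print(by_mask.tolist())   # ['Alice', 'Dan']
```

In practice the boolean list is rarely written by hand; it usually comes from a condition such as `df['job'].str.contains('dev')`.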

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['dev', 'web dev', 'accountant', 'dev']
    })

    #     name         job
    # 0  Alice         dev
    # 1  Bobby     web dev
    # 2   Carl  accountant
    # 3    Dan         dev
    print(df)

    print('-' * 50)

    # 0        dev
    # 1    web dev
    # 3        dev
    # Name: job, dtype: object
    print(df.loc[df['job'].str.contains('dev'), 'job'])

Once we have the group of rows that contain the string "dev" in the job column, we update their values to "developer".

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['dev', 'web dev', 'accountant', 'dev']
    })

    #     name         job
    # 0  Alice         dev
    # 1  Bobby     web dev
    # 2   Carl  accountant
    # 3    Dan         dev
    print(df)

    # Assign the new value to every row selected by the boolean mask.
    df.loc[df['job'].str.contains('dev'), 'job'] = 'developer'

    print('-' * 50)

    #     name         job
    # 0  Alice   developer
    # 1  Bobby   developer
    # 2   Carl  accountant
    # 3    Dan   developer
    print(df)

You can see that the rows with values dev and web dev got updated to developer.

The str.contains method takes a pattern and checks whether it occurs in each string of the Series. By default, the pattern is treated as a regular expression (regex=True); pass regex=False to match it as a literal substring.

The method returns a boolean Series or Index indicating the result.
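The sketch below shows the boolean result on its own and two keyword arguments of `str.contains()` that often matter in practice: `regex=False` for literal matching and `na=False` for treating missing values as "no match". The sample data (including the `None` entry) and the variable names are assumptions made for illustration.

```python
import pandas as pd

jobs = pd.Series(['sr. dev', 'sre', None, 'accountant'])

# The pattern is a regex by default, so "." matches any character.
# na=False makes missing values count as "no match", which keeps
# the result strictly True/False and usable with .loc.
as_regex = jobs.str.contains('sr.', na=False)

# regex=False matches the text "sr." literally.
as_literal = jobs.str.contains('sr.', regex=False, na=False)

print(as_regex.tolist())    # [True, True, False, False]
print(as_literal.tolist())  # [True, False, False, False]
```

Without `na=False`, a column that contains missing values can produce a mask with missing entries, which `.loc` may refuse to use for selection.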

# Replace whole String if it contains Substring in Pandas ignoring the case

To ignore the case when checking whether a string contains the substring, set the case argument to False when calling str.contains().

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['DEV', 'web Dev', 'accountant', 'dev']
    })

    print(df)

    # case=False makes the substring check case-insensitive.
    df.loc[df['job'].str.contains('dev', case=False), 'job'] = 'developer'

    print('-' * 50)

    #     name         job
    # 0  Alice   developer
    # 1  Bobby   developer
    # 2   Carl  accountant
    # 3    Dan   developer
    print(df)

Notice that the values in the job column are not consistently cased.

The str.contains method takes a case argument that can be used to make the method case-insensitive.

By default, the case argument is set to True, which means that str.contains() is case-sensitive.

Setting the argument to False means that the case is ignored when matching the substring in each string.
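To make the difference concrete, here is a small sketch that compares the masks produced with the default `case=True` and with `case=False` on the same mixed-case data. The variable names are illustrative.

```python
import pandas as pd

jobs = pd.Series(['DEV', 'web Dev', 'accountant', 'dev'])

# Default behaviour: only the exact lowercase "dev" matches.
sensitive = jobs.str.contains('dev')

# case=False: "DEV" and "web Dev" match as well.
insensitive = jobs.str.contains('dev', case=False)

print(sensitive.tolist())    # [False, False, False, True]
print(insensitive.tolist())  # [True, True, False, True]
```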

# Replace whole string if it contains substring in Pandas using Regex

You can also use a regular expression to replace a whole string if it contains a substring in pandas.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['dev', 'web dev', 'accountant', 'dev']
    })

    print(df)

    print('-' * 50)

    # The pattern spans the whole string, so the entire value is replaced.
    df['job'] = df.job.str.replace(
        r'(^.*dev.*$)',
        'developer',
        regex=True
    )

    print(df)

Running the code sample produces the following output.

shell

        name         job
    0  Alice         dev
    1  Bobby     web dev
    2   Carl  accountant
    3    Dan         dev
    --------------------------------------------------
        name         job
    0  Alice   developer
    1  Bobby   developer
    2   Carl  accountant
    3    Dan   developer

We passed a regular expression as the first parameter to the str.replace() method.

main.py

    df['job'] = df.job.str.replace(
        r'(^.*dev.*$)',
        'developer',
        regex=True
    )

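The regular expression matches the entire string whenever it contains the substring dev: ^.* consumes everything before dev and .*$ consumes everything after it, so the whole value is replaced rather than just the substring.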

The replacement string is provided as the second argument.

Notice that we also had to set the regex keyword argument to True.

The regex argument determines if the supplied pattern is a regular expression.

In pandas 2.0 and later, the argument defaults to False (earlier versions defaulted to True), so it has to be explicitly set to True when using a regex with str.replace().
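The following sketch contrasts the two modes of `str.replace()` on the same data. It is meant only to illustrate why a regex that spans the whole string is needed here; the variable names are illustrative.

```python
import pandas as pd

jobs = pd.Series(['dev', 'web dev', 'accountant'])

# Literal mode: only the matched substring is replaced.
literal = jobs.str.replace('dev', 'developer', regex=False)

# Regex mode with a pattern that spans the string: the whole value is replaced.
whole = jobs.str.replace(r'^.*dev.*$', 'developer', regex=True)

print(literal.tolist())  # ['developer', 'web developer', 'accountant']
print(whole.tolist())    # ['developer', 'developer', 'accountant']
```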

If you want to ignore the case when matching the substring, set the flags keyword argument to re.IGNORECASE.

main.py

    import re

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['DEV', 'web Dev', 'accountant', 'dev']
    })

    print(df)

    print('-' * 50)

    # re.IGNORECASE makes the regex match regardless of letter case.
    df['job'] = df.job.str.replace(
        r'(^.*dev.*$)',
        'developer',
        regex=True,
        flags=re.IGNORECASE
    )

    print(df)

Running the code sample produces the following output.

shell

        name         job
    0  Alice         DEV
    1  Bobby     web Dev
    2   Carl  accountant
    3    Dan         dev
    --------------------------------------------------
        name         job
    0  Alice   developer
    1  Bobby   developer
    2   Carl  accountant
    3    Dan   developer

When the flags argument is set to re.IGNORECASE, the substring is matched in the string in a case-insensitive manner.
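One caveat with the regex approach is that the substring itself may contain characters that are special in a regular expression. A minimal sketch, assuming a hypothetical substring `c++` and made-up sample data, uses `re.escape()` from the standard library to build the pattern safely:

```python
import re

import pandas as pd

jobs = pd.Series(['c++ dev', 'web dev', 'c dev'])

# Hypothetical substring containing regex metacharacters.
substring = 'c++'

# re.escape() turns the substring into a pattern that matches it literally.
pattern = f'^.*{re.escape(substring)}.*$'

result = jobs.str.replace(pattern, 'c++ developer', regex=True)

print(result.tolist())  # ['c++ developer', 'web dev', 'c dev']
```

Only the first value contains the literal text `c++`, so the other two values are left unchanged.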

# Replace whole string if it contains substring in Pandas using apply()

You can also use the Series.apply() method to replace a whole string if it contains a substring in Pandas.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['dev', 'web dev', 'accountant', 'dev']
    })

    print(df)

    print('-' * 50)

    # The lambda is called once for each value in the job column.
    df['job'] = df.job.apply(
        lambda x: 'developer' if 'dev' in x else x
    )

    print(df)

Running the code sample produces the following output.

shell

        name         job
    0  Alice         dev
    1  Bobby     web dev
    2   Carl  accountant
    3    Dan         dev
    --------------------------------------------------
        name         job
    0  Alice   developer
    1  Bobby   developer
    2   Carl  accountant
    3    Dan   developer

Because df.job is a Series, the method being called is Series.apply, which invokes a function on each value of the Series.

main.py

    df['job'] = df.job.apply(
        lambda x: 'developer' if 'dev' in x else x
    )

Our lambda function returns the string developer if the substring dev is contained in the current value. Otherwise, it returns the current value unchanged.
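The lambda assumes every value is a string. If the column can contain missing values, the `in` check fails on non-string entries. The sketch below is one possible guard, using a hypothetical helper named `replace_if_contains` and made-up sample data with a `None` entry:

```python
import pandas as pd

jobs = pd.Series(['dev', 'web dev', None, 'accountant'])


def replace_if_contains(value, substring='dev', replacement='developer'):
    """Hypothetical helper: replace strings that contain the substring."""
    # Only test membership on real strings; leave everything else as-is.
    if isinstance(value, str) and substring in value:
        return replacement
    return value


result = jobs.apply(replace_if_contains)

print(result.tolist())
```

The missing entry passes through untouched, while `dev` and `web dev` both become `developer`.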

If you need to ignore the case when matching the substring in the string, use the str.lower method.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'job': ['DEV', 'web Dev', 'accountant', 'dev']
    })

    print(df)

    print('-' * 50)

    # Lowercase both sides so the membership test ignores case.
    df['job'] = df.job.apply(
        lambda x: 'developer' if 'dev'.lower() in x.lower() else x
    )

    print(df)

Running the code sample produces the following output.

shell

        name         job
    0  Alice         DEV
    1  Bobby     web Dev
    2   Carl  accountant
    3    Dan         dev
    --------------------------------------------------
        name         job
    0  Alice   developer
    1  Bobby   developer
    2   Carl  accountant
    3    Dan   developer

Converting both the substring and the current string to lowercase makes the membership test case-insensitive.
