Last updated: Apr 12, 2024

The Pandas "ValueError: Cannot mask with non-boolean array containing NA / NaN
values" occurs when you filter a DataFrame with the result of str.contains on a
column that contains NA/NaN values or non-string values.
To solve the error, set the na argument to False when calling
str.contains.
Here is an example of how the error occurs.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df[df['first_name'].str.contains('Bob')]
# ⛔️ ValueError: Cannot mask with non-boolean array containing NA / NaN values

print(result)
```

Notice that the first_name column contains a None value.
If you print the output of calling
Series.str.contains(),
you will see a None value.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# 0    False
# 1     True
# 2    False
# 3     None
# Name: first_name, dtype: object
print(df['first_name'].str.contains('Bob'))
```
This is why Pandas won't allow us to mask with a non-boolean array (one containing NA / NaN values).
Set the na argument to False when calling str.contains

One way to solve the error is to set the na argument to False when calling
the Series.str.contains() method.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df[df['first_name'].str.contains('Bob', na=False)]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```

The Series.str.contains() method tests if a pattern or a regex is contained
within a string of a Series or Index.
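The na argument sets the fill value for missing values. Per the pandas
documentation, the default is numpy.nan for object-dtype values and pandas.NA
for string-dtype values.
We set the na argument to False so that the resulting boolean Series contains
no missing values.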
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# 0    False
# 1     True
# 2    False
# 3    False
# Name: first_name, dtype: bool
print(df['first_name'].str.contains('Bob', na=False))
```
Now all elements in the Series are booleans, so we can safely use it to filter
the DataFrame.
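The value passed to na decides what happens to rows with a missing name. As a minimal sketch using the same sample data (the variable names `without_missing` and `with_missing` are only illustrative), na=False leaves those rows out of the result, while na=True keeps them:

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

names = df['first_name']

# na=False: a missing name counts as "does not match"
without_missing = df[names.str.contains('Bob', na=False)]

# na=True: a missing name counts as "matches"
with_missing = df[names.str.contains('Bob', na=True)]

print(without_missing.index.tolist())  # [1]
print(with_missing.index.tolist())     # [1, 3]
```

Either value produces a mask without missing values, so both avoid the error. Which one is appropriate depends on whether rows with an unknown name belong in the result.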
We could've also used the DataFrame.loc label indexer to achieve the same result.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df.loc[df['first_name'].str.contains('Bob', na=False)]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```

Make sure to set the na argument to False when calling
Series.str.contains().
This arguably should have been the default value for the na argument, but it
is not, so you have to pass it explicitly.
You can also use the equality operator to solve the error.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df[df['first_name'].str.contains('Bob') == True]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```

By comparing the results of calling the Series.str.contains() method to the
True boolean value, we only get boolean results.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# 0    False
# 1     True
# 2    False
# 3    False
# Name: first_name, dtype: bool
print(df['first_name'].str.contains('Bob') == True)
```
The equality comparison only returns True for True values.
If the value is False, NA or NaN, then False is returned, so the Series
only contains booleans.
The error also occurs if the column on which you used Series.str.contains()
has non-string values or contains values of multiple types.
Here is an example.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 1000],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# ⛔️ ValueError: Cannot mask with non-boolean array containing NA / NaN values
result = df[df['first_name'].str.contains('Bob')]
```

Notice that the first_name column has string and integer values.
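To see where the missing values come from, you can inspect the mask before using it. The following is a small sketch, assuming the same sample data (the name `mask` is only illustrative): str.contains() cannot test the integer, so it produces a missing value at that position.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 1000],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

mask = df['first_name'].str.contains('Bob')

# isna() flags the positions where str.contains() produced a missing value
print(mask.isna().tolist())  # [False, False, False, True]

# The same flags can be used to look up the offending values
print(df.loc[mask.isna(), 'first_name'].tolist())  # [1000]
```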
You can use the
Series.astype()
method to convert the column values to strings before calling str.contains()
to solve the error.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 1000],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df[df['first_name'].astype(str).str.contains('Bob')]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```
Alternatively, you can convert the column to strings once and assign the result back to the DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', 1000],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

df['first_name'] = df['first_name'].astype(str)

result = df[df['first_name'].str.contains('Bob')]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```
Once all values in the column are strings (and it doesn't contain any NA/NaN values), the error will be resolved.
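If a column might contain both non-string values and missing values, the two fixes can be combined. The helper below is a hypothetical sketch, not part of pandas: the name `contains_text` and the five-row sample data are assumptions made for illustration.

```python
import pandas as pd


def contains_text(column: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: return a mask without missing values."""
    # astype(str) handles non-string values; na=False handles any
    # missing values that are still present after the conversion
    return column.astype(str).str.contains(pattern, na=False)


df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None, 1000],
    'salary': [175.1, 180.2, 190.3, 205.3, 210.4],
    'experience': [10, 15, 20, 25, 30]
})

mask = contains_text(df['first_name'], 'Bob')

print(mask.tolist())            # [False, True, False, False, False]
print(df[mask].index.tolist())  # [1]
```

Note that astype(str) can turn a missing value into literal text such as 'None', so a pattern that matches that text would also match those rows.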
Using the Series.fillna() method to solve the error

You can also use the Series.fillna() method to solve the error.
The method replaces the NA/NaN values in the Series returned by str.contains() with a value you specify.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

result = df[df['first_name'].str.contains('Bob').fillna(False)]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```

We used the method to replace any NA/NaN values with False.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# 0    False
# 1     True
# 2    False
# 3    False
# Name: first_name, dtype: bool
print(df['first_name'].str.contains('Bob').fillna(False))
```
The returned boolean Series no longer contains any NA/NaN values, so we can
safely filter the DataFrame.
Using the DataFrame.dropna() method to solve the error

You can also use the DataFrame.dropna() method to solve the error.
The method removes the rows that contain missing values from the DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, 180.2, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

df.dropna(inplace=True)

#   first_name  salary  experience
# 0      Alice   175.1          10
# 1      Bobby   180.2          15
# 2       Carl   190.3          20
print(df)

print('-' * 50)

result = df[df['first_name'].str.contains('Bob')]

#   first_name  salary  experience
# 1      Bobby   180.2          15
print(result)
```

Make sure to only use this approach if you want to remove all rows that contain
missing values from the DataFrame.
As shown in the code sample, the DataFrame only contains 3 rows after calling
DataFrame.dropna().
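If you only want to drop the rows where the searched column is missing, the subset parameter of DataFrame.dropna() limits the check to the columns you list. The following is a minimal sketch; the sample data is modified (a missing salary for Bobby) purely to show the difference, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl', None],
    'salary': [175.1, None, 190.3, 205.3],
    'experience': [10, 15, 20, 25]
})

# Only rows with a missing first_name are removed;
# Bobby's row survives even though its salary is missing
cleaned = df.dropna(subset=['first_name'])

result = cleaned[cleaned['first_name'].str.contains('Bob')]
print(result.index.tolist())  # [1]

# A plain dropna() would also remove Bobby's row
print(df.dropna().index.tolist())  # [0, 2]
```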