Pandas: Unalignable boolean Series provided as indexer

Borislav Hadzhiev

Last updated: Apr 13, 2024

# Pandas: Unalignable boolean Series provided as indexer

The Pandas "pandas.errors.IndexingError: Unalignable boolean Series provided as indexer" error occurs when the boolean Series you pass to df[] has an index that doesn't match the DataFrame's row index. The most common cause is trying to filter a DataFrame by columns without using the loc indexer.

To solve the error, use the loc indexer when filtering by columns.

Here is an example of how the error occurs.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    #     name  salary experience
    # 0  Alice   175.1       None
    # 1  Bobby     NaN       None
    # 2   Carl   190.3       None
    # 3    Dan     NaN       None
    # 4  Ethan   210.5       None
    print(df)

    print('-' * 50)

    df = df[df.notnull().any(axis=0)]
    # ⛔️ pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

    print(df)


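The code sample tries to remove the columns that contain only NaN values from the DataFrame.

However, when you pass a boolean Series to df[], Pandas uses it to filter rows, not columns. The Series returned by df.notnull().any(axis=0) is labeled by the column names, so it can't be aligned with the row index of the DataFrame.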
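The mismatch can be checked directly by comparing the index of the boolean Series with the index of the DataFrame. The following is a minimal sketch that reuses the DataFrame from the example; the variable name `mask` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# One boolean per column, labeled by column name
mask = df.notnull().any(axis=0)

print(mask.index.tolist())  # ['name', 'salary', 'experience']
print(df.index.tolist())    # [0, 1, 2, 3, 4]

# The mask lines up with the columns, not with the rows,
# which is why df[mask] cannot align it.
print(mask.index.equals(df.index))    # False
print(mask.index.equals(df.columns))  # True
```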

# Use the DataFrame.loc indexer when filtering by columns

To solve the error, use the DataFrame.loc indexer when filtering by columns.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    df = df.loc[:, df.notnull().any(axis=0)]

    print(df)


The code for this article is available on GitHub

The DataFrame.loc indexer is used to access a group of rows and columns by label(s) or a boolean array.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df.notnull().any(axis=0))

    print('-' * 50)

    print(df.loc[:, df.notnull().any(axis=0)])

Running the code sample produces the following output.

shell
    name           True
    salary         True
    experience    False
    dtype: bool
    --------------------------------------------------
        name  salary
    0  Alice   175.1
    1  Bobby     NaN
    2   Carl   190.3
    3    Dan     NaN
    4  Ethan   210.5
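Because `.loc` takes one selector per axis, a row mask and a column mask can be combined in a single call. The following is a minimal sketch, assuming you also want to keep only the rows that have a salary; `row_mask`, `col_mask` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# Labeled by the row index: True where a salary is present
row_mask = df['salary'].notnull()

# Labeled by the column names: True where a column has any non-NaN value
col_mask = df.notnull().any(axis=0)

# Rows go before the comma, columns after it
result = df.loc[row_mask, col_mask]

# Keeps rows 0, 2 and 4 and the columns name and salary
print(result)
```

Each mask is aligned with its own axis, so neither of them triggers the error.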


# Filtering the columns first before using bracket notation

Alternatively, you can solve the error by filtering by columns first and then using bracket notation.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    df = df[df.columns[df.notnull().any(axis=0)]]

    print(df)

Running the code sample produces the following output.

shell
        name  salary experience
    0  Alice   175.1       None
    1  Bobby     NaN       None
    2   Carl   190.3       None
    3    Dan     NaN       None
    4  Ethan   210.5       None
    --------------------------------------------------
        name  salary
    0  Alice   175.1
    1  Bobby     NaN
    2   Carl   190.3
    3    Dan     NaN
    4  Ethan   210.5


The DataFrame.columns property returns an Index object that contains the column names.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df.columns)

    print('-' * 50)

    print(df.columns[df.notnull().any(axis=0)])


# Solving the error with the DataFrame.dropna() method

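You can also avoid the error altogether by calling DataFrame.dropna() with the how parameter set to "all" and the axis parameter set to 1. This drops the columns that contain only NaN values without any boolean indexing.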

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    df = df.dropna(how='all', axis=1)

    print(df)

Running the code sample produces the following output.

shell
        name  salary experience
    0  Alice   175.1       None
    1  Bobby     NaN       None
    2   Carl   190.3       None
    3    Dan     NaN       None
    4  Ethan   210.5       None
    --------------------------------------------------
        name  salary
    0  Alice   175.1
    1  Bobby     NaN
    2   Carl   190.3
    3    Dan     NaN
    4  Ethan   210.5


The DataFrame.dropna() method removes the rows or columns that contain missing values from the DataFrame.

main.py
    df = df.dropna(how='all', axis=1)

We set the axis parameter to 1 so the method drops columns instead of rows.

When the how parameter is set to "all", all values in a column have to be NA for the method to drop it. With the default value of "any", a single NA value is enough for the column to be dropped.
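The difference between the two values of the how parameter can be seen by running both on the same data. The following is a minimal sketch that reuses the DataFrame from the example; the variable names `all_missing` and `any_missing` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# how='all' drops a column only if every value in it is missing
all_missing = df.dropna(how='all', axis=1)
print(all_missing.columns.tolist())  # ['name', 'salary']

# how='any' (the default) drops a column if at least one value is missing
any_missing = df.dropna(how='any', axis=1)
print(any_missing.columns.tolist())  # ['name']
```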

You can achieve the same result by setting the thresh argument to 1.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    df = df.dropna(thresh=1, axis=1)

    print(df)

Running the code sample produces the following output.

shell
        name  salary experience
    0  Alice   175.1       None
    1  Bobby     NaN       None
    2   Carl   190.3       None
    3    Dan     NaN       None
    4  Ethan   210.5       None
    --------------------------------------------------
        name  salary
    0  Alice   175.1
    1  Bobby     NaN
    2   Carl   190.3
    3    Dan     NaN
    4  Ethan   210.5


When the thresh argument is set to 1, the method drops all columns that don't have at least 1 non-NA value.

main.py
    df = df.dropna(thresh=1, axis=1)

In other words, all columns with all NA values are dropped.
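The thresh argument is a minimum count of non-NA values, so larger values make the filter stricter. The following is a minimal sketch that reuses the DataFrame from the example; the values 3 and 4 and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# Number of non-NA values per column: name has 5, salary 3, experience 0
non_na_counts = df.count()
print(non_na_counts)

# salary has exactly 3 non-NA values, so it survives thresh=3
at_least_3 = df.dropna(thresh=3, axis=1)
print(at_least_3.columns.tolist())  # ['name', 'salary']

# ...but not thresh=4
at_least_4 = df.dropna(thresh=4, axis=1)
print(at_least_4.columns.tolist())  # ['name']
```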

# Forgetting to use the .str attribute

You might also get the error when you forget to use the str attribute when slicing the string values of a column.

For example, the following code sample causes the error.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    new_df = df[(df['name'][0:2] != 'Bo')]

    print(new_df)


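The code sample attempts to filter out the rows whose value in the name column starts with "Bo". However, df['name'][0:2] slices the Series itself and returns its first 2 rows, not the first 2 characters of each string. The resulting boolean Series has only 2 elements, so it can't be aligned with the 5 rows of the DataFrame.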
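The difference between slicing the Series and slicing each string can be checked on the column alone. The following is a minimal sketch; the standalone Series `names` and the mask variables are illustrative stand-ins for df['name'].

```python
import pandas as pd

names = pd.Series(['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'])

# Slicing the Series itself selects the first two rows
print(names[0:2].tolist())      # ['Alice', 'Bobby']

# Slicing through .str selects the first two characters of every value
print(names.str[0:2].tolist())  # ['Al', 'Bo', 'Ca', 'Da', 'Et']

# Only the second mask has one entry per row
rows_mask = names[0:2] != 'Bo'
chars_mask = names.str[0:2] != 'Bo'
print(len(rows_mask))   # 2
print(len(chars_mask))  # 5
```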

To solve the error, use the .str attribute after selecting the values in the "name" column.

main.py
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
        'salary': [175.1, None, 190.3, None, 210.5],
        'experience': [None, None, None, None, None],
    })

    print(df)

    print('-' * 50)

    new_df = df[(df['name'].str[0:2] != 'Bo')]

    print(new_df)

Running the code sample produces the following output.

shell
        name  salary experience
    0  Alice   175.1       None
    1  Bobby     NaN       None
    2   Carl   190.3       None
    3    Dan     NaN       None
    4  Ethan   210.5       None
    --------------------------------------------------
        name  salary experience
    0  Alice   175.1       None
    2   Carl   190.3       None
    3    Dan     NaN       None
    4  Ethan   210.5       None
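A prefix check like this can also be written with Series.str.startswith(), which states the intent more directly than a slice. The following is a minimal sketch that reuses the DataFrame from the example and is offered as an alternative, not as the approach used above; the `~` operator negates the boolean mask.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# True for every row whose name starts with "Bo"
starts_with_bo = df['name'].str.startswith('Bo')

# Keep the rows where the mask is False
new_df = df[~starts_with_bo]

print(new_df['name'].tolist())  # ['Alice', 'Carl', 'Dan', 'Ethan']
```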

