# Pandas: Unalignable boolean Series provided as indexer

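The Pandas "pandas.errors.IndexingError: Unalignable boolean Series provided as indexer" error occurs when you pass a boolean Series to the `df[]` bracket syntax and the Series' index doesn't match the DataFrame's row index. A common cause is trying to filter a DataFrame by columns without using the `loc` indexer.

To solve the error, use the `loc` indexer when filtering by columns.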

Here is an example of how the error occurs.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

#     name  salary experience
# 0  Alice   175.1       None
# 1  Bobby     NaN       None
# 2   Carl   190.3       None
# 3    Dan     NaN       None
# 4  Ethan   210.5       None
print(df)

print('-' * 50)

# ⛔️ pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
df = df[df.notnull().any(axis=0)]

print(df)
```


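The code sample tries to remove the columns that contain only missing values from the DataFrame.

However, `df.notnull().any(axis=0)` returns a boolean Series indexed by the column names (`name`, `salary` and `experience`). When you pass a boolean Series to `df[]`, pandas uses it to select rows and aligns it with the row index (`0` to `4`). None of the column names exist in the row index, so the Series can't be aligned and pandas raises the error.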
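To see the difference, the following sketch builds two boolean Series from a small, made-up DataFrame: one indexed by the row labels, which `df[]` accepts, and one indexed by the column labels, which it rejects. It assumes pandas 1.5 or later, where `IndexingError` is importable from `pandas.errors`. Depending on the pandas version, a `UserWarning` about reindexing the key may be emitted before the error, so the sketch silences it.

```python
import warnings

import pandas as pd
from pandas.errors import IndexingError

# Small illustrative DataFrame (not the article's sample data)
df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, None, 190.3],
})

# Row mask: indexed by the row labels 0, 1, 2, so df[] can align it
row_mask = df['salary'].notnull()
rows_kept = df[row_mask]

# Column mask: indexed by the column labels 'name' and 'salary'
col_mask = df.notnull().any(axis=0)

error_message = None
with warnings.catch_warnings():
    # Silence a possible "Boolean Series key will be reindexed" warning
    warnings.simplefilter('ignore', UserWarning)
    try:
        df[col_mask]
    except IndexingError as exc:
        error_message = str(exc)

print(rows_kept)
print(error_message)
```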

# Use the `DataFrame.loc` indexer when filtering by columns

To solve the error, use the DataFrame.loc indexer when filtering by columns.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

df = df.loc[:, df.notnull().any(axis=0)]

print(df)
```


The code for this article is available on GitHub

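The `DataFrame.loc` indexer is used to access a group of rows and columns by label(s) or a boolean array. Its first argument selects rows and its second argument selects columns, so `df.loc[:, mask]` keeps every row (`:`) and aligns the boolean Series `mask` with the column labels.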

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df.notnull().any(axis=0))

print('-' * 50)

print(df.loc[:, df.notnull().any(axis=0)])
```

Running the code sample produces the following output.

```shell
name           True
salary         True
experience    False
dtype: bool
--------------------------------------------------
    name  salary
0  Alice   175.1
1  Bobby     NaN
2   Carl   190.3
3    Dan     NaN
4  Ethan   210.5
```
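Because `.loc` accepts a boolean Series on each axis, you can filter rows and columns in a single call. The following sketch uses the same sample DataFrame; the variable names are illustrative. The row mask is aligned with the row index and the column mask with the column labels.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# Row selector: rows where salary is present (indexed by row labels)
rows_with_salary = df['salary'].notnull()

# Column selector: columns with at least one non-missing value
# (indexed by column labels)
non_empty_columns = df.notnull().any(axis=0)

# .loc takes the row selector first and the column selector second,
# so each boolean Series is aligned against the matching axis
result = df.loc[rows_with_salary, non_empty_columns]

print(result)
```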


# Filtering the columns first before using bracket notation

Alternatively, you can solve the error by first using the boolean Series to select the matching column names and then passing those names to bracket notation.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

df = df[df.columns[df.notnull().any(axis=0)]]

print(df)
```


Running the code sample produces the following output.

```shell
    name  salary experience
0  Alice   175.1       None
1  Bobby     NaN       None
2   Carl   190.3       None
3    Dan     NaN       None
4  Ethan   210.5       None
--------------------------------------------------
    name  salary
0  Alice   175.1
1  Bobby     NaN
2   Carl   190.3
3    Dan     NaN
4  Ethan   210.5
```


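The `DataFrame.columns` property returns an `Index` object that contains the column names. Indexing it with the boolean Series keeps only the names of the columns that have at least one non-missing value, and passing a collection of column names to `df[]` selects those columns.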

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df.columns)

print('-' * 50)

print(df.columns[df.notnull().any(axis=0)])
```
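If you filter out empty columns in several places, you could wrap the same steps in a small function. The following is a minimal sketch; `drop_empty_columns` is a hypothetical helper name, not a pandas API. It converts the filtered `Index` to a list of labels, which `df[]` always treats as a column selection.

```python
import pandas as pd


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only columns with at least one non-NA value."""
    # Boolean Series indexed by the column labels
    mask = df.notnull().any(axis=0)
    # Keep the labels where the mask is True and turn them into a list
    labels = df.columns[mask].tolist()
    # A list of labels passed to df[] selects columns, so nothing is aligned
    # against the row index
    return df[labels]


df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

cleaned = drop_empty_columns(df)
print(cleaned.columns.tolist())
```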


# Solving the error with the `DataFrame.dropna()` method

You can also avoid the error by calling `DataFrame.dropna()` with the `axis` parameter set to `1` and the `how` parameter set to `"all"`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

df = df.dropna(how='all', axis=1)

print(df)
```


Running the code sample produces the following output.

```shell
    name  salary experience
0  Alice   175.1       None
1  Bobby     NaN       None
2   Carl   190.3       None
3    Dan     NaN       None
4  Ethan   210.5       None
--------------------------------------------------
    name  salary
0  Alice   175.1
1  Bobby     NaN
2   Carl   190.3
3    Dan     NaN
4  Ethan   210.5
```


The `DataFrame.dropna()` method removes the rows or columns that contain missing values.

```python
df = df.dropna(how='all', axis=1)
```

We set the `axis` parameter to `1` so the method drops columns rather than rows (the default, `axis=0`, drops rows).
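The effect of `axis` can be checked by running `dropna(how='all')` on both axes of the sample DataFrame. This sketch also uses `axis='columns'`, which pandas accepts as an alias for `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# axis=0 (the default) checks rows: a row is dropped only if every
# value in it is missing, and every row here has a name
rows_result = df.dropna(how='all', axis=0)

# axis=1 checks columns: only 'experience' is entirely missing
cols_result = df.dropna(how='all', axis=1)

# 'columns' is an accepted alias for axis=1
alias_result = df.dropna(how='all', axis='columns')

print(rows_result.shape)
print(cols_result.columns.tolist())
print(cols_result.equals(alias_result))
```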

When the `how` parameter is set to `"all"`, all values in a column have to be NA for the method to drop the column. With the default, `how="any"`, a single NA value is enough.
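Comparing the two `how` values on the sample DataFrame shows why `"all"` is needed here. The default `"any"` would also drop the `salary` column, because it contains some `NaN` values.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# how='any' (the default): drop a column with at least one missing value
any_result = df.dropna(how='any', axis=1)

# how='all': drop a column only if every value in it is missing
all_result = df.dropna(how='all', axis=1)

print(any_result.columns.tolist())
print(all_result.columns.tolist())
```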

You can achieve the same result by setting the thresh argument to 1.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

df = df.dropna(thresh=1, axis=1)

print(df)
```


Running the code sample produces the following output.

```shell
    name  salary experience
0  Alice   175.1       None
1  Bobby     NaN       None
2   Carl   190.3       None
3    Dan     NaN       None
4  Ethan   210.5       None
--------------------------------------------------
    name  salary
0  Alice   175.1
1  Bobby     NaN
2   Carl   190.3
3    Dan     NaN
4  Ethan   210.5
```


When the thresh argument is set to 1, the method drops all columns that don't have at least 1 non-NA value.

```python
df = df.dropna(thresh=1, axis=1)
```

In other words, all columns with all NA values are dropped.
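The `thresh` rule generalizes to any minimum count of non-NA values. The following sketch counts the non-missing values per column and shows that `dropna(thresh=3, axis=1)` keeps the same columns as a boolean column mask passed to `.loc`, which ties this approach back to the `loc` fix for the error.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# Non-missing values per column (indexed by column labels)
counts = df.notnull().sum(axis=0)

# thresh=N keeps a column only if it has at least N non-missing values
kept_thresh_3 = df.dropna(thresh=3, axis=1)
kept_thresh_4 = df.dropna(thresh=4, axis=1)

# The same rule as a boolean column mask passed to .loc
kept_by_mask = df.loc[:, counts >= 3]

print(counts.tolist())
print(kept_thresh_3.columns.tolist())
print(kept_thresh_4.columns.tolist())
print(kept_by_mask.equals(kept_thresh_3))
```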

# Forgetting to use the `.str` attribute

You might also get the error if you forget to use the `.str` accessor when slicing the strings in a column.

For example, the following code sample causes the error.

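```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

# ⛔️ pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
new_df = df[(df['name'][0:2] != 'Bo')]

print(new_df)
```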


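The code sample attempts to filter out the rows whose `name` value starts with `"Bo"`. However, without `.str`, `df['name'][0:2]` slices the Series itself and returns its first two rows (index labels `0` and `1`) instead of the first two characters of each name. Comparing that to `"Bo"` produces a boolean Series with only 2 entries, which can't be aligned with the DataFrame's 5-row index.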
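The difference between the two slices is easy to see on the `name` values alone. In this sketch, `names` is a standalone Series with the default integer index, the same as the DataFrame column.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'])

# Without .str, [0:2] slices the Series itself: the first two ROWS
first_two_rows = names[0:2]

# With .str, [0:2] slices each string: the first two CHARACTERS
prefixes = names.str[0:2]

print(first_two_rows.tolist())  # ['Alice', 'Bobby']
print(prefixes.tolist())        # ['Al', 'Bo', 'Ca', 'Da', 'Et']

# Only the second comparison has one entry per row of the DataFrame
print(len(first_two_rows != 'Bo'), len(prefixes != 'Bo'))
```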

To solve the error, use the .str attribute after selecting the values in the "name" column.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

print(df)

print('-' * 50)

new_df = df[(df['name'].str[0:2] != 'Bo')]

print(new_df)
```

Running the code sample produces the following output.

```shell
    name  salary experience
0  Alice   175.1       None
1  Bobby     NaN       None
2   Carl   190.3       None
3    Dan     NaN       None
4  Ethan   210.5       None
--------------------------------------------------
    name  salary experience
0  Alice   175.1       None
2   Carl   190.3       None
3    Dan     NaN       None
4  Ethan   210.5       None
```
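For a prefix check specifically, `Series.str.startswith()` is another option you could use in place of slicing. The following sketch swaps in that method and inverts the mask with `~`. The mask has one entry per row and shares the DataFrame's index, so it aligns without error.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan', 'Ethan'],
    'salary': [175.1, None, 190.3, None, 210.5],
    'experience': [None, None, None, None, None],
})

# One boolean per row, indexed like df, so df[] can align it
starts_with_bo = df['name'].str.startswith('Bo')

# ~ inverts the mask to keep the rows that do NOT start with "Bo"
new_df = df[~starts_with_bo]

print(new_df)
```

If the column can contain missing values, `str.startswith()` also accepts an `na` argument (for example, `na=False`) that sets the result for those rows, which keeps the mask purely boolean before it is inverted.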