Last updated: Apr 12, 2024
Reading time·4 min
To select the rows where two columns are equal in a Pandas DataFrame
:
DataFrame.loc
indexer for indexing based on a boolean array.A
vs column
B
.import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_equal = df.loc[(df['B'] == df['C'])] print('-' * 50) print(cols_equal)
Running the code sample produces the following output.
A B C 0 Alice 10 10 1 Bobby 20 15 2 Carl 30 30 3 Dan 50 25 -------------------------------------------------- A B C 0 Alice 10 10 2 Carl 30 30
The condition returns a Series
with True
values for all matching elements
and False
for all elements that don't match.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) # 0 True # 1 False # 2 True # 3 False # dtype: bool print(df['B'] == df['C'])
If the corresponding rows in columns B
and C
are equal, then the index of
the Series
is set to True
.
The last step is to use the
DataFrame.loc indexer to
access the group for rows for which the condition returns True
.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.loc[(df['B'] == df['C'])] # A B C # 0 Alice 10 10 # 2 Carl 30 30 print(cols_equal)
If you need to select the rows where two columns are not equal, use the
inequality !=
operator.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_not_equal = df.loc[(df['B'] != df['C'])] print('-' * 50) # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_not_equal)
The example selects the rows where the cell value in column B
is not equal to
the one in column C
.
Notice that we used the inequality operator !=
and not the equality ==
operator.
You can achieve the same result by using
the tilde ~
operator
to negate the result of the comparison.
The following code sample also selects the rows where two columns are NOT equal.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_not_equal = df.loc[~(df['B'] == df['C'])] print('-' * 50) # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_not_equal)
The tilde ~
operator is used to toggle each boolean value (True
becomes
False
and vice versa).
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(~(df['B'] == df['C'])) print('-' * 50) print(df['B'] == df['C'])
Running the code sample produces the following output.
0 False 1 True 2 False 3 True dtype: bool -------------------------------------------------- 0 True 1 False 2 True 3 False dtype: bool
When using the tilde ~
operator, you have to make sure to wrap the condition
in parentheses so that it is applied to the entire expression.
print(~(df['B'] == df['C']))
df.query()
You can also use the DataFrame.query() method to select the rows where two columns are equal.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.query('B == C') # A B C # 0 Alice 10 10 # 2 Carl 30 30 print(cols_equal)
The
DataFrame.query()
method enables us to query the columns of the DataFrame
with a boolean
condition.
The code sample compares the values in the B
and C
columns and returns a new
DataFrame
that only contains the matching rows.
df.query()
You can also use the DataFrame.query()
method to select the rows where two
columns are NOT equal.
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.query('B != C') # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_equal)
We used the inequality operator !=
to select the rows where the cell values in
column B
are not equal to those in column C
.
You can learn more about the related topics by checking out the following tutorials: