Pandas: Select the Rows where two Columns are Equal

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
4 min

banner

# Table of Contents

  1. Pandas: Select the Rows where two Columns are Equal
  2. Pandas: Select the Rows where two Columns are NOT Equal
  3. Pandas: Select the Rows where two Columns are Equal using df.query()
  4. Pandas: Select the Rows where two Columns are NOT Equal using df.query()

# Pandas: Select the Rows where two Columns are Equal

To select the rows where two columns are equal in a Pandas DataFrame:

  1. Use the DataFrame.loc indexer for indexing based on a boolean array.
  2. Specify a condition that compares the cell values of column A vs column B.
main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_equal = df.loc[(df['B'] == df['C'])] print('-' * 50) print(cols_equal)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
A B C 0 Alice 10 10 1 Bobby 20 15 2 Carl 30 30 3 Dan 50 25 -------------------------------------------------- A B C 0 Alice 10 10 2 Carl 30 30

select rows where two columns are equal in pandas

The condition returns a Series with True values for all matching elements and False for all elements that don't match.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) # 0 True # 1 False # 2 True # 3 False # dtype: bool print(df['B'] == df['C'])
The code for this article is available on GitHub

If the corresponding rows in columns B and C are equal, then the index of the Series is set to True.

The last step is to use the DataFrame.loc indexer to access the group for rows for which the condition returns True.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.loc[(df['B'] == df['C'])] # A B C # 0 Alice 10 10 # 2 Carl 30 30 print(cols_equal)

# Pandas: Select the Rows where two Columns are NOT Equal

If you need to select the rows where two columns are not equal, use the inequality != operator.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_not_equal = df.loc[(df['B'] != df['C'])] print('-' * 50) # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_not_equal)

select rows where two columns are not equal in pandas dataframe

The code for this article is available on GitHub

The example selects the rows where the cell value in column B is not equal to the one in column C.

Notice that we used the inequality operator != and not the equality == operator.

You can achieve the same result by using the tilde ~ operator to negate the result of the comparison.

The following code sample also selects the rows where two columns are NOT equal.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(df) cols_not_equal = df.loc[~(df['B'] == df['C'])] print('-' * 50) # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_not_equal)

select rows where two columns are not equal using tilde operator

The code for this article is available on GitHub

The tilde ~ operator is used to toggle each boolean value (True becomes False and vice versa).

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) print(~(df['B'] == df['C'])) print('-' * 50) print(df['B'] == df['C'])

Running the code sample produces the following output.

shell
0 False 1 True 2 False 3 True dtype: bool -------------------------------------------------- 0 True 1 False 2 True 3 False dtype: bool

When using the tilde ~ operator, you have to make sure to wrap the condition in parentheses so that it is applied to the entire expression.

main.py
print(~(df['B'] == df['C']))

# Pandas: Select the Rows where two Columns are Equal using df.query()

You can also use the DataFrame.query() method to select the rows where two columns are equal.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.query('B == C') # A B C # 0 Alice 10 10 # 2 Carl 30 30 print(cols_equal)

select rows where two columns are equal using query

The code for this article is available on GitHub

The DataFrame.query() method enables us to query the columns of the DataFrame with a boolean condition.

The code sample compares the values in the B and C columns and returns a new DataFrame that only contains the matching rows.

# Pandas: Select the Rows where two Columns are NOT Equal using df.query()

You can also use the DataFrame.query() method to select the rows where two columns are NOT equal.

main.py
import pandas as pd df = pd.DataFrame({ 'A': ['Alice', 'Bobby', 'Carl', 'Dan'], 'B': [10, 20, 30, 50], 'C': [10, 15, 30, 25], }) cols_equal = df.query('B != C') # A B C # 1 Bobby 20 15 # 3 Dan 50 25 print(cols_equal)

select rows where two columns are not equal using query

The code for this article is available on GitHub

We used the inequality operator != to select the rows where the cell values in column B are not equal to those in column C.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.