Last updated: Apr 12, 2024
Reading time · 6 min
To check if all values in a column are equal in Pandas, use the to_numpy() method to convert the column to an array, then compare the first element to every element of the array and call all() on the result.

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [3, 3, 3, 3],
        'salary': [175.1, 180.2, 190.3, 205.4],
    })


    def values_in_column_equal(col):
        arr = col.to_numpy()
        return (arr[0] == arr).all()


    # 👇️ True
    print(values_in_column_equal(df['experience']))

    # 👇️ False
    print(values_in_column_equal(df['name']))
The Series.to_numpy() method converts the column to a NumPy array (DataFrame.to_numpy() does the same for an entire DataFrame).
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [3, 3, 3, 3],
        'salary': [175.1, 180.2, 190.3, 205.4],
    })

    # 👇️ [3 3 3 3]
    print(df['experience'].to_numpy())
We selected the first element in the array (index 0) and compared it to every element in the array.
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [3, 3, 3, 3],
        'salary': [175.1, 180.2, 190.3, 205.4],
    })

    # 👇️ [3 3 3 3]
    arr = df['experience'].to_numpy()

    # 👇️ [ True  True  True  True]
    print(arr[0] == arr)

    # 👇️ True
    print((arr[0] == arr).all())
If the condition returns True for all array elements, then all values in the column are equal.
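Two edge cases follow from how the comparison works: indexing arr[0] fails on an empty column, and NaN never compares equal to itself. The sketch below is one possible way to handle the first case. The function name values_in_column_equal_safe is hypothetical, and treating an empty column as "all equal" is an assumption you may want to change.

```python
import numpy as np
import pandas as pd


def values_in_column_equal_safe(col):
    """Hypothetical variant of values_in_column_equal that tolerates empty columns."""
    arr = col.to_numpy()
    if arr.size == 0:
        # Assumption: an empty column counts as "all equal".
        return True
    return bool((arr[0] == arr).all())


print(values_in_column_equal_safe(pd.Series([3, 3, 3])))          # True
print(values_in_column_equal_safe(pd.Series([], dtype='int64')))  # True

# NaN never compares equal to itself, so an all-NaN column reports False.
print(values_in_column_equal_safe(pd.Series([np.nan, np.nan])))   # False
```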
When using this approach, make sure to call the function with a DataFrame column and not with an entire DataFrame.
    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [3, 3, 3, 3],
        'salary': [175.1, 180.2, 190.3, 205.4],
    })


    def values_in_column_equal(col):
        arr = col.to_numpy()
        return (arr[0] == arr).all()


    # 👇️ True
    print(values_in_column_equal(df['experience']))

    # 👇️ False
    print(values_in_column_equal(df['name']))
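To see why the function should receive a single column, here is a small sketch of what happens with a whole DataFrame. The two-column DataFrame is a trimmed-down version of the one above, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'experience': [3, 3, 3, 3],
    'salary': [175.1, 180.2, 190.3, 205.4],
})

arr = df.to_numpy()           # 2D array of shape (4, 2)
comparison = (arr[0] == arr)  # first row compared against every row

print(comparison.shape)  # (4, 2)

# Without an axis, all() reduces over rows and columns at once,
# so the result for the 'experience' column is lost.
print(comparison.all())  # False
```

The single False says only that at least one column contains differing values, not which one.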
If you need to check if all values in a column are equal for every column of an entire DataFrame, set the axis to 0 when calling the all() method.
    import pandas as pd


    def values_in_column_equal(df_):
        arr = df_.to_numpy()
        return (arr[0] == arr).all(0)


    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
        'experience': [3, 3, 3, 3],
        'salary': [175.1, 180.2, 190.3, 205.4],
    })

    # 👇️ [False  True False]
    print(values_in_column_equal(df))
As shown in the code sample, only the values in the experience column are equal.
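The bare boolean array does not say which column each result belongs to. One way to keep the labels, shown here as a sketch, is to wrap the array in a Series indexed by the column names. The variable name labeled is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [3, 3, 3, 3],
    'salary': [175.1, 180.2, 190.3, 205.4],
})

arr = df.to_numpy()

# Wrap the boolean array in a Series so each result carries its column name.
labeled = pd.Series((arr[0] == arr).all(0), index=df.columns)
print(labeled)

# Names of the columns whose values are all equal.
print(labeled[labeled].index.tolist())  # ['experience']
```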
If you need to check if all columns of a DataFrame are equal to a given value, use the DataFrame.eq() method.
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
    })

    value = 1

    # a    True
    # b    True
    # dtype: bool
    print(df.eq(value).all(axis=0))
The DataFrame.eq() method returns a DataFrame of boolean values with the results of the comparison.
We compared each value to 1 and got True values for columns a and b.
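For contrast, here is a sketch where one column contains a value that does not match. The data is hypothetical and chosen only to produce a False result.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1],
    'b': [1, 2, 1],  # hypothetical data: one value differs
})

# One boolean per column: does every value in the column equal 1?
per_column = df.eq(1).all(axis=0)
print(per_column)

# Reduce once more to ask whether every cell in the DataFrame equals 1.
print(bool(per_column.all()))  # False
```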
You can use a similar approach to find the rows where all columns are equal.
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    # 👇️ check all columns against the first column
    print(df.eq(df.iloc[:, 0], axis=0))

    print('-' * 50)

    print(df.eq(df.iloc[:, 0], axis=0).all(1))
Running the code sample produces the following output.
          a     b      c      d
    0  True  True   True   True
    1  True  True  False  False
    2  True  True  False  False
    --------------------------------------------------
    0     True
    1    False
    2    False
    dtype: bool
Once we check all columns against the first column, we can call the all() method with the axis set to 1 to see if all columns are equal for each row.
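The resulting boolean Series can also be used as a mask. As a brief sketch, the snippet below keeps only the rows where all columns are equal. The variable name mask is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1],
    'b': [1, 1, 1],
    'c': [1, 2, 3],
    'd': [1, 2, 3],
})

# True for rows where every column matches the first column.
mask = df.eq(df.iloc[:, 0], axis=0).all(axis=1)

# Boolean indexing keeps only the rows where the mask is True.
print(df[mask])
```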
You could achieve the same result by using the DataFrame.values attribute.
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    values = df.values
    print(values)

    print('-' * 50)

    result = (values == values[:, [0]]).all(axis=1)
    print(result)
Running the code sample produces the following output.
    [[1 1 1 1]
     [1 1 2 2]
     [1 1 3 3]]
    --------------------------------------------------
    [ True False False]
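The list inside values[:, [0]] matters: it keeps the first column as a 2D array of shape (3, 1), which NumPy can broadcast across the four columns of each row. A plain values[:, 0] would have shape (3,), which does not line up with rows of length 4. The sketch below rebuilds the same array directly with NumPy to show the shapes.

```python
import numpy as np

values = np.array([
    [1, 1, 1, 1],
    [1, 1, 2, 2],
    [1, 1, 3, 3],
])

print(values[:, 0].shape)    # (3,)
print(values[:, [0]].shape)  # (3, 1)

# The (3, 1) column is broadcast across the 4 columns of each row.
result = (values == values[:, [0]]).all(axis=1)
print(result)  # [ True False False]
```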
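You can get the same output by using the DataFrame.iloc indexer. Note that this sample compares the columns against the constant 1 rather than against the first column, and that df.iloc[:, :-1] leaves out the last column (d).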
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    df['result'] = (df.iloc[:, :-1] == 1).all(1)

    #    a  b  c  d  result
    # 0  1  1  1  1    True
    # 1  1  1  2  2   False
    # 2  1  1  3  3   False
    print(df)
Only the values in the first row are equal for all columns.
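Comparing against the constant 1 only works when the shared value is known in advance. A possible iloc-based variant, sketched below with hypothetical data, compares the remaining columns against the first column instead, so a row such as 2, 2, 2 also counts as equal.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [1, 2, 4],
    'c': [1, 2, 3],
})

first = df.iloc[:, 0]   # the first column
rest = df.iloc[:, 1:]   # every column after the first

# Row-wise: does every remaining column match the first column?
df['result'] = rest.eq(first, axis=0).all(axis=1)

print(df)
```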
The code sample outputs the results as booleans (True and False). However, you might also want to output the results as integers: 1 (for True) and 0 (for False).
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    df['result'] = (df.iloc[:, :-1] == 1).all(1).astype(int)

    #    a  b  c  d  result
    # 0  1  1  1  1       1
    # 1  1  1  2  2       0
    # 2  1  1  3  3       0
    print(df)
Use the DataFrame.apply() method if you need to check if specific columns in a Pandas DataFrame are equal.
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    df['result'] = df.apply(
        lambda x: x['a'] == x['b'],
        axis=1
    )

    #    a  b  c  d  result
    # 0  1  1  1  1    True
    # 1  1  1  2  2    True
    # 2  1  1  3  3    True
    print(df)
The code sample shows that the a and b columns are equal for all 3 rows.
The same approach can be used to check if more than 2 specific columns are equal.
    import pandas as pd

    df = pd.DataFrame({
        'a': [1, 1, 1],
        'b': [1, 1, 1],
        'c': [1, 2, 3],
        'd': [1, 2, 3],
    })

    df['result'] = df.apply(
        lambda x: x['a'] == x['b'] == x['c'],
        axis=1
    )

    #    a  b  c  d  result
    # 0  1  1  1  1    True
    # 1  1  1  2  2   False
    # 2  1  1  3  3   False
    print(df)
As shown in the output, the columns a, b and c are equal for the first row only.
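As the list of columns grows, chaining comparisons inside a lambda gets unwieldy. A possible alternative, sketched here by reusing the DataFrame.eq() approach from earlier in the article rather than apply(), selects the columns by name and compares them against the first one. The cols list is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1],
    'b': [1, 1, 1],
    'c': [1, 2, 3],
    'd': [1, 2, 3],
})

cols = ['a', 'b', 'c']  # the specific columns to compare

# Compare each selected column against the first selected column, row by row.
df['result'] = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

print(df)
```

This yields the same result column as the apply() version for this data, without calling a Python function once per row.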