Last updated: Apr 12, 2024
Reading time · 4 min

Use the drop_duplicates() method to "select distinct" across multiple
DataFrame columns in Pandas.
The method will return a new DataFrame object with the duplicate rows
removed.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

#   Animal  Max Speed
# 0    Cat         25
# 2    Cat         40
# 3    Dog         45
# 5    Dog         65
print(df.drop_duplicates())

The code sample shows how to get the unique rows, that is, the distinct
combinations of values across multiple DataFrame columns.
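If the DataFrame has more columns than the ones you care about, one common approach is to select the columns first and then drop the duplicates. The sketch below assumes a hypothetical extra `Owner` column that is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65],
    # Hypothetical extra column, added only for illustration
    'Owner': ['Ann', 'Bob', 'Ann', 'Bob', 'Ann', 'Bob'],
})

# Roughly comparable to: SELECT DISTINCT "Animal", "Max Speed" FROM df
distinct = df[['Animal', 'Max Speed']].drop_duplicates()

print(distinct)
```

The `Owner` column is left out before the comparison, so two rows count as duplicates when their `Animal` and `Max Speed` values match.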
Note: if you need to get the unique values across multiple columns contained
in a single array, use the pandas.unique() method.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Animal2': ['Cat', 'Dog', 'Lizard', 'Monkey', 'Lizard', 'Cat']
})

# ['Cat' 'Dog' 'Lizard' 'Monkey']
print(pd.unique(df[['Animal', 'Animal2']].values.ravel('K')))
The values attribute returns a multi-dimensional array, which we then flatten into a one-dimensional array with the ravel() method.
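To make the flattening step more concrete, here is a small sketch that inspects the array shapes before and after calling `ravel()`. The variable names `values` and `flat` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Animal2': ['Cat', 'Dog', 'Lizard', 'Monkey', 'Lizard', 'Cat']
})

# 2-D NumPy array: 6 rows x 2 columns
values = df[['Animal', 'Animal2']].values

# 1-D array that contains all 12 elements
flat = values.ravel('K')

print(values.shape)  # (6, 2)
print(flat.shape)    # (12,)
```

`pandas.unique()` expects one-dimensional input, which is why the array is flattened first.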
The DataFrame.drop_duplicates() method returns a new DataFrame object with the
duplicate rows removed.
By default, the method doesn't mutate the DataFrame in place, so make sure to
assign the result to a variable.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

print(df)

print('-' * 50)

df = df.drop_duplicates()

print(df)
Running the code sample produces the following output.
  Animal  Max Speed
0    Cat         25
1    Cat         25
2    Cat         40
3    Dog         45
4    Dog         45
5    Dog         65
--------------------------------------------------
  Animal  Max Speed
0    Cat         25
2    Cat         40
3    Dog         45
5    Dog         65

Alternatively, you can set the inplace argument to True to mutate the
original DataFrame object.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

print(df)

print('-' * 50)

df.drop_duplicates(inplace=True)

print(df)
Running the code sample produces the following output.
  Animal  Max Speed
0    Cat         25
1    Cat         25
2    Cat         40
3    Dog         45
4    Dog         45
5    Dog         65
--------------------------------------------------
  Animal  Max Speed
0    Cat         25
2    Cat         40
3    Dog         45
5    Dog         65
When the inplace argument is set to True, the DataFrame is modified in
place and None is returned.
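A common pitfall that follows from this is assigning the result of an in-place call back to a variable. The sketch below illustrates the behavior described above; the `result` variable is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

# With inplace=True the method modifies df and returns None,
# so don't assign the result back to df.
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # 4
```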
By default, the DataFrame.drop_duplicates method considers all of the columns
when identifying duplicates.
If you only want to consider some of the columns, supply the subset argument.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

print(df)

print('-' * 50)

df.drop_duplicates(subset=['Animal'], inplace=True)

print(df)
Running the code sample produces the following output.
  Animal  Max Speed
0    Cat         25
1    Cat         25
2    Cat         40
3    Dog         45
4    Dog         45
5    Dog         65
--------------------------------------------------
  Animal  Max Speed
0    Cat         25
3    Dog         45
We only passed the Animal column to the subset list, so the method will only
consider the specified column for identifying duplicates.
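When only a subset of columns is compared, it can matter which of the duplicate rows survives. The sketch below uses the `keep` parameter of `drop_duplicates()`, which the example above does not show. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

# keep='first' is the default: the first row per animal is kept
keep_first = df.drop_duplicates(subset=['Animal'])

# keep='last' keeps the last row per animal instead
keep_last = df.drop_duplicates(subset=['Animal'], keep='last')

# keep=False drops every row whose Animal value occurs more than once
no_repeats = df.drop_duplicates(subset=['Animal'], keep=False)

print(keep_first['Max Speed'].tolist())  # [25, 45]
print(keep_last['Max Speed'].tolist())   # [40, 65]
print(len(no_repeats))                   # 0
```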
You can also use a for loop with Series.unique() to select the distinct
values in each of multiple DataFrame columns.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

a_dict = {}

for column in df:
    a_dict[column] = df[column].unique()

# {'Animal': array(['Cat', 'Dog'], dtype=object),
#  'Max Speed': array([25, 40, 45, 65])}
print(a_dict)
We used a for loop to iterate over the DataFrame, which yields its column labels.
On each iteration, we use the Series.unique() method to get an array that stores the unique values for the current column.
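The same loop can also be written more compactly as a dict comprehension. This is a minimal sketch of an equivalent form, and the name `unique_by_column` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

# Iterating over a DataFrame yields its column labels
unique_by_column = {column: df[column].unique() for column in df}

for column, values in unique_by_column.items():
    print(column, list(values))
```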
You should also use the unique() method if you need to find the unique values
in a specific column.
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Max Speed': [25, 25, 40, 45, 45, 65]
})

print(df['Animal'].unique())  # ['Cat' 'Dog']

print(df['Max Speed'].unique())  # [25 40 45 65]
You can also use the numpy.unique() method if you need to get the unique
values across multiple columns in a single array.
First, make sure you have the numpy module installed (it is a dependency of pandas, so it is usually already present).
Open your terminal and run the following command.
pip install numpy

# or with pip3
pip3 install numpy
Now, import the module and use numpy.unique().
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Animal': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog'],
    'Animal2': ['Cat', 'Dog', 'Lizard', 'Monkey', 'Lizard', 'Cat']
})

# ['Cat' 'Dog' 'Lizard' 'Monkey']
print(np.unique(df[['Animal', 'Animal2']].values))

We used bracket notation to select the Animal and Animal2 columns and passed
the underlying NumPy array (the values attribute) to numpy.unique().
The numpy.unique method takes an array-like object and returns the sorted unique elements of the array.
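One practical difference between the two functions is the order of the result: `numpy.unique()` sorts the values, whereas `pandas.unique()` returns them in order of first appearance. The sketch below uses hypothetical sample data, chosen so that the two orders differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the examples above
animals = pd.Series(['Dog', 'Cat', 'Dog', 'Bird'])

# Order of first appearance: Dog, Cat, Bird
by_appearance = list(pd.unique(animals))

# Sorted order: Bird, Cat, Dog
sorted_values = list(np.unique(animals.to_numpy()))

print(by_appearance)
print(sorted_values)
```

If the original order of the values matters, `pandas.unique()` is likely the better fit. If you want a sorted result, `numpy.unique()` gives you one directly.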