Last updated: Apr 12, 2024
To create a Set from a Series in Pandas:

- Use the Series.unique() method if you need to get an array containing the unique values in the Series.
- Use the set() class if you need to convert the Series to a set object.

import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

unique = s.unique()
print(unique)  # 👉️ [1 2 3 4 5]

# 👇️ <class 'numpy.ndarray'>
print(type(unique))

a_set = set(unique)
print(a_set)  # 👉️ {1, 2, 3, 4, 5}
The Series.unique() method returns the unique values contained in a Series object.
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

unique = s.unique()
print(unique)  # 👉️ [1 2 3 4 5]
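The unique() method returns the unique values as a NumPy array, in order of first appearance (the values are not sorted). The elements of the array are NumPy scalars such as numpy.int64, so a set built from the array contains NumPy scalars too; with NumPy 2.x, such a set prints as {np.int64(1), np.int64(2), ...} rather than {1, 2, 3, 4, 5}.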
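A short sketch of that behavior, assuming a made-up Series with repeated, unsorted values: unique() keeps the order in which values first appear, and the array's tolist() method converts the NumPy scalars to plain Python ints.

```python
import pandas as pd

# Hypothetical data: values repeat and are not sorted
s = pd.Series([3, 1, 3, 2, 1])

# unique() keeps values in order of first appearance; it does not sort them
unique = s.unique()
print(unique)  # 👉️ [3 1 2]

# tolist() converts the NumPy scalars to plain Python ints
as_list = unique.tolist()
print(as_list)  # 👉️ [3, 1, 2]
print(type(as_list[0]))  # 👉️ <class 'int'>
```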
If you need to get the result as a set, you can pass the Series to the set() constructor instead.
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

a_set = set(s)
print(a_set)  # 👉️ {1, 2, 3, 4, 5}

print(type(a_set))  # 👉️ <class 'set'>
Set objects are an unordered collection of unique elements.
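To see how the two approaches differ in element types, here is a sketch that builds one set directly from the Series and one from unique(). Iterating a Series yields plain Python scalars, while iterating a NumPy array yields NumPy scalars; the two sets still compare equal because numpy.int64 values hash and compare like the equivalent Python ints.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

# Iterating a Series yields plain Python scalars
from_series = set(s)
print(type(next(iter(from_series))))  # 👉️ <class 'int'>

# unique() returns a NumPy array, whose elements are NumPy scalars
from_unique = set(s.unique())
print(type(next(iter(from_unique))))  # 👉️ <class 'numpy.int64'>

# The sets are equal because numpy.int64 hashes and compares like int
print(from_series == from_unique)  # 👉️ True
```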
If you need to convert a DataFrame column (which is a Series) to a set, access the column first and then call the unique() method or the set() constructor on it.
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [100, 100, 100, 200]
})

unique = df['salary'].unique()
print(unique)  # 👉️ [100 200]

a_set = set(df['salary'])
print(a_set)  # 👉️ {200, 100}
We used bracket notation [] to access the salary column as a Series before calling the unique() method.
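The same bracket-notation access works for every column. A minimal sketch, reusing the example DataFrame, that builds one set per column with a dict comprehension (column_sets is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'salary': [100, 100, 100, 200],
})

# Build one set per column by accessing each column with bracket notation
column_sets = {column: set(df[column]) for column in df.columns}

print(column_sets['salary'] == {100, 200})  # 👉️ True
print(sorted(column_sets['name']))  # 👉️ ['Alice', 'Bobby', 'Carl', 'Dan']
```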
If you need an array containing the unique values in the Series, the unique() method will suffice.

If you need a set object, use the set() constructor.
Note that set objects are not ordered.

Calling unique() before passing the values to the set() constructor

If you work with large Series objects, it can be faster to:

1. Use the unique() method to remove the duplicates from the Series.
2. Pass the resulting array of unique elements to the set() constructor.

import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

a_set = set(s.unique())
print(a_set)  # 👉️ {1, 2, 3, 4, 5}
We first remove the duplicates from the Series using unique() and then pass the resulting NumPy array of unique values to the set() constructor.
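This is typically faster for large Series objects that contain many duplicates, because unique() removes them in compiled code, so set() only has to process the unique values.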
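If you want to check the difference on your own data, here is a rough timeit sketch. The test Series (one million integers drawn from 100 distinct values) is an assumption chosen to have many duplicates; actual timings depend on your data, hardware, and pandas/NumPy versions, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: many duplicates (only 100 distinct values)
rng = np.random.default_rng(seed=0)
s = pd.Series(rng.integers(0, 100, size=1_000_000))

# Time each approach over a few runs
direct = timeit.timeit(lambda: set(s), number=3)
dedup_first = timeit.timeit(lambda: set(s.unique()), number=3)

print(f"set(s):          {direct:.3f}s")
print(f"set(s.unique()): {dedup_first:.3f}s")
```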
Create a Set from a Series using a for loop

You can also use a basic for loop to create a set from a Series in Pandas.
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

a_set = set()

for element in s.unique():
    a_set.add(element)

print(a_set)  # 👉️ {1, 2, 3, 4, 5}
We used a for loop to iterate over the unique values in the Series and used the set.add() method to add each element to the set.
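The loop can also be written as a set comprehension. A short sketch; the second comprehension adds a filter condition (element > 2, chosen arbitrarily) to show where a loop-style approach is more flexible than a plain set() call:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

# Same result as the for loop that calls set.add() on each element
a_set = {element for element in s}
print(a_set)  # 👉️ {1, 2, 3, 4, 5}

# Only keep the values that satisfy a condition
filtered = {element for element in s if element > 2}
print(filtered)  # 👉️ {3, 4, 5}
```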
You don't necessarily have to call the unique() method to achieve the same result.
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

a_set = set()

for element in s:
    a_set.add(element)

print(a_set)  # 👉️ {1, 2, 3, 4, 5}
The code sample achieves the same result because set objects only store unique elements, so no duplicates can be added to the set.
In other words, adding a value that is already in a set is a no-op (no operation).
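A small sketch of that no-op behavior, using a hypothetical skipped counter to record how many values were already in the set when the loop reached them:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 1, 4, 5, 5])

a_set = set()
skipped = 0  # hypothetical counter, only for illustration

for element in s:
    if element in a_set:
        skipped += 1  # already present, so add() below changes nothing
    a_set.add(element)

print(len(s), len(a_set), skipped)  # 👉️ 8 5 3
```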