Last updated: Apr 12, 2024
Reading time·5 min
To create a tuple from two DataFrame
columns in Pandas:
zip()
function to get a zip
object of tuples with the values of
the two columns.zip
object to a list.DataFrame
column.import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) df['stats'] = list(zip(df['salary'], df['experience'])) # first_name salary experience stats # 0 Alice 175.1 10 (175.1, 10) # 1 Bobby 180.2 15 (180.2, 15) # 2 Carl 190.3 20 (190.3, 20) print(df)
The zip() function iterates over several iterables in parallel and produces tuples with an item from each iterable.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [(175.1, 10), (180.2, 15), (190.3, 20)] print(list(zip(df['salary'], df['experience'])))
The first item in each tuple is the salary
value and the second is the
experience
value.
Once we've created the list of tuples, we can add it as a column to the
DataFrame
using bracket notation.
df['stats'] = list(zip(df['salary'], df['experience'])) # first_name salary experience stats # 0 Alice 175.1 10 (175.1, 10) # 1 Bobby 180.2 15 (180.2, 15) # 2 Carl 190.3 20 (190.3, 20) print(df)
The zip()
function returns a zip
object, so make sure to convert the result
to a list by using the list class.
apply()
You can also use the
DataFrame.apply()
method to create a tuple from two DataFrame
columns.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) df['stats'] = df[['salary', 'experience']].apply(tuple, axis=1) # first_name salary experience stats # 0 Alice 175.1 10 (175.1, 10.0) # 1 Bobby 180.2 15 (180.2, 15.0) # 2 Carl 190.3 20 (190.3, 20.0) print(df)
The DataFrame.apply()
method applies a function along an axis of the
DataFrame
.
We set the axis
to 1
, so the given function is applied to each row.
df['stats'] = df[['salary', 'experience']].apply(tuple, axis=1)
We passed the tuple
class as the first argument to apply()
, so the values of
the salary
and experience
columns get converted to a tuple.
Notice that we used two sets of square brackets when accessing the values of the two columns.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # salary experience # 0 175.1 10 # 1 180.2 15 # 2 190.3 20 print(df[['salary', 'experience']])
itertuples()
You can also use the
DataFrame.itertuples()
method to create a tuple from two DataFrame
columns.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) df['stats'] = list( df[['salary', 'experience']].itertuples( index=False, name=None ) ) # first_name salary experience stats # 0 Alice 175.1 10 (175.1, 10) # 1 Bobby 180.2 15 (180.2, 15) # 2 Carl 190.3 20 (190.3, 20) print(df)
The method iterates over the rows of the DataFrame
as named tuples.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [(175.1, 10), (180.2, 15), (190.3, 20)] print( list(df[['salary', 'experience']].itertuples( index=False, name=None )) )
Notice that we set the name
argument to None
.
This is necessary because, by default, the tuples are named.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [Pandas(salary=175.1, experience=10), Pandas(salary=180.2, experience=15), Pandas(salary=190.3, experience=20)] print( list(df[['salary', 'experience']].itertuples( index=False, )) )
We also had to set the index
argument to False
.
If you don't, the index is the first element of each tuple.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [(0, 175.1, 10), (1, 180.2, 15), (2, 190.3, 20)] print( list(df[['salary', 'experience']].itertuples( name=None )) )
If you need to create a list from two DataFrame
columns (instead of a tuple),
you can also use the
DataFrame.to_records()
method.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) df['stats'] = list( df[['salary', 'experience']].to_records(index=False) ) # first_name salary experience stats # 0 Alice 175.1 10 [175.1, 10] # 1 Bobby 180.2 15 [180.2, 15] # 2 Carl 190.3 20 [190.3, 20] print(df)
The DataFrame.to_records()
method converts the DataFrame
to a NumPy record
array.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [(175.1, 10), (180.2, 15), (190.3, 20)] print( list( df[['salary', 'experience']].to_records(index=False) ) )
Notice that we had to set the index
argument to False
.
The argument defaults to True
which means that the index is the first element
of each
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [(0, 175.1, 10), (1, 180.2, 15), (2, 190.3, 20)] print( list( df[['salary', 'experience']].to_records(index=True) ) )
values.tolist()
You can also use the DataFrame.values.tolist()
method to achieve the same
result.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) df['stats'] = df[['salary', 'experience']].values.tolist() # first_name salary experience stats # 0 Alice 175.1 10 [175.1, 10.0] # 1 Bobby 180.2 15 [180.2, 15.0] # 2 Carl 190.3 20 [190.3, 20.0] print(df)
The
DataFrame.values()
method returns a NumPy representation of the DataFrame
.
import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl'], 'salary': [175.1, 180.2, 190.3], 'experience': [10, 15, 20] }) # [[175.1 10. ] # [180.2 15. ] # [190.3 20. ]] print(df[['salary', 'experience']].values)
The last step is to use the
tolist
method to convert the values ndarray
to a list.
You can learn more about the related topics by checking out the following tutorials: