# Pandas: Create new row for each element in List in DataFrame

Use the DataFrame.explode() method to create a new row for each element of a list stored in a DataFrame column.

The method transforms each element of the specified list to a row and replicates the index values.

main.py

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

df = df.explode('first')

print('-' * 50)

print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
  first  second
0     a       1
0     b       1
1     c       2
1     d       2
2     e       3
2     f       3
3     g       4


The DataFrame.explode method transforms each element of a list-like object to a row and replicates the index values.

The only argument we passed to the method is the column to explode.

If you need to explode multiple columns, set the argument to a list of strings.

Notice that the index values are replicated in the output.

shell

  first  second
0     a       1
0     b       1
1     c       2
1     d       2
2     e       3
2     f       3
3     g       4

You can change the default behavior by setting the ignore_index argument to True.

main.py

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

df = df.explode('first', ignore_index=True)

print('-' * 50)

print(df)


Here is the output of running the script with python main.py.

shell

    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
  first  second
0     a       1
1     b       1
2     c       2
3     d       2
4     e       3
5     f       3
6     g       4


Alternatively, you can use the reset_index() method.

main.py

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

df = df.explode('first').reset_index(drop=True)

print('-' * 50)

print(df)


Running the code sample produces the following output.

shell

    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
  first  second
0     a       1
1     b       1
2     c       2
3     d       2
4     e       3
5     f       3
6     g       4

The DataFrame.reset_index() method resets the index of the DataFrame, causing it to use the default integer index. Passing drop=True discards the old index instead of inserting it as a new column.
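As noted earlier, the column argument can also be a list of column names. The following is a minimal sketch of that, assuming pandas 1.3 or newer (the release that added list support). The column names letters, numbers and group are made up for illustration. Within each row, the lists in the exploded columns need the same number of elements, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical data: two list columns whose lists have matching
# lengths within each row, plus one scalar column.
df = pd.DataFrame({
    'letters': [['a', 'b'], ['c']],
    'numbers': [[1, 2], [3]],
    'group': ['x', 'y'],
})

# Explode both list columns together so the elements stay paired up.
# ignore_index=True gives the result a fresh 0..n-1 index.
exploded = df.explode(['letters', 'numbers'], ignore_index=True)

print(exploded)
```

The elements are paired by position, so 'a' ends up next to 1 and 'b' next to 2, while the scalar group value is repeated for each new row.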

# Creating new rows by exploding only a specific column

You can also select a single column and call Series.explode() on it. This returns an exploded Series instead of a DataFrame.

main.py

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

print('-' * 50)

print(df['first'].explode())


Running the code sample produces the following output.

shell

    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
0    a
0    b
1    c
1    d
2    e
2    f
3    g
Name: first, dtype: object


Three things to note when using the DataFrame.explode() method:

- It replaces empty lists with numpy.nan.
- It preserves scalar entries.
- The exploded column has the object dtype, while the other columns keep their original dtype.

main.py

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], [], 'e'],
    'second': [1, 2, 3, 4]
})

print(df)

df = df.explode('first')

print('-' * 50)

print(df)

print('-' * 50)

print(df.dtypes)


Running the code sample produces the following output.

shell

    first  second
0  [a, b]       1
1  [c, d]       2
2      []       3
3       e       4
--------------------------------------------------
  first  second
0     a       1
0     b       1
1     c       2
1     d       2
2   NaN       3
3     e       4
--------------------------------------------------
first     object
second     int64
dtype: object
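Because the exploded column stays object even when the lists hold numbers, you may want to convert it afterwards. The sketch below shows one possible way to do that with pd.to_numeric(). The column names nums and label are hypothetical and not part of the examples above.

```python
import pandas as pd

# Hypothetical data: a column of integer lists, one of them empty.
df = pd.DataFrame({
    'nums': [[1, 2], [3], []],
    'label': ['x', 'y', 'z'],
})

exploded = df.explode('nums', ignore_index=True)

# The exploded column is still object at this point.
print(exploded['nums'].dtype)

# Convert to a numeric dtype. The empty list became NaN,
# so the column ends up as float64 rather than int64.
exploded['nums'] = pd.to_numeric(exploded['nums'])

print(exploded['nums'].dtype)
```

If the column contains no empty lists, calling astype('int64') on the exploded column is a simpler alternative.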

# Expanding comma-separated strings in a column in Pandas

The DataFrame.explode() method is commonly combined with str.split() to expand comma-separated strings in a column into separate rows.

Here is an example.

main.py

import pandas as pd

df = pd.DataFrame([
    {
        'one': 'a,b',
        'two': 1
    },
    {
        'one': 'c,d',
        'two': 2
    }
])

print(df)

df = df.assign(one=df.one.str.split(',')).explode('one')

print('-' * 50)

print(df)


Running the code sample produces the following output.

shell

   one  two
0  a,b    1
1  c,d    2
--------------------------------------------------
  one  two
0   a    1
0   b    1
1   c    2
1   d    2

We used the str.split() method to split the values in the one column into a list on each comma.

The DataFrame.assign method returns a new DataFrame with the given columns added. A column whose name already exists (here, one) is overwritten.

The last step is to call the explode() method on the result.
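Real-world data often has stray spaces around the commas. The following is a minimal sketch of one way to handle that, assuming the values look like 'a, b'. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas.
df = pd.DataFrame({
    'one': ['a, b', 'c ,d'],
    'two': [1, 2],
})

# Split on the comma, explode into rows, then strip the
# leftover whitespace from each resulting string.
result = df.assign(one=df['one'].str.split(',')).explode(
    'one', ignore_index=True
)
result['one'] = result['one'].str.strip()

print(result)
```

Stripping after the explode keeps the split pattern simple. Splitting on a regular expression is another option, but it is not shown here.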
