Pandas: Create new row for each element in List in DataFrame

Borislav Hadzhiev

Last updated: Apr 11, 2024

# Table of Contents

  1. Pandas: Create new row for each element in List in DataFrame
  2. Creating new rows by exploding only the specific column
  3. Expanding comma-separated strings in a column in Pandas

# Pandas: Create new row for each element in List in DataFrame

Use the DataFrame.explode() method to create a new row for each element of a list stored in a DataFrame column.

The method transforms each element of the specified list to a row and replicates the index values.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
        'second': [1, 2, 3, 4]
    })

    print(df)

    df = df.explode('first')

    print('-' * 50)
    print(df)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

        first  second
    0  [a, b]       1
    1  [c, d]       2
    2  [e, f]       3
    3       g       4
    --------------------------------------------------
      first  second
    0     a       1
    0     b       1
    1     c       2
    1     d       2
    2     e       3
    2     f       3
    3     g       4

The DataFrame.explode method transforms each element of a list-like object to a row and replicates the index values.

The only argument we passed to the method is the name of the column to explode.

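If you need to explode multiple columns at once, pass a list of column names instead. This requires pandas 1.3.0 or later, and the list-likes on each row of the selected columns must have matching lengths.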
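
As a minimal sketch of exploding several columns together, the example below uses a small made-up DataFrame (the column names and values are only illustrative). The lists in `first` and `second` have the same length on every row, so their elements are paired up position by position.

```python
import pandas as pd

# Illustrative data: 'first' and 'second' hold lists of equal length per row.
df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd']],
    'second': [[1, 2], [3, 4]],
    'third': ['x', 'y'],
})

# Passing a list of column names explodes them in lockstep (pandas >= 1.3.0).
exploded = df.explode(['first', 'second'])

print(exploded)
```

The scalar values in `third` are repeated for each new row, just like the index values.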

Notice that the index values are replicated in the output.

shell

      first  second
    0     a       1
    0     b       1
    1     c       2
    1     d       2
    2     e       3
    2     f       3
    3     g       4

You can change the default behavior by setting the ignore_index argument to True.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
        'second': [1, 2, 3, 4]
    })

    print(df)

    df = df.explode('first', ignore_index=True)

    print('-' * 50)
    print(df)

Here is the output of running the script with python main.py.

shell

        first  second
    0  [a, b]       1
    1  [c, d]       2
    2  [e, f]       3
    3       g       4
    --------------------------------------------------
      first  second
    0     a       1
    1     b       1
    2     c       2
    3     d       2
    4     e       3
    5     f       3
    6     g       4

Alternatively, you can use the reset_index() method.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
        'second': [1, 2, 3, 4]
    })

    print(df)

    df = df.explode('first').reset_index(drop=True)

    print('-' * 50)
    print(df)

Running the code sample produces the following output.

shell

        first  second
    0  [a, b]       1
    1  [c, d]       2
    2  [e, f]       3
    3       g       4
    --------------------------------------------------
      first  second
    0     a       1
    1     b       1
    2     c       2
    3     d       2
    4     e       3
    5     f       3
    6     g       4

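The DataFrame.reset_index() method replaces the current index with the default integer index (0, 1, 2, ...). Setting drop=True discards the old index instead of inserting it into the DataFrame as a new column.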
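
To make the effect of `drop=True` concrete, here is a small sketch that calls `reset_index()` both ways on an exploded DataFrame. The data and the variable names `kept` and `dropped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], 'c'],
    'second': [1, 2]
})

# Without drop=True, the old (replicated) index is kept as a column
# named 'index', which records the source row of each element.
kept = df.explode('first').reset_index()

# With drop=True, the old index is discarded.
dropped = df.explode('first').reset_index(drop=True)

print(kept)
print('-' * 50)
print(dropped)
```

Keeping the old index as a column can be handy when you later need to know which original row each exploded element came from.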

# Creating new rows by exploding only the specific column

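You can also select a single column and call Series.explode() on it. The result is a Series that contains only the exploded values of that column, without the other columns of the DataFrame.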

main.py

    import pandas as pd

    df = pd.DataFrame({
        'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
        'second': [1, 2, 3, 4]
    })

    print(df)

    print('-' * 50)
    print(df['first'].explode())

Running the code sample produces the following output.

shell

        first  second
    0  [a, b]       1
    1  [c, d]       2
    2  [e, f]       3
    3       g       4
    --------------------------------------------------
    0    a
    0    b
    1    c
    1    d
    2    e
    2    f
    3    g
    Name: first, dtype: object
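
Because exploding a single column returns only that column, the remaining columns have to be re-attached if you still need them. One possible approach, sketched here under the assumption that the original index is unique, is to join the exploded Series back onto the rest of the DataFrame by index. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

# Explode just the one column; the result is a Series named 'first'.
exploded = df['first'].explode()

# Join on the index: each row of the remaining columns is repeated
# once per exploded element that shares its index label.
rejoined = df.drop(columns='first').join(exploded)

print(rejoined)
```

In most cases calling `df.explode('first')` directly is simpler; the join is mainly useful when the exploded Series is transformed on its own first.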

Three things to note when using the DataFrame.explode() method:

  1. It replaces empty lists with numpy.nan.
  2. It preserves scalar entries.
  3. The dtype of the exploded column (or of the resulting Series) is always object. The other columns keep their dtypes.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'first': [['a', 'b'], ['c', 'd'], [], 'e'],
        'second': [1, 2, 3, 4]
    })

    print(df)

    df = df.explode('first')

    print('-' * 50)
    print(df)

    print('-' * 50)
    print(df.dtypes)

Running the code sample produces the following output.

shell

        first  second
    0  [a, b]       1
    1  [c, d]       2
    2      []       3
    3       e       4
    --------------------------------------------------
      first  second
    0     a       1
    0     b       1
    1     c       2
    1     d       2
    2   NaN       3
    3     e       4
    --------------------------------------------------
    first     object
    second     int64
    dtype: object
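
Since the exploded column ends up with the object dtype even when the lists contain numbers, you may want to convert it afterwards. The following is a minimal sketch with made-up data (the column names `nums` and `label` are only illustrative) that uses `DataFrame.astype()` for the conversion.

```python
import pandas as pd

df = pd.DataFrame({
    'nums': [[1, 2], [3]],
    'label': ['x', 'y']
})

exploded = df.explode('nums')

# The exploded column has the object dtype, even though it holds integers.
print(exploded.dtypes)

# Convert it back to an integer dtype. This works here because there are
# no NaN values; an empty list in 'nums' would produce NaN and the
# conversion to int64 would raise an error.
converted = exploded.astype({'nums': 'int64'})

print('-' * 50)
print(converted.dtypes)
```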

# Expanding comma-separated strings in a column in Pandas

The DataFrame.explode() method is also commonly used, together with str.split(), to expand comma-separated strings in a column.

Here is an example.

main.py

    import pandas as pd

    df = pd.DataFrame([
        {'one': 'a,b', 'two': 1},
        {'one': 'c,d', 'two': 2}
    ])

    print(df)

    df = df.assign(one=df.one.str.split(',')).explode('one')

    print('-' * 50)
    print(df)

Running the code sample produces the following output.

shell

       one  two
    0  a,b    1
    1  c,d    2
    --------------------------------------------------
      one  two
    0   a    1
    0   b    1
    1   c    2
    1   d    2

We used the str.split() method to split the values in the one column into a list on each comma.

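The DataFrame.assign method returns a new DataFrame with the given columns added. Because a column named one already exists, it is overwritten with the lists produced by str.split().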

The last step is to call the explode() method on the result.
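
Real-world data often has stray spaces around the commas. As a hedged sketch of one way to handle that (the sample values are made up), you can strip the whitespace from each element after exploding:

```python
import pandas as pd

# Illustrative data with inconsistent spacing around the commas.
df = pd.DataFrame({
    'one': ['a, b', 'c ,d'],
    'two': [1, 2]
})

# Split on commas, then explode into one row per element.
result = df.assign(one=df['one'].str.split(',')).explode(
    'one', ignore_index=True
)

# Remove leading and trailing whitespace from each exploded value.
result['one'] = result['one'].str.strip()

print(result)
```

Stripping after the explode keeps the split step simple; the `ignore_index=True` argument is optional and only resets the index as described earlier.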
