Last updated: Apr 11, 2024
Use the DataFrame.explode() method to create a new row for each element of a list stored in a DataFrame column.
The method transforms each element of the specified list to a row and replicates the index values.
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

# Each list element in 'first' becomes its own row
df = df.explode('first')

print('-' * 50)
print(df)
Running the code sample produces the following output.
    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
   first  second
0      a       1
0      b       1
1      c       2
1      d       2
2      e       3
2      f       3
3      g       4
The DataFrame.explode method transforms each element of a list-like object to a row and replicates the index values.
The only argument we passed to the method is the name of the column to explode.
If you need to explode multiple columns, set the argument to a list of strings.
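As a minimal sketch of exploding several columns at once, the example below passes a list of column names. The column names and data are made up for illustration, and it assumes pandas 1.3.0 or newer, where a list of columns is accepted.

```python
import pandas as pd

# Hypothetical data: the lists in 'first' and 'second' have the same
# length within each row, which multi-column explode requires.
df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd']],
    'second': [[1, 2], [3, 4]],
    'third': ['x', 'y'],
})

# Explode both list columns in lockstep; 'third' is repeated per row.
exploded = df.explode(['first', 'second'])

print(exploded)
```

The lists are expanded element by element, so 'a' is paired with 1, 'b' with 2, and so on. If the lists in a row differ in length, pandas raises a ValueError instead of guessing how to align them.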
Notice that the index values are replicated in the output.
   first  second
0      a       1
0      b       1
1      c       2
1      d       2
2      e       3
2      f       3
3      g       4
You can change the default behavior by setting the ignore_index argument to True.
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

# ignore_index=True labels the resulting rows 0, 1, ..., n - 1
df = df.explode('first', ignore_index=True)

print('-' * 50)
print(df)
Here is the output of running the script with python main.py.
    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
   first  second
0      a       1
1      b       1
2      c       2
3      d       2
4      e       3
5      f       3
6      g       4
Alternatively, you can use the reset_index() method.
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

# Explode first, then replace the duplicated index with a fresh one
df = df.explode('first').reset_index(drop=True)

print('-' * 50)
print(df)
Running the code sample produces the following output.
    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
   first  second
0      a       1
1      b       1
2      c       2
3      d       2
4      e       3
5      f       3
6      g       4
The DataFrame.reset_index() method resets the index of the DataFrame, causing it to use the default index. Setting drop=True discards the old index instead of inserting it as a new column.
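The two approaches can be compared side by side. The sketch below reuses the sample data from this article; the variable names with_ignore, with_reset and kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

# Option 1: let explode() build a fresh index
with_ignore = df.explode('first', ignore_index=True)

# Option 2: explode, then drop the duplicated index
with_reset = df.explode('first').reset_index(drop=True)

# Check whether both options produce the same DataFrame
print(with_ignore.equals(with_reset))

# Without drop=True, the old index is kept as a column named 'index'
kept = df.explode('first').reset_index()
print(kept.columns.tolist())
```

Keeping the old index as a column can be handy when you later need to know which original row each exploded row came from.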
You can also select a single column and call explode() on it directly. This uses Series.explode() and returns a Series rather than a DataFrame.
import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], ['e', 'f'], 'g'],
    'second': [1, 2, 3, 4]
})

print(df)

print('-' * 50)

# Selecting the column first returns an exploded Series
print(df['first'].explode())
Running the code sample produces the following output.
    first  second
0  [a, b]       1
1  [c, d]       2
2  [e, f]       3
3       g       4
--------------------------------------------------
0    a
0    b
1    c
1    d
2    e
2    f
3    g
Name: first, dtype: object
3 things to note when using the DataFrame.explode() method:
1. Scalar values are returned unchanged.
2. Empty list-like objects result in numpy.nan.
3. The dtype of the exploded column in the resulting DataFrame, or of the resulting Series, is always object.

import pandas as pd

df = pd.DataFrame({
    'first': [['a', 'b'], ['c', 'd'], [], 'e'],
    'second': [1, 2, 3, 4]
})

print(df)

df = df.explode('first')

print('-' * 50)
print(df)

print('-' * 50)
print(df.dtypes)
Running the code sample produces the following output.
    first  second
0  [a, b]       1
1  [c, d]       2
2      []       3
3       e       4
--------------------------------------------------
   first  second
0      a       1
0      b       1
1      c       2
1      d       2
2    NaN       3
3      e       4
--------------------------------------------------
first     object
second     int64
dtype: object
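Because the exploded column comes back as object, numeric data usually needs an explicit cast afterwards. The following is a minimal sketch with made-up column names (numbers, label); it assumes every list is non-empty, so no NaN values are produced.

```python
import pandas as pd

# Hypothetical data: lists of integers, none of them empty
df = pd.DataFrame({
    'numbers': [[1, 2], [3], [4, 5]],
    'label': ['x', 'y', 'z'],
})

exploded = df.explode('numbers', ignore_index=True)

# The exploded column is object even though it only holds integers
dtype_before = exploded['numbers'].dtype
print(dtype_before)

# Cast back to a numeric dtype once the values are scalars again
exploded['numbers'] = exploded['numbers'].astype('int64')
print(exploded.dtypes)
```

If some lists were empty, the resulting NaN values would have to be handled first (for example with dropna()), because NaN cannot be stored in an int64 column.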
The DataFrame.explode() method is commonly used to expand comma-separated strings in a column.
Here is an example.
import pandas as pd

df = pd.DataFrame([
    {'one': 'a,b', 'two': 1},
    {'one': 'c,d', 'two': 2}
])

print(df)

# Split each string into a list, then explode the list into rows
df = df.assign(one=df.one.str.split(',')).explode('one')

print('-' * 50)
print(df)
Running the code sample produces the following output.
   one  two
0  a,b    1
1  c,d    2
--------------------------------------------------
   one  two
0    a    1
0    b    1
1    c    2
1    d    2
We used the str.split() method to split the values in the one column into a list on each comma.
The DataFrame.assign method assigns new columns to a DataFrame. Because a column named one already exists, it is overwritten with the lists.
The last step is to call the explode() method on the result.
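Real-world comma-separated values often contain stray spaces around the separators. The sketch below shows one possible way to handle that, using hypothetical input strings: split on the comma, explode, and then strip whitespace from each resulting value.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas
df = pd.DataFrame({
    'one': ['a, b', 'c ,d'],
    'two': [1, 2],
})

result = (
    df.assign(one=df['one'].str.split(','))
      .explode('one', ignore_index=True)
)

# After explode() the column holds plain strings again,
# so the .str accessor can strip the leftover whitespace.
result['one'] = result['one'].str.strip()

print(result)
```

Stripping after the explode keeps the split pattern simple. Splitting on a pattern that also consumes the surrounding whitespace would be another option.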