Last updated: Apr 12, 2024
Use the DataFrame.index.repeat() method to repeat the rows in a Pandas DataFrame N times.

The repeat() method will repeat each index in the DataFrame the specified number of times.
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = df.loc[df.index.repeat(2)]

print(df2)

Running the code sample produces the following output.

  first_name  salary
0      Alice   175.1
0      Alice   175.1
1      Bobby   180.2
1      Bobby   180.2
2       Carl   190.3
2       Carl   190.3
The index.repeat() method repeats the elements of an index.
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

# 👇️ Index([0, 0, 1, 1, 2, 2], dtype='int64')
print(df.index.repeat(2))
We then passed the repeated index to the DataFrame.loc indexer, which selects rows by label and returns a row for every occurrence of a label.
df2 = df.loc[df.index.repeat(2)]

#   first_name  salary
# 0      Alice   175.1
# 0      Alice   175.1
# 1      Bobby   180.2
# 1      Bobby   180.2
# 2       Carl   190.3
# 2       Carl   190.3
print(df2)
Notice that the index labels are also repeated in the output, so the index of the new DataFrame is no longer unique.
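As a small sketch of why the repeated labels matter, the example below checks whether the index is unique and selects a single label from the repeated DataFrame. The variable name rows is only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = df.loc[df.index.repeat(2)]

# Every label now appears twice, so the index is not unique
print(df2.index.is_unique)  # False

# Selecting the label 0 returns both copies of the row as a DataFrame
rows = df2.loc[0]
print(rows.shape)  # (2, 2)
```

A label lookup that returned one row on the original DataFrame returns two rows here, which is usually the reason to reset the index.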
If you want to reset the indices, use the DataFrame.reset_index() method.
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = df.loc[df.index.repeat(2)].reset_index(drop=True)

print(df2)

Running the code sample produces the following output.

  first_name  salary
0      Alice   175.1
1      Alice   175.1
2      Bobby   180.2
3      Bobby   180.2
4       Carl   190.3
5       Carl   190.3
The DataFrame.reset_index() method resets the index of the DataFrame, so the default integer index is used. Setting drop=True discards the old index instead of inserting it as a new column.
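The following sketch compares calling reset_index() with and without drop=True on the repeated DataFrame. The variable names kept and dropped are only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = df.loc[df.index.repeat(2)]

# Without drop=True, the old index is kept as a column named 'index'
kept = df2.reset_index()
print(kept.columns.tolist())  # ['index', 'first_name', 'salary']

# With drop=True, the old index is discarded
dropped = df2.reset_index(drop=True)
print(dropped.columns.tolist())  # ['first_name', 'salary']
```

Keeping the old index as a column can be useful if you need to trace each repeated row back to its original row.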
The same approach can be used to repeat each row N times based on another column.
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'times': [1, 2, 3],
})

df2 = df.loc[df.index.repeat(df.times)].reset_index(drop=True)

#   first_name  times
# 0      Alice      1
# 1      Bobby      2
# 2      Bobby      2
# 3       Carl      3
# 4       Carl      3
# 5       Carl      3
print(df2)
We used the times column to specify how many times each row should appear in the result.

In the example, the first row appears once, the second row appears twice and the third row appears three times.
The code sample also uses the reset_index() method to reset the index; however, this is optional.
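One consequence of repeating by a column is that a count of 0 removes the row from the result. The sketch below assumes a times column that contains a zero; the values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'times': [0, 2, 1],
})

# Bracket notation avoids clashes between column names and DataFrame attributes
df2 = df.loc[df.index.repeat(df['times'])].reset_index(drop=True)

# Alice has a count of 0, so she does not appear in the result
print(df2['first_name'].tolist())  # ['Bobby', 'Bobby', 'Carl']
```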
np.repeat()
You can also use the numpy.repeat() function to repeat the rows of a DataFrame N times.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = pd.DataFrame(np.repeat(df.values, 2, axis=0))
df2.columns = df.columns

print(df2)

Running the code sample produces the following output.

  first_name salary
0      Alice  175.1
1      Alice  175.1
2      Bobby  180.2
3      Bobby  180.2
4       Carl  190.3
5       Carl  190.3
Make sure you have the numpy module installed to be able to run the code sample.
pip install pandas numpy

# or with pip3
pip3 install pandas numpy
The numpy.repeat() function takes an input array, the number of repetitions and the axis as parameters and returns an array with the repeated values.
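To make the role of the axis parameter concrete, here is a minimal sketch on a small 2x2 array. The array values and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# axis=0 repeats each row
rows = np.repeat(arr, 2, axis=0)
print(rows.tolist())  # [[1, 2], [1, 2], [3, 4], [3, 4]]

# axis=1 repeats each column
cols = np.repeat(arr, 2, axis=1)
print(cols.tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]

# Without an axis, the array is flattened first
flat = np.repeat(arr, 2)
print(flat.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Repeating DataFrame rows therefore requires axis=0.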
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

# [['Alice' 175.1]
#  ['Alice' 175.1]
#  ['Bobby' 180.2]
#  ['Bobby' 180.2]
#  ['Carl' 190.3]
#  ['Carl' 190.3]]
print(np.repeat(df.values, 2, axis=0))
We have to pass the array to the pandas.DataFrame() constructor and set the columns of the new DataFrame to the columns of the existing DataFrame.
df2 = pd.DataFrame(np.repeat(df.values, 2, axis=0))
df2.columns = df.columns
You can also set the columns when initializing the new DataFrame.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = pd.DataFrame(np.repeat(df.values, 2, axis=0), columns=df.columns)

#   first_name salary
# 0      Alice  175.1
# 1      Alice  175.1
# 2      Bobby  180.2
# 3      Bobby  180.2
# 4       Carl  190.3
# 5       Carl  190.3
print(df2)
The code sample uses the columns argument of the pandas.DataFrame() constructor to achieve the same result.
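One thing to watch for with this approach: df.values on a DataFrame with mixed column types is an object array, so the numeric column in the new DataFrame ends up with the object dtype. The sketch below shows one possible way to restore the original dtypes with astype(); the variable name restored is only used for illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = pd.DataFrame(np.repeat(df.values, 2, axis=0), columns=df.columns)

# The salary column is no longer a float column
print(df2['salary'].dtype)  # object

# Cast each column back to the dtype it had in the original DataFrame
restored = df2.astype(df.dtypes.to_dict())
print(restored['salary'].dtype)  # float64
```

If all columns of the DataFrame share one numeric type, the conversion is not needed because df.values keeps that type.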
You can also use the loc indexer as we did in the first section.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = df.loc[np.repeat(df.index, 2)].reset_index(drop=True)

#   first_name  salary
# 0      Alice   175.1
# 1      Alice   175.1
# 2      Bobby   180.2
# 3      Bobby   180.2
# 4       Carl   190.3
# 5       Carl   190.3
print(df2)
However, when using the df.loc indexer, we have to manually reset the indices of the DataFrame by calling reset_index().
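Because df.loc selects by label, it relies on the index labels of the original DataFrame being unique. As a sketch of a positional alternative, the example below repeats row positions with np.repeat() and selects them with df.iloc. The index labels 'a', 'a', 'b' are made up to illustrate a DataFrame with duplicate labels.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
}, index=['a', 'a', 'b'])  # hypothetical index with a duplicate label

# Repeat the row positions 0, 1, 2 instead of the labels
positions = np.repeat(np.arange(len(df)), 2)
print(positions.tolist())  # [0, 0, 1, 1, 2, 2]

df2 = df.iloc[positions].reset_index(drop=True)
print(df2['first_name'].tolist())
# ['Alice', 'Alice', 'Bobby', 'Bobby', 'Carl', 'Carl']
```

Selecting by position repeats each row exactly twice regardless of what the index labels look like.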
pd.concat()
You can also use the pandas.concat() function to repeat the rows in a DataFrame N times.
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = pd.concat([df] * 2).sort_index().reset_index(drop=True)

#   first_name  salary
# 0      Alice   175.1
# 1      Alice   175.1
# 2      Bobby   180.2
# 3      Bobby   180.2
# 4       Carl   190.3
# 5       Carl   190.3
print(df2)
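The pandas.concat() function concatenates Pandas objects along a particular axis. Concatenating a list that contains the DataFrame twice stacks one full copy of the DataFrame after the other.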
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

#   first_name  salary
# 0      Alice   175.1
# 1      Bobby   180.2
# 2       Carl   190.3
# 0      Alice   175.1
# 1      Bobby   180.2
# 2       Carl   190.3
print(pd.concat([df] * 2))
We then have to:

1. Use the sort_index() method to sort the rows by their index labels, so the copies of each row end up next to each other.
2. Use the reset_index() method to reset the indices of the DataFrame.

import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

df2 = pd.concat([df] * 2).sort_index().reset_index(drop=True)

#   first_name  salary
# 0      Alice   175.1
# 1      Alice   175.1
# 2      Bobby   180.2
# 3      Bobby   180.2
# 4       Carl   190.3
# 5       Carl   190.3
print(df2)
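If the order of the rows does not matter and you only need a fresh index, a shorter variant is to pass ignore_index=True to pd.concat(). This is a sketch of an alternative, not the approach used above; note that the copies are stacked one after the other rather than interleaved.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bobby', 'Carl'],
    'salary': [175.1, 180.2, 190.3],
})

# ignore_index=True assigns a new 0..n-1 index to the result
df2 = pd.concat([df] * 2, ignore_index=True)

print(df2['first_name'].tolist())
# ['Alice', 'Bobby', 'Carl', 'Alice', 'Bobby', 'Carl']

print(df2.index.tolist())  # [0, 1, 2, 3, 4, 5]
```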