Last updated: Apr 11, 2024
To add a column with incremental numbers to a Pandas DataFrame:

1. Use the `DataFrame.insert()` method to insert a column into the DataFrame at a specific index.
2. Use the `range()` class to generate the incremental numbers for the column.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  Alice         NaN     NaN
# 1  Bobby         5.0   180.2
# 2   None         NaN   190.3
# 3   None         NaN   205.4
print(df)

df.insert(0, 'ID', range(0, 0 + len(df)))

#    ID   name  experience  salary
# 0   0  Alice         NaN     NaN
# 1   1  Bobby         5.0   180.2
# 2   2   None         NaN   190.3
# 3   3   None         NaN   205.4
print(df)
```

The `DataFrame.insert()` method inserts a column into a DataFrame at a specified location.

The method raises a `ValueError` if a column with the specified name is already contained in the DataFrame, unless the `allow_duplicates` parameter is set to `True`.

We passed the following 3 parameters to the `DataFrame.insert()` method:

1. The insertion index. An index of `0` inserts the `ID` column as the first in the DataFrame. Note that the insertion index has to be greater than or equal to `0` and less than or equal to the number of columns, `len(df.columns)`.
2. The label of the inserted column (`ID` in the example).
3. The values the column should contain. Can be a scalar, a `Series` or array-like.

```python
df.insert(0, 'ID', range(0, 0 + len(df)))

#    ID   name  experience  salary
# 0   0  Alice         NaN     NaN
# 1   1  Bobby         5.0   180.2
# 2   2   None         NaN   190.3
# 3   3   None         NaN   205.4
print(df)
```

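
The duplicate-name rule described above can be checked with a short sketch. The two-row DataFrame and the `duplicate_error` variable are made up for illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Small illustrative DataFrame (not the one from the article).
df = pd.DataFrame({'name': ['Alice', 'Bobby'], 'salary': [180.2, 190.3]})

df.insert(0, 'ID', range(len(df)))

duplicate_error = None
try:
    # Inserting a second column named 'ID' is rejected by default.
    df.insert(0, 'ID', range(len(df)))
except ValueError as err:
    duplicate_error = str(err)
    print(duplicate_error)

# Opting in to duplicate labels makes the same call succeed.
df.insert(0, 'ID', range(10, 10 + len(df)), allow_duplicates=True)
print(list(df.columns))  # ['ID', 'ID', 'name', 'salary']
```

Duplicate column labels make `df['ID']` return a DataFrame instead of a Series, so in most cases it is simpler to pick a different name than to set `allow_duplicates=True`.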
We used the `range()` class to get an object that contains the values of the column.

The incremental numbers in the example start from `0`, but you can specify any other starting value.

Here is an example that uses the number `5` as the starting value of the incremental column.
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  Alice         NaN     NaN
# 1  Bobby         5.0   180.2
# 2   None         NaN   190.3
# 3   None         NaN   205.4
print(df)

df.insert(0, 'ID', range(5, 5 + len(df)))

#    ID   name  experience  salary
# 0   5  Alice         NaN     NaN
# 1   6  Bobby         5.0   180.2
# 2   7   None         NaN   190.3
# 3   8   None         NaN   205.4
print(df)
```
The `range()` class is commonly used for looping a specific number of times in `for` loops and takes the following arguments:

| Name | Description |
|---|---|
| `start` | An integer representing the start of the range (defaults to `0`) |
| `stop` | Go up to, but not including, the provided integer |
| `step` | The range consists of every Nth number from `start` to `stop` (defaults to `1`) |

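
The `step` argument is not used in the article's examples, but it can produce IDs that grow by more than 1 per row. The following is a minimal sketch with made-up `start` and `step` values; the stop value has to be scaled by the step so that the range still yields exactly one number per row.

```python
import pandas as pd

# Small illustrative DataFrame (not the one from the article).
df = pd.DataFrame({'name': ['Alice', 'Bobby', 'Carl']})

start, step = 100, 10  # hypothetical values chosen for the example

# stop = start + step * number_of_rows gives len(df) values in total.
df.insert(0, 'ID', range(start, start + step * len(df), step))

print(df['ID'].tolist())  # [100, 110, 120]
```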
The first argument you pass to the `range()` class is the number you want to start incrementing from.

The second argument is the `stop` value (exclusive) and is determined by adding the `start` value to the length of the DataFrame.

Note that the `DataFrame.insert()` method modifies the DataFrame in place and returns `None`.
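
Because `insert()` works in place, assigning its return value back to the variable would replace the DataFrame with `None`. A brief sketch of the behaviour, using a small made-up DataFrame and a hypothetical `result` variable:

```python
import pandas as pd

# Small illustrative DataFrame (not the one from the article).
df = pd.DataFrame({'name': ['Alice', 'Bobby']})

# Keep the return value in a separate name to inspect it.
result = df.insert(0, 'ID', range(len(df)))

print(result)            # None
print(list(df.columns))  # ['ID', 'name']
```

In other words, call `df.insert(...)` on its own line rather than writing `df = df.insert(...)`.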
If you don't need to add the new column at a specific index, you can shorten this a bit.
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  Alice         NaN     NaN
# 1  Bobby         5.0   180.2
# 2   None         NaN   190.3
# 3   None         NaN   205.4
print(df)

df['ID'] = range(5, 5 + len(df))

print('-' * 50)

#     name  experience  salary  ID
# 0  Alice         NaN     NaN   5
# 1  Bobby         5.0   180.2   6
# 2   None         NaN   190.3   7
# 3   None         NaN   205.4   8
print(df)
```
We directly added a column with incremental numbers to the DataFrame without using `insert()`.

However, notice that the column is added at the end of the DataFrame.
rename()
You can also combine the `DataFrame.reset_index()` and `DataFrame.rename()` methods to add a column with incremental numbers to a DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  Alice         NaN     NaN
# 1  Bobby         5.0   180.2
# 2   None         NaN   190.3
# 3   None         NaN   205.4
print(df)

df = df.reset_index()
df = df.rename(columns={'index': 'ID'})
df['ID'] = df.index + 5

print('-' * 50)

#    ID   name  experience  salary
# 0   5  Alice         NaN     NaN
# 1   6  Bobby         5.0   180.2
# 2   7   None         NaN   190.3
# 3   8   None         NaN   205.4
print(df)
```
We used the `DataFrame.reset_index()` method to reset the index of the DataFrame, which moves the old index into a regular column named `index`.

The next step is to use the `DataFrame.rename()` method to rename the `index` column to `ID` (or any other name).

Lastly, we overwrite the `ID` column with `df.index + 5` so that the numbering starts at `5`. This can be any other value you want to start incrementing from.

After the reset, the index is a default integer index that counts up from 0, so adding the starting value to it produces a number that increases by 1 with each row.
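
The last step matters most when the DataFrame does not start out with a default index. The sketch below uses a made-up DataFrame with the custom index labels `10, 20, 30`; the `before` and `after` variables are only there to show the contents of the `ID` column at each stage.

```python
import pandas as pd

# Illustrative DataFrame with a non-default index (an assumption for this example).
df = pd.DataFrame({'name': ['Alice', 'Bobby', 'Carl']}, index=[10, 20, 30])

df = df.reset_index()                    # old labels move into an 'index' column
df = df.rename(columns={'index': 'ID'})

before = df['ID'].tolist()
print(before)  # [10, 20, 30] - the old index labels, not a 0-based counter

df['ID'] = df.index + 5                  # new default index 0, 1, 2 plus the start value

after = df['ID'].tolist()
print(after)   # [5, 6, 7]
```

Without the final assignment, the `ID` column would simply hold the old index labels.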
assign()
You can also use the `DataFrame.assign()` method to add a column with incremental numbers to a DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', None, None],
    'experience': [None, 5, None, None],
    'salary': [None, 180.2, 190.3, 205.4],
})

#     name  experience  salary
# 0  Alice         NaN     NaN
# 1  Bobby         5.0   180.2
# 2   None         NaN   190.3
# 3   None         NaN   205.4
print(df)

df = df.assign(ID=lambda x: range(5, 5 + len(x)))

print('-' * 50)

#     name  experience  salary  ID
# 0  Alice         NaN     NaN   5
# 1  Bobby         5.0   180.2   6
# 2   None         NaN   190.3   7
# 3   None         NaN   205.4   8
print(df)
```
The `DataFrame.assign()` method assigns a new column to a DataFrame.

The method returns a new object with all the existing DataFrame columns and the new columns.

Columns whose names already exist in the DataFrame get overwritten.

We start incrementing from `5` in the example, but you could use any other value.
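
Both properties of `assign()` mentioned above (it returns a new object and it overwrites existing columns) can be seen in a short sketch. The DataFrame and the names `with_id` and `overwritten` are made up for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (not the one from the article).
df = pd.DataFrame({'name': ['Alice', 'Bobby']})

# assign() returns a new DataFrame; the original is left untouched.
with_id = df.assign(ID=lambda x: range(5, 5 + len(x)))
print(list(df.columns))       # ['name']
print(list(with_id.columns))  # ['name', 'ID']

# Assigning to a name that already exists replaces that column.
overwritten = with_id.assign(ID=lambda x: range(100, 100 + len(x)))
print(overwritten['ID'].tolist())  # [100, 101]
```

This is why the article's example reassigns the result with `df = df.assign(...)`.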