Last updated: Apr 12, 2024
Reading time·7 min

To add columns of a different length to a DataFrame in Pandas:
pd.DataFrame() constructor to create a new DataFrame with the
additional columns.pandas.concat() method to concatenate the existing and the new
DataFrames.import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) additional_cols = pd.DataFrame({ 'salary': [1500, 1200, 2500, 3500] }) df2 = pd.concat([df, additional_cols], axis=1) print(df2)
Running the code sample produces the following output.
name experience salary 0 Alice 10.0 1500 1 Bobby 13.0 1200 2 Carl 15.0 2500 3 NaN NaN 3500

The initial DataFrame has 2 columns and 3 rows.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) # name experience # 0 Alice 10 # 1 Bobby 13 # 2 Carl 15 print(df)
We created a new DataFrame that has 1 column and 4 rows.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) additional_cols = pd.DataFrame({ 'salary': [1500, 1200, 2500, 3500] }) # salary # 0 1500 # 1 1200 # 2 2500 # 3 3500 print(additional_cols)
The last step is to use the
pandas.concat()
method to add the column of a different length to the existing DataFrame.
df2 = pd.concat([df, additional_cols], axis=1) # name experience salary # 0 Alice 10.0 1500 # 1 Bobby 13.0 1200 # 2 Carl 15.0 2500 # 3 NaN NaN 3500 print(df2)
The pandas.concat method concatenates Pandas objects along a given axis.
The axis argument is used to determine the axis along which to concatenate the
DataFrames.
0 and concatenates the objects along the index axis.Setting the axis argument to 1 means "concatenate along the columns
axis".
Notice that the values in the fourth row for the name and experience columns
are missing (NaN).
Make sure the ignore_index argument is set to False when calling
pd.concat().
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) additional_cols = pd.DataFrame({ 'salary': [1500, 1200, 2500, 3500] }) df2 = pd.concat([df, additional_cols], axis=1, ignore_index=False) # name experience salary # 0 Alice 10.0 1500 # 1 Bobby 13.0 1200 # 2 Carl 15.0 2500 # 3 NaN NaN 3500 print(df2)
False is the default value for the ignore_index argument.
If you set the argument to True, then the column names will be lost and the
axis will be labeled 0, 1, ..., n - 1.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) additional_cols = pd.DataFrame({ 'salary': [1500, 1200, 2500, 3500] }) df2 = pd.concat([df, additional_cols], axis=1, ignore_index=True) # 0 1 2 # 0 Alice 10.0 1500 # 1 Bobby 13.0 1200 # 2 Carl 15.0 2500 # 3 NaN NaN 3500 print(df2)
Setting the ignore_index argument to True is useful if the columns of the
objects you are concatenating don't have meaningful indexing information.
You can also use the list.extend() method to extend the column before you add
it to the DataFrame.
import pandas as pd a = ['Alice', 'Bobby'] b = [10, 13, 15] c = [1000, 2000, 3000, 4000] a_len, b_len, c_len = len(a), len(b), len(c) max_len = max(a_len, b_len, c_len) if not max_len == a_len: a.extend([''] * (max_len - a_len)) if not max_len == b_len: b.extend([''] * (max_len - b_len)) if not max_len == c_len: c.extend([''] * (max_len - b_len)) df = pd.DataFrame({ 'A': a, 'B': b, 'C': c }) # A B C # 0 Alice 10 1000 # 1 Bobby 13 2000 # 2 15 3000 # 3 4000 print(df)
The lists in the example have different lengths.
We used the len() function to get the length of each list.
a_len, b_len, c_len = len(a), len(b), len(c)
The len() function returns the length (the number of items) of an object.
The next step is to get the maximum length.
max_len = max(a_len, b_len, c_len)
We know that the columns we have to add to the DataFrame have to be of the
same length, so we use the
list.extend() method
if the length is insufficient.
if not max_len == a_len: a.extend([''] * (max_len - a_len))
Once all lists have the same length, we use the pd.DataFrame() constructor.
df = pd.DataFrame({ 'A': a, 'B': b, 'C': c }) # A B C # 0 Alice 10 1000 # 1 Bobby 13 2000 # 2 15 3000 # 3 4000 print(df)
SeriesIf you convert the values of the additional column to Series, the extra rows
will get dropped.
import pandas as pd df = pd.DataFrame({ 'name': ['Alice', 'Bobby', 'Carl'], 'experience': [10, 13, 15], }) print(df) salary_col = [1500, 1200, 2500, 3500] df['salary'] = pd.Series(salary_col) print('-' * 50) print(df)
Running the code sample produces the following output.
name experience 0 Alice 10 1 Bobby 13 2 Carl 15 -------------------------------------------------- name experience salary 0 Alice 10 1500 1 Bobby 13 1200 2 Carl 15 2500
Notice that the new column has 4 rows.
We converted the list to a Series and added the result to the existing
DataFrame and the last row got automatically dropped.
If you omit the conversion to Series, you'd get the
ValueError: Length of values does not match length of index
error.
If you need to create a DataFrame from a dictionary with different length
values:
Series.dict() class to convert the list of key, value tuples to a
dictionary.pandas.DataFrame() constructor.import pandas as pd a_dict = { 'name': ['Alice', 'Bobby'], 'experience': [10, 13, 15], 'salary': [1000, 2000, 3000, 4000] } df = pd.DataFrame( dict( [(key, pd.Series(value)) for key, value in a_dict.items()] ) ) # name experience salary # 0 Alice 10.0 1000 # 1 Bobby 13.0 2000 # 2 NaN 15.0 3000 # 3 NaN NaN 4000 print(df)

We used a list comprehension to iterate over the dictionary's items.
The dict.items() method returns a new view of the dictionary's items ((key, value) pairs).
import pandas as pd a_dict = { 'name': ['Alice', 'Bobby'], 'experience': [10, 13, 15], 'salary': [1000, 2000, 3000, 4000] } # 👇️ dict_items([('name', ['Alice', 'Bobby']), ('experience', [10, 13, 15]), ('salary', [1000, 2000, 3000, 4000])]) print(a_dict.items())
On each iteration, we convert the current value (list) to a Series and return
the key-value pair in a tuple.
Lastly, we use the dict() class to convert the list of key-value pair tuples
to a dictionary and pass the dictionary to the pandas.DataFrame() constructor.
df = pd.DataFrame( dict( [(key, pd.Series(value)) for key, value in a_dict.items()] ) ) # name experience salary # 0 Alice 10.0 1000 # 1 Bobby 13.0 2000 # 2 NaN 15.0 3000 # 3 NaN NaN 4000 print(df)
Notice that missing values are marked as NaN in the DataFrame.
from_dict()You can also use the
DataFrame.from_dict()
method to create a DataFrame from a dictionary with different lengths, as long
as the orient argument is set to index.
import pandas as pd a_dict = { 'name': ['Alice', 'Bobby'], 'experience': [10, 13, 15], 'salary': [1000, 2000, 3000, 4000] } df = pd.DataFrame.from_dict(a_dict, orient='index') # 0 1 2 3 # name Alice Bobby NaN NaN # experience 10 13 15.0 NaN # salary 1000 2000 3000.0 4000.0 print(df)

The from_dict() method constructs a DataFrame from a dictionary of
array-like objects.
The orient argument determines the orientation of the data.
We set the orient to index so the keys of the dict become rows in the
DataFrame.
You can learn more about the related topics by checking out the following tutorials: