Add columns of a different Length to a DataFrame in Pandas

# Add columns of a different Length to a DataFrame in Pandas

To add columns of a different length to a DataFrame in Pandas:

Use the pd.DataFrame() constructor to create a new DataFrame with the additional columns.
Use the pandas.concat() method to concatenate the existing and the new DataFrames.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})


additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

df2 = pd.concat([df, additional_cols], axis=1)

print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

    name  experience  salary
0  Alice        10.0    1500
1  Bobby        13.0    1200
2   Carl        15.0    2500
3    NaN         NaN    3500

The initial DataFrame has 2 columns and 3 rows.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

#     name  experience
# 0  Alice          10
# 1  Bobby          13
# 2   Carl          15
print(df)

We created a new DataFrame that has 1 column and 4 rows.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

#    salary
# 0    1500
# 1    1200
# 2    2500
# 3    3500
print(additional_cols)

The last step is to use the pandas.concat() method to add the column of a different length to the existing DataFrame.

main.py

df2 = pd.concat([df, additional_cols], axis=1)

#     name  experience  salary
# 0  Alice        10.0    1500
# 1  Bobby        13.0    1200
# 2   Carl        15.0    2500
# 3    NaN         NaN    3500
print(df2)

The pandas.concat method concatenates Pandas objects along a given axis.

The axis argument is used to determine the axis along which to concatenate the DataFrames.

The argument defaults to 0, which concatenates the objects along the index axis (the rows are stacked).

Setting the axis argument to 1 means "concatenate along the columns axis".
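As a quick illustration of the difference between the two axis values, the following sketch concatenates the same two DataFrames both ways and compares the shapes. The variable names `stacked` and `side_by_side` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

# axis=0 (the default) stacks the rows of the second DataFrame below the first
stacked = pd.concat([df, additional_cols], axis=0)

# axis=1 places the columns side by side, aligning the rows on the index labels
side_by_side = pd.concat([df, additional_cols], axis=1)

print(stacked.shape)       # (7, 3)
print(side_by_side.shape)  # (4, 3)
```

With axis=0 the result has 7 rows (3 + 4), and every column that is missing from one of the inputs is filled with NaN.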

Notice that the values in the fourth row for the name and experience columns are missing (NaN).
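The missing value is also the reason the experience column is printed as 10.0, 13.0 and 15.0: NaN is a float, so the integer column is converted to float64. If you would rather keep whole numbers, one option is the nullable `Int64` dtype. This is a minimal sketch that assumes all non-missing values in the column are whole numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

df2 = pd.concat([df, additional_cols], axis=1)

# The NaN in the fourth row forced the column to float64
original_dtype = df2['experience'].dtype
print(original_dtype)

# The nullable Int64 dtype can hold integers and missing values together
df2['experience'] = df2['experience'].astype('Int64')
print(df2['experience'].dtype)

print(df2)
```

After the conversion, the missing entry is displayed as `<NA>` instead of NaN.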

Make sure the ignore_index argument is set to False when calling pd.concat().

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})


df2 = pd.concat([df, additional_cols], axis=1, ignore_index=False)

#     name  experience  salary
# 0  Alice        10.0    1500
# 1  Bobby        13.0    1200
# 2   Carl        15.0    2500
# 3    NaN         NaN    3500
print(df2)

False is the default value for the ignore_index argument.

If you set the argument to True, the column names are lost and the columns are labeled 0, 1, ..., n - 1, because the labels along the concatenation axis are ignored.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})


df2 = pd.concat([df, additional_cols], axis=1, ignore_index=True)

#        0     1     2
# 0  Alice  10.0  1500
# 1  Bobby  13.0  1200
# 2   Carl  15.0  2500
# 3    NaN   NaN  3500
print(df2)

Setting the ignore_index argument to True is useful if the labels along the concatenation axis (here, the column names) don't carry meaningful information.

# Adding columns of different length to a DataFrame by extending them

You can also use the list.extend() method to extend the column before you add it to the DataFrame.

main.py

import pandas as pd


a = ['Alice', 'Bobby']
b = [10, 13, 15]
c = [1000, 2000, 3000, 4000]

a_len, b_len, c_len = len(a), len(b), len(c)

max_len = max(a_len, b_len, c_len)

if not max_len == a_len:
    a.extend([''] * (max_len - a_len))

if not max_len == b_len:
    b.extend([''] * (max_len - b_len))

if not max_len == c_len:
    c.extend([''] * (max_len - c_len))

df = pd.DataFrame({
    'A': a,
    'B': b,
    'C': c
})

#        A   B     C
# 0  Alice  10  1000
# 1  Bobby  13  2000
# 2         15  3000
# 3             4000
print(df)

The lists in the example have different lengths.

We used the len() function to get the length of each list.

main.py

a_len, b_len, c_len = len(a), len(b), len(c)

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).

The next step is to get the maximum length.

main.py

max_len = max(a_len, b_len, c_len)

We know that the columns we have to add to the DataFrame have to be of the same length, so we use the list.extend() method if the length is insufficient.

main.py

if not max_len == a_len:
    a.extend([''] * (max_len - a_len))

Once all lists have the same length, we use the pd.DataFrame() constructor.

main.py

df = pd.DataFrame({
    'A': a,
    'B': b,
    'C': c
})

#        A   B     C
# 0  Alice  10  1000
# 1  Bobby  13  2000
# 2         15  3000
# 3             4000
print(df)
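If you don't need the empty-string padding, a shorter alternative (not part of the approach above) is to let `itertools.zip_longest()` do the padding. It is sketched here under the assumption that missing values are acceptable as filler, which also keeps the numeric columns numeric.

```python
from itertools import zip_longest

import pandas as pd

a = ['Alice', 'Bobby']
b = [10, 13, 15]
c = [1000, 2000, 3000, 4000]

# zip_longest() pads the shorter lists with None by default,
# so every row tuple has one value per column
rows = list(zip_longest(a, b, c))

df = pd.DataFrame(rows, columns=['A', 'B', 'C'])

print(df)
```

Padding with an empty string turns a numeric column into an object column, whereas the None padding is treated as a missing value by Pandas.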

# Converting the additional column to a `Series`

If you convert the values of the additional column to a Series, the extra rows are dropped.

main.py

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

print(df)

salary_col = [1500, 1200, 2500, 3500]

df['salary'] = pd.Series(salary_col)

print('-' * 50)

print(df)

Running the code sample produces the following output.

shell

    name  experience
0  Alice          10
1  Bobby          13
2   Carl          15
--------------------------------------------------
    name  experience  salary
0  Alice          10    1500
1  Bobby          13    1200
2   Carl          15    2500

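Notice that the salary_col list has 4 values, but the DataFrame has only 3 rows.

We converted the list to a Series and assigned the result to a new column of the existing DataFrame. Pandas aligns the Series with the DataFrame's index, so the value at index 3 has no matching row and is dropped.

If you omit the conversion to a Series and assign the list directly, you get the error ValueError: Length of values does not match length of index.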
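The following sketch reproduces the error and then shows one possible way to keep all four values: extending the DataFrame's index with `DataFrame.reindex()` before the assignment. It assumes the DataFrame has the default 0, 1, 2, ... index, and the name `extended` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

salary_col = [1500, 1200, 2500, 3500]

# Assigning a plain list whose length differs from the index raises ValueError
try:
    df['salary'] = salary_col
except ValueError as err:
    error_message = str(err)
    print(error_message)

# Add the missing row labels first; the new rows are filled with NaN
extended = df.reindex(range(len(salary_col)))

# The list now matches the length of the index, so the assignment succeeds
extended['salary'] = salary_col

print(extended)
```

The original `df` is left unchanged because `reindex()` returns a new DataFrame.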

# Creating a DataFrame from a dictionary with different lengths

If you need to create a DataFrame from a dictionary with different length values:

Use a list comprehension to convert each dictionary value to a Series.
Use the dict() class to convert the list of key, value tuples to a dictionary.
Pass the result to the pandas.DataFrame() constructor.

main.py

import pandas as pd


a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

df = pd.DataFrame(
    dict(
        [(key, pd.Series(value))
         for key, value in a_dict.items()]
    )
)

#     name  experience  salary
# 0  Alice        10.0    1000
# 1  Bobby        13.0    2000
# 2    NaN        15.0    3000
# 3    NaN         NaN    4000
print(df)

We used a list comprehension to iterate over the dictionary's items.

The dict.items() method returns a new view of the dictionary's items ((key, value) pairs).

main.py

import pandas as pd


a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

# 👇️ dict_items([('name', ['Alice', 'Bobby']), ('experience', [10, 13, 15]), ('salary', [1000, 2000, 3000, 4000])])
print(a_dict.items())

On each iteration, we convert the current value (list) to a Series and return the key-value pair in a tuple.

Lastly, we use the dict() class to convert the list of key-value pair tuples to a dictionary and pass the dictionary to the pandas.DataFrame() constructor.

main.py

df = pd.DataFrame(
    dict(
        [(key, pd.Series(value))
         for key, value in a_dict.items()]
    )
)

#     name  experience  salary
# 0  Alice        10.0    1000
# 1  Bobby        13.0    2000
# 2    NaN        15.0    3000
# 3    NaN         NaN    4000
print(df)

Notice that missing values are marked as NaN in the DataFrame.
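The same result can likely be written a little more directly with a dict comprehension, which skips the intermediate list of tuples. This is a sketch of an equivalent form, and the names `df_from_list` and `df_from_comprehension` are only used for the comparison.

```python
import pandas as pd

a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

# The approach from the section: a list of (key, Series) tuples passed to dict()
df_from_list = pd.DataFrame(
    dict(
        [(key, pd.Series(value))
         for key, value in a_dict.items()]
    )
)

# Equivalent dict comprehension: build the {key: Series} mapping in one step
df_from_comprehension = pd.DataFrame(
    {key: pd.Series(value) for key, value in a_dict.items()}
)

# Both DataFrames hold the same values, labels and dtypes
print(df_from_list.equals(df_from_comprehension))
```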

# Creating a DataFrame from a dictionary with different lengths using `from_dict()`

You can also use the DataFrame.from_dict() method to create a DataFrame from a dictionary with different lengths, as long as the orient argument is set to index.

main.py

import pandas as pd


a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

df = pd.DataFrame.from_dict(a_dict, orient='index')

#                 0      1       2       3
# name        Alice  Bobby     NaN     NaN
# experience     10     13    15.0     NaN
# salary       1000   2000  3000.0  4000.0
print(df)

The from_dict() method constructs a DataFrame from a dictionary of array-like objects.

The orient argument determines the orientation of the data.

We set orient to 'index', so the keys of the dict become the row labels of the DataFrame and the list values are spread across the columns.
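If you want the dictionary keys as columns instead, one option is to transpose the result with the `T` attribute. This is a minimal sketch; note that after transposing a frame with mixed types the columns usually end up with the object dtype, so you may need to convert them afterwards.

```python
import pandas as pd

a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

# Keys become rows, shorter lists are padded with missing values
df = pd.DataFrame.from_dict(a_dict, orient='index')

# Swap rows and columns so that each key becomes a column
transposed = df.T

print(transposed)
print(transposed.dtypes)
```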
