Add columns of a different Length to a DataFrame in Pandas

Borislav Hadzhiev

Last updated: Apr 12, 2024


# Table of Contents

  1. Add columns of a different Length to a DataFrame in Pandas
  2. Adding columns of different length to a DataFrame by extending them
  3. Converting the additional column to a Series
  4. Creating a DataFrame from a dictionary with different lengths
  5. Creating a DataFrame from a dictionary with different lengths using from_dict()

# Add columns of a different Length to a DataFrame in Pandas

To add columns of a different length to a DataFrame in Pandas:

  1. Use the pd.DataFrame() constructor to create a new DataFrame with the additional columns.
  2. Use the pandas.concat() method to concatenate the existing and the new DataFrames.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    additional_cols = pd.DataFrame({
        'salary': [1500, 1200, 2500, 3500]
    })

    df2 = pd.concat([df, additional_cols], axis=1)

    print(df2)

The code for this article is available on GitHub

Running the code sample produces the following output.

shell

        name  experience  salary
    0  Alice        10.0    1500
    1  Bobby        13.0    1200
    2   Carl        15.0    2500
    3    NaN         NaN    3500


The initial DataFrame has 2 columns and 3 rows.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    #     name  experience
    # 0  Alice          10
    # 1  Bobby          13
    # 2   Carl          15
    print(df)

We created a new DataFrame that has 1 column and 4 rows.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    additional_cols = pd.DataFrame({
        'salary': [1500, 1200, 2500, 3500]
    })

    #    salary
    # 0    1500
    # 1    1200
    # 2    2500
    # 3    3500
    print(additional_cols)

The last step is to use the pandas.concat() method to add the column of a different length to the existing DataFrame.

main.py

    df2 = pd.concat([df, additional_cols], axis=1)

    #     name  experience  salary
    # 0  Alice        10.0    1500
    # 1  Bobby        13.0    1200
    # 2   Carl        15.0    2500
    # 3    NaN         NaN    3500
    print(df2)

The pandas.concat method concatenates Pandas objects along a given axis.

The axis argument is used to determine the axis along which to concatenate the DataFrames.

By default, the axis argument is 0, which concatenates the objects along the index (rows) axis.

Setting the axis argument to 1 means "concatenate along the columns axis".
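To make the difference between the two axes concrete, here is a minimal sketch that concatenates the same two DataFrames both ways and compares the resulting shapes. The variable names `stacked` and `side_by_side` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

# axis=0 (the default) stacks the rows of the second DataFrame under the first
stacked = pd.concat([df, additional_cols], axis=0)

# axis=1 places the DataFrames side by side, aligning rows on the index
side_by_side = pd.concat([df, additional_cols], axis=1)

print(stacked.shape)       # (7, 3)
print(side_by_side.shape)  # (4, 3)
```

With `axis=0` the result has 3 + 4 = 7 rows, whereas `axis=1` keeps one row per index label (0 to 3) and adds `salary` as a third column.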

Notice that the values in the fourth row for the name and experience columns are missing (NaN). Because NaN is a floating-point value, the experience column is converted from integers to floats.
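The missing values also affect the column's data type. The sketch below checks the dtype of the experience column after the concatenation and, as an optional extra step that is not part of the original example, converts it to the pandas nullable `Int64` dtype so the values display as integers again.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

additional_cols = pd.DataFrame({
    'salary': [1500, 1200, 2500, 3500]
})

df2 = pd.concat([df, additional_cols], axis=1)

# NaN is a float, so the integer column is upcast to float64
dtype_after_concat = df2['experience'].dtype
print(dtype_after_concat)  # float64

# Optional: the nullable 'Int64' dtype stores integers alongside missing values
df2['experience'] = df2['experience'].astype('Int64')
print(df2['experience'].dtype)  # Int64
print(df2)
```

Whether the conversion is worthwhile depends on how the column is used later; plain float64 with NaN is often fine for numeric work.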

Make sure the ignore_index argument is set to False when calling pd.concat().

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    additional_cols = pd.DataFrame({
        'salary': [1500, 1200, 2500, 3500]
    })

    df2 = pd.concat([df, additional_cols], axis=1, ignore_index=False)

    #     name  experience  salary
    # 0  Alice        10.0    1500
    # 1  Bobby        13.0    1200
    # 2   Carl        15.0    2500
    # 3    NaN         NaN    3500
    print(df2)

False is the default value for the ignore_index argument.

If you set the argument to True when concatenating with axis=1, the column names are lost and the columns are labeled 0, 1, ..., n - 1 instead.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    additional_cols = pd.DataFrame({
        'salary': [1500, 1200, 2500, 3500]
    })

    df2 = pd.concat([df, additional_cols], axis=1, ignore_index=True)

    #        0     1     2
    # 0  Alice  10.0  1500
    # 1  Bobby  13.0  1200
    # 2   Carl  15.0  2500
    # 3    NaN   NaN  3500
    print(df2)

Setting the ignore_index argument to True is useful if the columns of the objects you are concatenating don't have meaningful indexing information.

# Adding columns of different length to a DataFrame by extending them

You can also use the list.extend() method to extend the column before you add it to the DataFrame.

main.py

    import pandas as pd

    a = ['Alice', 'Bobby']
    b = [10, 13, 15]
    c = [1000, 2000, 3000, 4000]

    a_len, b_len, c_len = len(a), len(b), len(c)

    max_len = max(a_len, b_len, c_len)

    # Pad each list that is shorter than the longest one with empty strings
    if not max_len == a_len:
        a.extend([''] * (max_len - a_len))

    if not max_len == b_len:
        b.extend([''] * (max_len - b_len))

    if not max_len == c_len:
        c.extend([''] * (max_len - c_len))

    df = pd.DataFrame({
        'A': a,
        'B': b,
        'C': c
    })

    #        A   B     C
    # 0  Alice  10  1000
    # 1  Bobby  13  2000
    # 2         15  3000
    # 3             4000
    print(df)

The lists in the example have different lengths.

We used the len() function to get the length of each list.

main.py

    a_len, b_len, c_len = len(a), len(b), len(c)

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozenset).

The next step is to get the maximum length.

main.py

    max_len = max(a_len, b_len, c_len)

All columns passed to the DataFrame constructor as lists must have the same length, so we use the list.extend() method to pad every list that is shorter than the maximum length.

main.py

    if not max_len == a_len:
        a.extend([''] * (max_len - a_len))

Once all lists have the same length, we use the pd.DataFrame() constructor.

main.py

    df = pd.DataFrame({
        'A': a,
        'B': b,
        'C': c
    })

    #        A   B     C
    # 0  Alice  10  1000
    # 1  Bobby  13  2000
    # 2         15  3000
    # 3             4000
    print(df)
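The padding steps above repeat the same logic for every list, so they can be folded into a small helper. The following is only a sketch: `pad_columns` and its `fill_value` parameter are hypothetical names, not part of pandas. Unlike `list.extend()`, it builds new lists and leaves the originals unchanged.

```python
import pandas as pd


def pad_columns(columns, fill_value=''):
    """Return a new dict in which every list is padded to the longest length.

    Hypothetical helper for illustration; not a pandas function.
    """
    max_len = max(len(values) for values in columns.values())

    return {
        name: list(values) + [fill_value] * (max_len - len(values))
        for name, values in columns.items()
    }


data = {
    'A': ['Alice', 'Bobby'],
    'B': [10, 13, 15],
    'C': [1000, 2000, 3000, 4000],
}

df = pd.DataFrame(pad_columns(data))

print(df)
```

Padding numeric columns with an empty string turns them into object columns, so passing `fill_value=None` may be a better fit if you plan to do arithmetic on them later.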

# Converting the additional column to a Series

If you convert the values of the additional column to a Series before assigning it, the extra rows get dropped.

main.py

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Alice', 'Bobby', 'Carl'],
        'experience': [10, 13, 15],
    })

    print(df)

    salary_col = [1500, 1200, 2500, 3500]

    df['salary'] = pd.Series(salary_col)

    print('-' * 50)

    print(df)

Running the code sample produces the following output.

shell

        name  experience
    0  Alice          10
    1  Bobby          13
    2   Carl          15
    --------------------------------------------------
        name  experience  salary
    0  Alice          10    1500
    1  Bobby          13    1200
    2   Carl          15    2500

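Notice that the list we used for the new column has 4 values, but the DataFrame only has 3 rows.

We converted the list to a Series and assigned the result to the existing DataFrame. Pandas aligns the Series with the DataFrame's index, so the value at index 3 has no matching row and is automatically dropped.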
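If you would rather keep the extra value than drop it, one option is to extend the DataFrame's index before assigning the Series. This is a sketch of that alternative rather than part of the original example; it assumes the DataFrame uses the default 0-based integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

salary_col = [1500, 1200, 2500, 3500]

# Extend the index to cover every value in the list.
# The newly added row is filled with NaN in the existing columns.
df = df.reindex(range(len(salary_col)))

# The Series now has a matching row for each of its index labels
df['salary'] = pd.Series(salary_col)

print(df)
```

The result has 4 rows, with missing values in the name and experience columns of the last row.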

If you omit the conversion to Series, you'd get the ValueError: Length of values does not match length of index error.
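The short sketch below reproduces the error by assigning the plain list and then performs the same assignment with a Series. The `error_message` variable is only there to capture the exception text, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl'],
    'experience': [10, 13, 15],
})

salary_col = [1500, 1200, 2500, 3500]

error_message = None

try:
    # A plain list must have exactly as many items as the DataFrame has rows
    df['salary'] = salary_col
except ValueError as error:
    error_message = str(error)

print(error_message)

# A Series is aligned on the index instead, so the extra value is dropped
df['salary'] = pd.Series(salary_col)

print(df)
```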

# Creating a DataFrame from a dictionary with different lengths

If you need to create a DataFrame from a dictionary whose values have different lengths:

  1. Use a list comprehension to convert each dictionary value to a Series.
  2. Use the dict() class to convert the list of key, value tuples to a dictionary.
  3. Pass the result to the pandas.DataFrame() constructor.

main.py

    import pandas as pd

    a_dict = {
        'name': ['Alice', 'Bobby'],
        'experience': [10, 13, 15],
        'salary': [1000, 2000, 3000, 4000]
    }

    df = pd.DataFrame(
        dict(
            [(key, pd.Series(value)) for key, value in a_dict.items()]
        )
    )

    #     name  experience  salary
    # 0  Alice        10.0    1000
    # 1  Bobby        13.0    2000
    # 2    NaN        15.0    3000
    # 3    NaN         NaN    4000
    print(df)


We used a list comprehension to iterate over the dictionary's items.

The dict.items() method returns a new view of the dictionary's items ((key, value) pairs).

main.py

    import pandas as pd

    a_dict = {
        'name': ['Alice', 'Bobby'],
        'experience': [10, 13, 15],
        'salary': [1000, 2000, 3000, 4000]
    }

    # 👇️ dict_items([('name', ['Alice', 'Bobby']), ('experience', [10, 13, 15]), ('salary', [1000, 2000, 3000, 4000])])
    print(a_dict.items())

On each iteration, we convert the current value (list) to a Series and return the key-value pair in a tuple.

Lastly, we use the dict() class to convert the list of key-value pair tuples to a dictionary and pass the dictionary to the pandas.DataFrame() constructor.

main.py

    df = pd.DataFrame(
        dict(
            [(key, pd.Series(value)) for key, value in a_dict.items()]
        )
    )

    #     name  experience  salary
    # 0  Alice        10.0    1000
    # 1  Bobby        13.0    2000
    # 2    NaN        15.0    3000
    # 3    NaN         NaN    4000
    print(df)

Notice that missing values are marked as NaN in the DataFrame.
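As a side note, the list of tuples wrapped in `dict()` can also be written as a dict comprehension, which some readers may find easier to scan. The sketch below builds the DataFrame both ways and checks that the results are identical; the names `df_from_tuples` and `df_from_comprehension` are only illustrative.

```python
import pandas as pd

a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

# The approach from the example: a list of (key, Series) tuples passed to dict()
df_from_tuples = pd.DataFrame(
    dict(
        [(key, pd.Series(value)) for key, value in a_dict.items()]
    )
)

# The same result with a dict comprehension
df_from_comprehension = pd.DataFrame(
    {key: pd.Series(value) for key, value in a_dict.items()}
)

print(df_from_comprehension)
print(df_from_tuples.equals(df_from_comprehension))  # True
```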

# Creating a DataFrame from a dictionary with different lengths using from_dict()

You can also use the DataFrame.from_dict() method to create a DataFrame from a dictionary whose values have different lengths, as long as the orient argument is set to 'index'.

main.py

    import pandas as pd

    a_dict = {
        'name': ['Alice', 'Bobby'],
        'experience': [10, 13, 15],
        'salary': [1000, 2000, 3000, 4000]
    }

    df = pd.DataFrame.from_dict(a_dict, orient='index')

    #                 0      1       2       3
    # name        Alice  Bobby     NaN     NaN
    # experience     10     13    15.0     NaN
    # salary       1000   2000  3000.0  4000.0
    print(df)


The from_dict() method constructs a DataFrame from a dictionary of array-like objects.

The orient argument determines the orientation of the data.

We set the orient argument to 'index' so the keys of the dictionary become the rows of the DataFrame.
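If you want the dictionary keys as columns rather than rows, one option is to transpose the result. This is a small sketch of that extra step, which the original example does not include; `df_t` is an illustrative name.

```python
import pandas as pd

a_dict = {
    'name': ['Alice', 'Bobby'],
    'experience': [10, 13, 15],
    'salary': [1000, 2000, 3000, 4000]
}

df = pd.DataFrame.from_dict(a_dict, orient='index')

# Swap rows and columns so that each dictionary key becomes a column
df_t = df.T

print(df_t)
print(df_t.shape)  # (4, 3)
```

Keep in mind that the transposed columns are likely to have the generic object dtype, because each original column held mixed values. Converting each value to a Series, as in the previous section, tends to give cleaner column types.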
