Last updated: Apr 11, 2024
Reading time·4 min
The Pandas "ValueError: Columns must be same length as key" occurs when the
columns you are trying to assign to the DataFrame
are not equal to the number
of keys you've specified.
To solve the error, make sure the number of keys you've specified is equal to the number of values (columns).
Here is an example of how the error occurs.
import pandas as pd df1 = pd.DataFrame({ 'column1': ['Alice', 'Bobby', 'Carl', 'Dan'], 'column2': [29, 30, 31, 32] }) df2 = pd.DataFrame({ 'column1': [100, 200, 300] }) # ⛔️ ValueError: Columns must be same length as key df1[['column3', 'column4']] = df2['column1']
The error occurred when trying to add 2 keys to the first DataFrame
object.
Notice that there are 2 keys (column3
and column4
) between the square
brackets, however, column1
in df2
contains 3 items.
The mismatch between the number of keys (column3
and column4
) and the number
of values (100
, 200
and 300
) caused the error.
To solve the error, you have to make sure the number of keys on the left matches the number of values on the right.
import pandas as pd df1 = pd.DataFrame({ 'column1': ['Alice', 'Bobby', 'Carl', 'Dan'], 'column2': [29, 30, 31, 32] }) df2 = pd.DataFrame({ 'column1': [100, 200, 300] }) df1[['column3', 'column4', 'column5']] = df2['column1'] print(df1)
Running the code sample produces a DataFrame
with the following output.
column1 column2 column3 column4 column5 0 Alice 29 100 200 300 1 Bobby 30 100 200 300 2 Carl 31 100 200 300 3 Dan 32 100 200 300
Notice that we passed 3 keys (column3
, column4
and column5
) because
column1
in df
contains 3 values.
We added a third key to resolve the issue, however, you could've also removed one of the values on the right-hand side.
import pandas as pd df1 = pd.DataFrame({ 'column1': ['Alice', 'Bobby', 'Carl', 'Dan'], 'column2': [29, 30, 31, 32] }) df2 = pd.DataFrame({ 'column1': [100, 200] }) df1[['column3', 'column4']] = df2['column1'] print(df1)
Running the code sample above produces the following DataFrame
.
column1 column2 column3 column4 0 Alice 29 100 200 1 Bobby 30 100 200 2 Carl 31 100 200 3 Dan 32 100 200
If you meant to copy a column from one DataFrame to another, use the following code sample instead.
import pandas as pd df1 = pd.DataFrame({ 'column1': ['Alice', 'Bobby', 'Carl', 'Dan'], 'column2': [29, 30, 31, 32] }) df2 = pd.DataFrame({ 'column1': [100, 200, 300] }) df1['column3'] = df2['column1'] # column1 column2 column3 # 0 Alice 29 100.0 # 1 Bobby 30 200.0 # 2 Carl 31 300.0 # 3 Dan 32 NaN print(df1)
The column in the second DataFrame
only has 3 values, so the fourth row gets
set to a value of NaN
.
pandas.Series.str.split()
Here is an example that solves the error when using pandas.Series.str.split.
import pandas as pd df1 = pd.DataFrame({ 'column1': ['Alice', 'Bobby', 'Carl', 'Dan'], 'column2': [29, 30, 31, 32] }) df2 = pd.DataFrame({ 'column1': ['1. AB', '2. CD', '3. EF'] }) df1[['column3', 'column4']] = df2['column1'].str.split( '.', n=1, expand=True, regex=False) print(df1)
Running the code sample produces the following output.
column1 column2 column3 column4 0 Alice 29 1 AB 1 Bobby 30 2 CD 2 Carl 31 3 EF 3 Dan 32 NaN NaN
The pandas.Series.str.split()
method splits strings around a given separator.
import pandas as pd df2 = pd.DataFrame({ 'column1': ['1. AB', '2. CD', '3. EF'] }) # 0 1 # 0 1 AB # 1 2 CD # 2 3 EF print(df2['column1'].str.split('.', n=1, expand=True, regex=False))
The first argument we passed to the method is the separator to split on.
If not specified, the method splits on whitespace.
The n
argument is the number of splits.
If you set the argument to None
, 0
or -1
, all splits are returned.
The expand
argument is used to expand the split strings into separate columns.
The regex
argument determines if the supplied pattern is a regular expression
or a string literal.
You can learn more about the related topics by checking out the following tutorials:
pd.read_json()