Copy a column from one DataFrame to another in Pandas

Borislav Hadzhiev

Last updated: Jun 16, 2023

# Table of Contents

  1. Copy a column from one DataFrame to another in Pandas
  2. Copying columns from one DataFrame to another with the copy() method
  3. Copy columns from one DataFrame to another without NaN values

If you get NaN values when copying columns from one DataFrame to another, check out the third subheading.

# Copy a column from one DataFrame to another in Pandas

You can use bracket notation to copy a column from one DataFrame to another.

The specified column is copied to the target DataFrame.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
})

df2['year'] = df1['year']
df2['profit'] = df1['profit']

#    employees  year  profit
# 0         10  2020    1500
# 1         15  2021    2500
# 2         20  2022    3500
# 3         25  2023    4500
print(df2)

You can also copy multiple columns from one DataFrame to another in a single statement.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
})

# ✅ Copy 2 columns from one DataFrame to another
df2[['year', 'profit']] = df1[['year', 'profit']]

#    employees  year  profit
# 0         10  2020    1500
# 1         15  2021    2500
# 2         20  2022    3500
# 3         25  2023    4500
print(df2)

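Notice that we use two sets of square brackets on each side of the assignment. The outer pair is the indexing operator and the inner pair is the list of column names.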

The code samples use bracket notation to copy the year and profit columns from the first DataFrame to the second.

Don't use dot notation to add the column. Pandas sets a plain attribute on the DataFrame instead of creating a column and issues a warning:

  • "Warning: Pandas doesn't allow columns to be created via a new attribute name"
main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
})

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
})

# ⛔️ Warning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
df2.year = df1.year

The warning simply means that you should use bracket notation [] when copying a column and not dot notation.

The following is incorrect:

main.py
# ⛔️ Incorrect
df2.year = df1.year

The following is correct:

main.py
# ✅ Correct
df2['year'] = df1['year']
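
To see what dot notation actually does, the minimal sketch below records the warning and then inspects the columns. It assumes a reasonably recent pandas version, and the name `caught` is only illustrative.

```python
import warnings

import pandas as pd

df1 = pd.DataFrame({'year': [2020, 2021, 2022, 2023]})
df2 = pd.DataFrame({'employees': [10, 15, 20, 25]})

# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df2.year = df1.year  # dot notation: sets an attribute, not a column

for w in caught:
    print(w.category.__name__, '-', w.message)

# The attribute assignment did not add a column to df2.
print(list(df2.columns))
```

The column list still contains only `employees`, which is why bracket notation is needed when the column does not exist yet.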

# Copying columns from one DataFrame to another with the copy() method

You can also use the copy() method to copy columns from one DataFrame to another.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
    'employees': [10, 15, 20, 25],
})

columns_to_copy = df1[['year', 'profit']]

df2 = columns_to_copy.copy()

#    year  profit
# 0  2020    1500
# 1  2021    2500
# 2  2022    3500
# 3  2023    4500
print(df2)

The columns_to_copy variable is a DataFrame that consists of the columns we want to copy.

The DataFrame.copy method makes a copy of the DataFrame's indices and data.

The method creates a deep copy of the DataFrame, so modifications to the data or indices of the copy won't be reflected in the original DataFrame.

This is determined by the deep argument which is set to True by default.

main.py
df2 = columns_to_copy.copy(deep=True)
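
As a quick check of the deep-copy behavior described above, the following sketch changes one value in the copy and prints the same cell from both DataFrames. It reuses the sample data from this section.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
    'employees': [10, 15, 20, 25],
})

df2 = df1[['year', 'profit']].copy()

# Change a value in the copy only.
df2.loc[0, 'profit'] = 0

print(df1.loc[0, 'profit'])  # 1500 (original is unchanged)
print(df2.loc[0, 'profit'])  # 0
```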

# Copy columns from one DataFrame to another without NaN values

When copying columns from one DataFrame to another, you might get NaN values in the resulting DataFrame.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

df2['year'] = df1['year']
df2['profit'] = df1['profit']

#    employees  year  profit
# 1         10   NaN     NaN
# 2         15   NaN     NaN
# 3         20   NaN     NaN
# 4         25   NaN     NaN
print(df2)

Notice that the year and profit columns contain NaN values after copying them to the other DataFrame.

The NaN values appear because the two DataFrames have different indexes.

Each column is a Series that carries the index of its DataFrame.

When you assign a Series to a column, pandas aligns it to the target DataFrame's index by label. None of the labels in df2 exist in df1, so every row is filled with NaN.
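
The alignment step can be reproduced by hand. The sketch below uses `Series.reindex`, which is roughly what happens to the right-hand side when a Series is assigned to a column. The variable name `aligned` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

# Look up each label of df2's index in the df1 column.
# Labels that are missing from df1 produce NaN.
aligned = df1['year'].reindex(df2.index)

print(aligned)
print(aligned.isna().all())  # True
```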

One way to resolve the issue is to homogenize the index values.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

# ✅ Homogenize indexes before copying columns
df2.index = df1.index

df2['year'] = df1['year']
df2['profit'] = df1['profit']

#    employees  year  profit
# a         10  2020    1500
# b         15  2021    2500
# c         20  2022    3500
# d         25  2023    4500
print(df2)

I only added the following line to the code snippet.

main.py
# ✅ Homogenize indexes before copying columns
df2.index = df1.index

Once you homogenize the index values, you can copy the columns over and they won't contain NaN values.

You can also resolve the issue by assigning NumPy arrays to the columns.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

# ✅ call to_numpy() method
df2['year'] = df1['year'].to_numpy()
df2['profit'] = df1['profit'].to_numpy()

# df2 keeps its own index (1 to 4)
#    employees  year  profit
# 1         10  2020    1500
# 2         15  2021    2500
# 3         20  2022    3500
# 4         25  2023    4500
print(df2)

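The Series.to_numpy method converts a column (a Series) to a NumPy array. DataFrame.to_numpy does the same for a whole DataFrame.

A NumPy array has no index, so converting the columns bypasses index alignment. The values are assigned by position and the target DataFrame keeps its own index.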
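
Because positional assignment ignores labels, both DataFrames need the same number of rows in the same order. A minimal sketch of a guard for this is shown below; `copy_columns_by_position` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd


def copy_columns_by_position(source, target, columns):
    """Hypothetical helper: copy columns row by row, ignoring index labels."""
    if len(source) != len(target):
        raise ValueError(
            f'Row counts differ: {len(source)} != {len(target)}'
        )

    for column in columns:
        # to_numpy() drops the index, so no alignment takes place.
        target[column] = source[column].to_numpy()

    return target


df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

copy_columns_by_position(df1, df2, ['year', 'profit'])

print(df2)
```

The helper modifies `target` in place and also returns it, which mirrors how the bracket-notation examples in this article work.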

You can use two sets of square brackets if you need to copy multiple columns from one DataFrame to another in a single statement.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

df2[['year', 'profit']] = df1[['year', 'profit']].to_numpy()

# df2 keeps its own index (1 to 4)
#    employees  year  profit
# 1         10  2020    1500
# 2         15  2021    2500
# 3         20  2022    3500
# 4         25  2023    4500
print(df2)

Notice that we used two sets of square brackets [] when specifying multiple columns.

You can also use the values attribute when copying columns.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

df2['year'] = df1['year'].values
df2['profit'] = df1['profit'].values

# df2 keeps its own index (1 to 4)
#    employees  year  profit
# 1         10  2020    1500
# 2         15  2021    2500
# 3         20  2022    3500
# 4         25  2023    4500
print(df2)

The values attribute (Series.values for a single column, DataFrame.values for a whole DataFrame) returns a NumPy representation of the data.

When accessing the values attribute, only the values in the DataFrame are returned (the axes labels are removed).
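
For plain numeric columns like the ones in this article, `values` and `to_numpy()` give the same result, and the pandas documentation recommends `to_numpy()` for new code. The short sketch below compares the two; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

via_values = df1['year'].values
via_to_numpy = df1['year'].to_numpy()

# Both are plain NumPy arrays without index labels.
print(type(via_values))
print(np.array_equal(via_values, via_to_numpy))  # True
```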

Homogenizing the indexes or converting the columns to a NumPy array to bypass index alignment is necessary because the indexes in the two DataFrames are different.

The comment in the following code sample demonstrates how there is no overlap between the indexes of the two DataFrames.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=[1, 2, 3, 4])

# df1.index    df2.index
#     a
#     b
#     c
#     d
#                  1
#                  2
#                  3
#                  4

The indexes of the two DataFrames are not alignable because there is no overlap.

Here is an example where the indexes partially overlap.

main.py
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=['c', 'd', 'e', 'f'])

df2['year'] = df1['year']
df2['profit'] = df1['profit']

#    employees    year  profit
# c         10  2022.0  3500.0
# d         15  2023.0  4500.0
# e         20     NaN     NaN
# f         25     NaN     NaN
print(df2)

There is a partial overlap between the indexes of the two DataFrames.

main.py
# df1.index    df2.index
#     a
#     b
#     c            c
#     d            d
#                  e
#                  f

The c and d labels exist in both indexes, so their rows in the copied columns are not NaN. The copied columns become floats because NaN cannot be stored in an integer column.
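
You can check the overlap before copying. The sketch below is one way to do it, using `Index.intersection` and `Index.isin`; the names `shared` and `has_match` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2020, 2021, 2022, 2023],
    'profit': [1500, 2500, 3500, 4500],
}, index=['a', 'b', 'c', 'd'])

df2 = pd.DataFrame({
    'employees': [10, 15, 20, 25],
}, index=['c', 'd', 'e', 'f'])

# Labels present in both indexes.
shared = df1.index.intersection(df2.index)
print(list(shared))

# Boolean mask over df2's rows: True where df1 has a matching label.
has_match = df2.index.isin(df1.index)
print(has_match.tolist())

# Rows without a match get NaN, which turns the column into floats.
df2['year'] = df1['year']
print(df2['year'].dtype)
```

Rows where the mask is False are the ones that end up as NaN after the assignment.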
