Pandas: Calculate mean (average) across multiple DataFrames

Borislav Hadzhiev

Last updated: Apr 12, 2024


# Table of Contents

  1. Pandas: Calculate mean (average) across multiple DataFrames
  2. Calculate mean across multiple DataFrames by row index
  3. Calculate mean across multiple DataFrames by row index using stack() and unstack()
  4. Pandas: Calculate median across multiple DataFrames

# Pandas: Calculate mean (average) across multiple DataFrames

To calculate the mean (average) across multiple DataFrames:

  1. Use the pandas.concat() method to concatenate the DataFrames.
  2. Call the mean() method on the resulting DataFrame to get the mean of the values.
main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [6, 7, 8, 9, 10]
    })

    df3 = pd.concat([df1, df2])
    print(df3)

    print('-' * 50)

    print(df3.mean())
The code for this article is available on GitHub

Running the code sample returns the following output.

shell
        x   y
    0   2   1
    1   4   3
    2   6   5
    3   8   7
    4  10   9
    0   1   6
    1   2   7
    2   3   8
    3   4   9
    4   5  10
    --------------------------------------------------
    x    4.5
    y    6.5
    dtype: float64

We used the pandas.concat() method to concatenate the two DataFrames along the index axis.

As a result, we got a new DataFrame that contains the rows of both DataFrames.
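Note that pandas.concat() keeps the original row labels by default, so the labels 0 through 4 each appear twice in the result. The sketch below contrasts this with the ignore_index=True argument, which renumbers the rows. The variable names stacked and renumbered are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})

# Default behavior: the row labels of each input are preserved
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())     # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The column means do not depend on the row labels
print(stacked.mean().equals(renumbered.mean()))  # True
```

Keeping the duplicated labels matters if you later want to group the rows by index, so only renumber when you just need the overall column means.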

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [6, 7, 8, 9, 10]
    })

    df3 = pd.concat([df1, df2])

    #     x   y
    # 0   2   1
    # 1   4   3
    # 2   6   5
    # 3   8   7
    # 4  10   9
    # 0   1   6
    # 1   2   7
    # 2   3   8
    # 3   4   9
    # 4   5  10
    print(df3)

The last step is to call the DataFrame.mean() method on the resulting DataFrame.

main.py
    # x    4.5
    # y    6.5
    # dtype: float64
    print(df3.mean())

The method returns the mean of the values over the requested axis (the index axis by default).
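To make the axis behavior concrete, here is a small sketch that calls mean() with both axis values on the concatenated DataFrame. The names col_means and row_means are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})
df3 = pd.concat([df1, df2])

# axis=0 (the default): one mean per column, computed down the rows
col_means = df3.mean(axis=0)
print(col_means)

# axis=1: one mean per row, computed across the columns x and y
row_means = df3.mean(axis=1)
print(row_means)
```

With axis=1 the result has one entry per row of df3, so it averages x and y within each row rather than averaging across the two DataFrames.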

The mean (or average) is calculated by:

  1. Adding the numbers in the column together.
  2. Dividing the sum by the number of values in the column.
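The two steps can be checked by hand for the x column. This is only a verification sketch; total, count and manual_mean are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})
df3 = pd.concat([df1, df2])

total = df3['x'].sum()    # 30 from df1 plus 15 from df2 = 45
count = df3['x'].count()  # 10 non-missing values
manual_mean = total / count

print(manual_mean)        # 4.5
print(manual_mean == df3['x'].mean())  # True
```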

# Calculate mean across multiple DataFrames by row index

If you want to calculate the mean values across multiple DataFrames by row index, use the DataFrame.groupby() method.

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [6, 7, 8, 9, 10]
    })

    df3 = pd.concat([df1, df2])
    print(df3)

    by_row_index = df3.groupby(df3.index)
    mean_values = by_row_index.mean()

    print('-' * 50)
    print(mean_values)

Running the code sample produces the following output.

shell
        x   y
    0   2   1
    1   4   3
    2   6   5
    3   8   7
    4  10   9
    0   1   6
    1   2   7
    2   3   8
    3   4   9
    4   5  10
    --------------------------------------------------
         x    y
    0  1.5  3.5
    1  3.0  5.0
    2  4.5  6.5
    3  6.0  8.0
    4  7.5  9.5

We grouped the rows of the concatenated DataFrame by their index labels.

The code sample calculates the mean between the values that have a matching index (e.g. mean between values with index 0, index 1, etc).
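As an alternative spelling, groupby() also accepts a level argument that refers to the index directly. The sketch below assumes the same two DataFrames and checks that both forms give the same result; by_index and by_level are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})
df3 = pd.concat([df1, df2])

# Group by passing the index values explicitly
by_index = df3.groupby(df3.index).mean()

# Group by the first (and only) index level
by_level = df3.groupby(level=0).mean()

print(by_level)
print(by_index.equals(by_level))  # True
```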

This approach works even if the two DataFrames have a different number of rows.

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3],
        'y': [6, 7, 8]
    })

    df3 = pd.concat([df1, df2])
    print(df3)

    by_row_index = df3.groupby(df3.index)
    mean_values = by_row_index.mean()

    print('-' * 50)
    print(mean_values)

Running the code sample produces the following output.

shell
        x  y
    0   2  1
    1   4  3
    2   6  5
    3   8  7
    4  10  9
    0   1  6
    1   2  7
    2   3  8
    --------------------------------------------------
          x    y
    0   1.5  3.5
    1   3.0  5.0
    2   4.5  6.5
    3   8.0  7.0
    4  10.0  9.0

If a row index is missing in one of the DataFrames, the mean is computed on the single available row.
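If you want to see how many rows contributed to each mean, one option is to call size() on the same grouping. This is a sketch rather than part of the original approach; grouped and rows_per_label are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [6, 7, 8]})
df3 = pd.concat([df1, df2])

grouped = df3.groupby(level=0)

# Number of rows that share each index label
rows_per_label = grouped.size()
print(rows_per_label.tolist())  # [2, 2, 2, 1, 1]

# Labels 3 and 4 exist only in df1, so their "mean" is just df1's value
print(grouped.mean().loc[3].tolist())  # [8.0, 7.0]
```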

# Calculate mean across multiple DataFrames by row index using stack() and unstack()

You can also calculate the mean across multiple DataFrames by row index by using the stack() and unstack() methods.

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [6, 7, 8, 9, 10]
    })

    mean_values = (df1.stack() + df2.stack()) / 2
    mean_values = mean_values.unstack()

    print(mean_values)

Running the code sample produces the following output.

shell
         x    y
    0  1.5  3.5
    1  3.0  5.0
    2  4.5  6.5
    3  6.0  8.0
    4  7.5  9.5

The DataFrame.stack method stacks the prescribed level(s) from columns to index.

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    # 0  x     2
    #    y     1
    # 1  x     4
    #    y     3
    # 2  x     6
    #    y     5
    # 3  x     8
    #    y     7
    # 4  x    10
    #    y     9
    # dtype: int64
    print(df1.stack())

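Because df1 has a single level of column labels, the method returns a Series whose index combines the row label and the column name.

We stacked both DataFrames, added the two resulting Series and divided the sum by 2.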

main.py
    mean_values = (df1.stack() + df2.stack()) / 2
    mean_values = mean_values.unstack()

The last step is to call the unstack() method on the resulting Series.

The method moves the innermost index level (the column names x and y) back into the columns and returns a DataFrame.
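One caveat worth illustrating: the addition aligns the two stacked Series on their labels, so a label that is missing from either DataFrame produces NaN instead of a mean. The sketch below assumes a shorter df2 to show this.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
# Assumed for illustration: df2 has only 3 rows
df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [6, 7, 8]})

mean_values = ((df1.stack() + df2.stack()) / 2).unstack()

# Rows 0-2 exist in both DataFrames and get a mean;
# rows 3 and 4 exist only in df1 and come out as NaN
print(mean_values)
```

If the DataFrames can differ in length, grouping the concatenated rows by index is likely the safer choice, because it averages whatever rows are available for each label.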

# Pandas: Calculate median across multiple DataFrames

If you need to find the median across multiple DataFrames, concatenate them and call the DataFrame.median() method instead.

main.py
    import pandas as pd

    df1 = pd.DataFrame({
        'x': [2, 4, 6, 8, 10],
        'y': [1, 3, 5, 7, 9]
    })

    df2 = pd.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [6, 7, 8, 9, 10]
    })

    df3 = pd.concat([df1, df2])
    print(df3)

    print('-' * 50)

    print(df3.median())

Running the code sample produces the following output.

shell
        x   y
    0   2   1
    1   4   3
    2   6   5
    3   8   7
    4  10   9
    0   1   6
    1   2   7
    2   3   8
    3   4   9
    4   5  10
    --------------------------------------------------
    x    4.0
    y    7.0
    dtype: float64

The DataFrame.median() method returns the median of the values over the requested axis (the index axis by default).

The median is calculated by:

  1. Arranging the values from the smallest to the largest.
  2. Taking the middle value if the number of values is odd.
  3. Taking the average of the two middle values if the number of values is even.
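The steps above can be reproduced by hand for the x column. This is only a verification sketch; values, mid and manual_median are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})
df3 = pd.concat([df1, df2])

# Step 1: sort the values -> [1, 2, 2, 3, 4, 4, 5, 6, 8, 10]
values = sorted(df3['x'].tolist())
n = len(values)
mid = n // 2

if n % 2 == 1:
    # Odd count: take the single middle value
    manual_median = values[mid]
else:
    # Even count: average the two middle values (4 and 4 here)
    manual_median = (values[mid - 1] + values[mid]) / 2

print(manual_median)                         # 4.0
print(manual_median == df3['x'].median())    # True
```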
