Last updated: Apr 12, 2024
To calculate the mean (average) across multiple DataFrames:

1. Use the pandas.concat() method to concatenate the DataFrames.
2. Call the mean() method on the resulting DataFrame to get the mean of the values.

```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10]
})

df3 = pd.concat([df1, df2])

print(df3)

print('-' * 50)

print(df3.mean())
```
Running the code sample returns the following output.
```
    x   y
0   2   1
1   4   3
2   6   5
3   8   7
4  10   9
0   1   6
1   2   7
2   3   8
3   4   9
4   5  10
--------------------------------------------------
x    4.5
y    6.5
dtype: float64
```
We used the pandas.concat() method to concatenate the two DataFrames along the index axis.
As a result, we got a new DataFrame that contains the rows of both DataFrames.

```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10]
})

df3 = pd.concat([df1, df2])

#     x   y
# 0   2   1
# 1   4   3
# 2   6   5
# 3   8   7
# 4  10   9
# 0   1   6
# 1   2   7
# 2   3   8
# 3   4   9
# 4   5  10
print(df3)
```
The last step is to call the DataFrame.mean() method on the resulting DataFrame.

```python
# x    4.5
# y    6.5
# dtype: float64
print(df3.mean())
```
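The method returns the mean of the values over the requested axis (the index axis by default), so you get one mean per column.
The mean (or average) is calculated by adding up the values in a column and dividing the sum by the number of values.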
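As a quick check of that definition, the sketch below recomputes the column means by hand and then shows what changing the axis does. It assumes the same df1 and df2 as in the examples above; the names `manual_mean` and `row_means` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})

df3 = pd.concat([df1, df2])

# Sum each column and divide by the number of rows.
# This matches mean() only because the data has no missing values.
manual_mean = df3.sum() / len(df3)
print(manual_mean)

# axis=1 averages across the columns of each row instead of down each column.
row_means = df3.mean(axis=1)
print(row_means.head())
```

Note that `mean()` skips missing values by default, so dividing by `len(df3)` would give a different result if a column contained `NaN`.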
If you want to calculate the mean values across multiple DataFrames by row index, use the DataFrame.groupby() method.
```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10]
})

df3 = pd.concat([df1, df2])
print(df3)

by_row_index = df3.groupby(df3.index)
mean_values = by_row_index.mean()

print('-' * 50)
print(mean_values)
```
Running the code sample produces the following output.
```
    x   y
0   2   1
1   4   3
2   6   5
3   8   7
4  10   9
0   1   6
1   2   7
2   3   8
3   4   9
4   5  10
--------------------------------------------------
     x    y
0  1.5  3.5
1  3.0  5.0
2  4.5  6.5
3  6.0  8.0
4  7.5  9.5
```
We grouped the DataFrame by its index, so the mean is calculated over the rows that share the same index label (index 0, index 1, etc.).
This approach works even if the two DataFrames have a different number of rows.
```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [6, 7, 8]
})

df3 = pd.concat([df1, df2])
print(df3)

by_row_index = df3.groupby(df3.index)
mean_values = by_row_index.mean()

print('-' * 50)
print(mean_values)
```
Running the code sample produces the following output.
```
    x  y
0   2  1
1   4  3
2   6  5
3   8  7
4  10  9
0   1  6
1   2  7
2   3  8
--------------------------------------------------
      x    y
0   1.5  3.5
1   3.0  5.0
2   4.5  6.5
3   8.0  7.0
4  10.0  9.0
```
If a row index is missing in one of the DataFrames, the mean is computed on the single available row.
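To see which means come from a single row, one option is to report the group size next to the mean. The following is a minimal sketch, assuming the same df1 and the shorter df2 from the previous example; the `summary` name is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [6, 7, 8]})

df3 = pd.concat([df1, df2])

# For column x, compute the mean and the number of rows per index label.
summary = df3.groupby(df3.index)['x'].agg(['mean', 'count'])
print(summary)
```

A `count` of 1 marks an index label that exists in only one of the DataFrames, so its "mean" is just that row's value.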
You can also calculate the mean across multiple DataFrames by row index by using the stack() and unstack() methods.

```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10]
})

mean_values = (df1.stack() + df2.stack()) / 2
mean_values = mean_values.unstack()

print(mean_values)
```
Running the code sample produces the following output.
```
     x    y
0  1.5  3.5
1  3.0  5.0
2  4.5  6.5
3  6.0  8.0
4  7.5  9.5
```
The DataFrame.stack method stacks the prescribed level(s) from columns to index.

```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

# 0  x     2
#    y     1
# 1  x     4
#    y     3
# 2  x     6
#    y     5
# 3  x     8
#    y     7
# 4  x    10
#    y     9
# dtype: int64
print(df1.stack())
```
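The method returns the stacked data. Because the columns here have a single level, the result is a Series with a two-level index (row label, column name) rather than a DataFrame.
We stacked the two DataFrames, added the results element-wise, and divided the sum by 2.

```python
mean_values = (df1.stack() + df2.stack()) / 2
mean_values = mean_values.unstack()
```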
The last step is to call the unstack() method on the result.
Since the stacked result is a Series, this is Series.unstack. It pivots the innermost index level (the column names x and y) back into column labels and returns a DataFrame.
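Because pandas aligns DataFrames on their index and column labels during arithmetic, the same result can be reached without stacking when both DataFrames share the same labels. The sketch below is an alternative for comparison, not the approach used above; `via_stack` and `direct` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})

# The stack/unstack approach from the article.
via_stack = ((df1.stack() + df2.stack()) / 2).unstack()

# Direct element-wise arithmetic; pandas aligns on index and column labels.
direct = (df1 + df2) / 2
print(direct)

# Compare the two results value by value.
same = bool((direct.to_numpy() == via_stack.to_numpy()).all())
print(same)
```

Both forms produce `NaN` for any label that exists in only one of the DataFrames, so the groupby() approach is the safer choice when the DataFrames have a different number of rows.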
If you need to find the median across the two DataFrames, use the DataFrame.median method instead.
```python
import pandas as pd

df1 = pd.DataFrame({
    'x': [2, 4, 6, 8, 10],
    'y': [1, 3, 5, 7, 9]
})

df2 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10]
})

df3 = pd.concat([df1, df2])

print(df3)

print('-' * 50)

print(df3.median())
```
Running the code sample produces the following output.
```
    x   y
0   2   1
1   4   3
2   6   5
3   8   7
4  10   9
0   1   6
1   2   7
2   3   8
3   4   9
4   5  10
--------------------------------------------------
x    4.0
y    7.0
dtype: float64
```
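The DataFrame.median() method returns the median of the values over the requested axis (the index axis by default).
The median is calculated by sorting the values and taking the middle one. If the number of values is even, the median is the average of the two middle values.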
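The median calculation can be reproduced by hand for a single column. This is a minimal sketch, assuming the same df1 and df2 as above; `values`, `mid`, and `manual_median` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [2, 4, 6, 8, 10], 'y': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})

df3 = pd.concat([df1, df2])

# Sort the values of column x in ascending order.
values = sorted(df3['x'])
n = len(values)
mid = n // 2

if n % 2 == 0:
    # Even count: average the two middle values.
    manual_median = (values[mid - 1] + values[mid]) / 2
else:
    # Odd count: take the single middle value.
    manual_median = float(values[mid])

print(manual_median)
print(df3['x'].median())
```

With ten values in the column, the two middle values are averaged, which is why the result is a float even though the inputs are integers.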