How to get the Memory size of a DataFrame in Pandas

Borislav Hadzhiev

Last updated: Apr 12, 2024

# Table of Contents

  1. How to get the Memory size of a DataFrame in Pandas
  2. Including the memory footprint of object dtype columns in the result
  3. How to get the Memory size of a DataFrame using sys.getsizeof()
  4. Get the memory size of a DataFrame using DataFrame.info()

# How to get the Memory size of a DataFrame in Pandas

To get the memory size of a DataFrame in Pandas:

  1. Use the DataFrame.memory_usage() method to get the number of bytes each column occupies.
  2. Call the sum() method on the result to get the total memory size of the DataFrame.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage())

print('-' * 50)

print(df.memory_usage(index=True).sum())  # 👉️ 176
```

The code for this article is available on GitHub

The DataFrame.memory_usage() method returns a Series that contains the memory usage of each column of the DataFrame in bytes.

You can use the index argument to specify if you want to include the contribution of the index in the calculation.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage(index=True))

print('-' * 50)

# Name    24
# Date    24
# dtype: int64
print(df.memory_usage(index=False))
```

By default, the index argument is set to True, which means the memory usage of the DataFrame's index is included in the returned Series.

As shown in the code sample, if index is set to True, its memory consumption is the first row in the output.
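
As a quick sanity check, the difference between the two totals should match the memory usage that the index reports on its own. The sketch below assumes the same sample DataFrame; index_bytes is only an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

with_index = df.memory_usage(index=True).sum()
without_index = df.memory_usage(index=False).sum()

# The gap between the two totals is the footprint of the index itself
index_bytes = with_index - without_index

print(index_bytes)
print(index_bytes == df.index.memory_usage())
```

The exact number of bytes used by the index can differ between Python versions, so the comparison is more reliable than a hard-coded value.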

To calculate the memory consumption of the entire DataFrame (in bytes), sum the memory usage of all columns.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(index=True).sum())  # 👉️ 176
```

Because memory_usage() returns a Series, the call to sum() is Series.sum(), which adds up the per-column values and returns the total number of bytes.

The method is equivalent to numpy.sum().
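
Since the total is a plain number of bytes, it can be converted to a larger unit for readability. The following is a minimal sketch: the bytes_to_mib helper and the 100,000-row numeric DataFrame are assumptions made for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd


def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


# Hypothetical DataFrame: one int64 column with 100,000 rows
df = pd.DataFrame({'value': np.arange(100_000, dtype='int64')})

# Each int64 value takes 8 bytes, so the column takes 800,000 bytes
column_bytes = int(df.memory_usage(index=False).sum())
print(column_bytes)  # 800000

# Include the index and report the total in MiB
total_bytes = int(df.memory_usage(index=True).sum())
print(f'{bytes_to_mib(total_bytes):.2f} MiB')
```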

# Including the memory footprint of object dtype columns in the result

If you want to include the memory footprint of object dtype columns in the result, set the deep argument to True when calling DataFrame.memory_usage().

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(deep=True).sum())  # 👉️ 514

print(df.memory_usage(deep=False).sum())  # 👉️ 176
```

If the deep argument is set to True, the calculation accounts for the full memory usage of the objects contained in the DataFrame.

By default, the deep argument is set to False, so only the references stored in object dtype columns are counted and the memory footprint of the objects themselves is not included.

Here is an example of setting deep to True without chaining a sum() call.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name     185
# Date     201
# dtype: int64
print(df.memory_usage(deep=True))

print('-' * 50)

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage(deep=False))
```

Passing deep=False is the same as not passing the argument at all because False is its default value.
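
To see why deep only matters for some columns, the sketch below compares a numeric column with a column that is explicitly created with the object dtype. The Age column and the explicit dtype=object are assumptions added for illustration. Exact byte counts for the strings depend on your Python and pandas versions, so none are shown.

```python
import pandas as pd

df = pd.DataFrame({
    # Explicit object dtype: the column stores references to Python strings
    'Name': pd.Series(['Alice', 'Bobby', 'Carl'], dtype=object),
    # Hypothetical numeric column: the values are stored directly in the array
    'Age': pd.Series([29, 30, 31], dtype='int64'),
})

shallow = df.memory_usage(index=False, deep=False)
deep = df.memory_usage(index=False, deep=True)

# Only the object column grows when deep=True
print(deep - shallow)
```

The difference for Age is 0 because a fixed-width numeric column has nothing extra to inspect, whereas the difference for Name is the size of the string objects themselves.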

# How to get the Memory size of a DataFrame using sys.getsizeof()

You can also use the sys.getsizeof() function to get the memory size of a DataFrame.

main.py
```python
import sys

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(deep=True).sum())  # 👉️ 514

print(sys.getsizeof(df))  # 👉️ 530
```

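The function returns the size of the supplied object in bytes. It calls the object's __sizeof__() method and adds garbage collector overhead, which is why the result (530) is slightly larger than the total returned by memory_usage(deep=True) (514).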
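
The relationship between the two numbers can be checked directly. This sketch relies on the documented behavior of sys.getsizeof(), which calls the object's __sizeof__() method and adds garbage collector overhead when applicable. It also assumes that pandas reports its deep memory usage from __sizeof__(), which is the behavior I am aware of in current pandas versions. The size of the overhead is an implementation detail of your interpreter, so it is printed rather than assumed.

```python
import sys

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

deep_total = int(df.memory_usage(deep=True).sum())

# Assumption: pandas objects return their deep memory usage from __sizeof__()
print(df.__sizeof__() == deep_total)

# Extra bytes added by the interpreter on top of __sizeof__()
overhead = sys.getsizeof(df) - deep_total
print(overhead)
```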

# Get the memory size of a DataFrame using DataFrame.info()

You can also use the DataFrame.info() method to get the memory size of a DataFrame.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# info() prints the summary itself and returns None,
# so the call is not wrapped in print()
# memory usage: 176.0+ bytes
df.info()
```

The DataFrame.info() method prints a concise summary of a DataFrame.

You should be able to see the memory usage toward the end of the output.

You can also set the memory_usage argument to "deep" to include the memory footprint of object dtype columns.

main.py
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# info() prints the summary itself and returns None,
# so the call is not wrapped in print()
# memory usage: 514.0 bytes
df.info(memory_usage='deep')
```

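If the memory_usage argument is set to "deep", the calculation accounts for the full memory usage of the objects contained in the DataFrame. The + sign, which marks the default figure as a lower bound, no longer appears in the output.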
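
If you need the memory line as a string instead of printed output, DataFrame.info() accepts a buf argument. The sketch below writes the summary to an in-memory buffer and picks out the line that starts with "memory usage". The memory_line name is illustrative, and parsing this human-readable output is only a convenience, so prefer memory_usage() when you need an exact number.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Write the summary to a buffer instead of standard output
buffer = io.StringIO()
df.info(buf=buffer, memory_usage='deep')

# Keep only the line that reports the memory usage
memory_line = next(
    line for line in buffer.getvalue().splitlines()
    if line.startswith('memory usage')
)
print(memory_line)
```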


Copyright © 2024 Borislav Hadzhiev