How to Split a Pandas DataFrame into Chunks

avatar
Borislav Hadzhiev

Last updated: Apr 12, 2024
5 min

banner

# Table of Contents

  1. How to Split a Pandas DataFrame into Chunks
  2. Split a Pandas DataFrame into chunks every N rows
  3. Split a Pandas DataFrame into chunks using DataFrame.iloc
  4. Split a Pandas DataFrame every N rows using a list comprehension

# How to Split a Pandas DataFrame into Chunks

Use the numpy.array_split() method to split a DataFrame into chunks.

The method takes the DataFrame and the number of chunks as parameters and splits the DataFrame.

First, make sure that you've installed the numpy module.

shell
pip install numpy pandas # or with pip3 pip3 install numpy pandas

Now, import and use the module as follows.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) list_of_dataframes = np.array_split(df, 2) print(list_of_dataframes)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
[ name experience salary 0 A 1 175.1 1 B 1 180.2 2 C 5 190.3, name experience salary 3 D 7 205.4 4 E 7 210.5 5 F 10 225.3]

split dataframe into chunks

The numpy.array_split() method splits an array (or a DataFrame) into multiple sub-arrays.

main.py
import numpy as np list_of_dataframes = np.array_split(df, 2)

The first argument we passed to the method is the DataFrame and the second is the number of chunks we want to get in the resulting list.

The method returns a list of DataFrames, so you can access a specific DataFrame at an index.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) list_of_dataframes = np.array_split(df, 2) # name experience salary # 0 A 1 175.1 # 1 B 1 180.2 # 2 C 5 190.3 print(list_of_dataframes[0])

accessing specific dataframe chunk

The code sample accesses the first DataFrame (index 0).

You can also use a for loop to iterate over the list of DataFrame chunks.

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) list_of_dataframes = np.array_split(df, 2) for DF in list_of_dataframes: print(DF) print('-' * 50)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
name experience salary 0 A 1 175.1 1 B 1 180.2 2 C 5 190.3 -------------------------------------------------- name experience salary 3 D 7 205.4 4 E 7 210.5 5 F 10 225.3 --------------------------------------------------

using for loop to iterate over dataframe chunks

If you need to split the DataFrame into N chunks, make sure to pass N as the second argument to numpy.array_split().

main.py
import pandas as pd import numpy as np df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) list_of_dataframes = np.array_split(df, 3) print(list_of_dataframes)

The code sample splits the DataFrame into 3 chunks.

Running the code sample produces the following output.

shell
[ name experience salary 0 A 1 175.1 1 B 1 180.2, name experience salary 2 C 5 190.3 3 D 7 205.4, name experience salary 4 E 7 210.5 5 F 10 225.3]

# Split a Pandas DataFrame into chunks every N rows

If you need to split a DataFrame every N rows, use the following reusable function.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) def split_every_n_rows(dataframe, chunk_size=2): chunks = [] num_chunks = len(dataframe) // chunk_size + 1 for index in range(num_chunks): chunks.append(dataframe[index * chunk_size:(index+1) * chunk_size]) return chunks list_of_dataframes = split_every_n_rows(df, 2) print(list_of_dataframes)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
[ name experience salary 0 A 1 175.1 1 B 1 180.2, name experience salary 2 C 5 190.3 3 D 7 205.4, name experience salary 4 E 7 210.5 5 F 10 225.3]

split dataframe into chunks every n rows

The function splits the DataFrame every chunk_size rows (by default 2 rows).

The function returns a list of DataFrames.

You can access the list at a specific index to get a specific DataFrame chunk or you can iterate over the list to access each chunk.

# Split a Pandas DataFrame into chunks using DataFrame.iloc

You can also use the DataFrame.iloc integer-based indexer to split a Pandas DataFrame into chunks.

main.py
import math import pandas as pd df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) def split_every_n_rows(dataframe, chunk_size=2): chunks = [] num_chunks = math.ceil(int(dataframe.shape[0] / chunk_size)) for index in range(0, dataframe.shape[0], num_chunks): chunks.append( dataframe.iloc[index:index + chunk_size] ) return chunks list_of_dataframes = split_every_n_rows(df, 2) print(list_of_dataframes)
The code for this article is available on GitHub

Running the code sample produces the following output.

shell
[ name experience salary 0 A 1 175.1 1 B 1 180.2, name experience salary 3 D 7 205.4 4 E 7 210.5]

split dataframe into chunks using dataframe iloc

The function takes a DataFrame and the number of chunks as parameters and returns a list containing the DataFrame chunks.

By default, the function splits the DataFrame into 2 chunks, however, you can set the chunk_size argument to any other value.

# Split a Pandas DataFrame every N rows using a list comprehension

Here is a reusable function that splits a Pandas DataFrame every N rows using a list comprehension.

main.py
import pandas as pd df = pd.DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'experience': [1, 1, 5, 7, 7, 10], 'salary': [175.1, 180.2, 190.3, 205.4, 210.5, 225.3], }) def split_every_n_rows(dataframe, chunk_size=2): return [ dataframe[index:index + chunk_size] for index in range(0, df.shape[0], chunk_size) ] list_of_dataframes = split_every_n_rows(df, 2) print(list_of_dataframes)

Running the code sample produces the following output.

shell
[ name experience salary 0 A 1 175.1 1 B 1 180.2, name experience salary 2 C 5 190.3 3 D 7 205.4, name experience salary 4 E 7 210.5 5 F 10 225.3]

split pandas dataframe every n rows using list comprehension

The code for this article is available on GitHub

The function splits the DataFrame every N rows using a list comprehension.

It iterates over a range() object with a step of chunk_size and returns a list containing the DataFrame chunks.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.
book cover
You can use the search field on my Home Page to filter through all of my articles.