How to unzip a .gz file using Python [5 simple Ways]

Borislav Hadzhiev

Last updated: Apr 11, 2024

# Table of Contents

  1. How to unzip a .gz file using Python
  2. Unzip a .gz file to read its contents
  3. Using the sh module to unzip a .gz file in Python

# How to unzip a .gz file using Python

To unzip a .gz file using Python:

  1. Use the gzip.open() method to open the gzip-compressed file in binary mode.
  2. Use the with open() statement to open a file you want to write the contents of the compressed file to.
  3. Use the shutil.copyfileobj() method to copy the data from the compressed file to the newly created file.
main.py

    import gzip
    import shutil

    with gzip.open('example.json.gz', 'rb') as file_in:
        with open('example.json', 'wb') as file_out:
            shutil.copyfileobj(file_in, file_out)

    print('example.json file created')

The code for this article is available on GitHub

The code sample assumes that you have a file named example.json.gz in the same directory as your Python script.

You can download a sample .gz file from wiki.mozilla.org.

Make sure to rename the file to example.json.gz once you download it.

Running the code sample creates an example.json file in the same directory as the main.py script.

We used the with statement twice in the example: once with gzip.open() and once with open().

The with statement takes care of closing the file automatically, even if an error occurs.
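To see the automatic closing in action, here is a minimal sketch. It writes its own small sample file (the with_demo.txt.gz name is only an assumption for illustration) and simulates an error inside the with block.

```python
import gzip

# Hypothetical sample file, created here so the sketch runs on its own.
sample_path = 'with_demo.txt.gz'

with gzip.open(sample_path, 'wb') as file_out:
    file_out.write(b'some data')

try:
    with gzip.open(sample_path, 'rb') as file_in:
        # Simulate a failure while the file is still open.
        raise RuntimeError('simulated failure while reading')
except RuntimeError:
    pass

# The file object was closed when the with block exited, despite the error.
print(file_in.closed)  # True
```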

The gzip.open() method opens a gzip-compressed file in binary or text mode and returns a file object.

main.py

    import gzip
    import shutil

    with gzip.open('example.json.gz', 'rb') as file_in:
        # ...

The first argument the method takes is the filename and the second is the mode.

By default, the mode is set to rb, which opens the file for reading in binary mode, so the data comes back as bytes.
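The difference between binary and text mode can be sketched as follows. The example creates its own small file (the sample_text.json.gz name is an assumption for illustration) and reads it back in both modes.

```python
import gzip

# Hypothetical sample file, created here so the sketch runs on its own.
sample_path = 'sample_text.json.gz'

with gzip.open(sample_path, 'wt', encoding='utf-8') as file_out:
    file_out.write('{"name": "bobby"}')

# 'rb' (the default) returns the decompressed data as bytes.
with gzip.open(sample_path, 'rb') as file_in:
    binary_contents = file_in.read()

# 'rt' decodes the decompressed data and returns a string.
with gzip.open(sample_path, 'rt', encoding='utf-8') as file_in:
    text_contents = file_in.read()

print(type(binary_contents))  # <class 'bytes'>
print(type(text_contents))    # <class 'str'>
```

Binary mode is the better fit when the data is copied to another file unchanged, whereas text mode is handy when you want to work with strings directly.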

The next step is to open the file to which we want to write the contents of the file object that was returned from gzip.open().

main.py

    import gzip
    import shutil

    with gzip.open('example.json.gz', 'rb') as file_in:
        with open('example.json', 'wb') as file_out:
            shutil.copyfileobj(file_in, file_out)

    print('example.json file created')

We passed 2 arguments to the shutil.copyfileobj() method:

  1. The source file.
  2. The output file.

The method copies the contents of the source file to the output file.

You can optionally pass a third argument to shutil.copyfileobj() - an integer length that sets the buffer size.

By default, the data is read in chunks to avoid uncontrolled memory consumption.

If a negative length is supplied, the data is copied without looping over the source data in chunks.
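As a rough sketch of the optional third argument, the example below copies the decompressed data using a 64 KB buffer. The file names and the buffer size are assumptions chosen for illustration, and the sample file is created in the script so that it runs on its own.

```python
import gzip
import shutil

# Hypothetical sample file, created here so the sketch runs on its own.
sample_gz = 'buffer_demo.txt.gz'
original_data = b'hello gzip\n' * 1000

with gzip.open(sample_gz, 'wb') as file_out:
    file_out.write(original_data)

# Assumed buffer size: read and write the data in chunks of 64 KB.
buffer_size = 64 * 1024

with gzip.open(sample_gz, 'rb') as file_in:
    with open('buffer_demo.txt', 'wb') as file_out:
        shutil.copyfileobj(file_in, file_out, buffer_size)

with open('buffer_demo.txt', 'rb') as result_file:
    copied_data = result_file.read()

print(copied_data == original_data)  # True
```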

# Unzip a .gz file to read its contents

If you only need to unzip the .gz file to read its contents, use the following code sample instead.

main.py

    import gzip

    with gzip.open('example.json.gz', 'rb') as file_in:
        file_contents = file_in.read()

    print(file_contents)

The code sample assumes that you have an example.json.gz file in the same directory as your Python script.

You can download a sample .gz file from wiki.mozilla.org.

Make sure to rename the file to example.json.gz once you download it.

We first use the gzip.open() method to open the gzip-compressed file in binary mode.

The next step is to use the file.read() method to read the file's contents.
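Because the file was opened in binary mode, read() returns bytes. If the compressed file holds a single JSON document, one possible follow-up is to decode the bytes and parse them with the json module. This is only a sketch: the parse_demo.json.gz name and its contents are assumptions, and the file is created in the script so that it runs on its own.

```python
import gzip
import json

# Hypothetical sample file, created here so the sketch runs on its own.
sample_path = 'parse_demo.json.gz'

with gzip.open(sample_path, 'wb') as file_out:
    file_out.write(json.dumps({'id': 1, 'tags': ['a', 'b']}).encode('utf-8'))

with gzip.open(sample_path, 'rb') as file_in:
    file_contents = file_in.read()  # bytes, because of binary mode

# Decode the bytes to a string and parse the JSON document.
data = json.loads(file_contents.decode('utf-8'))

print(data['tags'])  # ['a', 'b']
```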

You can also read the file line by line.

main.py

    import gzip

    with gzip.open('example.json.gz', 'rb') as file_in:
        for line in file_in:
            print(line.decode().strip())

Notice that we used the bytes.decode() method to convert each line from bytes to a string.

If you have a .gz file that stores a .csv file, you can use the pandas library to read the file's contents.

main.py

    import gzip

    import pandas as pd

    with gzip.open('example.csv.gz') as file_in:
        df = pd.read_csv(file_in)

    print(df.head())

The code sample assumes that you have an example.csv.gz file that stores a .csv file.

The pandas.read_csv() method reads a comma-separated-values (CSV) file into a DataFrame.

Make sure you have the pandas module installed.

shell

    pip install pandas

    # or with pip3
    pip3 install pandas

You can also read a gzip-compressed .json file using pandas.read_json.

main.py

    import pandas as pd

    df = pd.read_json(
        'example.json.gz',
        lines=True,
        compression='gzip'
    )

    print(df.tail())

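The pandas.read_json method converts a JSON string to a pandas object. Setting lines=True tells pandas to read the file as one JSON object per line, so remove the argument if your file contains a single JSON document.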

The compression keyword argument can be set to gzip for on-the-fly decompression of on-disk data.

# Using the sh module to unzip a .gz file in Python

You can also use the sh module to unzip a .gz file in Python.

First, make sure you have the module installed by running the following command.

shell

    pip install sh

    # or with pip3
    pip3 install sh

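The sh module runs the gunzip program that is installed on your system, so it only works where that command is available (Linux and macOS). The gunzip() call will unzip the .gz file into an example.json file and will delete the .gz file.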

main.py

    from sh import gunzip

    gunzip('example.json.gz')

The code sample assumes that you have a file named example.json.gz in the same directory as your Python script.

You can download a sample .gz file from wiki.mozilla.org.

Make sure to rename the file to example.json.gz once you download it.

After running the python main.py command, an example.json file is created in the same directory.
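If the gunzip command is not available on your system, similar behavior can be approximated with the standard library alone. The gunzip_file() function below is a hypothetical helper written for this illustration (it is not part of the sh module or of Python itself), and the sample file is created in the script so that it runs on its own.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to itself and delete the .gz file.

    Hypothetical helper that mimics the default behavior of gunzip.
    """
    if not gz_path.endswith('.gz'):
        raise ValueError('expected a path that ends in .gz')

    # 'example.json.gz' -> 'example.json'
    out_path = gz_path[:-3]

    with gzip.open(gz_path, 'rb') as file_in:
        with open(out_path, 'wb') as file_out:
            shutil.copyfileobj(file_in, file_out)

    # Remove the compressed file only after the copy has finished.
    os.remove(gz_path)
    return out_path


# Hypothetical sample file, created here so the sketch runs on its own.
with gzip.open('helper_demo.json.gz', 'wb') as sample_file:
    sample_file.write(b'{"ok": true}')

result = gunzip_file('helper_demo.json.gz')
print(result)  # helper_demo.json
```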

I've also written an article on how to create a Zip archive of a Directory in Python.
