UnicodeDecodeError: 'charmap' codec can't decode byte

Borislav Hadzhiev

Last updated: Apr 8, 2024

# UnicodeDecodeError: 'charmap' codec can't decode byte

The Python "UnicodeDecodeError: 'charmap' codec can't decode byte" error occurs when we open a file with an incorrect encoding, or when we omit the encoding keyword argument and Python falls back to a platform default (often a code page such as cp1252 on Windows) that doesn't match the file.

To solve the error, specify the correct encoding, e.g. utf-8.
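
If you aren't sure which encoding Python falls back to when the encoding argument is omitted, one way to check is the standard library's locale module. This is only a minimal sketch, and the printed value depends on your operating system and settings.

```python
import locale

# Encoding that open() typically uses in text mode when no
# encoding argument is passed (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# e.g. 'cp1252' on many Windows machines, 'UTF-8' on most Linux/macOS systems
print(default_encoding)
```

If the printed value is a code page rather than UTF-8, passing encoding='utf-8' explicitly is usually the safer choice for UTF-8 files.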


Here is an example of how the error occurs.

I have a file called example.txt with the following contents.

example.txt
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world

And here is the code that tries to decode the contents of example.txt.

main.py
# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
with open('example.txt', 'r', encoding='cp856') as f:
    lines = f.readlines()

    print(lines)


The error is caused because the example.txt file doesn't use the specified encoding (cp856).

example.txt
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world
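
To see which byte triggers the error, you can catch the exception and inspect its attributes. The sketch below uses cp1252 purely as an illustrative wrong guess (it is not the encoding from the example above), and decodes a bytes object directly instead of reading a file.

```python
# '𝘈' is stored as the four UTF-8 bytes f0 9d 98 88
data = '𝘈'.encode('utf-8')

try:
    # cp1252 is an assumed wrong encoding, used only for illustration
    data.decode('cp1252')
except UnicodeDecodeError as e:
    error = e  # keep a reference, because `e` is cleared after the except block

    print(e.encoding)            # name of the codec that failed
    print(e.start)               # index of the first byte that could not be decoded
    print(data[e.start:e.end])   # the offending byte(s)
    print(e.reason)              # short description of the problem
```

The start and end attributes point at the exact bytes the codec has no mapping for, which helps when you need to look at the raw data.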

# Specifying the correct encoding when opening the file

If you know the encoding the file uses, make sure to specify it using the encoding keyword argument.

Otherwise, the first thing you can try is setting the encoding to utf-8.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)


The utf-8 encoding can represent every valid Unicode code point (over a million of them), which makes it a sensible first guess.

The same approach can be used if you use the open() function directly instead of using the with statement.

main.py
my_file = open('example.txt', 'r', encoding='utf-8')

lines = my_file.readlines()

print(lines)  # ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']

# Without a with statement, the file has to be closed manually
my_file.close()


You can view all of the standard encodings in the "Standard Encodings" table of the codecs module page in the official docs.

Some of the common encodings are ascii, latin-1 and utf-32.
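
If you want to check whether Python recognizes an encoding name before passing it to open(), the codecs module can look it up. The is_known_encoding helper below is a hypothetical name used for this sketch and is not part of the standard library.

```python
import codecs


def is_known_encoding(name):
    """Hypothetical helper: True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(is_known_encoding('utf-8'))                # True
print(is_known_encoding('latin-1'))              # True
print(is_known_encoding('not-a-real-encoding'))  # False

# Aliases resolve to one canonical name
print(codecs.lookup('utf8').name)  # utf-8
```

Note that a name being recognized only means the codec exists. It doesn't tell you whether the file actually uses that encoding.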

# Specifying an encoding when using the pathlib module

If you use the pathlib module, pass the encoding keyword argument to the method that reads or writes the file.

main.py
from pathlib import Path

text = Path('example.txt').read_text(encoding='utf-8')

# 𝘈Ḇ𝖢𝕯٤ḞԍНǏ
# hello world
print(text)

You can pass the encoding when calling methods such as Path.read_text or Path.write_text.
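
Here is a small sketch of a write and read round trip with the same encoding. It uses a temporary directory (an assumption made so the example doesn't touch your own files).

```python
import tempfile
from pathlib import Path

original = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ\nhello world'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'example.txt'

    # Write and read with the same, explicitly specified encoding
    path.write_text(original, encoding='utf-8')
    text = path.read_text(encoding='utf-8')

print(text == original)  # True
```

Using the same encoding on both sides is what guarantees that the text you read back matches what you wrote.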

# Ignoring characters that cannot be decoded

If the error persists, you can set the errors keyword argument to ignore, which skips the characters that cannot be decoded.

Note that ignoring characters that cannot be decoded can lead to data loss.

main.py
# 👇️ Set errors to ignore
with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)

Opening the file with an incorrect encoding with errors set to ignore won't raise a UnicodeDecodeError.

main.py
with open('example.txt', 'r', encoding='cp856', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['\xadרט©ז\xadצ\xadץ»┘©×םן\n', 'hello world']
    print(lines)

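The bytes that cannot be decoded are silently dropped. The remaining bytes are still mapped through the wrong code page, which is why the first line is not legible.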
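
If silently dropping data is a concern, Python also has a replace error handler that marks each undecodable byte with the U+FFFD replacement character instead of removing it. The sketch below compares the two handlers on a small, made-up bytes object.

```python
# 'café au lait' saved as latin-1, so the é is the single byte 0xe9,
# which is not valid UTF-8 in this position
raw = b'caf\xe9 au lait'

ignored = raw.decode('utf-8', errors='ignore')
replaced = raw.decode('utf-8', errors='replace')

print(ignored)   # caf au lait
print(replaced)  # the undecodable byte is shown as the U+FFFD replacement character
```

With replace, you can at least see where the data was damaged, whereas ignore leaves no trace.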

# Opening the file in binary mode

If you don't need to interact with the contents of the file, you can open it in binary mode without decoding it.

main.py
with open('example.txt', 'rb') as f:
    lines = f.readlines()

    # ✅ [b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f\n', b'hello world']
    print(lines)

We opened the file in binary mode (using the rb - read binary mode), so the lines list contains bytes objects.

You can use this approach if you need to upload the file to a remote server and don't need to decode it.
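
As a rough sketch of working with a file without decoding it, the following copies the raw bytes from one file to another. The file names and the temporary directory are assumptions made for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / 'example.txt'
    target = Path(tmp_dir) / 'copy.txt'
    source.write_bytes('𝘈Ḇ𝖢 hello world'.encode('utf-8'))

    # Binary mode on both sides: no decoding, so no UnicodeDecodeError
    with open(source, 'rb') as src, open(target, 'wb') as dst:
        dst.write(src.read())

    same_bytes = source.read_bytes() == target.read_bytes()

print(same_bytes)  # True
```

Because the bytes are never converted to a string, the encoding of the file doesn't matter here.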

Encoding is the process of converting a string to a bytes object and decoding is the process of converting a bytes object to a string.

When decoding a bytes object, we have to use the same encoding that was used to encode the string to a bytes object.
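
A mismatch doesn't always raise an error. Sometimes it just produces the wrong characters, as in this short sketch with a single accented letter.

```python
original = 'é'

# Encoding: str -> bytes
encoded = original.encode('utf-8')
print(encoded)  # b'\xc3\xa9'

# Decoding with the same encoding gives the original string back
right = encoded.decode('utf-8')

# Decoding with a different encoding succeeds here but gives two wrong characters
wrong = encoded.decode('latin-1')

print(right == original)  # True
print(len(wrong))         # 2
```

This is why it's worth checking that decoded text is legible, even when no exception is raised.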

# Try using the cp437 encoding

If the error persists, try to use the cp437 encoding when opening the file.

main.py
with open('example.txt', 'r', encoding='cp437') as f:
    lines = f.readlines()

    # ✅ ['≡¥ÿêß╕å≡¥ûó≡¥ò»┘ñß╕₧╘ì╨¥╟Å\n', 'hello world']
    print(lines)

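Code page 437 is the character set of the original IBM personal computer and includes all printable ASCII characters as well as some accented letters. Note that the first line in the output above is not the original text: the bytes were decoded without an error but mapped to the wrong characters, so always check that the result is legible.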

If you still get an error, set the errors keyword argument to ignore in the call to the open() function.

main.py
with open('example.txt', 'r', encoding='cp437', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['≡¥ÿêß╕å≡¥ûó≡¥ò»┘ñß╕₧╘ì╨¥╟Å\n', 'hello world']
    print(lines)

The characters that cannot be decoded are simply ignored, which may cause data loss.

If the error persists, try other encodings such as utf-16, utf-32, latin-1, etc.
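
Trying several encodings by hand can be automated. The read_with_fallback function below is a hypothetical helper written for this sketch, and the list of candidate encodings is only an assumption that you should adjust to your data. latin-1 is placed last because it accepts any byte sequence.

```python
import tempfile
from pathlib import Path


def read_with_fallback(path, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes cleanly."""
    raw = Path(path).read_bytes()

    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate

    raise ValueError(f'none of {encodings} could decode {path}')


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / 'sample.txt'
    sample.write_bytes('café'.encode('cp1252'))  # not valid UTF-8

    text, used = read_with_fallback(sample)

print(used)  # cp1252
```

An encoding that decodes without errors is not necessarily the right one, so you should still check that the returned text is legible.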

# Trying to find the encoding of the file

You can try to figure out what the encoding of the file is by using the file command.

The command is available on macOS and Linux, but can also be used on Windows if you have Git and Git Bash installed.

Make sure to run the command in Git Bash if on Windows.

Open your shell in the directory that contains the file and run the following command.

shell
file *


The screenshot shows that the file uses the ASCII encoding.

This is the encoding you should specify when opening the file.

main.py
with open('example.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()

    print(lines)

If you are on Windows, you can also:

  1. Open the file in the basic version of Notepad.
  2. Click on "Save as".
  3. Look at the selected encoding right next to the "Save" button.


The screenshot shows that the encoding for the file is UTF-8, so that's what we have to specify when calling the open() function.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

    print(lines)
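
If neither tool is available, one thing you can check from Python is whether the file starts with a byte order mark (BOM). The sniff_bom function below is a hypothetical helper for this sketch. It only recognizes files that actually start with a BOM, which many files (including most UTF-8 files) don't.

```python
import codecs

# The UTF-32 marks are checked first, because the UTF-32 LE mark
# starts with the same two bytes as the UTF-16 LE mark.
BOMS = (
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
)


def sniff_bom(raw):
    """Hypothetical helper: return an encoding name if raw starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return encoding
    return None


print(sniff_bom('hi'.encode('utf-8-sig')))  # utf-8-sig
print(sniff_bom(b'hello world'))            # None
```

To use it on a file, read the first few bytes in binary mode, e.g. open('example.txt', 'rb').read(4), and pass them to the function. A result of None means the file has no BOM, not that it is plain ASCII.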

# Try using the latin-1 encoding

If the error persists, try to use the latin-1 encoding when opening the file.

main.py
with open('example.txt', 'r', encoding='latin-1') as f:
    lines = f.readlines()

    # ['ð\x9d\x98\x88á¸\x86ð\x9d\x96¢ð\x9d\x95¯Ù¤á¸\x9eÔ\x8dÐ\x9dÇ\x8f\n', 'hello world']
    print(lines)

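The latin-1 encoding maps every possible byte value to a character, so it doesn't raise a UnicodeDecodeError. Make sure to check that you get legible results, because a wrong guess produces garbled text instead of an error.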
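
The following short sketch shows why latin-1 never fails: all 256 byte values decode to exactly one character each, and encoding the result gives the same bytes back.

```python
# Every possible byte value, 0 through 255
all_bytes = bytes(range(256))

# Decoding never raises, because each byte maps to one code point
decoded = all_bytes.decode('latin-1')
print(len(decoded))  # 256

# The mapping is reversible, so no information is lost
round_trip_ok = decoded.encode('latin-1') == all_bytes
print(round_trip_ok)  # True
```

This makes latin-1 a useful last resort for getting past the error, but not proof that the file really uses latin-1.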

# Using a different encoding causes the error

Here is an example that shows how the error occurs when a string is encoded to bytes with one encoding and the bytes object is then decoded with a different one.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
my_text_again = my_binary_data.decode('cp856')

We can solve the error by using the utf-8 encoding to decode the bytes object.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)

# ✅ Specify the correct encoding
my_text_again = my_binary_data.decode('utf-8')

print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'