Fix - UnicodeDecodeError: 'charmap' codec can't decode byte

Borislav Hadzhiev

Last updated: May 1, 2022

The Python "UnicodeDecodeError: 'charmap' codec can't decode byte in position" occurs when we open a file with an incorrect encoding, or when we don't set the encoding keyword argument and the platform default (often a legacy code page such as cp1252 on Windows) doesn't match the file. To solve the error, specify the correct encoding, e.g. utf-8.

Here is an example of how the error occurs.

I have a file called example.txt with the following contents.

example.txt
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world

And here is the code that tries to decode the contents of example.txt.

main.py
# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
with open('example.txt', 'r', encoding='cp856') as f:
    lines = f.readlines()
    print(lines)

The error is caused because the example.txt file doesn't use the specified encoding.
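
When the encoding keyword argument is left out entirely, open() falls back to a platform-dependent default. As a quick check, the sketch below prints that default; the variable name default_encoding is only illustrative.

```python
import locale

# The encoding open() uses when the encoding argument is omitted.
# On Windows this is often a legacy code page such as cp1252.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints something other than the encoding the file was saved with, passing encoding explicitly avoids relying on the default.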

If you know the encoding the file uses, make sure to specify it using the encoding keyword argument.

Otherwise, the first thing you can try is setting the encoding to utf-8.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)

The utf-8 encoding can represent every valid Unicode code point, which is over a million characters.

You can view all of the standard encodings in the "Standard Encodings" table of the codecs module page in the official docs.

Some of the common encodings are ascii, latin-1 and utf-32.
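
If the encoding of the data is unknown, one pragmatic option is to try a short list of candidates in order. The sketch below is only an illustration: decode_with_fallback is a hypothetical helper, and the candidate list is an assumption that should be adjusted to the data at hand.

```python
def decode_with_fallback(data, candidates=('utf-8', 'cp1252')):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the data')


# Valid UTF-8 bytes are decoded by the first candidate.
text, used = decode_with_fallback('héllo'.encode('utf-8'))
print(used)  # utf-8

# The byte 0xe9 on its own is not valid UTF-8, so cp1252 is used instead.
text2, used2 = decode_with_fallback(b'caf\xe9')
print(used2)  # cp1252
```

A decode that succeeds doesn't prove the guess was right. For example, latin-1 maps every possible byte to a character, so it never raises an error and is best kept as a last resort.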

If the error persists, you could set the errors keyword argument to ignore, which skips the bytes that cannot be decoded.

Note that ignoring characters that cannot be decoded can lead to data loss.

main.py
# 👇️ set errors to ignore
with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()
    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)

Opening the file with an incorrect encoding and errors set to ignore won't raise a UnicodeDecodeError, but the decoded text is garbled.

main.py
with open('example.txt', 'r', encoding='cp856', errors='ignore') as f:
    lines = f.readlines()
    # ✅ ['\xadרט©ז\xadצ\xadץ»┘©×םן\n', 'hello world']
    print(lines)
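
The data loss is easiest to see on a small bytes object. The sketch below decodes UTF-8 bytes as ascii, first with errors set to ignore and then with replace, which is another standard error handler that substitutes a marker character instead of dropping bytes silently. The sample string is made up for illustration.

```python
data = 'café'.encode('utf-8')
print(data)  # b'caf\xc3\xa9'

# 'ignore' silently drops the two bytes that ascii cannot decode.
ignored = data.decode('ascii', errors='ignore')
print(ignored)  # caf

# 'replace' keeps a U+FFFD marker for each undecodable byte.
replaced = data.decode('ascii', errors='replace')
print(ascii(replaced))  # 'caf\ufffd\ufffd'
```

With replace, at least the position of the lost characters stays visible in the result.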

If you don't need to interact with the contents of the file, you can open it in binary mode without decoding it.

main.py
with open('example.txt', 'rb') as f:
    lines = f.readlines()
    # ✅ [b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f\n', b'hello world']
    print(lines)

We opened the file in binary mode (using the rb mode), so the lines list contains bytes objects.

You can use this approach if you need to upload the file to a remote server and don't need to decode it.
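
If it later turns out that the text is needed after all, bytes read in binary mode can still be decoded explicitly. A minimal sketch, assuming the file was saved as utf-8; it writes its own temporary file so it can run on its own, and the names path and raw are illustrative.

```python
import os
import tempfile

original = 'Ǐ\nhello world'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Write the sample as raw UTF-8 bytes.
    with open(path, 'wb') as f:
        f.write(original.encode('utf-8'))

    # Binary mode returns bytes and never tries to decode them.
    with open(path, 'rb') as f:
        raw = f.read()

# Decode only at the point where text is actually needed.
text = raw.decode('utf-8')
print(text == original)  # True
```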

Encoding is the process of converting a string to a bytes object and decoding is the process of converting a bytes object to a string.

When decoding a bytes object, we have to use the same encoding that was used to encode the string to a bytes object.

Here is an example of how the error occurs when we encode a string to bytes with one encoding and then decode the bytes with a different one.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')
# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
my_text_again = my_binary_data.decode('cp856')

We can solve the error by using the utf-8 encoding to decode the bytes object.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')
# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)
# ✅ specify correct encoding
my_text_again = my_binary_data.decode('utf-8')
print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
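
Not every mismatch raises an error. As a sketch of the silent case, decoding the same bytes as latin-1 succeeds because latin-1 assigns a character to every byte value, but the result is not the original text. The names garbled and recovered are illustrative.

```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')

# No exception here, but the decoded string is wrong.
garbled = my_binary_data.decode('latin-1')
print(garbled == my_text)  # False

# Undoing the wrong decode and using the right encoding recovers the text.
recovered = garbled.encode('latin-1').decode('utf-8')
print(recovered == my_text)  # True
```

So the absence of a UnicodeDecodeError is not proof that the chosen encoding is the right one.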

Copyright © 2022 Borislav Hadzhiev