Solve - UnicodeDecodeError: 'ascii' codec can't decode byte

Borislav Hadzhiev

Sun May 01 2022 · 2 min read

The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" occurs when we use the ascii codec to decode bytes that were encoded using a different codec. To solve the error, specify the correct encoding, e.g. utf-8.

Here is an example of how the error occurs.

I have a file called example.txt with the following contents.

example.txt
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world

And here is the code that tries to decode the contents of example.txt using the ascii codec.

main.py
# ⛔️ UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
with open('example.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()

    print(lines)

The error is caused because the example.txt file doesn't use the ascii encoding.
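The exception object itself records which bytes could not be decoded. The following is a minimal sketch, using illustrative in-memory bytes rather than example.txt, that catches the error and reads its standard attributes.

```python
# Illustrative data: UTF-8 bytes that start with a non-ASCII character.
raw = '𝘈 hello'.encode('utf-8')

error_info = None

try:
    raw.decode('ascii')
except UnicodeDecodeError as err:
    # These attributes are part of the built-in UnicodeDecodeError class.
    error_info = {
        'encoding': err.encoding,                     # codec that failed
        'start': err.start,                           # index of the first bad byte
        'end': err.end,                               # index just past the bad byte(s)
        'bad_bytes': err.object[err.start:err.end],   # the offending bytes
        'reason': err.reason,                         # message from the codec
    }

print(error_info)
```

Looking at `bad_bytes` is often a quick way to guess which encoding actually produced the data.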

If you know the encoding the file uses, make sure to specify it using the encoding keyword argument.

Otherwise, the first thing you can try is setting the encoding to utf-8.

main.py
# 👇️ set encoding to utf-8
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

    print(lines)  # 👉️ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']

The utf-8 encoding is capable of encoding all of the more than a million valid character code points in Unicode.

You can view all of the standard encodings in the "Standard Encodings" table of the official docs for the codecs module.
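If you don't know the encoding, one option is to try a short list of likely candidates in order. The sketch below is only an illustration: `decode_with_fallback` is a hypothetical helper, and the candidate list is an assumption you would adapt to your data. For a file, you could read the raw bytes by opening it in 'rb' mode and pass them to the helper.

```python
def decode_with_fallback(data, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper; the default candidate list is an assumption.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError('none of the candidate encodings could decode the data')


utf8_bytes = 'Ḇ hello'.encode('utf-8')
legacy_bytes = 'café'.encode('latin-1')  # b'caf\xe9' is not valid UTF-8

print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(legacy_bytes))
```

Keep in mind that a successful decode does not prove the guess was right. The latin-1 codec maps every possible byte to a character, so it never raises and only makes sense as a last resort.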

Encoding is the process of converting a string to a bytes object and decoding is the process of converting a bytes object to a string.

When decoding a bytes object, we have to use the same encoding that was used to encode the string to a bytes object.
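A mismatch doesn't always raise an error. As a small sketch with an illustrative string, decoding utf-8 bytes with latin-1 succeeds but silently produces the wrong characters.

```python
text = 'café'

# Encoding: str -> bytes
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # b'caf\xc3\xa9'

# Decoding with the same codec restores the original string.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)  # True

# Decoding with a different codec does not fail here, but the result is garbled.
garbled = utf8_bytes.decode('latin-1')
print(garbled)  # cafÃ©
```

The ascii codec is stricter than latin-1: it rejects any byte above 127, which is why it raises instead of returning garbled text.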

Here is an example that shows how decoding a bytes object with a different encoding than the one used to encode the string causes the error.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# ⛔️ UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
my_text_again = my_binary_data.decode('ascii')

We can solve the error by using the utf-8 encoding to decode the bytes object.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)

# ✅ specify correct encoding
my_text_again = my_binary_data.decode('utf-8')

print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

If you still get an error when decoding the bytes using the utf-8 encoding, you can set the errors keyword argument to ignore to skip the bytes that cannot be decoded.

main.py
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# 👇️ set errors to ignore
my_text_again = my_binary_data.decode('utf-8', errors='ignore')

print(my_text_again)

Note that ignoring characters that cannot be decoded can lead to data loss.
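To make the data loss visible, here is a short sketch with illustrative bytes that compares ignore with two other standard error handlers, replace and backslashreplace, which keep a trace of the bytes that failed to decode.

```python
# Illustrative data: 'Ḇ' takes three bytes in UTF-8, none of them valid ASCII.
data = 'Ḇ hi'.encode('utf-8')  # b'\xe1\xb8\x86 hi'

# 'ignore' drops the undecodable bytes without a trace.
ignored = data.decode('ascii', errors='ignore')

# 'replace' substitutes U+FFFD (the replacement character) for each bad byte.
replaced = data.decode('ascii', errors='replace')

# 'backslashreplace' keeps the byte values as escape sequences.
escaped = data.decode('ascii', errors='backslashreplace')

print(repr(ignored))
print(repr(replaced))
print(repr(escaped))
```

With replace or backslashreplace you can at least see where the bytes were dropped, which makes a wrong encoding easier to notice later.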

Here is an example where errors is set to ignore when opening a file.

main.py
# 👇️ set errors to ignore
with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)

Opening the file with an incorrect encoding and errors set to ignore won't raise an error, but the bytes that cannot be decoded are silently dropped.

main.py
with open('example.txt', 'r', encoding='ascii', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['\n', 'hello world']
    print(lines)