Remove the BOM character from a file in Python

Borislav Hadzhiev

Last updated: Aug 15, 2022

Set the encoding to utf-8-sig to remove the BOM character when reading from a file, e.g. with open('example.txt', 'r', encoding='utf-8-sig') as f:. The utf-8-sig encoding skips the BOM (the three bytes EF BB BF in UTF-8) if it appears at the very start of the file.

main.py
# ✅ strip BOM when reading from a file
with open('example.txt', 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()
    print(lines)

# --------------------------------------------------

# ✅ remove the BOM character from a string
my_str = '\ufeffhello world'

result = my_str.replace('\ufeff', '')

print(repr(result))  # 👉️ 'hello world'

The first example uses the utf-8-sig encoding to strip the byte order mark (BOM) character when reading from a file.

main.py
with open('example.txt', 'r', encoding='utf-8-sig') as f:
    contents = f.read()
    print(contents)

The open() function takes an encoding keyword argument, which can be set to utf-8-sig so that a leading byte order mark is treated as metadata and dropped, instead of being returned as part of the text.

When decoding, the utf-8-sig codec skips the BOM (the three bytes EF BB BF) if it appears at the start of the file. If the file has no BOM, the codec behaves like plain utf-8.
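The difference between the two codecs is easiest to see side by side. The sketch below is only an illustration: it creates its own temporary file (the path and the contents are made up for the example) so that it runs without an existing example.txt.

```python
import codecs
import os
import tempfile

# Create a small file that starts with the UTF-8 BOM (bytes EF BB BF).
# The path and the text are placeholders for this example.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + 'hello world'.encode('utf-8'))

# Plain utf-8 keeps the BOM as the first character of the decoded text.
with open(path, 'r', encoding='utf-8') as f:
    with_bom = f.read()

# utf-8-sig drops a leading BOM while decoding.
with open(path, 'r', encoding='utf-8-sig') as f:
    without_bom = f.read()

print(repr(with_bom))     # '\ufeffhello world'
print(repr(without_bom))  # 'hello world'
```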

UTF-8 has no byte-order ambiguity, so a byte order mark (BOM) isn't needed. Its use with the utf-8 encoding is discouraged and should generally be avoided.
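If the goal is to get rid of the BOM in the file itself rather than only while reading, one possible approach is to read the file with utf-8-sig and write it back with plain utf-8. The helper name strip_bom_from_file and the temporary file are assumptions made for this sketch. It loads the whole file into memory, so it is best suited to small files.

```python
import codecs
import os
import tempfile


def strip_bom_from_file(path):
    """Rewrite the file at `path` as UTF-8 without a leading BOM.

    Hypothetical helper for illustration only.
    """
    # utf-8-sig drops the BOM if it is present and changes nothing otherwise.
    # newline='' keeps the original line endings untouched.
    with open(path, 'r', encoding='utf-8-sig', newline='') as f:
        text = f.read()

    # The plain utf-8 codec never writes a BOM.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)


# Demo on a temporary file that starts with a BOM.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'hello world\n')

strip_bom_from_file(path)

with open(path, 'rb') as f:
    raw = f.read()

print(raw)  # b'hello world\n'
```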

If you have a string that contains a BOM character, use the str.replace() method to remove it.

main.py
my_str = '\ufeffhello world'

result = my_str.replace('\ufeff', '')

print(repr(result))  # 👉️ 'hello world'

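The \ufeff character (U+FEFF) is the byte order mark (BOM). When it appears in decoded text, it is interpreted as a zero-width non-breaking space, so it is invisible when printed but still counts as a character in the string.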

The BOM character ends up in a string when bytes that start with a BOM are decoded with a codec that doesn't strip it, e.g. when a file saved as UTF-8 with a BOM is decoded with utf-8 instead of utf-8-sig.
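As a rough illustration of how the stray character shows up, the sketch below decodes the same bytes with three codecs. The byte string is made up for the example, and latin-1 stands in for "some unrelated codec".

```python
import codecs

# Bytes as they might be stored in a file saved as "UTF-8 with BOM".
raw = codecs.BOM_UTF8 + 'hello world'.encode('utf-8')

as_utf8 = raw.decode('utf-8')          # BOM survives as '\ufeff'
as_utf8_sig = raw.decode('utf-8-sig')  # BOM is stripped
as_latin1 = raw.decode('latin-1')      # BOM bytes turn into three visible characters

print(repr(as_utf8))      # '\ufeffhello world'
print(repr(as_utf8_sig))  # 'hello world'
print(repr(as_latin1))    # 'ï»¿hello world'

# The invisible character breaks comparisons and prefix checks.
print(as_utf8 == 'hello world')     # False
print(as_utf8.startswith('hello'))  # False
```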

The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.

The method takes the following parameters:

| Name  | Description                                               |
| ----- | --------------------------------------------------------- |
| old   | The substring we want to replace in the string            |
| new   | The replacement for each occurrence of old                |
| count | Only the first count occurrences are replaced (optional)  |

The method doesn't change the original string. Strings are immutable in Python.
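A short sketch of the count parameter and of the fact that the original string is left alone. The sample string, with a BOM both at the start and in the middle, is contrived for the example.

```python
my_str = '\ufeffhello\ufeffworld'

# Without count, every occurrence is replaced.
all_removed = my_str.replace('\ufeff', '')

# With count=1, only the first occurrence is replaced.
first_removed = my_str.replace('\ufeff', '', 1)

print(repr(all_removed))    # 'helloworld'
print(repr(first_removed))  # 'hello\ufeffworld'

# replace() returned new strings; the original is unchanged.
print(repr(my_str))         # '\ufeffhello\ufeffworld'
```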
