How to remove Accents from a String in Python

Borislav Hadzhiev

Last updated: Aug 14, 2022

Remove accents from a String in Python

Use the unidecode package to remove the accents from a string, e.g. str_without_accents = unidecode(str_with_accents). The unidecode() function will remove all the accents from the string by replacing the characters with characters that can safely be encoded to ASCII.

The first thing you should do is install the unidecode package.

shell
pip install Unidecode

Now you can import and use the unidecode function.

main.py
from unidecode import unidecode

str_with_accents = 'ÂéüÒÑ'

str_without_accents = unidecode(str_with_accents)

print(str_without_accents)  # 👉️ 'AeuON'

The unidecode function takes a string that possibly contains non-ASCII characters and returns a string that can safely be encoded to ASCII.
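To see what "can safely be encoded to ASCII" means in practice, here is a minimal sketch that checks the result with str.isascii() (available in Python 3.7+) and then encodes it. The variable names are only illustrative.

```python
from unidecode import unidecode

text = 'ÂéüÒÑ'
result = unidecode(text)

# str.isascii() is True only when every character is in the ASCII range
print(text.isascii())    # False
print(result.isascii())  # True

# Encoding the result to ASCII succeeds without a UnicodeEncodeError
ascii_bytes = result.encode('ascii')
print(ascii_bytes)  # b'AeuON'
```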

If your string contains characters that unidecode cannot translate to ASCII-compatible characters, the function drops them by default (it replaces them with empty strings).

main.py
from unidecode import unidecode

str_with_accents = 'ÂéüÒÑ\ue123'

str_without_accents = unidecode(str_with_accents)

print(str_without_accents)  # 👉️ 'AeuON'

Notice that the \ue123 character couldn't get converted to an ASCII-compatible character and got dropped from the string.
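Dropping untranslatable characters corresponds to the errors keyword argument being set to ignore, which is its default value. The short sketch below passes the value explicitly and compares the two results. The variable names are illustrative.

```python
from unidecode import unidecode

text = 'ÂéüÒÑ\ue123'

# errors='ignore' is the default, so both calls behave the same way
default_result = unidecode(text)
explicit_result = unidecode(text, errors='ignore')

print(default_result == explicit_result)  # True

# The result is one character shorter because \ue123 was dropped
print(len(text), len(default_result))  # 6 5
```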

If you want to raise an error if the unidecode function encounters a character it cannot translate to an ASCII-compatible character, set the errors keyword argument to strict.

main.py
from unidecode import unidecode

str_with_accents = 'ÂéüÒÑ\ue123'

# ⛔️ unidecode.UnidecodeError: no replacement found for character '\ue123' in position 5
str_without_accents = unidecode(str_with_accents, errors='strict')

The unidecode function found no replacement for the \ue123 character, so it raised an error.

The unidecode package exposes a UnidecodeError exception class. Its index attribute gives us the index of the character that couldn't get translated.

main.py
from unidecode import unidecode, UnidecodeError

str_with_accents = 'ÂéüÒÑ\ue123'

try:
    # errors='strict' raises UnidecodeError for the '\ue123' character
    str_without_accents = unidecode(str_with_accents, errors='strict')
except UnidecodeError as e:
    print(e.index)  # 👉️ 5

The character at index 5 caused the error.
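The index attribute can be used to look up the offending character in the original string. The helper below is a hypothetical sketch (find_untranslatable is not part of the unidecode package) that returns the index and the character, or None if everything can be translated.

```python
from unidecode import unidecode, UnidecodeError


def find_untranslatable(text):
    """Return (index, character) for the first character unidecode
    cannot translate, or None if the whole string can be translated.

    Hypothetical helper for illustration, not part of unidecode.
    """
    try:
        unidecode(text, errors='strict')
    except UnidecodeError as e:
        # e.index is the position of the character that has no replacement
        return e.index, text[e.index]
    return None


print(find_untranslatable('ÂéüÒÑ'))  # None

found = find_untranslatable('ÂéüÒÑ\ue123')
print(found[0], repr(found[1]))  # 5 '\ue123'
```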

You can also set the errors keyword argument to replace if you want to substitute another string for each character that cannot be translated.

main.py
from unidecode import unidecode

str_with_accents = 'ÂéüÒÑ\ue123'

str_without_accents = unidecode(
    str_with_accents,
    errors='replace',
    replace_str='?'
)

print(str_without_accents)  # 👉️ 'AeuON?'

The replace_str keyword argument is used to specify the replacement string.
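The replacement doesn't have to be a single character. As a small sketch, the example below uses a longer placeholder so that untranslatable characters are easier to spot in the output. The '[?]' placeholder is an arbitrary choice.

```python
from unidecode import unidecode

text = 'ÂéüÒÑ\ue123'

# Each untranslatable character is replaced with the whole placeholder string
marked = unidecode(text, errors='replace', replace_str='[?]')

print(marked)  # AeuON[?]
```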

You can set the errors keyword argument to preserve if you want to keep the characters that cannot be translated to ASCII-compatible characters.

main.py
from unidecode import unidecode

str_with_accents = 'ÂéüÒÑ\ue123'

str_without_accents = unidecode(
    str_with_accents,
    errors='preserve',
)

# The untranslatable \ue123 character is kept in the result
print(str_without_accents)  # 👉️ 'AeuON\ue123'

However, if errors is set to preserve, the unidecode function is not guaranteed to produce an ASCII-compatible string, because the characters it cannot translate are kept as-is.
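A quick way to confirm this is to check the result with str.isascii() (Python 3.7+). The sketch below assumes the same sample string as above.

```python
from unidecode import unidecode

text = 'ÂéüÒÑ\ue123'

preserved = unidecode(text, errors='preserve')

# The accented letters are translated, but \ue123 is kept unchanged
print(preserved == 'AeuON\ue123')  # True

# Because \ue123 is still there, the string is not pure ASCII
print(preserved.isascii())  # False
```

If you need a string that is guaranteed to be ASCII-only, stick to the default behavior or to errors='replace' with an ASCII replacement string.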
