How to remove Accents from a String in Python

Borislav Hadzhiev

Last updated: Apr 9, 2024
4 min

# Table of Contents

  1. Remove accents from a String in Python
  2. Remove the accents from a List of Strings
  3. Raising an error if an incompatible character is encountered
  4. Replacing characters that cannot be translated
  5. Preserving characters that cannot be translated
  6. Remove accents from a String using unicodedata

# Remove accents from a String in Python

Use the unidecode package to remove the accents from a string.

The unidecode() function will remove all the accents from the string by replacing the characters with characters that can safely be encoded to ASCII.

The first thing you should do is install the unidecode package.

shell

    pip install Unidecode

    # 👇️ or with pip3
    pip3 install Unidecode

Now you can import and use the unidecode function.

main.py

    from unidecode import unidecode

    str_with_accents = 'ÂéüÒÑ'

    str_without_accents = unidecode(str_with_accents)

    print(str_without_accents)  # 👉️ 'AeuON'

The code for this article is available on GitHub
The unidecode function takes a string that possibly contains non-ASCII characters and returns a string that can safely be encoded to ASCII.

If your string contains characters that unidecode cannot translate to ASCII-compatible characters, the function replaces them with empty strings.

main.py

    from unidecode import unidecode

    str_with_accents = 'ÂéüÒÑ\ue123'

    str_without_accents = unidecode(str_with_accents)

    print(str_without_accents)  # 👉️ 'AeuON'

Notice that the \ue123 character couldn't get converted to an ASCII-compatible character and got dropped from the string.

# Remove the accents from a List of Strings

If you need to remove the accents from a list of strings, use a list comprehension.

main.py

    from unidecode import unidecode

    names = ['Renée', 'Noël', 'Sørina', 'Adrián', 'Zoë']

    names_without_accents = [
        unidecode(name)
        for name in names
    ]

    # 👇️ ['Renee', 'Noel', 'Sorina', 'Adrian', 'Zoe']
    print(names_without_accents)

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we call the unidecode() function to remove the accents from the current list item and return the result.

The strings in the new list don't contain any accents.

# Raising an error if an incompatible character is encountered

If you want to raise an error if the unidecode function encounters a character it cannot translate to an ASCII-compatible character, set the errors keyword argument to strict.

main.py

    from unidecode import unidecode

    str_with_accents = 'ÂéüÒÑ\ue123'

    # ⛔️ unidecode.UnidecodeError: no replacement found for character '\ue123' in position 5
    str_without_accents = unidecode(str_with_accents, errors='strict')

The unidecode function found no replacement for the \ue123 character, so it raised an error.

The unidecode package exposes a UnidecodeError exception whose index attribute gives the position of the character that couldn't be translated.

main.py

    from unidecode import unidecode, UnidecodeError

    str_with_accents = 'ÂéüÒÑ\ue123'

    # ⛔️ unidecode.UnidecodeError: no replacement found for character '\ue123' in position 5
    try:
        str_without_accents = unidecode(str_with_accents, errors='strict')
    except UnidecodeError as e:
        print(e.index)  # 👉️ 5

The character at index 5 raised the error.

# Replacing characters that cannot be translated

You can also set the errors keyword argument to replace to substitute another string for each character that cannot be translated.

main.py

    from unidecode import unidecode

    str_with_accents = 'ÂéüÒÑ\ue123'

    str_without_accents = unidecode(
        str_with_accents,
        errors='replace',
        replace_str='?'
    )

    print(str_without_accents)  # 👉️ 'AeuON?'

The replace_str keyword argument is used to specify the replacement string.

# Preserving characters that cannot be translated

You can set the errors keyword argument to preserve if you want to keep the characters that cannot be translated to ASCII-compatible characters.

main.py

    from unidecode import unidecode

    str_with_accents = 'ÂéüÒÑ\ue123'

    str_without_accents = unidecode(
        str_with_accents,
        errors='preserve',
    )

    # 👉️ 'AeuON' followed by the untranslated \ue123 character
    print(str_without_accents)

However, if errors is set to preserve, the unidecode function doesn't produce an ASCII-compatible string.
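One way to check whether a result is ASCII-only is the built-in str.isascii() method (Python 3.7+). The sketch below is only an illustration. It uses hard-coded strings that stand in for unidecode results, so it runs without the package installed, and the variable names are made up for the example.

```python
# Hard-coded stand-ins for unidecode results (assumed values for illustration)
dropped = 'AeuON'            # the untranslatable character was dropped
preserved = 'AeuON\ue123'    # the untranslatable character was kept

print(dropped.isascii())    # True
print(preserved.isascii())  # False

# Encoding the preserved string to ASCII fails at the kept character
try:
    preserved.encode('ascii')
except UnicodeEncodeError as e:
    print(e.start)  # 5
```

If the result has to be written to an ASCII-only destination, a check like this can catch leftover characters before the encoding step fails.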

# Remove accents from a String using unicodedata

You can also use the built-in unicodedata module to remove the accents from a string.

main.py

    import unicodedata


    def remove_accents(string):
        return ''.join(
            char for char in unicodedata.normalize('NFD', string)
            if unicodedata.category(char) != 'Mn'
        )


    str_with_accents = 'ÂéüÒÑ'

    print(remove_accents(str_with_accents))  # 👉️ AeuON

    # 👇️ Noel, Adrian, Sørina, Zoe, Renee
    print(remove_accents('Noël, Adrián, Sørina, Zoë, Renée'))

The unicodedata module is a built-in Python module, so you don't have to install anything.

The code sample uses a generator expression to iterate over the characters of the string.

The unicodedata.normalize() function returns the given Unicode string converted to the specified normal form.

The first argument is the form - NFD in our case. The normal form NFD translates each character into its decomposed form.
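To make the decomposition concrete, here is a small sketch of what NFD does to a single accented letter. The letter is written as an escape sequence so the example doesn't depend on how the source file is encoded. The variable names are chosen for illustration.

```python
import unicodedata

composed = '\u00e9'  # 'é' as a single code point
decomposed = unicodedata.normalize('NFD', composed)

print(len(composed))    # 1
print(len(decomposed))  # 2

# The base letter and the accent are now separate characters
print([unicodedata.name(char) for char in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

Once the accent is a separate character, it can be filtered out while the base letter is kept.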

main.py

    import unicodedata

    str_with_accents = 'ÂéüÒÑ'

    result = list(
        char for char in unicodedata.normalize('NFD', str_with_accents)
        if unicodedata.category(char) != 'Mn'
    )

    print(result)  # 👉️ ['A', 'e', 'u', 'O', 'N']

The unicodedata.category() function takes a character as a parameter and returns the general category assigned to the character.

main.py

    import unicodedata

    str_with_accents = 'aeÂéüÒÑ'

    print(unicodedata.category(str_with_accents[0]))  # Ll
    print(unicodedata.category(str_with_accents[1]))  # Ll
    print(unicodedata.category(str_with_accents[2]))  # Lu
    print(unicodedata.category(str_with_accents[3]))  # Ll

The Mn category (Mark, nonspacing) covers non-spacing combining marks, such as the accents that NFD splits off from their base letters.
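The following sketch shows the category of a combining accent. It also suggests why the ø in Sørina survives the unicodedata approach: the letter has no canonical decomposition, so NFD produces no separate mark to filter out. The variable names are illustrative.

```python
import unicodedata

acute = '\u0301'  # COMBINING ACUTE ACCENT
print(unicodedata.category(acute))  # Mn

o_slash = '\u00f8'  # 'ø'

# An empty decomposition mapping means NFD leaves the character unchanged
print(unicodedata.decomposition(o_slash) == '')            # True
print(unicodedata.normalize('NFD', o_slash) == o_slash)    # True
```

If letters like ø also need an ASCII replacement, the unidecode package is the more suitable option, because it maps them to their closest ASCII equivalents.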
