Last updated: Apr 9, 2024
Reading timeยท4 min
Use the unicodedata.normalize()
method to remove \xa0 from a string.
The unicodedata.normalize
method returns the normal form for the provided
Unicode string by replacing all compatibility characters with their
equivalents.
import unicodedata my_str = 'bobby\xa0hadz' result = unicodedata.normalize('NFKD', my_str) print(result) # ๐๏ธ 'bobby hadz'
\xa0
character represents a non-breaking space, so the way to remove it from a string is to replace it with a space.The unicodedata.normalize() method returns the normal form for the provided Unicode string.
The first argument is the form
- NFKD
in our case. The normal form NFDK
replaces all compatibility characters with their equivalents.
\xa0
character is a space, it gets replaced by a space.If you get unexpected results when using the NFKD
form, try using one of
NFC
, NFKC
and NFD
.
The NFKC
form first applies compatibility decomposition, then canonical
decomposition.
import unicodedata my_str = 'bobby\xa0hadz' result = unicodedata.normalize('NFKC', my_str) print(result) # ๐๏ธ 'bobby hadz'
str.replace()
Alternatively, you can use the str.replace()
method.
my_str = 'bobby\xa0hadz' result = my_str.replace('\xa0', ' ') print(result) # ๐๏ธ 'bobby hadz'
Since the \xa0
character represents a non-breaking space, we can simply
replace it with a space.
The str.replace() method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.
The method takes the following parameters:
Name | Description |
---|---|
old | The substring we want to replace in the string |
new | The replacement for each occurrence of old |
count | Only the first count occurrences are replaced (optional) |
Note that the method doesn't change the original string. Strings are immutable in Python.
You can also use the str.split()
and str.join()
methods to remove the \xa0
characters from a string.
my_str = 'bobby\xa0hadz' result = ' '.join(my_str.split()) print(result) # ๐๏ธ bobby hadz
We used the str.split()
method to
split the string on each whitespace character.
my_str = 'bobby\xa0hadz' # ๐๏ธ ['bobby', 'hadz'] print(my_str.split())
str.split()
method, it splits the input string on one or more whitespace characters.We could have also explicitly passed the \xa0
character in the call to the
split()
method.
my_str = 'bobby\xa0hadz' result = ' '.join(my_str.split('\xa0')) print(result) # ๐๏ธ bobby hadz
The last step is to join the list of strings with a space separator.
my_str = 'bobby\xa0hadz' # ๐๏ธ bobby hadz print(' '.join(my_str.split('\xa0')))
The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
If you use the BeautifulSoup4 module, use
the get_text()
method to remove the \xa0
characters from a string.
from bs4 import BeautifulSoup my_html = 'bobby\xa0hadz' result = BeautifulSoup(my_html, 'lxml').get_text(strip=True) print(result) # ๐๏ธ bobby hadz
Make sure you have beautifulsoup4
and
lxml installed in order to run the code
snippet.
pip install lxml pip install beautifulsoup4 # ๐๏ธ or with pip3 pip3 install lxml pip3 install beautifulsoup4
To remove \xa0
characters from a list of strings:
str.replace()
method to replace occurrences of
\xa0
with a space.\xa0
characters.my_list = ['bobby\xa0', '\xa0hadz'] result = [string.replace('\xa0', ' ') for string in my_list] print(result) # ๐๏ธ ['bobby ', ' hadz']
We used a list comprehension to iterate over the list.
On each iteration, we replace occurrences of the \xa0
character with a space
and return the result.
You can learn more about the related topics by checking out the following tutorials: