Remove duplicate words from a string in Python

Borislav Hadzhiev

Last updated: Aug 13, 2022

To remove the duplicate words from a string:

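  1. Use the str.split() method to split the string into a list of words.
  2. Use the OrderedDict.fromkeys() method to create an ordered dictionary whose keys are the unique words.
  3. Use the str.join() method to join the keys of the dictionary into a string.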
main.py
from collections import OrderedDict

my_str = 'one two three one two four'

result = ' '.join(OrderedDict.fromkeys(my_str.split()))

print(result)  # 👉️ 'one two three four'

We used the OrderedDict class to remove the duplicate words from a string.

The OrderedDict class is a subclass of dict that remembers the order in which keys were inserted.

main.py
from collections import OrderedDict

my_str = 'one two three one two four'

# 👇️ OrderedDict([('one', None), ('two', None), ('three', None), ('four', None)])
print(OrderedDict.fromkeys(my_str.split()))
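A quick check of the subclass relationship, reusing the example string. This is only an illustration; the variable name `ordered` is chosen for the example. It confirms that an OrderedDict is also a dict and that iterating over it yields the unique words in the order they were first seen.

```python
from collections import OrderedDict

my_str = 'one two three one two four'
ordered = OrderedDict.fromkeys(my_str.split())

# OrderedDict is a subclass of dict, so its instances are also dicts
print(issubclass(OrderedDict, dict))  # True
print(isinstance(ordered, dict))  # True

# Iterating over it yields the unique words in first-seen order
print(list(ordered))  # ['one', 'two', 'three', 'four']
```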

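We used a dictionary because dictionary keys are unique, so each repeated word is stored only once. Because the dictionary is ordered, each word keeps the position of its first occurrence.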
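To make the deduplication step concrete, here is a minimal sketch of an equivalent explicit loop. This is a different technique from the one used in this article, and `unique_words_in_order` is a hypothetical helper name. It keeps the first occurrence of each word and skips later repeats, which is the same result that `dict.fromkeys()` produces.

```python
def unique_words_in_order(words):
    """Return the words with repeats removed, keeping first occurrences in order."""
    seen = set()  # words encountered so far
    result = []   # unique words in first-seen order
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


my_str = 'one two three one two four'
print(unique_words_in_order(my_str.split()))  # ['one', 'two', 'three', 'four']

# Same result as the dictionary-based approach
print(unique_words_in_order(my_str.split()) == list(dict.fromkeys(my_str.split())))  # True
```

The dictionary version is shorter, because assigning to a key that already exists does not move it, so the first position of every word is preserved automatically.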

We used the str.split() method to split the string into a list of words.

main.py
my_str = 'one two three one two four'

# 👇️ ['one', 'two', 'three', 'one', 'two', 'four']
print(my_str.split())

The str.split() method splits the string into a list of substrings using a delimiter.

If no delimiter is provided, the method splits the string on runs of consecutive whitespace characters and ignores leading and trailing whitespace.
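The difference between calling split() with no argument and passing an explicit `' '` separator matters for messy input. The string below is made up for illustration; it mixes repeated spaces, a tab, a newline and surrounding whitespace.

```python
messy = '  one  two\tthree\none  '

# No argument: split on runs of whitespace and drop leading/trailing whitespace
print(messy.split())  # ['one', 'two', 'three', 'one']

# Explicit ' ' separator: every single space is a split point,
# so two consecutive spaces produce an empty string
print('one  two'.split(' '))  # ['one', '', 'two']

# Deduplicating after split() also normalizes the whitespace
print(' '.join(dict.fromkeys(messy.split())))  # one two three
```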

The dict.fromkeys() method takes an iterable and an optional value and creates a new dictionary with keys from the iterable, each set to the provided value (None by default).

main.py
# 👇️ {'a': None, 'b': None, 'c': None}
print(dict.fromkeys(['a', 'b', 'c']))

# 👇️ {'a': 100, 'b': 100, 'c': 100}
print(dict.fromkeys(['a', 'b', 'c'], 100))

We only need the keys, so we didn't pass a value in the example and every key maps to None.

The last step is to join the keys of the OrderedDict into a string.

main.py
from collections import OrderedDict

my_str = 'one two three one two four'

result = ' '.join(OrderedDict.fromkeys(my_str.split()))

print(result)  # 👉️ 'one two three four'

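The str.join() method takes an iterable as an argument and returns a string that is the concatenation of the strings in the iterable, separated by the string the method was called on. Iterating over a dictionary yields its keys, so passing the OrderedDict directly to join() concatenates the unique words.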

We joined the collection of strings with a space separator.
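A short illustration of how join() treats the dictionary and its separator. The word list and the `numbers` variable are made up for the example. It also shows that join() accepts only strings, so non-string items have to be converted first.

```python
from collections import OrderedDict

words = OrderedDict.fromkeys(['one', 'two', 'three'])

# Iterating over a dict yields its keys, so join() receives the words
print(' '.join(words))   # one two three
print(', '.join(words))  # one, two, three

# join() accepts only strings; other item types raise TypeError
numbers = [1, 2, 3]
try:
    ' '.join(numbers)
except TypeError as exc:
    print('TypeError:', exc)

# Convert non-string items explicitly before joining
print(' '.join(str(n) for n in numbers))  # 1 2 3
```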

Note that as of Python 3.7, the standard dict class is also guaranteed to preserve insertion order.

We could replace the OrderedDict class with the dict class to achieve the same result.

main.py
my_str = 'one two three one two four'

result = ' '.join(dict.fromkeys(my_str.split()))

print(result)  # 👉️ 'one two three four'

This also allows us to remove the import statement.

Which approach you pick is a matter of personal preference.

The OrderedDict class makes the intent to preserve order more explicit, but it requires an extra import statement.
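If you need this in several places, both approaches can be wrapped in small reusable functions. This is a minimal sketch; the names `remove_duplicate_words` and `remove_duplicate_words_ordered` are hypothetical, and the plain-dict version assumes Python 3.7 or newer.

```python
from collections import OrderedDict


def remove_duplicate_words(text: str) -> str:
    """Remove repeated words, keeping first occurrences in their original order."""
    # A plain dict preserves insertion order on Python 3.7+
    return ' '.join(dict.fromkeys(text.split()))


def remove_duplicate_words_ordered(text: str) -> str:
    """Same result, using OrderedDict to make the order guarantee explicit."""
    return ' '.join(OrderedDict.fromkeys(text.split()))


sample = 'one two three one two four'
print(remove_duplicate_words(sample))          # one two three four
print(remove_duplicate_words_ordered(sample))  # one two three four
```

Both functions compare words exactly, so 'One' and 'one' count as different words, and because they split on whitespace and rejoin with single spaces, any extra whitespace in the input is collapsed in the output.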
