Count the unique words in a text File in Python

Borislav Hadzhiev

Last updated: Sep 22, 2022

To count the unique words in a text file:

  1. Read the contents of the file into a string and split it into words.
  2. Use the set() class to convert the list to a set object.
  3. Use the len() function to count the unique words in the text file.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()

print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

unique_words = set(words)

print(len(unique_words))  # 👉️ 3
print(unique_words)  # 👉️ {'three', 'one', 'two'} (order may vary)

The example above assumes that you have a file named example.txt with the following contents.

example.txt
one one two two three three

We opened the file in reading mode and used the read() method to read its contents into a string.
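
If you want to try this without creating example.txt by hand, the sketch below writes the sample contents to a temporary file first and then reads them back. The temporary directory and the `path` and `contents` names are assumptions made only for this illustration.

```python
import os
import tempfile

# Create a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Write the sample contents used throughout this article.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('one one two two three three\n')

    # Open the file in reading mode and read it into a single string.
    with open(path, 'r', encoding='utf-8') as f:
        contents = f.read()

print(repr(contents))  # 👉️ 'one one two two three three\n'
```

The trailing newline is harmless here because str.split() without arguments ignores leading and trailing whitespace.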

The next step is to use the str.split() method to split the string into a list of words.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()

print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

The str.split() method splits the string into a list of substrings using a delimiter.

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.
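
To make the difference concrete, here is a small sketch that compares calling split() with no arguments against passing a single space. The `text` string is made up for the illustration.

```python
text = 'one  two\tthree\nfour '

# No separator: runs of whitespace count as one delimiter, and
# leading or trailing whitespace is ignored.
default_split = text.split()
print(default_split)  # 👉️ ['one', 'two', 'three', 'four']

# Explicit single space: every space is a delimiter, so empty strings
# appear and tabs or newlines stay inside the items.
space_split = text.split(' ')
print(space_split)  # 👉️ ['one', '', 'two\tthree\nfour', '']
```

This is why calling split() without arguments is usually the safer choice for files that contain multiple lines or uneven spacing.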

We used the set() class to convert the list to a set object.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()

print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

unique_words = set(words)

print(len(unique_words))  # 👉️ 3
print(unique_words)  # 👉️ {'three', 'one', 'two'} (order may vary)

The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

Set objects are an unordered collection of unique elements, so converting the list to a set removes all duplicate elements.
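
The effect of the conversion can be seen on a plain list, with no file involved. This is only a minimal sketch. The call to sorted() is an addition for display purposes, since the order of a set is not guaranteed.

```python
words = ['one', 'one', 'two', 'two', 'three', 'three']

# Converting to a set drops the duplicate entries.
unique_words = set(words)

print(len(words))         # 👉️ 6
print(len(unique_words))  # 👉️ 3

# Sorting gives a predictable order when you need to display the words.
sorted_words = sorted(unique_words)
print(sorted_words)  # 👉️ ['one', 'three', 'two']
```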

The last step is to use the len() function to get the count of unique words in the file.

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).
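
As a quick illustration, the sketch below calls len() on one value of each kind mentioned above. The `examples` dictionary and its labels are made up for this demo.

```python
examples = {
    'string': 'hello',
    'tuple': ('a', 'b', 'c'),
    'list': ['one', 'two'],
    'range': range(10),
    'bytes': b'abc',
    'dictionary': {'one': 1, 'two': 2},
    'set': {'one', 'two', 'one'},
    'frozen set': frozenset(['a', 'b']),
}

# Collect the length of each value, keyed by its label.
lengths = {kind: len(value) for kind, value in examples.items()}

print(lengths['string'])      # 👉️ 5
print(lengths['range'])       # 👉️ 10
print(lengths['dictionary'])  # 👉️ 2 (number of keys)
print(lengths['set'])         # 👉️ 2 (the duplicate 'one' is stored once)
```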

Alternatively, you can use a simple for loop.

Count the unique words in a text File using a for loop

To count the unique words in a text file:

  1. Declare a new variable that stores an empty list.
  2. Read the contents of the file into a string and split it into words.
  3. Use a for loop to iterate over the list.
  4. Use the list.append() method to append each word that is not already in the list.
  5. Use the len() function to get the length of the list.

main.py
unique_words = []

with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()

print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

for word in words:
    if word not in unique_words:
        unique_words.append(word)

print(len(unique_words))  # 👉️ 3
print(unique_words)  # 👉️ ['one', 'two', 'three']

We read the contents of the file into a string and used the str.split() method to split the string into a list of words.

On each iteration, we use the not in operator to check if the word is not present in the list of unique words.

If the condition is met, we use the list.append() method to append the value to the list.

The in operator tests for membership. For example, x in l evaluates to True if x is a member of l, otherwise it evaluates to False.

x not in l returns the negation of x in l.
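
Here is a short sketch of both operators on a list of words. The list contents are arbitrary sample values.

```python
unique_words = ['one', 'two']

is_member = 'one' in unique_words
is_missing = 'three' not in unique_words

print(is_member)                # 👉️ True
print('three' in unique_words)  # 👉️ False
print(is_missing)               # 👉️ True
```

Membership tests on a list compare the items one by one, so for very large files the set-based approach is likely to be faster.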

The list.append() method adds an item to the end of the list.

main.py
my_list = ['bobby', 'hadz']

my_list.append('com')

print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the len() function to get the count of unique words in the text file.
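
If you need this logic in more than one place, the loop can be wrapped in a small function. The sketch below is one possible way to do that. `unique_in_order` is a hypothetical helper name, not part of the standard library, and the words come from a string rather than a file so that the example runs on its own.

```python
def unique_in_order(words):
    """Return the distinct words in first-seen order (hypothetical helper)."""
    unique_words = []
    for word in words:
        # Only append a word the first time it is seen.
        if word not in unique_words:
            unique_words.append(word)
    return unique_words


words = 'one one two two three three'.split()
result = unique_in_order(words)

print(len(result))  # 👉️ 3
print(result)  # 👉️ ['one', 'two', 'three']
```

Unlike the set-based approach, this version keeps the words in the order in which they first appear.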
