Count number of unique Words/Characters in String in Python

Borislav Hadzhiev

Last updated: Apr 9, 2024
8 min

# Table of Contents

  1. Count the number of unique words in a String in Python
  2. Count the unique words in a text File in Python
  3. Count the number of unique characters in a String in Python

# Count the number of unique words in a String in Python

To count the number of unique words in a string:

  1. Use the str.split() method to split the string into a list of words.
  2. Use the set() class to convert the list to a set.
  3. Use the len() function to get the count of unique words in the string.
main.py
my_str = 'one one two two'
unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}
length = len(unique_words)
print(length)  # 👉️ 2

The code for this article is available on GitHub

If you need to count the unique words in a file, see the "Count the unique words in a text File in Python" section below.

We first used the str.split() method to split the string into a list of words.

main.py
my_str = 'one one two two'
print(my_str.split())  # 👉️ ['one', 'one', 'two', 'two']

The str.split() method splits the string into a list of substrings using a delimiter.

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.
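The difference between calling str.split() with no arguments and passing an explicit separator can be seen in a small sketch. The sample string `messy` is made up for illustration and is not part of the original example.

```python
# Hypothetical sample string with repeated spaces, a tab and a newline
messy = '  one   two\tthree\nfour  '

# No separator: any run of whitespace counts as a single delimiter,
# and leading/trailing whitespace is ignored
no_sep = messy.split()
print(no_sep)  # ['one', 'two', 'three', 'four']

# Explicit single-space separator: every space is a delimiter,
# so empty strings show up between consecutive spaces
space_sep = messy.split(' ')
print(space_sep)
```

For counting words, calling split() without arguments is usually the safer choice because no empty strings end up in the result.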

The next step is to use the set() class to convert the list of words to a set object.

main.py
my_str = 'one one two two'
unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}

The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

Set objects store an unordered collection of unique elements, so converting the list to a set removes all duplicate elements.

The last step is to use the len() function to get the number of unique words.

main.py
my_str = 'one one two two'
unique_words = set(my_str.split())
print(unique_words)  # 👉️ {'one', 'two'}
length = len(unique_words)
print(length)  # 👉️ 2

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).
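Note that splitting on whitespace treats 'One', 'one' and 'one,' as three different words. If that is not what you want, one possible approach is to normalize each word before building the set. The sketch below is only an illustration: the sample text is made up, and stripping ASCII punctuation with string.punctuation is an assumption that may not suit every input.

```python
import string

# Hypothetical sample text, not taken from the examples above
text = 'One one, two Two. three!'

# Lowercase each word and strip leading/trailing ASCII punctuation
normalized = [word.strip(string.punctuation).lower() for word in text.split()]

# Skip anything that became empty after stripping (e.g. a standalone '-')
unique_words = {word for word in normalized if word}

print(len(set(text.split())))  # 5 (no normalization)
print(len(unique_words))       # 3 (after normalization)
```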

# Count the unique words in a text File in Python

To count the unique words in a text file:

  1. Read the contents of the file into a string and split it into words.
  2. Use the set() class to convert the list to a set object.
  3. Use the len() function to count the unique words in the text file.
main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']
unique_words = set(words)
print(len(unique_words))  # 👉️ 3
print(unique_words)  # {'three', 'one', 'two'}

The example above assumes that you have a file named example.txt with the following contents.

example.txt
one one two two three three

We opened the file in reading mode and used the read() method to read its contents into a string.

The next step is to use the str.split() method to split the string into a list of words.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']

The str.split() method splits the string into a list of substrings using a delimiter.

When no separator is passed to the str.split() method, it splits the input string on one or more whitespace characters.

We used the set() class to convert the list to a set object.

main.py
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']
unique_words = set(words)
print(len(unique_words))  # 👉️ 3
print(unique_words)  # {'three', 'one', 'two'}

The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

Set objects are an unordered collection of unique elements, so converting the list to a set removes all duplicate elements.

The last step is to use the len() function to get the count of unique words in the file.

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).
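Calling read() loads the entire file into memory. For large files, a possible alternative is to iterate over the file line by line and add the words of each line to a set. The sketch below assumes a hypothetical helper named count_unique_words and writes its own temporary sample file so that it can run on its own.

```python
import os
import tempfile


def count_unique_words(path):
    """Return the number of distinct whitespace-separated words in a file."""
    unique_words = set()
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # set.update() adds every word from the line to the set
            unique_words.update(line.split())
    return len(unique_words)


# Demo: create a small sample file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'example.txt')
    with open(sample_path, 'w', encoding='utf-8') as f:
        f.write('one one two two\nthree three\n')
    unique_count = count_unique_words(sample_path)

print(unique_count)  # 3
```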

# Count the number of unique words in a String using a for loop

This is a five-step process:

  1. Declare a new variable that stores an empty list.
  2. Use the str.split() method to split the string into a list of words.
  3. Use a for loop to iterate over the list.
  4. Use the list.append() method to append all unique words to the list.
  5. Use the len() function to get the length of the list.
main.py
my_str = 'one one two two'
unique_words = []
for word in my_str.split():
    if word not in unique_words:
        unique_words.append(word)
print(len(unique_words))  # 👉️ 2
print(unique_words)  # 👉️ ['one', 'two']

We used the str.split() method to split the string into a list of words and used a for loop to iterate over the list.

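On each iteration, we use the not in operator to check if the word is not present in the list of unique words. If the condition is met, we use the list.append() method to append the word to the list.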

The in operator tests for membership. For example, x in l evaluates to True if x is a member of l, otherwise it evaluates to False.

x not in l returns the negation of x in l.
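A short sketch of both operators on a list of words (the list `seen` is made up for illustration):

```python
# Hypothetical list of words that have already been collected
seen = ['one', 'two']

print('one' in seen)        # True
print('three' in seen)      # False
print('three' not in seen)  # True
```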

The list.append() method adds an item to the end of the list.

main.py
my_list = ['bobby', 'hadz']
my_list.append('com')
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the len() function to get the number of unique words in the string.
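A membership test on a list has to scan the list, which can get slow when there are many unique words. One possible variation, sketched below, keeps a separate set (named `seen` here purely for illustration) for the membership checks while the list still records the words in first-seen order.

```python
my_str = 'one one two two'

unique_words = []  # keeps the words in the order they first appear
seen = set()       # used only for fast membership checks

for word in my_str.split():
    if word not in seen:
        seen.add(word)
        unique_words.append(word)

print(len(unique_words))  # 2
print(unique_words)       # ['one', 'two']
```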

# Count the unique words in a text File using a for loop

This is a five-step process:

  1. Declare a new variable that stores an empty list.
  2. Read the contents of the file into a string and split it into words.
  3. Use a for loop to iterate over the list.
  4. Use the list.append() method to append all unique words to the list.
  5. Use the len() function to get the length of the list.
main.py
unique_words = []
with open('example.txt', 'r', encoding='utf-8') as f:
    words = f.read().split()
print(words)  # 👉️ ['one', 'one', 'two', 'two', 'three', 'three']
for word in words:
    if word not in unique_words:
        unique_words.append(word)
print(len(unique_words))  # 👉️ 3
print(unique_words)  # 👉️ ['one', 'two', 'three']

We read the contents of the file into a string and used the str.split() method to split the string into a list of words.

On each iteration, we use the not in operator to check if the word is not present in the list of unique words.

If the condition is met, we use the list.append() method to append the value to the list.

The in operator tests for membership. For example, x in l evaluates to True if x is a member of l, otherwise it evaluates to False.

x not in l returns the negation of x in l.

The list.append() method adds an item to the end of the list.

main.py
my_list = ['bobby', 'hadz']
my_list.append('com')
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the len() function to get the count of unique words in the text file.

# Count the number of unique characters in a String in Python

To count the number of unique characters in a string:

  1. Use the set() class to convert the string to a set of unique characters.
  2. Use the len() function to get the number of unique characters in the string.
main.py
my_str = 'bobby'
# ✅ using set()
result = len(set(my_str))
print(result)  # 👉️ 3

If you need to get the unique characters in a string, use the following code sample instead.

main.py
my_str = 'bobby'
# ✅ Get unique characters in a string (order not preserved)
result = ''.join(set(my_str))
print(result)  # 👉️ byo

The example uses the set() class to count the number of unique characters in a string.

The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

main.py
my_str = 'bobby'
print(set(my_str))  # 👉️ {'y', 'b', 'o'}

Set objects store an unordered collection of unique elements, so converting the string to a set removes all duplicate characters.

The last step is to use the len() function to get the total count.

main.py
my_str = 'bobby'
result = len(set(my_str))
print(result)  # 👉️ 3

The len() function returns the length (the number of items) of an object.

The argument the function takes may be a sequence (a string, tuple, list, range or bytes) or a collection (a dictionary, set, or frozen set).

If you need to get the unique characters in the string, use the str.join() method instead of the len() function.

main.py
my_str = 'bobby'
result = ''.join(set(my_str))
print(result)  # 👉️ byo

The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.
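Because sets are unordered, the joined string can come out in a different order from one run to the next. If you need a predictable result, one option is to sort the characters before joining them. The sketch below also shows a non-empty separator; both variable names are chosen only for illustration.

```python
my_str = 'bobby'

# Sort the unique characters so the output is the same on every run
result = ''.join(sorted(set(my_str)))
print(result)  # boy

# The string that join() is called on is placed between the elements
with_separator = ', '.join(sorted(set(my_str)))
print(with_separator)  # b, o, y
```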

Alternatively, you can use the dict.fromkeys() method.

# Count the number of unique characters in a String using dict.fromkeys()

This is a two-step process:

  1. Use the dict.fromkeys() method to create a dictionary from the string.
  2. Use the len() function to get the number of unique characters in the string.
main.py
my_str = 'bobby'
result = len(dict.fromkeys(my_str))
print(result)  # 👉️ 3

If you need to get the unique characters, use the following code sample instead.

main.py
my_str = 'bobby'
result = ''.join(dict.fromkeys(my_str).keys())
print(result)  # 👉️ boy

The dict.fromkeys method takes an iterable and an optional value (None by default) and creates a new dictionary with keys from the iterable and values set to the provided value.

main.py
my_str = 'bobby'
# 👇️ {'b': None, 'o': None, 'y': None}
print(dict.fromkeys(my_str))

Dictionary keys are unique, so any duplicate characters get removed.

If you need to get the unique characters in the string, use the str.join() method instead of the len() function.

main.py
my_str = 'bobby'
result = ''.join(dict.fromkeys(my_str).keys())
print(result)  # 👉️ boy

We used the dict.keys() method to get a view of the dictionary's keys and joined the object into a string.

Dictionaries preserve the insertion order of keys in Python 3.7 and later.
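The same idea should also work for words: passing the list returned by str.split() to dict.fromkeys() keeps one key per word, in the order the words first appear. A minimal sketch, assuming Python 3.7 or later and a made-up sample string:

```python
# Hypothetical sample string, chosen so that the first-seen order is visible
my_str = 'two one two one three'

# Each word becomes a key; duplicates collapse into the first occurrence
unique_words = list(dict.fromkeys(my_str.split()))

print(unique_words)       # ['two', 'one', 'three']
print(len(unique_words))  # 3
```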

Alternatively, you can use a simple for loop.

# Count the number of unique characters in a String using a for loop

This is a four-step process:

  1. Declare a new variable that stores an empty list.
  2. Use a for loop to iterate over the string.
  3. Use the list.append() method to append all unique characters to the list.
  4. Use the len() function to get the length of the list.
main.py
my_str = 'bobby'
unique_chars = []
for char in my_str:
    if char not in unique_chars:
        unique_chars.append(char)
print(len(unique_chars))  # 👉️ 3
print(unique_chars)  # 👉️ ['b', 'o', 'y']

We used a for loop to iterate over the string.

On each iteration, we use the not in operator to check if the character is not present in the list.

If the condition is met, we use the list.append() method to append the character to the list.

The in operator tests for membership. For example, x in l evaluates to True if x is a member of l, otherwise it evaluates to False.

x not in l returns the negation of x in l.

The list.append() method adds an item to the end of the list.

main.py
my_list = ['bobby', 'hadz']
my_list.append('com')
print(my_list)  # 👉️ ['bobby', 'hadz', 'com']

The last step is to use the len() function to get the length of the list of unique characters.
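All of the approaches above count every character, so spaces count as characters and 'B' and 'b' are treated as different. If you only care about letters and want to ignore case, a set comprehension is one way to filter and normalize in a single pass. This is a sketch under those assumptions, with a made-up sample string:

```python
# Hypothetical sample string containing a space and mixed case
my_str = 'Bobby Hadz'

# Keep alphabetic characters only and lowercase them before deduplicating
unique_letters = {char.lower() for char in my_str if char.isalpha()}

print(len(set(my_str)))     # 9 (counts 'B', 'b' and the space separately)
print(len(unique_letters))  # 7
```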

Copyright © 2024 Borislav Hadzhiev