Find a common substring between two strings in Python

Borislav Hadzhiev

Last updated: Apr 10, 2024


# Table of Contents

  1. Find a common substring between two strings in Python
  2. Only find a leading common substring between two strings
  3. Find common characters between two Strings in Python

# Find a common substring between two strings in Python

To find a common substring between two strings:

  1. Use the SequenceMatcher class to create a SequenceMatcher object.
  2. Use the find_longest_match() method to find the longest matching substring.
  3. The method returns a Match object that describes the longest matching block in the provided strings.

main.py
```python
from difflib import SequenceMatcher

string1 = 'one two three four'
string2 = 'one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=0, b=0, size=8)

# 👇️ 'one two ' (the match includes the trailing space)
print(string1[match.a:match.a + match.size])

# 👇️ 'one two '
print(string2[match.b:match.b + match.size])
```


The code for this article is available on GitHub

We passed the following 3 arguments to the SequenceMatcher class:

| Name | Description |
| --- | --- |
| isjunk | Function that returns true if the element is junk and should be ignored. We passed None to isjunk, so no elements are ignored. |
| a | Sequence to be compared. Empty string by default. |
| b | Sequence to be compared. Empty string by default. |

The SequenceMatcher class is used to compare pairs of sequences of any type, so long as the sequence elements are hashable.
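Because SequenceMatcher accepts any sequence of hashable elements, the same approach also works on lists. Here is a minimal sketch, assuming you want to match whole words instead of characters. The variable names are only illustrative.

```python
from difflib import SequenceMatcher

# Split the strings into lists of words (strings are hashable)
words1 = 'one two three four'.split()
words2 = 'zero one two nine'.split()

matcher = SequenceMatcher(None, words1, words2)
match = matcher.find_longest_match(0, len(words1), 0, len(words2))

# Slice the first list with the indices stored in the Match object
common_words = words1[match.a:match.a + match.size]

print(match)  # 👉️ Match(a=0, b=1, size=2)
print(common_words)  # 👉️ ['one', 'two']
```

Matching on words avoids partial-word matches and stray whitespace in the result.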

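Instantiating the SequenceMatcher class returns a SequenceMatcher object that implements a find_longest_match() method. The method returns a Match object, a named tuple in which a and b are the start indices of the match in the first and second sequence and size is the length of the match.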

The find_longest_match() method finds the longest matching block in the provided sequences.

The arguments we passed to the method indicate that we want to find the longest match in the entirety of a and b.
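The four arguments are the start and end indices of the range to search in each sequence, so you can also restrict the search to a part of each string. The sketch below assumes we want to skip the first 4 characters of both strings. The start index of 4 is an arbitrary choice for illustration.

```python
from difflib import SequenceMatcher

string1 = 'one two three four'
string2 = 'one two nine ten'

matcher = SequenceMatcher(None, string1, string2)

# Only search string1[4:] and string2[4:], skipping the leading 'one '
match = matcher.find_longest_match(4, len(string1), 4, len(string2))

print(match)  # 👉️ Match(a=4, b=4, size=4)

# 👇️ 'two '
print(string1[match.a:match.a + match.size])
```

Notice that the indices in the Match object are relative to the full strings, not to the searched ranges.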


The common substring doesn't have to be at the beginning of the string.

main.py
```python
from difflib import SequenceMatcher

string1 = 'four five one two three four'
string2 = 'zero eight one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=9, b=10, size=9)

# 👇️ ' one two '
print(string1[match.a:match.a + match.size])

# 👇️ ' one two '
print(string2[match.b:match.b + match.size])
```

Notice that the common substring contains leading and trailing whitespace.

# Removing the leading and trailing whitespace

You can use the str.strip() method if you need to remove the leading and trailing whitespace characters.

main.py
```python
from difflib import SequenceMatcher

string1 = 'four five one two three four'
string2 = 'zero eight one two nine ten'

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # 👉️ Match(a=9, b=10, size=9)

# 👇️ 'one two'
print(string1[match.a:match.a + match.size].strip())

# 👇️ 'one two'
print(string2[match.b:match.b + match.size].strip())
```


The str.strip() method returns a copy of the string with the leading and trailing whitespace removed.
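If you need this in more than one place, the steps can be wrapped in a small reusable function. The following is only a sketch. The function name longest_common_substring is hypothetical and not part of the difflib module.

```python
from difflib import SequenceMatcher


def longest_common_substring(first, second):
    """Return the longest common substring with surrounding whitespace removed."""
    matcher = SequenceMatcher(None, first, second)
    match = matcher.find_longest_match(0, len(first), 0, len(second))

    # When nothing matches, match.size is 0 and the slice is an empty string
    return first[match.a:match.a + match.size].strip()


result = longest_common_substring(
    'four five one two three four',
    'zero eight one two nine ten',
)
print(repr(result))  # 👉️ 'one two'

no_match = longest_common_substring('abc', 'xyz')
print(repr(no_match))  # 👉️ ''
```

Returning an empty string when the strings have nothing in common lets the caller check the result with a simple truthiness test.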

# Only find a leading common substring between two strings

If you only need to find a leading common substring between two strings, you can also use the os.path.commonprefix() function.

main.py
```python
import os

string1 = 'one two three four'
string2 = 'one two nine ten'

common_substring = os.path.commonprefix([string1, string2])

print(common_substring)  # 👉️ 'one two ' (with a trailing space)
```

The os.path.commonprefix() function takes a list of strings and returns the longest prefix that all of them share. The comparison is done character by character, so the function works on any strings, not only on file system paths.

If the list is empty, an empty string is returned.
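A short sketch of both behaviors is shown below. The example paths are arbitrary and only used for illustration.

```python
import os

# The comparison is character by character, not per path component,
# so the result is not necessarily a valid path
path_prefix = os.path.commonprefix(['/usr/lib', '/usr/local/lib'])
print(path_prefix)  # 👉️ /usr/l

# An empty list produces an empty string
empty_prefix = os.path.commonprefix([])
print(repr(empty_prefix))  # 👉️ ''
```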

The commonprefix() function can find the leading common substring between as many strings as necessary.

main.py
```python
import os

string1 = 'one two three four'
string2 = 'one two nine ten'
string3 = 'one two eight'

common_substring = os.path.commonprefix([string1, string2, string3])

print(common_substring)  # 👉️ 'one two ' (with a trailing space)
```

However, the function returns an empty string if the common substring is not at the beginning of each string.

main.py
```python
import os

string1 = 'one two three four'
string2 = 'eight one two nine ten'

common_substring = os.path.commonprefix([string1, string2])

print(common_substring)  # 👉️ ""
```

In this case, you have to use the find_longest_match() method from the first example.

# Find common characters between two Strings in Python

To find common characters between two strings:

  1. Use the set() class to convert the first string to a set.
  2. Use the intersection() method to get the common characters.
  3. Use the str.join() method to join the set into a string.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

common_characters = ''.join(
    set(string1).intersection(string2)
)

# Sets are unordered, so the characters may come out in a different order
print(common_characters)  # 👉️ 'ab'
```

The set() class takes an optional iterable argument and returns a new set object with elements taken from the iterable.

main.py
```python
string1 = 'abcd'

# 👇️ {'d', 'c', 'b', 'a'}
print(set(string1))
```

Set objects store an unordered collection of unique elements and implement an intersection() method.

The intersection() method returns a new set with elements common to both set objects.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

# 👇️ {'a', 'b'}
print(set(string1).intersection(string2))
```

The last step is to use the str.join() method to join the set object into a string.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

common_characters = ''.join(
    set(string1).intersection(string2)
)

# Sets are unordered, so the characters may come out in a different order
print(common_characters)  # 👉️ 'ab'

print(len(common_characters))  # 👉️ 2
```

The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.
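Because sets are unordered, the order of the joined characters is not guaranteed. The sketch below assumes you want a stable, alphabetical result, so it sorts the set before joining. It also shows how the separator affects the output.

```python
string1 = 'abcd'
string2 = 'abzx'

# sorted() returns a list, which gives the characters a fixed order
common = sorted(set(string1).intersection(string2))

# An empty string separator joins the characters directly
joined = ''.join(common)
print(joined)  # 👉️ ab

# Any other string is placed between the elements
separated = ', '.join(common)
print(separated)  # 👉️ a, b
```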

You can use the len() function if you need to get the number of common elements between the two strings.

Alternatively, you can use a list comprehension.

# Find common characters between two Strings using a list comprehension

This is a three-step process:

  1. Use a list comprehension to iterate over the first string.
  2. Check if each character is present in the second string.
  3. Use the str.join() method to join the list into a string.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

common_characters = ''.join([
    char for char in string1
    if char in string2
])

print(common_characters)  # 👉️ 'ab'

print(len(common_characters))  # 👉️ 2
```

We used a list comprehension to iterate over the first string.

List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.
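Both uses are sketched below with an arbitrary list of numbers chosen for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Perform an operation for every element
squares = [n * n for n in numbers]
print(squares)  # 👉️ [1, 4, 9, 16, 25, 36]

# Select the subset of elements that meet a condition
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # 👉️ [2, 4, 6]
```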

On each iteration, we check if the current character is present in the other string and only keep the characters that are.

The in operator tests for membership. For example, x in s evaluates to True if x is a member of s, otherwise it evaluates to False.
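A few quick checks illustrate the operator. Note that when both operands are strings, in tests for a substring, which for a single character is the same as checking if the character is contained in the string.

```python
string2 = 'abzx'

# Single characters: is the character contained in the string?
a_found = 'a' in string2
c_found = 'c' in string2
print(a_found)  # 👉️ True
print(c_found)  # 👉️ False

# Longer strings: is the left-hand side a substring?
bz_found = 'bz' in string2
print(bz_found)  # 👉️ True

# Lists: is the value equal to one of the elements?
in_list = 'ab' in ['a', 'b']
print(in_list)  # 👉️ False
```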

If you need to get a list of the common characters between the two strings, remove the call to the str.join() method.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

common_characters = [
    char for char in string1
    if char in string2
]

print(common_characters)  # 👉️ ['a', 'b']

print(len(common_characters))  # 👉️ 2
```
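One difference from the set-based approach is worth keeping in mind: the list comprehension preserves the order of the first string and keeps repeated characters. The sketch below uses a different, made-up input to show this, and one possible way to drop the duplicates while keeping the order.

```python
string1 = 'aabbcd'
string2 = 'abzx'

# Repeated characters in string1 are kept
with_duplicates = [char for char in string1 if char in string2]
print(with_duplicates)  # 👉️ ['a', 'a', 'b', 'b']

# dict.fromkeys() keeps only the first occurrence of each key, in order
unique_in_order = list(dict.fromkeys(with_duplicates))
print(unique_in_order)  # 👉️ ['a', 'b']
```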

Alternatively, you can use a simple for loop.

# Find common characters between two Strings using a for loop

This is a three-step process:

  1. Use a for loop to iterate over the first string.
  2. Check if each character is present in the second string and, if it is, append it to a list.
  3. Use the str.join() method to join the list into a string.

main.py
```python
string1 = 'abcd'
string2 = 'abzx'

common_characters = []

for char in string1:
    if char in string2:
        common_characters.append(char)

print(common_characters)  # 👉️ ['a', 'b']

print(''.join(common_characters))  # 👉️ 'ab'
```

We used a for loop to iterate over the first string.

On each iteration, we use the in operator to check if the character is contained in the second string.

If the condition is met, we append the character to a list.

The list.append() method adds an item to the end of the list.

main.py
```python
my_list = ['bobby', 'hadz']

my_list.append('com')

print(my_list)  # 👉️ ['bobby', 'hadz', 'com']
```

The method returns None as it mutates the original list.
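A for loop also makes it easy to add extra conditions. The following variation is only a sketch, assuming you want each common character once and that the second string may be long. The chars_in_second name is illustrative.

```python
string1 = 'aabbcd'
string2 = 'abzx'

# Converting the second string to a set avoids scanning it on every iteration
chars_in_second = set(string2)

common_characters = []

for char in string1:
    # Skip characters that have already been collected
    if char in chars_in_second and char not in common_characters:
        common_characters.append(char)

print(common_characters)  # 👉️ ['a', 'b']

print(''.join(common_characters))  # 👉️ 'ab'
```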

Which approach you pick is a matter of personal preference. I'd use a list comprehension because I find them quite direct and easy to read.


Copyright © 2024 Borislav Hadzhiev