Last updated: Apr 10, 2024
Reading timeยท5 min
To find a common substring between two strings:
SequenceMatcher
class to get a Match
object.find_longest_match()
method to find the longest matching substring.from difflib import SequenceMatcher string1 = 'one two three four' string2 = 'one two nine ten' match = SequenceMatcher(None, string1, string2).find_longest_match( 0, len(string1), 0, len(string2)) print(match) # ๐๏ธ Match(a=0, b=0, size=8) # ๐๏ธ one two print(string1[match.a:match.a + match.size]) # ๐๏ธ one two print(string2[match.b:match.b + match.size])
We passed the following 3 arguments to the SequenceMatcher class:
Name | Description |
---|---|
isjunk | function that returns true if the element is junk and should be ignored. We passed None to isjunk , so no elements are ignored. |
a | sequence to be compared. Empty string by default. |
b | sequence to be compared. Empty string by default. |
The SequenceMatcher
class is used to compare pairs of sequences of any type,
so long as the sequence elements are hashable.
SequenceMatcher
class returns a Match
object that implements a find_longest_match()
method.The find_longest_match() method finds the longest matching block in the provided sequences.
The arguments we passed to the method indicate that we want to find the longest
match in the entirety of a
and b
.
from difflib import SequenceMatcher string1 = 'one two three four' string2 = 'one two nine ten' match = SequenceMatcher(None, string1, string2).find_longest_match( 0, len(string1), 0, len(string2)) print(match) # ๐๏ธ Match(a=0, b=0, size=8) # ๐๏ธ one two print(string1[match.a:match.a + match.size]) # ๐๏ธ one two print(string2[match.b:match.b + match.size])
The common substring doesn't have to be at the beginning of the string.
from difflib import SequenceMatcher string1 = 'four five one two three four' string2 = 'zero eight one two nine ten' match = SequenceMatcher(None, string1, string2).find_longest_match( 0, len(string1), 0, len(string2)) print(match) # ๐๏ธ Match(a=9, b=10, size=9) # ๐๏ธ ' one two ' print(string1[match.a:match.a + match.size]) # ๐๏ธ ' one two ' print(string2[match.b:match.b + match.size])
Notice that the common substring contains leading and trailing whitespace.
You can use the str.strip()
method if you need to remove the leading and
trailing whitespace characters.
from difflib import SequenceMatcher string1 = 'four five one two three four' string2 = 'zero eight one two nine ten' match = SequenceMatcher(None, string1, string2).find_longest_match( 0, len(string1), 0, len(string2)) print(match) # ๐๏ธ Match(a=9, b=10, size=9) # ๐๏ธ 'one two' print(string1[match.a:match.a + match.size].strip()) # ๐๏ธ 'one two' print(string2[match.b:match.b + match.size].strip())
The str.strip() method returns a copy of the string with the leading and trailing whitespace removed.
If you only need to find a leading common substring between two strings, you can
also use the os.path.commonprefix
method.
import os string1 = 'one two three four' string2 = 'one two nine ten' common_substring = os.path.commonprefix([string1, string2]) print(common_substring) # ๐๏ธ one two
The os.path.commonprefix() method takes a list of strings and returns the longest path prefix that is a prefix of all paths in the list.
If the list is empty, an empty string is returned.
The commonprefix()
method can find the leading common substring between as
many strings as necessary.
import os string1 = 'one two three four' string2 = 'one two nine ten' string3 = 'one two eight' common_substring = os.path.commonprefix([string1, string2, string3]) print(common_substring) # ๐๏ธ one two
However, the method wouldn't work if the common substring is not at the beginning of each string.
import os string1 = 'one two three four' string2 = 'eight one two nine ten' common_substring = os.path.commonprefix([string1, string2]) print(common_substring) # ๐๏ธ ""
In this case, you have to use the find_longest_match()
method from the first
example.
To find common characters between two strings:
set()
class to convert the first string to a set
.intersection()
method to get the common characters.str.join()
method to join the set
into a string.string1 = 'abcd' string2 = 'abzx' common_characters = ''.join( set(string1).intersection(string2) ) print(common_characters) # ๐๏ธ 'ab'
The set() class takes an
iterable optional argument and returns a new set
object with elements taken
from the iterable.
string1 = 'abcd' # ๐๏ธ {'d', 'c', 'b', 'a'} print(set(string1))
intersection()
method.The
intersection()
method returns a new set
with elements common to both set
objects.
string1 = 'abcd' string2 = 'abzx' # ๐๏ธ {'a', 'b'} print(set(string1).intersection(string2))
The last step is to use the str.join()
method to join the set
object into a
string.
string1 = 'abcd' string2 = 'abzx' common_characters = ''.join( set(string1).intersection(string2) ) print(common_characters) # ๐๏ธ 'ab' print(len(common_characters)) # ๐๏ธ 2
The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
You can use the len()
function if you need to get the number of common
elements between the two strings.
Alternatively, you can use a list comprehension.
This is a three-step process:
str.join()
method to join the list into a string.string1 = 'abcd' string2 = 'abzx' common_characters = ''.join([ char for char in string1 if char in string2 ]) print(common_characters) # ๐๏ธ 'ab' print(len(common_characters)) # ๐๏ธ 2
We used a list comprehension to iterate over the first string.
On each iteration, we check if the current character is present in the other string and return the result.
The in operator tests
for membership. For example, x in s
evaluates to True
if x
is a member of
s
, otherwise it evaluates to False
.
If you need to get a list of the common characters between the two strings,
remove the call to the str.join()
method.
string1 = 'abcd' string2 = 'abzx' common_characters = [ char for char in string1 if char in string2 ] print(common_characters) # ๐๏ธ ['a', 'b'] print(len(common_characters)) # ๐๏ธ 2
Alternatively, you can use a simple for loop.
This is a three-step process:
for
loop to iterate over the first string.str.join()
method to join the list into a string.string1 = 'abcd' string2 = 'abzx' common_characters = [] for char in string1: if char in string2: common_characters.append(char) print(common_characters) # ๐๏ธ ['a', 'b'] print(''.join(common_characters)) # ๐๏ธ 'ab'
We used a for
loop to iterate over the first string.
On each iteration, we use the in
operator to check if the character is
contained in the second string.
If the condition is met, we append the character to a list.
The list.append() method adds an item to the end of the list.
my_list = ['bobby', 'hadz'] my_list.append('com') print(my_list) # ๐๏ธ ['bobby', 'hadz', 'com']
The method returns None as it mutates the original list.
Which approach you pick is a matter of personal preference. I'd use a list comprehension because I find them quite direct and easy to read.
You can learn more about the related topics by checking out the following tutorials: