Split a string on uppercase letters in Python

Borislav Hadzhiev

Last updated: Jun 25, 2022

Use the re.findall() function to split a string on uppercase letters, e.g. re.findall('[a-zA-Z][^A-Z]*', my_str). Strictly speaking, re.findall() does not split the string: it returns a list of the substrings that match the pattern, and each match begins at an uppercase letter (or at the first letter of the string).

main.py
import re
my_str = 'OneTwoThreeFour'
my_list = re.findall('[a-zA-Z][^A-Z]*', my_str)
print(my_list)  # 👉️ ['One', 'Two', 'Three', 'Four']

The re.findall function takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string.
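As a small illustration of what "non-overlapping" means here, the sketch below uses two made-up inputs (they are not from the examples in this article) to show that scanning resumes where the previous match ended, and that an empty list comes back when nothing matches.

```python
import re

# After 'aa' matches at index 0, scanning resumes at index 2,
# so the matches never share a character.
pairs = re.findall('aa', 'aaaaa')
print(pairs)  # 👉️ ['aa', 'aa']

# When the pattern matches nowhere, the result is an empty list.
none_found = re.findall('[A-Z]', 'lowercase only')
print(none_found)  # 👉️ []
```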

The square brackets are used to indicate a set of characters.

We have two ranges in the first set of characters: a-z (lowercase letters) and A-Z (uppercase letters).

The range of lowercase letters (a-z) is useful in case your string starts with lowercase letters.

main.py
import re
my_str = 'oneTwoThreeFour'
my_list = re.findall('[a-zA-Z][^A-Z]*', my_str)
print(my_list)  # 👉️ ['one', 'Two', 'Three', 'Four']

Notice that the string starts with lowercase characters, but the substring one still gets included in the list of strings.

If you were to remove the a-z range from the first set of characters, the leading substring one would be excluded, because every match would then have to start with an uppercase letter.

main.py
import re
my_str = 'oneTwoThreeFour'
my_list = re.findall('[A-Z][^A-Z]*', my_str)
print(my_list)  # 👉️ ['Two', 'Three', 'Four']
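The same thing happens to any leading characters that are not in the first set, such as digits. The sketch below shows this with a made-up input, followed by one possible variant pattern (an assumption for illustration, not part of the original approach) that keeps the leading run instead of dropping it.

```python
import re

# The article's pattern: every match must start with an ASCII letter,
# so the leading '42' is skipped.
dropped = re.findall('[a-zA-Z][^A-Z]*', '42isTheAnswer')
print(dropped)  # 👉️ ['is', 'The', 'Answer']

# Hypothetical variant: either an uppercase letter plus what follows it,
# or a run of non-uppercase characters, so nothing is left out.
kept = re.findall('[A-Z][^A-Z]*|[^A-Z]+', '42isTheAnswer')
print(kept)  # 👉️ ['42is', 'The', 'Answer']
```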

The caret ^ at the beginning of the second set means "NOT", so [^A-Z] matches any character that is not an uppercase letter. In other words, each match is a lowercase or uppercase letter followed by zero or more non-uppercase characters.

main.py
import re
my_str = 'oneTwoThreeFour'
my_list = re.findall('[a-zA-Z][^A-Z]*', my_str)
print(my_list)  # 👉️ ['one', 'Two', 'Three', 'Four']
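Note that the negated set is broader than "lowercase letters": it also accepts digits, spaces and punctuation. A minimal sketch with a made-up input string shows them staying inside the pieces.

```python
import re

pattern = '[a-zA-Z][^A-Z]*'

# The digit and the space are not uppercase ASCII letters,
# so [^A-Z] consumes them as part of the current piece.
parts = re.findall(pattern, 'version2Beta release')
print(parts)  # 👉️ ['version2', 'Beta release']
```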

The asterisk * matches the preceding regular expression (non-uppercase characters) zero or more times.

In its entirety, the regular expression matches a single lowercase or uppercase letter followed by zero or more non-uppercase characters.
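One way to keep that explanation next to the pattern is to compile it with the re.VERBOSE flag, which allows whitespace and comments inside the pattern. This is only a sketch: the names PATTERN and split_on_uppercase are hypothetical and chosen for illustration.

```python
import re

# Same pattern as above, annotated piece by piece.
PATTERN = re.compile(r"""
    [a-zA-Z]   # one ASCII letter, lowercase or uppercase, starts each piece
    [^A-Z]*    # then zero or more characters that are not uppercase ASCII letters
""", re.VERBOSE)


def split_on_uppercase(text):
    """Return the pieces of text, each starting at an uppercase letter."""
    return PATTERN.findall(text)


print(split_on_uppercase('oneTwoThreeFour'))  # 👉️ ['one', 'Two', 'Three', 'Four']
```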

This approach also works with strings that consist only of uppercase characters (e.g. ABDE), because each uppercase letter becomes its own match.

main.py
import re
my_list = re.findall('[a-zA-Z][^A-Z]*', 'ABDE')
print(my_list)  # 👉️ ['A', 'B', 'D', 'E']
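If you would rather perform an actual split, a different technique is re.split() with a lookahead. This is an alternative to the re.findall() approach above, not the same method, and it assumes Python 3.7 or newer, where re.split() can split on empty matches. The helper name split_before_uppercase is hypothetical.

```python
import re


def split_before_uppercase(text):
    # (?=[A-Z]) matches the empty position right before an uppercase ASCII letter.
    # (?<!^) rules out the very start of the string, which would otherwise
    # produce an empty first item when the string begins with an uppercase letter.
    return re.split(r'(?<!^)(?=[A-Z])', text)


print(split_before_uppercase('OneTwoThreeFour'))  # 👉️ ['One', 'Two', 'Three', 'Four']
print(split_before_uppercase('ABDE'))  # 👉️ ['A', 'B', 'D', 'E']
```

Because nothing is consumed by the split, every character of the input ends up in one of the pieces.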