Sometimes while working with Strings, we can have a problem in which we have to find substrings that occur between the uppercases and find their frequencies. This is a very unique problem and has less of applications. Let us discuss certain ways in which this task can be performed.
Method #1 : Using groupby() + regex + loop
The combination of the above functions can be used to perform this task. In this, we use groupby() to group the regex-extracted substrings. And all is compiled using loop.
Python3
# Python3 code to demonstrate working of# Substring Frequency between Uppercases# Using groupby() + regex + loopfrom itertools import groupbyimport re# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# Substring Frequency between Uppercases# Using groupby() + regex + loopres = {}for i, j in groupby(re.findall(r'[A-Z][a-z]*\d*', test_str)): res[i] = res[i] + 1 if i in res.keys() else 1# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Method #2: Using dictionary comprehension + sorted() + groupby() + regex
The combination of the above functions can be used to perform this task. In this, the task of cumulation is performed using dictionary comprehension.
Python3
# Python3 code to demonstrate working of# Substring Frequency between Uppercases# Using dictionary comprehension + sorted() + groupby() + regexfrom itertools import groupbyimport re# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# Substring Frequency between Uppercases# Using dictionary comprehension + sorted() + groupby() + regexres = {i: len(list(j)) for i, j in groupby( sorted(re.findall(r'[A-Z][a-z]*\d*', test_str)))}# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ek': 2, 'For': 1, 'Ge': 2, 'S': 2}
Time Complexity: O(n)
Auxiliary Space: O(n)
Method #3 : Using isupper(),split(),list(),set(),remove(),count() methods
Approach:
- Replace uppercase character with “*”+uppercase character(using isupper())
- Split on * that will result in a list(using split())
- Create a new list with unique strings of above list(using list(),set())
- Create a dictionary with unique strings as key and its count in original list. (using count() method)
Example:
Python3
# Python3 code to demonstrate working of# Substring Frequency between Uppercases# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# Substring Frequency between Uppercasesres = dict()x = ""for i in test_str: if i.isupper(): x += "*"+i else: x += iy = x.split("*")y.remove('')z = list(set(y))for i in z: res[i] = y.count(i)# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'S': 2, 'Ge': 2, 'Ek': 2, 'For': 1}
Time Complexity : O(N)
Auxiliary Space : O(N)
Method #4: Using regular expression and Counter()
- A regular expression pattern is defined to match all the substrings that start with an uppercase letter followed by zero or more lowercase letters.
- The re.findall() method is used to find all the matches of the pattern in the given string. This method returns a list of all the matches.
- The Counter() method is used to count the frequency of each match in the list.
- The resulting dictionary of the Counter() method is the desired output.
Example:
Python3
import refrom collections import Counter# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# Substring Frequency between Uppercasespattern = r'[A-Z][a-z]*'matches = re.findall(pattern, test_str)res = dict(Counter(matches))# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of the input string.
Auxiliary space: O(n), where n is the length of the input string.
Method #5: Using a for loop and string slicing
- Initialize an empty dictionary res to store the substring frequencies.
- Initialize a variable start to 0.
- Loop through each character in the test_str starting from the second character.
- If the current character is uppercase, slice the substring between start and the current index (exclusive) and add it to the res dictionary. Update the start variable to the current index.
- After the loop, slice the substring between the last uppercase letter and the end of the string and add it to the res dictionary.
- Print the res dictionary.
Example:
Python3
# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# Substring Frequency between Uppercasesres = {}start = 0for i in range(1, len(test_str)): if test_str[i].isupper(): substr = test_str[start:i] if substr in res: res[substr] += 1 else: res[substr] = 1 start = isubstr = test_str[start:]if substr in res: res[substr] += 1else: res[substr] = 1# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of test_str.
Auxiliary space: O(k), where k is the number of unique substrings between uppercase letters in test_str.
Method #6: Using a list comprehension and zip
Steps:
- Create a list comprehension that uses zip to iterate over the string and create substrings between uppercase letters.
- Use a for loop to iterate over the list of substrings.
- Use the count method to count the occurrences of the current substring in the original string.
- Add the substring and its count to a dictionary.
- Print the resulting dictionary.
Example:
Python3
# initializing stringtest_str = "GeEkSForGeEkS"# printing original stringprint("The original string is : " + test_str)# create a list of substrings between uppercase letters using a list comprehension and zipsubstr_list = [test_str[i:j] for i, j in zip([0] + [j for j in range(1, len(test_str)) if test_str[j].isupper()], [j for j in range(1, len(test_str)) if test_str[j].isupper()] + [None])]# create a dictionary to store the counts of each substringres = {}for substr in substr_list: res[substr] = test_str.count(substr)# printing resultprint("The grouped Substring Frequency : " + str(res)) |
The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n^2) (due to the use of the count method)
Auxiliary space: O(k) (where k is the number of distinct substrings)
