Sometimes while working with Strings, we can have a problem in which we have to find substrings that occur between the uppercases and find their frequencies. This is a very unique problem and has less of applications. Let us discuss certain ways in which this task can be performed.
Method #1 : Using groupby() + regex + loop
The combination of the above functions can be used to perform this task. In this, we use groupby() to group the regex-extracted substrings. And all is compiled using loop.
Python3
# Python3 code to demonstrate working of # Substring Frequency between Uppercases # Using groupby() + regex + loop from itertools import groupby import re # initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # Substring Frequency between Uppercases # Using groupby() + regex + loop res = {} for i, j in groupby(re.findall(r '[A-Z][a-z]*\d*' , test_str)): res[i] = res[i] + 1 if i in res.keys() else 1 # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Method #2: Using dictionary comprehension + sorted() + groupby() + regex
The combination of the above functions can be used to perform this task. In this, the task of cumulation is performed using dictionary comprehension.
Python3
# Python3 code to demonstrate working of # Substring Frequency between Uppercases # Using dictionary comprehension + sorted() + groupby() + regex from itertools import groupby import re # initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # Substring Frequency between Uppercases # Using dictionary comprehension + sorted() + groupby() + regex res = {i: len ( list (j)) for i, j in groupby( sorted (re.findall(r '[A-Z][a-z]*\d*' , test_str)))} # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'Ek': 2, 'For': 1, 'Ge': 2, 'S': 2}
Time Complexity: O(n)
Auxiliary Space: O(n)
Method #3 : Using isupper(),split(),list(),set(),remove(),count() methods
Approach:
- Replace uppercase character with “*”+uppercase character(using isupper())
- Split on * that will result in a list(using split())
- Create a new list with unique strings of above list(using list(),set())
- Create a dictionary with unique strings as key and its count in original list. (using count() method)
Example:
Python3
# Python3 code to demonstrate working of # Substring Frequency between Uppercases # initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # Substring Frequency between Uppercases res = dict () x = "" for i in test_str: if i.isupper(): x + = "*" + i else : x + = i y = x.split( "*" ) y.remove('') z = list ( set (y)) for i in z: res[i] = y.count(i) # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'S': 2, 'Ge': 2, 'Ek': 2, 'For': 1}
Time Complexity : O(N)
Auxiliary Space : O(N)
Method #4: Using regular expression and Counter()
- A regular expression pattern is defined to match all the substrings that start with an uppercase letter followed by zero or more lowercase letters.
- The re.findall() method is used to find all the matches of the pattern in the given string. This method returns a list of all the matches.
- The Counter() method is used to count the frequency of each match in the list.
- The resulting dictionary of the Counter() method is the desired output.
Example:
Python3
import re from collections import Counter # initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # Substring Frequency between Uppercases pattern = r '[A-Z][a-z]*' matches = re.findall(pattern, test_str) res = dict (Counter(matches)) # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of the input string.
Auxiliary space: O(n), where n is the length of the input string.
Method #5: Using a for loop and string slicing
- Initialize an empty dictionary res to store the substring frequencies.
- Initialize a variable start to 0.
- Loop through each character in the test_str starting from the second character.
- If the current character is uppercase, slice the substring between start and the current index (exclusive) and add it to the res dictionary. Update the start variable to the current index.
- After the loop, slice the substring between the last uppercase letter and the end of the string and add it to the res dictionary.
- Print the res dictionary.
Example:
Python3
# initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # Substring Frequency between Uppercases res = {} start = 0 for i in range ( 1 , len (test_str)): if test_str[i].isupper(): substr = test_str[start:i] if substr in res: res[substr] + = 1 else : res[substr] = 1 start = i substr = test_str[start:] if substr in res: res[substr] + = 1 else : res[substr] = 1 # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n), where n is the length of test_str.
Auxiliary space: O(k), where k is the number of unique substrings between uppercase letters in test_str.
Method #6: Using a list comprehension and zip
Steps:
- Create a list comprehension that uses zip to iterate over the string and create substrings between uppercase letters.
- Use a for loop to iterate over the list of substrings.
- Use the count method to count the occurrences of the current substring in the original string.
- Add the substring and its count to a dictionary.
- Print the resulting dictionary.
Example:
Python3
# initializing string test_str = "GeEkSForGeEkS" # printing original string print ( "The original string is : " + test_str) # create a list of substrings between uppercase letters using a list comprehension and zip substr_list = [test_str[i:j] for i, j in zip ([ 0 ] + [j for j in range ( 1 , len (test_str)) if test_str[j].isupper()], [j for j in range ( 1 , len (test_str)) if test_str[j].isupper()] + [ None ])] # create a dictionary to store the counts of each substring res = {} for substr in substr_list: res[substr] = test_str.count(substr) # printing result print ( "The grouped Substring Frequency : " + str (res)) |
The original string is : GeEkSForGeEkS The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}
Time complexity: O(n^2) (due to the use of the count method)
Auxiliary space: O(k) (where k is the number of distinct substrings)