Given a String, the task is to write a Python program to extract strings of N size and have K distinct characters.
Examples:
Input : test_str = ‘neveropenforneveropen’, N = 3, K = 2
Output : [‘gee’, ‘eek’, ‘gee’, ‘eek’, ‘gee’, ‘eek’]
Explanation : 3 lengths have 2 unique characters are extracted.Input : test_str = ‘neveropenforneveropen’, N = 4, K = 2
Output : []
Explanation : No strings with 4 length and just have 2 elements.
Method #1 : Using slicing + set() + loop
In this we perform task of getting N chunks using slicing, set() is used to check for unique elements. Loop is used to iterate for all the possible chunks.
Python3
# Python3 code to demonstrate working of # N sized substrings with K distinct characters # Using slicing + set() + loop # initializing string test_str = 'neveropenforneveropen' # printing original string print ( "The original string is : " + str (test_str)) # initializing N N = 3 # initializing K K = 2 res = [] for idx in range ( 0 , len (test_str) - N + 1 ): # getting unique elements off sliced string if ( len ( set (test_str[idx: idx + N])) = = K): res.append(test_str[idx: idx + N]) # printing result print ( "Extracted Strings : " + str (res)) |
The original string is :neveropenforneveropen Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time Complexity: O(n*n)
Auxiliary Space: O(n)
Method #2 : Using list comprehension + len() + set() + slicing
Similar to above method, only difference being list comprehension is used instead of loop, just to provide shorthand to solve this task.
Python3
# Python3 code to demonstrate working of # N sized substrings with K distinct characters # Using list comprehension + len() + set() + slicing # initializing string test_str = 'neveropenforneveropen' # printing original string print ( "The original string is : " + str (test_str)) # initializing N N = 3 # initializing K K = 2 # list comprehension used to slice res = [test_str[idx: idx + N] for idx in range ( 0 , len (test_str) - N + 1 ) if len ( set (test_str[idx: idx + N])) = = K] # printing result print ( "Extracted Strings : " + str (res)) |
The original string is :neveropenforneveropen Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
The time and space complexity for all the methods are the same:
Time Complexity: O(n)
Space Complexity: O(n)
Approach 3 : Using Counter and Sliding Window
This approach uses the Counter method from the collection module, which allows us to count the frequency of elements in an iterable. We maintain a sliding window with the length of N and move it to the right, updating the count of characters in the window as we move. If the count of unique characters in the window becomes less than or equal to K, we append the window to the result list.
Python3
# Python3 code to demonstrate working of # N sized substrings with K distinct characters # Using Counter and Sliding Window from collections import Counter # initializing string test_str = 'neveropenforneveropen' # printing original string print ( "The original string is : " + str (test_str)) # initializing N N = 3 # initializing K K = 2 res = [] # Sliding window of length N for i in range ( len (test_str) - N + 1 ): window = test_str[i:i + N] c = Counter(window) # if count of unique characters in the window <= K if len (c) < = K: res.append(window) # printing result print ( "Extracted Strings : " + str (res)) |
The original string is :neveropenforneveropen Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time Complexity: O(n)
Auxiliary Space: O(n)
Approach 4: Using sliding window and hash map
- Initialize a dictionary char_count to store the count of characters in the current window.
- Initialize two pointers start and end to represent the left and right ends of the window. Also, initialize a variable count to keep track of the number of distinct characters in the window.
- Traverse the string from left to right using the end pointer end and do the following:
a. If the count of the character at the end index is 0 in the dictionary, increment the count variable.
b. Increment the count of the character at the end index in the dictionary.
c. If the length of the current window is greater than N, move the start pointer start to the right and do the following:
i. Decrement the count of the character at the start index in the dictionary.
ii. If the count of the character at the start index becomes 0 in the dictionary, decrement the count variable.
d. If the length of the current window is equal to N and the count variable is equal to K, add the current substring to the result list. - Return the result list.
Python3
def n_size_substrings_with_k_distinct_characters(s, N, K): res = [] char_count = {} start = 0 count = 0 for end in range ( len (s)): if s[end] not in char_count or char_count[s[end]] = = 0 : count + = 1 char_count[s[end]] = char_count.get(s[end], 0 ) + 1 if end - start + 1 > N: char_count[s[start]] - = 1 if char_count[s[start]] = = 0 : count - = 1 start + = 1 if end - start + 1 = = N and count = = K: res.append(s[start:end + 1 ]) return res # example usage test_str = 'neveropenforneveropen' print (n_size_substrings_with_k_distinct_characters(test_str, 3 , 2 )) |
['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time complexity: O(n), where n is the length of the string.
Auxiliary space: O(k), where k is the number of distinct