Friday, November 15, 2024

Second most repeated word in a sequence in Python

Given a sequence of strings, find the second most repeated (that is, second most frequent) string in the sequence. Assume that no two words tie for second place, so the answer is always a single word.

Examples: 

Input : {"aaa", "bbb", "ccc", "bbb", 
         "aaa", "aaa"}
Output : bbb

Input : {"Lazyroar", "for", "Lazyroar", "for", 
          "Lazyroar", "aaa"}
Output : for

An existing solution to this problem is described in the article "Second most repeated word in a sequence". In Python, we can solve it quickly using Counter(iterable) from the collections module.
The approach is simple:

  1. Create a dictionary using Counter(iterable), with the words as keys and their frequencies as values.
  2. Get a list of all values in the dictionary and sort it in descending order. The second element of the sorted list is the second-largest frequency (assuming no tie for first place).
  3. Traverse the dictionary again and print the key whose value equals the second-largest frequency.
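
As a compact alternative to steps 2 and 3, Counter also provides a most_common() method that returns (word, count) pairs ordered from highest to lowest count. The following is a minimal sketch, not the article's own implementation; the function name second_most_common is made up for illustration, and it assumes, like the problem statement, that there is no tie for first or second place.

```python
from collections import Counter


def second_most_common(words):
    """Return the word with the second-highest count, or None if there is none."""
    # most_common(2) returns up to two (word, count) pairs, highest count first.
    top_two = Counter(words).most_common(2)
    if len(top_two) < 2:
        return None  # fewer than two distinct words
    return top_two[1][0]


print(second_most_common(["aaa", "bbb", "ccc", "bbb", "aaa", "aaa"]))  # bbb
```

Returning None for a sequence with fewer than two distinct words is a design choice made here; raising an exception would be equally reasonable.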

Implementation

Python3




# Python code to print the second most repeated
# word in a sequence
from collections import Counter


def secondFrequent(words):

    # Count the occurrences of each word; the result
    # will look like Counter({'aaa': 3, 'bbb': 2, 'ccc': 1})
    counts = Counter(words)

    # Get the list of all counts and sort it in descending order
    values = sorted(counts.values(), reverse=True)

    # Pick the second-largest element
    # (assumes at least two distinct words)
    secondLarge = values[1]

    # Traverse the dictionary and print the first key
    # whose count equals the second-largest element
    for key, val in counts.items():
        if val == secondLarge:
            print(key)
            return


# Driver program
if __name__ == "__main__":
    words = ['aaa', 'bbb', 'ccc', 'bbb', 'aaa', 'aaa']
    secondFrequent(words)


Output

bbb

Time complexity: O(nlogn) where n is the length of the input list
Auxiliary space: O(n) where n is the length of the input list
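
One caveat with taking the second element of the sorted counts: if two words tie for first place, that element is still the highest frequency. A possible variant, sketched below under the assumption that "second most repeated" means the second-highest distinct frequency, deduplicates the counts before sorting. The function name second_frequent_distinct is hypothetical.

```python
from collections import Counter


def second_frequent_distinct(words):
    counts = Counter(words)
    # Deduplicate the counts so a tie for first place does not
    # occupy both of the top two positions.
    distinct = sorted(set(counts.values()), reverse=True)
    if len(distinct) < 2:
        return None  # no second-highest frequency exists
    second = distinct[1]
    # Return the first word (in insertion order) with that count.
    return next(word for word, count in counts.items() if count == second)


print(second_frequent_distinct(["a", "a", "b", "b", "c"]))  # c
```

Here "a" and "b" both appear twice, so the second-highest distinct frequency is 1 and the result is "c".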

Alternate implementation:

Python3




# Returns the second most repeated word
from collections import Counter


class Solution:
    def secFrequent(self, arr, n):
        # n (the number of words) is kept for the driver's
        # call signature but is not needed here
        all_freq = Counter(arr)
        # With key=all_freq.get the words are sorted by frequency
        # (ascending); without it they would be sorted alphabetically
        store = sorted(all_freq, key=all_freq.get)
        # The last element is the most frequent word,
        # so the one before it is the second most frequent
        return store[-2]


# Driver code
if __name__ == '__main__':
    # Number of test cases
    t = 1
    for _ in range(t):
        # Number of words
        n = 7
        # List of words
        arr = ["cat", "mat", "cat", "mat", "cat", "ball", "tall"]
        ob = Solution()
        ans = ob.secFrequent(arr, n)
        print(ans)


Output

mat

Time complexity: O(nlogn)
Auxiliary space: O(n)
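
Both implementations above sort every count even though only the two largest are needed. As a rough sketch of one way to avoid the full sort, the standard library's heapq.nlargest can select just the top two (word, count) pairs. The function name second_frequent_heap is made up for this example, and the sketch assumes there are no ties for first or second place.

```python
import heapq
from collections import Counter


def second_frequent_heap(words):
    counts = Counter(words)
    # Select only the two pairs with the highest counts,
    # instead of sorting all of them.
    top_two = heapq.nlargest(2, counts.items(), key=lambda item: item[1])
    if len(top_two) < 2:
        return None  # fewer than two distinct words
    return top_two[1][0]


print(second_frequent_heap(["cat", "mat", "cat", "mat", "cat", "ball", "tall"]))  # mat
```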

Approach #3: Using a dictionary

We can use a dictionary to count the frequency of each word in the sequence. Then, we can find the second most repeated word by iterating over the dictionary and keeping track of the maximum and second maximum frequency.

Steps to follow for the above approach:

  • Create an empty dictionary to count the frequency of each word in the sequence.
  • Iterate over each word in the sequence and update its frequency in the dictionary.
  • Initialize the maximum and second maximum frequency to 0 and -1, respectively.
  • Iterate over the items in the dictionary and update the maximum and second maximum frequency if necessary.
  • Return the word corresponding to the second maximum frequency.

Python3




def second_most_repeated_word(sequence):
    # Count the frequency of each word
    word_count = {}
    for word in sequence:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    # Track the highest and second-highest distinct frequencies
    max_freq = 0
    second_max_freq = -1
    for freq in word_count.values():
        if freq > max_freq:
            second_max_freq = max_freq
            max_freq = freq
        elif freq > second_max_freq and freq < max_freq:
            second_max_freq = freq
    # Return the first word with the second-highest frequency
    # (returns None if no such word exists)
    for word, freq in word_count.items():
        if freq == second_max_freq:
            return word

# Example usage
sequence = ["aaa", "bbb", "ccc", "bbb", "aaa", "aaa"]
print(second_most_repeated_word(sequence)) # Output: bbb


Output

bbb

Time complexity: O(n), where n is the number of words in the sequence. This is due to the iteration over each word in the sequence and the items in the dictionary.

Space complexity: O(n), where n is the number of words in the sequence. This is due to the storage of the dictionary.
