Given a list of strings, write a Python program to get the word with the most occurrences across all strings.
Examples:
Input : test_list = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]
Output : gfg
Explanation : gfg occurs 3 times, the most across all strings.
Input : test_list = ["Lazyroar love gfg", "Lazyroar are best"]
Output : Lazyroar
Explanation : Lazyroar occurs 2 times, the most across all strings.
Method #1 : Using loop + max() + split() + defaultdict()
In this approach, we extract each word using split() and track its frequency in a defaultdict(). Finally, max() with a key argument returns the word with the highest frequency.
Python3
# Python3 code to demonstrate working of
# Most frequent word in Strings List
# Using loop + max() + split() + defaultdict()
from collections import defaultdict

# initializing list of strings
test_list = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]

# printing original list
print("The original list is : " + str(test_list))

temp = defaultdict(int)

# counting frequency of each word
for sub in test_list:
    for wrd in sub.split():
        temp[wrd] += 1

# getting word with maximum frequency
res = max(temp, key=temp.get)

# printing result
print("Word with maximum frequency : " + str(res))
The original list is : ['gfg is best for Lazyroar', 'Lazyroar love gfg', 'gfg is best']
Word with maximum frequency : gfg
Time Complexity: O(n*k), where n is the number of strings and k is the average number of words per string.
Auxiliary Space: O(n*k), for the frequency dictionary in the worst case where all words are distinct.
Method #2 : Using list comprehension + mode()
In this approach, we collect all the words using a list comprehension and find the most frequent one using mode().
Python3
# Python3 code to demonstrate working of
# Most frequent word in Strings List
# Using list comprehension + mode()
from statistics import mode

# initializing list of strings
test_list = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]

# printing original list
print("The original list is : " + str(test_list))

# getting all words
temp = [wrd for sub in test_list for wrd in sub.split()]

# getting the most frequent word
res = mode(temp)

# printing result
print("Word with maximum frequency : " + str(res))
The original list is : ['gfg is best for Lazyroar', 'Lazyroar love gfg', 'gfg is best']
Word with maximum frequency : gfg
Method #3: Using list() and Counter()
- Append all words to an empty list and calculate the frequency of every word using the Counter() function.
- Find the maximum count and return that key.
Below is the implementation:
Python3
# Python3 code to demonstrate working of
# Most frequent word in Strings List
from collections import Counter

# function which returns the most frequent word
def mostFrequentWord(words):

    # taking empty list
    lis = []
    for i in words:

        # getting all words
        for j in i.split():
            lis.append(j)

    # calculating frequency of all words
    freq = Counter(lis)

    # find max count and remember that key
    max_count = 0
    for i in freq:
        if freq[i] > max_count:
            max_count = freq[i]
            word = i
    return word

# Driver code
# initializing strings list
words = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]

# printing original list
print("The original list is : " + str(words))

# passing the words list to mostFrequentWord and printing the result
print("Word with maximum frequency : " + mostFrequentWord(words))

# This code is contributed by vikkycirus
The original list is : ['gfg is best for Lazyroar', 'Lazyroar love gfg', 'gfg is best']
Word with maximum frequency : gfg
The time and auxiliary space complexity are the same for all three methods above:
Time Complexity: O(n*k), where n is the number of strings and k is the average number of words per string.
Auxiliary Space: O(n*k)
Method #4: Using Counter() and reduce()
Here is an approach that solves the problem using the most_common() method of the collections module’s Counter class and the reduce() function from the functools module:
Python3
from collections import Counter
from functools import reduce

def most_frequent_word(test_list):
    # flatten the per-string word lists into one list of words
    all_words = reduce(lambda a, b: a + b, [sub.split() for sub in test_list])

    # count the frequency of each word
    word_counts = Counter(all_words)

    # return the single most common word
    return word_counts.most_common(1)[0][0]

test_list = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]
print("The original list is: ", test_list)
print("Word with most frequency: ", most_frequent_word(test_list))
The original list is: ['gfg is best for Lazyroar', 'Lazyroar love gfg', 'gfg is best']
Word with most frequency: gfg
Explanation:
- We use the reduce() function to concatenate the per-string word lists from test_list into one flat list of words (a short sketch of this flattening step follows after this list).
- We then create a Counter object from the list of all words to get the frequency of each word.
- Finally, we use the most_common() method to get the word with the highest frequency and return it.
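A minimal sketch of just the flattening step, using a made-up two-string list rather than the article's example, may make the reduce() call easier to follow: split() gives one word list per string, and reduce() with a list-concatenating lambda merges them into a single flat list that Counter can consume.
from functools import reduce

# hypothetical two-string input, used only to illustrate the flattening step
per_string_words = [s.split() for s in ["a b a", "b a"]]
print(per_string_words)   # [['a', 'b', 'a'], ['b', 'a']]

# reduce() concatenates the sublists pairwise with the list '+' operator
flat_words = reduce(lambda a, b: a + b, per_string_words)
print(flat_words)         # ['a', 'b', 'a', 'b', 'a']
Note that repeated list concatenation copies the accumulated list on every step, so on very long inputs a single comprehension (as in Method #2) or itertools.chain.from_iterable() avoids that overhead; reduce() is kept above because it mirrors the method being described.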
Time complexity: O(n * k), where n is the number of strings in the test_list and k is the average number of words in each string.
Auxiliary Space: O(n * k), since we are storing the words in a list before creating a Counter object.
Method #5: Using heapq:
- We build all_words, a list that holds the list of words from each input string, by splitting every string with the split() method inside a list comprehension.
- We create a Counter object over a generator expression that walks through every word in those sublists. A Counter is a dictionary that stores the frequency of each element.
- We use the heapq.nlargest() function to get the word with the highest frequency from the Counter object.
- We return the most frequent word.
Python3
import heapq
from collections import Counter

def most_frequent_word(test_list):
    # split every string into its list of words
    all_words = [sub.split() for sub in test_list]

    # count the frequency of every word across all sublists
    word_counts = Counter(word for sublist in all_words for word in sublist)

    # pick the single word with the highest count
    return heapq.nlargest(1, word_counts, key=word_counts.get)[0]

test_list = ["gfg is best for Lazyroar", "Lazyroar love gfg", "gfg is best"]
print("The original list is: ", test_list)
print("Word with most frequency: ", most_frequent_word(test_list))

# This code is contributed by Pushpa.
The original list is: ['gfg is best for Lazyroar', 'Lazyroar love gfg', 'gfg is best']
Word with most frequency: gfg
Time Complexity: O(n + k), where n is the total number of words in the input list and k is the number of unique words. Building the Counter is O(n), and heapq.nlargest(1, ...) scans the k unique words while maintaining a heap of size 1, which is effectively O(k).
Auxiliary Space: O(n), since all_words stores every word from the input and the Counter holds up to k <= n unique words.