Wednesday, January 8, 2025

Python program to extract Keywords from a list

Given a list of strings, extract all the words that are Python keywords.

Input : test_list = ["Gfg is True", "Its a global win", "try Gfg"]
Output : ['is', 'True', 'global', 'try']
Explanation : Every string in the result list is a valid Python keyword.

Input : test_list = ["try Gfg"]
Output : ['try']
Explanation : try is used in a try/except block, hence it is a keyword.
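
To make the check concrete, here is a minimal sketch of how the standard-library keyword module classifies individual words. The helper name classify is hypothetical and used only for illustration; keyword.iskeyword() and keyword.kwlist come from the standard library.

```python
import keyword


def classify(words):
    """Map each word to True if it is a reserved Python keyword."""
    return {word: keyword.iskeyword(word) for word in words}


# The check is case-sensitive: "True" is a keyword, "true" is not.
print(classify(["try", "Gfg", "True", "true"]))

# keyword.kwlist holds every reserved keyword of the running interpreter.
print("global" in keyword.kwlist)
```

Note that the exact contents of keyword.kwlist depend on the Python version in use.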

Method #1 : Using iskeyword() + split() + loop

This is one way to perform the task. We split each string into words with split() and test each word with keyword.iskeyword(). An outer loop applies the same logic to every string in the list.

Python3




# Python3 code to demonstrate working of
# Extract Keywords from String List
 
# Using iskeyword() + loop + split()
import keyword
 
# initializing list
test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]
 
# printing original list
print("The original list is : " + str(test_list))
 
 
# iterating using loop
res = []
for sub in test_list:
    for word in sub.split():

        # check for keyword using iskeyword()
        if keyword.iskeyword(word):
            res.append(word)
 
# printing result
print("Extracted Keywords : " + str(res))


Output

The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is', 'True', 'yield', 'return', 'global', 'try']

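Time Complexity: O(n * m), where n is the number of strings in the list and m is the average number of words per string.
Auxiliary Space: O(n * m) in the worst case, when every word is a keyword.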
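
One limitation of split() is that it separates on whitespace only, so a word such as "try," keeps its trailing comma and is not recognised as a keyword. The following is a minimal sketch of one possible workaround, assuming that stripping leading and trailing punctuation is acceptable for the input; the function name extract_keywords_stripped is hypothetical.

```python
import keyword
import string


def extract_keywords_stripped(string_list):
    """Collect keywords after trimming punctuation from both ends of each word."""
    res = []
    for sub in string_list:
        for word in sub.split():
            # Remove characters such as ',', '.', '(' and ')' from the edges only.
            cleaned = word.strip(string.punctuation)
            if keyword.iskeyword(cleaned):
                res.append(cleaned)
    return res


print(extract_keywords_stripped(["try, except!", "Gfg is (not) None."]))
```

Punctuation inside a word is left untouched, so this sketch does not split tokens such as "try/except".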

Method #2: Using list comprehension

This is another way to perform the task. It uses the same functions as the method above but expresses the nested loops compactly as a single list comprehension.

Python3




# Python3 code to demonstrate working of
# Extract Keywords from String List
 
# Using list comprehension
import keyword
 
# initializing list
test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]
 
# printing original list
print("The original list is : " + str(test_list))
 
# One-liner using list comprehension
res = [ele for sub in test_list for ele in sub.split() if keyword.iskeyword(ele)]
 
# printing result
print("Extracted Keywords : " + str(res))


Output

The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is', 'True', 'yield', 'return', 'global', 'try']

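Time Complexity: O(n * m), where n is the number of strings in the list and m is the average number of words per string.
Auxiliary Space: O(n * m) in the worst case, when every word is a keyword.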
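
The same filtering can also be written in a functional style. The sketch below is an illustrative equivalent of the list comprehension, not part of the original method; the function name extract_keywords_filter is hypothetical, while filter(), map() and itertools.chain.from_iterable() are standard Python.

```python
import keyword
from itertools import chain


def extract_keywords_filter(string_list):
    """Return every keyword occurrence, in order, using filter()."""
    # Split each string on whitespace and chain the word lists together.
    words = chain.from_iterable(map(str.split, string_list))
    # Keep only the words that keyword.iskeyword() accepts.
    return list(filter(keyword.iskeyword, words))


test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]
print(extract_keywords_filter(test_list))
```

Like the list comprehension, this keeps duplicates and preserves the order in which the keywords appear.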

Approach 3: Using re.findall()

This approach uses regular expressions to extract the words that match Python keywords. We use the re module to build a pattern that matches any keyword as a whole word, then iterate over the given list and call findall() on each string. Finally, we remove duplicates from the extracted keywords. Because the deduplication goes through a set, the order of the result is not guaranteed.

  1. Define a function to extract keywords from a list using regular expressions.
  2. Create a regular expression that matches Python keywords.
  3. Iterate over the given list and use the re.findall() function to extract all words that match the regular expression.
  4. Remove any duplicates from the list of extracted keywords.
  5. Return the list of extracted keywords.

Python3




import re
import keyword
 
# Function to extract the keywords
def extract_keywords(string_list):
    python_keywords = set(keyword.kwlist)
    pattern = re.compile(r'\b(' + '|'.join(python_keywords) + r')\b')
    extracted_keywords = []
     
    for string in string_list:
        words = pattern.findall(string)
        extracted_keywords.extend(words)
         
    return list(set(extracted_keywords))
 
 
# Driver Code
string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg"]
print(extract_keywords(string_list))


Output

['True', 'yield', 'return', 'try', 'is', 'global']

Time Complexity: O(n*m), where n is the number of strings in the list and m is the average length of each string.
Space Complexity: O(k), where k is the number of unique Python keywords.
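
If the order of first appearance matters, the set-based deduplication can be swapped for dict.fromkeys(), which keeps insertion order. This is a minimal sketch of that variation rather than the method shown above; the names KEYWORD_PATTERN and extract_unique_keywords are hypothetical.

```python
import keyword
import re

# Non-capturing group of all reserved keywords, matched as whole words.
KEYWORD_PATTERN = re.compile(r"\b(?:" + "|".join(keyword.kwlist) + r")\b")


def extract_unique_keywords(string_list):
    """Return unique keywords in order of first appearance."""
    found = []
    for text in string_list:
        found.extend(KEYWORD_PATTERN.findall(text))
    # dict.fromkeys() drops duplicates while keeping insertion order.
    return list(dict.fromkeys(found))


string_list = ["Gfg is True", "try Gfg", "is it True"]
print(extract_unique_keywords(string_list))
```

With this input the repeated "is" and "True" in the last string are dropped, and the remaining keywords keep the order in which they were first seen.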

Approach 4: Using set intersection

Step-by-step approach:

  • Create a set of all Python keywords using the keyword module.
  • Loop through each string in the string_list.
  • Split the string into words using the split() method.
  • Convert the list of words into a set using the set() method.
  • Find the intersection of the word set and the keyword set using the & operator.
  • Add the intersecting words to a list.
  • Remove duplicates from the list using list(set()).
  • Return the final list of extracted keywords. The order of the result is not guaranteed because sets are unordered.

Python3




import keyword
 
def extract_keywords(string_list):
    python_keywords = set(keyword.kwlist)
    extracted_keywords = []
     
    for string in string_list:
        words = set(string.split())
        intersect = words & python_keywords
        extracted_keywords += list(intersect)
         
    return list(set(extracted_keywords))
 
string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg"]
 
print(extract_keywords(string_list))


Output

['return', 'try', 'global', 'yield', 'is', 'True']

Time Complexity: O(n * m), where n is the number of strings in string_list and m is the average number of words in each string.
Auxiliary Space: O(k), where k is the number of unique keywords extracted from the string_list.
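
Because the result is deduplicated anyway, the per-string intersection can be collapsed into a single intersection over all words. The sketch below illustrates that idea and sorts the result so the output is stable between runs; the function name extract_keywords_sorted is hypothetical.

```python
import keyword


def extract_keywords_sorted(string_list):
    """Return the unique keywords found, sorted for a stable order."""
    # Gather every whitespace-separated word from all strings into one set.
    all_words = {word for text in string_list for word in text.split()}
    # Intersect once with the keyword set, then sort the result.
    return sorted(all_words & set(keyword.kwlist))


string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg"]
print(extract_keywords_sorted(string_list))
```

Sorting uses plain string comparison, so capitalised keywords such as "True" come before lowercase ones.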

Approach 5: Using NumPy

  1. Convert the list of strings into a NumPy array of strings using np.array(test_list, dtype='U').
  2. Split each string into a list of words using np.char.split(arr).
  3. Flatten the resulting array of word lists into a 1D array of words using np.concatenate(words).
  4. Wrap keyword.iskeyword with np.vectorize() so it can be applied to a NumPy array. np.vectorize() is a convenience wrapper that still loops in Python, so it does not make the check faster.
  5. Apply the wrapped function to the 1D array with is_kw(flat_words) to get a boolean mask.
  6. Keep only the keywords using boolean indexing: flat_words[is_kw(flat_words)].

Python3




import numpy as np
import keyword
 
# initializing list
test_list = ["Gfg is True", "Gfg will yield a return",
            "Its a global win", "try Gfg"]
# printing original list
print("The original list is : " + str(test_list))
# Convert list to NumPy array of strings
arr = np.array(test_list, dtype='U')
 
# Split array into words
words = np.char.split(arr)
 
# Flatten array of words into a 1D array
flat_words = np.concatenate(words)
 
# Extract keywords using np.vectorize() and keyword.iskeyword()
is_kw = np.vectorize(keyword.iskeyword)
keywords = flat_words[is_kw(flat_words)]
 
# printing result
print("Extracted Keywords : " + str(keywords))
#This code is contributed by Pushpa.


Output:
The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is' 'True' 'yield' 'return' 'global' 'try']

Time Complexity: O(n * m), where n is the number of strings in the input list and m is the maximum number of words in any string.
Auxiliary Space: O(n * m), because we create a new NumPy array for the words and a flattened 1D array of words.
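
As an alternative to wrapping keyword.iskeyword with np.vectorize(), the membership test can be expressed with np.isin() against keyword.kwlist. This is a minimal sketch of that substitution, not the approach shown above; the function name extract_keywords_isin is hypothetical, and it assumes NumPy is installed.

```python
import keyword

import numpy as np


def extract_keywords_isin(string_list):
    """Return keyword occurrences as a NumPy array using np.isin()."""
    # Build a flat 1D array holding every whitespace-separated word.
    flat_words = np.array(" ".join(string_list).split(), dtype="U")
    # np.isin() tests each word for membership in the keyword list.
    mask = np.isin(flat_words, keyword.kwlist)
    return flat_words[mask]


test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]
print("Extracted Keywords : " + str(extract_keywords_isin(test_list)))
```

The result keeps duplicates and the original word order, matching the behaviour of the boolean-indexing step in the approach above.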
