In Python, we sometimes need to get all the words present in a string. Doing this by hand, character by character, is tedious, so having a shorthand for the task is useful. This article also covers cases in which punctuation marks have to be ignored.
Input: GeeksForGeeks is the best Computer Science Portal
Output: ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal']
Explanation: Each whitespace-separated word of the input string is extracted into a list.
Python Extract Words From String
- Using split()
- Using find()
- Using List comprehension
- Using Regex (re.findall())
- Using re.sub() + string.punctuation
- Using NLP Libraries
Python Extract String Words using split()
In Python, the split() method splits a string on whitespace and returns a list of words. This is the most generic and recommended approach for this task. Its drawback is that it fails when the string contains punctuation marks: the punctuation stays attached to the adjacent words (for example, "Portal." instead of "Portal").
Python3
# initializing string
test_string = "GeeksForGeeks is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# using split()
# to extract words from string
res = test_string.split()

# printing result
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks is best Computer Science Portal
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n), since split() builds a new list holding every word.
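To make the punctuation drawback concrete, here is a small sketch on an assumed sample sentence (chosen only for illustration). It shows that split() leaves punctuation attached to neighbouring words and keeps symbol-only chunks such as "@#" as if they were words. It also contrasts split() with split(' '), which treats every single space as a separator and therefore produces empty strings when spaces repeat.

```python
# Sample text with punctuation and a double space (assumed for illustration)
text = "GeeksForGeeks, is  best @# Portal.!!!"

# split() with no argument splits on runs of whitespace
default_split = text.split()
print(default_split)  # ['GeeksForGeeks,', 'is', 'best', '@#', 'Portal.!!!']

# split(' ') splits on every single space, so the double space yields ''
space_split = text.split(' ')
print(space_split)  # ['GeeksForGeeks,', 'is', '', 'best', '@#', 'Portal.!!!']
```

In practice, the argument-free split() is usually the better choice for word extraction, because it never returns empty strings.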
Python Extract String Words using find()
In Python, the find() method can also be used to extract words. It is called on a string, takes the substring to search for (plus optional start and end positions), and returns the lowest index at which the substring is found, or -1 if it is not present. Calling it repeatedly to locate each space, starting just after the previous one, marks out the boundaries of every word.
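A quick look at find()'s return values helps before using it in a loop. Besides the substring, find() accepts an optional start position, which is what lets a word-extraction loop resume searching just after the previous space. The sample string below is assumed for illustration.

```python
sentence = "GeeksForGeeks is best"

first_space = sentence.find(' ')       # index of the first space
second_space = sentence.find(' ', 14)  # search begins at index 14
missing = sentence.find('Portal')      # substring not present

print(first_space, second_space, missing)  # 13 16 -1
```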
Python3
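def extract_words_using_find(input_string):
    words, start = [], 0
    while start <= len(input_string):
        # find() returns the index of the next space, or -1 if there is none
        end = input_string.find(' ', start)
        if end == -1:
            end = len(input_string)  # last word runs to the end of the string
        if end > start:  # skip empty slices caused by repeated spaces
            words.append(input_string[start:end])
        start = end + 1
    return words

sentence = "GeeksForGeeks is best Computer Science Portal"
print(extract_words_using_find(sentence))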
Output
['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n), for the list of extracted words.
Python Extract String Words using List Comprehension
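In Python, you can also extract words with a list comprehension. The string is first split on whitespace; each piece then has its leading and trailing punctuation removed with strip(string.punctuation), and only pieces that are purely alphanumeric (checked with isalnum()) are kept. This discards punctuation-only chunks such as "@#".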
Python3
# importing string module for the punctuation constant
import string

# Initializing string
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"

# Using list comprehension and isalnum() method to extract words from string
res = [word.strip(string.punctuation) for word in test_string.split()
       if word.strip(string.punctuation).isalnum()]

# Printing result
print("The list of words is:", res)
Output
The list of words is: ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
The time complexity of the program is O(n), where n is the length of the test string.
The space complexity of the program is also O(n), where n is the length of the test string.
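One caveat is worth knowing: because isalnum() rejects any word that still contains an inner apostrophe or hyphen, this filter silently drops contractions and hyphenated words. The sketch below wraps the same comprehension in a helper (the name extract_words is hypothetical) and runs it on an assumed sample sentence.

```python
import string

def extract_words(text):
    # Same filter as the list comprehension in this section:
    # strip edge punctuation, keep only purely alphanumeric words
    return [word.strip(string.punctuation) for word in text.split()
            if word.strip(string.punctuation).isalnum()]

words = extract_words("I don't use e-mail, ok?")
print(words)  # ['I', 'use', 'ok']
```

Both "don't" and "e-mail" disappear, so this method suits text where such words are rare or unimportant.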
Python Extract String Words using Regex
In Python, words can also be extracted with regular expressions. When the string contains special characters and punctuation marks, the split()-based approach fails, so a regular expression is needed instead. The re.findall() function returns a list of all matches of the pattern \w+ (runs of letters, digits, and underscores), which extracts the words while ignoring punctuation marks.
Python3
# using regex( findall() )
import re

# initializing string
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"

# printing original string
print("The original string is : " + test_string)

# using regex( findall() )
# to extract words from string
res = re.findall(r'\w+', test_string)

# printing result
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks, is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n)
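Because \w also matches digits and underscores, and an apostrophe is not a word character, \w+ splits contractions and keeps identifiers like "user_name" whole. The sketch below shows this on an assumed sample string and contrasts it with a hypothetical alternative pattern, chosen here only as an illustration, that matches ASCII letters optionally joined by a single apostrophe.

```python
import re

text = "Don't split user_name 42 times!"

# \w+ matches runs of letters, digits and underscores
w_matches = re.findall(r'\w+', text)
print(w_matches)  # ['Don', 't', 'split', 'user_name', '42', 'times']

# Hypothetical alternative: ASCII letters, optionally joined by one apostrophe
word_pattern = r"[A-Za-z]+(?:'[A-Za-z]+)?"
letter_matches = re.findall(word_pattern, text)
print(letter_matches)  # ["Don't", 'split', 'user', 'name', 'times']
```

Which pattern is right depends on what counts as a "word" for the task; the letters-only pattern, for example, ignores numbers and non-ASCII letters.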
Python Extract String Words using re.sub() + string.punctuation
This method also uses regular expressions, but instead of matching words it deletes punctuation: re.sub() removes every character listed in string.punctuation, and split() then breaks the cleaned string into words.
Python3
# using re.sub() + string.punctuation
import re
import string

# initializing string
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"

# printing original string
print("The original string is : " + test_string)

# using re.sub() + string.punctuation to extract words from string;
# re.escape() keeps characters such as ']' and '\' literal inside the [...] class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()

# printing result
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks, is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n)
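The same deletion can be done without regular expressions. The sketch below swaps in a different technique, str.translate() with a table built by str.maketrans(), whose third argument lists the characters to delete. Like the regex version, it joins contractions such as "don't" into "dont".

```python
import string

test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"

# Build a translation table that deletes every punctuation character
table = str.maketrans('', '', string.punctuation)

# Delete punctuation, then split on whitespace
res = test_string.translate(table).split()
print(res)  # ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
```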
Python Extract String Words using NLP Libraries
Python has a number of natural language processing (NLP) packages that offer more sophisticated word extraction (tokenization). The NLTK (Natural Language Toolkit) is one such library. Here is an illustration of word extraction using NLTK's word_tokenize(), which relies on NLTK's Punkt tokenizer data being downloaded once beforehand.
Python3
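import nltk

# word_tokenize() needs NLTK's Punkt tokenizer data; download it once with
# nltk.download('punkt') (newer NLTK releases use 'punkt_tab' instead)
text = "GeeksForGeeks is the best Computer Science Portal ."
words = nltk.word_tokenize(text)
print(words)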
Output
['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal', '.']
Time Complexity: O(n)
Auxiliary Space: O(n)
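word_tokenize() returns punctuation marks as tokens of their own, so the "." at the end of the sample sentence becomes a separate token. If only words are wanted, one option is to drop tokens made up entirely of punctuation. The sketch below assumes a token list of that shape, hard-coded so it runs without NLTK or its tokenizer data; the helper name drop_punctuation_tokens is hypothetical.

```python
import string

def drop_punctuation_tokens(tokens):
    # Keep a token unless every character in it is punctuation
    return [tok for tok in tokens
            if not all(ch in string.punctuation for ch in tok)]

# Token list of the kind word_tokenize() produces (hard-coded here)
tokens = ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal', '.']
words = drop_punctuation_tokens(tokens)
print(words)  # ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal']
```

Tokens that mix letters and punctuation, such as the "n't" NLTK splits off contractions, are kept, because the filter only removes tokens with no letters or digits at all.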