Data preprocessing is an important step in text classification, and counting the words in a sentence is one of its basic building blocks. Python offers several shorthands for this task. This article discusses ways to count words in a sentence: it starts with space-separated words and then covers sentences that contain punctuation and other special characters.
Quick Ninja Methods: One-line code to count the words in a sentence, with static and dynamic (user-supplied) input.
Python3
# Quick Two Line Codes
countOfWords = len("GeeksforLazyroar is best Computer Science Portal".split())
print("Count of Words in the given Sentence:", countOfWords)

# Quick One Line Codes
print(len("GeeksforLazyroar is best Computer Science Portal".split()))

# Quick One Line Code with User Input
print(len(input("Enter Input:").split()))
Output:
Method #1: Using split()
The split() function is a useful and quite generic way to get the words out of a string. However, this approach fails once the string contains special characters, because punctuation stays attached to the words and standalone symbols are counted as words.
Python3
# Python3 code to demonstrate
# to count words in string
# using split()

# initializing string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# using split()
# to count words in string
res = len(test_string.split())

# printing result
print("The number of words in string are : " + str(res))

The original string is : GeeksforLazyroar is best Computer Science Portal
The number of words in string are : 6
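To see why split() alone breaks down on special characters, the short sketch below runs it on the punctuated sentence that the later methods in this article use. The variable names are illustrative only.

```python
# Sentence with punctuation and a standalone group of symbols
test_string = "GeeksforLazyroar, is best @# Computer Science Portal.!!!"

# split() only separates on whitespace, so punctuation stays in the tokens
tokens = test_string.split()
print(tokens)

# "@#" is kept as a token of its own, so the count is 7 instead of 6
print(len(tokens))
```

The sentence has six real words, but the symbol group "@#" is counted as a seventh.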
Method #2 : Using regex(findall())
Regular expressions are needed when the string contains punctuation marks or special characters. The pattern \w+ matches runs of letters, digits, and underscores, so punctuation and symbols are skipped instead of being counted. This is a concise way to perform the task.
Example
Python3
# Python3 code to demonstrate
# to count words in string
# using regex (findall())
import re

# initializing string
test_string = "GeeksforLazyroar, is best @# Computer Science Portal.!!!"

# printing original string
print("The original string is : " + test_string)

# using regex (findall())
# to count words in string
res = len(re.findall(r'\w+', test_string))

# printing result
print("The number of words in string are : " + str(res))

The original string is : GeeksforLazyroar, is best @# Computer Science Portal.!!!
The number of words in string are : 6
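One caveat worth knowing: \w+ treats apostrophes and hyphens as separators, so contractions and hyphenated words are split into several matches. The sketch below compares it with an alternative pattern that allows an apostrophe or hyphen between word parts. That second pattern is an assumption made for illustration, not part of the original method.

```python
import re

text = "Don't stop, it's a well-known trick"

# \w+ breaks "Don't" into "Don" and "t", and "well-known" into two words
simple = re.findall(r"\w+", text)

# Illustrative alternative: word parts may be joined by ' or -
joined = re.findall(r"\w+(?:['-]\w+)*", text)

print(len(simple), simple)
print(len(joined), joined)
```

Which pattern is appropriate depends on what should count as a word in the data at hand.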
Method #3 : Using sum() + strip() + split()
This method performs the task without using regex. Each whitespace-separated token is stripped of leading and trailing punctuation and then tested with isalpha(). sum() adds up the tokens that pass the test, so only purely alphabetic words are counted.
Python3
# Python3 code to demonstrate
# to count words in string
# using sum() + strip() + split()
import string

# initializing string
test_string = "GeeksforLazyroar, is best @# Computer Science Portal.!!!"

# printing original string
print("The original string is : " + test_string)

# using sum() + strip() + split()
# to count words in string
res = sum([i.strip(string.punctuation).isalpha() for i in test_string.split()])

# printing result
print("The number of words in string are : " + str(res))

The original string is : GeeksforLazyroar, is best @# Computer Science Portal.!!!
The number of words in string are : 6
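Because the test is isalpha(), tokens that contain digits or inner punctuation are silently left out of the count. The sketch below wraps the same expression in a helper (the name count_alpha_words is hypothetical) to make that behavior visible.

```python
import string

def count_alpha_words(sentence):
    # Strip punctuation from both ends of each token, then count
    # only the tokens that consist entirely of letters
    return sum(tok.strip(string.punctuation).isalpha() for tok in sentence.split())

# The article's sentence: "@#" strips down to "" and is not counted
print(count_alpha_words("GeeksforLazyroar, is best @# Computer Science Portal.!!!"))

# "Python3" contains a digit, so it is not counted
print(count_alpha_words("Python3 is fun"))

# strip() only touches the ends, so the inner hyphen makes isalpha() fail
print(count_alpha_words("a well-known trick"))
```

If words with digits or hyphens should be counted, a different test than isalpha() is needed.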
Method #4: Using count() method
Python3
# Python3 code to demonstrate
# to count words in string

# initializing string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# to count words in string
res = test_string.count(" ") + 1

# printing result
print("The number of words in string are : " + str(res))

The original string is : GeeksforLazyroar is best Computer Science Portal
The number of words in string are : 6
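Counting spaces and adding one only works when words are separated by exactly one space and the string has no leading or trailing spaces. The sketch below compares the space-counting idea with len(split()) on a few edge cases. The helper name count_by_spaces and the sample strings are illustrative.

```python
def count_by_spaces(sentence):
    # Same idea as the count() method: number of spaces plus one
    return sentence.count(" ") + 1

samples = ["one two three", "one  two three", " padded ", ""]

for s in samples:
    # Print the string, the space-based count, and the split()-based count
    print(repr(s), count_by_spaces(s), len(s.split()))
```

Only the first sample gives the same answer with both approaches. Double spaces, padding, and the empty string all inflate the space-based count.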
Method #5 : Using the shlex module:
Here is a new approach using the split() method in shlex module:
Python3
import shlex

# Initialize a string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# Split the string into a list of words using shlex
words = shlex.split(test_string)

# Count the number of words in the list
count = len(words)
print(count)  # Output: 6

# This code is contributed by Edula Vinay Kumar Reddy
6
The shlex module provides a lexical analyzer for simple shell-like syntaxes. It splits a string into a list of words while honoring quotes and escapes, so a quoted phrase counts as a single word. This makes it a good choice for shell-like input, but an unbalanced quote, such as the apostrophe in "don't", makes shlex.split raise a ValueError.
Note: The shlex.split function returns a list of words, so you can use the len function to count the number of words in the list. The list's count method is not a substitute: it counts the occurrences of one given value, not the total number of elements.
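The sketch below shows how shlex.split treats quotes, including what happens when a quote is left unbalanced. The helper safe_word_count and its fallback to str.split() are hypothetical, added only to illustrate one way of handling the error.

```python
import shlex

# A quoted phrase is returned as a single word
quoted = shlex.split('say "hello world" now')
print(quoted, len(quoted))

def safe_word_count(sentence):
    # Hypothetical helper: fall back to plain split() when shlex
    # cannot parse the sentence (for example, an unbalanced quote)
    try:
        return len(shlex.split(sentence))
    except ValueError:
        return len(sentence.split())

# The apostrophe opens a quote that is never closed
print(safe_word_count("don't stop"))
```

For ordinary prose that contains apostrophes, split() or a regular expression is usually the safer choice.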
Method #6: Using operator.countOf() method
Python3
# Python3 code to demonstrate
# to count words in string
import operator as op

# initializing string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# to count words in string
res = op.countOf(test_string, " ") + 1

# printing result
print("The number of words in string are : " + str(res))

The original string is : GeeksforLazyroar is best Computer Science Portal
The number of words in string are : 6
For the shlex approach (Method #5), the time complexity is O(n), where n is the length of the input string.
Its auxiliary space is also O(n), as the shlex.split function creates a new list of words from the input string. That approach is efficient for small to medium-sized inputs, but may not be suitable for very large inputs due to the additional memory. The operator.countOf() approach (Method #6) also takes O(n) time but needs only O(1) auxiliary space, since it only counts the spaces.
Method #7: Using reduce()
- Initialize a variable res to 1 to account for the first word in the string.
- For each character ch in the string, if ch is a space, increment res by 1.
- Return the value of res as the result.
Python3
# Python3 code to demonstrate
# to count words in string using reduce
from functools import reduce

# initializing string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# to count words in string using reduce
res = reduce(lambda x, y: x + 1 if y == ' ' else x, test_string, 1)

# printing result
print("The number of words in string are : " + str(res))

# This code is contributed Vinay Pinjala.

The original string is : GeeksforLazyroar is best Computer Science Portal
The number of words in string are : 6
The time complexity of the algorithm for counting the number of words in a string using the count method or reduce function is O(n), where n is the length of the string. This is because we iterate over each character in the string once to count the number of spaces.
The auxiliary space of the algorithm is O(1), since we only need to store a few variables (res and ch) at any given time during the execution of the algorithm. The space required is independent of the length of the input string.
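Since the reduce() version counts spaces, it overcounts when words are separated by more than one space or the string is padded. As a variation, and not the method described above, the sketch below keeps a small (count, inside_word) state in the accumulator and counts word starts instead. The function name count_words_reduce is hypothetical.

```python
from functools import reduce

def count_words_reduce(sentence):
    # The accumulator is (count, inside_word). A new word starts when a
    # non-space character follows whitespace or the start of the string.
    def step(acc, ch):
        count, inside = acc
        if ch.isspace():
            return (count, False)
        # (not inside) is True only on the first character of a word
        return (count + (not inside), True)

    return reduce(step, sentence, (0, False))[0]

print(count_words_reduce("GeeksforLazyroar is best Computer Science Portal"))
print(count_words_reduce("one  two   three"))
print(count_words_reduce(""))
```

This still visits each character once and stores only a two-element tuple, so the O(n) time and O(1) auxiliary space bounds are unchanged.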
Method #8: Using numpy:
Algorithm:
- Initialize the input string ‘test_string’
- Print the original string
- Use the numpy ‘char.count()’ method to count the number of spaces in the string and add 1 to it to get the count of words.
- Print the count of words.
Python3
import numpy as np

# initializing string
test_string = "GeeksforLazyroar is best Computer Science Portal"

# printing original string
print("The original string is : " + test_string)

# using numpy to count words in string
res = np.char.count(test_string, ' ') + 1

# printing result
print("The number of words in string are : " + str(res))

# This code is contributed Rayudu.

Output:
The original string is : GeeksforLazyroar is best Computer Science Portal
The number of words in string are : 6
Time complexity: O(n)
The time complexity of the ‘char.count()’ method is O(n), where n is the length of the input string. The addition operation takes constant time. Therefore, the time complexity of the code is O(n).
Auxiliary Space: O(1)
The auxiliary space is constant, as the code uses no additional data structures or variables whose size depends on the input.