Friday, November 21, 2025

Python NLTK | tokenize.regexp()

With the help of NLTK's nltk.tokenize.regexp module, we can extract tokens from a string using a regular expression with the RegexpTokenizer class.

Syntax : tokenize.RegexpTokenizer(pattern, gaps=False, discard_empty=True)
Return : A list of tokens, returned by the tokenizer's tokenize() method
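Both examples below pass gaps=True, so the pattern describes the separators. As a small sketch of the default mode (gaps=False), where the pattern describes the tokens themselves, the snippet below uses r'\w+'; the sample sentence is only an illustration.

```python
# Default mode (gaps=False): the pattern matches the tokens, not the separators
from nltk.tokenize import RegexpTokenizer

# \w+ keeps runs of word characters; spaces and punctuation are dropped
word_tk = RegexpTokenizer(r'\w+')
tokens = word_tk.tokenize("Don't stop, Python!")

print(tokens)  # ['Don', 't', 'stop', 'Python']
```

Note how the apostrophe splits "Don't" into two tokens; a different pattern would be needed to keep contractions together.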

Example #1 :
In this example, we use RegexpTokenizer with the pattern \s+ and gaps=True, so the regular expression matches the whitespace between tokens and the string is split on it.

# Import the RegexpTokenizer class from nltk
from nltk.tokenize import RegexpTokenizer

# Create a tokenizer whose pattern matches the gaps (runs of whitespace)
# between tokens; the raw string avoids an invalid-escape warning for '\s'
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create a string input
gfg = "I love Python"

# Split the string into tokens
geek = tk.tokenize(gfg)

print(geek)


Output :

['I', 'love', 'Python']
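If you also need to know where each token sits in the original string, RegexpTokenizer provides a span_tokenize() method that yields (start, end) offsets instead of substrings. A minimal sketch, reusing the same whitespace pattern as Example #1:

```python
from nltk.tokenize import RegexpTokenizer

# Same whitespace-gap tokenizer as in Example #1
tk = RegexpTokenizer(r'\s+', gaps=True)
text = "I love Python"

# span_tokenize() yields (start, end) character offsets for each token
spans = list(tk.span_tokenize(text))
print(spans)  # [(0, 1), (2, 6), (7, 13)]

# Slicing the text with the offsets recovers the tokens
words = [text[start:end] for start, end in spans]
print(words)  # ['I', 'love', 'Python']
```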

Example #2 :
In this example, we apply the same whitespace-gap tokenizer to a different string.

# Import the RegexpTokenizer class from nltk
from nltk.tokenize import RegexpTokenizer

# Create a tokenizer that splits on runs of whitespace
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create a string input
gfg = "Geeks for Geeks"

# Split the string into tokens
geek = tk.tokenize(gfg)

print(geek)


Output :

['Geeks', 'for', 'Geeks']
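With gaps=True, leading or trailing whitespace would normally produce empty strings, which RegexpTokenizer drops because discard_empty defaults to True. The sketch below, using an illustrative string padded with spaces, shows the difference and also the one-call helper regexp_tokenize() from nltk.tokenize, which builds the tokenizer and tokenizes in one step.

```python
from nltk.tokenize import RegexpTokenizer, regexp_tokenize

# Illustrative input with leading and trailing spaces
text = "  Geeks for Geeks "

# discard_empty=False keeps the empty strings created at the edges
kept = RegexpTokenizer(r'\s+', gaps=True, discard_empty=False).tokenize(text)
print(kept)  # ['', 'Geeks', 'for', 'Geeks', '']

# regexp_tokenize() uses the default discard_empty=True, so empties are dropped
dropped = regexp_tokenize(text, r'\s+', gaps=True)
print(dropped)  # ['Geeks', 'for', 'Geeks']
```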
