Python | Lemmatization with NLTK

25 June 2025


Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. Lemmatization is similar to stemming, but it takes context such as the word's part of speech into account, so it maps the different forms of a word to a single dictionary form, the lemma.
Text preprocessing often includes stemming, lemmatization, or both. The two terms are frequently confused, and some people treat them as the same thing. Stemming strips affixes using heuristic rules, whereas lemmatization performs a morphological analysis of the word. This is why lemmatization is often preferred when the output needs to consist of real words.
Applications of lemmatization include:

Comprehensive retrieval systems, such as search engines.
Compact indexing.

Examples of lemmatization:

-> rocks : rock
-> corpora : corpus
-> better : good

One major difference from stemming is that lemmatize() takes a part-of-speech parameter, “pos”. If it is not supplied, the default is “n” (noun).
Below is an implementation of lemmatization using NLTK:

Python3

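# import these modules
import nltk
from nltk.stem import WordNetLemmatizer

# WordNetLemmatizer needs the WordNet data, downloaded once
# (some NLTK versions also require "omw-1.4")
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

lemmatizer = WordNetLemmatizer()

# Without a pos argument, each word is lemmatized as a noun
print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))

# "a" denotes adjective in "pos"
print("better :", lemmatizer.lemmatize("better", pos="a"))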

Output:

rocks : rock
corpora : corpus
better : good

NLTK (Natural Language Toolkit) is a Python library for natural language processing. Its nltk.stem module provides the WordNetLemmatizer class, which performs lemmatization by looking words up in the WordNet lexical database.

Lemmatization is the process of reducing a word to its base or dictionary form, known as the lemma. For example, the lemma of the word “cats” is “cat”, and the lemma of the verb “running” is “run”.

Advantages of Lemmatization with NLTK:

Improves text analysis accuracy: Reducing words to their base or dictionary form lets different inflections of the same word (for example, “rock” and “rocks”) be counted and analyzed as one item, which can improve the accuracy of text analysis.
Reduces vocabulary size: Because inflected forms collapse into a single lemma, the number of distinct tokens (the vocabulary) shrinks, which makes large datasets easier to handle.
Better search results: Mapping different forms of a word to a common base form makes it easier to match a query against text that uses a different form of the same word.
Useful for feature extraction: Lemmatization can be useful in feature extraction, where the aim is to extract meaningful features from text for machine learning tasks.

Disadvantages of Lemmatization with NLTK:

Requires language resources: Lemmatization depends on knowledge of the language and of how its words are formed. NLTK's WordNetLemmatizer relies on the English WordNet, so for other languages a different lemmatizer or resource is needed, and accuracy suffers where such resources are limited.
Slower than stemming: Lemmatization involves dictionary lookups and, for accurate results, part-of-speech tagging of the text, so it is generally slower than rule-based stemming.
Less suited to latency-sensitive applications: Because of this extra cost, it may not be the best fit for real-time applications that require very quick response times.
May not resolve ambiguity: A single word can have different lemmas depending on how it is used. WordNetLemmatizer does not look at the surrounding text; it relies only on the pos argument, so if the part of speech is missing or wrong, the lemma may be wrong too.
