Saturday, October 11, 2025
NLP | Likely Word Tags

nltk.probability.FreqDist counts word frequencies in the treebank corpus, which lets us find the most common words. A ConditionalFreqDist is then built from the tagged words to count how often each tag occurs for each word. These counts are used to construct a model that maps each frequent word (the key) to its most frequent tag (the value).

  Code #1 : Creating function 

Python3




# Loading Libraries
from nltk.probability import FreqDist, ConditionalFreqDist
 
# Making function
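def word_tag_model(words, tagged_words, limit = 200):

    # Count how often each word occurs
    fd = FreqDist(words)
    # For each word, count how often each tag occurs
    cfd = ConditionalFreqDist(tagged_words)
    # Keep only the `limit` most frequent words
    most_freq = (word for word, count in fd.most_common(limit))

    # Map each frequent word to its most frequent tag
    return dict((word, cfd[word].max())
                for word in most_freq)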
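
To make the counting step concrete, here is a minimal sketch of the same idea using only the standard library. It is an illustration, not the NLTK implementation: the function name likely_tags and the toy data are assumptions made for this example, with collections.Counter standing in for FreqDist and a defaultdict of Counters standing in for ConditionalFreqDist.

```python
from collections import Counter, defaultdict

def likely_tags(tagged_words, limit=200):
    """Map each of the `limit` most frequent words to its most frequent tag.

    Hypothetical helper for illustration only; it mirrors word_tag_model
    but takes a single list of (word, tag) pairs.
    """
    # Word frequencies (the role FreqDist plays above)
    word_counts = Counter(word for word, _ in tagged_words)

    # Per-word tag frequencies (the role ConditionalFreqDist plays above)
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_counts[word][tag] += 1

    # Keep the most frequent words and pick the top tag for each
    return {word: tag_counts[word].most_common(1)[0][0]
            for word, _ in word_counts.most_common(limit)}

# Toy corpus: "duck" is tagged NN twice and VB once
toy = [("the", "DT"), ("duck", "NN"), ("the", "DT"),
       ("duck", "VB"), ("duck", "NN"), ("swims", "VBZ")]

model = likely_tags(toy, limit=2)
print(model)  # {'duck': 'NN', 'the': 'DT'}
```

With limit=2, only "duck" and "the" make it into the model, and "duck" receives NN because that tag outnumbers VB.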


  Code #2 : Using the function with UnigramTagger 

Python3




# loading libraries
from tag_util import word_tag_model
from nltk.corpus import treebank
from nltk.tag import UnigramTagger
 
# initializing training and testing set   
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
 
# Initializing the model
model = word_tag_model(treebank.words(),
                       treebank.tagged_words())
 
# Initializing the Unigram
tag = UnigramTagger(model = model)
 
print ("Accuracy : ", tag.evaluate(test_data))


Output :

Accuracy : 0.559680552557738
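
The accuracy is modest because a tagger built only from a model has no answer for words outside that model: it tags them as None, and every such token counts as an error. The sketch below imitates this with a plain dictionary lookup. The names tag_with_model and accuracy and the toy sentences are hypothetical and chosen only for this example.

```python
def tag_with_model(words, model):
    """Tag each word by dictionary lookup; unknown words get None."""
    return [(word, model.get(word)) for word in words]

def accuracy(gold, model):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    words = [word for word, _ in gold]
    predicted = tag_with_model(words, model)
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return correct / len(gold)

# Toy gold-standard data and a two-word model
gold = [("the", "DT"), ("duck", "NN"), ("swims", "VBZ"),
        ("the", "DT"), ("pond", "NN")]
model_toy = {"the": "DT", "duck": "NN"}

print(tag_with_model(["the", "pond"], model_toy))  # [('the', 'DT'), ('pond', None)]
print(accuracy(gold, model_toy))                   # 0.6
```

Three of the five tokens are covered by the model, so the score is 0.6. The uncovered tokens are exactly what a backoff tagger is meant to handle.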

  Code #3 : Let’s try backoff chain 

Python3




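# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
# backoff_tagger is assumed to live in the same tag_util module as word_tag_model
from tag_util import backoff_tagger

# Tag every unknown word as a noun
default_tagger = DefaultTagger('NN')

# The likely-tag model falls back to the default tagger
likely_tagger = UnigramTagger(
        model = model, backoff = default_tagger)

# train_data is the training split from Code #2
tag = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff = likely_tagger)

print ("Accuracy : ", tag.evaluate(test_data))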


Output :

Accuracy : 0.8806820634578028
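
A backoff chain can be pictured as a series of lookups in which each tagger hands a word to the next one when it has no answer. The sketch below is a simplified illustration with hypothetical classes (LookupTagger and ConstantTagger are not NLTK classes), and it ignores the surrounding-word context that BigramTagger and TrigramTagger use.

```python
class ConstantTagger:
    """Always returns the same tag (the role DefaultTagger plays above)."""
    def __init__(self, tag):
        self.tag = tag

    def tag_word(self, word):
        return self.tag

class LookupTagger:
    """Looks a word up in a table and defers to `backoff` when it is missing."""
    def __init__(self, table, backoff=None):
        self.table = table
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.table:
            return self.table[word]
        if self.backoff is not None:
            return self.backoff.tag_word(word)
        return None

# Chain: "trained" table -> likely-tag table -> constant 'NN'
chain = LookupTagger({"runs": "VBZ"},
                     backoff=LookupTagger({"the": "DT"},
                                          backoff=ConstantTagger("NN")))

print([chain.tag_word(w) for w in ["the", "dog", "runs"]])
# ['DT', 'NN', 'VBZ']
```

Here "runs" is answered by the first table, "the" by the second, and "dog" falls all the way through to the constant tag.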

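Note : The backoff chain increases the accuracy. We can improve this result further by making more effective use of the UnigramTagger class: placing the likely-tag tagger at the front of the chain lets it override the trained taggers.

  Code #4 : Manual Override of Trained Taggers 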

Python3




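# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
# backoff_tagger is assumed to live in the same tag_util module as word_tag_model
from tag_util import backoff_tagger

default_tagger = DefaultTagger('NN')

# Train the chain first; train_data is the training split from Code #2
tagger = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff = default_tagger)

# The likely-tag model is consulted first and backs off to the trained chain
likely_tag = UnigramTagger(model = model, backoff = tagger)

print ("Accuracy : ", likely_tag.evaluate(test_data))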


Output :

Accuracy : 0.8824088063889488
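
The only difference between Code #3 and Code #4 is where the likely-tag model sits in the chain. As a rough illustration, collections.ChainMap searches its mappings from left to right, which is enough to show the effect of the ordering. The two dictionaries are made-up stand-ins, and real trained taggers also take context into account, which this sketch does not.

```python
from collections import ChainMap

# Hypothetical stand-ins for the two sources of tags
trained = {"duck": "VB", "pond": "NN"}  # tags learned from training sentences
likely = {"duck": "NN", "the": "DT"}    # likely-tag model for frequent words

# Code #3 order: trained taggers first, likely-tag model as backoff
as_backoff = ChainMap(trained, likely)

# Code #4 order: likely-tag model first, trained taggers as backoff
as_override = ChainMap(likely, trained)

print(as_backoff["duck"])            # VB  (trained answer wins)
print(as_override["duck"])           # NN  (likely tag overrides it)
print(as_override.get("swan", "NN")) # NN  (default for unknown words)
```

Words covered by only one source are tagged identically in both orders. The results differ only for words that both sources know, which is consistent with the small gap between the two reported accuracies.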
Dominic