nltk.probability.FreqDist counts word frequencies, which lets us find the most common words in the treebank corpus. A ConditionalFreqDist is then built from the tagged words, counting how often each tag occurs for each word. These counts are used to construct a model: a dictionary with the most frequent words as keys and the most frequent tag for each word as the value.

Code #1 : Creating function
Python3
# Loading Libraries
from nltk.probability import FreqDist, ConditionalFreqDist

# Making function
def word_tag_model(words, tagged_words, limit = 200):

    fd = FreqDist(words)
    cfd = ConditionalFreqDist(tagged_words)
    most_freq = (word for word, count in fd.most_common(limit))

    return dict((word, cfd[word].max())
                for word in most_freq)
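To see what the function computes without loading a corpus, the counting logic can be mimicked with the standard library. The following is a minimal sketch, not NLTK's implementation: collections.Counter stands in for FreqDist and a defaultdict of counters stands in for ConditionalFreqDist. The name word_tag_model_sketch and the toy data are made up for illustration.

```python
from collections import Counter, defaultdict


def word_tag_model_sketch(words, tagged_words, limit=200):
    """Stdlib-only stand-in for word_tag_model (illustrative, not NLTK code)."""
    # Word frequencies, playing the role of FreqDist(words).
    word_counts = Counter(words)

    # Per-word tag frequencies, playing the role of ConditionalFreqDist.
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_counts[word][tag] += 1

    # Map each of the `limit` most frequent words to its most frequent tag.
    return {
        word: tag_counts[word].most_common(1)[0][0]
        for word, _ in word_counts.most_common(limit)
    }


# Toy data (hypothetical): "run" is seen twice as a verb and once as a noun.
toy_tagged = [
    ("the", "DT"), ("run", "VB"), ("the", "DT"),
    ("run", "NN"), ("run", "VB"), ("dog", "NN"),
]
toy_words = [word for word, _ in toy_tagged]

toy_model = word_tag_model_sketch(toy_words, toy_tagged, limit=2)
print(toy_model)
```

With limit=2 only the two most frequent words are kept, and "run" is mapped to the tag it carries most often in the toy data.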
Code #2 : Using the function with UnigramTagger
Python3
# loading libraries
from tag_util import word_tag_model
from nltk.corpus import treebank
from nltk.tag import UnigramTagger

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Initializing the model
model = word_tag_model(treebank.words(),
                       treebank.tagged_words())

# Initializing the Unigram
tag = UnigramTagger(model = model)

print ("Accuracy : ", tag.evaluate(test_data))
Output :
Accuracy : 0.559680552557738
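A tagger built only from a model dictionary has no tag for words outside that dictionary, and without a backoff tagger UnigramTagger assigns None to them. Every such token counts as an error, which probably accounts for much of the low score. The sketch below imitates that behaviour with a plain dictionary; the helper names and the toy sentence are hypothetical.

```python
def lookup_tag(tokens, model):
    """Tag each token from the model; unknown tokens get None."""
    return [(token, model.get(token)) for token in tokens]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical model covering only two frequent words.
small_model = {"the": "DT", "of": "IN"}
gold = [("the", "DT"), ("price", "NN"), ("of", "IN"), ("oil", "NN")]

predicted = lookup_tag([word for word, _ in gold], small_model)
score = accuracy(predicted, gold)
print(predicted)
print(score)
```

Half of the toy tokens are missing from the model, so half of the predictions are None and the score is 0.5.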
Code #3 : Let’s try backoff chain
Python3
# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
# backoff_tagger is assumed to live in tag_util alongside word_tag_model
from tag_util import backoff_tagger

default_tagger = DefaultTagger('NN')

likely_tagger = UnigramTagger(
        model = model, backoff = default_tagger)

# train_data, test_data and model come from Code #2
tag = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff = likely_tagger)

print ("Accuracy : ", tag.evaluate(test_data))
Output :
Accuracy : 0.8806820634578028
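The backoff_tagger helper is called above, but its definition is not shown. A helper of this kind typically builds the chain by instantiating each tagger class in turn and passing the previously built tagger as backoff. The sketch below is an assumption about its shape, not the actual definition from tag_util; backoff_tagger_sketch and the stand-in classes are hypothetical and exist only to show the resulting order.

```python
def backoff_tagger_sketch(train_sents, tagger_classes, backoff=None):
    """Chain taggers so each one falls back to the tagger built before it."""
    for cls in tagger_classes:
        # Each new tagger is trained on train_sents and wraps the previous one.
        backoff = cls(train_sents, backoff=backoff)
    return backoff


class DummyTagger:
    """Hypothetical stand-in that only records its backoff."""

    def __init__(self, train_sents, backoff=None):
        self.backoff = backoff


class First(DummyTagger):
    pass


class Second(DummyTagger):
    pass


chain = backoff_tagger_sketch([], [First, Second], backoff="default")

# The last class in the list ends up outermost, so it is consulted first.
print(type(chain).__name__, type(chain.backoff).__name__, chain.backoff.backoff)
```

Under this assumption, the list [UnigramTagger, BigramTagger, TrigramTagger] yields a trigram tagger that backs off to a bigram tagger, then to a unigram tagger, and finally to whatever was passed as backoff.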
Note : The backoff chain has increased the accuracy. We can improve this result further by placing the model-based UnigramTagger at the front of the chain, so that it overrides the trained taggers.

Code #4 : Manual Override of Trained Taggers
Python3
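# Loading libraries
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag import DefaultTagger
# backoff_tagger is assumed to live in tag_util alongside word_tag_model
from tag_util import backoff_tagger

default_tagger = DefaultTagger('NN')

# train_data, test_data and model come from Code #2
tagger = backoff_tagger(train_data, [
        UnigramTagger, BigramTagger,
        TrigramTagger], backoff = default_tagger)

# The model-based tagger is consulted first and backs off to the trained chain
likely_tag = UnigramTagger(model = model, backoff = tagger)

print ("Accuracy : ", likely_tag.evaluate(test_data))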
Output :
Accuracy : 0.8824088063889488
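The two chains differ only in where the model-based tagger sits. In Code #3 it is consulted after the trained taggers give up; in Code #4 it is consulted first, so its entries override whatever the trained taggers learned. Here is a minimal sketch of that ordering effect, with plain dictionaries standing in for taggers (all names and tags are made up):

```python
def tag_with_chain(word, chain, default="NN"):
    """Return the tag from the first lookup table in the chain that knows the word."""
    for table in chain:
        if word in table:
            return table[word]
    return default


# Hypothetical tables: one "trained" on sentences, one built from word counts.
trained = {"run": "NN", "quickly": "RB"}
likely = {"run": "VB"}

# Code #3 order: trained taggers first, the model as a fallback.
fallback_result = tag_with_chain("run", [trained, likely])

# Code #4 order: the model first, so it overrides the trained taggers.
override_result = tag_with_chain("run", [likely, trained])

# Words unknown to every table fall through to the default tag.
unknown_result = tag_with_chain("zebra", [likely, trained])

print(fallback_result, override_result, unknown_result)
```

Whether the override helps depends on how reliable the model's entries are compared with the context-aware taggers; on this split the difference in accuracy is small.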
