
NLP | Trigrams’n’Tags (TnT) Tagging

TnT Tagger : It is a statistical part-of-speech tagger based on second-order Markov models.

  • It is a very efficient part-of-speech tagger that can be trained on different languages and on virtually any tagset.
  • For parameter generation, the component trains on tagged corpora. It incorporates several methods of smoothing and of handling unknown words.
  • Linear interpolation is used for smoothing; the respective weights are determined by deleted interpolation (a rough sketch of the interpolation step follows below).

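As a rough illustration of the smoothing step (a toy sketch, not NLTK's implementation; the relative frequencies and lambda weights below are made-up placeholders, while the real tagger derives the weights from the training corpus via deleted interpolation):

# P(t3 | t1, t2) is smoothed as a weighted mix of unigram,
# bigram and trigram relative frequencies:
#   P(t3 | t1, t2) = l1*P(t3) + l2*P(t3 | t2) + l3*P(t3 | t1, t2)

# toy relative-frequency estimates for the tag 'NN'
p_uni = 0.20                # P(NN)
p_bi = 0.35                 # P(NN | DT)
p_tri = 0.55                # P(NN | VB, DT)

# placeholder interpolation weights, l1 + l2 + l3 == 1
l1, l2, l3 = 0.1, 0.3, 0.6

p_smoothed = l1 * p_uni + l2 * p_bi + l3 * p_tri
print("Smoothed P(NN | VB, DT) :", p_smoothed)    # 0.455
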
The TnT tagger has a different API than the normal taggers. One must explicitly call the train() method after creating it.

Code #1 : Using the train() method

from nltk.tag import tnt
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger
tnt_tagging = tnt.TnT()
  
# training
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.8756313403842003

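Once trained, the tagger can label new, untagged sentences. A small usage sketch, reusing tnt_tagging from Code #1 (the example sentence is arbitrary; the exact tags depend on the training data, and words never seen during training are tagged 'Unk' unless an unknown-word tagger is supplied, as shown in Code #2):

# tagging a new, untagged sentence with the tagger trained above
sentence = "The quick brown fox jumps over the lazy dog".split()
print(tnt_tagging.tag(sentence))
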
Understanding the working of the TnT tagger :

  • It maintains a number of internal FreqDist and ConditionalFreqDist instances, built from the training data.
  • These frequency distributions count the tag unigrams, bigrams and trigrams (a sketch of such counting follows this list).
  • These frequencies are used to calculate the probabilities of the possible tags for each word.
  • Instead of constructing a backoff chain of NgramTagger subclasses, the TnT tagger uses all the ngram models together to choose the best tag.
  • Based on the probabilities of each possible tag, it chooses the most likely tag sequence for the entire sentence.

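Conceptually, the kind of counts TnT keeps can be reproduced with NLTK's FreqDist and ConditionalFreqDist. The sketch below builds such tag n-gram counts from the same training data; it only mirrors the idea and is not TnT's internal code:

from nltk import FreqDist, ConditionalFreqDist
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]

uni = FreqDist()               # counts of single tags
bi = ConditionalFreqDist()     # counts of a tag given the previous tag
tri = ConditionalFreqDist()    # counts of a tag given the previous two tags

for sent in train_data:
    tags = [tag for (word, tag) in sent]
    for i, tag in enumerate(tags):
        uni[tag] += 1
        if i >= 1:
            bi[tags[i - 1]][tag] += 1
        if i >= 2:
            tri[(tags[i - 2], tags[i - 1])][tag] += 1

print("Most common tags :", uni.most_common(5))
print("Tags following 'DT' :", bi['DT'].most_common(5))
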
Code #2 : Using a tagger for unknown words via unk

from nltk.tag import tnt
from nltk.corpus import treebank
from nltk.tag import DefaultTagger
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger, with a DefaultTagger('NN') handling unknown words
unk = DefaultTagger('NN')
tnt_tagging = tnt.TnT(unk = unk, Trained = True)
  
# training 
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.892467083962875

  • The unknown tagger’s tag() method is only ever called with a single-word sentence.
  • The TnT tagger accepts a tagger for unknown words through the unk parameter.
  • One can pass Trained = True if this unknown-word tagger is already trained.
  • Otherwise, TnT will call unk.train(data) with the same data that is passed to its own train() method (the sketch below shows the effect on an out-of-vocabulary word).

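To see the effect of the unknown-word tagger, one can tag a sentence containing a word that is unlikely to occur in the training data. A small sketch reusing tnt_tagging from Code #2 (the out-of-vocabulary word is invented for illustration):

# 'flibbertigibbet' is very unlikely to appear in the training sentences,
# so without an unk tagger it would be tagged 'Unk'; here DefaultTagger('NN')
# tags it 'NN' instead.
print(tnt_tagging.tag(['A', 'flibbertigibbet', 'appeared', 'yesterday']))
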
Controlling Beam Search :

  • Another parameter that can be modified for TnT is N, which controls the number of possible solutions the tagger maintains during beam search.
  • By default, N = 1000.
  • Increasing N increases the amount of memory used, without any particular increase in accuracy.
  • Decreasing N reduces the amount of memory used, but can also decrease accuracy (a rough timing comparison is sketched after Code #3).

Code #3 : Using N = 100

from nltk.tag import tnt
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger
tnt_tagging = tnt.TnT(N = 100)
  
# training 
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.8756313403842003

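To check the speed impact of N on a given machine, one can time the tagger with a few different beam sizes. A rough sketch (the N values and the slice of test sentences are arbitrary):

import time

from nltk.tag import tnt
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]
# a small slice of untagged test sentences, used only for timing
test_sents = [[word for (word, tag) in sent]
              for sent in treebank.tagged_sents()[3000:3100]]

for n in (10, 100, 1000):
    tagger = tnt.TnT(N = n)
    tagger.train(train_data)
    start = time.time()
    for sent in test_sents:
        tagger.tag(sent)
    print("N =", n, "- tagging time :", round(time.time() - start, 2), "seconds")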