
NLP | Trigrams’n’Tags (TnT) Tagging

TnT Tagger : It is a statistical part-of-speech tagger based on second-order Markov models.

  • It is a very efficient part-of-speech tagger that can be trained on different languages and on virtually any tagset.
  • For parameter generation, the component trains on tagged corpora. It incorporates several methods of smoothing and of handling unknown words.
  • Linear interpolation is used for smoothing; the respective weights are determined by deleted interpolation (a rough sketch of the interpolation step follows below).

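As a rough illustration of the smoothing step (a toy sketch, not NLTK's implementation; the relative frequencies and lambda weights below are made-up placeholders, while the real tagger derives the weights from the training corpus via deleted interpolation):

# P(t3 | t1, t2) is smoothed as a weighted mix of unigram,
# bigram and trigram relative frequencies:
#   P(t3 | t1, t2) = l1*P(t3) + l2*P(t3 | t2) + l3*P(t3 | t1, t2)

# toy relative-frequency estimates for the tag 'NN'
p_uni = 0.20                # P(NN)
p_bi = 0.35                 # P(NN | DT)
p_tri = 0.55                # P(NN | VB, DT)

# placeholder interpolation weights, l1 + l2 + l3 == 1
l1, l2, l3 = 0.1, 0.3, 0.6

p_smoothed = l1 * p_uni + l2 * p_bi + l3 * p_tri
print("Smoothed P(NN | VB, DT) :", p_smoothed)    # 0.455
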
The TnT tagger has a different API than the normal taggers. One must explicitly call the train() method after creating it.

Code #1 : Using the train() method

from nltk.tag import tnt
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger
tnt_tagging = tnt.TnT()
  
# training
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.8756313403842003

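Once trained, the tagger can label new, untagged sentences. A small usage sketch, reusing tnt_tagging from Code #1 (the example sentence is arbitrary; the exact tags depend on the training data, and words never seen during training are tagged 'Unk' unless an unknown-word tagger is supplied, as shown in Code #2):

# tagging a new, untagged sentence with the tagger trained above
sentence = "The quick brown fox jumps over the lazy dog".split()
print(tnt_tagging.tag(sentence))
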
Understanding the working of the TnT tagger :

  • It maintains a number of internal FreqDist and ConditionalFreqDist instances, built from the training data.
  • These frequency distributions count the tag unigrams, bigrams and trigrams (a sketch of such counting follows this list).
  • These frequencies are used to calculate the probabilities of the possible tags for each word.
  • Instead of constructing a backoff chain of NgramTagger subclasses, the TnT tagger uses all the ngram models together to choose the best tag.
  • Based on the probabilities of each possible tag, it chooses the most likely tag sequence for the entire sentence.

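Conceptually, the kind of counts TnT keeps can be reproduced with NLTK's FreqDist and ConditionalFreqDist. The sketch below builds such tag n-gram counts from the same training data; it only mirrors the idea and is not TnT's internal code:

from nltk import FreqDist, ConditionalFreqDist
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]

uni = FreqDist()               # counts of single tags
bi = ConditionalFreqDist()     # counts of a tag given the previous tag
tri = ConditionalFreqDist()    # counts of a tag given the previous two tags

for sent in train_data:
    tags = [tag for (word, tag) in sent]
    for i, tag in enumerate(tags):
        uni[tag] += 1
        if i >= 1:
            bi[tags[i - 1]][tag] += 1
        if i >= 2:
            tri[(tags[i - 2], tags[i - 1])][tag] += 1

print("Most common tags :", uni.most_common(5))
print("Tags following 'DT' :", bi['DT'].most_common(5))
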
Code #2 : Using a tagger for unknown words via unk

from nltk.tag import tnt
from nltk.corpus import treebank
from nltk.tag import DefaultTagger
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger, with a DefaultTagger('NN') handling unknown words
unk = DefaultTagger('NN')
tnt_tagging = tnt.TnT(unk = unk, Trained = True)
  
# training 
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.892467083962875

  • The unknown tagger’s tag() method is only ever called with a single-word sentence.
  • The TnT tagger accepts a tagger for unknown words through the unk parameter.
  • One can pass Trained = True if this unknown-word tagger is already trained.
  • Otherwise, TnT will call unk.train(data) with the same data that is passed to its own train() method (the sketch below shows the effect on an out-of-vocabulary word).

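To see the effect of the unknown-word tagger, one can tag a sentence containing a word that is unlikely to occur in the training data. A small sketch reusing tnt_tagging from Code #2 (the out-of-vocabulary word is invented for illustration):

# 'flibbertigibbet' is very unlikely to appear in the training sentences,
# so without an unk tagger it would be tagged 'Unk'; here DefaultTagger('NN')
# tags it 'NN' instead.
print(tnt_tagging.tag(['A', 'flibbertigibbet', 'appeared', 'yesterday']))
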
Controlling Beam Search :

  • Another parameter that can be modified for TnT is N, which controls the number of possible solutions the tagger maintains during beam search.
  • By default, N = 1000.
  • Increasing N increases the amount of memory used, without any particular increase in accuracy.
  • Decreasing N reduces the amount of memory used, but can also decrease accuracy (a rough timing comparison is sketched after Code #3).

Code #3 : Using N = 100

from nltk.tag import tnt
from nltk.corpus import treebank
  
# initializing training and testing set    
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
  
# initializing tagger
tnt_tagging = tnt.TnT(N = 100)
  
# training 
tnt_tagging.train(train_data)
  
# evaluating
a = tnt_tagging.evaluate(test_data)
  
print ("Accuracy of TnT Tagging : ", a)


Output :

Accuracy of TnT Tagging : 0.8756313403842003

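To check the speed impact of N on a given machine, one can time the tagger with a few different beam sizes. A rough sketch (the N values and the slice of test sentences are arbitrary):

import time

from nltk.tag import tnt
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]
# a small slice of untagged test sentences, used only for timing
test_sents = [[word for (word, tag) in sent]
              for sent in treebank.tagged_sents()[3000:3100]]

for n in (10, 100, 1000):
    tagger = tnt.TnT(N = n)
    tagger.train(train_data)
    start = time.time()
    for sent in test_sents:
        tagger.tag(sent)
    print("N =", n, "- tagging time :", round(time.time() - start, 2), "seconds")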