NLP | Training Unigram Tagger

26 July 2024

A single token is called a unigram, for example: hello, movie, coding. This article focuses on the unigram tagger.

Unigram Tagger: To determine a word's part-of-speech tag, it uses only that single word as context. UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which in turn inherits from SequentialBackoffTagger. So UnigramTagger is a context-based tagger whose context is a single word.

Code #1 : Importing the libraries used to train a UnigramTagger.

Python3

# Loading Libraries
from nltk.tag import UnigramTagger
from nltk.corpus import treebank

Code #2 : Training on the first 1000 tagged sentences of the treebank corpus.

Python3

# Using the first 1000 tagged sentences as training data
train_sents = treebank.tagged_sents()[:1000]

# Initializing (and training) the tagger
tagger = UnigramTagger(train_sents)

# Let's look at the first sentence
# of the treebank corpus as a list of tokens
treebank.sents()[0]

Output :

['Pierre',
 'Vinken',
 ',',
 '61',
 'years',
 'old',
 ',',
 'will',
 'join',
 'the',
 'board',
 'as',
 'a',
 'nonexecutive',
 'director',
 'Nov.',
 '29',
 '.']
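Code #2 trains the tagger in a single call. As a rough, stdlib-only illustration of what a unigram model learns, the sketch below maps each word to the tag it carried most often in the training data and then tags every token independently. It is an approximation written for this explanation, not NLTK's implementation (NLTK's ContextTagger training also involves a cutoff and backoff taggers); the helper names train_unigram and tag_tokens are hypothetical and the toy sentences are made up.

```python
from collections import Counter, defaultdict

# Hypothetical sketch (not NLTK's code): a unigram model is a
# word -> most-frequent-tag lookup table.


def train_unigram(tagged_sents):
    """Map each word to the tag it carried most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # most_common(1) returns [(tag, count)] for the highest count
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag_tokens(model, tokens):
    """Tag each token on its own; words never seen in training get None."""
    return [(tok, model.get(tok)) for tok in tokens]


# Toy data in the same shape as treebank.tagged_sents():
# a list of sentences, each a list of (word, tag) tuples (made-up examples).
train_sents = [
    [("the", "DT"), ("board", "NN"), ("will", "MD"), ("meet", "VB")],
    [("the", "DT"), ("will", "NN"), ("was", "VBD"), ("read", "VBN")],
    [("she", "PRP"), ("will", "MD"), ("join", "VB"), ("the", "DT"), ("board", "NN")],
]

model = train_unigram(train_sents)
print(tag_tokens(model, ["the", "board", "will", "join", "Nov."]))
# [('the', 'DT'), ('board', 'NN'), ('will', 'MD'), ('join', 'VB'), ('Nov.', None)]
```

Because the lookup ignores neighbouring words, "will" is always tagged MD by this model (it was MD twice and NN once), even in contexts like the second toy sentence where it is a noun. That is the trade-off of using a single word as context.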

Code #3 : Finding the tagged results after training.

Python3

tagger.tag(treebank.sents()[0])

Output :

[('Pierre', 'NNP'),
 ('Vinken', 'NNP'),
 (',', ','),
 ('61', 'CD'),
 ('years', 'NNS'),
 ('old', 'JJ'),
 (',', ','),
 ('will', 'MD'),
 ('join', 'VB'),
 ('the', 'DT'),
 ('board', 'NN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('nonexecutive', 'JJ'),
 ('director', 'NN'),
 ('Nov.', 'NNP'),
 ('29', 'CD'),
 ('.', '.')]

How does the code work? UnigramTagger builds a context model from the list of tagged sentences. Because UnigramTagger inherits from ContextTagger, it does not provide its own choose_tag() method; instead it implements a context() method, which takes the same three arguments as choose_tag() (tokens, index and history). The context token returned by context() is used to build the model during training, and to look up the best tag once the model is created. This is also illustrated in the diagram above.

Overriding the context model: instead of training their own model, all taggers that inherit from ContextTagger can take a pre-built model. This model is simply a Python dictionary mapping a context key to a tag. The context keys (individual words in the case of UnigramTagger) depend on what the ContextTagger subclass returns from its context() method. Any word that is not a key in the model is tagged None, as the output below shows.

Code #4 : Overriding the context model

Python3

# Pre-built model: only the word 'Pierre' is known, mapped to 'NN'
tagger = UnigramTagger(model={'Pierre': 'NN'})

tagger.tag(treebank.sents()[0])

Output :

[('Pierre', 'NN'),
 ('Vinken', None),
 (',', None),
 ('61', None),
 ('years', None),
 ('old', None),
 (',', None),
 ('will', None),
 ('join', None),
 ('the', None),
 ('board', None),
 ('as', None),
 ('a', None),
 ('nonexecutive', None),
 ('director', None),
 ('Nov.', None),
 ('29', None),
 ('.', None)]
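The mechanism described above (choose_tag() supplied once by ContextTagger, context() supplied by the subclass, and an optional pre-built model dictionary) can be sketched in plain Python. The classes below are hypothetical, simplified stand-ins whose names only echo NLTK's; they leave out NgramTagger, training and backoff, and are not NLTK's source.

```python
# Hypothetical, simplified classes (not NLTK's code) mirroring the
# SequentialBackoffTagger -> ContextTagger -> UnigramTagger design.


class SimpleSequentialTagger:
    """Tags tokens left to right, one call to choose_tag() per token."""

    def tag(self, tokens):
        history = []  # tags assigned so far
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        return list(zip(tokens, history))

    def choose_tag(self, tokens, index, history):
        raise NotImplementedError


class SimpleContextTagger(SimpleSequentialTagger):
    """Implements choose_tag() once, as a dictionary lookup on context()."""

    def __init__(self, model=None):
        # model: dict mapping a context key to a tag
        self._context_to_tag = dict(model) if model else {}

    def context(self, tokens, index, history):
        raise NotImplementedError

    def choose_tag(self, tokens, index, history):
        # Context keys missing from the model yield None
        return self._context_to_tag.get(self.context(tokens, index, history))


class SimpleUnigramTagger(SimpleContextTagger):
    """The context key is just the current word."""

    def context(self, tokens, index, history):
        return tokens[index]


tagger = SimpleUnigramTagger(model={"Pierre": "NN"})
print(tagger.tag(["Pierre", "Vinken", ",", "61"]))
# [('Pierre', 'NN'), ('Vinken', None), (',', None), ('61', None)]
```

Only SimpleUnigramTagger knows what a context is; a bigram-style subclass could reuse the same choose_tag() by returning a different key from context(), such as the previous tag together with the current word.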
