The conll2000 corpus defines chunks using IOB tags:
- These tags specify where a chunk begins and ends, along with its type.
- A part-of-speech tagger can be trained on these IOB tags to power a ChunkerI subclass.
- First, a tree is obtained using the chunked_sents() method of the corpus, which is then transformed into a format usable by a part-of-speech tagger.
- conll_tag_chunks() uses tree2conlltags() to convert a sentence Tree into a list of 3-tuples of the form (word, pos, iob).
- pos: part-of-speech tag
- iob: IOB tag, for example B-NP or I-NP, indicating that a word is at the beginning of, or inside, a noun phrase, respectively.
- conlltags2tree() is the reverse of tree2conlltags().
- The 3-tuples are then converted into the 2-tuples that the tagger can recognize.
- The RegexpParser class uses part-of-speech tags for chunk patterns, so part-of-speech tags are used as if they were words to tag.
- The conll_tag_chunks() function takes a list of 3-tuples (word, pos, iob) and returns a list of 2-tuples of the form (pos, iob).
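The 3-tuple to 2-tuple step described above can be sketched in plain Python (a minimal illustration of the behavior, not NLTK's implementation; the function name to_tag_chunks is hypothetical):

```python
# Each sentence is a list of (word, pos, iob) 3-tuples, as produced by
# tree2conlltags(); the tagger only needs the (pos, iob) pairs.
def to_tag_chunks(conll_sents):
    return [[(pos, iob) for (word, pos, iob) in sent] for sent in conll_sents]

sent = [('the', 'DT', 'B-NP'), ('book', 'NN', 'I-NP')]
print(to_tag_chunks([sent]))
# [[('DT', 'B-NP'), ('NN', 'I-NP')]]
```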
Code #1: Converting a Tree to IOB tags and back
Python3
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tree import Tree

# Convert each (word, pos, iob) 3-tuple into a (pos, iob) 2-tuple.
def conll_tag_chunks(chunk_sents):
    tagged_sents = [tree2conlltags(tree) for tree in chunk_sents]
    return [[(t, c) for (w, t, c) in sent] for sent in tagged_sents]

t = Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')])])
print("tree2conlltags : \n", tree2conlltags(t))

c = conlltags2tree([('the', 'DT', 'B-NP'), ('book', 'NN', 'I-NP')])
print("\nconlltags2tree : \n", c)

# Converting 3-tuples to 2-tuples.
print("\nconll_tag_chunks for tree : \n", conll_tag_chunks([t]))
Output :
tree2conlltags : 
 [('the', 'DT', 'B-NP'), ('book', 'NN', 'I-NP')]

conlltags2tree : 
 Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')])])

conll_tag_chunks for tree : 
 [[('DT', 'B-NP'), ('NN', 'I-NP')]]
Code #2: TagChunker class using the conll2000 corpus
Python3
from chunkers import TagChunker
from nltk.corpus import conll2000

# data
conll_train = conll2000.chunked_sents('train.txt')
conll_test = conll2000.chunked_sents('test.txt')

# initializing the chunker
chunker = TagChunker(conll_train)

# testing
score = chunker.evaluate(conll_test)
a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of TagChunker : ", a)
print("\nPrecision of TagChunker : ", p)
print("\nRecall of TagChunker : ", r)
Output :
Accuracy of TagChunker :  0.8950545623403762

Precision of TagChunker :  0.8114841974355675

Recall of TagChunker :  0.8644191676944863
Note: The scores on conll2000 are not as good as those on treebank_chunk, but conll2000 is a much larger corpus.
Code #3: TagChunker using the UnigramTagger class
Python3
# loading libraries
from chunkers import TagChunker
from nltk.tag import UnigramTagger

# train_chunks and test_chunks are the chunked training and testing
# sentences prepared earlier (e.g. from the treebank_chunk corpus)
uni_chunker = TagChunker(train_chunks, tagger_classes=[UnigramTagger])
score = uni_chunker.evaluate(test_chunks)
a = score.accuracy()
print("Accuracy of TagChunker : ", a)
Output :
Accuracy of TagChunker :  0.9674925924335466
The tagger_classes argument is passed directly to the backoff_tagger() function, which means they must be subclasses of SequentialBackoffTagger. In testing, the default of tagger_classes=[UnigramTagger, BigramTagger] generally produces the best results, but this can vary across corpora.
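The backoff idea behind tagger_classes can be illustrated with a small pure-Python sketch (the LookupTagger class and the tables below are hypothetical, not NLTK's API): each tagger labels the tags it knows and defers to its backoff tagger for the rest.

```python
# Minimal sketch of sequential backoff tagging: each tagger consults its
# own lookup table and falls back to the next tagger for unknown inputs.
class LookupTagger:
    def __init__(self, table, backoff=None):
        self.table = table      # maps a POS tag to an IOB tag
        self.backoff = backoff  # tagger to try when the POS tag is unknown

    def tag_one(self, pos):
        if pos in self.table:
            return self.table[pos]
        if self.backoff is not None:
            return self.backoff.tag_one(pos)
        return 'O'  # outside any chunk when no tagger knows the tag

    def tag(self, pos_seq):
        return [(pos, self.tag_one(pos)) for pos in pos_seq]

# The backoff tagger handles determiners; the primary tagger adds nouns.
default = LookupTagger({'DT': 'B-NP'})
chunk_tagger = LookupTagger({'NN': 'I-NP'}, backoff=default)

print(chunk_tagger.tag(['DT', 'NN', 'VB']))
# [('DT', 'B-NP'), ('NN', 'I-NP'), ('VB', 'O')]
```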