
NLP | Training Tagger Based Chunker | Set 1

Training a chunker is an alternative to manually specifying regular expression (regex) chunk patterns. Writing such patterns by hand is tedious, trial-and-error work, so existing corpus data can be used to train a chunker instead.

In the code below, the treebank_chunk corpus is used to produce chunked sentences in the form of trees.
-> To train the tagger-based chunker, the TagChunker class uses the chunked sentences returned by chunked_sents().
-> To extract a list of (pos, iob) tuples from a list of trees, the TagChunker class uses a helper function, conll_tag_chunks().

These tuples are then used to train a tagger, which learns IOB tags for part-of-speech tags, as illustrated by the short sketch below.
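
A minimal sketch of this (pos, iob) extraction on a single chunked sentence, assuming NLTK and the treebank_chunk corpus data are installed:

from nltk.corpus import treebank_chunk
from nltk.chunk.util import tree2conlltags

# take one chunked sentence (a Tree) and flatten it to
# (word, pos, iob) triples in the CoNLL IOB format
tree = treebank_chunk.chunked_sents()[0]
triples = tree2conlltags(tree)
print(triples[:3])
# roughly: [('Pierre', 'NNP', 'B-NP'), ('Vinken', 'NNP', 'I-NP'), (',', ',', 'O')]

# dropping the words leaves the (pos, iob) pairs the tagger is trained on
print([(t, c) for (w, t, c) in triples[:3]])
# roughly: [('NNP', 'B-NP'), ('NNP', 'I-NP'), (',', 'O')]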

Code #1 : Let’s understand the Chunker class for training.




from nltk.chunk import ChunkParserI
from nltk.chunk.util import tree2conlltags, conlltags2tree
from nltk.tag import UnigramTagger, BigramTagger

# backoff_tagger chains the tagger classes with backoff (helper module)
from tag_util import backoff_tagger


def conll_tag_chunks(chunk_data):
    # flatten each chunk tree into (word, pos, iob) triples,
    # then keep only the (pos, iob) pairs for tagger training
    tagged_data = [tree2conlltags(tree) for tree in chunk_data]
    return [[(t, c) for (w, t, c) in sent] for sent in tagged_data]


class TagChunker(ChunkParserI):

    def __init__(self, train_chunks,
                 tagger_classes=[UnigramTagger, BigramTagger]):
        # train a backoff chain of taggers on (pos, iob) sequences
        train_data = conll_tag_chunks(train_chunks)
        self.tagger = backoff_tagger(train_data, tagger_classes)

    def parse(self, tagged_sent):
        if not tagged_sent:
            return None

        # predict an IOB chunk tag for each POS tag, then rebuild
        # a chunk tree from the (word, pos, iob) triples
        (words, tags) = zip(*tagged_sent)
        chunks = self.tagger.tag(tags)
        wtc = zip(words, chunks)
        return conlltags2tree([(w, t, c) for (w, (t, c)) in wtc])


Output :

Training TagChunker
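
The backoff_tagger helper imported from tag_util is not shown in this article; it simply trains the given tagger classes in sequence, with each tagger backing off to the one trained before it. A minimal sketch of such a helper, assuming that behaviour (the exact tag_util implementation may differ):

def backoff_tagger(train_sents, tagger_classes, backoff=None):
    # train each tagger class in turn, using the previously
    # trained tagger as its backoff
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff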

 
Code #2 : Using the TagChunker.




# loading libraries
from chunkers import TagChunker   # the TagChunker class from Code #1
from nltk.corpus import treebank_chunk

# splitting the treebank_chunk corpus into training and testing data
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]

# initializing and training the chunker
chunker = TagChunker(train_data)
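
Once trained, the chunker's parse() method can be applied directly to any POS-tagged sentence. A small usage sketch (the sentence and its tags below are supplied by hand purely for illustration):

# parse() expects a list of (word, pos) tuples and returns a chunk Tree
sent = [('the', 'DT'), ('little', 'JJ'), ('dog', 'NN'),
        ('barked', 'VBD'), ('at', 'IN'), ('the', 'DT'), ('cat', 'NN')]
print(chunker.parse(sent))
# roughly: (S (NP the/DT little/JJ dog/NN) barked/VBD at/IN (NP the/DT cat/NN))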


 
Code #3 : Evaluating the TagChunker




# evaluating the chunker on the held-out test data
score = chunker.evaluate(test_data)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of TagChunker : ", a)
print("\nPrecision of TagChunker : ", p)
print("\nRecall of TagChunker : ", r)


Output :

Accuracy of TagChunker : 0.9732039335251428

Precision of TagChunker : 0.9166534370535006

Recall of TagChunker : 0.9465573770491803
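
The ChunkScore object returned by evaluate() also exposes a combined F-measure if a single summary number is preferred; a small, optional addition to Code #3:

# F-measure combines the precision and recall reported above
print("F-measure of TagChunker : ", score.f_measure())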
