NLP | Classifier-based Chunking | Set 1

26 July 2024

Unlike most part-of-speech taggers, the ClassifierBasedTagger class learns from features. A ClassifierChunker class can be built on top of it so that it learns from both the words and the part-of-speech tags, rather than from the part-of-speech tags alone as the TagChunker class does.

The (word, pos, iob) 3-tuples returned by tree2conlltags() are converted into ((word, pos), iob) 2-tuples by chunk_trees2train_chunks(). This keeps the training data compatible with the 2-tuple format that the ClassifierBasedTagger class requires for training: the (word, pos) pair takes the place of the token and the IOB tag takes the place of the label.

Code #1 : Converting chunk trees into training data

# Loading Libraries 
from nltk.chunk import ChunkParserI 
from nltk.chunk.util import tree2conlltags, conlltags2tree 
from nltk.tag import ClassifierBasedTagger 
  
def chunk_trees2train_chunks(chunk_sents): 
  
    # Using tree2conlltags 
    tag_sents = [tree2conlltags(sent) for 
                 sent in chunk_sents] 
  
    # 3-tuple is converted to 2-tuple
    return [[((w, t), c) for 
             (w, t, c) in sent] for sent in tag_sents] 
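
To make the reshaping concrete, here is a minimal sketch that applies the same list comprehension to a hand-written sentence, so it runs without NLTK. The sample tokens and the helper name to_train_chunks() are made up for illustration; the 3-tuples only mimic the shape that tree2conlltags() returns.

```python
# Hand-written (word, pos, iob) triples, shaped like tree2conlltags() output.
# The sentence itself is an illustrative assumption, not corpus data.
tag_sents = [[
    ('the', 'DT', 'B-NP'),
    ('cat', 'NN', 'I-NP'),
    ('sat', 'VBD', 'O'),
]]

def to_train_chunks(tag_sents):
    # Same comprehension as in chunk_trees2train_chunks(), minus the Tree step
    return [[((w, t), c) for (w, t, c) in sent] for sent in tag_sents]

train_chunks = to_train_chunks(tag_sents)
print(train_chunks[0])
```

Each training item now pairs a (word, pos) token with its IOB tag, which is the (token, label) layout a tagger is trained on.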

Now, a feature detector function is needed to pass into ClassifierBasedTagger. Any feature detector function used with the ClassifierChunker class (defined next) should recognize that tokens are a list of (word, pos) tuples, and have the same function signature as prev_next_pos_iob(). To give the classifier as much information as we can, this feature set contains the current, previous, and next word and part-of-speech tag, along with the previous IOB tag.

Code #2 : detector function

def prev_next_pos_iob(tokens, index, history): 
      
    word, pos = tokens[index] 
    if index == 0: 
        prevword, prevpos, previob = ('<START>', )*3
    else: 
        prevword, prevpos = tokens[index-1] 
        previob = history[index-1] 
          
    if index == len(tokens) - 1: 
        nextword, nextpos = ('<END>', )*2
    else: 
        nextword, nextpos = tokens[index + 1] 
    # Build the feature dict for every token, including the last one 
    feats = {'word': word, 
             'pos': pos, 
             'nextword': nextword, 
             'nextpos': nextpos, 
             'prevword': prevword, 
             'prevpos': prevpos, 
             'previob': previob 
             } 
    return feats 
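
As a rough illustration of what the classifier sees, the sketch below calls the detector on the first and last token of a short sentence. The detector is repeated inside the snippet so that it runs on its own, and the sample tokens and history list are assumptions made up for the example.

```python
def prev_next_pos_iob(tokens, index, history):
    # Current token
    word, pos = tokens[index]
    # Previous token and previous IOB tag, or placeholders at sentence start
    if index == 0:
        prevword, prevpos, previob = ('<START>',) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]
    # Next token, or placeholders at sentence end
    if index == len(tokens) - 1:
        nextword, nextpos = ('<END>',) * 2
    else:
        nextword, nextpos = tokens[index + 1]
    return {'word': word, 'pos': pos,
            'nextword': nextword, 'nextpos': nextpos,
            'prevword': prevword, 'prevpos': prevpos,
            'previob': previob}

# Illustrative input: (word, pos) tokens plus the IOB tags assigned so far
tokens = [('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]
history = ['B-NP', 'I-NP']

first = prev_next_pos_iob(tokens, 0, [])
last = prev_next_pos_iob(tokens, 2, history)
print(first)
print(last)
```

The first token gets '<START>' for all three "previous" features, and the last token gets '<END>' for both "next" features, so the detector never indexes outside the sentence.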

Now, a ClassifierChunker class is needed. It uses an internal ClassifierBasedTagger, trained on sentences from chunk_trees2train_chunks() with features extracted by prev_next_pos_iob(). As a subclass of ChunkParserI, ClassifierChunker implements the parse() method, which converts the ((w, t), c) tuples produced by the internal tagger into Trees using conlltags2tree().

Code #3 :

class ClassifierChunker(ChunkParserI): 
    def __init__(self, train_sents,  
                 feature_detector = prev_next_pos_iob, **kwargs): 
          
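        if not feature_detector: 
            # Fall back to a detector defined on the instance (e.g. by a subclass) 
            feature_detector = self.feature_detector 
          
        # Train the internal tagger on ((word, pos), iob) pairs 
        train_chunks = chunk_trees2train_chunks(train_sents) 
        self.tagger = ClassifierBasedTagger(train = train_chunks, 
            feature_detector = feature_detector, **kwargs) 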
              
    def parse(self, tagged_sent): 
          
        if not tagged_sent: return None
        chunks = self.tagger.tag(tagged_sent) 
          
        return conlltags2tree( 
                [(w, t, c) for ((w, t), c) in chunks]) 
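
The reshaping that parse() performs before calling conlltags2tree() can be followed without training a model. The sketch below uses hand-written tagger output and a hypothetical helper, group_iob_chunks(), as a simplified stand-in for conlltags2tree(): it returns plain Python lists rather than an nltk Tree and is not NLTK's implementation.

```python
def group_iob_chunks(conlltags):
    """Group (word, pos, iob) triples into (label, tokens) chunks.

    Hypothetical, simplified stand-in for conlltags2tree(): tokens tagged
    'O' stay as bare (word, pos) pairs, all other tokens are grouped.
    """
    result = []
    current = None        # token list of the currently open chunk, if any
    current_label = None
    for word, pos, iob in conlltags:
        if iob == 'O':
            # Outside any chunk: close the open chunk and keep the token bare
            current = None
            result.append((word, pos))
            continue
        prefix, label = iob.split('-', 1)
        # 'B-' starts a chunk; a stray or mismatched 'I-' is treated the same way
        if prefix == 'B' or current is None or label != current_label:
            current = []
            current_label = label
            result.append((label, current))
        current.append((word, pos))
    return result

# Hand-written output in the ((word, pos), iob) shape produced by the
# internal tagger; the tags themselves are an assumption.
chunks = [(('the', 'DT'), 'B-NP'), (('cat', 'NN'), 'I-NP'),
          (('sat', 'VBD'), 'O'),
          (('the', 'DT'), 'B-NP'), (('mat', 'NN'), 'I-NP')]

# Same flattening step as in parse()
conlltags = [(w, t, c) for ((w, t), c) in chunks]
grouped = group_iob_chunks(conlltags)
print(grouped)
```

Flattening back to (word, pos, iob) triples is what lets the tagger output be handed to conlltags2tree(), which reads the B-, I- and O tags to decide where each chunk starts and ends.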
