
NLP | Classifier-based Chunking | Set 2

Using the data from the treebank_chunk corpus, let us evaluate the chunkers prepared in the previous article. Code #1 : 

Python3




# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
 
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]
 
# initializing
chunker = ClassifierChunker(train_data)
 
# evaluation
score = chunker.evaluate(test_data)
 
a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)


Output :

Accuracy of ClassifierChunker : 0.9721733155838022

Precision of ClassifierChunker : 0.9258838793383068

Recall of ClassifierChunker : 0.9359016393442623
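
Beyond evaluation, the trained chunker can also be applied to new sentences. The sketch below is an illustrative addition rather than part of the original listing: since evaluate() in nltk's ChunkParserI interface works by calling parse() on tagged sentences, the chunker from Code #1 also exposes a parse() method that takes a POS-tagged sentence and returns a chunk Tree.

Python3

# Illustrative usage sketch: chunking a new POS-tagged sentence with the
# chunker trained in Code #1 (parse() takes a list of (word, pos) pairs).
tagged_sentence = [('The', 'DT'), ('little', 'JJ'), ('dog', 'NN'),
                   ('barked', 'VBD'), ('at', 'IN'), ('the', 'DT'),
                   ('cat', 'NN')]
print(chunker.parse(tagged_sentence))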

Code #2 : Let’s compare performance when training on conll_train and evaluating on conll_test. 

Python3




# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import conll2000

# conll_train and conll_test from the conll2000 corpus (as in the previous article)
conll_train = conll2000.chunked_sents('train.txt')
conll_test = conll2000.chunked_sents('test.txt')

chunker = ClassifierChunker(conll_train)
score = chunker.evaluate(conll_test)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)


Output :

Accuracy of ClassifierChunker : 0.9264622074002153

Precision of ClassifierChunker : 0.8737924310910219

Recall of ClassifierChunker : 0.9007354620620346

Each word can be passed through the tagger into our feature detector function by creating nested 2-tuples of the form ((word, pos), iob); the chunk_trees2train_chunks() function produces these nested 2-tuples. The following features are extracted (a rough sketch of such a detector follows the list):

  • The current word and part-of-speech tag
  • The previous word, part-of-speech tag, and IOB tag
  • The next word and part-of-speech tag
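
The sketch below shows roughly what such a feature detector looks like; it is an illustrative approximation, and the actual prev_next_pos_iob() in the chunkers module may differ in detail. The detector receives the (word, pos) tokens of the sentence, the index of the current token, and the history of IOB tags predicted so far, and returns a feature dictionary.

Python3

# Rough sketch of a prev_next_pos_iob-style feature detector
# (the exact implementation in the chunkers module may differ).
def prev_next_pos_iob(tokens, index, history):
    word, pos = tokens[index]

    # previous word, POS tag and IOB tag (or start markers at the sentence start)
    if index == 0:
        prevword, prevpos, previob = ('<START>',) * 3
    else:
        prevword, prevpos = tokens[index - 1]
        previob = history[index - 1]

    # next word and POS tag (or end markers at the sentence end)
    if index == len(tokens) - 1:
        nextword, nextpos = ('<END>',) * 2
    else:
        nextword, nextpos = tokens[index + 1]

    return {
        'word': word, 'pos': pos,
        'prevword': prevword, 'prevpos': prevpos, 'previob': previob,
        'nextword': nextword, 'nextpos': nextpos,
    }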

The ClassifierChunker class uses an internal ClassifierBasedTagger with prev_next_pos_iob() as its default feature_detector. The results from the tagger, which are in the same nested 2-tuple form, are then reformatted into (word, pos, iob) 3-tuples and passed to conlltags2tree() to return the final Tree.

Code #3 : Using a different classifier builder 

Python3




# loading libraries
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
from nltk.classify import MaxentClassifier
 
train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]
 
 
builder = lambda toks: MaxentClassifier.train(
    toks, trace=0, max_iter=10, min_lldelta=0.01)

chunker = ClassifierChunker(train_data, classifier_builder=builder)

score = chunker.evaluate(test_data)

a = score.accuracy()
p = score.precision()
r = score.recall()

print("Accuracy of ClassifierChunker : ", a)
print("\nPrecision of ClassifierChunker : ", p)
print("\nRecall of ClassifierChunker : ", r)


Output :

Accuracy of ClassifierChunker : 0.9743204362949285

Precision of ClassifierChunker : 0.9334423548650859

Recall of ClassifierChunker : 0.9357377049180328

The ClassifierBasedTagger class defaults to using NaiveBayesClassifier.train as its classifier_builder, but any classifier can be used by overriding the classifier_builder keyword argument.
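
For example, a decision tree classifier can be swapped in the same way. The following is a minimal sketch (an illustrative addition, not part of the original listings) that reuses the treebank_chunk split from Code #1 with nltk's DecisionTreeClassifier; the cutoff values are arbitrary choices, not tuned settings.

Python3

# Illustrative sketch: overriding classifier_builder with a decision tree
from chunkers import ClassifierChunker
from nltk.corpus import treebank_chunk
from nltk.classify import DecisionTreeClassifier

train_data = treebank_chunk.chunked_sents()[:3000]
test_data = treebank_chunk.chunked_sents()[3000:]

# cutoff values are arbitrary, chosen only for illustration
builder = lambda toks: DecisionTreeClassifier.train(
    toks, entropy_cutoff=0.05, depth_cutoff=50)

chunker = ClassifierChunker(train_data, classifier_builder=builder)
print(chunker.evaluate(test_data).accuracy())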
