NLP | Partial parsing with Regex

22 July 2024

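Partial parsing starts by defining a grammar that chunks three phrase types: noun phrases (NP), prepositional phrases (PP) and verb phrases (VP).
For noun phrases, a ChunkRule looks for an optional determiner followed by one or more nouns.
A MergeRule then merges two adjacent chunks when the first ends with an adjective and the second begins with a noun, which joins an adjective to the noun chunk that follows it.
For prepositional phrases, any word tagged IN is simply chunked on its own.
For verb phrases, an optional modal word (such as "should") followed by a verb is chunked.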
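To see how tag patterns of this kind select words, the following is a minimal sketch that uses only Python's re module. It is not NLTK's implementation: the PATTERNS table and the find_chunks helper are hypothetical names introduced for illustration, the regexes are simplified stand-ins for the three chunk patterns, and the merge rule is left out.

```python
import re

# Simplified regex equivalents of the three chunk patterns (an assumption
# for illustration; NLTK's own translation of tag patterns is more involved).
# Inside <...>, ".*" is written as [^>]* so it cannot run past the tag.
PATTERNS = {
    "NP": re.compile(r"(?:<DT>)?(?:<NN[^>]*>)+"),  # {<DT>?<NN.*>+}
    "PP": re.compile(r"<IN>"),                      # {<IN>}
    "VP": re.compile(r"(?:<MD>)?<VB[^>]*>"),        # {<MD>?<VB.*>}
}


def find_chunks(tagged):
    """Return (label, words) for every pattern match, in sentence order."""
    # Build the tag string and remember where each token's tag starts.
    offsets, parts, pos = [], [], 0
    for _, tag in tagged:
        offsets.append(pos)
        parts.append(f"<{tag}>")
        pos += len(tag) + 2  # two extra characters for "<" and ">"
    tag_string = "".join(parts)

    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(tag_string):
            # Map character positions back to token indices.
            start = offsets.index(match.start())
            end = sum(1 for offset in offsets if offset < match.end())
            found.append((start, label, [w for w, _ in tagged[start:end]]))
    found.sort(key=lambda item: item[0])
    return [(label, words) for _, label, words in found]


sentence = [("the", "DT"), ("dog", "NN"), ("should", "MD"), ("bark", "VB"),
            ("at", "IN"), ("the", "DT"), ("cat", "NN")]
for label, words in find_chunks(sentence):
    print(label, words)
```

For this hand-tagged sentence the sketch reports "the dog" and "the cat" as NP, "should bark" as VP and "at" as PP. The optional determiner and the optional modal are picked up only when they directly precede a noun or a verb.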

Code #1 :

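from nltk.chunk import RegexpParser
from nltk.corpus import conll2000   # needs nltk.download('conll2000')

# Grammar with one stage per phrase type
chunker = RegexpParser(r'''
    NP:
        # chunk optional determiner with nouns
        {<DT>?<NN.*>+}

        # merge adjective with noun chunk
        <JJ>{}<NN.*>

    PP:
        # chunk preposition
        {<IN>}

    VP:
        # chunk optional modal with verb
        {<MD>?<VB.*>}
''')

# Score the chunker against the gold-standard chunks of conll2000
score = chunker.evaluate(conll2000.chunked_sents())

print("Accuracy : ", score.accuracy())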

Output :

Accuracy : 0.6148573545757688

The treebank_chunk corpus is a special version of the treebank corpus that provides a chunked_sents() method. Due to its file format, the regular treebank corpus cannot provide that method.

Code #2 : Using treebank_chunk

from nltk.corpus import treebank_chunk

# Score the same chunker against the treebank_chunk gold standard
treebank_score = chunker.evaluate(
        treebank_chunk.chunked_sents())

print("Accuracy : ", treebank_score.accuracy())

Output :

Accuracy : 0.49033970276008493

Chunk Score Metrics
The ChunkScore object returned by evaluate() provides metrics other than accuracy.
Precision means how many of the chunks the chunker guessed were correct.
Recall means how well the chunker did at finding correct chunks compared to how many total chunks there were.
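The two definitions can be illustrated with plain Python sets. This is a minimal sketch on made-up data, not the ChunkScore implementation: each chunk is represented as a hypothetical (sentence index, start, end, label) tuple, and a guessed chunk counts as correct only if an identical tuple exists in the gold standard.

```python
# Toy data: chunks as (sentence_index, start_token, end_token, label).
gold = {
    (0, 0, 2, "NP"),
    (0, 2, 4, "VP"),
    (0, 4, 5, "PP"),
    (0, 5, 7, "NP"),
}
guessed = {
    (0, 0, 2, "NP"),
    (0, 2, 3, "VP"),   # wrong boundaries: splits the gold VP
    (0, 3, 4, "VP"),   # wrong boundaries: splits the gold VP
    (0, 4, 5, "PP"),
    (0, 5, 7, "NP"),
}

matched = gold & guessed      # guessed chunks that are also in the gold set
missed = gold - guessed       # gold chunks the chunker did not find
incorrect = guessed - gold    # guessed chunks that are not in the gold set

precision = len(matched) / len(guessed)   # share of guesses that are right
recall = len(matched) / len(gold)         # share of gold chunks recovered

print("Precision :", precision)
print("Recall :", recall)
print("Missed :", len(missed), " Incorrect :", len(incorrect))
```

With this toy data three of the five guesses match, so precision is 0.6, while three of the four gold chunks are found, so recall is 0.75.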

Code #3 : Chunk Score Metrics

print ("Precision : ", score.precision()) 
  
print ("\nRecall : ", score.recall()) 
  
print ("\nLength for missed one : ", len(score.missed())) 
  
print ("\nLength for incorrect one : ", len(score.incorrect())) 
  
print ("\nLength for correct one : ", len(score.correct())) 
  
print ("\nLength for guessed one : ", len(score.guessed()))

Output :

Precision : 0.60201948127375

Recall : 0.606072502505847

Length for missed one : 47161

Length for incorrect one : 47967

Length for correct one : 119720

Length for guessed one : 120526
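The four list lengths are consistent with the reported precision and recall. The check below is a short sketch that assumes the usual ChunkScore reading of these lists: correct() holds the gold-standard chunks, guessed() the chunks the chunker proposed, missed() the gold chunks it did not propose, and incorrect() the proposals that are not in the gold standard. The variable names are introduced here for illustration.

```python
# Counts taken from the output above.
guessed = 120526
gold = 119720        # printed as "Length for correct one"
missed = 47161
incorrect = 47967

# Both subtractions should give the number of exactly matching chunks.
matched_from_gold = gold - missed
matched_from_guessed = guessed - incorrect

precision = matched_from_guessed / guessed
recall = matched_from_gold / gold

print("Matched chunks :", matched_from_gold, matched_from_guessed)
print("Precision :", round(precision, 6))
print("Recall :", round(recall, 6))
```

Both subtractions give 72559 matching chunks, and the resulting ratios round to 0.602019 and 0.606073, in line with the precision and recall printed by Code #3.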
