NLP | Chunking and chinking with RegEx

27 July 2024

0

Chunk extraction or partial parsing is a process of meaningful extracting short phrases from the sentence (tagged with Part-of-Speech).
Chunks are made up of words and the kinds of words are defined using the part-of-speech tags. One can even define a pattern or words that can’t be a part of chuck and such words are known as chinks. A ChunkRule class specifies what words or patterns to include and exclude in a chunk.

Defining Chunk patterns :
Chuck patterns are normal regular expressions which are modified and designed to match the part-of-speech tag designed to match sequences of part-of-speech tags. Angle brackets are used to specify an individual tag for example – to match a noun tag. One can define multiple tags in the same way.

Code #1 : Converting chunks to RegEx Pattern.

Python3

# Laading Library
from nltk.chunk.regexp import tag_pattern2re_pattern
 
# Chunk Pattern to RegEx Pattern 
print("Chunk Pattern : ", tag_pattern2re_pattern('<DT>?<NN.*>+'))

Output :

Chunk Pattern :  ()?(<(NN[^\{\}]*)>)+

Curly Braces are used to specify a chunk like {} and to specify the chink pattern one can just flip the braces }{. For a particular phrase type, these rules (chunk and a chink pattern) can be combined into grammar.

Code #2 : Parsing the sentence with RegExParser.

Note: To obtain a tree representation of parsed chunks and chinks, install third party `svgling` helper library.

Python3

from nltk.chunk import RegexpParser
 
# Introducing the Pattern
chunker = RegexpParser(r'''
NP:
{<DT><NN.*><.*>*<NN.*>} 
}<VB.*>{
''')
 
chunker.parse([('the', 'DT'), ('book', 'NN'), (
    'has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')])

Output :

Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), 
Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])

Tree representation of chunks and chinks

NLP | Chunking and chinking with RegEx

Python3

Python3

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

Interview With Willem Dewulf – CEO of ProBackup by Shauli Zacks

Recent Comments

EDITOR PICKS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR POSTS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR CATEGORY

ABOUT US

FOLLOW US