NLP | Splitting and Merging Chunks

26 July 2024


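SplitRule class : It splits a chunk in two at the point described by a split pattern. In a grammar it is written like <NN.*>}{<.*>, i.e. two curly braces facing away from each other with a tag pattern on either side. The chunk is split wherever a tag matching the left pattern is immediately followed by a tag matching the right pattern.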
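To see a split rule on its own, here is a minimal sketch. The four-word tagged sentence is made up for illustration and is not from the article; the classes and their constructor arguments are the same ones used in Code #2 below.

```python
from nltk.tree import Tree
from nltk.chunk.regexp import ChunkString, ChunkRule, SplitRule

# Hypothetical tagged input, chosen so one chunk holds two noun phrases
tagged = [('the', 'DT'), ('cat', 'NN'), ('a', 'DT'), ('dog', 'NN')]
cs = ChunkString(Tree('S', tagged))

# Put every tag into a single chunk first
ChunkRule('<.*>+', 'chunk everything').apply(cs)

# Split wherever a noun is immediately followed by a determiner
SplitRule('<NN.*>', '<DT>', 'split between noun and determiner').apply(cs)

split_tree = cs.to_chunkstruct('NP')
split_phrases = [
    [word for word, tag in subtree.leaves()]
    for subtree in split_tree.subtrees(lambda t: t.label() == 'NP')
]
print(split_phrases)
```

The single chunk becomes two chunks, one per determiner-noun pair.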

MergeRule class : It merges two adjacent chunks when the end of the first chunk matches the left pattern and the beginning of the second chunk matches the right pattern. In a grammar it is written like <NN.*>{}<.*>, i.e. two curly braces facing each other with a tag pattern on either side.
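A merge rule can be tried in isolation in the same way. This is a small sketch under the assumption that each noun has first been chunked separately; the three-word input is illustrative only.

```python
from nltk.tree import Tree
from nltk.chunk.regexp import ChunkString, ChunkRule, MergeRule

# Illustrative input: a determiner followed by two nouns
tagged = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN')]
cs = ChunkString(Tree('S', tagged))

# Chunk each noun on its own, leaving the determiner outside any chunk
ChunkRule('<NN.*>', 'chunk each noun').apply(cs)

# Merge a chunk that ends in a noun with a following chunk that starts with a noun
MergeRule('<NN.*>', '<NN.*>', 'merge adjacent nouns').apply(cs)

merged_tree = cs.to_chunkstruct('NP')
print(merged_tree)
```

The two single-noun chunks end up as one NP chunk, while the determiner stays unchunked because no rule covered it.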

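Example of how the steps are performed

Start with the flat sentence tree.

Chunk the complete sentence, from the first determiner to the last noun.

Split the chunk after every noun, producing multiple chunks.

Split any chunk that contains a determiner so that the determiner starts a new chunk.

Merge each chunk ending with a noun into the next chunk if that chunk begins with a noun.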

Code #1 – Constructing Tree

from nltk.chunk import RegexpParser

# Grammar: one chunk rule, two split rules, one merge rule
chunker = RegexpParser(r'''
NP:
{<DT><.*>*<NN.*>}
<NN.*>}{<.*>
<.*>}{<DT>
<NN.*>{}<NN.*>
''')

# Part-of-speech tagged sentence
sent = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN'), ('was', 'VBD'),
        ('filled', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('fish', 'NN')]

# Show the resulting tree
print(repr(chunker.parse(sent)))

Output:

Tree('S', [Tree('NP', [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN')]), 
Tree('NP', [('was', 'VBD'), ('filled', 'VBN'), ('with', 'IN')]), 
Tree('NP', [('the', 'DT'), ('fish', 'NN')])])
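Once the tree is built, the chunks can be read back as plain phrases. The following sketch repeats the grammar and sentence from Code #1 so it runs on its own; the variable names `tree` and `phrases` are just illustrative.

```python
from nltk.chunk import RegexpParser

chunker = RegexpParser(r'''
NP:
{<DT><.*>*<NN.*>}
<NN.*>}{<.*>
<.*>}{<DT>
<NN.*>{}<NN.*>
''')
sent = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN'), ('was', 'VBD'),
        ('filled', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('fish', 'NN')]
tree = chunker.parse(sent)

# Collect the words of every subtree labelled NP
phrases = [
    ' '.join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == 'NP')
]
print(phrases)  # ['the sushi roll', 'was filled with', 'the fish']
```

Note that "was filled with" is labelled NP only because the initial chunk rule swallowed the whole sentence and nothing later removed the verb group; the grammar here demonstrates splitting and merging rather than accurate noun-phrase detection.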

Code #2 – Splitting and Merging

# Load the chunk-string class, the rule classes, and Tree
from nltk.chunk.regexp import ChunkString, ChunkRule, MergeRule, SplitRule
from nltk.tree import Tree

# Tagged sentence from Code #1
sent = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN'), ('was', 'VBD'),
        ('filled', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('fish', 'NN')]

# Convert the flat sentence tree into a chunk string
chunk_string = ChunkString(Tree('S', sent))
print("Chunk String : ", chunk_string)

# Apply the chunk rule: everything from a determiner to the last noun
ur = ChunkRule('<DT><.*>*<NN.*>', 'chunk determiner to noun')
ur.apply(chunk_string)
print("\nApplied ChunkRule : ", chunk_string)

# Split after every noun that is followed by another tag
sr1 = SplitRule('<NN.*>', '<.*>', 'split after noun')
sr1.apply(chunk_string)
print("\nSplitting Chunk String : ", chunk_string)

# Split before every determiner
sr2 = SplitRule('<.*>', '<DT>', 'split before determiner')
sr2.apply(chunk_string)
print("\nFurther Splitting Chunk String : ", chunk_string)

# Merge adjacent chunks where a noun meets a noun
mr = MergeRule('<NN.*>', '<NN.*>', 'merge nouns')
mr.apply(chunk_string)
print("\nMerging Chunk String : ", chunk_string)

# Convert back to a tree (the default chunk label is 'CHUNK')
print()
print(repr(chunk_string.to_chunkstruct()))

Output:

Chunk String :   <DT>  <NN>  <NN>  <VBD>  <VBN>  <IN>  <DT>  <NN> 

Applied ChunkRule :  {<DT>  <NN>  <NN>  <VBD>  <VBN>  <IN>  <DT>  <NN>}

Splitting Chunk String :  {<DT>  <NN>}{<NN>}{<VBD>  <VBN>  <IN>  <DT>  <NN>}

Further Splitting Chunk String :  {<DT>  <NN>}{<NN>}{<VBD>  <VBN>  <IN>}{<DT>  <NN>}

Merging Chunk String :  {<DT>  <NN>  <NN>}{<VBD>  <VBN>  <IN>}{<DT>  <NN>}

Tree('S', [Tree('CHUNK', [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN')]), 
          Tree('CHUNK', [('was', 'VBD'), ('filled', 'VBN'), ('with', 'IN')]), 
          Tree('CHUNK', [('the', 'DT'), ('fish', 'NN')])])
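The rule objects do not have to be applied by hand. As a sketch, the same four rules can be passed to `RegexpChunkParser` from `nltk.chunk.regexp`, which applies them in order and labels the chunks with the given `chunk_label`. The comparison against the grammar-based parser from Code #1 is added here only as a sanity check.

```python
from nltk.tree import Tree
from nltk.chunk import RegexpParser
from nltk.chunk.regexp import RegexpChunkParser, ChunkRule, SplitRule, MergeRule

sent = [('the', 'DT'), ('sushi', 'NN'), ('roll', 'NN'), ('was', 'VBD'),
        ('filled', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('fish', 'NN')]

# Same rules as in Code #2, in the same order
rules = [
    ChunkRule('<DT><.*>*<NN.*>', 'chunk determiner to noun'),
    SplitRule('<NN.*>', '<.*>', 'split after noun'),
    SplitRule('<.*>', '<DT>', 'split before determiner'),
    MergeRule('<NN.*>', '<NN.*>', 'merge nouns'),
]
rule_parser = RegexpChunkParser(rules, chunk_label='NP')
rule_tree = rule_parser.parse(Tree('S', sent))

# Grammar-string version from Code #1, for comparison
grammar_parser = RegexpParser(r'''
NP:
{<DT><.*>*<NN.*>}
<NN.*>}{<.*>
<.*>}{<DT>
<NN.*>{}<NN.*>
''')
grammar_tree = grammar_parser.parse(sent)

print(rule_tree == grammar_tree)
```

Building the parser from rule objects gives NP labels instead of the default CHUNK label and keeps the rule descriptions available for tracing.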
