NLP | Chunking using Corpus Reader

23 July 2024

What are Chunks?
Chunks are groups of words, and the kinds of words they contain are defined using part-of-speech tags. One can also define patterns of words that cannot be part of a chunk; such words are known as chinks. In NLTK, rule classes such as ChunkRule specify which words or patterns to include in, or exclude from, a chunk.
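To make the chunk/chink idea concrete, here is a minimal sketch using NLTK's RegexpParser. The grammar and the hand-tagged sentence are illustrative assumptions and are not taken from any corpus file: the first rule puts every token into one NP chunk, and the second rule chinks verbs and prepositions back out.

```python
from nltk.chunk import RegexpParser

# Hand-tagged tokens (illustrative data, not read from a corpus)
tagged = [
    ('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS'),
    ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'),
    ('300', 'CD'), ('jobs', 'NNS'),
]

# Braces {...} mark a chunk pattern, reversed braces }...{ mark a chink pattern
grammar = r"""
NP:
    {<.*>+}         # chunk rule, start by putting every token in one NP
    }<VB.*|IN>+{    # chink rule, then cut verbs and prepositions back out
"""

parser = RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every NP subtree that survived the chink rule
np_chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in tree.subtrees(lambda t: t.label() == 'NP')
]
print(np_chunks)
# [['Earlier', 'staff-reduction', 'moves'], ['300', 'jobs']]
```

The verbs and the preposition end up outside any chunk, which is exactly the role a chink plays.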
How it works :

The ChunkedCorpusReader class works similarly to the TaggedCorpusReader for getting tagged tokens, and it also provides three new methods for getting chunks: chunked_words(), chunked_sents(), and chunked_paras().
An instance of nltk.tree.Tree represents each chunk.
Noun phrase trees look like Tree('NP', [...]), whereas sentence-level trees look like Tree('S', [...]).
chunked_sents() returns a list of sentence trees, with each noun phrase as a subtree of the sentence.
chunked_words() returns a list of noun phrase trees alongside the tagged tokens of words that were not in a chunk.
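The tree shapes described above can be built by hand to see how they nest. This is only a sketch with made-up tokens; the corpus reader normally constructs these trees for you.

```python
from nltk.tree import Tree

# A noun phrase chunk: a Tree labelled 'NP' whose leaves are (word, tag) tuples
noun_phrase = Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')])

# A sentence tree: label 'S', with the NP as a subtree next to unchunked tokens
sentence = Tree('S', [noun_phrase, ('have', 'VBP'), ('trimmed', 'VBN')])

print(noun_phrase.label())     # NP
print(sentence.label())        # S
print(len(sentence))           # 3 children: one NP subtree and two tagged tokens
print(len(sentence.leaves()))  # 5 tagged tokens once the tree is flattened
```

Note that len() counts direct children, while leaves() flattens every subtree into a single list of tagged tokens.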

Diagram listing the major methods:

Code #1 : Creating a ChunkedCorpusReader for words

Python3

# Using ChunkedCorpusReader
from nltk.corpus.reader import ChunkedCorpusReader
 
# initializing
x = ChunkedCorpusReader('.', r'.*\.chunk')
 
words = x.chunked_words()
print("Words : \n", words)

Output :

Words : 
[Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ...]
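Code #1 assumes that a file matching .*\.chunk already exists in the current directory. As a self-contained sketch, the snippet below writes a hypothetical sample.chunk file to a temporary directory and reads it back. The file name and its contents are assumptions for illustration: square brackets mark noun phrase chunks and each token is written as word/TAG, which is the format the reader's default settings parse.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader

# Hypothetical sample data: [ ... ] marks an NP chunk, word/TAG marks a tagged token
sample = (
    "[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN "
    "about/IN [300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    # Write the sample file so the reader has something to load
    with open(os.path.join(root, "sample.chunk"), "w", encoding="utf8") as f:
        f.write(sample)

    reader = ChunkedCorpusReader(root, r".*\.chunk")
    # The reader returns a lazy view, so materialize it while the file still exists
    words = list(reader.chunked_words())

print(words[0].label())  # NP
print(len(words))        # 9 items: 3 NP trees and 6 unchunked tagged tokens
```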

Code #2 : For sentences

Python3

# x is the ChunkedCorpusReader created in Code #1
chunked_sent = x.chunked_sents()
print("Chunked Sentence : \n", chunked_sent)

Output :

Chunked Sentence : 
[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction', 'NN'), 
('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','),
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]
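Once a sentence tree is available, a common next step is to pull out just the noun phrases. The sketch below rebuilds the same sentence tree from a bracketed string with tagstr2tree, so that it runs without a corpus file, and then filters the subtrees by label. The variable names are illustrative.

```python
from nltk.chunk.util import tagstr2tree

# Rebuild the sentence tree from its bracketed word/TAG form
sentence = tagstr2tree(
    "[Earlier/JJR staff-reduction/NN moves/NNS] have/VBP trimmed/VBN "
    "about/IN [300/CD jobs/NNS] ,/, [the/DT spokesman/NN] said/VBD ./."
)

# Keep only the NP subtrees and join the words of each one
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in sentence.subtrees(filter=lambda t: t.label() == 'NP')
]
print(noun_phrases)
# ['Earlier staff-reduction moves', '300 jobs', 'the spokesman']
```

The same loop should work on each tree returned by chunked_sents(), since those are also Tree objects labelled 'S'.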

Code #3 : For paragraphs

Python3

# x is the ChunkedCorpusReader created in Code #1
para = x.chunked_paras()
print("para : \n", para)

Output :

[[Tree('S', [Tree('NP', [('Earlier', 'JJR'), ('staff-reduction',
'NN'), ('moves', 'NNS')]), ('have', 'VBP'), ('trimmed', 'VBN'),
('about', 'IN'), 
Tree('NP', [('300', 'CD'), ('jobs', 'NNS')]), (',', ','),
Tree('NP', [('the', 'DT'), ('spokesman', 'NN')]), ('said', 'VBD'), ('.', '.')])]]
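The extra level of nesting in the paragraph output is easier to see with more than one paragraph. The following sketch assumes the reader's default settings, where sentences are separated by newlines and paragraphs by blank lines. The file name and sentences are made up for illustration.

```python
import os
import tempfile

from nltk.corpus.reader import ChunkedCorpusReader

# Hypothetical data: two paragraphs separated by a blank line,
# the first with two sentences and the second with one
sample = (
    "[The/DT cat/NN] sat/VBD ./.\n"
    "[It/PRP] purred/VBD ./.\n"
    "\n"
    "[The/DT dog/NN] barked/VBD ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "paras.chunk"), "w", encoding="utf8") as f:
        f.write(sample)

    reader = ChunkedCorpusReader(root, r".*\.chunk")
    # Materialize the lazy view before the temporary directory is removed
    paras = list(reader.chunked_paras())

# Each paragraph is a list of sentence trees
sentences_per_para = [len(p) for p in paras]
print(sentences_per_para)    # [2, 1]
print(paras[1][0].label())   # S
```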
