NLP | Extracting Named Entities

27 July 2024


Named entity recognition is a specific kind of chunk extraction that uses entity tags along with chunk tags. Common entity tags include PERSON, LOCATION and ORGANIZATION. POS-tagged sentences are parsed into chunk trees as with normal chunking, but the tree labels can be entity tags in place of chunk phrase tags. NLTK already has a pre-trained named entity chunker, which is available through the ne_chunk() function in the nltk.chunk module. This function chunks a single tagged sentence into a Tree.

Code #1 : Using ne_chunk() on a tagged sentence of the treebank_chunk corpus

Python3

from nltk.corpus import treebank_chunk
from nltk.chunk import ne_chunk
 
ne_chunk(treebank_chunk.tagged_sents()[0])

Output :

Tree('S', [Tree('PERSON', [('Pierre', 'NNP')]), Tree('ORGANIZATION', 
[('Vinken', 'NNP')]), (',', ','), ('61', 'CD'), ('years', 'NNS'), 
('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'),
('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), 
('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')])

Two entity tags are found: PERSON and ORGANIZATION. Each of these subtrees contains a list of the words that are recognized as a PERSON or an ORGANIZATION.

Code #2 : Method to extract named entities using the leaves of all matching subtrees

Python3

def sub_leaves(tree, label):
    # Return the leaves of every subtree whose label equals `label`
    return [t.leaves() 
            for t in tree.subtrees(
                    lambda s: s.label() == label)]
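
As a quick check of the helper that needs no corpus or model download, the sketch below builds by hand a shortened version of the tree shown in the Code #1 output and runs sub_leaves() on it. The helper is repeated so the snippet runs on its own; the hand-built tree and the variable names are illustrative assumptions, not part of the original example.

```python
from nltk.tree import Tree


def sub_leaves(tree, label):
    # Return the leaves of every subtree whose label equals `label`
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Hand-built, shortened copy of the Code #1 output (assumption: the first
# four children are enough to illustrate the behaviour)
tree = Tree('S', [
    Tree('PERSON', [('Pierre', 'NNP')]),
    Tree('ORGANIZATION', [('Vinken', 'NNP')]),
    (',', ','),
    ('61', 'CD'),
])

person_leaves = sub_leaves(tree, 'PERSON')
org_leaves = sub_leaves(tree, 'ORGANIZATION')

print(person_leaves)  # [[('Pierre', 'NNP')]]
print(org_leaves)     # [[('Vinken', 'NNP')]]
```

Plain (word, tag) tuples are not Tree objects, so subtrees() skips them and only the labelled entity subtrees are compared against the requested label.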

Code #3 : Using the method to get all the PERSON or ORGANIZATION leaves from a tree

Python3

# sub_leaves() is the helper from Code #2, assumed here to be saved in a local chunkers.py
from chunkers import sub_leaves

tree = ne_chunk(treebank_chunk.tagged_sents()[0])

print("Named entities of PERSON : ", 
      sub_leaves(tree, 'PERSON'))

print("\nNamed entities of ORGANIZATION : ", 
      sub_leaves(tree, 'ORGANIZATION'))

Output :

Named entities of PERSON : [[('Pierre', 'NNP')]]

Named entities of ORGANIZATION : [[('Vinken', 'NNP')]]
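
The leaves come back as lists of (word, tag) tuples, which is often more detail than needed. A minimal sketch of turning them into plain entity strings follows; the entity_strings() name is hypothetical, and the sample inputs are copied from the outputs shown in this article.

```python
def entity_strings(leaves_per_entity):
    # Join the words of each entity and drop the POS tags
    return [" ".join(word for word, _tag in leaves)
            for leaves in leaves_per_entity]


# Sample inputs in the same shape that sub_leaves() returns
person = [[('Pierre', 'NNP')]]
multi_word = [[('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP')]]

print(entity_strings(person))      # ['Pierre']
print(entity_strings(multi_word))  # ['Consolidated Gold Fields']
```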

To process multiple sentences at a time, ne_chunk_sents() is used. In the code below, the first 10 sentences from treebank_chunk.tagged_sents() are processed, and sub_leaves() is applied to each resulting tree to get its ORGANIZATION leaves.

Code #4 : Let’s understand ne_chunk_sents()

Python3

from nltk.chunk import ne_chunk_sents
from nltk.corpus import treebank_chunk
 
trees = ne_chunk_sents(treebank_chunk.tagged_sents()[:10])
[sub_leaves(t, 'ORGANIZATION') for t in trees]

Output :

[[[('Vinken', 'NNP')]], [[('Elsevier', 'NNP')]], [[('Consolidated', 'NNP'), 
('Gold', 'NNP'), ('Fields', 'NNP')]], [], [], [[('Inc.', 'NNP')], 
[('Micronite', 'NN')]], [[('New', 'NNP'), ('England', 'NNP'),
('Journal', 'NNP')]], [[('Lorillard', 'NNP')]], [], []]
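
The result is one list per sentence, and sentences with no ORGANIZATION yield an empty list. One way to make this easier to scan is to pair each entity with the index of its sentence, as in the sketch below. The list is copied from the output above, and the index_entities() name is a hypothetical helper, not part of NLTK.

```python
def index_entities(per_sentence):
    # Build (sentence_index, entity_text) pairs, skipping empty sentences
    pairs = []
    for i, entities in enumerate(per_sentence):
        for leaves in entities:
            pairs.append((i, " ".join(word for word, _tag in leaves)))
    return pairs


# Per-sentence ORGANIZATION leaves, copied from the output above
results = [
    [[('Vinken', 'NNP')]],
    [[('Elsevier', 'NNP')]],
    [[('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP')]],
    [],
    [],
    [[('Inc.', 'NNP')], [('Micronite', 'NN')]],
    [[('New', 'NNP'), ('England', 'NNP'), ('Journal', 'NNP')]],
    [[('Lorillard', 'NNP')]],
    [],
    [],
]

pairs = index_entities(results)
for sentence_index, text in pairs:
    print(sentence_index, text)
```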
