NLP | Flattening Deep Tree

22 July 2024

Some of the corpora that we use contain deep trees of nested phrases. Such trees cannot be used directly to train a chunker, because IOB tag parsing is not designed for nested chunks. So, in order to use these trees for chunker training, we must flatten them.
In these trees, the POS (part-of-speech) tags are part of the tree structure instead of being attached to the words. To get them back as (word, tag) pairs we use the Tree.pos() method, which is designed specifically for combining words with preterminal Tree labels such as part-of-speech tags.
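To make the role of Tree.pos() concrete, here is a minimal sketch on a small hand-built tree. The sentence and its labels are made up for illustration and are not taken from the treebank corpus.

```python
from nltk.tree import Tree

# Hand-built example tree (an assumption for illustration, not corpus data).
# Each word sits under a preterminal node whose label is its POS tag.
tree = Tree('S', [
    Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]),
    Tree('VP', [Tree('VB', ['meets'])]),
])

# leaves() returns only the words; the tags stay in the tree structure.
words = tree.leaves()

# pos() pairs every word with the label of its preterminal node.
tagged = tree.pos()

print(words)
print(tagged)
```

Here leaves() loses the tags, while pos() recovers them from the preterminal labels.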

Code #1 : Functions for flattening the deep tree

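from nltk.tree import Tree


def flatten_childtrees(trees):
    """Return a flat list of (word, tag) pairs and height-3 phrase trees."""
    children = []

    for t in trees:
        if t.height() < 3:
            # Preterminal such as (NNP Pierre): keep its (word, tag) pair.
            children.extend(t.pos())

        elif t.height() == 3:
            # Lowest-level phrase: rebuild it with (word, tag) leaves.
            children.append(Tree(t.label(), t.pos()))

        else:
            # Deeper phrase: recurse into its children.
            children.extend(flatten_childtrees(t))

    return children


def flatten_deeptree(tree):
    """Return a new Tree built from the flattened children of tree."""
    return Tree(tree.label(), flatten_childtrees(tree))
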

Code #2 : Evaluating flatten_deeptree()

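from nltk.corpus import treebank

# Assumes the functions from Code #1 are saved in a local module transforms.py
from transforms import flatten_deeptree

print("Deep Tree : \n", treebank.parsed_sents()[0])

# repr() shows the nested Tree(...) structure with its (word, tag) leaves
print("\nFlattened Tree : \n",
      repr(flatten_deeptree(treebank.parsed_sents()[0])))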

Output :

Deep Tree : 
 (S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))

Flattened Tree : 
Tree('S', [Tree('NP', [('Pierre', 'NNP'), ('Vinken', 'NNP')]), (',',
','), Tree('NP', [('61', 'CD'), ('years', 'NNS')]), ('old', 'JJ'),
(',', ','), ('will', 'MD'), ('join', 'VB'), Tree('NP', [('the',
'DT'), ('board', 'NN')]), ('as', 'IN'), Tree('NP', [('a', 'DT'),
('nonexecutive', 'JJ'), ('director', 'NN')]), Tree('NP-TMP', [('Nov.',
'NNP'), ('29', 'CD')]), ('.', '.')])

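The result is a much flatter Tree. For this sentence it contains only the lowest-level NP phrases (including NP-TMP). Words that are not part of such a phrase appear as separate (word, tag) pairs directly under the root.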
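Since the point of flattening is IOB tagging, the following sketch shows how a tree of this flat shape maps to IOB tags using NLTK's tree2conlltags(). The tree is written out by hand as a shortened assumption modelled on the output above, so the snippet does not need the treebank corpus.

```python
from nltk.chunk import tree2conlltags
from nltk.tree import Tree

# Hand-written flat tree (an illustrative assumption): one level of
# phrases, with (word, tag) tuples as leaves.
flat = Tree('S', [
    Tree('NP', [('the', 'DT'), ('board', 'NN')]),
    ('as', 'IN'),
    Tree('NP', [('a', 'DT'), ('director', 'NN')]),
])

# Each leaf becomes a (word, tag, iob) triple. Words inside a phrase get
# B-/I- prefixes on the phrase label; words outside any phrase get "O".
iob = tree2conlltags(flat)

for word, tag, chunk in iob:
    print(word, tag, chunk)
```

Every word receives exactly one chunk tag, which is why the input tree may only have one level of phrases.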

How it works?

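flatten_deeptree() : Returns a new Tree with the same label as the given tree, built by passing the given tree's children to flatten_childtrees().
flatten_childtrees() : Recursively drills down into the Tree until it finds child trees whose height() is equal to or less than 3. A child of height less than 3 contributes its (word, tag) pairs directly, a child of height exactly 3 is kept as a phrase whose leaves are (word, tag) pairs, and any taller child is flattened recursively.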
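The height checks can be traced on a small example. In the sketch below, branch_for() is a hypothetical helper written only for this illustration (it is not part of NLTK or of Code #1), and the tree is hand-built rather than read from the corpus. It reports which branch of flatten_childtrees() each child of the root would take.

```python
from nltk.tree import Tree


def branch_for(t):
    """Hypothetical helper: name the flatten_childtrees() branch taken for t."""
    h = t.height()
    if h < 3:
        return "extend with t.pos()"
    elif h == 3:
        return "append Tree(t.label(), t.pos())"
    return "recurse into children"


# Hand-built tree (an assumption for illustration).
deep = Tree('S', [
    Tree('NP-SBJ', [
        Tree('NP', [Tree('NNP', ['Pierre']), Tree('NNP', ['Vinken'])]),
        Tree(',', [',']),
    ]),
    Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]),
    Tree('.', ['.']),
])

# Iterating over a Tree yields its direct children.
decisions = [(child.label(), child.height(), branch_for(child))
             for child in deep]

for label, height, branch in decisions:
    print(f"{label:7} height={height} -> {branch}")
```

The NP-SBJ node has height 4, so it is handled by recursion. The plain NP has height 3 and is kept as a phrase, and the final period is a preterminal of height 2.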

Code #3 : height()

from nltk.tree import Tree

# A preterminal (tag over a single word) has height 2
print("Height :",
      Tree('NNP', ['Pierre']).height())

# A phrase made of preterminals has height 3
print("\nHeight :", Tree(
        'NP', [Tree('NNP', ['Pierre']),
               Tree('NNP', ['Vinken'])]).height())

Output :

Height : 2

Height : 3
