Thursday, January 9, 2025

NLP | Chunk Tree to Text and Chaining Chunk Transformation

A chunk tree or subtree can be converted back into a sentence or chunk string by joining the words of its leaves. The code below demonstrates this on the first tree of the treebank_chunk corpus.

Code #1: Joining the words in a tree with space. 

Python3




# Load the chunked Penn Treebank sample corpus
from nltk.corpus import treebank_chunk

# First chunk tree in the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

print("\nTree leaves : \n", tree.leaves())

# Each leaf is a (word, tag) pair; keep only the words
print("\nSentence from tree : \n", ' '.join(
        [w for w, t in tree.leaves()]))


Output :

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), 
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Sentence from tree : 
 Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

In the output above, the punctuation is not right: the commas and the period are treated as separate words, so they are surrounded by spaces like any other token. The code below fixes this with a regular expression substitution that removes the space before each punctuation mark.

Code #2 : chunk_tree_to_sent() function to improve Code #1

Python3




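import re

# A whitespace character followed by a comma, period, semicolon or question mark
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # Join the words of the leaves, then drop the space before punctuation
    s = concat.join([w for w, t in tree.leaves()])
    return punct_re.sub(r'\g<1>', s)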


Code #3 : Evaluating chunk_tree_to_sent()

Python3




# Load the corpus and the function from Code #2
# (assumed here to be saved in a local module named transforms.py)
from nltk.corpus import treebank_chunk
from transforms import chunk_tree_to_sent

# First chunk tree in the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

print("\nTree leaves : \n", tree.leaves())

print("Tree to sentence : ", chunk_tree_to_sent(tree))


Output :

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), 
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Tree to sentence :  Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
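
The punctuation clean-up can also be tried without loading the corpus. The following is a minimal sketch that applies the same kind of substitution to a hand-written list of (word, tag) pairs, which is the shape that tree.leaves() returns. The helper name leaves_to_sent and the sample data are illustrative assumptions, not part of NLTK or of the article's transforms module.

```python
import re

# Pattern like the one used in chunk_tree_to_sent(): a whitespace character
# followed by a comma, period, semicolon or question mark.
punct_re = re.compile(r'\s([,\.;\?])')

def leaves_to_sent(leaves, concat=' '):
    """Hypothetical helper: join the words of (word, tag) pairs and
    remove the whitespace in front of punctuation."""
    s = concat.join(w for w, t in leaves)
    return punct_re.sub(r'\g<1>', s)

# Hand-written sample leaves (illustrative data only)
leaves = [('Is', 'VBZ'), ('it', 'PRP'), ('ready', 'JJ'), ('?', '.')]

print(leaves_to_sent(leaves))              # Is it ready?
print(leaves_to_sent(leaves, concat='_'))  # Is_it_ready_?
```

Note that the pattern only matches whitespace in front of the punctuation mark, so with a non-whitespace separator such as '_' the punctuation is left untouched. Quotes and brackets are not covered by this pattern either.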

Chaining Chunk Transformation 
Transformation functions can be chained together to normalize chunks. The resulting chunks are often shorter than the originals while still carrying the same meaning.

In the code below, a single chunk and an optional list of transform functions are passed to transform_chunk(). The function calls each transform on the chunk in turn, passing the result of one to the next, and returns the final chunk. If trace is set, it also prints the chunk after each step.

Code #4 : 

Python3




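# The four default transforms are assumed to be defined earlier
# in the same module (transforms.py)
def transform_chunk(
        chunk, chain=[filter_insignificant,
                      swap_verb_phrase, swap_infinitive_phrase,
                      singularize_plural_noun], trace=0):
    # Apply each transform in order, feeding its result to the next one
    for f in chain:
        chunk = f(chunk)

        # Optionally show the chunk after each step
        if trace:
            print(f.__name__, ':', chunk)

    return chunk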


Code #5 : Evaluating transform_chunk()

Python3




from transforms import transform_chunk

# A chunk is a list of (word, tag) pairs
chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk : \n", chunk)

print("\nTransformed Chunk : \n", transform_chunk(chunk))


Output :

Chunk :  
[('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), 
('is', 'VBZ'), ('delicious', 'JJ')]

Transformed Chunk : 
[('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
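
The chaining pattern itself does not depend on the particular transforms. The following self-contained sketch shows the same loop with two toy transforms. The names drop_determiners, lowercase_words and apply_chain are hypothetical stand-ins invented for this illustration; they are not the functions from the transforms module used above and do not reproduce their behavior.

```python
def drop_determiners(chunk):
    """Toy transform (hypothetical): remove words tagged DT."""
    return [(w, t) for w, t in chunk if t != 'DT']

def lowercase_words(chunk):
    """Toy transform (hypothetical): lowercase every word, keep the tags."""
    return [(w.lower(), t) for w, t in chunk]

def apply_chain(chunk, chain=(drop_determiners, lowercase_words), trace=False):
    """Call each transform in order, feeding its result to the next one."""
    for f in chain:
        chunk = f(chunk)
        if trace:
            # Show the intermediate chunk after each step
            print(f.__name__, ':', chunk)
    return chunk

# Illustrative data only
chunk = [('The', 'DT'), ('Board', 'NN'), ('of', 'IN'), ('Directors', 'NNS')]
result = apply_chain(chunk)
print(result)  # [('board', 'NN'), ('of', 'IN'), ('directors', 'NNS')]
```

Each transform takes a chunk and returns a new chunk, so the order of the chain matters and any step can be added or dropped independently. The sketch uses a tuple for the default chain, which avoids sharing a mutable default list between calls.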

Dominic Rubhabha-Wardslaus