A tree or subtree can be converted back into a sentence or chunk string. The code below shows how, using the first tree of the treebank_chunk corpus.
Code #1: Joining the words in a tree with spaces.
Python3
# Loading library
from nltk.corpus import treebank_chunk

# First chunked sentence of the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)
print("\nTree leaves : \n", tree.leaves())
print("\nSentence from tree : \n", ' '.join([w for w, t in tree.leaves()]))
Output :
Tree :
(S (NP Pierre/NNP Vinken/NNP) ,/, (NP 61/CD years/NNS) old/JJ ,/, will/MD join/VB (NP the/DT board/NN) as/IN (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD) ./.)

Tree leaves :
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Sentence from tree :
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
As the output above shows, the punctuation is not right: the period and commas are treated as separate words, so they get surrounding spaces like every other word. The code below fixes this with a regular expression substitution.
Code #2 : chunk_tree_to_sent() function to improve Code 1
Python3
import re

# Matches whitespace followed by one punctuation mark (, . ; ?)
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # Join the words of the leaves, then remove the space before punctuation
    s = concat.join([w for w, t in tree.leaves()])
    return re.sub(punct_re, r'\g<1>', s)
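To see the substitution in isolation, here is a minimal sketch that applies a pattern of the same kind to a plain list of tokens, so it runs without NLTK or the corpus. The helper name `join_tokens` and the shortened token list are assumptions made for illustration; they are not part of the original module.

```python
import re

# Whitespace followed by one punctuation mark: , . ; ?
punct_re = re.compile(r'\s([,\.;\?])')

def join_tokens(tokens, concat=' '):
    """Join tokens with `concat`, then drop the whitespace before punctuation."""
    s = concat.join(tokens)
    # \g<1> re-inserts only the captured punctuation mark
    return punct_re.sub(r'\g<1>', s)

# Hypothetical token list, shortened from the example sentence
tokens = ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',',
          'will', 'join', 'the', 'board', '.']

print(' '.join(tokens))     # plain join keeps the space before punctuation
print(join_tokens(tokens))  # substitution removes it
```

The capture group keeps the punctuation mark and discards only the whitespace in front of it, which is why the replacement string is just `\g<1>`.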
Code #3 : Evaluating chunk_tree_to_sent()
Python3
# Loading library
from nltk.corpus import treebank_chunk
# chunk_tree_to_sent() from Code #2, imported from the local transforms module
from transforms import chunk_tree_to_sent

# First chunked sentence of the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)
print("\nTree leaves : \n", tree.leaves())
print("Tree to sentence : ", chunk_tree_to_sent(tree))
Output :
Tree :
(S (NP Pierre/NNP Vinken/NNP) ,/, (NP 61/CD years/NNS) old/JJ ,/, will/MD join/VB (NP the/DT board/NN) as/IN (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD) ./.)

Tree leaves :
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Tree to sentence : Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
Chaining Chunk Transformation
Transformation functions can be chained together to normalize chunks. The resulting chunks are often shorter while still holding the same meaning.
In the code below, a single chunk and an optional list of transform functions are passed to the function. It calls each transform function on the chunk in turn and returns the final chunk.
Code #4 :
Python3
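# The four default transforms must already be defined in this module
def transform_chunk(chunk,
                    chain=[filter_insignificant, swap_verb_phrase,
                           swap_infinitive_phrase, singularize_plural_noun],
                    trace=0):
    # Apply each transform in order, feeding its result to the next one
    for f in chain:
        chunk = f(chunk)

        # Optionally print the chunk after each step
        if trace:
            print(f.__name__, ':', chunk)

    return chunk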
Code #5 : Evaluating transform_chunk
Python3
from transforms import transform_chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk : \n", chunk)
print("\nTransformed Chunk : \n", transform_chunk(chunk))
Output :
Chunk :
[('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

Transformed Chunk :
[('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
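The chaining pattern itself can be tried without the transforms module. The sketch below uses two deliberately simple stand-in transforms, `drop_determiners` and `lowercase_words`, and a helper named `apply_chain`. All three names are hypothetical and exist only for this illustration; they are not the transform functions used above and do not reproduce their behavior.

```python
def drop_determiners(chunk):
    """Hypothetical stand-in transform: remove words tagged DT."""
    return [(w, t) for w, t in chunk if t != 'DT']

def lowercase_words(chunk):
    """Hypothetical stand-in transform: lowercase every word."""
    return [(w.lower(), t) for w, t in chunk]

def apply_chain(chunk, chain=(drop_determiners, lowercase_words), trace=False):
    """Call each transform in order, feeding its output to the next one."""
    for f in chain:
        chunk = f(chunk)
        if trace:
            # Show the intermediate chunk after each step
            print(f.__name__, ':', chunk)
    return chunk

chunk = [('The', 'DT'), ('Book', 'NN'), ('of', 'IN'), ('Recipes', 'NNS')]
result = apply_chain(chunk, trace=True)
print(result)
```

With `trace` enabled, each step prints the function name and the chunk it produced, which helps when a chain gives an unexpected result. The default chain here is a tuple rather than a list, one way to avoid sharing a mutable default argument between calls.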