What is part-of-speech (POS) tagging?
It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
Example of a part-of-speech (POS) tagged corpus
The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.
The format for a tagged corpus is word/tag: each word is followed by a slash and a tag denoting its part of speech. For example, nn refers to a noun and vb to a verb.
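To make the word/tag convention concrete, here is a minimal pure-Python sketch that splits the example line into (word, tag) tuples. The helper name `split_tagged_token` is hypothetical and not part of NLTK. Splitting on the last slash and uppercasing the tag are assumptions chosen to match the uppercase tags in the outputs shown in this article, not a copy of the library's code.

```python
def split_tagged_token(token, sep="/"):
    """Split 'word/tag' into (word, TAG). Hypothetical helper, not an NLTK function."""
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator present: treat the whole token as an untagged word.
        return (token, None)
    return (word, tag.upper())


line = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."
tagged = [split_tagged_token(tok) for tok in line.split()]
print(tagged[:3])
```

Splitting on the last slash rather than the first keeps a token such as `./.` intact as the pair `('.', '.')`.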
Code #1 : Creating a TaggedCorpusReader for words
Python3
# Using TaggedCorpusReader
from nltk.corpus.reader import TaggedCorpusReader

# initializing: read every file ending in .pos from the current directory
x = TaggedCorpusReader('.', r'.*\.pos')

# plain words, without tags
words = x.words()
print("Words : \n", words)

# (word, tag) tuples
tag_words = x.tagged_words()
print("\ntag_words : \n", tag_words)
Output :
Words :
['The', 'expense', 'and', 'time', 'involved', 'are', ...]

tag_words :
[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ...]
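Code #1 assumes that the current directory already contains at least one file whose name matches the pattern `.*\.pos`. The sketch below uses only the standard library to write the example sentence to such a file in a temporary directory and to check which names the pattern would select. The file names `brown.pos` and `notes.txt` are made up for illustration, and `re.fullmatch` is an assumption about how the pattern is applied, not a copy of NLTK's matching logic.

```python
import re
import tempfile
from pathlib import Path

SAMPLE = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.\n"
PATTERN = r".*\.pos"  # same pattern as the one passed to TaggedCorpusReader

with tempfile.TemporaryDirectory() as root:
    # Hypothetical file names, chosen only for illustration.
    Path(root, "brown.pos").write_text(SAMPLE, encoding="utf-8")
    Path(root, "notes.txt").write_text("not a corpus file\n", encoding="utf-8")

    # Keep only the names that the pattern matches in full.
    matched = sorted(
        p.name for p in Path(root).iterdir() if re.fullmatch(PATTERN, p.name)
    )
    print(matched)
```

Passing a directory prepared this way, instead of '.', as the first argument to TaggedCorpusReader should give the reader a file to load.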
Code #2 : For sentences
Python3
# each sentence is a list of (word, tag) tuples
tagged_sent = x.tagged_sents()
print("tagged_sent : \n", tagged_sent)
Output :
tagged_sent :
[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'), ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]
Code #3 : For paragraphs
Python3
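# each paragraph is a list of sentences (the method is paras(), not para())
para = x.paras()
print("para : \n", para)

# the same nesting, with (word, tag) tuples in place of words
tagged_para = x.tagged_paras()
print("\ntagged_paras : \n", tagged_para)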
Output :
para :
[[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]]

tagged_paras :
[[[('The', 'AT-TL'), ('expense', 'NN'), ('and', 'CC'), ('time', 'NN'), ('involved', 'VBN'), ('are', 'BER'), ('astronomical', 'JJ'), ('.', '.')]]]
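The word, sentence and paragraph views differ only in nesting depth: words form a flat list, sentences are lists of words, and paragraphs are lists of sentences. The following pure-Python sketch mimics that nesting without NLTK, reusing the example sentence three times. It assumes that paragraphs are separated by blank lines, sentences by newlines, and tokens by whitespace. That is believed to match the reader's defaults but should be checked against the NLTK documentation. The helper `tag_token` is hypothetical.

```python
SENT = "The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./."
# Two paragraphs: the first holds two sentences, the second holds one.
text = SENT + "\n" + SENT + "\n\n" + SENT + "\n"


def tag_token(token):
    # Hypothetical helper; assumes every token contains a slash.
    word, _, tag = token.rpartition("/")
    return (word, tag.upper())


# Paragraphs -> sentences -> (word, tag) tuples.
tagged_paras = [
    [[tag_token(tok) for tok in sent.split()]
     for sent in para.splitlines() if sent.strip()]
    for para in text.split("\n\n")
    if para.strip()
]

# Flatten one level to get sentences, two levels to get words.
tagged_sents = [sent for para in tagged_paras for sent in para]
tagged_words = [pair for sent in tagged_sents for pair in sent]

print(len(tagged_paras), len(tagged_sents), len(tagged_words))
```

Flattening from the paragraph view shows why a single-sentence corpus yields three opening brackets for paragraphs, two for sentences, and one for words.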