What is part-of-speech (POS) tagging? It is the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag here is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.

Default tagging is a basic step in part-of-speech tagging. It is performed using the DefaultTagger class, which takes a tag as its single argument and assigns that tag to every token. NN is the tag for a singular noun. DefaultTagger is most useful when it is given the most common part-of-speech tag, which is why a noun tag is recommended.

Code #1 : How it works
Python3
# Loading libraries
from nltk.tag import DefaultTagger

# Defining the default tag
tagging = DefaultTagger('NN')

# Tagging a list of tokens
tagging.tag(['Hello', 'Geeks'])
Output :
[('Hello', 'NN'), ('Geeks', 'NN')]
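The recommendation to use the most common tag can be checked with a simple count. The sketch below uses a small hand-labelled sample that is purely hypothetical (it is not taken from any corpus) and plain Python rather than NLTK. It finds the most frequent tag and the share of tokens that a tagger assigning only that tag would get right.

```python
from collections import Counter

# Hypothetical hand-labelled sample of (word, tag) pairs; not real corpus data.
gold = [
    ("Geeks", "NN"), ("write", "VB"), ("code", "NN"),
    ("in", "IN"), ("Python", "NN"), (".", "."),
]

# Count how often each tag occurs and pick the most frequent one.
tag_counts = Counter(tag for _, tag in gold)
most_common_tag, count = tag_counts.most_common(1)[0]

# A tagger that assigns this one tag to every token is correct
# exactly as often as the tag occurs in the data.
baseline_accuracy = count / len(gold)

print(most_common_tag, baseline_accuracy)
```

On this toy sample the most frequent tag is NN, so any other default tag would score lower on the same data.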
Each tagger has a tag() method that takes a list of tokens (usually a list of words produced by a word tokenizer), where each token is a single word. tag() returns a list of tagged tokens, where each tagged token is a (word, tag) tuple.

How does DefaultTagger work? It is a subclass of SequentialBackoffTagger and implements the choose_tag() method, which takes three arguments:
- list of tokens
- index of the current token, whose tag is to be chosen
- list of the previous tags
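To make the role of these three arguments concrete, here is a minimal stand-in written in plain Python. The class name ToyDefaultTagger and its tag() loop are assumptions made for illustration. This is not NLTK's actual source, only a sketch of how a tagger built around choose_tag() might behave.

```python
class ToyDefaultTagger:
    """Simplified stand-in for illustration; not NLTK's actual implementation."""

    def __init__(self, tag):
        self._tag = tag

    def choose_tag(self, tokens, index, history):
        # tokens  : the full list of tokens in the sentence
        # index   : position of the token whose tag is being chosen
        # history : tags already assigned to tokens[:index]
        # A default tagger ignores all three and returns its fixed tag.
        return self._tag

    def tag(self, tokens):
        # Tag tokens left to right, recording each chosen tag in history.
        history = []
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        return list(zip(tokens, history))


toy = ToyDefaultTagger("NN")
result = toy.tag(["Hello", "Geeks"])
print(result)  # [('Hello', 'NN'), ('Geeks', 'NN')]
```

A tagger that looks at context would use tokens, index, and history inside choose_tag(); the default tagger is the simplest case because it needs none of them.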
Code #2 : Tagging Sentences
Python3
# Loading libraries
from nltk.tag import DefaultTagger

# Defining the default tag
tagging = DefaultTagger('NN')

# Tagging a list of sentences, each given as a list of tokens
tagging.tag_sents([['welcome', 'to', '.'], ['Geeks', 'for', 'Geeks']])
Output :
[[('welcome', 'NN'), ('to', 'NN'), ('.', 'NN')], [('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')]]
Note: Every tag in the list of tagged sentences above is NN because DefaultTagger assigns the same tag to every token.

Code #3 : Illustrating how to untag
Python3
# Loading libraries
from nltk.tag import untag

# Removing the tags from a tagged sentence
untag([('Geeks', 'NN'), ('for', 'NN'), ('Geeks', 'NN')])
Output :
['Geeks', 'for', 'Geeks']
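Untagging amounts to keeping the first element of each (word, tag) tuple. The sketch below shows this in plain Python; strip_tags is a hypothetical helper named for this example, not an NLTK function. It is meant only to show what the operation does on the input used above.

```python
def strip_tags(tagged_sentence):
    # Hypothetical helper: keep each word and drop its tag.
    return [word for word, _tag in tagged_sentence]


tagged = [("Geeks", "NN"), ("for", "NN"), ("Geeks", "NN")]
words = strip_tags(tagged)
print(words)  # ['Geeks', 'for', 'Geeks']
```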