Python | PoS Tagging and Lemmatization using spaCy

27 July 2024

spaCy is one of the most widely used text analysis libraries for Python. It is designed for large-scale information extraction and is built for speed, which also makes it a common choice for preparing text for deep learning. For tagging, spaCy is generally faster, and often more accurate, than the taggers in NLTK and TextBlob.

How to Install:

pip install spacy
python -m spacy download en_core_web_sm

Top Features of spaCy:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for 49+ languages
4. 16 statistical models for 9 languages
5. Pre-trained word vectors
6. Part-of-speech tagging
7. Labeled dependency parsing
8. Syntax-driven sentence segmentation
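
"Non-destructive tokenization" means the original text can always be rebuilt from the tokens, whitespace included. The following is a minimal sketch of that property; the sample sentence is made up for illustration, and a blank English pipeline is used so that only the tokenizer runs.

```python
import spacy

# A blank English pipeline contains only the tokenizer,
# so no trained model is needed for this check.
nlp = spacy.blank("en")

# Hypothetical sample text with irregular spacing.
text = "Don't  panic: spaCy keeps   whitespace."
doc = nlp(text)

print([token.text for token in doc])

# Each token remembers its trailing whitespace, so joining
# text_with_ws for all tokens reproduces the input string.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)
```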

Import and Load Library:

import spacy 
  
# python -m spacy download en_core_web_sm 
nlp = spacy.load("en_core_web_sm") 

POS-Tagging for Reviews:

POS tagging labels each word (token) in a text with its part of speech: noun, verb, adjective, adverb, and so on.

import spacy

# Load the small English pipeline: tokenizer, tagger,
# parser and NER (the "sm" package ships without word vectors)
nlp = spacy.load("en_core_web_sm")

# Process the whole document; adjacent string literals are joined
# so that no newline tokens end up in the output
text = ("My name is Shaurya Uppal. "
        "I enjoy writing articles on Lazyroar checkout "
        "my other article by going to my profile section.")

doc = nlp(text)

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token, token.pos_)

# Collect only the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

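Output (tags vary by spaCy and model version; recent releases, for example, tag "is" as AUX rather than VERB):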

My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
Lazyroar PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT

# Tokens tagged as verbs:
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']

Lemmatization:

Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word’s lemma, or dictionary form. For example, "writing" and "going" are reduced to "write" and "go".

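import spacy

# Load the small English pipeline: tokenizer, tagger,
# parser and NER (the "sm" package ships without word vectors)
nlp = spacy.load("en_core_web_sm")

# Process the whole document; adjacent string literals are joined
# so that no whitespace tokens end up in the output
text = ("My name is Shaurya Uppal. I enjoy writing "
        "articles on Lazyroar checkout my other "
        "article by going to my profile section.")

doc = nlp(text)

# Print each token with its lemma (dictionary form)
for token in doc:
    print(token, token.lemma_)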

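Output (from spaCy v2; spaCy v3 no longer uses the -PRON- placeholder and returns a regular lemma for pronouns):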

My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
Lazyroar Lazyroar
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .
