spaCy is one of the best text analysis libraries. It excels at large-scale information extraction tasks and is one of the fastest NLP libraries available. It is also well suited to preparing text for deep learning. spaCy is much faster and more accurate than NLTKTagger and TextBlob.
How to Install?
pip install spacy
python -m spacy download en_core_web_sm
Top Features of spaCy:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for 49+ languages
4. 16 statistical models for 9 languages
5. Pre-trained word vectors
6. Part-of-speech tagging
7. Labeled dependency parsing
8. Syntax-driven sentence segmentation
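The first feature, non-destructive tokenization, means spaCy never discards the original text: every token remembers its own trailing whitespace, so the input string can be rebuilt exactly. A minimal sketch, assuming spaCy is installed; it uses spacy.blank("en"), a tokenizer-only pipeline, so no model download is needed, and the sample sentence is made up for illustration.

```python
import spacy

# A blank English pipeline contains only the tokenizer (no trained model required).
nlp = spacy.blank("en")

text = "Don't split me,  please!"  # note the double space before "please"
doc = nlp(text)

# Each token stores its trailing whitespace, so joining text_with_ws
# reproduces the input exactly, including the extra space.
rebuilt = "".join(token.text_with_ws for token in doc)

print([token.text for token in doc])
print(rebuilt == text)  # True

# token.idx is the character offset of the token in the original string.
for token in doc:
    assert text[token.idx : token.idx + len(token.text)] == token.text
```

Because every token keeps an offset into the original string, annotations such as named entities can always be mapped back to exact character spans.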
Import and Load Library:
import spacy

# Download the model first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
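spacy.load raises an OSError when the named pipeline package has not been downloaded. The sketch below wraps the call in a hypothetical helper, load_pipeline, that falls back to a blank English pipeline (tokenizer only) so a script fails softly instead of crashing; the helper name and the fallback behavior are assumptions for illustration, not part of spaCy itself.

```python
import spacy
from spacy.language import Language


def load_pipeline(name: str = "en_core_web_sm", lang: str = "en") -> Language:
    """Hypothetical helper: load a trained pipeline, or fall back to a blank one.

    spacy.load raises OSError when the named package is not installed.
    """
    try:
        return spacy.load(name)
    except OSError:
        # Tokenizer-only pipeline: no tagger, parser, lemmatizer or NER.
        return spacy.blank(lang)


nlp = load_pipeline()
# An empty list means the blank fallback was used.
print(nlp.pipe_names)
```

With the fallback, token.pos_ and token.lemma_ stay empty, so the examples below still need the real model to produce tags and lemmas.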
POS-Tagging for Reviews:
Part-of-speech (POS) tagging is the process of labeling each word in a text as a noun, verb, adjective, adverb, etc.
import spacy

# Load the English pipeline (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = """My name is Shaurya Uppal. I enjoy writing articles on Lazyroar checkout my other article by going to my profile section."""
doc = nlp(text)

# Print each token with its coarse-grained POS tag
for token in doc:
    print(token, token.pos_)

# Collect the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])
Output (exact tags depend on the spaCy and model version):
My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
Lazyroar PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']
Lemmatization:
Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word’s lemma, or dictionary form.
import spacy

# Load the English pipeline (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = """My name is Shaurya Uppal. I enjoy writing articles on Lazyroar checkout my other article by going to my profile section."""
doc = nlp(text)

# Print each token with its lemma
for token in doc:
    print(token, token.lemma_)
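Output (from a spaCy 2.x model; spaCy 3 no longer uses the -PRON- placeholder and gives pronouns a real lemma):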
My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
Lazyroar Lazyroar
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .