This article covers sentiment analysis of Facebook comments using VADER. Nowadays, many government institutions and companies need to know the feedback and comments their customers leave on social media platforms such as Facebook.
What is sentiment analysis?
Sentiment analysis is a popular branch of machine learning that analyzes text to determine the opinion or emotion expressed in it. Nowadays it is used by many companies to gather feedback from their customers.
Why should we use sentiment analysis?
- Invaluable Marketing:
Companies and product owners can use sentiment analysis to gauge the demand for their products from the comments and feedback of their customers.
- Identifying key emotional triggers:
In psychology and other medical institutions, sentiment analysis can be used to detect whether an individual's emotional state is normal or abnormal, and the recorded data can inform decisions about a person's health.
- Politics:
In politics, candidates can use sentiment analysis to gauge their political standing and measure public acceptance. Electoral commissions can also use it to help predict election results.
- Education:
Universities and other institutions of higher education, such as colleges, can use sentiment analysis to understand their students' feedback and comments, and can take it into account when revising or improving their curricula.
Installations in Anaconda
- NLTK: is used for processing and understanding human (natural) language.
Installation Using conda.
conda install -c anaconda nltk
- Installation Using pip.
pip install nltk
- NumPy: is a Python package used for scientific and numerical computing.
Installation Using conda.
conda install -c conda-forge numpy
- Installation Using pip.
pip install numpy
- Pandas: is a Python module used for data preprocessing and analysis.
Installation Using conda.
conda install -c anaconda pandas
- Installation Using pip.
pip install pandas
- Matplotlib: is a Python module used for data visualization and 2D plotting.
Installation Using conda.
conda install -c conda-forge matplotlib
- Installation Using pip.
pip install matplotlib
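Installing the packages alone is not enough for the code below: NLTK's tokenizer models, the VADER lexicon, the WordNet data, and the POS tagger model are downloaded separately as a one-time step (these are NLTK's standard resource names):
import nltk

nltk.download('punkt')                       # Punkt tokenizer models (sentence/word tokenization)
nltk.download('vader_lexicon')               # lexicon used by SentimentIntensityAnalyzer
nltk.download('wordnet')                     # data for WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')  # model used by nltk.pos_tag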
Authentication
There are several ways to fetch Facebook comments:
- Facebook graph API
- Direct download from Facebook
- Downloading from another dataset provider sites
Among the above methods, we downloaded a Facebook comment dataset from the Kaggle website, a well-known dataset provider. The code below uses kindle.txt, a file of Facebook comments about the Amazon Kindle, for the analysis; you can run the same code on your own Facebook comments, or simply create a plain text file of comments and try it out.
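If you do not have the Kaggle dataset at hand, you can create a small kindle.txt yourself to try the pipeline; a minimal sketch (the comments below are made-up placeholders):
# Write a few sample comments, one per line, to kindle.txt.
sample_comments = [
    "i love my kindle",
    "the screen is too dim at night",
    "battery life is okay",
]
with open('kindle.txt', 'w', encoding='ISO-8859-2') as f:
    f.write('\n'.join(sample_comments))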
Below is the implementation.
Python3
import time
import io
import re
import string
import unicodedata
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from numpy import linalg
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import webtext
from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

# Read the whole comment file.
with open('kindle.txt', encoding='ISO-8859-2') as f:
    text = f.read()

# Split the text into sentences and words.
sent_tokenizer = PunktSentenceTokenizer(text)
sents = sent_tokenizer.tokenize(text)
print(word_tokenize(text))
print(sent_tokenize(text))

# Stem each token with the Porter stemmer.
porter_stemmer = PorterStemmer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))

# Lemmatize each token with WordNet.
wordnet_lemmatizer = WordNetLemmatizer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

# Part-of-speech tagging.
text = nltk.word_tokenize(text)
print(nltk.pos_tag(text))

# Score each comment (one per line) with VADER.
sid = SentimentIntensityAnalyzer()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open('kindle.txt', encoding='ISO-8859-2') as f:
    for text in f.read().split('\n'):
        print(text)
        scores = sid.polarity_scores(text)
        for key in sorted(scores):
            print('{0}: {1}, '.format(key, scores[key]), end='')
        print()
Output:
here is the sample output of the code:
['i', 'love', 'my', 'kindle']
['i love my kindle']
Actual: i Stem: i
Actual: love Stem: love
Actual: my Stem: my
Actual: kindle Stem: kindl
Actual: i Lemma: i
Actual: love Lemma: love
Actual: my Lemma: my
Actual: kindle Lemma: kindle
[('i', 'NN'), ('love', 'VBP'), ('my', 'PRP$'), ('kindle', 'NN')]
i love my kindle
compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677,
We follow these major steps in our program:
- Downloading (fetching) the Facebook comments from the Kaggle site and saving them in text format.
- Preprocessing the data with the NLTK library: we first tokenize the text, and after tokenizing we stem and lemmatize it.
- Parsing the comments with the VADER library and classifying each comment as positive, negative, or neutral.
Now, let us try to understand the above piece of code:
- First, we open the file kindle.txt, which was downloaded from the Kaggle site and saved to the local disk.
with open('kindle.txt', encoding='ISO-8859-2') as f:
- After opening the file, we preprocess the text by tokenizing, stemming, and lemmatizing:
- Tokenize the text, i.e., split the text into words.
sent_tokenizer = PunktSentenceTokenizer(text)
sents = sent_tokenizer.tokenize(text)
print(word_tokenize(text))
print(sent_tokenize(text))
- Stem and lemmatize the text to normalize it:
1) For stemming we use the PorterStemmer() class:
from nltk.stem.porter import PorterStemmer
porter_stemmer = PorterStemmer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))
- 2) For lemmatization we use the WordNetLemmatizer() class:
from nltk.stem.wordnet import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))
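One detail worth noting: WordNetLemmatizer() treats every word as a noun unless you pass a part-of-speech hint, so verb forms are only reduced to their base when pos='v' is given. A small illustration (the word chosen here is just an example):
from nltk.stem.wordnet import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()
print(wordnet_lemmatizer.lemmatize('loving'))           # 'loving' -- treated as a noun by default
print(wordnet_lemmatizer.lemmatize('loving', pos='v'))  # 'love' -- treated as a verb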
- POS (part-of-speech) tagging of the tokens, so that only significant features/tokens such as adjectives, adverbs, and verbs can be selected (a filtering sketch follows the code below).
text = nltk.word_tokenize(text)
print(nltk.pos_tag(text))
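The line above only tags the tokens; the code never actually filters them. A minimal sketch of that selection step, assuming the Penn Treebank tag prefixes JJ (adjectives), RB (adverbs), and VB (verbs):
import nltk

tagged = nltk.pos_tag(nltk.word_tokenize("i really love my kindle"))
# Keep only adjectives, adverbs, and verbs.
significant = [word for word, tag in tagged if tag.startswith(('JJ', 'RB', 'VB'))]
print(significant)  # e.g. ['really', 'love']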
- Pass the tokens to a sentiment intensity analyzer which classifies the Facebook comments as positive, negative or neutral.
Here is how the VADER sentiment analyzer works:
- VADER uses a sentiment lexicon, i.e., a list of lexical features (e.g., words) that are generally labeled according to their semantic orientation as either positive or negative (you can inspect this lexicon directly; see the sketch after this list).
- The sentiment analyzer not only reports the positivity and negativity scores, but also tells us how positive or negative a sentiment is.
- Then we use the polarity_scores() method to obtain the polarity indices for a given sentence.
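As an aside, the analyzer exposes that lexicon as a plain Python dictionary mapping each word to its mean valence rating (roughly -4 to +4), so you can inspect how individual words are scored. A small exploratory sketch (the example words are just illustrations):
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
print(sid.lexicon['love'])  # a positive valence rating
print(sid.lexicon['hate'])  # a negative valence rating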
Then, we compute the intensity and polarity of each comment:
sid = SentimentIntensityAnalyzer()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open('kindle.txt', encoding='ISO-8859-2') as f:
    for text in f.read().split('\n'):
        print(text)
        scores = sid.polarity_scores(text)
        for key in sorted(scores):
            print('{0}: {1}, '.format(key, scores[key]), end='')
        print()
- Let us now understand the sentiment scores VADER produces for the output of the above code:
i love my kindle
compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677
- The positive (pos), negative (neg), and neutral (neu) scores represent the proportion of the text that falls into each category. This means our sentence was rated as 67.7% positive, 32.3% neutral, and 0% negative; the three scores add up to 1.
- The compound score is a metric that sums all the lexicon ratings and normalizes the result to between -1 (extremely negative) and +1 (extremely positive).
- Finally, sentiment scores of comments are returned.
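The walkthrough above prints raw scores but never assigns a final label. A minimal sketch of that last classification step, assuming the commonly used ±0.05 cutoff on the compound score (the classify helper is our own name, not part of VADER):
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

def classify(comment, threshold=0.05):
    # Label a comment by its compound score; +/-0.05 is the conventional cutoff.
    compound = sid.polarity_scores(comment)['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

print(classify('i love my kindle'))  # positive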