Sentiment Analysis for Indic Languages
This article shows how to use the VADER library to perform sentiment analysis of text in the Indic language Hindi.
Sentiment analysis is the process of determining whether a piece of text expresses a positive, negative or neutral opinion. Businesses apply it to textual data such as customer feedback to monitor brand and product sentiment and to understand customer needs. It is a time-efficient and cost-effective way to analyse large volumes of data. Python offers good support for sentiment analysis; a few of the libraries available for this purpose are NLTK, TextBlob and VADER.
To perform sentiment analysis of an Indic language such as Hindi, we need to carry out the following tasks:
- Read the Hindi text file.
- Translate the Hindi sentences into English, because the sentiment analysers in these Python libraries are designed for English text. (If you pass Hindi sentences directly to these functions, the 'compound score', the metric that summarizes a sentence's sentiment, is calculated incorrectly, because VADER's sentiment lexicon consists of English words. So it is appropriate to convert each sentence to its English equivalent before computing this metric.) Google Translate, accessed here through the GoogleTranslator class of the deep_translator package, handles this task.
- Do sentiment analysis of the translated text using any of the libraries mentioned above.
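The translate-then-score part of these tasks can be sketched as a single function. This is a minimal sketch, not the article's code: `analyze_sentences`, `fake_translate` and `fake_score` are hypothetical names, and the translation and scoring steps are passed in as plain callables so that the pipeline can be exercised offline with stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases for the two pluggable steps
Translate = Callable[[str], str]
Score = Callable[[str], Dict[str, float]]


def analyze_sentences(
    sentences: List[str], translate: Translate, score: Score
) -> List[Tuple[str, Dict[str, float]]]:
    """Translate each non-blank sentence to English, then score the translation.

    Returns (translated_text, scores) pairs in input order.
    """
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:  # skip blank lines
            continue
        english = translate(sentence)
        results.append((english, score(english)))
    return results


# Offline stand-ins (assumptions) for the real translator and analyzer
def fake_translate(text: str) -> str:
    return {"गोवा की यात्रा बहुत अच्छी रही।": "The trip to Goa was great."}.get(text, text)


def fake_score(text: str) -> Dict[str, float]:
    return {"compound": 0.5 if "great" in text else 0.0}


if __name__ == "__main__":
    lines = ["गोवा की यात्रा बहुत अच्छी रही।\n", "\n"]
    for text, scores in analyze_sentences(lines, fake_translate, fake_score):
        print(text, scores)
```

With the real libraries, the call would look like `analyze_sentences(sentences, GoogleTranslator(source='auto', target='en').translate, SentimentIntensityAnalyzer().polarity_scores)`. Passing the two steps in as functions keeps the pipeline testable without network access.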
The implementation consists of the following steps.
Step 1: Import the necessary libraries/packages.
```python
# Install the third-party packages first:
#   pip install deep-translator vaderSentiment

# codecs provides access to the internal Python codec registry
import codecs

# GoogleTranslator translates the text from Hindi to English
from deep_translator import GoogleTranslator

# SentimentIntensityAnalyzer computes VADER sentiment scores
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
```
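Step 2: Read the file data. The codecs module provides access to the internal Python codec registry. Most standard codecs are text encodings, which encode text to bytes (and decode bytes back to text), although custom codecs may encode and decode between arbitrary types. Here the file is opened with the UTF-8 codec so that the Devanagari characters are decoded correctly; in Python 3 the built-in open() with encoding='utf-8' works equally well.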
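To illustrate what a text encoding does, the snippet below (a standalone illustration, not part of the article's program) round-trips a Hindi word through the UTF-8 codec with `codecs.encode` and `codecs.decode`. Devanagari characters lie outside ASCII, and each one occupies three bytes in UTF-8, so the file has to be decoded with the encoding it was saved in (assumed here to be UTF-8).

```python
import codecs

word = "गोवा"  # "Goa" in Devanagari: 4 Unicode code points

# Text -> bytes with the UTF-8 codec
encoded = codecs.encode(word, "utf-8")
print(len(word), len(encoded))  # 4 code points, 12 bytes

# Bytes -> text: decoding with the same codec restores the original string
decoded = codecs.decode(encoded, "utf-8")
print(decoded == word)  # True

# Decoding the same bytes as ASCII fails, because ASCII cannot represent Devanagari
try:
    codecs.decode(encoded, "ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed:", exc.reason)
```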
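```python
# Read the Hindi text into 'sentences', one sentence per line,
# stripping trailing newlines and skipping blank lines
with codecs.open('SampleHindiText.txt', encoding='utf-8') as f:
    sentences = [line.strip() for line in f if line.strip()]
```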
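The loop in Step 3 treats each line of the file as one sentence, so the sample file is assumed to hold one sentence per line. If a file instead packs several sentences onto one line, they can be split on the Devanagari danda (।, U+0964), the Hindi full stop. The helper below is a hypothetical addition, a sketch assuming the danda and line breaks are the only sentence delimiters.

```python
import re
from typing import List

DANDA = "\u0964"  # Devanagari danda "।", the Hindi full stop


def split_hindi_sentences(text: str) -> List[str]:
    """Split Hindi text into sentences, keeping the danda on each sentence.

    Assumes the danda (or a line break) is the only sentence delimiter.
    """
    # Split just after each danda (zero-width lookbehind) and at line breaks
    parts = re.split(r"(?<=\u0964)|\n", text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "गोवा की यात्रा बहुत अच्छी रही। समुद्र तट बहुत गर्म थे।"
    for sentence in split_hindi_sentences(sample):
        print(sentence)
```

With this helper, the list comprehension in Step 2 could be replaced by `sentences = split_hindi_sentences(f.read())`.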
Step 3: Translate each sentence into English so that the VADER library can analyse the translated text. The polarity_scores() method of SentimentIntensityAnalyzer returns a dictionary with 'neg', 'neu' and 'pos' proportions and a 'compound' score. The compound score is a normalized value between -1 (most negative) and +1 (most positive) that summarizes the sentiment of the whole sentence, and it is interpreted as follows:
- Positive sentiment: compound score >= 0.05
- Neutral sentiment: -0.05 < compound score < 0.05
- Negative sentiment: compound score <= -0.05
```python
# Create the analyzer and translator once and reuse them for every sentence
analyzer = SentimentIntensityAnalyzer()
translator = GoogleTranslator(source='auto', target='en')

for sentence in sentences:
    # Translate the Hindi sentence to English
    translated_text = translator.translate(sentence)

    # polarity_scores() returns the 'neg', 'neu', 'pos' and 'compound' scores
    sentiment_dict = analyzer.polarity_scores(translated_text)
    print("\nTranslated Sentence=", translated_text, "\nDictionary=", sentiment_dict)

    # Classify the sentence using the compound-score thresholds
    if sentiment_dict['compound'] >= 0.05:
        print("It is a Positive Sentence")
    elif sentiment_dict['compound'] <= -0.05:
        print("It is a Negative Sentence")
    else:
        print("It is a Neutral Sentence")
```
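The threshold logic at the end of the loop can be factored into a small reusable helper. This is a sketch; `label_for` is a hypothetical name, and its default threshold is the ±0.05 cut-off listed in Step 3. The boundary cases follow the same rules: a score of exactly 0.05 is positive and exactly -0.05 is negative.

```python
def label_for(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score (range -1 to 1) to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"  # -threshold < compound < threshold


# Compound scores taken from the sample output of this article
for score in (0.6249, 0.0, 0.688, -0.5563):
    print(score, label_for(score))
```

Keeping the threshold as a parameter makes it easy to experiment with stricter or looser cut-offs without touching the loop.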
• The source file 'SampleHindiText.txt' contains the following text, one sentence per line:
गोवा की यात्रा बहुत अच्छी रही।
समुद्र तट बहुत गर्म थे।
मुझे समुद्र तट पर खेलने में बहुत मजा आया।
मेरी बेटी बहुत गुस्से में थी।
• The output of the code is shown below.
```
Translated Sentence= The trip to Goa was great.
Dictionary= {'neg': 0.0, 'neu': 0.549, 'pos': 0.451, 'compound': 0.6249}
It is a Positive Sentence

Translated Sentence= The beaches were very hot.
Dictionary= {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
It is a Neutral Sentence

Translated Sentence= I really enjoyed playing on the beach.
Dictionary= {'neg': 0.0, 'neu': 0.469, 'pos': 0.531, 'compound': 0.688}
It is a Positive Sentence

Translated Sentence= My daughter was very angry.
Dictionary= {'neg': 0.473, 'neu': 0.527, 'pos': 0.0, 'compound': -0.5563}
It is a Negative Sentence
```