For most text mining or classification projects, extracting tweets is one of the first and most important steps. The well-known method is to create a Twitter developer account and extract tweets with tweepy. For security reasons, Twitter can take nearly 15 days to verify a developer account application. The GetOldTweets3 Python library makes the process easier because it does not require a developer account. Another advantage is that it also returns fairly recent tweets: one can get tweets from previous months or even weeks.
Let’s walk through how it works with some code.
- Installing the library GetOldTweets3 – The following command can be run in any command prompt, or in a Jupyter Notebook cell by prefixing it with "!".
pip install GetOldTweets3
- Specify the needed hashtag – After installation, the library is imported under a short alias (got). The TweetCriteria() class collects the specifications for our dataset of tweets, and its setQuerySearch() method restricts the results to tweets that match our hashtag.
import GetOldTweets3 as got
# Ending a line with '\' continues the statement on the next line,
# so further criteria can be chained onto this call.
gettweet = got.manager.TweetCriteria().setQuerySearch(hashtag)
- More specifications can be added – Many further criteria can be chained onto the same call, for example:
.setSince("2020-01-01") \
.setUntil("2020-05-01") \
.setMaxTweets(100) \
.setNear("Pune") \
.setUsername("Raj")
# ...etc.
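The setSince() and setUntil() values shown above are plain YYYY-MM-DD strings, so a rolling window such as "the last 30 days" can be computed instead of being typed by hand. The following is a minimal sketch using only the standard library; the helper name date_window and its parameters are hypothetical and are not part of GetOldTweets3.

```python
from datetime import date, timedelta


def date_window(days_back, end=None):
    """Return (since, until) as 'YYYY-MM-DD' strings spanning `days_back` days.

    Hypothetical helper: `end` defaults to today's date when omitted.
    """
    if days_back <= 0:
        raise ValueError("days_back must be positive")
    end = end or date.today()
    start = end - timedelta(days=days_back)
    # date.isoformat() produces the same YYYY-MM-DD layout used above
    return start.isoformat(), end.isoformat()


since, until = date_window(30, end=date(2020, 5, 1))
print(since, until)  # 2020-04-01 2020-05-01
```

The two strings can then be passed to setSince() and setUntil() in place of the hard-coded dates.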
Let’s see the complete code (Python 3):
import GetOldTweets3 as got


def extract_tweets(hashtag):
    # Build the search criteria for the given hashtag
    gettweet = got.manager.TweetCriteria().setQuerySearch(hashtag) \
        .setSince("2020-01-01") \
        .setUntil("2020-05-01") \
        .setMaxTweets(100)

    # Creation of list that contains all tweets
    tweets = got.manager.TweetManager.getTweets(gettweet)

    # Creating list of chosen tweet data
    text_tweets = [[tweet.text] for tweet in tweets]
    print(text_tweets)


# calling the function
extract_tweets('COVID19')
Output:
Some tweets are in different languages. One can use a third-party translation library (for example, the Translator class of the googletrans package) to convert those tweets into a single language.
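Since the complete code collects the tweets as a list of one-element lists, that shape maps directly onto CSV rows if the result needs to be kept for later text mining. The sketch below assumes this use case; the function name write_tweets_csv, the "text" header, and the sample rows are placeholders for illustration, not real tweets or part of GetOldTweets3.

```python
import csv
import io


def write_tweets_csv(rows, fileobj, header="text"):
    """Write rows shaped like [[text], [text], ...] to an open text file object.

    Hypothetical helper: returns the number of tweet rows written.
    """
    writer = csv.writer(fileobj)
    writer.writerow([header])   # single column header
    writer.writerows(rows)      # each inner list becomes one CSV row
    return len(rows)


# Placeholder rows standing in for the text_tweets list
sample_rows = [["placeholder tweet one"], ["placeholder, with a comma"]]

buffer = io.StringIO()          # in-memory stand-in for a real file
count = write_tweets_csv(sample_rows, buffer)
print(count)
print(buffer.getvalue())
```

To write to disk instead, an open file can be passed in place of the buffer, for example open("tweets.csv", "w", newline="", encoding="utf-8"); the csv module documentation recommends newline="" for files used with csv.writer.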