Prerequisite: GUI Application using Tkinter
In this article, we are going to write a script that extracts information from the article at a given URL, such as its title, meta description, and main text.
We are going to use the goose3 module, a Python 3 version of the Goose article extractor.
The Goose module can extract the following information:
- The main text of an article.
- The main image of the article.
- Any YouTube/Vimeo videos embedded in the article.
- The meta description.
- The meta tags.
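Much of the metadata above lives in the page's HTML head. The sketch below is a simplified illustration of where the title and meta information come from, not of how Goose itself works (Goose also scores content blocks and strips boilerplate). It uses only Python's standard html.parser module. The class name MetaInfoParser and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class MetaInfoParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page.

    Hypothetical, simplified illustration: real extractors such as Goose
    do much more (main-text detection, image and video discovery).
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}          # e.g. {"description": "...", "keywords": "..."}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # attribute names are already lower-cased by HTMLParser
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            # Meta names are case-insensitive, so store them in lower case.
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


html = """<html><head>
<title>Sample Article</title>
<meta name="description" content="A short summary of the article.">
<meta name="Keywords" content="python, scraping">
</head><body><p>Body text.</p></body></html>"""

parser = MetaInfoParser()
parser.feed(html)
print(parser.title)                 # Sample Article
print(parser.meta["description"])   # A short summary of the article.
print(parser.meta["keywords"])      # python, scraping
```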
To start, install the required module using the following command:
pip install goose3
Approach
- Import the module.
- Create an article object with the Goose().extract(url) function.
- Get the title from the obj.title attribute.
- Get the meta description from the obj.meta_description attribute.
- Get the article text from the obj.cleaned_text attribute.
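The approach above can be sketched as two small functions. This is a minimal sketch that assumes the goose3 attribute names listed above. The function names summarize and fetch_summary are hypothetical. Because the attribute access lives in summarize(), you can try it offline with a stand-in object instead of a real Goose result. goose3 is only imported when fetch_summary() is actually called.

```python
from types import SimpleNamespace


def summarize(article, max_chars=300):
    """Collect the fields used in this tutorial from a Goose result object."""
    return {
        "title": article.title,
        "meta_description": article.meta_description,
        "text": article.cleaned_text[:max_chars],
    }


def fetch_summary(url, max_chars=300):
    """Download and parse `url` with goose3, then summarise it (needs network access)."""
    from goose3 import Goose  # imported lazily so summarize() works without goose3
    article = Goose().extract(url)
    return summarize(article, max_chars)


# Offline usage with a hypothetical stand-in object (no network needed):
fake = SimpleNamespace(
    title="Sample Article",
    meta_description="A short summary.",
    cleaned_text="First paragraph of the article body.",
)
print(summarize(fake, max_chars=15)["text"])   # First paragraph
```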
Implementation
Step 1: Initializing the requirements.
Python3
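# import module
from goose3 import Goose

# var for URL (placeholder: replace with the address of the article to parse)
url = "https://example.com/article"

# create a Goose extractor and parse the article at url
article = Goose().extract(url)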
Step 2: Extracting the title.
Python3
print("Title of the article :\n", article.title)
Step 3: Extracting meta information
Python3
print("Meta information :\n", article.meta_description)
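Not every page defines a meta description, so this value may be empty. Here is a minimal sketch of a fallback, assuming you would rather show the first line of the article text in that case. The helper name description_or_fallback and the 160-character cap are hypothetical choices and are not part of goose3.

```python
def description_or_fallback(meta_description, cleaned_text, max_chars=160):
    """Return the meta description, or the first line of the article text if it is blank."""
    if meta_description and meta_description.strip():
        return meta_description.strip()
    # Fall back to the first line of the stripped body text, capped at max_chars.
    first_line = (cleaned_text or "").strip().split("\n", 1)[0]
    return first_line[:max_chars]
```

For example: print("Meta information :\n", description_or_fallback(article.meta_description, article.cleaned_text)).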
Step 4: Extracting the article text
Python3
# print only the first 300 characters of the cleaned article text
print("Article Text :\n", article.cleaned_text[:300])
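Slicing with [:300] can stop in the middle of a word. One option is the standard library's textwrap.shorten(). It drops whole words from the end and appends a placeholder. Note that it also collapses all whitespace, including paragraph breaks, into single spaces. The wrapper name preview is hypothetical.

```python
import textwrap


def preview(text, limit=300):
    """Return at most `limit` characters of `text`, ending on a whole word.

    textwrap.shorten() also collapses runs of whitespace (including the
    newlines between paragraphs) into single spaces.
    """
    return textwrap.shorten(text, width=limit, placeholder="...")


sample = "The quick brown fox jumps over the lazy dog."
print(preview(sample, limit=15))   # The quick...
```

In Step 4 this would be used as print("Article Text :\n", preview(article.cleaned_text)).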
Step 5: Visualizing using Tkinter
Python3
# import modules
from tkinter import *
from goose3 import Goose


# fetch the article at the URL typed in the Entry and show its details
def info():
    article = Goose().extract(e1.get())
    title.set(article.title)
    meta.set(article.meta_description)
    string = article.cleaned_text[:150]
    # replace newlines with spaces; passing a list to StringVar.set()
    # would show Tcl list syntax (braces) in the label
    art_dec.set(string.replace("\n", " "))


# object of tkinter
# and background set to grey
master = Tk()
master.configure(bg='light grey')

# Variable Classes in tkinter
title = StringVar()
meta = StringVar()
art_dec = StringVar()

# Creating label for each information
# name using widget Label
Label(master, text="Website URL : ", bg="light grey").grid(row=0, sticky=W)
Label(master, text="Title :", bg="light grey").grid(row=3, sticky=W)
Label(master, text="Meta information :", bg="light grey").grid(row=4, sticky=W)
Label(master, text="Article description :", bg="light grey").grid(row=5, sticky=W)

# Creating labels bound to the StringVar objects;
# they update automatically whenever info() calls .set()
Label(master, text="", textvariable=title, bg="light grey").grid(row=3, column=1, sticky=W)
Label(master, text="", textvariable=meta, bg="light grey").grid(row=4, column=1, sticky=W)
Label(master, text="", textvariable=art_dec, bg="light grey").grid(row=5, column=1, sticky=W)

# Entry widget where the user types the URL
e1 = Entry(master, width=100)
e1.grid(row=0, column=1)

# creating a button using the widget
# to call the info function
b = Button(master, text="Show", command=info, bg="Blue")
b.grid(row=0, column=2, columnspan=2, rowspan=2, padx=5, pady=5)

master.mainloop()
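The info() callback passes whatever is typed in the Entry straight to Goose. An empty box or a URL without a scheme may therefore fail or return an empty result. Below is a minimal sketch of a check you could run first, using urllib.parse from the standard library. The helper name normalize_url is hypothetical. The choice to assume https:// when no scheme is typed is also an assumption.

```python
from urllib.parse import urlparse


def normalize_url(raw):
    """Return a usable http(s) URL from the Entry text, or None if it looks invalid."""
    raw = raw.strip()
    if not raw:
        return None
    # Assumption: users often type "example.com/page" without a scheme, so add https://.
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlparse(raw)
    # Reject other schemes (ftp, file, ...) and URLs with no host part.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return raw
```

Inside info(), you could first call url = normalize_url(e1.get()). If it returns None, show a message with title.set(...) and return before calling Goose().extract(url).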