Prerequisite: Implementing Web Scraping in Python with BeautifulSoup
In this article, we are going to write a Python script that scrapes the status of a flight.
Modules needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal:
pip install bs4
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. It is also not part of the Python standard library. To install it, run the following command in the terminal:
pip install requests
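As a quick illustration of what "pulling data out of HTML" looks like, here is a minimal sketch that parses a small HTML string with Beautiful Soup. The HTML fragment and the class name "airport" are made up for this example and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page.
html = """
<div class="flight">
  <div class="airport">Patna</div>
  <div class="airport">Delhi</div>
</div>
"""

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; get_text extracts the text inside each one.
airports = [tag.get_text(strip=True) for tag in soup.find_all("div", class_="airport")]
print(airports)  # ['Patna', 'Delhi']
```

The same two calls, find_all and get_text, do most of the work in the script later in this article.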
Approach:
- Import the modules.
- Create a function that fetches the page at a given URL.
- Merge the flight details (airline code, flight number, and date) into the URL, pass the URL to the getdata() function, and parse the returned HTML with BeautifulSoup.
- Find the required tags in the parsed HTML and traverse the results.
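The URL-merging step can be sketched on its own, without any network access. In the sketch below, BASE_URL and build_flight_url are hypothetical names introduced for illustration, and example.com is a placeholder address rather than the real flight-tracker site. The query parameter names (year, month, date) follow the ones used in the script.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of the site being scraped.
BASE_URL = "https://example.com/flight-tracker/"


def build_flight_url(airline_code, flight_number, year, month, date, base_url=BASE_URL):
    """Return the tracker URL for one flight on one date."""
    # urlencode escapes the values and joins them as key=value pairs with "&".
    query = urlencode({"year": year, "month": month, "date": date})
    return f"{base_url}{airline_code}/{flight_number}?{query}"


url = build_flight_url("G8", "134", "2020", "10", "23")
print(url)  # https://example.com/flight-tracker/G8/134?year=2020&month=10&date=23
```

Building the query string with urlencode rather than manual concatenation keeps the URL valid even if a value contains characters that need escaping.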
Implementation:
Python3
# import modules
import requests
from bs4 import BeautifulSoup


# UDF to get the HTML code from the URL
def get_html(Airline_code, Flight_number, Date, Month, Year):

    def getdata(url):
        r = requests.get(url)
        return r.text

    # url
    # NOTE: the start of this expression (the site address) is missing from
    # the source text. BASE_URL is a placeholder to fill in before running,
    # and placing Airline_code before the first "/" is an assumption.
    BASE_URL = ""
    url = (BASE_URL + Airline_code + "/" + Flight_number
           + "?year=" + Year + "&month=" + Month + "&date=" + Date)

    # pass the url into the getdata function
    htmldata = getdata(url)
    soup = BeautifulSoup(htmldata, 'html.parser')
    return soup


# Get the flight number from the HTML code
def flight_no(soup):
    Flight_no = ""

    # Find the div tag with the unique class name
    container = soup.find(
        "div", class_="ticket__FlightNumberContainer-s1rrbl5o-4 hgbvHg")

    # find() returns None when nothing matches, so guard before iterating
    if container is None:
        return Flight_no

    for i in container:
        Flight_no = Flight_no + i.get_text() + " "
    return Flight_no


# Get the airport names from the HTML code
def airport(soup):
    Airport_name = []

    # Find the div tags with the unique class name
    for i in soup.find_all(
            "div", class_="text-helper__TextHelper-s8bko4a-0 CPamx"):
        Airport_name.append(i.get_text())
    return Airport_name


# Get the status from the HTML code
def status(soup, Airport_list):
    Time_status = []
    Status_str = []
    Gate = []
    Gate_no = []

    # Find the div tags with the unique class names
    # to get the terminal/gate labels and their values
    for data in soup.find_all(
            "div", class_="ticket__TGBLabel-s1rrbl5o-15 gcbyEH text-helper__TextHelper-s8bko4a-0 dfeqpK"):
        Gate.append(data.get_text())

    for data in soup.find_all(
            "div", class_="ticket__TGBValue-s1rrbl5o-16 icyRae text-helper__TextHelper-s8bko4a-0 cCfBRT"):
        Gate_no.append(data.get_text())

    # Get the status labels and times from the HTML code
    for i in soup.find_all(
            "div", class_="text-helper__TextHelper-s8bko4a-0 bcmzUJ"):
        Status_str.append(i.get_text())

    for i in soup.find_all(
            "div", class_="text-helper__TextHelper-s8bko4a-0 cCfBRT"):
        Time_status.append(i.get_text())

    # Traverse the scraped data: items 0-1 belong to the departure airport,
    # items 2-3 to the arrival airport
    for item in range(4):
        if item == 0:
            print(Airport_list[0])
        if item == 2:
            print("")
            print(Airport_list[1])
        print(Status_str[item] + " : " + Time_status[item])
        print(Gate[item] + " : " + Gate_no[item])

    # Print the terminal and gate details once more as a summary
    for item in range(len(Gate)):
        print(Gate[item] + " : " + Gate_no[item])


# Driver code
if __name__ == '__main__':

    # Input data
    Airline_code = 'G8'
    Flight_number = '134'
    Date = '23'
    Month = '10'
    Year = '2020'

    # Call get_html with the flight details
    soup = get_html(Airline_code, Flight_number, Date, Month, Year)

    print("Flight number : ", flight_no(soup))
    Airport_list = airport(soup)
    status(soup, Airport_list)
Output:
Flight number : G8 134 GoAir
Jay Prakash Narayan International Airport
Scheduled : 21:00 IST
Terminal : N/A
Estimated : 21:00 IST
Gate : N/A

Indira Gandhi International Airport
Scheduled : 22:40 IST
Terminal : T2
Estimated : 22:40 IST
Gate : 205
Terminal : N/A
Gate : N/A
Terminal : T2
Gate : 205
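The status() function indexes several parallel lists with a fixed range(4), so it raises an IndexError whenever the page yields fewer matches than expected (for example, after the site changes its generated class names). One possible way to make the pairing more tolerant is to use zip, which stops at the shorter list. The helper name pair_labels is hypothetical, and the sample data below is copied from the output above with one value removed on purpose.

```python
def pair_labels(labels, values):
    """Pair each label with its value, stopping at the shorter list."""
    return [f"{label} : {value}" for label, value in zip(labels, values)]


# Sample data shaped like the scraped lists; one value is missing on purpose.
gate_labels = ["Terminal", "Gate", "Terminal", "Gate"]
gate_values = ["N/A", "N/A", "T2"]

for line in pair_labels(gate_labels, gate_values):
    print(line)
```

With this approach a missing value shortens the printed output instead of stopping the script, which may or may not be the behavior you want; an explicit length check is the alternative if missing data should be reported.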