Prerequisites: Implementing Web Scraping in Python with BeautifulSoup, Python Urllib Module, Tools for Web Scraping
In this article, we are going to write Python scripts that extract the title of a webpage from a given URL.
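A page's title is simply the text of the <title> element in its <head>. As a dependency-free point of comparison (this is not one of the article's methods), here is a minimal sketch that uses the standard library's html.parser.HTMLParser on a hard-coded HTML string; the class and function names are hypothetical. Like find_all('title'), it collects every <title> element, including one nested inside an inline <svg>:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <title> element in an HTML document."""

    def __init__(self):
        # convert_charrefs=True (the default) turns entities like &amp; into "&"
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> also matches here
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # text may arrive in several chunks, so append instead of overwriting
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped text of all <title> elements, in document order."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]


sample = """<html><head><title>Example &amp; Demo</title></head>
<body><svg><title>An icon</title></svg></body></html>"""
print(extract_titles(sample))  # → ['Example & Demo', 'An icon']
```

The library-based methods handle malformed HTML, encodings and navigation far more robustly; this sketch only shows what "extracting the title" amounts to.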
Method 1: In this method, we will use the requests and bs4 modules. Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python; to install it, run the following command in the terminal:
pip install beautifulsoup4
The requests module makes it easy to send HTTP/1.1 requests. It is not built into Python either; to install it, run the following command in the terminal:
pip install requests
Approach:
- Import the modules
- Send a GET request to the URL with requests.get()
- Pass the response text to the BeautifulSoup() constructor
- Use find_all('title') to find every title tag and print its text
Code:
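```python
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url (placeholder: replace with the page you want to scrape)
url = "https://www.example.com/"

# making a GET request (the timeout stops it from hanging indefinitely)
reqs = requests.get(url, timeout=10)

# parsing the HTML with BeautifulSoup's built-in parser
soup = BeautifulSoup(reqs.text, 'html.parser')

# displaying the title (find_all returns every <title> tag on the page)
print("Title of the website is : ")
for title in soup.find_all('title'):
    print(title.get_text())
```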
Output:
```
Title of the website is :
Lazyroar | A computer science portal for Lazyroar
```
Method 2: In this method, we will use the urllib and BeautifulSoup modules to extract the title of the website. urllib is a standard-library package for opening URLs, which lets a program fetch a webpage.
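Installation:
urllib ships with Python's standard library, so it does not need to be installed with pip; only bs4 must be installed, as shown in Method 1.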
Approach:
- Import the modules
- Open the URL with urllib.request.urlopen(url)
- Parse the response with BeautifulSoup and read the title with soup.title
Implementation:
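```python
# importing the modules
from urllib.request import urlopen
from bs4 import BeautifulSoup

# target url (placeholder: replace with the page you want to scrape)
url = "https://www.example.com/"

# opening the url and parsing the response with BeautifulSoup
with urlopen(url) as response:
    soup = BeautifulSoup(response, 'html.parser')

# displaying the title (soup.title is the first <title> tag)
print("Title of the website is : ")
print(soup.title.get_text())
```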
Output:
```
Title of the website is :
Lazyroar | A computer science portal for Lazyroar
```
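By default, urllib identifies itself with a "Python-urllib/3.x" User-Agent, which some servers reject (often with HTTP 403 Forbidden). A minimal sketch, assuming you want to send a browser-like header instead and decode the body with the charset the server declares; the helper names and the header string are hypothetical. Only the request construction runs here, so no network access is needed:

```python
from urllib.request import Request, urlopen

# hypothetical User-Agent string; any descriptive browser-like value works
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; title-fetcher/0.1)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Create a GET request that sends a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and decode it using the charset the server declares."""
    with urlopen(build_request(url), timeout=timeout) as response:
        raw = response.read()
        # fall back to UTF-8 when the Content-Type header names no charset
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


req = build_request("https://www.example.com/")
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))  # → Mozilla/5.0 (compatible; title-fetcher/0.1)
```

To combine it with Method 2, pass the decoded text to the parser, e.g. `soup = BeautifulSoup(fetch_html(url), 'html.parser')`.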
Method 3: In this method, we will use the mechanize module, which provides stateful, programmatic web browsing in Python: it can browse pages, fill in HTML forms, and follow links.
Installation:
pip install mechanize
Approach:
- Import the module.
- Create a Browser() instance.
- Open the webpage with Browser.open().
- Display the title with Browser.title().
Implementation:
```python
# importing the module
from mechanize import Browser

# target url (placeholder: replace with the page you want to scrape)
url = "https://www.example.com/"

# creating a Browser instance and opening the page
br = Browser()
br.open(url)

# displaying the title
print("Title of the website is : ")
print(br.title())
```
Output:
```
Title of the website is :
Lazyroar | A computer science portal for Lazyroar
```
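By default, mechanize's Browser checks a site's robots.txt and refuses to open pages it disallows, so br.open() in Method 3 can fail on such sites. To check a path in advance, the standard library's urllib.robotparser can evaluate robots.txt rules. A minimal sketch on hypothetical, hard-coded rules (so no network access is needed); the helper name is hypothetical:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes robots.txt content as a list of lines, so nothing is downloaded
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# hypothetical robots.txt content for illustration
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
print(allowed_by_robots(rules, "https://www.example.com/"))           # → True
print(allowed_by_robots(rules, "https://www.example.com/private/x"))  # → False
```

Against a live site you would instead call set_url() with the site's robots.txt URL and then read(), which downloads and parses the file.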