Prerequisite:
In this article, we will discuss web scraping of videos using python. For web scraping, we will use requests and BeautifulSoup Module in Python. The requests library is an integral part of Python for making HTTP requests to a specified URL. Whether it be REST APIs or Web Scraping, requests are must be learned for proceeding further with these technologies. When one makes a request to a URI, it returns a response. Python requests provide inbuilt functionalities for managing both the request and response.
pip install requests
Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping.
pip install bs4
Let’s Understand Step by step implementation:
- Import Required Module
Python3
# Import Required Module import requests from bs4 import BeautifulSoup |
- Parse HTML Content
Python3
# Web URL Web_url = "Enter WEB URL" # Get URL Content r = requests.get(Web_url) # Parse HTML Code soup = BeautifulSoup(r.content, 'html5lib' ) |
- Count How many videos are there on the web page. In HTML For displaying video, we use video tag.
Python3
# List of all video tag video_tags = soup.findAll( 'video' ) print ( "Total " , len (video_tags), "videos found" ) |
- Iterate through all video tags and fetch video URL
Python3
for video_tag in video_tags: video_url = video_tag.find( "a" )[ 'href' ] print (video_url) |
Below is the Implementation:
Python3
# Import Required Module import requests from bs4 import BeautifulSoup # Web URL # Get URL Content r = requests.get(Web_url) # Parse HTML Code soup = BeautifulSoup(r.content, 'html.parser' ) # List of all video tag video_tags = soup.findAll( 'video' ) print ( "Total " , len (video_tags), "videos found" ) if len (video_tags) ! = 0 : for video_tag in video_tags: video_url = video_tag.find( "a" )[ 'href' ] print (video_url) else : print ( "no videos found" ) |
Output:
Total 1 videos found https://media.geeksforgeeks.org/wp-content/uploads/15.webm