Extract all the URLs from the webpage Using Python

28 July 2024


Web scraping is an essential skill for collecting data from websites. In this article, we will write Python scripts that extract all the URLs from a webpage and either print them or save them to a file.

Modules Needed:

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python. To install it, run the following command in the terminal: pip install beautifulsoup4
requests: Requests lets you send HTTP/1.1 requests easily. It also does not come built-in with Python. To install it, run the following command in the terminal: pip install requests
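To get a feel for what "pulling data out of HTML" involves, here is a minimal sketch that collects link targets using only the standard library's html.parser module instead of bs4. This is a swapped-in technique for illustration, not Beautiful Soup's API, and the names LinkCollector and extract_hrefs are hypothetical. Beautiful Soup's 'html.parser' option relies on this same standard-library parser, but adds a searchable tree with methods such as find_all().

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without an href (or with an empty one)
                self.hrefs.append(href)


def extract_hrefs(html):
    """Return the href values of all <a> tags in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    sample = ('<p><a href="/about">About</a> <a name="top">x</a> '
              '<a href="https://example.com/">Ex</a></p>')
    print(extract_hrefs(sample))  # ['/about', 'https://example.com/']
```

This is handy for quick checks when bs4 is not installed, but for real pages Beautiful Soup's tree and search methods are far more convenient.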

Example 1:

Python3

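import requests
from bs4 import BeautifulSoup


url = 'https://www.geeksforgeeks.org/'
# timeout stops the script from hanging forever on an unresponsive server
reqs = requests.get(url, timeout=10)
reqs.raise_for_status()  # raise an exception for HTTP errors such as 404
soup = BeautifulSoup(reqs.text, 'html.parser')

# collect and print the href of every <a> tag; tags without an href are skipped
urls = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href:
        urls.append(href)
        print(href)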

Output:

(Screenshot not reproduced: the script prints the href value of each link on the page, one per line.)

Explanation:

We first send an HTTP GET request to the URL with requests.get(). The returned reqs object is a Response, and reqs.text holds the page's HTML as a string. We pass that string to BeautifulSoup along with 'html.parser'; Beautiful Soup parses the document, converting it to Unicode and turning HTML entities into Unicode characters. Finally, soup.find_all('a') returns every <a> tag on the page, and we iterate over them and print each tag's href attribute. Note that link.get('href') returns None for an <a> tag that has no href attribute.
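The href values printed this way are often relative (for example /about), can repeat, and can point to non-web targets such as mailto: links. Assuming you want a list of unique, absolute http(s) URLs, a minimal sketch can post-process the raw values with urllib.parse from the standard library. normalize_links is a hypothetical helper, and the sample links and example.com base URL are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Hypothetical helper: turn raw href values into unique absolute http(s) URLs.

    The order in which each URL is first seen is preserved.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # link.get('href') returns None when an <a> tag has no href
        absolute = urljoin(base_url, href.strip())  # resolve relative paths like "/about"
        absolute, _fragment = urldefrag(absolute)  # "page#top" and "page" count as one URL
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # drop mailto:, javascript:, tel: and similar links
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    raw = ["/about", "contact.html", "#top", "mailto:hi@example.com",
           "https://example.com/about", None, "/about#team"]
    for u in normalize_links("https://example.com/blog/", raw):
        print(u)
```

With the variables from Example 1, you could call normalize_links(url, [link.get('href') for link in soup.find_all('a')]). Dropping fragments is a design choice: it treats page#section and page as the same URL, which you may not want in every case.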

Example 2:

Python3

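import requests
from bs4 import BeautifulSoup

url = 'https://www.geeksforgeeks.org/'
grab = requests.get(url, timeout=10)
grab.raise_for_status()  # raise an exception for HTTP errors such as 404
soup = BeautifulSoup(grab.text, 'html.parser')

# open the file in write mode; "with" closes it automatically
with open("test1.txt", "w", encoding="utf-8") as f:
    # traverse every <a> tag in the parsed page
    for link in soup.find_all("a"):
        data = link.get('href')
        if data:  # skip <a> tags with no href (writing None would raise TypeError)
            f.write(data)
            f.write("\n")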

Output:

(Screenshot not reproduced: test1.txt contains the href value of each link on the page, one per line.)

Explanation:

As in Example 1, we fetch the page with requests.get() and pass grab.text to BeautifulSoup, which converts the document to Unicode and turns HTML entities into Unicode characters. This time, instead of printing the links, we save them: we open test1.txt in write mode, iterate over every <a> tag, and write each href value followed by a newline, so the file ends up with one URL per line. The file is then closed so that everything is written to disk.
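If you want a CSV file rather than plain text, the standard csv module takes care of quoting, which matters because link text often contains commas or quotes. A minimal sketch, assuming you want two columns (the link text and the URL); save_urls_csv and urls.csv are hypothetical names chosen for illustration.

```python
import csv


def save_urls_csv(rows, path):
    """Hypothetical helper: write (link_text, url) pairs to a CSV file with a header row."""
    # newline="" lets the csv module control line endings, as its documentation recommends
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "url"])
        writer.writerows(rows)


if __name__ == "__main__":
    rows = [("About us", "https://example.com/about"),
            ('Say "hi", please', "https://example.com/contact")]
    save_urls_csv(rows, "urls.csv")
    # read the file back to show that commas and quotes survive the round trip
    with open("urls.csv", newline="", encoding="utf-8") as f:
        print(list(csv.reader(f)))
```

With the soup from Example 2, the rows could be built as [(link.get_text(strip=True), link.get('href')) for link in soup.find_all('a') if link.get('href')].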
