In this article, we will discuss how to extract the script and CSS files linked from a web page using Python.
For this, we will download the CSS and JavaScript files that were attached to the source code of the website when it was built. First, the URL of the website to be scraped is determined and a request is sent to it. After retrieving the website's content, two folders are created for the two file types, the files are placed into them, and we can then perform various operations on them according to our needs.
Modules Needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module is not built into Python.
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module is also not built into Python.
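Both modules can be installed from PyPI (assuming pip is available on your system):

```shell
pip install requests bs4
```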
Example 1:
Here we count the number of links fetched for each file type.
Python3
# Import Required Library
import requests
from bs4 import BeautifulSoup

# Web URL (placeholder; replace with the site you want to scrape)
web_url = "https://www.example.com/"

# get HTML content
html = requests.get(web_url).content

# parse HTML content
soup = BeautifulSoup(html, "html.parser")

js_files = []
cs_files = []

for script in soup.find_all("script"):
    # if the script tag has the 'src' attribute
    if script.attrs.get("src"):
        url = script.attrs.get("src")
        js_files.append(web_url + url)

for css in soup.find_all("link"):
    # if the link tag has the 'href' attribute
    if css.attrs.get("href"):
        _url = css.attrs.get("href")
        cs_files.append(web_url + _url)

print(f"Total {len(js_files)} javascript files found")
print(f"Total {len(cs_files)} CSS files found")
Output:
Total 7 javascript files found
Total 14 CSS files found
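Note that concatenating `web_url` with the `src`/`href` value only produces a valid link when the attribute holds a site-relative path; pages often reference scripts and stylesheets with absolute or root-relative URLs. The standard-library `urllib.parse.urljoin` handles all of these cases, so a possible refinement is to use it instead of string concatenation (the URLs below are hypothetical):

```python
from urllib.parse import urljoin

# hypothetical page URL used only to demonstrate the joining rules
web_url = "https://www.example.com/articles/page.html"

# urljoin resolves relative, root-relative and absolute references correctly
print(urljoin(web_url, "static/app.js"))
# https://www.example.com/articles/static/app.js
print(urljoin(web_url, "/static/site.css"))
# https://www.example.com/static/site.css
print(urljoin(web_url, "https://cdn.example.com/lib.js"))
# https://cdn.example.com/lib.js
```

In the loops above, `js_files.append(urljoin(web_url, url))` would then work regardless of how the page references its assets.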
We can also use file handling to write the fetched links into text files.
Example 2:
Python3
# Import Required Library
import requests
from bs4 import BeautifulSoup

# Web URL (placeholder; replace with the site you want to scrape)
web_url = "https://www.example.com/"

# get HTML content
html = requests.get(web_url).content

# parse HTML content
soup = BeautifulSoup(html, "html.parser")

js_files = []
cs_files = []

for script in soup.find_all("script"):
    # if the script tag has the 'src' attribute
    if script.attrs.get("src"):
        url = script.attrs.get("src")
        js_files.append(web_url + url)

for css in soup.find_all("link"):
    # if the link tag has the 'href' attribute
    if css.attrs.get("href"):
        _url = css.attrs.get("href")
        cs_files.append(web_url + _url)

# adding links to the txt files
with open("javascript_files.txt", "w") as f:
    for js_file in js_files:
        print(js_file, file=f)

with open("css_files.txt", "w") as f:
    for css_file in cs_files:
        print(css_file, file=f)
Output: two text files containing the fetched JavaScript and CSS links, one link per line.
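The introduction mentions placing the fetched files into separate folders, while the examples above only collect and record the links. A minimal sketch of that final download step is shown below; the folder names and the `local_name` helper are assumptions for illustration, not part of the original code:

```python
import os
from urllib.parse import urlparse

import requests


def local_name(file_url):
    # derive a file name from the last path segment of the URL
    name = os.path.basename(urlparse(file_url).path)
    return name or "index"


def download_all(urls, folder):
    # create the target folder and save each linked file into it
    os.makedirs(folder, exist_ok=True)
    for file_url in urls:
        response = requests.get(file_url)
        with open(os.path.join(folder, local_name(file_url)), "wb") as f:
            f.write(response.content)


# usage, with js_files / cs_files collected as in the examples above:
# download_all(js_files, "javascript_files")
# download_all(cs_files, "css_files")
```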