Friday, December 27, 2024
Google search engine
HomeLanguagesRetrieve children of the html tag using BeautifulSoup

Retrieve children of the html tag using BeautifulSoup

Prerequisites: Beautifulsoup

Beautifulsoup is a Python module that is used for web scraping. This article discusses how children tags of given HTML tags can be scraped and displayed.

Sample Website:https://www.geeksforgeeks.org/caching-page-tables/

For First Child:

Approach

  • Import module
  • Pass the URL
  • Request page
  • Display the first child using findChild() function

Syntax:

findChild()

Example:

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
child_soup = soup.find('p')
  
print("child :  ", child_soup.findChild())


Output:

child :   <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>

For all children:

For Retrieving children of the HTML tag we have to option either we use .children or .contents. The difference between children and contents is children do not take any memory it gives us an iterable list and the contents give the child tag, but it uses the memory. For large HTML files use children is a better option and for storing value need contents will be better.

Approach

  • Import module
  • Pass the website URL
  • Request page
  • Use either keyword to display the child tags

Using .children:

For Retrieve all the children .children will used.

Example:

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('p')
  
for i in child_soup.children:
    print("child :  ", i)


Output:

child :   <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>

child :    is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called 

child :   <strong>frames</strong>

child :    and logical memory into blocks of same size called 

child :   <strong>pages</strong>

child :   . While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.

Using .contents

It will also return all the child tag and store them in memory.

Example

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
child_soup = soup.find('p')
  
print("child :  ", child_soup.contents)


Output

child :   [<a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>, ‘ is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called ‘, <strong>frames</strong>, ‘ and logical memory into blocks of same size called ‘, <strong>pages</strong>, ‘. While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.’]

Dominic Rubhabha-Wardslaus
Dominic Rubhabha-Wardslaushttp://wardslaus.com
infosec,malicious & dos attacks generator, boot rom exploit philanthropist , wild hacker , game developer,
RELATED ARTICLES

Most Popular

Recent Comments