Prerequisites: Beautifulsoup
Beautifulsoup is a Python module that is used for web scraping. This article discusses how children tags of given HTML tags can be scraped and displayed.
Sample Website:https://www.geeksforgeeks.org/caching-page-tables/
For First Child:
Approach
- Import module
- Pass the URL
- Request page
- Display the first child using findChild() function
Syntax:
findChild()
Example:
Python3
from bs4 import BeautifulSoup import requests # sample web page # call get method to request that page page = requests.get(sample_web_page) # with the help of beautifulSoup and html parser create soup soup = BeautifulSoup(page.content, "html.parser" ) child_soup = soup.find( 'p' ) print ( "child : " , child_soup.findChild()) |
Output:
child : <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>
For all children:
For Retrieving children of the HTML tag we have to option either we use .children or .contents. The difference between children and contents is children do not take any memory it gives us an iterable list and the contents give the child tag, but it uses the memory. For large HTML files use children is a better option and for storing value need contents will be better.
Approach
- Import module
- Pass the website URL
- Request page
- Use either keyword to display the child tags
Using .children:
For Retrieve all the children .children will used.
Example:
Python3
from bs4 import BeautifulSoup import requests # sample web page # call get method to request that page page = requests.get(sample_web_page) # with the help of beautifulSoup and html parser create soup soup = BeautifulSoup(page.content, "html.parser" ) child_soup = soup.find( 'p' ) for i in child_soup.children: print ( "child : " , i) |
Output:
child : <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>
child : is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called
child : <strong>frames</strong>
child : and logical memory into blocks of same size called
child : <strong>pages</strong>
child : . While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.
Using .contents
It will also return all the child tag and store them in memory.
Example
Python3
from bs4 import BeautifulSoup import requests # sample web page # call get method to request that page page = requests.get(sample_web_page) # with the help of beautifulSoup and html parser create soup soup = BeautifulSoup(page.content, "html.parser" ) child_soup = soup.find( 'p' ) print ( "child : " , child_soup.contents) |
Output
child : [<a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>, ‘ is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called ‘, <strong>frames</strong>, ‘ and logical memory into blocks of same size called ‘, <strong>pages</strong>, ‘. While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.’]