BeautifulSoup – Find all <li> in <ul>

28 July 2024


Beautiful Soup is a Python library used for web scraping. In this article, we will discuss how the contents of <li> tags can be retrieved from a <ul> tag using Beautiful Soup.

Modules Needed:

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files.
requests: Requests allows you to send HTTP/1.1 requests extremely easily.

Neither module comes built in with Python; install both with pip install beautifulsoup4 requests.

Approach

Import the modules
Provide a URL whose page has ul and li tags
Send the request
Create the BeautifulSoup object
Find the required tags
Retrieve the contents under li
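The steps above can be sketched end to end. To keep the sketch runnable without network access, it parses a hypothetical inline HTML string instead of downloading a page; in the examples that follow, requests.get(url) supplies the HTML. The function name li_texts is illustrative, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the downloaded HTML (steps 2 and 3)
SAMPLE_HTML = """
<html>
  <body>
    <ul>
      <li>Apples</li>
      <li>Bananas</li>
      <li>Cherries</li>
    </ul>
  </body>
</html>
"""


def li_texts(html: str) -> list[str]:
    """Return the stripped text of every <li> inside the first <ul> (hypothetical helper)."""
    # Step 4: create the BeautifulSoup object with Python's built-in parser
    soup = BeautifulSoup(html, "html.parser")
    # Step 5: find the required <ul> tag; find() returns None if there is none
    ul = soup.find("ul")
    if ul is None:
        return []
    # Step 6: retrieve the contents under each <li>
    return [li.get_text(strip=True) for li in ul.find_all("li")]


print(li_texts(SAMPLE_HTML))  # ['Apples', 'Bananas', 'Cherries']
```

Returning an empty list when no <ul> exists avoids the AttributeError you would otherwise get from calling find_all() on None.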

The page used in the examples below contains a <body> with a <ul> tag whose <li> items hold the text we want; the BeautifulSoup object gives us access to these tags after parsing.

Method 1: Using descendants and find()

In this method, we use the descendants attribute of a Beautiful Soup tag, which returns a generator over everything nested inside the parent tag: its children, their children, and so on, including text strings. Here the parent is the <ul> tag.

First, import the required modules, then provide the URL and send a request whose response will be parsed by the BeautifulSoup object. Next, use the find() method to locate the <body> tag and then the first <ul> tag inside it. The descendants attribute returns a generator, so we convert it to a list. For a typical list, this sequence interleaves three kinds of node: the newline strings between items, the <li> Tag objects, and the text strings inside them. Because the position of the text depends on the page's whitespace and nesting, indexing the list with a fixed stride is fragile; instead, we print every descendant that is a non-blank text string.
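To see the sequence that descendants produces, here is a small hypothetical two-item list parsed with html.parser. With one newline between items, the text strings land at indices 2 and 5 (every third position), and the stride would shift again if the markup were minified or if an <li> wrapped its text in another tag such as <a>.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical two-item list; the newlines between tags become text nodes
html = "<ul>\n<li>First</li>\n<li>Second</li>\n</ul>"
ul = BeautifulSoup(html, "html.parser").find("ul")

# Walk the descendants in document order and label each node
nodes = list(ul.descendants)
for index, node in enumerate(nodes):
    if isinstance(node, Tag):
        print(index, "Tag", node.name)
    elif isinstance(node, NavigableString):
        print(index, "Text", repr(str(node)))

# Expected output:
# 0 Text '\n'
# 1 Tag li
# 2 Text 'First'
# 3 Text '\n'
# 4 Tag li
# 5 Text 'Second'
# 6 Text '\n'
```

Filtering on isinstance(node, NavigableString) and node.strip() picks out 'First' and 'Second' regardless of where they fall in the list.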

Example:


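# importing the modules
import requests
from bs4 import BeautifulSoup, NavigableString

# providing url
url = "https://auth.geeksforgeeks.org/user/adityaprasad1308/articles"

# sending the request and stopping on HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()
html = response.content

# creating soup object
data = BeautifulSoup(html, 'html.parser')

# finding the first <ul> tag inside <body>
parent = data.find("body").find("ul")

# collecting every descendant of <ul>: whitespace strings, <li> tags and their text
nodes = list(parent.descendants)

# printing only the non-blank text nodes, i.e. the content of the <li> tags
for node in nodes:
    if isinstance(node, NavigableString) and node.strip():
        print(node.strip(), end=" ")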


Method 2: Using find_all()

The approach is the same as in the previous example, but instead of locating the <body> first, we call find('ul') directly on the soup to get the first <ul> tag. We then call find_all('li') on it; find_all() takes a tag name as an argument and returns every matching tag, here all the <li> tags inside that <ul>. Finally, we iterate over the <li> tags and print the text each one contains.
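One caveat: find_all() searches all descendants by default, so if an <li> contains a nested <ul>, the nested items are returned too. Passing recursive=False limits the search to direct children. The nested list below is hypothetical, and next(li.stripped_strings) is used only to grab each item's own leading label.

```python
from bs4 import BeautifulSoup

# Hypothetical list with a nested sub-list under the second item
html = """
<ul id="outer">
  <li>Fruit</li>
  <li>Vegetables
    <ul>
      <li>Carrot</li>
      <li>Leek</li>
    </ul>
  </li>
</ul>
"""
outer = BeautifulSoup(html, "html.parser").find("ul", id="outer")

# Default (recursive=True): every <li> at any depth below the outer <ul>
all_labels = [next(li.stripped_strings) for li in outer.find_all("li")]

# recursive=False: only the <li> tags that are direct children of the outer <ul>
top_labels = [next(li.stripped_strings) for li in outer.find_all("li", recursive=False)]

print(all_labels)  # ['Fruit', 'Vegetables', 'Carrot', 'Leek']
print(top_labels)  # ['Fruit', 'Vegetables']
```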

Example:


# importing the modules
import requests
from bs4 import BeautifulSoup

# providing url
url = 'https://auth.geeksforgeeks.org/user/adityaprasad1308/articles'

# creating request object and stopping on HTTP errors
req = requests.get(url, timeout=10)
req.raise_for_status()

# creating soup object with Python's built-in HTML parser
data = BeautifulSoup(req.text, 'html.parser')

# finding the first <ul> tag, then all <li> tags in it, and printing their text
data1 = data.find('ul')
for li in data1.find_all("li"):
    print(li.get_text(strip=True), end=" ")
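As an alternative to find_all(), and not part of the method above, Beautiful Soup also accepts CSS selectors through select(), which relies on the soupsieve package that recent beautifulsoup4 releases install as a dependency. The sketch below uses hypothetical HTML to show one practical difference: find('ul') only looks at the first list, while the selector ul > li collects direct <li> children from every <ul> on the page.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two separate lists
html = """
<body>
  <ul><li>One</li><li>Two</li></ul>
  <p>Not a list item</p>
  <ul><li>Three</li></ul>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# find() + find_all(): only the items of the first <ul>
first_list = [li.get_text(strip=True) for li in soup.find("ul").find_all("li")]

# select() with the CSS child combinator: <li> children of every <ul>
every_list = [li.get_text(strip=True) for li in soup.select("ul > li")]

print(first_list)  # ['One', 'Two']
print(every_list)  # ['One', 'Two', 'Three']
```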

