Sunday, November 17, 2024

Python BeautifulSoup Navigating tree sideways

In this article, we will see how to navigate the BeautifulSoup parse tree sideways. Navigating sideways means moving between tags that are on the same level of the tree. The example below illustrates the idea.

<a>
<b></b>
<c></c>
</a>

In the above example, the tags <b> and <c> are at the same level.

Installation of Required Modules:

bs4: BeautifulSoup is not part of the Python standard library, so it has to be installed manually. Install it by running the command below:

pip install bs4

lxml: lxml is a mature Python binding for the libxml2 and libxslt libraries. It exposes them through the ElementTree API, which provides safe and convenient access to those libraries.

pip install lxml 

Let’s understand with implementation:

prettify(): The prettify() method in BeautifulSoup lets us see how the tags in a document are nested.

Syntax: (BeautifulSoup Variable).prettify()

Example :

Python3




import bs4
 
 
sibling_soup = bs4.BeautifulSoup("<a><b>Welcome to GeekforLazyroar</b>\
<c>Hello Lazyroar</c></a>", 'html.parser')
print(sibling_soup.prettify())


Output:

<a>
 <b>
  Welcome to GeekforLazyroar
 </b>
 <c>
  Hello Lazyroar
 </c>
</a>

Navigating sideways

We can navigate sideways in a document using the .next_sibling and .previous_sibling attributes of BeautifulSoup. These two attributes let us move between tags that are on the same level of the tree.

Let us look at a worked example.

Consider a sample document:

Python3




# For importing BeautifulSoup
import bs4
 
 
# initiating variable of BeautifulSoup
sibling_of_soup = bs4.BeautifulSoup("<a><b>CPPSecrets</b><c><strong>\
C++ Python Professional HandBook Guide</strong></c></a>", 'lxml')
 
# To print contents in the initiated BeautifulSoup
print(sibling_of_soup.prettify())


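Output:

<html>
 <body>
  <a>
   <b>
    CPPSecrets
   </b>
   <c>
    <strong>
     C++ Python Professional HandBook Guide
    </strong>
   </c>
  </a>
 </body>
</html>

The lxml parser wraps the fragment in <html> and <body> tags. In the output, the <b> and <c> tags are on the same level and are both children of the same <a> tag, so they are siblings.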
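
The sibling relationship can also be checked in code. The following is a minimal sketch, assuming the built-in 'html.parser' so that it runs without lxml; the variable names soup and same_parent are illustrative and not part of the original example. Two tags are siblings when their .parent attribute refers to the same tag.

```python
import bs4

# Same markup as the sample document. 'html.parser' is an assumption made
# here so the sketch runs without lxml installed.
soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>"
    "C++ Python Professional HandBook Guide</strong></c></a>",
    "html.parser",
)

# Siblings share one parent: both .parent attributes point at the <a> tag.
same_parent = soup.b.parent is soup.c.parent
print(soup.b.parent.name)  # a
print(same_parent)         # True
```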

Because the <b> and <c> tags are siblings, we can navigate between them using:

  • .next_sibling
  • .previous_sibling

1. Navigating Using .next_sibling:

Python3




import bs4
 
 
sibling_of_soup = bs4.BeautifulSoup("<a><b>CPPSecrets</b><c><strong>\
C++ Python Professional HandBook Guide</strong></c></a>", 'lxml')
 
# printing contents in BeautifulSoup Variable
print(sibling_of_soup.b.next_sibling)


Output:

<c><strong>C++ Python Professional HandBook Guide</strong></c>

The code above prints the <c> tag together with its contents, because the next sibling of the <b> tag is <c>.

If we instead print the next sibling of the <c> tag:

Python3




import bs4
 
 
sibling_of_soup = bs4.BeautifulSoup("<a><b>CPPSecrets</b><c><strong>\
C++ Python Professional HandBook Guide</strong></c></a>", 'lxml')
 
# Implementing Navigation on sibling
print(sibling_of_soup.c.next_sibling)


Output:

None

Here the output is None, because no tag follows <c> on the same level.
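
Since .next_sibling returns None once the last sibling has been passed, it can drive a simple loop over every tag on a level. The following is a sketch under stated assumptions: the markup with an extra <d> tag, the 'html.parser' choice, and the names node and names are all hypothetical and chosen only for illustration.

```python
import bs4

# Hypothetical markup with three siblings and no whitespace between tags.
soup = bs4.BeautifulSoup(
    "<a><b>one</b><c>two</c><d>three</d></a>", "html.parser"
)

# Start at the first child and follow .next_sibling until it returns None.
node = soup.b
names = []
while node is not None:
    names.append(node.name)
    node = node.next_sibling

print(names)  # ['b', 'c', 'd']
```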

2. Navigating Using .previous_sibling:

Python3




import bs4
 
 
sibling_of_soup = bs4.BeautifulSoup("<a><b>CPPSecrets</b><c><strong>\
C++ Python Professional HandBook Guide</strong></c></a>", 'lxml')

# Navigating backwards from each sibling
print(sibling_of_soup.c.previous_sibling)
print(sibling_of_soup.b.previous_sibling)


Output:

<b>CPPSecrets</b>
None

Applying .previous_sibling to the <c> tag returns the <b> tag, because <b> is the sibling that comes immediately before it. Applying .previous_sibling to the <b> tag returns None, because no sibling comes before <b>.
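
The examples in this article contain no whitespace between tags. In markup that does have line breaks between tags, the newline itself is a text node, so .next_sibling and .previous_sibling may return that string rather than a tag. The sketch below assumes a hypothetical multi-line document and the 'html.parser' parser; it uses BeautifulSoup's find_next_sibling() and find_previous_sibling() methods, which skip text nodes when called without arguments.

```python
import bs4

# Hypothetical markup with a newline between each tag.
markup = """<a>
<b>first</b>
<c>second</c>
</a>"""
soup = bs4.BeautifulSoup(markup, "html.parser")

# The immediate sibling of <b> is the newline text node, not <c>.
raw_next = soup.b.next_sibling
print(isinstance(raw_next, bs4.NavigableString))  # True

# find_next_sibling() / find_previous_sibling() return the nearest tag.
next_tag = soup.b.find_next_sibling()
prev_tag = soup.c.find_previous_sibling()
print(next_tag)  # <c>second</c>
print(prev_tag)  # <b>first</b>
```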

Dominic Rubhabha-Wardslaus