In this article, we will see how to navigate the BeautifulSoup parse tree sideways. Navigating sideways means moving between tags that are on the same level of the tree. The example below illustrates the idea.
<a> <b></b> <c></c> </a>
In the above example, the tags <b> and <c> are at the same level.
Installation of Required Modules:
bs4: BeautifulSoup is not part of the Python standard library, so it has to be installed separately. Install it by running the following command:
pip install bs4
lxml: lxml is a mature Python binding for the libxml2 and libxslt libraries. It exposes them through the ElementTree API, giving safe and convenient access to those libraries. BeautifulSoup can use it as a parser.
pip install lxml
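After installing, it can be useful to confirm that both packages are importable before moving on. The sketch below is one possible check that uses only the standard library. The helper name is_installed is hypothetical and is not part of bs4 or lxml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages used in this article.
for name in ("bs4", "lxml"):
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

If either package is reported as missing, re-run the matching pip command in the same Python environment.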
Let’s understand this with an implementation:
prettify(): The prettify() method in BeautifulSoup returns the document as an indented string, which lets us see how the tags are nested.
Syntax: (BeautifulSoup Variable).prettify()
Example :
Python3
import bs4

# Parse a small document with the built-in html.parser
sibling_soup = bs4.BeautifulSoup(
    "<a><b>Welcome to GeekforLazyroar</b><c>Hello Lazyroar</c></a>",
    "html.parser",
)

# Print the document with one tag or string per line, indented by depth
print(sibling_soup.prettify())
Output:
<a>
 <b>
  Welcome to GeekforLazyroar
 </b>
 <c>
  Hello Lazyroar
 </c>
</a>
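To see what prettify() changes, the following sketch compares it with the compact form produced by str(). The markup is a shortened version of the example above, and the variable names are only illustrative.

```python
import bs4

markup = "<a><b>Welcome</b><c>Hello</c></a>"
soup = bs4.BeautifulSoup(markup, "html.parser")

compact = str(soup)       # the document on a single line
pretty = soup.prettify()  # one tag or string per line, indented by depth

print(compact)
print(pretty)
```

Note that prettify() only changes how the document is printed. The parse tree itself is the same in both cases.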
Navigating sideways
We can navigate sideways in a document using the .next_sibling and .previous_sibling attributes of BeautifulSoup. These two attributes let us move between tags that are on the same level of the tree.
Let us get a better insight into the concept through an example.
Consider a sample document:
Python3
# For importing BeautifulSoup
import bs4

# initiating variable of BeautifulSoup
sibling_of_soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>"
    "C++ Python Professional HandBook Guide</strong></c></a>",
    "lxml",
)

# To print contents in the initiated BeautifulSoup
print(sibling_of_soup.prettify())
Output:
<html>
 <body>
  <a>
   <b>
    CPPSecrets
   </b>
   <c>
    <strong>
     C++ Python Professional HandBook Guide
    </strong>
   </c>
  </a>
 </body>
</html>
In the above document, we can see that the <b> and <c> tags are on the same level and are both children of the same tag, <a>. Hence, we can classify them as siblings.
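One way to confirm that two tags are siblings is to check that they share the same parent. The sketch below uses the built-in html.parser so that it runs without lxml, and the markup is a shortened version of the sample document.

```python
import bs4

soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>Guide</strong></c></a>",
    "html.parser",
)

b_tag = soup.b
c_tag = soup.c

# Siblings hang off the same parent object in the parse tree.
same_parent = b_tag.parent is c_tag.parent
print(b_tag.parent.name)
print(same_parent)

# <strong> is nested inside <c>, so it is one level deeper
# and is not a sibling of <b>.
print(soup.strong.parent.name)
```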
Now we can navigate between the sibling tags <b> and <c> by using:
- .next_sibling
- .previous_sibling
1. Navigating using .next_sibling:
Python3
import bs4

sibling_of_soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>"
    "C++ Python Professional HandBook Guide</strong></c></a>",
    "lxml",
)

# printing the next sibling of the b tag
print(sibling_of_soup.b.next_sibling)
Output:
<c><strong>C++ Python Professional HandBook Guide</strong></c>
The above code prints the <c> tag together with its contents: the next sibling of the <b> tag is <c>, so .next_sibling navigates to it.
If we instead print the next sibling of the c tag:
Python3
import bs4

sibling_of_soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>"
    "C++ Python Professional HandBook Guide</strong></c></a>",
    "lxml",
)

# Implementing navigation on the last sibling
print(sibling_of_soup.c.next_sibling)
Output:
None
In the above code, the output is None because <c> is the last child of <a>, so there is no sibling after it.
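The examples above use markup with no whitespace between tags. Real documents usually have newlines or spaces between tags, and those text nodes count as siblings too, so .next_sibling may return a string instead of a tag. The sketch below illustrates this with html.parser and made-up markup, and uses find_next_sibling() with a tag name to reach the next matching tag.

```python
import bs4

# Newlines between the tags become text nodes in the parse tree.
markup = "<a>\n<b>one</b>\n<c>two</c>\n</a>"
soup = bs4.BeautifulSoup(markup, "html.parser")

# The node directly after <b> is the newline, not the <c> tag.
text_node = soup.b.next_sibling
print(repr(text_node))

# find_next_sibling("c") searches the following siblings for a <c> tag,
# passing over the whitespace string.
next_tag = soup.b.find_next_sibling("c")
print(next_tag)
```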
2. Navigating using .previous_sibling:
Python3
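import bs4

sibling_of_soup = bs4.BeautifulSoup(
    "<a><b>CPPSecrets</b><c><strong>"
    "C++ Python Professional HandBook Guide</strong></c></a>",
    "lxml",
)

# previous sibling of the c tag, then of the b tag
print(sibling_of_soup.c.previous_sibling)
print(sibling_of_soup.b.previous_sibling)
Output:
<b>CPPSecrets</b>
None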
In the above code, .previous_sibling on the c tag returns the b tag, because <b> is the sibling immediately before <c>. Applying .previous_sibling to the b tag returns None, as no sibling occurs before <b>.
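Following .next_sibling repeatedly until it returns None visits every remaining node on the same level. The sketch below wraps that loop in a small generator. Here iter_next_siblings is a hypothetical helper written for this illustration, and the markup is made up. BeautifulSoup also provides a built-in .next_siblings generator for the same purpose.

```python
import bs4


def iter_next_siblings(node):
    """Yield each sibling that follows `node`, in document order."""
    current = node.next_sibling
    while current is not None:
        yield current
        current = current.next_sibling


soup = bs4.BeautifulSoup(
    "<a><b>one</b><c>two</c><d>three</d></a>",
    "html.parser",
)

# Collect the names of all tags that come after <b> on the same level.
names = [sibling.name for sibling in iter_next_siblings(soup.b)]
print(names)
```

The same pattern works in the other direction by replacing .next_sibling with .previous_sibling.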