Get contents of entire page using Selenium

27 July 2024

In this article, we will discuss two ways to get the contents of an entire page using Selenium: reading the visible text of the page, and reading its HTML source. Let’s discuss them in detail.

Method 1:

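To extract the visible text of the entire page, we first locate an element with one of the find_element_by_* methods. In Selenium 4.3 and later these helpers have been removed, and the same lookup is written as find_element() with a By locator. We then read the element's text property (an attribute, not a method call), which returns the text rendered inside that web element.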

Approach:

Import module
Instantiate driver
Get content of the page
Display contents scraped
Close driver

Syntax:

driver.find_element(By.XPATH, "/html/body").text
(Selenium 4; in older releases: driver.find_element_by_xpath("/html/body").text)

Methods to find or locate an element on a page (each returns the first match):

find_element_by_link_text
find_element_by_partial_link_text
find_element_by_xpath
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
find_element_by_id
find_element_by_name

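We can use any of the above methods to locate an element on a page, choosing whichever suits the requirement. The most commonly used is find_element_by_xpath, since an XPath expression can address any element. In Selenium 4.3 and later these helpers no longer exist; the equivalent call is find_element(By.XPATH, ...), with By.ID, By.NAME, By.TAG_NAME, By.CLASS_NAME, By.CSS_SELECTOR, By.LINK_TEXT and By.PARTIAL_LINK_TEXT taking the place of the other methods. To get every match instead of only the first, use find_elements.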
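As a rough illustration of how the legacy method names relate to the Selenium 4 call, the sketch below maps each name to its locator strategy and forwards to find_element. The strategy strings are the values of the By constants (for example, By.XPATH is "xpath"), written out literally so the snippet needs no import. The names LEGACY_LOCATORS and find_legacy are hypothetical and not part of Selenium.

```python
# Hypothetical helper: translate a legacy find_element_by_* name into the
# Selenium 4 call driver.find_element(strategy, value).
# Each string is the value of the matching By constant noted beside it.
LEGACY_LOCATORS = {
    "find_element_by_id": "id",                                 # By.ID
    "find_element_by_name": "name",                             # By.NAME
    "find_element_by_xpath": "xpath",                           # By.XPATH
    "find_element_by_tag_name": "tag name",                     # By.TAG_NAME
    "find_element_by_class_name": "class name",                 # By.CLASS_NAME
    "find_element_by_css_selector": "css selector",             # By.CSS_SELECTOR
    "find_element_by_link_text": "link text",                   # By.LINK_TEXT
    "find_element_by_partial_link_text": "partial link text",   # By.PARTIAL_LINK_TEXT
}


def find_legacy(driver, legacy_name, value):
    """Locate one element using the strategy behind a legacy method name."""
    try:
        strategy = LEGACY_LOCATORS[legacy_name]
    except KeyError:
        raise ValueError(f"unknown locator method: {legacy_name}") from None
    return driver.find_element(strategy, value)
```

With a real driver, find_legacy(driver, "find_element_by_xpath", "/html/body").text would read the same text as the syntax shown earlier. In new code it is simpler to call driver.find_element(By.XPATH, ...) directly; the mapping is only meant to help when reading or porting older scripts.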

Program:

Python3

# importing the modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# using webdriver for chrome browser
# (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# using target url
driver.get(
    "https://www.geeksforgeeks.org/competitive-programming-a-complete-guide/")

# printing the visible text of the entire page
print(driver.find_element(By.XPATH, "/html/body").text)

# ending the session and closing the browser
driver.quit()

Output:

The visible text of the page is printed to the console.

Method 2:

There is another way to achieve this. The single expression driver.page_source returns the HTML source of the whole page, markup included, rather than only its visible text. Once we have the source, we write it to a file named result.html.

Approach:

Import module
Instantiate webdriver
Get contents from the URL
Open a file
Save contents to a file
Close file
Close driver

Syntax:

driver.page_source

Program:

Python3

# Importing the required modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# using chrome browser
# (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# Target url
driver.get(
    "https://www.geeksforgeeks.org/competitive-programming-a-complete-guide/")

# Storing the page source (a str of HTML) in page variable
page = driver.page_source

# Write the entire page content to result.html as UTF-8;
# the with block closes the file automatically
with open('result.html', 'w', encoding='utf-8') as file_:
    file_.write(page)

# Ending the session and closing the browser
driver.quit()

Output:

The program creates result.html in the current working directory, containing the HTML source of the page.
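The steps of Method 2 can also be wrapped in a small reusable function. The following is a minimal sketch, not the article's own program: save_page_source is a hypothetical name, and it assumes the caller passes in an already created driver. The try/finally is a design choice so that the browser is shut down even if loading the page or writing the file fails.

```python
from pathlib import Path


def save_page_source(driver, url, path="result.html"):
    """Load url, write driver.page_source to path as UTF-8, then quit.

    Returns the number of characters written.
    """
    try:
        driver.get(url)
        # page_source is a str; write_text encodes it as UTF-8
        # and returns the number of characters written.
        return Path(path).write_text(driver.page_source, encoding="utf-8")
    finally:
        # Runs even if get() or the write raises, so no browser is left open.
        driver.quit()
```

With a Chrome driver created as in the program above, save_page_source(driver, "https://www.geeksforgeeks.org/competitive-programming-a-complete-guide/") would produce the same result.html. Note that the function quits the driver, so the driver cannot be reused afterwards.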
