Pagination using Scrapy – Web Scraping with Python

27 July 2024


Web scraping is a technique for fetching information from websites, and Scrapy is a Python framework for web scraping. Getting data from a single page is easy: pull the page's HTML and filter it by tags. But what if the data you are trying to fetch is paginated? For example, an Amazon product search can span multiple pages, and to scrape every product successfully you need to handle pagination.

Pagination: Pagination, also known as paging, is the process of dividing a document into discrete pages, so the data is spread across different pages. Each page has its own URL, so we need to take these URLs one by one and scrape each page. The thing to keep in mind is when to stop. Pages generally have a next button, which is enabled while more pages remain and becomes disabled on the last page. The method used here keeps fetching the next page's URL as long as the next button is enabled; once it is disabled, no page is left to scrape.
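The stopping rule described above can be sketched without any network access. This is a minimal illustration, not Scrapy code: FAKE_SITE, its page paths, and scrape_all are hypothetical names invented for the example, and a next link of None stands in for a disabled next button.

```python
# Hypothetical in-memory "site": each URL maps to its items and to the
# link behind the next button (None when the button is disabled).
FAKE_SITE = {
    "/page1": {"items": ["phone A", "phone B"], "next": "/page2"},
    "/page2": {"items": ["phone C"], "next": "/page3"},
    "/page3": {"items": ["phone D", "phone E"], "next": None},
}


def scrape_all(start_url, site):
    """Collect items page by page until no next link is left."""
    collected = []
    visited = []
    url = start_url
    while url is not None:
        page = site[url]
        collected.extend(page["items"])
        visited.append(url)
        url = page["next"]  # None on the last page ends the loop
    return collected, visited


items, pages = scrape_all("/page1", FAKE_SITE)
print(pages)
print(items)
```

In a real spider the loop is implicit: each parsed page yields a request for the next one, and the chain ends when no next link is found.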

Project to apply pagination using scrapy

This project scrapes mobile phone details from the Amazon site and applies pagination.
The scraped details are the name and price of each mobile, and pagination is used to scrape all the results for the search URL used in the spider code.

Logic behind pagination:
Here the next_page variable gets the URL of the next page only if a next page is available. If no page is left, get() returns None and the if condition is false, so no further request is made.

next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
if next_page:
    abs_url = f"https://www.amazon.in{next_page}"
    # request the next page only when a next link was found
    yield scrapy.Request(
        url=abs_url,
        callback=self.parse
    )

Note:

abs_url = f"https://www.amazon.in{next_page}"

The https://www.amazon.in prefix is needed because next_page is a relative path such as /page2. That is incomplete on its own; the complete URL is https://www.amazon.in/page2
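As an alternative to concatenating strings by hand, the same absolute URL can be built with urljoin from Python's standard library, which resolves a relative link against the URL of the current page. The snippet below is a small sketch; the current_page value is an assumed example and not the search URL used in this project.

```python
from urllib.parse import urljoin

# Assumed example values: the URL of the page being parsed and the
# relative href taken from its next button.
current_page = "https://www.amazon.in/s?k=mobile"
next_page = "/page2"

# urljoin keeps the scheme and host of current_page and replaces the path.
abs_url = urljoin(current_page, next_page)
print(abs_url)
```

Scrapy response objects also provide a urljoin method that resolves a link against the response's own URL, so the domain does not have to be hard-coded in the spider.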

Fetch the XPath of the details that need to be scraped:
Follow the steps below to get each XPath:
xpath of items:

xpath of name:

xpath of price:

xpath of next page:
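To see what the next-page XPath matches, the sketch below runs a similar path expression against a tiny hand-written fragment. Everything here is an assumption made for illustration: the markup is simplified and is not copied from Amazon, next_page_href is a hypothetical helper, and xml.etree.ElementTree from the standard library stands in for Scrapy's selectors. ElementTree supports only a subset of XPath, so the href attribute is read with get() instead of /@href.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical pagination markup for a middle page ...
MIDDLE_PAGE = """
<div><div><ul>
  <li class="a-normal"><a href="/s?page=1">1</a></li>
  <li class="a-last"><a href="/s?page=2">Next</a></li>
</ul></div></div>
"""

# ... and for the last page, where the next button has no link.
LAST_PAGE = """
<div><div><ul>
  <li class="a-normal"><a href="/s?page=2">2</a></li>
  <li class="a-disabled a-last">Next</li>
</ul></div></div>
"""


def next_page_href(markup):
    """Return the href of the next button, or None if there is none."""
    root = ET.fromstring(markup)
    link = root.find(".//li[@class='a-last']/a")
    return link.get("href") if link is not None else None


print(next_page_href(MIDDLE_PAGE))
print(next_page_href(LAST_PAGE))
```

The predicate compares the whole class attribute, so an li whose class is "a-disabled a-last" does not match, and the helper returns None for the last page.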

Spider Code: The code below scrapes the name and price from the Amazon site and applies pagination.

import scrapy


class MobilesSpider(scrapy.Spider):
    name = 'mobiles'

    # create the initial request object
    def start_requests(self):
        yield scrapy.Request(
            url='https://www.amazon.in/s?k=xiome+mobile+phone&crid'
            + '=2AT2IRC7IKO1K&sprefix=xiome%2Caps%2C302&ref=nb_sb_ss_i_1_5',
            callback=self.parse
        )

    # parse products
    def parse(self, response):
        products = response.xpath("//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']")
        for product in products:
            yield {
                'name': product.xpath(".//span[@class='a-size-medium a-color-base a-text-normal']/text()").get(),
                'price': product.xpath(".//span[@class='a-price-whole']/text()").get()
            }

        print()
        print("Next page")
        print()
        # href of the next button; None when no next page is left
        next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
        if next_page:
            abs_url = f"https://www.amazon.in{next_page}"
            yield scrapy.Request(
                url=abs_url,
                callback=self.parse
            )
        else:
            print()
            print('No Page Left')
            print()

Scraped Results:

Last Updated :
30 Sep, 2021
