Pagination using Scrapy
Web scraping is a technique for fetching information from websites, and Scrapy is a Python framework for web scraping. Getting data from an ordinary website is fairly easy: pull the HTML of the page and extract the data by filtering tags. But what if the data you are trying to fetch is paginated? For example, Amazon's product search results can span multiple pages, so scraping all of the products requires handling pagination.
Pagination: Pagination, also known as paging, is the process of dividing a document into discrete pages, so that the data is split into bundles shown on different pages. Each page has its own URL, so the scraper has to visit these URLs one by one and scrape each page. The key question is when to stop. Pages usually have a Next button that stays enabled while more pages exist and becomes disabled on the last page. The approach used here is to keep following the Next button's URL while it is enabled; once it is disabled, no pages are left to scrape.
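The stopping rule can be illustrated without any network access. The sketch below is a hypothetical, minimal simulation: `SITE` and `scrape_all` are made-up names, and the dictionary stands in for real pages, where a next URL of `None` plays the role of a disabled Next button. Scrapy does the same thing asynchronously by yielding a new request from `parse` instead of running a loop.

```python
# Hypothetical in-memory "site": each URL maps to (items on that page,
# URL of the next page, or None when the Next button is disabled).
SITE = {
    "/page1": (["phone A", "phone B"], "/page2"),
    "/page2": (["phone C", "phone D"], "/page3"),
    "/page3": (["phone E"], None),  # last page: no next link
}


def scrape_all(start_url):
    """Follow 'next' links until a page has none, collecting every item."""
    items = []
    url = start_url
    while url is not None:
        # Looking up the dict stands in for downloading and parsing a page.
        page_items, next_url = SITE[url]
        items.extend(page_items)
        # None once the Next button is disabled, which ends the loop.
        url = next_url
    return items


print(scrape_all("/page1"))
# ['phone A', 'phone B', 'phone C', 'phone D', 'phone E']
```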
Project: applying pagination using Scrapy
The following project scrapes mobile phone details from the Amazon site and applies pagination.
The scraped details are the name and price of each mobile, and pagination is used to scrape all the results for the searched URL.
Here, the next_page variable gets the URL of the next page only if a next page is available. If no page is left, the XPath matches nothing, .get() returns None, and the if condition evaluates to false.
next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
if next_page:
    yield scrapy.Request(url=abs_url, callback=self.parse)
Note:
abs_url = f"https://www.amazon.in{next_page}"
The domain https://www.amazon.in has to be prepended because next_page holds only a relative path such as /page2, which is incomplete on its own. The complete URL is https://www.amazon.in/page2.
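Prepending the domain works because the href starts with "/". A slightly more general option, sketched below, is to resolve the href against the URL of the current page with the standard library's `urllib.parse.urljoin`, which also leaves an already-absolute href unchanged. Inside a Scrapy callback, `response.urljoin(next_page)` does the same thing relative to `response.url`. The URLs here are illustrative values, not real Amazon pages.

```python
from urllib.parse import urljoin

# Hypothetical values: the page being parsed and the relative href found on it.
current_url = "https://www.amazon.in/s?k=phone"
next_page = "/page2"

# Approach used in the spider: prepend the domain by hand.
abs_url = f"https://www.amazon.in{next_page}"

# urljoin resolves the href against the current page's URL instead.
joined = urljoin(current_url, next_page)

print(abs_url)  # https://www.amazon.in/page2
print(joined)   # https://www.amazon.in/page2
# An href that is already absolute is returned unchanged.
print(urljoin(current_url, "https://www.amazon.in/page3"))  # https://www.amazon.in/page3
```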
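- Fetch the XPath of each detail to be scraped.
Get the XPath of each element (for example, by inspecting it with the browser's developer tools). The XPaths used in the spider are:
XPath of items: //div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']
XPath of name: .//span[@class='a-size-medium a-color-base a-text-normal']/text()
XPath of price: .//span[@class='a-price-whole']/text()
XPath of next page: //div/div/ul/li[@class='a-last']/a/@href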
- Spider code: scraping the name and price of mobiles from the Amazon site and applying pagination.
import scrapy


class MobilesSpider(scrapy.Spider):
    name = 'mobiles'

    # create the initial request object
    def start_requests(self):
        yield scrapy.Request(
            url=(
                '...'  # the beginning of the search URL is missing from the original listing
                + '=2AT2IRC7IKO1K&sprefix=xiome%2Caps%2C302&ref=nb_sb_ss_i_1_5'
            ),
            callback=self.parse
        )

    # parse products
    def parse(self, response):
        # each search result is wrapped in a div with these classes
        products = response.xpath(
            "//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']"
        )

        for product in products:
            yield {
                'name': product.xpath(
                    ".//span[@class='a-size-medium a-color-base a-text-normal']/text()"
                ).get(),
                'price': product.xpath(
                    ".//span[@class='a-price-whole']/text()"
                ).get()
            }

        print()
        print("Next page")
        print()

        # href of the "Next" button; None when the button is disabled
        next_page = response.xpath(
            "//div/div/ul/li[@class='a-last']/a/@href"
        ).get()

        if next_page:
            # next_page is a relative path such as /page2, so prepend the domain
            abs_url = f"https://www.amazon.in{next_page}"
            yield scrapy.Request(
                url=abs_url,
                callback=self.parse
            )
        else:
            print()
            print('No Page Left')
            print()
Scraped Results: