Scrapy is a popular Python library for web scraping, which provides an easy and efficient way to extract data from websites for a variety of tasks including data mining and information processing. In addition to being a general-purpose web crawler, Scrapy may also be used to retrieve data via APIs.
One of the most common data formats returned by APIs is JSON, which stands for JavaScript Object Notation. In this article, we’ll look at how to scrape a JSON response using Scrapy.
To install Scrapy, run the following command in your command line or terminal:
pip install scrapy
Example
Now we’ll look at an example that extracts data from the public Bored API endpoint (https://www.boredapi.com/api/activity).
Here’s what the actual data returned looks like:
{
  "activity": "Learn calligraphy",
  "type": "education",
  "participants": 1,
  "price": 0.1,
  "link": "",
  "key": "4565537",
  "accessibility": 0.1
}
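To see how this payload maps onto Python types, the sample response above can be parsed with the standard-library json module (the values are just the example data shown above):

```python
import json

# sample payload returned by the Bored API endpoint
payload = """
{
  "activity": "Learn calligraphy",
  "type": "education",
  "participants": 1,
  "price": 0.1,
  "link": "",
  "key": "4565537",
  "accessibility": 0.1
}
"""

# json.loads turns the JSON text into a Python dictionary
data = json.loads(payload)
print(data["activity"])      # Learn calligraphy
print(data["participants"])  # 1
```

Strings become str, numbers become int or float, so the fields can be read with ordinary dictionary indexing.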
Python3
# import modules
import scrapy
import json


class Spider(scrapy.Spider):
    name = "bored"

    def start_requests(self):
        # request the Bored API endpoint and send the response to parse()
        url = "https://www.boredapi.com/api/activity"
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # load the JSON response into a Python dictionary
        data = json.loads(response.text)
        activity = data["activity"]
        activity_type = data["type"]  # avoid shadowing the built-in type()
        participants = data["participants"]
        yield {
            "Activity": activity,
            "Type": activity_type,
            "Participants": participants,
        }
Explanation:
Here we have a Scrapy spider named Spider. The spider has 3 main parts:
- The name attribute – sets the name of the spider to “bored”.
- The start_requests method – initiates the request to the API endpoint at “https://www.boredapi.com/api/activity”. The method yields a Scrapy request object and passes it to the parse method.
- The parse method – handles the response from the API endpoint. The method loads the JSON response data into a Python dictionary using the json.loads function. Then, it extracts the values of the “activity”, “type”, and “participants” keys from the dictionary and stores them in variables with the same names. Finally, it yields a dictionary with the activity, type, and participants as keys and their corresponding values.
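Because parse only reads response.text, its extraction logic can be sanity-checked offline without running a crawl. The sketch below uses a hypothetical FakeResponse stub (not part of Scrapy) in place of the real response object:

```python
import json


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response in a quick offline check."""
    def __init__(self, text):
        self.text = text


def parse(response):
    # same extraction logic as the spider's parse method
    data = json.loads(response.text)
    yield {
        "Activity": data["activity"],
        "Type": data["type"],
        "Participants": data["participants"],
    }


sample = '{"activity": "Learn calligraphy", "type": "education", "participants": 1}'
item = next(parse(FakeResponse(sample)))
print(item)  # {'Activity': 'Learn calligraphy', 'Type': 'education', 'Participants': 1}
```

This kind of stub is handy for verifying that the right keys are extracted before pointing the spider at the live endpoint.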
To run this spider, type the following into your terminal:
scrapy runspider <file name>
Output:
Now, this output will contain a lot of log lines you don’t need, so it’s better to store the parsed items in a separate file. You can do this by adding the -o flag, followed by the output file name, to the command. The “-L ERROR” flag is added to suppress all log output other than error messages:

scrapy runspider <file name> -o activity.json -L ERROR
activity.json looks like this: