Saturday, October 18, 2025
Scraping a JSON response with Scrapy

Scrapy is a popular Python framework for web scraping that provides an efficient way to extract data from websites for tasks such as data mining and information processing. Besides general-purpose web crawling, Scrapy can also be used to retrieve data from APIs.

One of the most common data formats returned by APIs is JSON, which stands for JavaScript Object Notation. In this article, we’ll look at how to scrape a JSON response using Scrapy.

To install Scrapy, run the following command in your terminal:

pip install scrapy

Example

Now we’ll look at an example that extracts data from the public Bored API endpoint (https://www.boredapi.com/api/activity).

Here’s a sample of the JSON data the endpoint returns:

{
  "activity": "Learn calligraphy",
  "type": "education",
  "participants": 1,
  "price": 0.1,
  "link": "",
  "key": "4565537",
  "accessibility": 0.1
}
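Before wiring this into a spider, it may help to see how a response like this decodes in Python. The following is a minimal sketch that uses only the standard library json module; the sample above is hard-coded as a string, so no request is made, and the variable names raw and data are chosen here for illustration.

```python
import json

# The sample response body from above, as the raw text an API would return.
raw = """
{
  "activity": "Learn calligraphy",
  "type": "education",
  "participants": 1,
  "price": 0.1,
  "link": "",
  "key": "4565537",
  "accessibility": 0.1
}
"""

# json.loads turns a JSON object into a Python dict.
data = json.loads(raw)

print(type(data).__name__)                  # dict
print(data["activity"])                     # Learn calligraphy
print(type(data["participants"]).__name__)  # int
print(type(data["price"]).__name__)         # float
print(type(data["key"]).__name__)           # str, because the value is quoted in the JSON
```

Note that "key" arrives as a string even though it looks numeric, so it would need an explicit int() conversion if a number is wanted.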

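Python3

# import modules
import json

import scrapy


class Spider(scrapy.Spider):
    name = "bored"

    def start_requests(self):
        # API endpoint that returns one random activity as JSON
        url = "https://www.boredapi.com/api/activity"

        # Request the URL and hand the response to self.parse
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Decode the JSON body into a Python dictionary
        data = json.loads(response.text)

        activity = data["activity"]
        type = data["type"]
        participants = data["participants"]

        # Yield the scraped item as a dictionary
        yield {"Activity": activity, "Type": type,
               "Participants": participants}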


Explanation:

Here we have a Scrapy spider class named Spider. The spider has three main parts:

  • The name variable – sets the name of the spider to “bored”.
  • The start_requests method – initiates the request to the API endpoint at “https://www.boredapi.com/api/activity”. The method yields a Scrapy Request object with parse set as its callback, so Scrapy passes the downloaded response to the parse method.
  • The parse method – handles the response from the API endpoint. The method loads the JSON response data into a Python dictionary using the json.loads function. Then, it extracts the values of the “activity”, “type”, and “participants” keys from the dictionary and stores them in variables with the same names. Finally, it yields a dictionary with “Activity”, “Type”, and “Participants” as keys and the extracted values.
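The extraction step described for the parse method can be sketched as a plain function, which makes it possible to check the logic without running a crawl. This is only an illustration: extract_item is a hypothetical helper name, not part of Scrapy, and the use of dict.get (returning None for a missing key instead of raising KeyError) is a design choice assumed here rather than something the spider above does.

```python
import json


def extract_item(body: str) -> dict:
    """Build the item that parse yields from a raw JSON response body.

    extract_item is a hypothetical helper, not part of Scrapy.
    """
    data = json.loads(body)
    # dict.get returns None instead of raising KeyError if a key is absent.
    return {
        "Activity": data.get("activity"),
        "Type": data.get("type"),
        "Participants": data.get("participants"),
    }


# A trimmed copy of the sample response, used as a stand-in for response.text.
sample = '{"activity": "Learn calligraphy", "type": "education", "participants": 1}'
print(extract_item(sample))
# {'Activity': 'Learn calligraphy', 'Type': 'education', 'Participants': 1}
```

Inside the spider, parse could then be reduced to a single line such as yield extract_item(response.text).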

To run this file, type the following into your terminal:

scrapy runspider <file name>

Output:

[Screenshot: the output of the above command]

This output contains a lot of log lines that you usually don’t need, so it is better to store the parsed items in a separate file. You can do this by adding the -o option, followed by the output file name, to the command:

scrapy runspider <file name> -o activity.json -L ERROR

The “-L ERROR” option sets the log level to ERROR, which suppresses all log output other than error messages.
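The same two choices, an output file and an ERROR log level, can also be kept in the spider's settings instead of being typed on the command line. The sketch below is a plain dictionary using Scrapy's FEEDS and LOG_LEVEL setting names; it assumes a Scrapy version that supports FEEDS (2.1 or later), and SPIDER_SETTINGS is a hypothetical name used here for illustration.

```python
# Hypothetical settings dictionary. In a spider it would be assigned to the
# class attribute custom_settings, e.g. custom_settings = SPIDER_SETTINGS.
SPIDER_SETTINGS = {
    # FEEDS maps an output path to its export options (assumes Scrapy 2.1+).
    "FEEDS": {"activity.json": {"format": "json"}},
    # Counterpart of "-L ERROR": log only messages at ERROR level or above.
    "LOG_LEVEL": "ERROR",
}

print(SPIDER_SETTINGS["LOG_LEVEL"])  # ERROR
```

With these settings in place, the plain scrapy runspider <file name> command would be expected to write the feed without the extra options.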

activity.json looks like this:
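With a .json output file, Scrapy's feed export writes the yielded items as a JSON array, one object per item, so the file can be read back with the json module. The following is a minimal sketch under that assumption. The file contents here are a stand-in built from the sample response shown earlier, not real crawl output, since the API returns a different activity on each call; the names stand_in, feed_path and items are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for activity.json, built from the sample response shown earlier.
# It is NOT real crawl output.
stand_in = '[\n{"Activity": "Learn calligraphy", "Type": "education", "Participants": 1}\n]'

with tempfile.TemporaryDirectory() as tmp:
    feed_path = Path(tmp) / "activity.json"
    feed_path.write_text(stand_in, encoding="utf-8")

    # The JSON feed decodes to a list with one dictionary per yielded item.
    items = json.loads(feed_path.read_text(encoding="utf-8"))

for item in items:
    print(item["Activity"], item["Type"], item["Participants"])
# Learn calligraphy education 1
```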

 

Dominic