How to Get Scrapy Output in an XML File?

23 July 2024

Prerequisite: Implementing Web Scraping in Python with Scrapy

Scrapy provides a fast and efficient way to scrape websites. Web scraping is the process of extracting data from websites. In Scrapy, we create a spider and then use it to crawl a website. In this article, we extract population-by-country data from the Worldometers website and save it as an XML file.

Let's implement this step by step:

Step 1: Create a Scrapy project

scrapy startproject gfgdemo

We use the above command in the command line to create the project and then change the directory to gfgdemo. After this, we open the project in the editor of our choice (here VS Code).

Step 2: Creating the Spider for Population

scrapy genspider population www.worldometers.info/world-population/population-by-country

A new file named population.py is added to the spiders directory.

Step 3: Make the following changes to population.py

Python3

# only keep the base domain 
allowed_domains = ['www.worldometers.info']  
  
# change http to https 
start_urls = ['https://www.worldometers.info/world-population/population-by-country/']
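The base domain used in allowed_domains can also be derived from the start URL instead of being typed by hand. The following is a minimal sketch using only the standard library; the variable names start_url and base_domain are illustrative and are not part of the generated spider.

```python
from urllib.parse import urlparse

# Start URL from the spider above
start_url = "https://www.worldometers.info/world-population/population-by-country/"

# netloc is the host part of the URL, without the scheme or the path
base_domain = urlparse(start_url).netloc

allowed_domains = [base_domain]
start_urls = [start_url]

print(allowed_domains)  # ['www.worldometers.info']
```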

Step 4: Preparing file to scrape the data

First, visit www.worldometers.info/world-population/population-by-country and disable JavaScript as follows. Scrapy does not execute JavaScript, so this shows the page as the spider will receive it.

Open the inspector tool by pressing Ctrl+Shift+I.
Then press Ctrl+Shift+P, type javascript, click the Disable JavaScript option (marked Debugger, shown in yellow), and refresh the page.

After this, we select the part of the page to be scraped using XPath selectors.

Write the code to extract the specific data. We write the following code in the parse method of the spider.

Python3

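def parse(self, response):
    # Select every table row; rows without td cells (such as the header)
    # yield None for both fields
    countries = response.xpath("//tr")

    for country in countries:
        # The second cell holds the country link, the third the population
        name = country.xpath("(.//td)[2]/a/text()").get()
        population = country.xpath("(.//td)[3]/text()").get()
        yield {
            'name': name,
            'population': population
        }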
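Rows without td cells (the table header, for example) produce None for both fields, and the population arrives as a comma-separated string. The sketch below shows one way such values could be cleaned before they are yielded. Note that clean_item is a hypothetical helper, not part of Scrapy, and the spider in this article does not use it.

```python
def clean_item(name, population):
    """Return a cleaned item dict, or None if the row has no usable data."""
    if name is None or population is None:
        return None  # e.g. a header row, which has no td cells
    return {
        "name": name.strip(),
        # "1,439,323,776" -> 1439323776
        "population": int(population.replace(",", "").strip()),
    }


print(clean_item(None, None))
print(clean_item("China", "1,439,323,776"))
```

Inside parse, the result could be yielded only when it is not None. With such a helper, the exported data would contain no None row, and populations would be written without commas.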

Step 5: Scraping the data

Run the following command in the command line to scrape each country's name and population.

scrapy crawl population

The scraped items are printed in the command line as part of the crawl output.

To save the data as an XML file, we pass an output file name to the crawl command.

scrapy crawl {spider} -o {filename}.xml
Ex: scrapy crawl population -o data.xml
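The same export can also be configured in the project instead of on the command line. The following is a minimal sketch of a FEEDS entry for settings.py, assuming a reasonably recent Scrapy version that supports the FEEDS setting and its overwrite key; the file name data.xml matches the example above.

```python
# settings.py (fragment)
FEEDS = {
    "data.xml": {
        "format": "xml",      # use the XML item exporter
        "encoding": "utf8",   # matches the utf-8 declaration in the output
        "overwrite": True,    # replace the file instead of appending to it
    },
}
```

With this entry in place, scrapy crawl population should write data.xml without needing the -o option.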

This creates an XML file in the project directory containing the scraped data. The contents of data.xml are shown below.

Output:

<?xml version="1.0" encoding="utf-8"?>
<items>
<item><name>None</name><population>None</population></item>
<item><name>China</name><population>1,439,323,776</population></item>
<item><name>India</name><population>1,380,004,385</population></item>
<item><name>United States</name><population>331,002,651</population></item>
<item><name>Indonesia</name><population>273,523,615</population></item>
.
.
.
<item><name>Niue</name><population>1,626</population></item>
<item><name>Tokelau</name><population>1,357</population></item>
<item><name>Holy See</name><population>801</population></item>
</items>
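The exported file can be read back with Python's standard library to check its contents. The following is a minimal sketch that uses a shortened copy of the output above as an inline string, so it runs without the real file; load_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample of data.xml (the real file has one item per table row)
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<items>
<item><name>None</name><population>None</population></item>
<item><name>China</name><population>1,439,323,776</population></item>
<item><name>India</name><population>1,380,004,385</population></item>
</items>"""


def load_items(xml_bytes):
    """Return a list of {'name', 'population'} dicts, skipping empty rows."""
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.findall("item"):
        name = item.findtext("name")
        population = item.findtext("population")
        if name == "None":  # placeholder row with no country data
            continue
        items.append({"name": name, "population": population})
    return items


for row in load_items(SAMPLE.encode("utf-8")):
    print(row["name"], row["population"])
```

For the real file, ET.parse("data.xml").getroot() could be used in place of ET.fromstring. The values are read as strings, so the population would still need its commas removed before any numeric comparison.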
