Parsing XML with DOM APIs in Python

27 July 2024

The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document as a tree of nodes and the way that document is accessed and manipulated.
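To make the idea of a "logical structure" concrete, the following minimal sketch parses a small inline document and lists its nodes with indentation that reflects their depth. The XML string and the describe helper are hypothetical, introduced only for illustration; the parsing itself uses the standard library's xml.dom.minidom module.

```python
from xml.dom import minidom

# Hypothetical inline document, used only for illustration.
XML_TEXT = '<company><name>Example Co</name><staff id="1"/></company>'


def describe(node, depth=0):
    """Return one line per element or text node, indented by tree depth."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "element: " + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + "text: " + repr(node.data))
    # Recurse into the children; text nodes simply have none.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


doc = minidom.parseString(XML_TEXT)
tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```

Note that the text inside an element is itself a child node of that element, which is why later code reaches it through firstChild.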

Parsing XML with the DOM API in Python is straightforward using the standard library's xml.dom.minidom module. As an example, create a sample XML document (sample.xml) as below:

<?xml version="1.0"?> 
<company> 
    <name>GeeksForGeeks Company</name> 
    <staff id="1"> 
        <name>Amar Pandey</name> 
        <salary>8.5 LPA</salary> 
    </staff> 
    <staff id="2"> 
        <name>Akbhar Khan</name> 
        <salary>6.5 LPA</salary> 
    </staff> 
    <staff id="3"> 
        <name>Anthony Walter</name> 
        <salary>3.2 LPA</salary> 
    </staff> 
</company> 

Now, let's parse the above XML using Python. The code below demonstrates the process:

from xml.dom import minidom

# Parse the file into a DOM Document object
doc = minidom.parse("sample.xml")

# getElementsByTagName returns a NodeList of all matching descendants
# in document order, so the first <name> is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, name, and salary of each <staff> element
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))

Output:

GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
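One detail worth noting is that getElementsByTagName searches all descendants, not just direct children. That is why indexing with [0] on the document returns the company name, and why the staff names are looked up on each staff element instead. The sketch below illustrates the difference on a shortened, inline copy of the sample data; the variable names all_names and direct_names are only illustrative.

```python
from xml.dom import minidom

# Shortened inline copy of the sample data, used only for illustration.
XML_TEXT = """<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1"><name>Amar Pandey</name></staff>
    <staff id="2"><name>Akbhar Khan</name></staff>
</company>"""

doc = minidom.parseString(XML_TEXT)
root = doc.documentElement

# Every <name> anywhere in the document, in document order.
all_names = [n.firstChild.data for n in doc.getElementsByTagName("name")]

# Only the <name> elements that are direct children of <company>.
direct_names = [
    child.firstChild.data
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE and child.tagName == "name"
]

print(all_names)
print(direct_names)
```

Filtering childNodes by nodeType is needed here because the whitespace between elements is stored as text nodes alongside the element nodes.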

The company name and staff details can also be printed using a user-defined function that joins all the text nodes of an element, as shown in the code below:

from xml.dom import minidom

doc = minidom.parse("sample.xml")


# user-defined function
def getNodeText(node):
    """Return the concatenated data of all text nodes directly under node."""
    result = []
    for child in node.childNodes:
        # Skip comments, child elements, and other non-text nodes
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)


name = doc.getElementsByTagName("name")[0]
print("Company Name : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))

Output:

Company Name : GeeksForGeeks Company 

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
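For the sample file both approaches print the same values, but they can differ on other input. The sketch below is an illustration under assumed data: the inline XML, with a comment inside salary and an empty bonus element, is hypothetical and not part of sample.xml. It compares firstChild.data with a text-joining helper equivalent to getNodeText.

```python
from xml.dom import minidom

# Hypothetical input: a comment splits the salary text, and <bonus> is empty.
XML_TEXT = "<staff><salary>8.5<!-- per year --> LPA</salary><bonus></bonus></staff>"


def getNodeText(node):
    """Join the data of all text nodes directly under node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


doc = minidom.parseString(XML_TEXT)
salary = doc.getElementsByTagName("salary")[0]
bonus = doc.getElementsByTagName("bonus")[0]

first_only = salary.firstChild.data   # only the text before the comment
joined = getNodeText(salary)          # text from both sides of the comment
empty = getNodeText(bonus)            # empty string; bonus.firstChild is None

print(repr(first_only), repr(joined), repr(empty))
```

Because an empty element has no children, calling firstChild.data on it would raise an AttributeError, whereas the helper returns an empty string.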
