The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document and the way the document is accessed and manipulated.
Parsing XML with the DOM APIs in Python is straightforward. As an example, we will use the following sample XML document (sample.xml):
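To make the idea of a logical structure concrete, here is a minimal sketch that walks a small DOM tree and lists its element and text nodes with indentation. It assumes Python's standard xml.dom.minidom module; the inline XML string and the helper name describe are made up for illustration only.

```python
from xml.dom import minidom

# Hypothetical inline document, used only to illustrate the tree shape.
XML = '<company><name>Acme</name><staff id="1"><name>Ann</name></staff></company>'


def describe(node, depth=0):
    """Return indented lines for the element and non-blank text nodes under `node`."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "<" + node.tagName + ">")
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    # Text nodes have an empty childNodes list, so the recursion stops there.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


doc = minidom.parseString(XML)
tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```

Each element is a node, and the text inside an element is a separate child node of that element, which is why element text is usually read through a child node rather than from the element itself.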
<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>
Now, let's parse the above XML using Python's built-in xml.dom.minidom module. The code below demonstrates the process:
from xml.dom import minidom

doc = minidom.parse("sample.xml")

# doc.getElementsByTagName returns a NodeList; the first <name> is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Each <staff> element carries an id attribute and <name>/<salary> children
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))
Output:
GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
The same result can be obtained with a user-defined function that joins the data of all text-node children of an element, as shown in the code below:
from xml.dom import minidom

doc = minidom.parse("sample.xml")

# user-defined function: concatenate the data of every text-node child
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))
Output:
Company Name : GeeksForGeeks Company 

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
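One practical reason to prefer a text-collecting helper over firstChild.data is that an empty element has no child nodes, so its firstChild is None and reading .data from it raises an AttributeError. The sketch below illustrates this under stated assumptions: the inline XML (with an empty salary element), the helper name get_node_text, and the record dictionary are hypothetical and not part of sample.xml.

```python
from xml.dom import minidom


def get_node_text(node):
    """Join the data of all direct text-node children; returns '' if there are none."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Hypothetical record whose <salary> element is empty.
doc = minidom.parseString('<staff id="4"><name>Test User</name><salary/></staff>')
staff = doc.documentElement

salary = staff.getElementsByTagName("salary")[0]
# An empty element has no children, so firstChild is None here.
salary_has_child = salary.firstChild is not None

record = {
    "id": staff.getAttribute("id"),
    "name": get_node_text(staff.getElementsByTagName("name")[0]),
    "salary": get_node_text(salary),
}
print(record)
```

Here the helper returns an empty string for the missing salary instead of failing, which keeps the loop over staff records running even when some fields are blank.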