XML Parsers

In this section, you will learn about XML parsers. An XML parser is used to read, create, update, and manipulate an XML document.

Parsing XML Documents

To manipulate an XML document, you need an XML parser. The parser loads the document into the computer's memory. Once the document is loaded, its data can be manipulated through the parser's API.

We will soon discuss APIs and parsers for accessing XML documents in serial access mode (SAX) and random access mode (DOM). The specifications used to ensure the validity of XML documents are DTDs and XML Schemas.

DOM: Document Object Model

The XML Document Object Model (XML DOM) defines a standard way to access and manipulate XML documents using any programming language (and a parser for that language).

The DOM presents an XML document as a tree-structure (a node tree), with the elements, attributes, and text defined as nodes. DOM provides access to the information stored in your XML document as a hierarchical object model.
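The node tree described above can be sketched in Python with the standard library's `xml.dom.minidom` module. This is only an illustration: the sample document `XML_TEXT` and the helper `describe` are hypothetical names made up for this example, not part of any DOM specification.

```python
from xml.dom import minidom, Node

# Hypothetical sample document, used only for illustration.
XML_TEXT = '<library><book id="b1">Dune</book><book id="b2">Emma</book></library>'

def describe(node, depth=0, out=None):
    """Collect one line per node, indented by its depth in the tree."""
    if out is None:
        out = []
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        out.append(f"{indent}element {node.tagName}")
        # Attributes are nodes too; they are reached through their element.
        for name, value in sorted(node.attributes.items()):
            out.append(f"{indent}  attribute {name}={value}")
    elif node.nodeType == Node.TEXT_NODE:
        out.append(f"{indent}text {node.data!r}")
    # Recurse into the children (text nodes simply have none).
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out

doc = minidom.parseString(XML_TEXT)   # the whole document is loaded into memory
lines = describe(doc.documentElement)
print("\n".join(lines))
```

Each element, attribute, and piece of text shows up as its own entry, which is the hierarchical object model the DOM exposes.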

The DOM converts an XML document into a collection of objects arranged in a tree structure, which can be manipulated in any way. The textual information in the XML document is turned into tree nodes, and a user can traverse any part of the object tree at any time. This makes it easy to modify the data, remove it, or insert new data. This mechanism is also known as the random access protocol.

DOM is very useful when the document is small. Because DOM reads the entire XML structure and holds the object tree in memory, it is much more CPU and memory intensive than SAX. The DOM is best suited for interactive applications, because the entire object model is present in memory, where it can be accessed and manipulated by the user.
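As a minimal sketch of this random access, the following Python example uses `xml.dom.minidom` to modify, remove, and insert nodes in an in-memory tree. The sample document and the book titles are assumptions chosen for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML_TEXT = '<library><book id="b1">Dune</book><book id="b2">Emma</book></library>'

doc = minidom.parseString(XML_TEXT)
root = doc.documentElement
books = root.getElementsByTagName("book")

# Random access: go straight to the second book and change its text.
books[1].firstChild.data = "Persuasion"

# Remove the first book from the tree.
root.removeChild(books[0])

# Insert a new element created through the document's factory methods.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b3")
new_book.appendChild(doc.createTextNode("Ulysses"))
root.appendChild(new_book)

# Serialize the modified tree back to XML text.
result = root.toxml()
print(result)
```

No particular visiting order is required: any node can be reached and changed once the tree has been built.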

SAX: Simple API for XML

This API was developed through collaboration on the XML-DEV mailing list, rather than being a product of the W3C.

SAX (Simple API for XML), like DOM, gives access to the information stored in XML documents using any programming language (and a parser for that language).

This standard API parses XML documents in serial access mode. It is a very fast mechanism for reading XML data compared to the alternatives. SAX tells the application what is in the document by sending it a stream of parsing events. The application then processes those events to act on the data.

SAX is also called an event-driven protocol, because the application registers a handler whose callback methods are invoked whenever an event is generated. An event is generated when the parser encounters a new XML tag, encounters an error, or has anything else to report. SAX is memory-efficient to a great extent.
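The handler registration described above can be sketched with Python's standard `xml.sax` module. The class `BookHandler` and the sample document are hypothetical, introduced only for this example. The parser calls `startElement`, `characters`, and `endElement` as it reads the input, and the default error handler raises an exception on a fatal parsing error.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML_TEXT = b'<library><book id="b1">Dune</book><book id="b2">Emma</book></library>'

class BookHandler(xml.sax.ContentHandler):
    """Collects book titles keyed by id, using only parsing events."""

    def __init__(self):
        super().__init__()
        self.titles = {}
        self._current_id = None
        self._buffer = []

    def startElement(self, name, attrs):
        # Event: the parser encountered an opening tag.
        if name == "book":
            self._current_id = attrs.get("id")
            self._buffer = []

    def characters(self, content):
        # Event: character data; it may arrive in several chunks.
        if self._current_id is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: the parser encountered a closing tag.
        if name == "book" and self._current_id is not None:
            self.titles[self._current_id] = "".join(self._buffer)
            self._current_id = None

handler = BookHandler()
xml.sax.parseString(XML_TEXT, handler)   # register the handler and parse
print(handler.titles)

# A malformed document produces an error event, raised here as an exception.
error_seen = False
try:
    xml.sax.parseString(b"<library><book>", BookHandler())
except xml.sax.SAXParseException:
    error_seen = True
```

The handler keeps only the data it needs (one buffer and a dictionary), so no tree of the whole document is ever built.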

SAX is very useful when the document is large.

DOM reads the entire XML structure and holds the object tree in memory, so it is much more CPU and memory intensive than SAX. For that reason, the SAX API is preferred for server-side applications and data filters that do not require an in-memory representation of the data.