Parsing XML Documents
To manipulate an XML document, an XML parser is needed. The parser loads the document into the computer's memory. Once the document is loaded, its data can be read and manipulated through the parser's API.
The sections below discuss the APIs and parsers for accessing XML documents in serial access mode (SAX) and random access mode (DOM). The specifications used to ensure the validity of XML documents are DTDs and XML Schemas.
DOM: Document Object Model
The XML Document Object Model (XML DOM) defines a standard way to access and manipulate XML documents using any programming language (and a parser for that language).
The DOM presents an XML document as a tree structure (a node tree), with the elements, attributes, and text defined as nodes. It provides access to the information stored in the XML document as a hierarchical object model.
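As a rough illustration of the node tree, the sketch below walks a small document and lists each node with its kind. It assumes Python's standard-library xml.dom.minidom as the parser; the sample XML, the NODE_KINDS table, and the describe function are made up for this example and are not part of any DOM specification.

```python
from xml.dom import minidom, Node

# Hypothetical sample document, used only for illustration.
XML = '<library><book id="b1"><title>XML Basics</title></book></library>'

# Human-readable names for the node types this sketch cares about.
NODE_KINDS = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
}


def describe(node, depth=0, out=None):
    """Return one indented line per node, visiting the tree depth-first."""
    out = [] if out is None else out
    kind = NODE_KINDS.get(node.nodeType, "other")
    # Text nodes carry their content in nodeValue; other nodes have a name.
    label = node.nodeValue if node.nodeType == Node.TEXT_NODE else node.nodeName
    out.append("  " * depth + f"{kind}: {label}")
    # Attributes hang off their element rather than appearing in childNodes.
    if node.nodeType == Node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            out.append("  " * (depth + 1) + f"attribute: {name}={value}")
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out


print("\n".join(describe(minidom.parseString(XML))))
```

The listing shows the document node at the root, element nodes beneath it, the id attribute attached to its element, and the title text as a leaf node.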
The DOM converts an XML document into a collection of objects arranged in a tree, which a program can then manipulate freely. The textual information in the XML document is turned into tree nodes, and a user can traverse any part of the object tree at any time. This makes it easy to modify the data, remove it, or insert new data. This mechanism is also known as the random access protocol.
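The three operations mentioned above (modify, remove, insert) can be sketched as follows. This again assumes Python's xml.dom.minidom; the sample document and the edit_library function are hypothetical names chosen for the example.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>Old Title</title></book>"
    "</library>"
)


def edit_library(xml_text):
    """Modify, remove, and insert nodes, then serialize the tree again."""
    doc = minidom.parseString(xml_text)
    books = doc.getElementsByTagName("book")

    # Modify: jump straight to the second book and change its title text.
    books[1].getElementsByTagName("title")[0].firstChild.data = "New Title"

    # Remove: detach the first book from its parent.
    first = books[0]
    first.parentNode.removeChild(first)

    # Insert: build a new subtree and append it to the root element.
    new_book = doc.createElement("book")
    new_book.setAttribute("id", "b3")
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode("DOM in Practice"))
    new_book.appendChild(title)
    doc.documentElement.appendChild(new_book)

    return doc.documentElement.toxml()


print(edit_library(XML))
```

Because the whole tree is already in memory, the second book can be edited before the first one is touched; the order of access is up to the program, not the document.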
DOM is very useful when the document is small. DOM reads the entire XML structure and holds the object tree in memory, so it is much more CPU and memory intensive than a parser that reads the document serially. The DOM is best suited to interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user.
SAX: Simple API for XML
This API was an innovation developed through a collaborative effort on the XML-DEV mailing list, rather than being a product of the W3C.
SAX (Simple API for XML), like DOM, gives access to the information stored in XML documents using any programming language (and a parser for that language).
This standard API parses XML documents in serial access mode. It is a very fast mechanism for reading XML data compared with tree-based alternatives such as DOM. SAX tells the application what is in the document through a stream of parsing events. The application then processes those events to act on the data.
SAX is also called an event-driven protocol, because the application registers a handler whose callback methods are invoked whenever an event is generated. An event is generated when the parser encounters a new XML tag, encounters an error, or has anything else to report. SAX is memory-efficient to a great extent.
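A minimal sketch of the handler-and-callback idea, assuming Python's standard-library xml.sax module. The EventLogger class, the log_events and parse_error helpers, and the sample XML are hypothetical names made up for this example.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML = '<note priority="high"><to>Ana</to></note>'


class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser encounters an opening tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called when the parser encounters a closing tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Called for text; a parser may deliver one text run in several chunks.
        if content.strip():
            self.events.append(("text", content))


def log_events(xml_text):
    """Register the handler, run the parser, and return the recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def parse_error(xml_text):
    """Return the parser's error message for malformed input, else None."""
    try:
        log_events(xml_text)
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


for event in log_events(XML):
    print(event)
print("error:", parse_error("<a><b></a>"))
```

The handler never sees the document as a whole. It only receives events one at a time, so any state the application needs between events has to be kept by the handler itself.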
SAX is very useful when the document is large.
Because DOM holds the entire object tree in memory, it is much more CPU and memory intensive than SAX. For that reason, the SAX API is preferred for server-side applications and data filters that do not require an in-memory representation of the data.
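One way a data filter of this kind might look is sketched below, assuming Python's xml.sax with the XMLFilterBase and XMLGenerator helpers from xml.sax.saxutils. The DropElementFilter class, the drop_elements function, and the sample document are hypothetical names chosen for the example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Hypothetical sample document, used only for illustration.
XML = b"<users><user><name>Ana</name><password>s3cret</password></user></users>"


class DropElementFilter(XMLFilterBase):
    """Forward events unchanged, except for one element and its contents."""

    def __init__(self, parent, tag_to_drop):
        super().__init__(parent)
        self._tag = tag_to_drop
        self._skip_depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self._skip_depth or name == self._tag:
            self._skip_depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            super().characters(content)


def drop_elements(xml_bytes, tag_to_drop):
    """Stream the input through the filter and return the rewritten XML."""
    out = io.StringIO()
    filt = DropElementFilter(xml.sax.make_parser(), tag_to_drop)
    # XMLGenerator turns the surviving events back into XML text.
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


print(drop_elements(XML, "password"))
```

The filter keeps only a single counter as state, so its memory use does not grow with the size of the input document.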