DOM (Document Object Model)

DOM Overview

Stylus Studio® XML Enterprise Suite handles the details of DOM and SAX for you when you parse XML or connect stages in an XML pipeline. But sometimes it's useful to know how all the pieces of DOM fit together. This page gives a brief overview of the Document Object Model, or DOM, and notes which of the three DOM levels each interface belongs to.

One of the problems with DOM is that it was designed to be a cross-platform interface, which means its way of expressing the XML model is not optimal for any single platform. Thus instead of local language idioms (such as collections, hashes, sets or maps) there are artificial constructs such as NodeLists, NamedNodeMaps, and NameLists. Hopefully, in the event you do need to navigate the DOM, this overview will help you see what is available and in what DOM level. The numbers in parentheses after each interface name give the DOM levels in which it appears.

Note that this document is only concerned with the XML DOM, and not the HTML DOM. However, many of the same comments apply to the latter. Also, this document only deals with W3C recommendations; drafts and even candidate recommendations are not included.

DOM Basic Types

DOMObject (3)

The DOMObject type represents a native object (or Object) type, as in ECMAScript or Java.

DOMString (1) (2) (3)

The DOMString type is a sequence of 16-bit units encoded as UTF-16. It corresponds to the String type in both Java and ECMAScript, or (unofficially) the wstring type in some C/C++ implementations.
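One practical consequence is that DOM lengths and offsets count 16-bit units rather than characters. The sketch below illustrates only the arithmetic, in plain Python. Python's own strings count code points, so this is not a DOM binding, and the helper name utf16_units is made up for this example.

```python
def utf16_units(text: str) -> int:
    """Return the number of 16-bit UTF-16 code units needed to store text."""
    # Each code unit is two bytes; "utf-16-le" avoids adding a byte-order mark.
    return len(text.encode("utf-16-le")) // 2

bmp = "DOM"            # characters inside the Basic Multilingual Plane
clef = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, outside the BMP

bmp_units = utf16_units(bmp)     # one unit per character
clef_units = utf16_units(clef)   # a surrogate pair: two units for one character
```

A character outside the Basic Multilingual Plane therefore counts as two units in a DOMString, even though Python reports it as a single character.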

DOMTimeStamp (2) (3)

The DOMTimeStamp is used to store either a time or a duration, measured in milliseconds. It must be implemented either as a Date type or as at least a 64-bit integer.

DOMUserData (3)

This is a reference to user-specific data. In Java, it is an Object, and in ECMAScript, it is an any.

DOM Structure Model

Attr (1) (2) (3)

Attr objects correspond to the attributes of an element. However, they are not part of the DOM tree, and therefore have no parentNode, previousSibling or nextSibling. An element can access them by means of specific get, set and remove methods. Oddly enough, they are extensions of Node.
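As a rough illustration, here is a minimal sketch using Python's xml.dom.minidom, a lightweight DOM implementation chosen only for convenience. The sample document and variable names are invented for the example.

```python
from xml.dom.minidom import parseString

doc = parseString('<book id="b1" lang="en"><title>DOM</title></book>')
book = doc.documentElement

# Attributes are reached through the element, not through childNodes.
id_attr = book.getAttributeNode("id")
child_names = [child.nodeName for child in book.childNodes]

# An Attr is a Node, yet it has no place in the tree.
is_attr_node = id_attr.nodeType == id_attr.ATTRIBUTE_NODE
tree_links = (id_attr.parentNode, id_attr.previousSibling, id_attr.nextSibling)
owner = id_attr.ownerElement  # the element the attribute is attached to

# The set and remove methods on Element.
book.setAttribute("lang", "fr")
book.removeAttribute("id")
remaining = {name: book.getAttribute(name) for name in book.attributes.keys()}
```

Here child_names contains only the title element, and all three tree links on the attribute are None.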

CDATASection (1) (2) (3)

This represents CDATA sections in the source XML.

CharacterData (1) (2) (3)

This is the parent interface for Text and Comment. CDATASection inherits from it indirectly, through Text.

Comment (1) (2) (3)

This represents the content of an XML comment. It does not include the actual <!-- and --> markers, but only what is between them. It is also a subclass of CharacterData.
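A small sketch, again assuming Python's xml.dom.minidom as the implementation, shows that the comment node carries only the text between the markers. It also checks the class relationships minidom uses, which mirror the DOM interface hierarchy.

```python
from xml.dom.minidom import CDATASection, CharacterData, Comment, Text, parseString

doc = parseString("<note><!-- remember the milk --></note>")
comment = doc.documentElement.firstChild

is_comment = comment.nodeType == comment.COMMENT_NODE
content = comment.data        # only the text between the markers
serialized = comment.toxml()  # the markers reappear on output

# Class relationships in minidom, mirroring the DOM interfaces.
comment_is_chardata = issubclass(Comment, CharacterData)
text_is_chardata = issubclass(Text, CharacterData)
cdata_is_text = issubclass(CDATASection, Text)
```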

Document (1) (2) (3)

This is the container of the document. Its single element child is the first, or root-level, element in the XML document. This interface contains many methods which operate on the document as a whole or which require the document itself for context.

DocumentFragment (1) (2) (3)

This is the "diet" version of a Document object, useful for passing around subtrees which may or may not contain children.
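One common use is to build several nodes off to the side and attach them in a single call. The sketch below assumes Python's xml.dom.minidom, and the element names are made up. When the fragment is appended, its children move into the target and the fragment is left empty.

```python
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "list", None)
root = doc.documentElement

# Build two items inside a fragment, detached from the main tree.
fragment = doc.createDocumentFragment()
for label in ("first", "second"):
    item = doc.createElement("item")
    item.appendChild(doc.createTextNode(label))
    fragment.appendChild(item)

count_before = len(fragment.childNodes)
root.appendChild(fragment)   # moves the fragment's children, not the fragment
count_after = len(fragment.childNodes)
result = root.toxml()
```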

DocumentType (1) (2) (3)

This read-only object holds the public and system identifiers of the document, the entities and notation lists, and the internal subset of the DTD.

Element (1) (2) (3)

The Element interface corresponds to elements in XML. Elements can have text, other elements, comments and processing instructions as children. Attributes are not children, but are reached via special methods.

Entity (1) (2) (3)

Any entities within the XML document, either parsed or unparsed, are represented by instances of an object defined with this interface. If available, this can include the public and system identifiers as well as the encoding and XML version. Entity nodes and their children are read-only.

EntityReference (1) (2) (3)

An EntityReference represents an entity reference in the XML tree. Note that character references and predefined entities (such as &#x2022; and &amp;) would already have been expanded by the XML parser. EntityReference nodes and their children are read-only.
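The expansion of character references and predefined entities is easy to observe. This sketch assumes Python's xml.dom.minidom, whose parser delivers them as ordinary text, and the sample content is invented.

```python
from xml.dom.minidom import parseString

doc = parseString("<p>&#x2022; Fish &amp; chips</p>")
paragraph = doc.documentElement

# Only text nodes are present: no EntityReference nodes were created.
node_types = {child.nodeType for child in paragraph.childNodes}
only_text = node_types == {paragraph.TEXT_NODE}

# Join the pieces in case the parser delivered the text in several chunks.
text = "".join(child.data for child in paragraph.childNodes)
```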

NamedNodeMap (1) (2) (3)

A NamedNodeMap is an unordered collection of Nodes that can be accessed by name.
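The attributes of an element are the usual place to meet a NamedNodeMap. A minimal sketch, assuming Python's xml.dom.minidom and a made-up element:

```python
from xml.dom.minidom import parseString

doc = parseString('<img src="a.png" alt="A" width="10"/>')
attrs = doc.documentElement.attributes   # a NamedNodeMap

count = attrs.length
src = attrs.getNamedItem("src").value     # lookup by name
missing = attrs.getNamedItem("height")    # None when there is no such name

# item(i) allows enumeration, but the order carries no meaning,
# so the names are sorted here before use.
names = sorted(attrs.item(i).name for i in range(attrs.length))
```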

Node (1) (2) (3)

Node is the fundamental interface for the DOM — almost every object in the DOM structure inherits directly or indirectly from Node, even those that do not support child nodes.

NodeList (1) (2) (3)

The NodeList is just an ordered collection of Nodes. Some implementations reportedly implement elements as both Nodes and NodeLists, so be wary of what object you actually receive from the DOM.
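For comparison, here is a sketch of the NodeList methods, assuming Python's xml.dom.minidom. In that implementation a NodeList happens to be a Python list subclass, which is exactly the kind of local idiom the DOM itself does not promise.

```python
from xml.dom.minidom import parseString

doc = parseString("<ul><li>one</li><li>two</li><li>three</li></ul>")
items = doc.getElementsByTagName("li")   # a NodeList

count = items.length
second = items.item(1).firstChild.data
past_end = items.item(99)                # out of range yields None, not an error
in_order = [li.firstChild.data for li in items]
```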

Notation (1) (2) (3)

This represents a notation as declared in a DTD. Notation nodes cannot be created through the DOM interface.

ProcessingInstruction (1) (2) (3)

This just corresponds to a PI in the XML. Note that even though the <?xml version=...?> at the top of an XML document looks like a PI, it is not and will not be seen as such in the DOM.
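The difference between the XML declaration and a real PI can be checked with a short sketch. It assumes Python's xml.dom.minidom, and the style sheet name style.css is hypothetical.

```python
from xml.dom.minidom import parseString

source = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    "<doc/>"
)
doc = parseString(source)

# Collect the processing instructions that sit at the top level of the document.
pis = [n for n in doc.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
targets = [pi.target for pi in pis]
data = pis[0].data   # everything after the target, as one unparsed string
```

Only the xml-stylesheet PI shows up; the XML declaration does not appear among the child nodes.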

Text (1) (2) (3)

This represents the character data of an element or attribute. There are some gotchas when dealing with text nodes: it is possible to have several adjacent ones that may need to be normalized, and mixed CDATASection and Text nodes may not behave as you would expect, even after normalization.
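The sketch below reproduces both gotchas, assuming Python's xml.dom.minidom. Other implementations may treat CDATA sections differently depending on their configuration.

```python
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "p", None)
p = doc.documentElement

# Building content through the API easily produces adjacent text nodes.
p.appendChild(doc.createTextNode("a"))
p.appendChild(doc.createTextNode("b"))
p.appendChild(doc.createCDATASection("c"))
p.appendChild(doc.createTextNode("d"))

before = [(n.nodeType, n.data) for n in p.childNodes]
p.normalize()   # merges adjacent Text nodes only
after = [(n.nodeType, n.data) for n in p.childNodes]
```

After normalization the two leading Text nodes are merged into one, but the CDATA section still splits the content into three nodes.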

DOM Core

DOMConfiguration (3)

The configuration of the DOM is located here. A set of SAX-like properties is used, including canonical-form, cdata-sections, check-character-normalization, comments, datatype-normalization, element-content-whitespace, entities, error-handler, infoset, namespaces, namespace-declarations, normalize-characters, schema-location, schema-type, split-cdata-sections, validate, validate-if-schema, and well-formed.

DOMError (3)

The DOMError interface describes the type of error encountered, along with additional details such as the severity (SEVERITY_WARNING, SEVERITY_ERROR or SEVERITY_FATAL_ERROR), the location, and any related exceptions.

DOMErrorHandler (3)

A DOMErrorHandler describes a callback interface used when an error occurs. It is set through the DOMConfiguration interface.

DOMException (1) (2) (3)

For languages that support exception handling, the DOMException is used when something truly exceptional (pardon the pun) happens. Normal errors would return normal exceptions, such as for out-of-bound errors when dealing with arrays, or null-reference exceptions when the user passes a null where one was not expected. The DOMException is reserved for when something cannot happen because it is impossible to perform, such as when data is lost or when there is some internal inconsistency.

DOMImplementation (1) (2) (3)

The DOMImplementation interface is used to expose methods that are used to create DOM trees and (starting in DOM Level 3) set or query the features available within the DOM.
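A minimal sketch of creating a tree and querying a feature, assuming Python's xml.dom.minidom. The names catalog and catalog.dtd are invented for the example.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# createDocumentType(qualifiedName, publicId, systemId)
doctype = impl.createDocumentType("catalog", None, "catalog.dtd")
# createDocument(namespaceURI, qualifiedName, doctype)
doc = impl.createDocument(None, "catalog", doctype)

root_name = doc.documentElement.tagName
system_id = doc.doctype.systemId
supports_core2 = impl.hasFeature("Core", "2.0")
```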

DOMImplementationList (3)

This is just a list of available DOM implementations.

DOMImplementationSource (3)

The DOMImplementationSource interface provides a way for a user program to request a DOM implementation that implements a certain set of user-specified features.

DOMLocator (3)

This points to a location in a document, such as the point at which an error occurred. It includes fields for the line and column number, the byte offset into the stream, the URI of the document, and more. If one or more of these are not available, they will be set to null or -1 as appropriate.

DOMStringList (3)

This is just an ordered list (e.g. an array) of DOMString values.

ExceptionCode (1) (2) (3)

This is an enumeration of exception codes:

DOM Levels 1, 2 and 3: INDEX_SIZE_ERR, DOMSTRING_SIZE_ERR, HIERARCHY_REQUEST_ERR, WRONG_DOCUMENT_ERR, INVALID_CHARACTER_ERR, NO_DATA_ALLOWED_ERR, NO_MODIFICATION_ALLOWED_ERR, NOT_FOUND_ERR, NOT_SUPPORTED_ERR, and INUSE_ATTRIBUTE_ERR.

DOM Levels 2 and 3: INVALID_STATE_ERR, SYNTAX_ERR, INVALID_MODIFICATION_ERR, NAMESPACE_ERR and INVALID_ACCESS_ERR.

DOM Level 3: VALIDATION_ERR and TYPE_MISMATCH_ERR.
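To see a couple of these codes in practice, here is a sketch assuming Python's xml.dom package, which exposes the codes as module-level constants and raises one DOMException subclass per code. The element names are made up.

```python
import xml.dom
from xml.dom.minidom import parseString

doc = parseString("<root><child/></root>")
root = doc.documentElement

codes = {}

try:
    # A document may have only one element child.
    doc.appendChild(doc.createElement("second-root"))
except xml.dom.DOMException as exc:
    codes["second root"] = exc.code

try:
    # The node being removed is not a child of root.
    root.removeChild(doc.createElement("stranger"))
except xml.dom.DOMException as exc:
    codes["remove stranger"] = exc.code
```

The first failure reports HIERARCHY_REQUEST_ERR and the second NOT_FOUND_ERR.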

NameList (3)

This interface describes an ordered list of names, each of which is paired with a namespace.

TypeInfo (3)

DOM Level 3 introduces a TypeInfo structure which contains schema-specific data type information that is referenced by an element or attribute node.

UserDataHandler (3)

The UserDataHandler interface defines a way for an application to get a callback whenever a node is cloned, imported, adopted, renamed or deleted.

DOM Views

AbstractView (2)

This is the base interface for any specialized views of the DOM.

DocumentView (2)

This is an alternate view of the DOM. Perhaps this could be used to represent the DOM after a CSS transformation has occurred, or based on some other transformation or presentation of the underlying physical DOM.

DOM Events

DocumentEvent (2)

To create an event of a certain type for the document, use the createEvent method supplied by this interface. The event is then fired through the dispatchEvent method of EventTarget.

Event (2)

This is the base interface for any type of event against the DOM. This is what is passed to an event handler.

EventException (2)

Events that fail may throw this type of exception. Typically it will return an EventExceptionCode of UNSPECIFIED_EVENT_TYPE_ERR.

EventListener (2)

This is the interface that code that handles events for nodes must implement.

EventTarget (2)

If the event model is supported by a DOM, all participating nodes will also implement the EventTarget interface to denote they can be recipients of events. Via this interface, event listener code can be attached.

MouseEvent (2)

This is a type of user interface event that describes a mouse action.

MutationEvent (2)

A mutation event is an event fired when the structure or content of the DOM changes. The defined mutation event types are: DOMSubtreeModified, DOMNodeInserted, DOMNodeRemoved, DOMNodeRemovedFromDocument, DOMNodeInsertedIntoDocument, DOMAttrModified and DOMCharacterDataModified. Using this mechanism, you can capture changes to the DOM and act accordingly.

UIEvent (2)

This is the generic interface for user-interface-related events. It is anticipated that in addition to the mouse event interface, a KeyEvent interface will be supplied in a future draft.

DOM Stylesheets

These do not apply to the XML DOM, but only to the HTML DOM, with one exception worth a footnote: the processing instruction which tells a browser to load a style sheet (CSS or XSLT) and apply it to the current XML document. For a CSS style sheet, that PI looks like this:

<?xml-stylesheet href="your CSS stylesheet path here" type="text/css"?>

There are other optional pseudo-attributes, including title="...", media="...", charset="..." and alternate="yes"|"no". For more guidance, please refer to Associating Style Sheets with XML documents.

DOM Traversal

DocumentTraversal (2)

This interface contains methods that create iterators and walkers for traversing the DOM.

NodeFilter (2)

A NodeFilter will allow the iterator or walker to which it is attached to skip over nodes that do not match its criteria.

NodeIterator (2)

This allows a list of nodes matching some criteria to be stepped through one at a time, in document order, both forwards and backwards. It is valid until detached even if the underlying document changes.

TreeWalker (2)

The TreeWalker interface allows you to navigate the DOM using the filtering but in a tree-like fashion instead of the list-like fashion of the NodeIterator. It also supports returning only certain types of nodes.
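Python's xml.dom.minidom does not implement the DOM Traversal module, so the sketch below only imitates the idea of a NodeIterator with a filter. The function iterate_nodes is a hypothetical helper written for this example, not part of any DOM API.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def iterate_nodes(root, accept):
    """Yield descendants of root in document order for which accept(node) is true.

    Hypothetical helper: it imitates a NodeIterator with a NodeFilter,
    but is not part of minidom or of the DOM specification.
    """
    for child in root.childNodes:
        if accept(child):
            yield child
        # Rejected nodes are skipped, but their descendants are still visited.
        yield from iterate_nodes(child, accept)

doc = parseString("<a><b>x</b><!-- skip me --><c><d>y</d></c></a>")

def elements_only(node):
    """Filter that accepts element nodes and skips everything else."""
    return node.nodeType == Node.ELEMENT_NODE

names = [n.tagName for n in iterate_nodes(doc.documentElement, elements_only)]
```

The result is a flat, document-order list of the matching elements, which is the list-like view a NodeIterator gives. A TreeWalker would instead let you move to parents, children and siblings within the filtered view.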

DOM Ranges

DocumentRange (2)

This interface exposes the method for creating a Range object.

Range (2)

A range describes some portion of a document starting and stopping at specific locations. This is not just a subtree, as it may start anywhere, even in the middle of text content, and end anywhere. The only limitation is that both starting and ending points must have a containing object that is a common ancestor. The two boundary points can be as close as being within the same string, or as far apart as the starting and ending objects of the document.

RangeException (2)

When a BAD_BOUNDARYPOINTS_ERR or INVALID_NODE_TYPE_ERR error is encountered, the RangeException is thrown. Those codes are enumerated in RangeExceptionCode.

DOM Load and Save

DOMImplementationLS (3)

This adds new methods to the DOMImplementation interface for creating load and save objects.

LSException (3)

This exception can be thrown to stop parsing or serializing.

LSInput (3)

This denotes an input source for loading. It supports both character and byte streams; if neither is present, the parser will try the string data, then the system ID, then the public ID.

LSInputStream (3)

This denotes a binary input stream for loading as XML.

LSLoadEvent (3)

This is a type of event that signals the end of a load.

LSOutput (3)

This is the destination for XML output. It must contain either a character stream, a byte stream or a system ID.

LSOutputStream (3)

This denotes a binary output stream for saving as XML.

LSParser (3)

The parser takes some form of XML input and builds a DOM from it.

LSParserFilter (3)

The filter can intercept nodes as they are being parsed. It can abort the parsing, or inject, modify or remove nodes on the fly as the DOM is being built.

LSProgressEvent (3)

This is a callback that can show the progress of parsing.

LSReader (3)

This denotes a sequence of characters to be read as XML for loading.

LSResourceResolver (3)

A resolver can redirect references to external objects. A URI and other related information are passed in, and a LSInput object is returned that corresponds to that resource if it is available.

LSSerializer (3)

The serializer writes a DOM or in fact any node type to an LSOutput destination. It will fix up namespaces and escape characters as necessary.

LSSerializerFilter (3)

Based on the NodeFilter interface, this can be used to skip the serialization of certain nodes at the time the DOM is being written to an LSOutput destination.

LSWriter (3)

This denotes a sequence of characters to be written as XML.

DOM Validation

CharacterDataEditVAL (3)

This subclasses the NodeEditVAL to expose additional methods useful for textual processing, such as whether this node is all whitespace or not.

DocumentEditVAL (3)

This specifies properties such as whether the document must be continually re-validated as each change is made to the DOM, and also includes methods to force validation.

ElementEditVAL (3)

In addition to the features of NodeEditVAL, this provides information for guided editing such as which elements are appropriate as child elements.

ExceptionVAL (3)

Some of the validation operations may throw this exception — typically when asked to validate before a schema is attached.

NodeEditVAL (3)

This is the basis for all node-oriented validation interfaces, and supports checking for well-formedness as well as schema compliance. It surfaces enumerated values when appropriate as well as the default value for an element or attribute.


The End of DOM

Yeah, with all these methods, we wish. However, the DOM is useful, and Stylus Studio® XML Enterprise Suite contains many powerful tools based on the DOM model which provide many of the benefits the above interfaces imply: guided editing, on-the-fly validation, support of multiple encodings, and much higher-level constructs like XML Pipeline Tools and XML Schema Tools. Investigate today why Stylus Studio® is the choice of serious XML developers by downloading your free evaluation copy.
