Hello Eric,

I'd like to take this opportunity to share some of my thoughts (well, maybe a question or two).

I am currently in the early stages of designing an XML-based data exchange network. There is no specific output type or client: we'll have multiple kinds of output and client types over the network, as well as multiple types of input. Let's say that in many of the client queries, the requested data will be parts of XML documents.

The conclusion I have reached is to use DOM for the input data that has to be manipulated, and SAX to handle most client requests. XSLT will be used in both cases, and this model should cover about 90% of the cases.

My problem is that I haven't reached a decision about the structure and size of the XML documents (each will have to be able to cover more than one type of data request), or even the number of different types of documents...

What I'm trying to say is that I usually read about people arguing about APIs, while nobody mentions the actual design of the whole thing (mostly the document characteristics) or the transformation/processing stages.

Finally ... does anyone have any resources to share on this matter? They would be much appreciated ;-)

Kindest regards,

Manos Batsis
Interactive Media Director
BeCom : A Profile Company
http://www.becom.gr
http://www.profile.gr
e-mail: manosb@p...
Tel: +301 3270500, +301 3270565
Fax: +3013221268

-----Original Message-----
From: Eric van der Vlist [mailto:vdv@d...]
Sent: Tuesday, December 05, 2000 5:37 AM
To: xml-dev@l...
Subject: (more) extensible SAX

Although this email would have been more timely a year ago, I'd like to share some of my thoughts about SAX and one of the ways to make it more easily extensible.

First, to set the context, I'd like to say a word about what I think is the most important difference between SAX and other APIs (like DOM). In most of the papers I have read, SAX is opposed to DOM as a pull versus push.
While this is certainly an important difference, I don't see it as the main one. I'd rather say that the main difference is that SAX and DOM act at different levels, and that SAX is the most "neutral" interface, DOM being more biased by a specific interpretation of what an XML document is.

What makes SAX unique is that no (or very few) assumptions are made about the way the information will be used, and that the information is presented almost raw to the application. While an application using a DOM interface will have to re-interpret the information stored in the DOM, and often translate its structure, the same application using SAX will only have to create its own object model from raw information. This is true of "data oriented" applications and can even be true of document oriented applications, XSLT processors being a good example of applications that can increase their performance by using their own specific object models rather than a standard DOM.

Now, I'd like to go on by explaining what I think are the two weaknesses of SAX.

The first is that the information isn't raw enough for some applications, and that there is still an information loss in the interpretation that is done (an example is the fact that you can't access information about parsed entities, as discussed in one of my articles [1] on XML.com).

The second (and almost opposite) one is that in some cases there isn't enough interpretation. The way SAX1 needed to be modified to support namespaces is a good example of this, and the problem is likely to happen again as long as new features are added to XML 1.0 through modularization.

I think that both come from a quest to find a balance and to define an API that will meet most needs (I could call it the "one size fits all" utopia), and that this issue should be addressed by adding more modularity and layering rather than by adding more complexity to existing methods.
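To illustrate the earlier point about an application building its own object model straight from raw SAX events, here is a minimal sketch. It relies only on the standard SAX2 and JAXP classes (DefaultHandler, SAXParserFactory); the Order class, the element names "order" and "customer" and the "id" attribute are assumptions made up for the example, not something taken from a real application.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxObjectModelSketch {

    /** Hypothetical application object; the names are illustrative only. */
    static class Order {
        final String id;
        final StringBuilder customer = new StringBuilder();
        Order(String id) { this.id = id; }
    }

    /** Turns raw events directly into Order objects, with no DOM in between. */
    static class OrderHandler extends DefaultHandler {
        final List<Order> orders = new ArrayList<>();
        private Order current;
        private boolean inCustomer;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace aware, so qName is the tag name.
            if (qName.equals("order")) {
                current = new Order(atts.getValue("id"));
                orders.add(current);
            } else if (qName.equals("customer")) {
                inCustomer = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (inCustomer && current != null) {
                current.customer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("customer")) {
                inCustomer = false;
            }
        }
    }

    static List<Order> parse(String xml) {
        OrderHandler handler = new OrderHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.orders;
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"a1\"><customer>Ann</customer></order></orders>";
        for (Order o : parse(xml)) {
            System.out.println(o.id + ": " + o.customer);
        }
    }
}
```

A DOM-based version would first build the whole tree and then walk it to create the same Order objects, which is the re-interpretation step described above.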
The way SAX2 handles namespaces shows, IMHO, how difficult it will be to extend its features. I find it really worrying that, to expose more information about a simple "startElement", we needed to change the API and add new parameters to the methods. I think this would be a good justification for hiding the complexity of the XML productions within objects.

What do I mean concretely? Instead of:

    startElement(java.lang.String namespaceURI,
                 java.lang.String localName,
                 java.lang.String qName,
                 Attributes atts) throws SAXException

I would have far preferred to have:

    startElement(org.xml.sax.StartElement start) throws SAXException

where the StartElement class would have been extensible by adding new methods rather than by modifying existing ones, and could potentially have provided all the available information about the tag.

Without such a mechanism, I am afraid that to support feature X or Y (think of xml:base or xml:lang for instance), you'll need to add more parameters to the startElement method.

This model would also make it possible to provide the full text of the opening tag to the tools that might need it (for instance an XML editor that would like to preserve its formatting). It would also help to solve the issue of scoped nodes that I recently posted about on xml-dev [2].

Last point: why do I call it a layered interface? Because we could define on top of this a layered architecture where a single event gets richer with each layer it goes through. The first layer could be the recognition of the basic XML productions. A second layer could add entity processing and well-formedness checks. The next layers would add namespaces and scoped attributes. The same object (startElement for instance) could go through the different layers and gain pieces of interpretation and information without losing its original info, just by being used to create an object of an extended class at each step.
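The event object and layering idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: no StartElement class exists in org.xml.sax, and every name here (StartElement, NamespacedStartElement, ElementHandler, NamespaceLayer) is hypothetical. The namespace layer is also deliberately simplified: it never pops prefix bindings, so it ignores scoping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in org.xml.sax.
class LayeredSaxSketch {

    /** First layer event: the tag as the basic parser recognised it. */
    static class StartElement {
        private final String rawName;
        private final Map<String, String> rawAttributes;

        StartElement(String rawName, Map<String, String> rawAttributes) {
            this.rawName = rawName;
            this.rawAttributes =
                    Collections.unmodifiableMap(new LinkedHashMap<>(rawAttributes));
        }

        String getRawName() { return rawName; }
        Map<String, String> getRawAttributes() { return rawAttributes; }
    }

    /** Namespace layer event: adds methods, keeps all the original info. */
    static class NamespacedStartElement extends StartElement {
        private final String namespaceURI;
        private final String localName;

        NamespacedStartElement(StartElement base, String namespaceURI, String localName) {
            super(base.getRawName(), base.getRawAttributes());
            this.namespaceURI = namespaceURI;
            this.localName = localName;
        }

        String getNamespaceURI() { return namespaceURI; }
        String getLocalName() { return localName; }
    }

    /** The handler signature stays the same whatever the layers add. */
    interface ElementHandler {
        void startElement(StartElement start);
    }

    /** A layer is a handler that enriches the event and passes it on. */
    static class NamespaceLayer implements ElementHandler {
        private final ElementHandler next;
        // Simplification: bindings are never popped, so scoping is ignored.
        private final Map<String, String> bindings = new HashMap<>();

        NamespaceLayer(ElementHandler next) { this.next = next; }

        @Override
        public void startElement(StartElement start) {
            for (Map.Entry<String, String> a : start.getRawAttributes().entrySet()) {
                if (a.getKey().equals("xmlns")) {
                    bindings.put("", a.getValue());
                } else if (a.getKey().startsWith("xmlns:")) {
                    bindings.put(a.getKey().substring("xmlns:".length()), a.getValue());
                }
            }
            String raw = start.getRawName();
            int colon = raw.indexOf(':');
            String prefix = colon < 0 ? "" : raw.substring(0, colon);
            String local = raw.substring(colon + 1);
            String uri = bindings.getOrDefault(prefix, "");
            next.startElement(new NamespacedStartElement(start, uri, local));
        }
    }

    public static void main(String[] args) {
        List<StartElement> seen = new ArrayList<>();
        ElementHandler pipeline = new NamespaceLayer(seen::add);

        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("xmlns:x", "http://example.org/ns");
        attrs.put("id", "42");
        pipeline.startElement(new StartElement("x:item", attrs));

        NamespacedStartElement e = (NamespacedStartElement) seen.get(0);
        System.out.println(e.getRawName() + " = {" + e.getNamespaceURI() + "}" + e.getLocalName());
    }
}
```

An application that only understands the first layer can keep calling getRawName() on whatever it receives, while a namespace-aware one can test for the richer class. Supporting something like xml:base would then mean one more subclass and one more layer, not a new startElement signature.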
I had a look at Aelfred and XP, and both more or less implement this kind of layering, even though it's not that clearly separated and it relies on internal proprietary interfaces.

I don't see anything but advantages, one of them being extensibility: with this architecture, SAX2 would just have been a layer on top of SAX1.

Have I missed something?

Thanks

Eric

[1] http://www.xml.com/pub/a/2000/08/09/xslt/xslt.html
[2] http://lists.xml.org/archives/xml-dev/200011/msg00551.html

--
See you at XML 2000
http://gca.org/attend/2000_conferences/XML_2000/building.htm#vlist
------------------------------------------------------------------------
Eric van der Vlist       Dyomedea       http://dyomedea.com
http://xmlfr.org         http://4xt.org        http://ducotede.com
------------------------------------------------------------------------



