Costello, Roger L. wrote:
> Hey Rick,
>
> In the Schematron 1.5 specification you write:
>
> "Which is not, of course, to say that grammars are not quite useful
> when appropriate."
>
> When should a path-based language (e.g., Schematron) be used? What
> should be the role of a path-based language in expressing constraints
> and validating XML instance documents?
>
> When should a grammar-based language (e.g., DTD, XSD, RNG) be used?
> What should be the role of a grammar-based language in expressing
> constraints and validating XML instance documents?

It depends on whether you mean "should" from the POV of passive consumers of technology, standards and infrastructure, or "should" from the POV of active creators and pioneers of technology, standards and infrastructure.

In the first case, "should" is limited by products, availability, skillsets and capital cycles; no blue-sky-ing. You use whatever it takes to get the job done reliably with what is available to you. By force of habit, marketing and expectations, people will typically try a grammar first, then Schematron to wallpaper the cracks. That is reasonable.

But I am more interested in the second "should". There is no need to say "XSD is too fat" or "RELAX NG is too lean" or even "Schematron is just right", because the motivation is not territorial, nor the achievement of perfection, but how to move on to a more effective basis than XSD and grammars currently offer. People don't have enough energy left after working through the XSD issues to build the more complex WS-* systems that the vendors have invested heavily in. The equation is simple: the fewer brain cells occupied by figuring out the schema and WSDL, the more are available for integration tasks.

But there is little awareness that regular grammars can only represent certain kinds of structures; people tend to blame the particular flavour (XSD, RELAX NG, DTD) rather than the whole class of technology.
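To make "certain kinds of structures" a little more concrete, here is a minimal sketch in Python of a rule that is about the whole document rather than about which child may follow which sibling. It is only an illustration, not Schematron itself: the element names (`order`, `product`, `item`), the `id` and `ref` attributes and the `check_links` helper are all made up for the example. The shape is the usual path-based one: pick a context (every `item`), then assert something about it that may look anywhere in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: items point at products through a ref attribute.
DOC = """
<order>
  <product id="p1"/>
  <product id="p2"/>
  <item ref="p1"/>
  <item ref="p9"/>
</order>
"""


def check_links(root):
    """Return one message per <item> whose ref matches no <product> id."""
    # Document-scoped fact: every declared product id, wherever it occurs.
    known_ids = {p.get("id") for p in root.iter("product")}
    failures = []
    # Rule context: every <item> anywhere in the tree.
    for item in root.iter("item"):
        ref = item.get("ref")
        # Assertion: the link must resolve to a declared product.
        if ref not in known_ids:
            failures.append(f"item ref '{ref}' does not match any product id")
    return failures


root = ET.fromstring(DOC)
messages = check_links(root)
for message in messages:
    print(message)
```

Nothing in the check depends on where the `product` and `item` elements sit relative to each other, which is the point: the constraint is stated as paths plus an assertion, not as a content model.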
The litmus test is this: if you can say "My data is a simple tree of information" without any qualms, then a grammar is probably perfectly workable for you.

But you are probably outside what a grammar handles comfortably if you have reservations, such as "Well, we have a lot of internal links actually" or "Well, it is really the serialization of a graph or data structure that is not really a tree at all" or "Well, we have structures which are dependent on more than just the parent element and preceding siblings" or "Well, we have inherited constraints such as inclusions, exclusions, attributes of an element that can also appear on any of the descendants" or "Well, we have structural requirements that are scoped to the document rather than the particular context" or "Well, we are using standard schemas that use wildcards but we have particular requirements for them" or "It's not a tree, but multiple tree variants with some kind of selector attribute".

Or, "Well, it's a tree but I really can only define a portion of the constraints at this time: other people will be adopting and adjusting it". Or "Well, it is a tree, but it serializes an ER database and we don't care whether one-to-one relationships are represented using child or parent containment, attributes or ID links or keys, but only one." Or "Well, it is a tree, but we need to justify the business reason why every constraint exists, otherwise it is just a cost: grammars encourage us to have a lot of sequence constraints that our data does not have and which add extra work when generating XML and extra tests when accepting XML."

IMHO, ultimately grammars will be a niche technology: an implementation technology or optimization under the hood, a niche schema language when you have repeating structures that are not tagged explicitly... perhaps even the internal format for an XML IDE. I seem to be the only person in the world who believes this, which must be embarrassing for everyone else.
:-)

We can still have XML, namespaces, XSLT2, XQuery2, XS:datatypes and so on; but the validation and type attribution can be built into the same XPath process that does the XSLT, XSLT2 or XQuery2 rather than requiring a separate validation step. Yet everyone knows that XSD is a schema language that people use for everything *except* validation. But grammars interpose a separate mapping step on modeling existing databases; I think this step is basically fat that can and should be trimmed, in favour of a path-based system that allows a direct mapping from the XML document to the data model, in the kind of way I suggested in the ER example in the previous email.

Summary:

For now, if you have a simple tree and have grammar skills or applications, use RELAX NG or XSD and use Schematron to wallpaper any little differences. If you have something a little more complicated, plan on having a Schematron step as well as whatever grammar validation you have, just as standard procedure. If you need to validate and you have anything complex going on, just forget grammars and start off with Schematron. Especially if you have XSLT skills and don't want to waste your brain space learning a big fat standard like XSD.

For the future... well, I think that if grammars are indeed fat, commercial logic and user acceptability issues will drive grammars out in favour of query-implementable schema languages. We are at the point in the investment cycle where the need to get a return on the type-dependent technologies does not override the need to get a return on the type-attributing technology; at some time in the future vendors will move from "How can we monetize our investment in XSD by providing type-aware systems like WS-* or XLinq or XQuery?" to "How can we monetize our investment in WS-* and XLinq or XQuery by choosing a less intrusive schema technology?" I don't know whether this is 2007 or 2017, but I think it will happen.

Cheers
Rick Jelliffe