Forwarded for Clark...

-------- Original Message --------
Subject: Re: Documents, data and markup: YAML Ain't Markup Language
Date: Mon, 9 Jun 2003 19:28:55 +0000
From: Clark C. Evans <cce@c...>
To: Paul Prescod <paul@p...>
CC: Dare Obasanjo <dareo@m...>, xml-dev@l...
References: <B885BEDCB3664E4AB1C72F1D85CB29F80648444F@R...> <3EE0F736.60608@p...>

On Fri, Jun 06, 2003 at 01:19:02PM -0700, Paul Prescod wrote:
| As Eric said, mixed content is a big one.

Indeed. I would say that mixed content is *the* line between narrative (document processing) and operational (data processing) information. Mixed content is at the core of XML. You can choose not to use it; however, you always pay for its complexity. I wanted a serialization language that didn't pay the high price of attributes, mixed content, the element tag vs. content split, and the other features necessary for document processing.

...

The primary distinction between XML and YAML is the information model. YAML has two models: a graph model and a serial model. The graph model assumes a random-access mechanism where nodes are either functions (maps or lists) or scalars. In the serial model, this graph is flattened by marking the first occurrence of a node and then referring back to it at each subsequent occurrence. In both models every node has a type, the default type being string, mapping, or list. In effect, YAML balances the needs of a computerized random-access environment with the sequential reading needs of humans. YAML specifically ignores document processing requirements.

Over time I've come to use both XML and YAML, leveraging the strengths of each where they best fit my problems. In particular, their drastically different syntaxes let you blend both of them together in the same file! I do this frequently.

YAML has a significantly different model from XML:

- XML distinguishes between 'tags' and 'content'; in YAML, mapping keys are scalars just like mapping values or list entries. Thus, XML has a deep syntactic distinction between 'meta-data' and 'data'. YAML avoids this distinction.
- XML elements have attributes, key/value pairs which serve as a mapping. YAML has a mapping, but unlike attributes, both the key and the value can be structured.
- The XML model is a tree; YAML's is a graph. YAML syntax has 'anchors' and 'aliases', but these are features of the syntax, necessary to flatten the graph.
- XML has namespaces; YAML nodes can have a type specifier. They are similar but quite different, since 'namespaces' really do not exist in YAML land, only types. Of course, anyone is free to interpret sub-strings of a type specifier however they wish.
- In the XML information model, syntax is king. In YAML, we have two models because both humans and machines are king at the same time, though machines are a bit more kingish.
- The top production of XML is a single document node; the top production in YAML is a sequence of nodes.

This dual model creates a few 'inconsistencies', which are easy to explain: certain elements of the serial model just are not in the graph model. The most troublesome is key order. Human readers require specific key ordering for their data processing, and some sequential processing applications need keys to be sent in a particular order. The solution here is to augment the 'graph' model with a 'style-sheet' that aids in the translation from a graph to a serialized textual form. Therefore:

- XML requires a schema to extract data from the syntax; YAML requires a schema to serialize data to the syntax.

Now, one *could* use YAML to express a document; however, the author would have to pay the price of keeping everything 'functional', that is, thinking only in terms of sequences and mappings. It is not pretty... I've tried it. Indeed, our spec is written in a YAML language for documents, but as I remember, it drops down to HTML for paragraphs. That said, as much as you can argue that XML is good for data serialization, I can argue that YAML is good for document processing.
XML is butt ugly for data processing. YAML is butt ugly for document processing. And I do not think you can argue your way out of this. XML was designed up front to be a document processing mechanism. There is no way to eliminate that legacy.

| In document applications, order tends to matter by default.
| In data applications, order tends not to matter except in
| specialized list contexts.

In data applications, I'd say that the structures are split fairly evenly between mappings and sequences. The sequence is not really a 'specialized' context; it is more of a general rule.

| Name/value pairs are probably the most convenient "fundamental data type".

The fundamental data type is the function. Both mappings and lists are functions. IMHO, it is really mixed content that is the pivot point, that and having ordered keys where duplicates are allowed.

| In documents, lists of elements tend to be. It is only because
| documents tend not to make heavy use of name/value pairs that XML can
| get away with such a weak notion of attributes (which, ironically,
| data-heads are often agitating to remove!)

Not really ironic. Attributes do not allow for recursion, and thus are not very useful in a data context. ;)

| I am good friends with one of the inventors of YAML and I don't argue
| with him when he says that YAML is better for most data-oriented
| applications. I think he's probably right. But as somebody else said,
| what would be the cost in toolset complexity of having to master two
| different languages.

Not that much. Anyone who can master XML could master YAML in a fraction of the time, mostly because YAML doesn't have the toolset that XML has. However, the toolset will emerge; it just may take a few years. Also, YAML was really designed from knowledge of XML, so lessons hard won by the XML community could be used by YAML without the legacy. Indeed, YAML owes much of its history to XML via the SML-DEV list and our dissident analysis.
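To make the graph/serial distinction above concrete, here is a minimal Python sketch, not taken from any YAML implementation. A first pass walks the graph to find nodes reached more than once; a second pass writes the nodes in reading order, marking a shared node's first occurrence with an anchor (`&id001`) and every later occurrence with an alias (`*id001`). The optional `key_order` list stands in for the 'style-sheet' that fixes key order when going from graph to text. The function name `serialize`, the `idNNN` anchor names and the output layout are assumptions for illustration; scalars are written unquoted, so the result is YAML-like text rather than the output of a conforming emitter.

```python
from typing import Any, Dict, List, Optional


def serialize(root: Any, key_order: Optional[List[str]] = None) -> str:
    """Flatten a graph of dicts, lists and scalars into YAML-like text.

    A container reached more than once gets an anchor (&idNNN) at its first
    occurrence; every later occurrence is written as an alias (*idNNN).
    `key_order` is a hypothetical "style-sheet": listed keys come first, in
    that order, and the remaining keys follow alphabetically.
    """
    rank = {key: i for i, key in enumerate(key_order or [])}
    seen: Dict[int, int] = {}      # id(container) -> times reached
    anchors: Dict[int, str] = {}   # id(container) -> anchor name
    lines: List[str] = []

    def count(node: Any) -> None:
        # Pass 1 (graph model): find containers that are shared or cyclic.
        if not isinstance(node, (dict, list)):
            return
        seen[id(node)] = seen.get(id(node), 0) + 1
        if seen[id(node)] > 1:
            return  # already walked; stopping here also terminates cycles
        for child in (node.values() if isinstance(node, dict) else node):
            count(child)

    def emit(node: Any, indent: int, lead: str) -> None:
        # Pass 2 (serial model): write nodes in reading order.
        pad = "  " * indent
        if not isinstance(node, (dict, list)):
            lines.append(f"{pad}{lead}{node}")  # scalars are not quoted
            return
        nid = id(node)
        if nid in anchors:
            lines.append(f"{pad}{lead}*{anchors[nid]}")  # later occurrence
            return
        tag = ""
        if seen[nid] > 1:
            anchors[nid] = f"id{len(anchors) + 1:03d}"  # first occurrence
            tag = "&" + anchors[nid]
        lines.append(f"{pad}{lead}{tag}".rstrip())
        if isinstance(node, dict):
            for key in sorted(node, key=lambda k: (rank.get(k, len(rank)), k)):
                emit(node[key], indent + 1, f"{key}: ")
        else:
            for item in node:
                emit(item, indent + 1, "- ")

    count(root)
    emit(root, 0, "--- ")
    return "\n".join(lines)


if __name__ == "__main__":
    address = {"city": "Seattle", "zip": "98101"}
    person = {"name": "Clark", "home": address, "work": address}
    print(serialize(person, key_order=["name"]))
    # ---
    #   name: Clark
    #   home: &id001
    #     city: Seattle
    #     zip: 98101
    #   work: *id001
```

The two passes mirror the two models: `count` only cares about identity and reachability (the graph), while `emit` is where ordering decisions live (the serialization). Because anchors are assigned before a node's children are written, a node that contains itself comes out as an alias rather than looping forever.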
| If one could go back in time, one could approach the problem from
| scratch with the needs of document and data heads equally represented.
| It would not just be useful to combine them so we could reuse tools. It
| would be useful to combine them because most documents have a
| data-oriented subset (if only the "metadata" element at the top) and
| many data applications have a document-oriented subset (if only rich
| text fields). Another reason to combine them is that there is no clear
| boundary. There is a spectrum.

Yes! Much of my data now mixes the two. ;)

| But I'm sorry to say that that is not the way XML is.
|
| And by the way, if you consider RDF:
|
| * triples are roughly equivalent to name/value pairs (the third item
|   in the triple is the "parent" object)
| * order does not matter by default
| * types and roles are distinguished
| * types and roles are context-free
| * triples with unknown predicates are easily ignored
|
| IMHO, it is precisely the impedance mismatch between the data view of the
| world and XML that makes RDF look so ugly. As a data model, RDF is not
| far from ideal for most of the data-oriented applications I've done.
|
| I think that having a clean strategy for merging the two worlds is one
| of the big open questions in the XML world.

Thanks, Paul. This was very insightful.

Best,
Clark