Michael Kay wrote:
>
> > First normal form:
> > ------------------
> > Data is in first normal form if it (a) has a primary key and
> > (b) has no repeating fields.
>
> Actually, *data* can't be in first normal form - only *relations* can.

You're correct -- poor wording on my part.

> So
> applying the concept to a data model that doesn't use relations is pretty
> dicey.

Agreed. What I'm curious about is whether there is a concept analogous to
normalization that can be applied to the XML data model. This might be
useful in proposing XML information modeling best practices, which was
Simon's original goal.

> Obviously (b) doesn't have any relevance to a hierarchic data model,

That depends on how you choose to interpret what "repeating fields" means.
If you choose to view B in:

   <!ELEMENT A (B+)>

as a (pardon the expression) single, multi-valued attribute, then the above
content model has no repeating fields while the following content model
does:

   <!-- B1 and B2 represent the same real-world entity -->
   <!ELEMENT A (B1, B2)>

> > In XML terms, this
> > implies that you only store one "thing" per document
>
> You've made a magic jump from "data" being normalized to "documents" being
> normalized, and you seem to be assuming that a document should represent one
> tuple in a relation - that's a mighty big jump.

That was my intention from the start, but I obviously wasn't clear about it.

> Yes, [3NF] does apply to XML, but it certainly doesn't tell us how to split our
> data into multiple documents. It does tell us how to design our hierarchies,
> but not how to partition those hierarchies across documents.

I think it does tell us how to partition documents. It says that data shared
across multiple documents needs to reside in separate documents. Sales orders
are a bad example here, since the "XML normalized" form is virtually
identical to the relational normalized form. Semi-structured data is probably
a better example, since documents containing semi-structured data are likely
to be substantially different than their relational counterparts.

> But all this presupposes that we are designing XML documents for storage and
> query. Most XML documents are designed for messaging of some kind (between
> humans or between software components). Within the context of a message,
> duplication is far less of a problem, for example it doesn't matter if I
> hold product code, description, and price as part of each order-line in an
> order. Many XML databases are actually archives of such messages, so
> duplication of data is a fact of life; and since it's an archive, the update
> problem doesn't arise.

This is the conclusion I came to.

-- Ron
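
To make the partitioning point in this message concrete, here is a minimal sketch, not anything proposed in the thread itself. It takes a message-style order in which product code, description, and price are repeated on each order line, and moves the shared product data into separate standalone elements that the order then refers to by code. The element and attribute names (order, line, product, productRef, code) and the sample values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical message-style order: product details are duplicated per line.
ORDER_XML = (
    '<order id="o1">'
    '<line qty="2"><product code="p1">'
    '<description>Widget</description><price>9.99</price></product></line>'
    '<line qty="1"><product code="p2">'
    '<description>Gadget</description><price>24.50</price></product></line>'
    '<line qty="5"><product code="p1">'
    '<description>Widget</description><price>9.99</price></product></line>'
    '</order>'
)


def split_shared_products(order_xml):
    """Return (order, products).

    order    -- the order element with each embedded <product> replaced
                by a <productRef code="..."/> reference
    products -- dict mapping product code to one standalone <product>
                element, i.e. the data shared across orders
    """
    order = ET.fromstring(order_xml)
    products = {}
    for line in order.findall("line"):
        product = line.find("product")
        code = product.get("code")
        # Keep the first full copy as the standalone "document".
        products.setdefault(code, product)
        # Replace the embedded copy with a reference by code.
        line.remove(product)
        ET.SubElement(line, "productRef", code=code)
    return order, products


order, products = split_shared_products(ORDER_XML)
print(ET.tostring(order, encoding="unicode"))
for code in sorted(products):
    print(ET.tostring(products[code], encoding="unicode"))
```

Whether this split is worth doing depends on the use: for stored, updatable data it removes the duplication, while for a message or an archive of messages the original, denormalized form is arguably fine, as the quoted text notes.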



