
  • From: Peter Flynn <peter@s...>
  • To: xml-dev@l...
  • Date: Tue, 28 Dec 2021 11:38:53 +0000

On 27/12/2021 12:03, Roger L Costello wrote:

[snip]
> If the XML document is not associated with a schema (XSD, DTD, or 
> RNG), then the answer is always (a) and the whitespace may be safely 
> discarded.

I think it's the other way round. In the absence of a schema/DTD, whitespace
must be retained and passed to the application. Only a schema/DTD can
identify where whitespace can safely be ignored.
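
To illustrate the point, here is a minimal sketch using Python's
non-validating xml.dom.minidom parser. The sample document and the
helper name child_summary are my own assumptions, made up for
illustration. With no DTD attached, the whitespace-only text between
the tags reaches the application as ordinary text nodes:

```python
from xml.dom import minidom

# Hypothetical sample document with no DTD or schema attached.
SAMPLE = "<Document>\n  <Title>Hello</Title>\n</Document>"

def child_summary(xml_text):
    """Return a (kind, value) pair for each child node of the root element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary

# The whitespace before and after <Title> shows up as text nodes.
for kind, value in child_summary(SAMPLE):
    print(kind, repr(value))
```

The parser has no way of knowing whether <Document> has element content
or mixed content here, so it passes everything through.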

> So, sometimes the content of <Document> is one thing, sometimes it's
> another thing. This complicates lexers (and parsers) because they must
> have external, out-of-band knowledge about the document. 

Yes, exactly.

> Is that good language design?

For the original purposes of SGML and XML (large text documents with
both element content and mixed content), yes. In those cases, a schema
is pretty much always used, so the question never arises (it's [a]).

If you use XML to hold what is essentially rectangular data (rows and
columns), or if your application can dispense with mixed content, the
question also never arises (it's [b] and it's up to the application to
ignore whitespace-only nodes).
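
A rough sketch of that second case, assuming a hypothetical
rows-and-columns document and helper names of my own choosing
(element_children, rows_as_dicts). The application, not the parser,
decides to skip the whitespace-only text nodes:

```python
from xml.dom import minidom

# Hypothetical rectangular data: no mixed content anywhere.
ROWS = """<rows>
  <row><id>1</id><name>Ann</name></row>
  <row><id>2</id><name>Bob</name></row>
</rows>"""

def element_children(node):
    """Return child elements only, skipping text nodes (here: whitespace)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def cell_text(cell):
    """Concatenate the text nodes directly inside a cell element."""
    return "".join(c.data for c in cell.childNodes
                   if c.nodeType == c.TEXT_NODE)

def rows_as_dicts(xml_text):
    """Turn each <row> into a dict keyed by the cell element names."""
    root = minidom.parseString(xml_text).documentElement
    return [
        {cell.tagName: cell_text(cell) for cell in element_children(row)}
        for row in element_children(root)
    ]

print(rows_as_dicts(ROWS))
```

This is only safe because the application knows its own data has no
mixed content; the same filter applied to running text would lose
information.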

Basically it's a feature, not a bug 🐞

The only notable bug is (was?) in software that discards a
whitespace-only node that is the sole node between adjacent elements
when a schema/DTD has identified the context as being mixed content.
That is /always/ wrong.
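
A small sketch of why, using a made-up fragment and helper names
(text_of, drop_whitespace_only) that are assumptions for illustration.
The fragment carries no DTD, so treat <p> as having been declared mixed
content. Dropping the single space between the two inline elements runs
the words together:

```python
from xml.dom import minidom

# Hypothetical mixed-content fragment: the only node between
# </b> and <i> is a single space.
MIXED = "<p><b>bold</b> <i>italic</i></p>"

def text_of(node):
    """Concatenate all descendant text in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)

def drop_whitespace_only(node):
    """Mimic the buggy behaviour: remove every whitespace-only text node."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            drop_whitespace_only(child)

root = minidom.parseString(MIXED).documentElement
before = text_of(root)
drop_whitespace_only(root)
after = text_of(root)
print(repr(before), "->", repr(after))
```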

Peter

