[Zielinski, Marek]

> I am trying to define restrictions on string lengths using Schema. The data
> actually comes from databases, and is exchanged between two different
> systems. I encountered a snag: when the string contains one of the reserved
> characters, like "&", the parser automatically translates it into an entity,
> e.g. &amp;. This increases the length of the string, and now the string does
> not fit; the validator (I am using XMLSpy) rejects it as too long.
>

XML input can never actually contain the literal character "&" as character
data (unless it is in a CDATA section, of course).  It must be written as &amp;.
Even if it is held in the database as a single ampersand character, it has to
be escaped when serialized to XML.  Once parsed, the string passed to the
application should contain the original "&" character.
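
As a rough illustration of that round trip, here is a minimal Python sketch
using only the standard library.  The value "AT&T" and the element name
"name" are made up for the example; they are not from the original question.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical value as it might be held in the database (4 characters).
db_value = "AT&T"

# Serializing: the ampersand has to be escaped in the raw XML text.
raw_xml = "<name>" + escape(db_value) + "</name>"

# Parsing: the parser replaces the entity reference, so the application
# sees the original single "&" character again.
parsed_value = ET.fromstring(raw_xml).text

print(raw_xml)
print(parsed_value, len(parsed_value))
```

The escaped form only exists in the serialized document; the parsed string is
the same length as the value that came out of the database.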

So the real question is: does the length restriction in XML Schema apply to
the raw XML or to the string stored in the infoset after parsing?

Section 2.2 of Part 1 of the Schema Rec says

"The concepts and definitions used herein regarding XML are framed at the
abstract level of information items as defined in [XML-Infoset]. By
definition, this use of the infoset provides a priori guarantees of
well-formedness (as defined in [XML 1.0 (Second Edition)]) and namespace
conformance (as defined in [XML-Namespaces]) for all candidates for
·assessment· and for all ·schema documents·."

So clearly XML Schema assessment has to take place after all the entities
and character references have been replaced by the parser.  The length
restriction therefore applies to the parsed string, not to the raw markup.
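
To make the difference concrete, here is a small Python sketch of the check
as I understand it.  It is not a schema validator: MAX_LENGTH stands in for a
hypothetical maxLength facet of 4, and the helper name and sample element are
invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for <xs:maxLength value="4"/> on the element's type.
MAX_LENGTH = 4


def length_seen_by_schema(raw_element: str) -> int:
    """Return the character count of the element content after parsing."""
    return len(ET.fromstring(raw_element).text or "")


raw_element = "<name>AT&amp;T</name>"

# Counting the escaped text in the raw markup gives 8 characters ...
raw_content_length = len("AT&amp;T")

# ... but the infoset holds "AT&T", which is 4 characters.
parsed_length = length_seen_by_schema(raw_element)

is_valid_length = parsed_length <= MAX_LENGTH
print(raw_content_length, parsed_length, is_valid_length)
```

A validator that counts the raw, escaped text would reject this value, which
matches the behaviour described in the question; counting the parsed value
accepts it.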

So this looks like a bug in Spy.

Cheers,

Tom P


