  • To: xml-dev@l...
  • Subject: W3C XML Core WG requests comment: control characters in XML 1.1
  • From: John Cowan <jcowan@r...>
  • Date: Thu, 9 May 2002 13:14:42 -0400 (EDT)

This is a request for comment from this mailing list (or anyone else)
on a proposal by Shigemichi Yazawa for a standard representation for
the Unicode control characters that are not legal in XML 1.0.  See
http://lists.w3.org/Archives/Public/www-xml-blueberry-comments/2002May/0000.html
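For readers wondering exactly which characters are involved, here is a
small Python sketch (an illustration only, not part of the proposal; the
name `is_xml10_char` is made up) that tests code points against the XML
1.0 `Char` production and lists the control characters that fail it:

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Scan the Unicode control characters: C0 (#x0-#x1F), DEL (#x7F), C1 (#x80-#x9F).
controls = [cp for cp in [*range(0x00, 0x20), *range(0x7F, 0xA0)]
            if not is_xml10_char(cp)]
print(" ".join("#x%X" % cp for cp in controls))
```

Only the C0 controls other than tab, LF and CR fail; DEL and the C1
range are legal XML 1.0 characters.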

In essence, this provides an element, "<xml:orphanedChar value="#x0001">",
which can be used *by convention* in place of an actual (and illegal) #x1
character.  It is not fully general-purpose: the Infoset would view it as
an element, not a character, and it could not be used in attribute values.
It would also require explicit declaration in schema languages, unless
they were modified to ignore it; even then, an element whose type is an
XSD simple datatype could not use this feature, because simple-typed
content cannot contain child elements.
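A rough Python sketch of how a producer and a consumer might apply this
convention.  The function names are hypothetical, the "#x%04X" spelling
simply mirrors the example above, and only the C0 controls are handled.
Note that ElementTree reports each replacement as a child element, which
is the Infoset consequence described above.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# C0 controls that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_C0 = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")
# Expanded name ElementTree uses for the predeclared "xml" prefix.
ORPHANED = "{http://www.w3.org/XML/1998/namespace}orphanedChar"

def encode_orphaned(text: str) -> str:
    """Escape text as XML content, replacing each illegal control
    character with a (hypothetical) xml:orphanedChar element."""
    parts, pos = [], 0
    for m in ILLEGAL_C0.finditer(text):
        parts.append(escape(text[pos:m.start()]))
        parts.append('<xml:orphanedChar value="#x%04X"/>' % ord(m.group()))
        pos = m.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)

def decode_orphaned(elem: ET.Element) -> str:
    """Rebuild the original string from an element's text, tails and
    xml:orphanedChar children; any other child element is rejected."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag != ORPHANED:
            raise ValueError("unexpected child element %r" % child.tag)
        out.append(chr(int(child.get("value")[2:], 16)))
        out.append(child.tail or "")
    return "".join(out)

root = ET.fromstring("<doc>" + encode_orphaned("a\x01b&c") + "</doc>")
print(len(root))                            # 1: the parser sees an element
print(decode_orphaned(root) == "a\x01b&c")  # True
```

The sketch keeps the marker as an empty element so that the surrounding
text survives as ordinary text and tail content.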

An alternative proposal is to use a processing instruction such as
"<?xmlchar #x1?>", which would be invisible to schemas.  A little *too*
invisible, in some cases: it would be legal in simple datatypes, but a
string-typed element constrained to exactly 3 characters could not
contain 3 control characters and still be schema-valid, because the PIs
would not count as characters.
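The same round trip with the processing-instruction alternative might
look like the Python sketch below (function names hypothetical;
`TreeBuilder(insert_pis=True)` needs Python 3.8 or later).  A default
ElementTree parse simply drops the PIs, so three encoded control
characters leave an empty string, which is the "too invisible" length
problem in miniature.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# C0 controls that XML 1.0 forbids (tab, LF and CR are allowed).
ILLEGAL_C0 = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")

def encode_pis(text: str) -> str:
    """Escape text as XML content, replacing each illegal control
    character with a (hypothetical) <?xmlchar #xN?> instruction."""
    parts, pos = [], 0
    for m in ILLEGAL_C0.finditer(text):
        parts.append(escape(text[pos:m.start()]))
        parts.append("<?xmlchar #x%X?>" % ord(m.group()))
        pos = m.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)

def decode_pis(xml_text: str) -> str:
    """Parse one element, keeping PIs, and rebuild its string value by
    turning each xmlchar PI back into a character.  Assumes the element
    contains only text and PIs."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(xml_text, parser=parser)
    out = [root.text or ""]
    for child in root:
        if child.tag is ET.ProcessingInstruction:
            # ElementTree stores a PI's target and data together in .text.
            target, _, data = child.text.partition(" ")
            if target == "xmlchar":
                out.append(chr(int(data.strip()[2:], 16)))
        out.append(child.tail or "")
    return "".join(out)

doc = "<doc>" + encode_pis("\x01\x02\x03") + "</doc>"
plain = ET.fromstring(doc)        # default parser discards PIs
print(len(plain.text or ""))      # 0
print(len(decode_pis(doc)))       # 3
```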

The idea is certainly a hack.  However, it may serve people who wish
to incorporate arbitrary Unicode strings into XML character content,
by providing something that meets the 80/20 requirement.  Whether it
*does* meet that requirement is chiefly what we want to know.  Please
make sure that all comments are cc-ed to
www-xml-blueberry-comments@w....

-- 
John Cowan <jcowan@r...>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
