  • From: "Costello, Roger L." <costello@m...>
  • To: "xml-dev@l..." <xml-dev@l...>
  • Date: Fri, 4 Jan 2013 18:26:47 +0000

Hi Folks,

Consider this Spanish name: Martiñez

Instead of using the precomposed ñ character (U+00F1), one can use the base character "n" followed by the combining tilde character (U+0303).

So that Spanish name can be equivalently expressed as: Martin&#x303;ez
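In Unicode terms the two spellings are canonically equivalent: normalization form NFC composes "n" + U+0303 into U+00F1, and NFD splits U+00F1 back into that pair. A minimal sketch in Python (chosen only for illustration), using the standard-library unicodedata module; the variable names are illustrative.

```python
import unicodedata

precomposed = "Marti\u00f1ez"   # single code point U+00F1 (LATIN SMALL LETTER N WITH TILDE)
decomposed = "Martin\u0303ez"   # "n" followed by U+0303 (COMBINING TILDE)

# The raw code point sequences differ, including in length ...
print(precomposed == decomposed)              # False
print(len(precomposed), len(decomposed))      # 8 9

# ... but they are canonically equivalent, so normalizing makes them equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

The length difference matters for what follows: any function that counts code points sees the tilde as a separate character sitting at its own position.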

Here is an XML document that uses the latter form:

<?xml version="1.0" encoding="utf-8"?>
<Name>Martin&#x303;ez</Name>

I wrote a stylesheet that uses the substring() function to extract everything from the combining tilde character onward:

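    <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/">
            <Result>
                <!-- position 7 of "Martin&#x303;ez" is U+0303 (XPath positions are 1-based) -->
                <xsl:value-of select="substring(Name, 7)" />
            </Result>
        </xsl:template>
    </xsl:stylesheet>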
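XPath's substring() numbers characters from 1 and treats each Unicode code point as one character; it knows nothing about base-plus-combining pairs. Assuming those semantics, the call above can be mimicked in Python with a hypothetical helper, xpath_substring, that covers only the two-argument form with an integer start:

```python
def xpath_substring(s: str, start: int) -> str:
    """Hypothetical helper mimicking XPath substring(s, start) for an integer
    start: positions are 1-based and counted in code points."""
    return s[max(start, 1) - 1:]

name = "Martin\u0303ez"            # parsed value of <Name>; &#x303; is one code point
result = xpath_substring(name, 7)  # position 7 is the combining tilde

print([f"U+{ord(c):04X}" for c in result])   # ['U+0303', 'U+0065', 'U+007A']
```

Starting one position earlier (6) would keep "n" and its tilde together; starting at 7 cuts between them.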

The output is:

<?xml version="1.0" encoding="UTF-8"?>
<Result>̃ez</Result>

I checked the output for well-formedness, and the XML parser says it is well-formed.
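The parser's verdict can be reproduced with Python's standard-library ElementTree. Well-formedness constrains individual characters through the XML 1.0 Char production (production [2]), and U+0303 falls inside its [#x20-#xD7FF] range; the grammar for character data says nothing about combining sequences. A minimal sketch; is_xml10_char is a hypothetical helper transcribing that production.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(cp: int) -> bool:
    """Hypothetical helper transcribing the XML 1.0 Char production [2]."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# The transformation output, with U+0303 as the first character of <Result>.
output = '<?xml version="1.0" encoding="UTF-8"?>\n<Result>\u0303ez</Result>'

# fromstring() raises xml.etree.ElementTree.ParseError if the input is not well-formed.
root = ET.fromstring(output.encode("utf-8"))

print(root.tag)                                       # Result
print([f"U+{ord(c):04X}" for c in root.text])         # ['U+0303', 'U+0065', 'U+007A']
print(all(is_xml10_char(ord(c)) for c in root.text))  # True
```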

According to the book Fonts & Encodings (p. 61, first paragraph):

    ... we select a substring that begins
    with a combining character, this new
    string will not be a valid string in
    Unicode.
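The situation the book describes, a string whose first code point is a combining character, can be detected by checking whether that code point's Unicode general category is a mark (Mn, Mc, or Me). A minimal Python sketch; starts_with_combining_mark is a hypothetical helper, and the category test is one reasonable choice of definition, not the book's.

```python
import unicodedata

def starts_with_combining_mark(s: str) -> bool:
    """Hypothetical helper: True if s is non-empty and its first code point
    has a mark general category (Mn, Mc or Me), so no base character precedes it."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")

extracted = "\u0303ez"   # the substring produced by substring(Name, 7)

print(unicodedata.name(extracted[0]), unicodedata.category(extracted[0]))
# COMBINING TILDE Mn
print(starts_with_combining_mark(extracted))          # True
print(starts_with_combining_mark("Martin\u0303ez"))   # False

# Normalization does not change this string: with no preceding base
# character, NFC has nothing to compose the tilde with.
print(unicodedata.normalize("NFC", extracted) == extracted)  # True
```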

The value of the <Result> element is not a valid Unicode string, so how can it be a well-formed XML document?

/Roger

