  • From: "Costello, Roger L." <costello@m...>
  • To: "xml-dev@l..." <xml-dev@l...>
  • Date: Mon, 28 Jan 2013 19:21:24 +0000

Hi Folks,

The Kelvin Sign (K) is high up in the Unicode code space: it is code point U+212A. That's way up there.

Compare with the Latin capital letter K, whose code point is U+004B. That's way down there.

Interestingly, the lowercase of the Kelvin Sign is the Latin small letter k:

	lower-case('&#x212A;') = 'k'
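
If you don't have an XPath processor handy, the same case mapping can be checked with a few lines of Python. This is only a sketch: Python is my stand-in here, not XPath, and the helper name describe() is made up for illustration. Python's str.lower() applies the Unicode case mappings, so for this character it should agree with lower-case().

```python
import unicodedata

KELVIN_SIGN = "\u212A"  # the character discussed above
LATIN_K = "\u004B"      # ordinary Latin capital letter K


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print(describe(KELVIN_SIGN))
    print(describe(LATIN_K))
    # The two characters are distinct, yet the Kelvin Sign lowercases to 'k'.
    print(KELVIN_SIGN == LATIN_K)
    print(KELVIN_SIGN.lower() == "k")
```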

"So what's the big deal?" you ask. Actually, it's a really big deal. Let me explain.

Suppose you want to enforce this rule in your XML instance documents: 

    	The value of the <Name> element must
    	be 'Lockhart' (lowercase, uppercase, any
    	case).

In XPath the rule can be expressed using the matches() function:

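    	matches(Name, '^Lockhart$', 'i')

The ^ and $ anchors are needed because matches() otherwise returns true for any value that merely contains the pattern. The third argument ('i') means that you want matches() to do a case-insensitive match.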

So, applying the matches() function to this:

    	<Name>Lockhart</Name>

returns true.

But it also returns true for this (recall that U+212A is the Kelvin Sign):

    	<Name>Loc&#x212A;hart</Name>

Ouch!

The <Name> element contains invalid data but the matches() function claims that it is valid data.
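
The same surprise can be reproduced outside XPath. Here is a minimal sketch in Python, assuming its re module as a stand-in for matches(); the function names are mine, not part of any XPath API. Python's case-insensitive matching is also Unicode-aware by default, so it accepts the Kelvin Sign as a 'k'. The re.ASCII flag is a Python-specific way to restrict case-insensitivity to ASCII letters.

```python
import re

PATTERN = "Lockhart"
GOOD_NAME = "Lockhart"
BAD_NAME = "Loc\u212Ahart"  # Kelvin Sign in place of the 'k'


def matches_unicode_caseless(value: str) -> bool:
    """Whole-string, case-insensitive match using Unicode case mappings."""
    return re.fullmatch(PATTERN, value, re.IGNORECASE) is not None


def matches_ascii_caseless(value: str) -> bool:
    """Whole-string match where only ASCII letters are treated caselessly."""
    return re.fullmatch(PATTERN, value, re.IGNORECASE | re.ASCII) is not None


if __name__ == "__main__":
    for name in (GOOD_NAME, "LOCKHART", BAD_NAME):
        print(ascii(name), matches_unicode_caseless(name), matches_ascii_caseless(name))
```

With the Unicode-aware match, the Kelvin Sign variant is accepted along with the genuine spellings; with the ASCII-only match, it is rejected.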

Let's see how this applies to XML Schemas. Here I declare a <Name> element and specify that its value must be 'Lockhart' (case insensitive):

              <xs:element name="Name">
                  <xs:simpleType>
                      <xs:restriction base="xs:string">
                          <xs:assertion test="matches($value, '^Lockhart$', 'i')" />
                      </xs:restriction>
                  </xs:simpleType>
              </xs:element>

I then validate this:

	<Name>Loc&#x212A;hart</Name>

against the schema, and the validator says "Valid".

Ouch!

Invalid data has gotten into our system.

Question: Are there other characters similar to the Kelvin Sign? That is, are there other characters that are outside [A-Za-z] but that land inside [A-Za-z] when lower-case() or upper-case() is applied to them?
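
One way to explore the question is a brute-force scan over every code point. The sketch below does that in Python rather than XPath, assuming Python's str.lower() and str.upper() are close enough to lower-case() and upper-case() for this purpose. The results depend on the Unicode data bundled with the interpreter, and the function names are hypothetical.

```python
import string
import sys
import unicodedata

ASCII_LETTERS = set(string.ascii_letters)  # the 52 characters in [A-Za-z]


def folds_into_ascii_letter(ch: str) -> bool:
    """True if ch is outside [A-Za-z] but its lower or upper case is inside."""
    if ch in ASCII_LETTERS:
        return False
    # A multi-character case mapping is never a member of the set,
    # so only single-letter results count.
    return ch.lower() in ASCII_LETTERS or ch.upper() in ASCII_LETTERS


def find_lookalikes() -> list[int]:
    """Scan all code points and collect those that fold into [A-Za-z]."""
    return [cp for cp in range(sys.maxunicode + 1)
            if folds_into_ascii_letter(chr(cp))]


if __name__ == "__main__":
    for cp in find_lookalikes():
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{cp:04X} {name}: lower={ch.lower()!r} upper={ch.upper()!r}")
```

On the Python versions I am aware of, the hits include the Kelvin Sign (U+212A), the Latin small letter long s (U+017F), and the Latin small letter dotless i (U+0131). An XPath processor may use different case-mapping data, so treat the list as a starting point, not a definitive answer.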

/Roger 

