  • From: Tim Bray <tbray@t...>
  • To: John Cowan <johnwcowan@g...>
  • Date: Sat, 31 Jul 2021 13:54:43 -0700

On Fri, Jul 30, 2021 at 3:04 PM John Cowan <johnwcowan@g...> wrote:

> Well, it was originally the *creating* system that is supposed to NFC-normalize, and neither the receiving system nor a retransmitting system.  But that has never applied to XML or HTML, and as a systems property is too hard to manage.  So you should normalize just in case you need to compare: it's not normalization but equality under normalization that really matters.

Um… be very careful with that.  Normalization is a can of worms that can lead to surprising results. Many protocols that base themselves on Unicode explicitly forbid normalization and define equality in terms of codepoint-by-codepoint comparison. 
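To make the two notions of equality concrete, here is a minimal sketch, assuming Python and its standard `unicodedata` module. The function names `codepoint_equal` and `nfc_equal` are illustrative only and do not come from any protocol specification.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """Equality as a codepoint-by-codepoint comparison (no normalization)."""
    return a == b


def nfc_equal(a: str, b: str) -> bool:
    """Equality under NFC normalization; the inputs themselves are left untouched."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(codepoint_equal(precomposed, decomposed))  # False
print(nfc_equal(precomposed, decomposed))        # True

# One of the surprises: NFC maps U+212B ANGSTROM SIGN to U+00C5,
# so normalizing received data can change which code points it contains.
angstrom = "\u212b"
print(unicodedata.normalize("NFC", angstrom) == "\u00c5")  # True
```

Note that `nfc_equal` normalizes only temporary copies for the comparison; the strings as received stay exactly as they arrived.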

I can see using normalization in a data-acquisition UI or database search interface but it's hard to imagine many other situations where it would make sense.  Use the bits you've received over the wire, don't [expletive deleted] with them.

Once you've looked at normalization you're on a slippery slope that could lead to (*gasp* *shudder*) case-folding. And you definitely don't want to go there.
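A small sketch of why case-folding is its own can of worms, assuming Python's built-in `str.casefold()` (which applies Unicode full case folding and takes no locale). The helper name `caseless_equal` is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding (locale-independent)."""
    return a.casefold() == b.casefold()


word = "Stra\u00dfe"        # "Straße", with U+00DF LATIN SMALL LETTER SHARP S
folded = word.casefold()

print(folded)                       # strasse
print(len(word), len(folded))       # 6 7  (folding changed the length)
print(caseless_equal(word, "STRASSE"))  # True

# Folding ignores language: 'I' always folds to 'i', which is not the
# lowercase mapping Turkish uses (dotless U+0131).
print("I".casefold() == "i")        # True
```

So a case-folded comparison can equate strings of different lengths and cannot account for language-specific rules, which goes well beyond comparing the bits received over the wire.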
