2.13 Normalization Checking

All XML text entities (including the document entity) SHOULD be fully normalized as per the definition of [Definitions for Character Normalization] supplemented by the following definitions of relevant constructs for XML:
However, a document is still well-formed even if it is not fully normalized. XML processors SHOULD provide a user option to verify that the document being processed is in fully normalized form, and report to the application whether it is or not. The option to not verify SHOULD be chosen only when the input text is certified, as defined by [Definitions for Character Normalization]. The verification of full normalization MUST be carried out as if by first verifying that the entity is in include-normalized form as defined by [Definitions for Character Normalization] and by then verifying that none of the relevant constructs listed above begins (after character references are expanded) with a composing character as defined by [Definitions for Character Normalization]. Non-validating processors MUST ignore possible denormalizations that would be caused by inclusion of external entities that they do not read.

NOTE: If, while verifying full normalization, a processor encounters characters for which it cannot determine the normalization properties (i.e., characters introduced in a version of Unicode [Unicode] later than the one used in the implementation of the processor), then the processor MAY, at user option, ignore any possible denormalizations caused by these characters. The option to ignore those denormalizations SHOULD NOT be chosen by applications when reliability or security are critical.

XML processors MUST NOT transform the input to be in fully normalized form.

XML applications that create XML 1.1 output from either XML 1.1 or XML 1.0 input SHOULD ensure that the output is fully normalized; it is not necessary for internal processing forms to be fully normalized.
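The two-step verification described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of this specification: the relevant constructs have already been extracted and had their character references expanded, include-normalized form is approximated by a plain NFC check on that expanded text, and a "composing character" is approximated as any character with a non-zero canonical combining class. The function names are hypothetical, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata


def starts_with_composing(text: str) -> bool:
    """Approximation (an assumption of this sketch): a character with a
    non-zero canonical combining class is treated as composing."""
    return bool(text) and unicodedata.combining(text[0]) != 0


def check_fully_normalized(constructs):
    """Report, without modifying the input, whether each construct passes.

    `constructs` is an iterable of strings that are assumed to be the
    relevant constructs after character references have been expanded.
    Returns (ok, problems), where problems is a list of (index, reason).
    """
    problems = []
    for index, text in enumerate(constructs):
        # Step 1: the text itself must already be in NFC.
        if not unicodedata.is_normalized("NFC", text):
            problems.append((index, "not in NFC"))
        # Step 2: the construct must not begin with a composing character.
        elif starts_with_composing(text):
            problems.append((index, "begins with a composing character"))
    return (not problems, problems)
```

The sketch only reports its findings and never rewrites the text, in keeping with the requirement that processors not transform the input. A call such as `check_fully_normalized(["cafe\u0301"])` flags the construct because the decomposed spelling of "é" is not in NFC.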
The purpose of this section is to strongly encourage XML processors to ensure that the creators of XML documents have properly normalized them, so that XML applications can make tests such as identity comparisons of strings without having to worry about the different possible "spellings" of strings which Unicode allows.

When entities are in a non-Unicode encoding, if the processor transcodes them to Unicode, it SHOULD use a normalizing transcoder.
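As an illustration of what a normalizing transcoder might look like, the following Python sketch decodes bytes from a given encoding and then applies NFC. The function name `transcode_normalized` is hypothetical, and applying NFC after decoding is one possible reading of "normalizing transcoder" rather than a procedure this specification defines.

```python
import unicodedata


def transcode_normalized(raw: bytes, encoding: str) -> str:
    """Decode `raw` from `encoding` and return the result in NFC.

    Hypothetical helper: decoding alone may yield decomposed sequences
    when the source encoding stores accents as separate characters, so
    normalization is applied as a final step.
    """
    return unicodedata.normalize("NFC", raw.decode(encoding))
```

With such a transcoder, two byte sequences that decode to different "spellings" of the same string (for example a precomposed "é" and an "e" followed by a combining acute accent) come out as identical Python strings, so identity comparisons behave as the paragraph above intends.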