NOTE:
The presence of #xD in the above production is maintained purely for backward compatibility with the First Edition. As explained in [End-of-Line Handling], all #xD characters literally present in an XML document are either removed or replaced by #xA characters before any other processing is done. The only way to get a #xD character to match this production is to use a character reference in an entity value literal.
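
The difference between a literal #xD and one supplied through a character reference can be observed with an ordinary parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which is not part of this specification); it places the character reference in element content rather than in an entity value literal, purely to keep the illustration short.

```python
import xml.etree.ElementTree as ET

# A literal CR LF pair in the document text is normalized to a single LF
# before the parser reports any character data.
literal = ET.fromstring("<r>a\r\nb</r>")

# A CR written as a character reference is not subject to that normalization.
referenced = ET.fromstring("<r>a&#xD;b</r>")

print(repr(literal.text))
print(repr(referenced.text))
```

The variable names `literal` and `referenced` are illustrative only.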

Characters are classified for convenience as letters, digits, or other characters. A letter consists of an alphabetic or syllabic base character or an ideographic character. Full definitions of the specific characters in each class are given in (deleted) Appendix B.

A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

NOTE:
The Namespaces in XML Recommendation [xml-names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

An Nmtoken (name token) is any mixture of name characters.

The first character of a Name MUST be a NameStartChar, and any other characters MUST be NameChars; this mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining characters. Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names. See [Suggestions for XML Names] for suggestions on the creation of names.

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.
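
The normalization behavior of GREEK QUESTION MARK can be checked directly. This is a small sketch using Python's standard `unicodedata` module; the choice of NFC is an assumption made for the illustration, and the constant name is hypothetical.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"

# U+037E has a canonical mapping to U+003B SEMICOLON, so normalizing it
# yields the character that terminates an entity reference.
normalized = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)

print(normalized == ";")
```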

Names and Tokens

2.3	`NameStartChar`	::=	`":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]`
2.3	`NameChar`	::=	`NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]`
2.3	`Name`	::=	`NameStartChar (NameChar)*`
2.3	`Names`	::=	`Name (#x20 Name)*`
2.3	`Nmtoken`	::=	`(NameChar)+`
2.3	`Nmtokens`	::=	`Nmtoken (#x20 Nmtoken)*`
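
The productions above translate fairly directly into regular expressions. The following is a non-normative sketch in Python, assuming the standard `re` module; the helper names (`is_name`, `is_nmtoken`, `is_names`, `is_nmtokens`) are hypothetical and are not defined by this specification.

```python
import re

# Character ranges copied from the NameStartChar production.
NAME_START_CHAR = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds hyphen, full stop, digits, middle dot, and two more ranges.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile("[" + NAME_START_CHAR + "][" + NAME_CHAR + "]*")
NMTOKEN_RE = re.compile("[" + NAME_CHAR + "]+")


def is_name(text: str) -> bool:
    """Name ::= NameStartChar (NameChar)*"""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """Nmtoken ::= (NameChar)+"""
    return NMTOKEN_RE.fullmatch(text) is not None


def is_names(text: str) -> bool:
    """Names ::= Name (#x20 Name)*, with exactly one space between tokens."""
    return all(is_name(token) for token in text.split(" "))


def is_nmtokens(text: str) -> bool:
    """Nmtokens ::= Nmtoken (#x20 Nmtoken)*"""
    return all(is_nmtoken(token) for token in text.split(" "))
```

Under this sketch, `is_name("xml:lang")` and `is_nmtoken("1abc")` hold, while `is_name("1abc")` does not, because a digit is a NameChar but not a NameStartChar.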

NOTE:
The Names and Nmtokens productions are used to define the validity of tokenized attribute values after normalization (see [Attribute Types]).

Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

Literals

2.3	`EntityValue`	::=	`'"' ([^%&"] | PEReference | Reference)* '"'`
			`| "'" ([^%&'] | PEReference | Reference)* "'"`
2.3	`AttValue`	::=	`'"' ([^<&"] | Reference)* '"'`
			`| "'" ([^<&'] | Reference)* "'"`
2.3	`SystemLiteral`	::=	`('"' [^"]* '"') | ("'" [^']* "'")`
2.3	`PubidLiteral`	::=	`'"' PubidChar* '"' | "'" (PubidChar - "'")* "'"`
2.3	`PubidChar`	::=	`#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]`
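
The two literal productions that do not depend on references, SystemLiteral and PubidLiteral, can be sketched as regular expressions. This is an illustrative, non-normative example in Python; the helper names are hypothetical. EntityValue and AttValue are omitted because they rely on the Reference and PEReference productions, which are defined elsewhere.

```python
import re

# PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
PUBID_CHAR = r"\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%"
# (PubidChar - "'") is the same set without the apostrophe.
PUBID_CHAR_NO_APOS = PUBID_CHAR.replace("'", "")

SYSTEM_LITERAL_RE = re.compile('"[^"]*"' + "|" + "'[^']*'")
PUBID_LITERAL_RE = re.compile(
    '"[' + PUBID_CHAR + ']*"' + "|" + "'[" + PUBID_CHAR_NO_APOS + "]*'"
)


def is_system_literal(text: str) -> bool:
    """True if text, including its quotes, matches SystemLiteral."""
    return SYSTEM_LITERAL_RE.fullmatch(text) is not None


def is_pubid_literal(text: str) -> bool:
    """True if text, including its quotes, matches PubidLiteral."""
    return PUBID_LITERAL_RE.fullmatch(text) is not None
```

Note that the delimiting quotation marks are part of the string being tested, as they are in the productions.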

NOTE:
Although the EntityValue production allows the definition of a general entity consisting of a single explicit < in the literal (e.g., <!ENTITY mylt "<">), it is strongly advised to avoid this practice since any reference to that entity will cause a well-formedness error.
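
The effect described in this note can be reproduced with a non-validating parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the document strings and the `is_well_formed` helper are hypothetical and serve only to contrast declaring the entity with referencing it.

```python
import xml.etree.ElementTree as ET

# The declaration alone is accepted: EntityValue permits a literal "<".
DECLARED_ONLY = '<!DOCTYPE r [<!ENTITY mylt "<">]><r>no reference here</r>'

# Referencing the entity inserts a bare "<" into content, which is an error.
REFERENCED = '<!DOCTYPE r [<!ENTITY mylt "<">]><r>&mylt;</r>'


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document without error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(DECLARED_ONLY))
print(is_well_formed(REFERENCED))
```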


2.3 Common Syntactic Constructs