2.3 Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

White Space
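As a rough illustration of the white space definition above, the following Python sketch tests whether a string consists solely of the four characters named there. The names `S_RE` and `is_xml_white_space` are hypothetical helpers introduced for this example and are not part of the specification.

```python
import re

# Assumption: S is one or more of space (#x20), tab (#x9),
# carriage return (#xD), or line feed (#xA), as described in the text.
S_RE = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_white_space(text: str) -> bool:
    """Return True if text is non-empty and contains only S characters."""
    return S_RE.fullmatch(text) is not None
```

The character class is spelled out explicitly because Python's `\s` and `str.isspace()` also accept characters such as form feed and no-break space, which are not listed in the definition of S.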
(deleted) Characters are classified for convenience as letters, digits, or other characters. A letter consists of an alphabetic or syllabic base character or an ideographic character. Full definitions of the specific characters in each class are given in (deleted) Appendix B.
A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "

An Nmtoken (name token) is any mixture of name characters.

(added) The first character of a Name MUST be a NameStartChar, and any other characters MUST be NameChars; this mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining characters. Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names. See [Suggestions for XML Names] for suggestions on the creation of names.

(added) Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

(added) The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

Names and Tokens
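The naming rules above can be sketched in Python as follows. This is an illustrative check, not a normative one: the character ranges are an assumption taken from the NameStartChar and NameChar productions as published in XML 1.0 Fifth Edition, which are not reproduced in this excerpt, and the helper names `is_name` and `is_nmtoken` are hypothetical.

```python
import re

# Assumption: ranges follow the Fifth Edition NameStartChar production.
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)
# NameChar adds hyphen, full stop, ASCII digits, MIDDLE DOT (#xB7),
# combining marks (#x0300-#x036F), and #x203F-#x2040.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")
NMTOKEN_RE = re.compile(f"[{NAME_CHAR}]+")


def is_name(text: str) -> bool:
    """True if text starts with a NameStartChar followed by NameChars."""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """True if text is a non-empty mixture of NameChars."""
    return NMTOKEN_RE.fullmatch(text) is not None
```

Under these assumed ranges, `is_name("1abc")` is false while `is_nmtoken("1abc")` is true, which reflects the stated intent of keeping digits out of the first position only. The gap between #x037D and #x037F is what excludes GREEK QUESTION MARK.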
Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

Literals
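The remark that a SystemLiteral can be parsed without scanning for markup can be illustrated with a minimal Python sketch: the reader only needs to find the next occurrence of the opening quotation mark. The function name `read_quoted_literal` is hypothetical, and the sketch deliberately ignores the additional restrictions that apply to EntityValue and AttValue.

```python
from typing import Tuple


def read_quoted_literal(text: str, pos: int = 0) -> Tuple[str, int]:
    """Read a quoted literal starting at text[pos].

    Returns the literal's content and the index just past the closing
    quotation mark. The content may contain the other kind of quotation
    mark, but never the one used as the delimiter.
    """
    quote = text[pos:pos + 1]
    if quote not in ('"', "'"):
        raise ValueError("literal must start with a single or double quote")
    end = text.find(quote, pos + 1)
    if end == -1:
        raise ValueError("unterminated literal")
    return text[pos + 1:end], end + 1
```

For example, `read_quoted_literal('"a\'b" rest')` yields the content `a'b` and the position `5`, since the apostrophe is ordinary data inside a literal delimited by double quotes.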