Rick Jelliffe wrote:
> Are you sure you have the right terms here? Pinyin is not pidgin.
True.
> And it usually has no accents. (If it has accents, in particular macrons,
> it may not be standard Pinyin, which is not to say that it might not
> be an old or extended Pinyin.)
Standard Hànyǔ pīnyīn (汉语拼音) as used by the PRC, Singapore,
and ROC governments, and standardized as ISO 7098:1982, definitely does
have accents: one for each syllable (except for the toneless syllables),
as shown in this sentence.
> Language codes are in flux: the three letter codes and the two letter
> codes have different approaches.
Three-letter codes are never used for languages that have two-letter codes.
Chinese as a whole has the two-letter code "zh", whereas Mandarin proper
has the three-letter code "cmn". For backward compatibility, "zh-cmn"
also designates Mandarin.
> So first you need to determine the
> region: is your simplified text from PRC or Singapore?
>
> Assuming it is from PRC, then the language code zh-CN should be
> enough AFAIK.
There are texts from the PRC in traditional characters. "Zh-Hans" is
the modern standard form for simplified-character texts whether from the
PRC or Singapore or elsewhere. "Zh-CN" usually means the same thing,
but it is a backward compatibility hack.
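
To make the script/region distinction concrete, here is a minimal Python
sketch that splits a simple tag into language, script, and region purely by
subtag shape (four letters for a script, two letters or three digits for a
region). The function name split_tag is hypothetical, and the sketch
deliberately ignores variants, extensions, and registry validation, so treat
it as an illustration rather than a conforming parser.

```python
def split_tag(tag):
    """Split a simple language tag into language, script, and region.

    Simplified sketch: subtags are classified by shape only, and anything
    else (variants, extended language subtags, extensions) is ignored.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for sub in parts[1:]:
        if sub.lower() == "x":
            # Everything after the "x" singleton is private use; stop here.
            break
        is_letters = sub.isascii() and sub.isalpha()
        if len(sub) == 4 and is_letters:
            result["script"] = sub.title()   # scripts are written like "Hans"
        elif (len(sub) == 2 and is_letters) or (len(sub) == 3 and sub.isdigit()):
            result["region"] = sub.upper()   # regions are written like "CN"
    return result


print(split_tag("zh-Hans"))
print(split_tag("zh-CN"))
```

With this sketch, "zh-Hans" yields a script and no region, while "zh-CN"
yields a region and no script, which is why the latter only implies
simplified characters indirectly.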
> Note that there is (or should be) no need to specify anything about
> the script if you are just marking up existing text. @xml:lang
> specifies the language, and the script only indirectly because a
> language+region often has a standard or characteristic orthography:
> the general script being used is obvious from the characters
> themselves.
You're out of date here. xml:lang definitely can specify script, though
it is not required to.
> So you could use xml:lang="zh-CN" for all the three cases you
> mention. If you wanted to give more of a hint, you could try
> xml:lang="zh-CN-pinyin" or "zh-Latn-CN-pinyin" for the standard
> pinyin, and xml:lang="zh-CN-pinyin-adhoc" or "zh-Latn-CN-adhoc" for
> the non-standard one (where "adhoc" is some phrase you pick to
> indicate an extended pinyin or mystery format.)
I assumed that the OP wanted to have a distinct tag for each case.
If you are going to use something ad hoc, it must take the form "x-adhoc".
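
As a rough illustration of the "x-" form, the Python sketch below appends a
private-use sequence to an existing tag. It assumes the RFC 4646-style rule
that private-use subtags follow a single "x" and are each 1 to 8 ASCII
letters or digits; the helper name with_private_use and the choice of
"zh-Latn" as the base tag are hypothetical, picked only for the example.

```python
import re

# Assumed rule: each private-use subtag is 1 to 8 ASCII letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def with_private_use(base_tag, *private_subtags):
    """Return base_tag followed by "-x-" and the given private-use subtags."""
    if not private_subtags:
        raise ValueError("at least one private-use subtag is required")
    for sub in private_subtags:
        if not _PRIVATE_SUBTAG.fullmatch(sub):
            raise ValueError(f"invalid private-use subtag: {sub!r}")
    # The "x" singleton marks everything after it as private use.
    return "-".join([base_tag, "x", *private_subtags])


print(with_private_use("zh-Latn", "adhoc"))
```

Under those assumptions the ad hoc romanization would be tagged
"zh-Latn-x-adhoc" instead of "zh-Latn-CN-adhoc".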
--
John Cowan    cowan@c...    http://www.ccil.org/~cowan
Even a refrigerator can conform to the XML Infoset, as long as it has a
door sticker saying "No information items inside".  --Eve Maler