On Tue, Feb 28, 2012 at 3:55 AM, John Cowan <cowan@m...> wrote:
> Rick Jelliffe scripsit:
>> And it usually has no accents. (If it has accents, in particular macrons,
>> it may not be standard Pinyin, which is not to say that it might not
>> be an old or extended Pinyin.)
>
> Standard Hànyǔ pīnyīn (汉语拼音) as used by the PRC, Singapore,
> and ROC governments, and standardized as ISO 7098:1982, definitely does
> have accents: one for each syllable (except for the toneless syllables),
> as shown in this sentence.

Yes, though as Wikipedia says "The tone-marking diacritics are commonly omitted in popular news stories and even in scholarly works. An unfortunate effect of this is the ambiguity that results as to which words are being represented."
http://en.wikipedia.org/wiki/Pinyin

I said "usually", though someone could count occurrences (in the press?) to resolve the issue better.

>> Language codes are in flux: the three letter codes and the two letter
>> codes have different approaches.
>
> Three-letter codes are never used for languages that have two-letter codes.
> Chinese as a whole has the two-letter code "zh", whereas Mandarin proper
> has the three-letter code "cmn". For backward compatibility, "zh-cmn"
> also designates Mandarin.

The official language of PRC is Mandarin. The official script is Simplified. zh-CN means the Chinese as used in PRC: non-official and regional languages require disambiguation, but I don't see why zh-CN does, apart from cheese-paring.

http://www.w3.org/International/articles/language-tags/ says "Avoid region, script or other subtags except where they add useful distinguishing information."

Because Han Unification did not unify characters with different stroke counts (in the original source standards), IIRC, the use of zh with -Hans may not actually provide much information useful to a renderer, but may be more interesting for bibliographic categorization.
http://www.w3.org/International/questions/qa-choosing-language-tags says "If your application identified Mandarin Chinese in the past using the language tag zh-CN (Chinese as used in Mainland China), or even just zh, you can continue to use zh in this way. Using cmn or cmn-CN may cause serious compatibility problems if the software or users expect a tag such as zh."

>> Note that there is (or should be) no need to specify anything about
>> the script if you are just marking up existing text. @xml:lang
>> specifies the language, and the script only indirectly because a
>> language+region often has a standard or characteristic orthography:
>> the general script being used is obvious from the characters
>> themselves.
>
> You're out of date here. xml:lang definitely can specify script, though
> it is not required to.

Yes I often am out-of-date! s/specifies/typically specifies/

But the correct information needed in a language attribute depends on the intent of the markup. The region should not be dismissed as the primary information of interest in marking up Chinese language, even now that the borders are more open and computing is being done by non-Mandarin speakers.

>> So you could use xml:lang="zh-CN" for all the three cases you
>> mention. If you wanted to give more of a hint, you could try
>> xml:lang="zh-CN-pinyin" or "zh-Latn-CN-pinyin" for the standard
>> pinyin, and xml:lang="zh-CN-pinyin-adhoc" or "zh-Latn-CN-adhoc" for
>> the non-standard one (where "adhoc" is some phrase you pick to
>> indicate an extended pinyin or mystery format.)

The problem I heard about tagging languages as pinyin (with no language code), is that proper names (people, places) are often transcribed phonetically from the speaker's local language, rather than being read as Mandarin. Consequently, the further your text moves from being straight Mandarin as written say by a Beijinger, then the less that zh, zh-CN, and zh-Hans will be satisfactory.
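
As a rough illustration of how tags such as "zh-CN" or "zh-Latn-CN-pinyin" divide into language, script, region and variant parts, here is a minimal Python sketch. The helper name `split_tag` is hypothetical, and the rules are deliberately simplified: it assumes well-formed input and ignores extensions, private-use subtags and grandfathered tags, so it is not a conforming BCP 47 parser.

```python
def split_tag(tag):
    """Split a language tag into rough subtag roles (simplified sketch)."""
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "extlang": None,
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = parts[1:]

    def is_alpha(s):
        # ASCII letters only
        return s.isascii() and s.isalpha()

    # Optional extended language subtag: three letters, e.g. "cmn" in "zh-cmn".
    if rest and len(rest[0]) == 3 and is_alpha(rest[0]):
        result["extlang"] = rest.pop(0).lower()
    # Optional script subtag: four letters, e.g. "Hans" or "Latn".
    if rest and len(rest[0]) == 4 and is_alpha(rest[0]):
        result["script"] = rest.pop(0).title()
    # Optional region subtag: two letters or three digits, e.g. "CN".
    if rest and ((len(rest[0]) == 2 and is_alpha(rest[0]))
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        result["region"] = rest.pop(0).upper()
    # Whatever remains is treated as variants (assumption: no extensions).
    result["variants"] = [p.lower() for p in rest]
    return result


if __name__ == "__main__":
    for tag in ("zh", "zh-CN", "zh-Hans", "zh-cmn", "zh-Latn-CN-pinyin"):
        print(tag, split_tag(tag))
```

Splitting a tag this way only shows which piece carries the region or script information; it says nothing about whether that information is accurate for a given text, which is the real question here.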
http://www.greentranslations.com/article/spelling-orthography-of-pinyin
"Since 1976, place names throughout China have been transliterated into Pinyin so that they can be pronounced by local non-Mandarin speakers. Thus, in Mongolia and Tibet, for example, Pinyin is the system employed for spelling localities in phonetic form."

So the first thing to do is to determine whether the text has region-specific features and, if it does, make sure that the language code reflects those: SG, TW, HK, etc. The more that it is official Beijing-style Mandarin with no phonetic proper names from other languages or dialects, then the more that plain zh, zh-CN, zh-Hans and zh-pinyin would be adequate, is my understanding.

Cheers
Rick



