On 14/01/2022 02:44, Rick Jelliffe wrote:
> ... I am interested to know what gotchas people have found in real
> deployments, in the last 20 years, with XML with non-ASCII data and
> markup. And also, whether modern Unicode is actually good enough now

It should go without saying that you need a good formatter to produce a good result. Unicode can represent the characters, of course, but formatting also requires knowledge of things like line breaking, hyphenation, and when to form contextual glyphs.

The Unicode Bidirectional Algorithm [3] makes it easier to handle text in different directions consistently. That, of course, is based on properties assigned to each Unicode character, not just on its glyph or language. A lot of software uses the International Components for Unicode (ICU) [4] libraries, in either C/C++ or Java, to handle much of what Unicode defines.

The details of how to format a script can be hard to come by. Users of a particular script can give good hints when they find a problem with what you are currently producing. The W3C made a ground-breaking effort when it produced Requirements for Japanese Text Layout (JLReq) [5], which made the details of Japanese formatting accessible to the rest of us. The next JLReq version looks set to be a 'digital native' version, with some things simplified and some things promoted to being advanced options. (A bit like how MathML3 is becoming MathML Core [7], the subset that browsers consent to implement; David Carlisle can correct me if this is a misrepresentation based on my view from the outside.)

The JLReq concept has been copied/expanded into a bunch of task forces for different languages [8], plus there are other ways to crowd-source information. [10][11] (Back in 2012 [9], it looked to me like Community Groups would be the way to do this, but the W3C, like most of us, felt the gravitational pull of GitHub.)
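The per-character properties that the bidi algorithm relies on can be inspected even without ICU. A minimal sketch using Python's stdlib `unicodedata` module (my illustration, not something from ICU itself) shows the UAX #9 bidirectional class assigned to each character:

```python
# Sketch: reading the UAX #9 bidirectional class that Unicode assigns
# to each character, via Python's stdlib unicodedata (not ICU).
import unicodedata

for ch in "A1\u05D0\u0627":  # Latin letter, digit, Hebrew alef, Arabic alef
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
# U+0041 L
# U+0031 EN
# U+05D0 R
# U+0627 AL
```

The classes (strong left-to-right, European number, strong right-to-left, Arabic letter) are what the algorithm resolves into display order; they come from the character database, not from any glyph or declared language.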
> For example, are PUA characters used much in XML, or is Unihan plus
> markup good enough, or do people need to embed actual glyph
> information? How are new ideographs handled when you cannot wait for
> the Unicode Consortium process? Is the situation different with JSON?

I don't see non-standard characters in what I do. Modern font formats, such as OpenType, have largely or completely removed the need to think about glyph variants encoded in the Private Use Area of fonts. OpenType defines a lot of font features [1] that are encoded in the font file as lookup tables, and the formatter and the font can work together to turn text into ligatures or replacement glyphs.

Many people are familiar with there being a glyph for 'ffi', which is U+FB03, but a language that uses a different 'i', say 'ï', may also benefit from an 'ffï' ligature, which isn't in Unicode. An OpenType font could put an 'ffï' ligature in the Private Use Area, or it could be in the font as an unencoded glyph, but you wouldn't need to know which, because the ligature lookup would get you the right glyph if it exists.

CSS gave friendlier names to OpenType's four-letter feature tags, and you can use those names with CSS or XSL-FO [2], or you can access the font features directly. [12][13]

Regards,

Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
tgraham@a...
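The 'ffi' point above can be checked from Python's stdlib (a sketch of mine, for illustration): U+FB03 is a real Unicode code point with a compatibility decomposition back to plain 'f' + 'f' + 'i', which is why NFKC normalization and searching can fold it away, whereas an 'ffï' ligature has no code point at all and can only live inside the font:

```python
# Sketch: U+FB03 LATIN SMALL LIGATURE FFI exists in Unicode with a
# compatibility decomposition, so NFKC folds it back to plain 'ffi'.
# An 'ffï' ligature has no code point; it can only exist in a font,
# either in the Private Use Area or as an unencoded glyph.
import unicodedata

lig = "\uFB03"
print(unicodedata.name(lig))               # LATIN SMALL LIGATURE FFI
print(unicodedata.normalize("NFKC", lig))  # ffi
```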
[1] https://docs.microsoft.com/en-ie/typography/opentype/spec/features_ae
[2] https://www.antenna.co.jp/AHF/help/en/ahf-ext.html#axf.font-variant
[3] http://www.unicode.org/reports/tr9/
[4] https://icu.unicode.org/
[5] https://www.w3.org/TR/jlreq/
[6] https://github.com/w3c/jlreq/issues/281
[7] https://www.w3.org/TR/mathml-core/
[8] https://www.w3.org/International/i18n-drafts/nav/languagedev
[9] See page 24 in http://mentea.net/resources/multilingualweb2012.pdf
[10] https://w3c.github.io/type-samples/
[11] https://www.w3.org/International/i18n-activity/textlayout/
[12] https://www.antenna.co.jp/AHF/help/en/ahf-ext.html#axf.font-feature-settings
[13] https://caniuse.com/font-feature




