

Amelia A Lewis <amyzing@t...> wrote:

| Building a lexical API on top of a syntactic one is ... backwards. 

Yep.  SAX, e.g., is based on ESIS, which is a syntactic API spec.

| It is perfectly easy to imagine, for instance, LAX: the lexical API for
| XML.  This would have different sorts of events, though.  Perhaps it
| would have "leftPointyBracket()" and "nameCharacters(char [])" and
| "tagWhitespace(char [])" and "attributeValue(char, char [])".

Well, both SGML and XML have lexical specifications (e.g. the ISO 8879
productions http://www.oreilly.com/people/staff/crism/sgmldefs.html and
the productions in the XML spec document).  SGML actually defines things
in terms of an _abstract syntax_.  For instance, a start-tag begins with a
STAGO and ends (usually) with a TAGC, picking up stuff in between like
names, VI (value indicator), LIT, LITA and the like.  (The delimiters
are bound to a _concrete syntax_ in the SGML declaration; that's how "<"
is STAGO, "=" is VI, ">" is TAGC, etc.  XML disallows variant concrete
syntaxes, instead fixing the syntax to the bindings of the _Reference
Concrete Syntax_.)  So, it's possible to associate categories with token
"events" and define an API at that level: tokenization only.
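
To make the "categories on token events" idea concrete, here is a rough
sketch in Python, not a conforming tokenizer.  The delimiter role names
(STAGO, ETAGO, TAGC, VI, LIT, LITA) are the SGML ones with the Reference
Concrete Syntax bindings; the function name tokenize() and the DATA and
LITERAL_DATA categories are made up for illustration.  It assumes input
containing only start-tags, end-tags and character data (no comments, PIs,
entity references or empty-element tags), and only ASCII names.

```python
import re

# ASCII-only approximations of the name and whitespace productions.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
SPACE = re.compile(r"[ \t\r\n]+")


def tokenize(text):
    """Yield (category, text) pairs; joining the texts gives back the input."""
    pos, in_tag = 0, False
    while pos < len(text):
        if not in_tag:
            # Outside a tag: character data runs up to the next "<".
            end = text.find("<", pos)
            if end == -1:
                end = len(text)
            if end > pos:
                yield ("DATA", text[pos:end])
                pos = end
            else:
                # "</" must be checked before "<" since it shares a prefix.
                delim = "</" if text.startswith("</", pos) else "<"
                yield ("ETAGO" if delim == "</" else "STAGO", delim)
                pos += len(delim)
                in_tag = True
            continue

        ch = text[pos]
        if ch == ">":
            yield ("TAGC", ch)
            pos += 1
            in_tag = False
        elif ch == "=":
            yield ("VI", ch)
            pos += 1
        elif ch in "\"'":
            # LIT and LITA are the delimiters themselves, not the value.
            role = "LIT" if ch == '"' else "LITA"
            close = text.find(ch, pos + 1)
            if close == -1:
                raise ValueError("unterminated literal at offset %d" % pos)
            yield (role, ch)
            if close > pos + 1:
                yield ("LITERAL_DATA", text[pos + 1:close])
            yield (role, ch)
            pos = close + 1
        else:
            space = SPACE.match(text, pos)
            name = NAME.match(text, pos)
            if space:
                yield ("S", space.group())
                pos = space.end()
            elif name:
                yield ("NAME", name.group())
                pos = name.end()
            else:
                raise ValueError("unexpected %r at offset %d" % (ch, pos))
```

Feeding it '<a href="x">hi</a>' yields STAGO, NAME, S, NAME, VI, LIT,
LITERAL_DATA, LIT, TAGC, DATA, ETAGO, NAME, TAGC.  Nothing is thrown away,
so concatenating the token texts reproduces the input byte for byte, which
is the property a syntactic API like SAX gives up.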

| I don't know if an in-memory API corresponding to such a ... lax parse
| (oh, re ... lax.  You knew that was coming, right?) is possible,
| though. 

A push API shouldn't be too difficult.  By in-memory do you mean some
analogue of DOM, where all the tokens are held in a structure of some sort
(like a parse tree)? 
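
For what it's worth, here is a sketch of how the two could fit together,
under my own assumptions: a push driver that calls a single token()
callback, and a handler that simply keeps every token in a list as the
flattest possible in-memory form.  The names push_tokens, TokenList and
CHARS are hypothetical, and the tokenizer is deliberately context-free (it
does not track whether it is inside a tag, so an "=" in character data is
still reported as VI).

```python
import re

# One alternative per delimiter role; "</" is listed before "<" so that
# ETAGO wins over STAGO.  Every character matches exactly one alternative.
TOKEN = re.compile("|".join([
    r"(?P<ETAGO></)",
    r"(?P<STAGO><)",
    r"(?P<TAGC>>)",
    r"(?P<VI>=)",
    r'(?P<LIT>")',
    r"(?P<LITA>')",
    r"(?P<S>[ \t\r\n]+)",
    r"""(?P<CHARS>[^<>="' \t\r\n]+)""",
]))


def push_tokens(text, handler):
    """Push each token to handler.token(category, text), in document order."""
    for match in TOKEN.finditer(text):
        handler.token(match.lastgroup, match.group())


class TokenList:
    """Handler that holds all tokens in memory, like a very flat DOM."""

    def __init__(self):
        self.tokens = []

    def token(self, category, text):
        self.tokens.append((category, text))

    def serialize(self):
        # No token is dropped, so this reproduces the original text.
        return "".join(text for _, text in self.tokens)
```

A handler that builds a nested structure instead of a list would be the
closer analogue of a parse tree; the list version is just the minimum
needed to show that the push side is not the hard part.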
