

David Tolpin wrote:

 >>>    s-pattern="""
 >>>      comment = "\(([^\(\)\\]|\\.)*\)"
 >>>      atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
 >>>      atoms = atom "(\." atom ")*"
 >>>      [...]
 >>>
 >>>Why isn't it done?
 >>
 >>
 >>HyLex used a similar syntax for regular expressions.
 >>I've always wondered why the idea never caught on elsewhere.
 >>(Then again, none of the ideas from HyTime ever really
 >>caught on...)
 >
 >
 > In fact, I've implemented it in an extension datatype library for my
 > Relax NG validator; it is only 70 lines of code in Scheme, after all. Proved
 > to be very useful for debugging.

Very clever. But a naive implementation would just recursively
concatenate the strings into a single regex string. Could you
elaborate on the debugging advantage, i.e., how does it make it easier
for a schema writer to debug regular expressions?

Jeni Tennison used the same idea with a slightly different syntax in her 
DTLL proposal (I've lost the URL). Her idea had the added twist that an 
application could receive the result of the regular expression parse as
structured data, e.g., through a SAX API. Thus, using your example,
the string "(David Tolpen)David.Tolpin@n..." might produce the 
'infoset':

<start>
   <comment>(David Tolpen)</comment>
   <local-part>
     <atoms>
       <atom>David</atom>.<atom>Tolpin</atom>
     </atoms>
   </local-part>@<domain>
     <atoms>
       <atom>nospam</atom>.<atom>net</atom>
     </atoms>
   </domain>
</start>
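
Named groups offer a rough approximation of the structured-result idea. The Python sketch below builds an ElementTree rather than emitting SAX events. Its names (`ADDRESS`, `atoms_element`, `to_infoset`) and its sample address are hypothetical, and it is not Jeni Tennison's DTLL syntax or API. Python's `re` keeps only the last repetition of a group, so the dot-separated atoms are split after the match instead of being reported by the regex engine itself.

```python
import re
import xml.etree.ElementTree as ET

# Sub-patterns copied from the quoted s-pattern example.
COMMENT = r"\(([^\(\)\\]|\\.)*\)"
ATOM = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
ATOMS = ATOM + r"(?:\." + ATOM + r")*"

# Hypothetical address pattern: named groups mark the parts to report.
ADDRESS = re.compile(
    "(?P<comment>" + COMMENT + ")?"
    + "(?P<local>" + ATOMS + ")"
    + "@(?P<domain>" + ATOMS + ")"
)


def atoms_element(tag, text):
    """Build <tag><atoms><atom>a</atom>.<atom>b</atom></atoms></tag>."""
    outer = ET.Element(tag)
    atoms = ET.SubElement(outer, "atoms")
    previous = None
    for part in text.split("."):
        if previous is not None:
            previous.tail = "."  # the dot becomes mixed content between atoms
        previous = ET.SubElement(atoms, "atom")
        previous.text = part
    return outer


def to_infoset(address):
    """Return an Element mirroring the parse, or None if there is no match."""
    m = ADDRESS.fullmatch(address)
    if m is None:
        return None
    start = ET.Element("start")
    if m.group("comment") is not None:
        ET.SubElement(start, "comment").text = m.group("comment")
    local = atoms_element("local-part", m.group("local"))
    local.tail = "@"  # the '@' is mixed content after </local-part>
    start.append(local)
    start.append(atoms_element("domain", m.group("domain")))
    return start


tree = to_infoset("(Jane Doe)jane.doe@example.org")  # hypothetical address
print(ET.tostring(tree, encoding="unicode"))
```

The printed XML has the same shape as the infoset above, without the indentation. Re-splitting the matched text on `.` only works here because a dot cannot occur inside an atom.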

This still seems a fruitful avenue to explore.

Bob Foster
http://xmlbuddy.com/

