
  • To: xml-dev@l...
  • Subject: regular expressions
  • From: David Tolpin <dvd@d...>
  • Date: Fri, 30 Jan 2004 00:02:46 +0400 (AMT)

Some schema languages use string regular expressions to check the lexical space of
attributes and character data. The regex strings often become incomprehensible,
such as

(([a-zA-Z][0-9a-zA-Z+\-\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?

for any URI. 

A structured syntax, similar to that for XML, would make such expressions easier to
read and debug, for example,

    s-pattern="""
      comment = "\(([^\(\)\\]|\\.)*\)"
      atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
      atoms = atom "(\." atom ")*"
      person = "\"([^\"\\]|\\.)*\""
      location = "\[([^\[\]\\]|\\.)*\]"
      local-part = "(" atoms "|" person ")"
      domain = "(" atoms "|" location ")"
      start = "(" comment " )?" local-part "@" domain "( " comment ")?"
    """

instead of 

    pattern=
      "(\(([^\(\)\\]|\\.)*\) )?"
    ~ """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")"""
    ~ "@" 
    ~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])"
    ~ "( \(([^\(\)\\]|\\.)*\))?"
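
The same composition can be tried today in a general-purpose language. The
following is only a rough sketch in Python, assuming the fragments are joined by
plain string concatenation; the variable names mirror the s-pattern above, and
is_address is a hypothetical helper, not part of any schema language.

```python
import re

# Named fragments, mirroring the s-pattern definitions above.
comment = r"\(([^\(\)\\]|\\.)*\)"
atom = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom + r"(\." + atom + r")*"
person = r'"([^"\\]|\\.)*"'
location = r"\[([^\[\]\\]|\\.)*\]"
local_part = "(" + atoms + "|" + person + ")"
domain = "(" + atoms + "|" + location + ")"
start = "(" + comment + " )?" + local_part + "@" + domain + "( " + comment + ")?"

# Compile once; the composed string is an ordinary regular expression.
address = re.compile(start)


def is_address(text: str) -> bool:
    """Return True if the whole string matches the composed pattern."""
    return address.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", '"John Smith"@example.com', "no-at-sign"]:
        print(candidate, is_address(candidate))
```

Each fragment can be compiled and tested on its own, which is where most of
the debugging benefit comes from.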

Why isn't it done?

David
