
  • To: xml-dev@l...
  • Subject: Re: Re: If XML is too hard for a programmer, perhaps he'd be better off as a crossing guard
  • From: Sean McGrath <sean.mcgrath@p...>
  • Date: Fri, 28 Mar 2003 08:48:18 +0000
  • Cc: bill de hÓra <bill@d...>

[Bill de hÓra]
 >And I don't understand this disdain for regular expressions over XML.
 >Regexes are a perfectly useful tool for manipulating text.

Hi Bill,

I use regexps myself, I'd say about 30% of the time when processing XML.
It makes me nervous, though, and I try not to do it in any mission-critical
context.

The trouble comes in having a degree of confidence in the correctness of
the regexps.

For example, on the face of it, using a regexp to catch occurrences of:
         <name>Sean</name>
is simple. It is not, for many reasons. Writing regexps capable of getting
this right in the full generality of XML 1.0 is tantamount to writing a full
XML 1.0 well-formedness (WF) parser.
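
To make a few of those reasons concrete, here is a minimal Python sketch (not
part of the original exchange). The pattern NAIVE, the sample documents and the
helper names are illustrative assumptions. It compares a literal regexp with
the standard library's ElementTree parser on inputs that all mean "a name
element containing Sean", plus one comment that only looks like it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "obvious" pattern for the element in question.
NAIVE = re.compile(r"<name>Sean</name>")

# Illustrative inputs; every one is well-formed XML 1.0.
VARIANTS = [
    "<doc><name>Sean</name></doc>",              # the easy case
    "<doc><name >Sean</name ></doc>",            # whitespace inside the tags
    '<doc><name id="1">Sean</name></doc>',       # an attribute appears
    "<doc><name>&#83;ean</name></doc>",          # character reference for "S"
    "<doc><name><![CDATA[Sean]]></name></doc>",  # CDATA section
    "<doc><!-- <name>Sean</name> --></doc>",     # commented out: not an element
]

def regex_hit(xml_text):
    """True if the literal pattern occurs anywhere in the raw text."""
    return bool(NAIVE.search(xml_text))

def parser_hit(xml_text):
    """True if the parsed tree has a name element whose text is 'Sean'."""
    root = ET.fromstring(xml_text)
    return any(elem.text == "Sean" for elem in root.iter("name"))

if __name__ == "__main__":
    for doc in VARIANTS:
        print(regex_hit(doc), parser_hit(doc), doc)
```

In this sketch the regexp misses four of the five real matches and reports the
commented-out one as a hit. Each miss can be patched in the pattern, but the
patches add up to the parser Sean describes.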

The standard answer I get when I harp on about this is something
like "ah, but I know the XML I'm processing is machine generated and consistent
therefore...".

I always feel uneasy relying on the upstream XML supplier like this! It
introduces a degree of brittle coupling in systems that is best avoided if
possible.

I can only see two routes to making XML regexping as safe as it is convenient:

1) Make a profile of XML 1.0 *syntax* that is regexp safe (permathread anyone?)

2) Use a post-parse syntax for regexp work like PYX notation
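
Route 2 can be sketched roughly as follows (an illustration added here, not
code from the post). It assumes the usual PYX line prefixes: "(" for a start
tag, "A" for an attribute, "-" for character data and ")" for an end tag. The
names PyxHandler, to_pyx and NAME_IS_SEAN are hypothetical, and processing
instructions are left out for brevity. A real parser does the hard work first,
and the regexp then runs over one-event-per-line text.

```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

def _escape(text):
    # Keep each PYX event on one line by escaping backslash, newline and tab.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")

class PyxHandler(ContentHandler):
    """Collects SAX events as PYX-style lines (elements, attributes, text)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # the parser may deliver text in several chunks

    def _flush(self):
        # Emit buffered character data as a single "-" line.
        if self._text:
            self.lines.append("-" + _escape("".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush()
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A%s %s" % (key, _escape(attrs.getValue(key))))

    def endElement(self, name):
        self._flush()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)

def to_pyx(xml_text):
    """Parse xml_text with SAX and return its PYX-style lines."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines

# Line-oriented pattern: start tag, any attribute lines, the text, end tag.
NAME_IS_SEAN = re.compile(r"^\(name\n(?:A.*\n)*-Sean\n\)name$", re.MULTILINE)

def pyx_hit(xml_text):
    return bool(NAME_IS_SEAN.search("\n".join(to_pyx(xml_text))))
```

Because comments, CDATA markers, character references and whitespace inside
tags are all resolved by the parser before the PYX lines exist, the pattern
only has to cope with a handful of line shapes.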

regards,
Sean


http://seanmcgrath.blogspot.com


