  • From: matt@w... (Matthew Fuchs)
  • To: digitome@i...
  • Date: Thu, 28 Aug 97 9:21:33 PDT

> 
> 
> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
> >>useful if it contains the *entire* document.
> >
> >Just as it's not useful in processing HTML. Regexps that don't match across
> >line boundaries are the most common problem I've seen in HTML-processing
> >Perl scripts. Looks like that will continue until people figure out that
> >Perl's line "Feature" is just a bug when used with XML/HTML.
> >
> 
> Bang goes the notion of a lightweight XML app. then! Thou shalt always
> parse!
> 
Nonsense! Regexps that fail across line boundaries are only due to
lazy DPHs.  The "s" modifier to a regex makes "." match newlines as
well, so the entire string (i.e., document) is treated as a single
target.  The real problem is that _insignificant_ whitespace (a
newline) is treated as significant.  A regex modifier that treated
newlines, tabs, etc., as spaces would really help reduce this problem.
(Larry Wall doesn't follow this mailing list, does he?)
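
A minimal sketch of the point above, assuming a made-up fragment held
in one string ($doc and its contents are hypothetical).  Without /s,
"." stops at a newline and the match fails; with /s it succeeds.
Writing \s+ wherever whitespace may occur already treats newlines and
tabs like spaces, which gets close to the modifier wished for here.

```perl
use strict;
use warnings;

# Hypothetical fragment: the element content is split across two lines.
my $doc = "<title>Desperate\nPerl Hacker</title>";

# Without /s, "." does not match "\n", so this match fails.
my $plain = $doc =~ m{<title>(.*?)</title>} ? 1 : 0;

# With /s, "." matches any character, including "\n".
my ($content) = $doc =~ m{<title>(.*?)</title>}s;

# \s+ matches spaces, tabs and newlines alike; no modifier needed.
my $loose = $doc =~ m{Desperate\s+Perl} ? 1 : 0;

print "plain=$plain loose=$loose\n";
print "content=[$content]\n";
```

This only shows that line boundaries need not defeat a regex.  It is
not a substitute for a parser when the markup gets more complicated.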


> XML as a friendly format to, say, DPH needs some explaining. To use Perl to
> read/write XML you *must* use an XML parser. Indeed any tool intending to
> read/write XML needs to use a *fully blown parser* to get at the document.
> Bye bye the entire Unix family of line oriented text processing apps:-(
> 

Maybe you just need to put a filter at the beginning of your pipeline
to normalize whitespace to whatever you need.
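
One way such a filter might look, as a rough sketch: collapse every
run of whitespace into a single space so the document reaches later
stages as one line.  The name normalize_ws and the sample input are
assumptions for illustration.

```perl
use strict;
use warnings;

# Collapse each run of whitespace (spaces, tabs, newlines) to one space.
# Caveat: this also changes whitespace that is significant, e.g. inside
# attribute values or content where whitespace is meant to be preserved.
sub normalize_ws {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

# Hypothetical input with a newline and a tab inside a start tag.
my $doc  = "<item\n\tid=\"1\">some\n  text</item>\n";
my $flat = normalize_ws($doc);

print "$flat\n";
```

At the head of a pipeline the same substitution can be run in slurp
mode, e.g.  perl -0777 -pe 's/\s+/ /g' doc.xml | ...  (doc.xml being
a placeholder file name), after which line-oriented tools see the
whole document as a single record.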

Matthew

-----------------------------------------------------
Matthew Fuchs
matt@w...
http://cs.nyu.edu/phd_students/fuchs
-----------------------------------------------------

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

