Elliotte Rusty Harold wrote:
> At 1:11 PM +0200 6/5/04, Bjoern Hoehrmann wrote:
[a given XPath being useful to extract a certain thing from a certain URL's response to GET]
>> How do you know?
>
> View source.

That's the thing. Viewing the source once (or, indeed, N times), seeing a pattern (today's stories being at //html:today) and assuming it will work in future is, indeed, a rather informal kind of schema. At least, it is by my definition of schema as learnt from the database world, which is something like "a convention on how a given abstract piece of information is represented" - in this case, I'm not talking about schema in the sense perhaps more normally found in XML, as a "validity constraint".

As well as that XPath, there's probably more to the informal schema being used here. Unless the software that uses that XPath to extract today's news is a totally generic XML browser supporting XSLT, CSS, etc., there's probably also an assumption that the page is in XHTML, and that it's human-readable text in some human language as well (perhaps even an assumption of it being a specific dialect of English).

Information about the structure of a site gleaned from viewing the source may be subject to random change. If the site published a schema (be it a formal machine-readable schema or a paragraph of text like the above), it would then have the opportunity to also state how far users can rely on that not changing in future. The site may lie, of course, but people will have more cause to complain if it "said" it wouldn't change; so when some software that relies on it breaks, the author of the software can say "Hey! The news site broke its promise" rather than "Uh, I made an assumption that no longer holds"...

ABS
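A minimal sketch, in Python, of software that writes its informal schema down instead of leaving it implicit: each assumption gleaned from viewing the source is checked, and a violation raises an error naming the assumption that broke. The `<today>` element is the hypothetical one from the `//html:today` example in the post and is not part of XHTML; `extract_today` and `SchemaAssumptionError` are illustrative names, not anything a news site or library defines. ElementTree supports only a subset of XPath, so `//html:today` is written as `.//html:today` relative to the root.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace URI is real; the <today> element is hypothetical,
# taken from the post's illustrative //html:today pattern.
XHTML_NS = "http://www.w3.org/1999/xhtml"
NAMESPACES = {"html": XHTML_NS}


class SchemaAssumptionError(Exception):
    """Raised when a page no longer matches the informal schema we rely on."""


def extract_today(document: str) -> list:
    """Return the text of each <today> element, checking our assumptions first."""
    root = ET.fromstring(document)

    # Assumption 1: the page is XHTML, i.e. the root is <html> in the XHTML namespace.
    if root.tag != f"{{{XHTML_NS}}}html":
        raise SchemaAssumptionError(f"root element is {root.tag!r}, expected XHTML <html>")

    # Assumption 2: today's stories live at //html:today.
    # './/' searches all descendants of the root element.
    nodes = root.findall(".//html:today", NAMESPACES)
    if not nodes:
        raise SchemaAssumptionError("no //html:today element found")

    # Flatten each element's text content, including text inside child elements.
    return ["".join(node.itertext()).strip() for node in nodes]
```

For example, `extract_today('<html xmlns="http://www.w3.org/1999/xhtml"><body><today>Story one</today></body></html>')` returns `['Story one']`, while a page that drops the `<today>` element raises `SchemaAssumptionError` with a message saying which assumption stopped holding.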