
  • From: "Michael Kay" <mike@s...>
  • To: "'Johannes Lichtenberger'" <Johannes.Lichtenberger@u...>
  • Date: Tue, 3 Feb 2009 15:57:11 -0000

Incidentally, you could also achieve the same effect with a one-line query
using the Saxon-SA streaming capabilities.

java com.saxonica.Query -qs:"saxon:stream(doc('in.xml')/xml/page)[1]"

should do the job. It will automatically stop reading the input when it has
found the data it needs. 

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Johannes Lichtenberger 
> [mailto:Johannes.Lichtenberger@u...] 
> Sent: 03 February 2009 15:49
> To: Michael Kay
> Cc: 'xml-dev'
> Subject: RE:  SAX - not well formed data
> 
> On Tuesday, 03.02.2009, at 14:39 +0000, Michael Kay wrote:
> > > I have a document like this:
> > > 
> > > <xml>
> > >   <page>
> > >     <rev>...</rev>
> > >     <rev>...</rev>
> > >   </page>
> > >   ... (some hundreds of pages)
> > >   <page>
> > >     <rev>...
> > > 
> > > so it's not well formed. 
> > 
> > It's not clear from that description why it isn't well-formed.
> 
> Well, I'm downloading and extracting a file with
> `curl http://... | bzcat > test.xml`, but because it's very big
> and I may not have time to analyse the whole data set, I'm only
> extracting pages from the beginning, so I press CTRL+C some time
> afterwards. Maybe I could extract pages on the fly with something like
> `curl http://... | bzcat | java -jar ExtractArticles`, but I'm not
> really familiar with pipes and so on :( Probably I would need
> XMLStreamReader instead of the reader, and buffered input or something
> like that, but I tried it and failed...
> 
> > > I only want to be able to write out the first pages, but the SAX 
> > > Parser throws errors:
> > 
> > You should be able to abort the parse when you have read what you
> > want, by throwing an exception from any of the callback methods
> > (e.g. endElement()). The parser will then exit back to your
> > application with an exception, which you can catch. You should check
> > that this exception is the one you were expecting, not some other
> > unrelated error in your input.
> 
> Ok, that's possibly the best thing.
> 
> Thank you!
> 
> 
> 
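
The advice quoted above (abort the parse by throwing an exception from a callback, then check that the exception you catch is the one you threw) could be sketched roughly as follows. This is only an illustration under assumptions: the class names FirstPages, StopParsing and PageCounter are hypothetical, the element name "page" is taken from the sample document in the thread, and the input is read from standard input so that it can sit at the end of a pipe.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FirstPages {

    /** Marker exception (hypothetical name) so a deliberate stop can be
     *  told apart from a genuine error in the input. */
    static class StopParsing extends SAXException {
        StopParsing() { super("enough pages read"); }
    }

    /** Counts closed <page> elements and aborts once the limit is reached. */
    static class PageCounter extends DefaultHandler {
        private final int limit;
        int pages = 0;

        PageCounter(int limit) { this.limit = limit; }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("page".equals(qName) && ++pages >= limit) {
                throw new StopParsing();   // abort the parse from the callback
            }
        }
    }

    /** Returns the number of complete pages seen, stopping after `limit`.
     *  Any other exception (e.g. a well-formedness error) still propagates. */
    public static int readPages(InputStream in, int limit) throws Exception {
        PageCounter handler = new PageCounter(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing expected) {
            // This is the exception we threw ourselves; nothing went wrong.
        }
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        // Reads XML from standard input, so it can be fed from a pipe.
        System.out.println(readPages(System.in, 2) + " pages read");
    }
}
```

Because the handler stops before the parser reaches the truncated tail of the file, no well-formedness error is raised for the pages that were read completely. Fed from a pipe, it might be invoked along the lines of `curl http://... | bzcat | java FirstPages`.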


