

From: "Richard Tobin" <richard@c...>
> > * else look for EBCDIC/ASCII signature (use string "[^a-zA-Z0-9]{1,4}xtext\b"
> >    rather than "<?xml\b")
>
> For XML, it's only necessary to look at the first four bytes to cover
> Unicode encodings, ascii supersets and ebcdic.  In the xtext case, you
> will have to compare a string at several different positions or apply
> a regular expression.  Certainly doable, but certainly more complex too!
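To illustrate the four-byte check for XML, here is a minimal sketch. It assumes the no-byte-order-mark signatures of "<?xm" listed in Appendix F of the XML 1.0 Recommendation and deliberately omits the BOM cases. The function and table names are hypothetical.

```python
from typing import Optional

# Four-octet signatures of "<?xm" for entities without a byte order mark,
# following Appendix F of the XML 1.0 Recommendation (BOM cases omitted).
_XML_SIGNATURES = {
    b"\x00\x00\x00\x3c": "UCS-4, big-endian (1234)",
    b"\x3c\x00\x00\x00": "UCS-4, little-endian (4321)",
    b"\x00\x00\x3c\x00": "UCS-4, unusual octet order (2143)",
    b"\x00\x3c\x00\x00": "UCS-4, unusual octet order (3412)",
    b"\x00\x3c\x00\x3f": "UTF-16, big-endian",
    b"\x3c\x00\x3f\x00": "UTF-16, little-endian",
    b"\x3c\x3f\x78\x6d": "UTF-8 or other ASCII superset",
    b"\x4c\x6f\xa7\x94": "EBCDIC",
}

def sniff_xml_family(data: bytes) -> Optional[str]:
    """Return the encoding family implied by the first four octets, or None."""
    # A single fixed-position lookup is enough; no scanning is needed.
    return _XML_SIGNATURES.get(data[:4])
```

A single dictionary lookup on a fixed four-octet key is all it takes, which is the simplicity the xtext case loses.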

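1. Based on the pattern of zero octets in the first four octets, fill a
10-octet array with candidate characters, one per character. (With four-octet
characters this may require reading up to 40 octets.)
2. Scan zero-based array indices 3 through 6 inclusive for 'e' in both ASCII
and EBCDIC. Because the prefix is 1 to 4 characters long, the 'e' of "xtext"
can only fall in that range. If neither is found, it is not an xtext. If an
EBCDIC 'e' is found and the first four octets contained a zero byte, it is not
an xtext. Otherwise, using the charset in which the 'e' was found, verify that
the 'e' appears in the correct context (a non-alphanumeric prefix, then
"xtext", then a word boundary).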
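The two steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Bob's implementation. The function names are hypothetical. The mapping from zero-octet patterns to code-unit width and byte order is a choice made for this sketch. The array holds whole code-unit values rather than single octets, so a non-ASCII character cannot pass for a letter. EBCDIC is assumed to be code page 037, via Python's `cp037` codec. The character classes are ASCII-only, mirroring the quoted regex.

```python
import string
from typing import List, Tuple

# ASCII-only classes, mirroring [^a-zA-Z0-9] and \b in the quoted regex.
ASCII_ALNUM = set(string.ascii_letters + string.digits)
ASCII_WORD = ASCII_ALNUM | {"_"}

def code_unit_layout(head: bytes) -> Tuple[int, str]:
    """Guess (octets per code unit, byte order) from zero octets in head[:4].
    Hypothetical mapping chosen for this sketch."""
    z = [i < len(head) and head[i] == 0 for i in range(4)]
    if z[0] and z[1]:
        return 4, "big"      # 00 00 ?? ??  -> UTF-32BE-like
    if z[2] and z[3]:
        return 4, "little"   # ?? ?? 00 00  -> UTF-32LE-like
    if z[0] or z[2]:
        return 2, "big"      # UTF-16BE-like
    if z[1] or z[3]:
        return 2, "little"   # UTF-16LE-like
    return 1, "big"          # no zeros: ASCII superset or EBCDIC

def candidate_units(data: bytes, n: int = 10) -> List[int]:
    """Step 1: decode up to n code units (up to 40 octets for UTF-32)."""
    width, order = code_unit_layout(data[:4])
    units = []
    for k in range(n):
        chunk = data[k * width:(k + 1) * width]
        if len(chunk) < width:
            break
        units.append(int.from_bytes(chunk, order))
    return units

def context_ok(chars: str, i: int) -> bool:
    """True if chars[i] is the 'e' of a 1-4 char non-alnum prefix + 'xtext' + \\b."""
    if chars[i - 2:i + 3] != "xtext":
        return False
    if any(c in ASCII_ALNUM for c in chars[:i - 2]):
        return False
    # Word boundary: end of data or a non-word character after "xtext".
    return i + 3 >= len(chars) or chars[i + 3] not in ASCII_WORD

def looks_like_xtext(data: bytes) -> bool:
    units = candidate_units(data)
    has_zero = 0 in data[:4]
    for i in range(3, min(7, len(units))):  # step 2: indices 3..6
        if units[i] == 0x65:  # 'e' in ASCII / Unicode
            chars = "".join(chr(u) if u < 0x110000 else "\ufffd" for u in units)
            if context_ok(chars, i):
                return True
        if units[i] == 0x85 and not has_zero:  # 'e' in EBCDIC
            chars = bytes(units).decode("cp037")  # assumes code page 037
            if context_ok(chars, i):
                return True
    return False
```

For example, `looks_like_xtext("<?xtext ".encode("utf-16-le"))` and `looks_like_xtext("<?xtext ".encode("cp037"))` should both be true, while `looks_like_xtext(b"<?xml version='1.0'?>")` is false. A byte order mark simply counts as one of the prefix characters here.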

Bob

