
  • To: xml-dev mailing list <xml-dev@l...>
  • Subject: genx - string termination and bounding
  • From: Tim Bray <tbray@t...>
  • Date: Wed, 21 Jan 2004 19:39:50 -0800

I'm really sympathetic to the calls for counted rather than 
null-terminated strings, if only for the genxText() call.

So I was thinking about this back before we nuked the codePoint * 
versions, and I realized that a "length" argument could be confusing 
because in the codePoint* version it would naturally be the number of 
characters, while in the utf8Byte * version it would naturally be the 
number of bytes.  Blecch.  So I thought it would be more natural to 
have something like

  genxText(genxWriter w, utf8Byte * start, utf8Byte * end)

i.e. a pointer to the end of the string, which would have the same 
semantics in both versions of the call.  Well, we're losing the 
codePoint * stuff (good riddance) but I'd kind of like to stay with the 
stop-here argument rather than the byte (or character) count argument.
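To make the count ambiguity concrete, here is a small sketch (not genx code). It assumes utf8Byte is a typedef for unsigned char, and sketchCountChars is a hypothetical helper invented for illustration. For one and the same [start, end) range, the byte count and the character count differ as soon as a non-ASCII character shows up, whereas the end pointer means the same thing either way.

```c
#include <stddef.h>

typedef unsigned char utf8Byte;  /* assumption: the real typedef may differ */

/* Hypothetical helper: counts the characters in [start, end) by counting
 * every byte that is not a UTF-8 continuation byte (10xxxxxx).
 * Assumes the range holds well-formed UTF-8. */
size_t sketchCountChars(const utf8Byte * start, const utf8Byte * end)
{
  size_t n = 0;
  const utf8Byte * p;

  for (p = start; p < end; p++)
    if ((*p & 0xC0) != 0x80)
      n++;
  return n;
}
```

For the six bytes 'h', 0xC3, 0xA9, 'l', 'l', 'o' (an "h", an e-acute, then "llo"), end - start is 6 while sketchCountChars reports 5, so a bare "length" would need a convention that a stop pointer does not.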

Of course if you want to null-terminate, you can; just do

  genxText(w, buf, NULL)

Two questions:
- if you have a zero byte in the string before you get to the end mark, 
should it just stop, or throw an error?  The first is more consistent 
with C culture (cf. strncpy) but the latter is a bit more stringent.  
I'm moderately leaning toward just stopping.
- if the stop marker is stupidly in the middle of a UTF-8 character, 
genx should detect this and declare an error.  The existence of this 
situation is the only good argument for a count rather than a stopper.  
But not quite good enough.  -Tim
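A rough sketch of how those rules might fit together, for discussion only. This is not genx's implementation: sketchScanText, sketchSeqLen and the sketchStatus codes are names made up here, and utf8Byte is assumed to be unsigned char. It scans up to the stop pointer (or to the zero byte when the stop pointer is NULL), quietly stops at an early zero byte, and reports an error when the stop pointer lands inside a multi-byte character. It only checks lead and continuation byte patterns, not overlong forms or surrogates.

```c
#include <stddef.h>

typedef unsigned char utf8Byte;  /* assumption: the real typedef may differ */

/* Hypothetical status codes for this sketch. */
typedef enum
{
  SKETCH_OK = 0,
  SKETCH_SPLIT_CHAR,  /* stop pointer falls inside a UTF-8 character */
  SKETCH_BAD_UTF8     /* bad lead byte or bad continuation byte */
} sketchStatus;

/* Sequence length implied by a lead byte, or 0 if it cannot start one. */
static size_t sketchSeqLen(utf8Byte lead)
{
  if (lead < 0x80) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

/* Scans [start, end), or up to the zero byte when end is NULL.
 * A zero byte before end just stops the scan (the strncpy-like option).
 * *accepted receives the number of bytes in complete characters seen. */
sketchStatus sketchScanText(const utf8Byte * start, const utf8Byte * end,
                            size_t * accepted)
{
  const utf8Byte * p = start;

  *accepted = 0;
  while ((end == NULL || p < end) && *p != 0)
  {
    size_t len = sketchSeqLen(*p);
    size_t i;

    if (len == 0)
      return SKETCH_BAD_UTF8;
    /* fewer bytes remain before the stop pointer than this character needs */
    if (end != NULL && (size_t) (end - p) < len)
      return SKETCH_SPLIT_CHAR;
    /* in the null-terminated case a zero byte fails this test, so the
     * loop never reads past the terminator */
    for (i = 1; i < len; i++)
      if ((p[i] & 0xC0) != 0x80)
        return SKETCH_BAD_UTF8;
    p += len;
    *accepted = (size_t) (p - start);
  }
  return SKETCH_OK;
}
```

Turning the early zero byte into an error instead would be a one-line change: after the loop, check whether end is non-NULL and p is still short of it.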

