[Home] [By Thread] [By Date] [Recent Entries]
You are right, it is a well known technique, Java JDK1.1 in fact uses very similar code for character classification. I replaced that with the simple 256 member array lookup (for characters in that range) and it sped up the parser ~10%. Istvan > ---------- > From: James Clark[SMTP:jjc@j...] > Reply To: James Clark > Sent: Thursday, September 04, 1997 10:04 PM > To: xml-dev@i... > Subject: Re: Character classification > > Istvan Cseri wrote: > > > > For better speed I would suggest an alternative solution: use a > quick > > array lookup for characters below 256 and go to the more expensive > > method above... It will do wonders with your parser. > > Except of course when you're parsing non-Latin scripts. > > There's another technique which exploits the fact that characters on > the > same page often have similar properties, and this is true even more so > for characters in the same column. > > The idea is to have a three-level table, the first level with 256 > entries, the second and third levels with 16 entries. The entries for > the first and second levels are a (possibly null) pointer to a > sub-table > plus a value; the entries for the third level are just values. To look > up the value for a character, you use the high 8 bits to index into > the > first-level table; if the pointer part of the entry is null, then > return > the value part of entry; otherwise use the sub-table table addressed > by > the pointer; use the next 4 bits to index into that in a similar way, > and, if necessary, the bottom 4 bits to index into the bottom table. > > This is I believe quite a well-known technique; I got it from Glenn > Adams. > > You can use this to implement case-folding by storing the difference > between a character and its upper-case equivalent modulo 2^16. > > There's a C++ implementation of this in SP in include/CharMap.h. > > James > > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@i... the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@i...) > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@i... the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@i...)
|

Cart



