
  • From: Michael Kay <mike@s...>
  • To: xml-dev@l...
  • Date: Wed, 26 Jan 2011 17:47:13 +0000

On 26/01/2011 15:56, Costello, Roger L. wrote:
> Hi Folks,
>
> It is my understanding that there are 3 flavors of regular expression parsers [1]:
>
> 1. Nondeterministic Finite Automaton (NFA)
>
> 2. Deterministic Finite Automaton (DFA)
>
> 3. Backtracking
>
> Which flavor of regular expression parser does SAXON use?
>

Saxon uses the DFA algorithm described in

http://www.ltg.ed.ac.uk/~ht/XML_Europe_2003.html

modified by a system of counters to handle minOccurs/maxOccurs 
constraints, which is inspired by subsequent work by Thompson and Tobin:

http://www.cogsci.ed.ac.uk/~ht/XTech_2006_paper.pdf

but does not follow it slavishly.
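To make the counter idea concrete, here is a minimal Python sketch. It is
not Saxon's code and not the algorithm from either paper: the names
(CountedParticle, validate) are hypothetical, and it assumes a flat
sequence of particles in which adjacent particles have different names, so
the next step is always unambiguous. The point is only that a{2,3} can be
handled by one state plus a counter instead of being unfolded into a chain
of separate states.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CountedParticle:
    """Hypothetical particle: an element name with occurrence bounds."""
    name: str
    min_occurs: int
    max_occurs: int


def validate(children: Sequence[str], particles: List[CountedParticle]) -> bool:
    """Check a list of child names against a sequence of counted particles.

    Assumes adjacent particles have distinct names, so each input symbol
    determines the next move without any lookahead or backtracking.
    """
    pos = 0    # index of the particle currently being matched
    count = 0  # occurrences of that particle seen so far
    for child in children:
        # Leave the current particle only if its minimum has been met.
        while pos < len(particles) and child != particles[pos].name:
            if count < particles[pos].min_occurs:
                return False
            pos += 1
            count = 0
        if pos == len(particles):
            return False  # no particle left that accepts this child
        if count == particles[pos].max_occurs:
            return False  # one occurrence too many
        count += 1
    # At end of input every remaining particle must already be satisfied.
    while pos < len(particles):
        if count < particles[pos].min_occurs:
            return False
        pos += 1
        count = 0
    return True
```

With the model [a{2,3}, b{1,1}], the sequences "a a b" and "a a a b" are
accepted, while "a b" (too few), "a a a a b" (too many) and "a a" (missing
b) are rejected.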

I'm not sure about your three categories, by the way. I think that if 
you use an NFA then you need some kind of backtracking (either that, or 
you explore multiple forward paths in parallel, which amounts to 
the same thing).
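As an illustration of that point, here is a small Python sketch that runs
the same NFA two ways: by backtracking over one path at a time, and by
carrying the set of all reachable states forward in parallel. The NFA is a
made-up toy for the expression (a|ab)c, with no epsilon transitions; the
table and function names are assumptions for this example only. Both
functions accept exactly the same strings and differ only in the order in
which the alternative paths are explored.

```python
from typing import Dict, Set, Tuple

# Toy NFA for (a|ab)c. State 0 has two transitions on 'a', which is what
# makes it nondeterministic.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {1, 2},
    (1, "c"): {4},
    (2, "b"): {3},
    (3, "c"): {4},
}
START = 0
ACCEPT = {4}


def match_backtracking(s: str, state: int = START, i: int = 0) -> bool:
    """Try one path at a time; on failure, return and try the next choice."""
    if i == len(s):
        return state in ACCEPT
    for nxt in sorted(TRANSITIONS.get((state, s[i]), ())):
        if match_backtracking(s, nxt, i + 1):
            return True
    return False


def match_parallel(s: str) -> bool:
    """Advance every live path together by tracking a set of states."""
    current = {START}
    for ch in s:
        current = {
            nxt
            for st in current
            for nxt in TRANSITIONS.get((st, ch), ())
        }
        if not current:
            return False  # every path has died
    return bool(current & ACCEPT)
```

Both functions accept "ac" and "abc" and reject "ab" and "abcc".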

Michael Kay
Saxonica

