Hey Gary,
At 09:00 AM 3/14/2006, you wrote:

> I'm currently outputting as XML and it should only be the last stage in the chain that outputs as XHTML. The issue seems to be that the input includes declared entities that nothing on the later part of the chain understands. Therefore I want the unicode entity instead so &#160; rather than &nbsp; for example.

If the input includes declared entities, where are the declarations?

It's true that HTML in the wild often leaves these declarations out. Nonetheless, if the input is to be parsed as XML, these declarations must be available.

It could be that your input isn't even syntactically correct, well-formed XML (which is a way of saying things redundantly over again, as it's not XML until it follows the rules), in which case you need to start asking the questions Andrew has posed, and considering tools to fix the syntax. (It's no fun by hand.)

On the other hand, maybe it's only the entity declarations that are missing, in which case providing your input with a DTD or DTD fragment that contains those declarations will be sufficient.

If the entity declaration is available, the document can be parsed and presented to the XSLT engine for transformation. If it isn't, there's nothing XSLT can do to help. (Accordingly, it's not an XSLT question, but a basic XML question: you'd have this problem even if you weren't using XSLT.)

> Should the stylesheet automatically do this? Is there some way I can force a text() catch in the template to convert the characters for me?

Nope. An analogy: that's like putting the cake in the oven before the batter is mixed. You can't expect to put flour, eggs, sugar etc. straight into the oven and get "cake". Fortunately, with XSLT you won't get a mess of baked flour and eggs and melted sugar -- but you will get the error message you're seeing.

Hint: the particular declaration you're looking for looks like:

  <!ENTITY nbsp "&#160;"> <!-- no-break space = non-breaking space, U+00A0 ISOnum -->

In the XHTML DTD, it's to be found in the xhtml-lat1.ent file. But if you put the DOCTYPE declaration at the top of your input

  <!DOCTYPE html [ <!ENTITY nbsp "&#160;"> ]>

-- and if everything else is good (all other entities are declared, syntax is correct) -- you'll be okay. (This is for testing. If you have more than one input document you'll want to call the DTD in through an external identifier, either SYSTEM or PUBLIC depending on your parser and environment.)

Note that how these characters are expressed in the *output* is not addressed here. You can figure that out once you've got your files parsing.

> Oh and since I forgot to mention I'm using Saxon 8 and XSLT 2.0.

That's good; it gives you a number of ways of controlling how those characters appear in the output. But you have to get them in first.

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
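
P.S. If it helps to see the point about parsing in isolation, here is a minimal sketch in Python (not part of your Saxon setup; I'm only assuming the standard library's xml.etree.ElementTree parser, and the sample markup and the try_parse helper are made up for illustration). It feeds the same hypothetical input to the parser twice, once without the nbsp declaration and once with the internal DOCTYPE subset suggested above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: uses &nbsp; but declares it nowhere.
UNDECLARED = "<html><body><p>a&nbsp;b</p></body></html>"

# The same input with the entity declared in an internal DTD subset.
DECLARED = '<!DOCTYPE html [ <!ENTITY nbsp "&#160;"> ]>\n' + UNDECLARED


def try_parse(text):
    """Return the text of the first <p>, or None if the parser rejects the input."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # The parser gives up before any later stage could see the document.
        return None
    return root.find("body/p").text


without_decl = try_parse(UNDECLARED)  # parse fails: undefined entity
with_decl = try_parse(DECLARED)       # parses; &nbsp; becomes U+00A0
```

The first call should fail in the parser itself, which is the same stage at which your transformation stops; the second should hand back the paragraph text with a real no-break space character in it. Other parsers report the error differently, but the order of events is the same.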



