  • To: XML Developers List <xml-dev@l...>
  • Subject: converting character entities to us-ascii /equivalents/
  • From: Robert Koberg <rob@k...>
  • Date: Wed, 06 Oct 2004 14:55:58 -0700
  • User-agent: Mozilla Thunderbird 0.7 (Macintosh/20040616)

Hi,

I need to output several versions of a page (through XSL 
transformations), one of which is us-ascii (for email). However, the 
content might contain some characters that us-ascii does not support 
(like the em dash, &#151;).

I want the character entities to remain in the content. When 
transforming to us-ascii, I want to replace the entities with some ascii 
text equivalent: For example, '&#151;' would get converted to '--'.
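A minimal sketch of the replacement table described above, assuming a small hand-maintained mapping is enough. The class name AsciiFallback, the method toAscii, and the table entries are hypothetical and not from any library. Note that &#151; strictly denotes code point 151 (U+0097, a control character that Windows-1252 displays as an em dash), while the Unicode em dash is U+2014, so the sketch maps both.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: swaps a few non-ASCII characters for ASCII stand-ins. */
final class AsciiFallback {
    // Illustrative entries only; a real table would be much longer.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u2014', "--");                      // Unicode em dash
        TABLE.put('\u0097', "--");                      // code point 151, as in the example above
        TABLE.put('\u00AE', "(Registered Trademark)");  // registered sign
    }

    /** Returns s with every char above 127 replaced by its table entry, or '?' if unmapped. */
    static String toAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);              // already ASCII, copy through
            } else {
                String replacement = TABLE.get(c);
                out.append(replacement != null ? replacement : "?");
            }
        }
        return out.toString();
    }
}
```

Calling AsciiFallback.toAscii on a string containing an em dash would yield the same string with "--" in its place. Unmapped characters fall back to "?" here, which is only one possible policy.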

The XML is pulled into the transformation through the document function 
using a custom URIResolver.

Is there an existing solution to this?

Do Apache FOP and its text renderer handle this type of thing?

I have tried to set a ContentHandler (actually a DefaultHandler) on the 
XMLReader and tried to replace a character entity, but I am doing 
something wrong and am confused about how to proceed. Using the code 
below I get a recoverable error using saxon/aelfred and a failure when 
using saxon/xerces.

Here is a snippet from the URIResolver:


InputSource in = new InputSource(file.getAbsolutePath());
SAXSource source = new SAXSource(in);
XMLReader reader = null;
try {
   reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
   //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
} catch (SAXException e) {
   System.err.println(e.getMessage());
}

reader.setContentHandler(new AsciiHandler());

source.setXMLReader(reader);

return source;



And the DefaultHandler has one method:


public void characters(char[] text, int start, int length) {

   String str = new String(text, start, length);
   if (str.indexOf(174) > -1) {
    str.replaceAll("\u00AE", "(Registered Trademark)");
   }
   text = str.toCharArray();
}
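
One possible direction, sketched under assumptions: in the handler above, replaceAll returns a new String that is discarded, and reassigning the text parameter does not change what any other component sees. In addition, a transformer will typically install its own ContentHandler on the XMLReader when it parses the SAXSource, which would replace the AsciiHandler. A SAX filter (org.xml.sax.helpers.XMLFilterImpl) sits between the parser and the transformer and can forward modified events instead. The names AsciiFilter and sourceFor below are hypothetical, and the JDK's default parser stands in for aelfred or xerces.

```java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: rewrites selected characters, then forwards the event downstream. */
class AsciiFilter extends XMLFilterImpl {

    AsciiFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String original = new String(ch, start, length);
        // String.replace returns a new string, so the result has to be kept.
        String replaced = original
                .replace("\u00AE", "(Registered Trademark)")
                .replace("\u2014", "--");
        char[] out = replaced.toCharArray();
        // Forward the rewritten text to whatever handler the consumer installed.
        super.characters(out, 0, out.length);
    }

    /** Builds a SAXSource whose events pass through the filter (assumed wiring). */
    static SAXSource sourceFor(String systemId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processors need namespace-aware events
        XMLReader base = factory.newSAXParser().getXMLReader();
        return new SAXSource(new AsciiFilter(base), new InputSource(systemId));
    }
}
```

In the URIResolver, returning something like AsciiFilter.sourceFor(file.toURI().toString()) would hand the transformer a source whose text events have already been rewritten. Because each replacement here targets a single character, it still works when the parser splits text across several characters() calls.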

How can I do this? Is there a better way to handle this type of thing?

thanks,
-Rob

