Subject: Re: xslt replace special characters
From: Mike Brown <mike@xxxxxxxx>
Date: Mon, 11 Nov 2002 13:38:52 -0700 (MST)
Alice Fan wrote:
> Thanks Greg. Right, in the UI we want the user to enter their URL. Their
> URL will most likely have name/value pairs. Is there an easier way? There
> is no other way of filtering '&' before it gets processed in the XSL?
It doesn't matter if they're entering a URL/URI or not. Any text that you
intend to put into an XML document needs to be screened, to preserve
well-formedness / parseability.
1. Always note the following:
- non-XML characters need to be removed or replaced
(U+0000..U+0008, U+000B, U+000C, U+000E..U+001F, U+D800..U+DFFF,
U+FFFE..U+FFFF)
- a string is not a URI if it violates URI syntax, so if the text is
destined for a URI-pseudotype attribute value (like href or src in
HTML/XHTML), characters above U+007F should be escaped by writing
their equivalent UTF-8 bytes as '%xx' for each byte, where xx is the
hex notation for the byte (though this isn't strictly necessary; a
conforming HTML user agent will do this automatically)
- additional translation of ASCII-range characters (U+0000..U+007F) in
text destined for URI attributes is not required but is wise, to
ensure conformance to URI syntax; %-escape everything except
a-z, A-Z, 0-9, and these: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , [ ]
2. If and when the XML document exists in serialized form
(i.e., as a string, not as a DOM object), note the following:
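- if the text is not destined for a CDATA section, markup characters '&'
and '<' need to be escaped (and so does the '>' in any ']]>')
- if the text is destined for a CDATA section, any ']]>' in it must be
split across two CDATA sections (e.g., ']]]]><![CDATA[>'), because
escapes are not recognized inside CDATA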
- if the text is destined for a comment, it must not contain '--'
(how you handle such an offense is up to you)
- if the text is destined for an attribute value delimited by apostrophes,
then apostrophes in the value must be escaped (usually use &apos;, or
&#39; in HTML, since HTML 4 does not define &apos;)
- if the text is destined for an attribute value delimited by quotes,
then quotes in the value must be escaped (usually use &quot;)
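- if the text is destined for a non-URI attribute value, then tab, LF,
and CR need to be escaped as character references (&#9;, &#10;, &#13;)
to facilitate round-tripping, since attribute-value normalization turns
the literal characters into spaces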
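For point 1, a minimal Python sketch of the two screening steps might look like this. The function names strip_non_xml_chars and escape_for_uri_attribute are hypothetical. The URI step follows the safe-character list above literally, so '%' and '#' get escaped too: already-escaped input is double-escaped, and a '#' fragment separator becomes '%23'. Adjust the safe set if your input needs either of those.

```python
import re
from urllib.parse import quote

# Code points that may never appear in an XML 1.0 document (the ranges in point 1).
# Python str values can hold lone surrogates, so U+D800..U+DFFF is checked too.
NON_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")

# Characters left as-is in URI attributes, besides a-z, A-Z, 0-9
# (quote() never escapes ASCII letters and digits).
URI_SAFE = "-_.!~*'();/?:@&=+$,[]"


def strip_non_xml_chars(text: str, replacement: str = "") -> str:
    """Remove (or replace) every character that is not allowed in XML 1.0."""
    return NON_XML_CHARS.sub(replacement, text)


def escape_for_uri_attribute(text: str) -> str:
    """%-escape text destined for an href/src-style attribute.

    quote() encodes the string as UTF-8 and writes each byte outside the
    safe set as %XX, covering both non-ASCII and unsafe ASCII characters.
    Non-XML characters are stripped first; lone surrogates could not be
    encoded as UTF-8 anyway.
    """
    return quote(strip_non_xml_chars(text), safe=URI_SAFE)


if __name__ == "__main__":
    # prints http://example.com/a%20b?x=1&y=%C3%A9
    print(escape_for_uri_attribute("http://example.com/a b?x=1&y=\u00e9"))
```

Note that '&' stays unescaped here because it is legal in a URI; it still has to become &amp; when the URI is written into serialized XML (point 2).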
I probably missed one or two cases, but as you can see, you can't just slap
any old text into a document and call it XML...
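The serialization rules in point 2 can be sketched the same way, again in Python with hypothetical helper names. This only matters if you build the markup as a string yourself; a real serializer (or xml.sax.saxutils.escape and quoteattr in Python's standard library) handles most of it for you. Raising an exception for a bad comment is just one choice, since how to handle that is up to you.

```python
def escape_text(text: str) -> str:
    """Escape character data for element content outside CDATA."""
    # Replace '&' first so the ampersands introduced below are not escaped again.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # Strictly, only the '>' in ']]>' must be escaped; escaping every '>' is simpler.
    return text.replace(">", "&gt;")


def escape_attribute(value: str, delimiter: str = '"') -> str:
    """Escape a non-URI attribute value that will be wrapped in `delimiter` (' or ")."""
    if delimiter not in ('"', "'"):
        raise ValueError("delimiter must be ' or \"")
    value = escape_text(value)
    value = value.replace(delimiter, "&quot;" if delimiter == '"' else "&apos;")
    # Character references survive attribute-value normalization; literal
    # tab/LF/CR would be turned into spaces by the parser.
    return value.replace("\t", "&#9;").replace("\n", "&#10;").replace("\r", "&#13;")


def cdata_section(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment(text: str) -> str:
    """Build a comment, rejecting text that would make it malformed."""
    if "--" in text or text.endswith("-"):
        raise ValueError("comment text may not contain '--' or end with '-'")
    return "<!--" + text + "-->"


if __name__ == "__main__":
    # prints Tom&apos;s "URL"&#10;
    print(escape_attribute("Tom's \"URL\"\n", "'"))
```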
- Mike
____________________________________________________________________________
mike j. brown | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | resume: http://skew.org/~mike/resume/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list