- From: Tomos Hillman <yamahito@g...>
- To: xml-dev@lists.xml.org <xml-dev@l...>, "Costello, Roger L." <costello@m...>
- Date: Tue, 1 Oct 2019 09:27:41 +0100
You can do it in XSLT, but it's probably not the easiest way, at least not as a first step.
Because the HTML snippet is not well-formed XML, you would have to treat it as text and essentially write an HTML parser in XSLT. Some other possible approaches:
- Use something like HTML Tidy as a first step to convert to valid XHTML (I am sure many other such tools exist)
- Use the REx parser generator to create an HTML parser in XSLT and process the input as text
- Use Steven Pemberton's Invisible XML
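To make the "tidy first, transform second" idea concrete, here is a minimal sketch in Python. It is not HTML Tidy and not any of the tools listed above; it only assumes the standard library's `html.parser`. The names `XhtmlWriter`, `html_to_xhtml` and `is_well_formed` are hypothetical. The sketch re-serializes tags with quoted attributes and self-closed void elements, and it copies every `xmlns:*` attribute and every comment through unchanged. It does not repair unbalanced or misnested tags, and escaping the contents of `style`/`script` is an assumption that Outlook may or may not accept.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never have an end tag; emitted as <tag ... />.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class XhtmlWriter(HTMLParser):
    """Re-serialize lenient HTML as XML-style markup (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open(self, tag, attrs, self_close):
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. "checked") become checked="checked".
            parts.append(name + "=" + quoteattr(name if value is None else value))
        self.out.append("<" + " ".join(parts) + (" />" if self_close else ">"))

    def handle_starttag(self, tag, attrs):
        # All attributes are kept, including xmlns:* declarations.
        self._open(tag, attrs, tag in VOID)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        # Also escapes the raw text of <style>/<script> (an assumption).
        self.out.append(escape(data))

    def handle_comment(self, data):
        # Conditional comments such as <!--[if gte mso 9]>...<![endif]-->
        # arrive here as ordinary comments and are copied verbatim.
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")


def html_to_xhtml(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A shortened version of the Outlook markup from the original message.
SAMPLE = (
    '<html xmlns:v="urn:schemas-microsoft-com:vml" '
    'xmlns:o="urn:schemas-microsoft-com:office:office" '
    'xmlns="http://www.w3.org/TR/REC-html40"><head>'
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">'
    '<style><!-- p.MsoNormal {margin:0in;} --></style>'
    '<!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> '
    '<o:idmap v:ext="edit" data="" /> </o:shapelayout></xml><![endif]-->'
    '</head><body lang=EN-US><div class=WordSection1>'
    '<p class=MsoNormal>Hello, world<o:p></o:p></p></div></body></html>'
)
```

Calling `html_to_xhtml(SAMPLE)` and passing the result to `is_well_formed` is one way to check that the output is ready for an XSLT step. Whether Outlook accepts the result on re-import is untested here.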
TMTOWTDI!
On 30 Sep 2019, 18:40 +0100, Costello, Roger L. <costello@m...>, wrote:
Hi Folks,
At the bottom of this message I show HTML that was produced by an Outlook email message (a “Hello, world” email message). The HTML has some interesting features. For example, it has a comment containing namespace-qualified elements and attributes:
<!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="" /> </o:shapelayout></xml><![endif]-->
The v namespace prefix is used only in that comment and nowhere else. I have tried a couple of tools that convert HTML to XHTML, and apparently they don’t look inside the comment, because they remove the namespace declarations from the root element:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
I want to import the XHTML back into Outlook, but unfortunately after removing namespace declarations the XHTML is not valid as far as Outlook is concerned.
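The problem described above can be checked mechanically. The following is a rough sketch in Python using only the `re` module; the function names and the regular expressions are assumptions made for illustration, not part of any existing tool. It collects the prefixes used inside conditional comments and reports those that the converted document no longer declares, together with the URI from the original document. Because it is regex-based, it can be fooled by unusual attribute values.

```python
import re

# Body of a downlevel-hidden conditional comment: <!--[if ...]> body <![endif]-->
COND_COMMENT = re.compile(r"<!--\[if[^\]]*\]>(.*?)<!\[endif\]-->", re.S)
# Prefix of a qualified element name (<o:p, </o:p) or attribute name ( v:ext=).
QNAME_PREFIX = re.compile(
    r"(?:</?|\s)([A-Za-z_][\w.-]*):[A-Za-z_][\w.-]*(?=[\s=/>])")
# Namespace declarations such as xmlns:v="urn:...".
XMLNS_DECL = re.compile(
    r"""\sxmlns:([A-Za-z_][\w.-]*)\s*=\s*["']([^"']*)["']""")


def prefixes_in_conditional_comments(markup):
    """Return the set of namespace prefixes used inside conditional comments."""
    used = set()
    for body in COND_COMMENT.findall(markup):
        used.update(QNAME_PREFIX.findall(body))
    used.discard("xmlns")  # a declaration is not a use of a prefix
    return used


def declared_prefixes(markup):
    """Return a dict mapping each declared prefix to its namespace URI."""
    return dict(XMLNS_DECL.findall(markup))


def missing_declarations(original, converted):
    """Prefixes needed by comments in `converted` but no longer declared there."""
    original_decls = declared_prefixes(original)
    converted_decls = declared_prefixes(converted)
    return {
        prefix: original_decls.get(prefix)
        for prefix in prefixes_in_conditional_comments(converted)
        if prefix not in converted_decls
    }
```

The resulting dictionary lists the `xmlns:*` attributes that would have to be put back on the root element before attempting a re-import.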
Is there a tool that can convert the HTML generated by Outlook to XHTML, such that the XHTML can be reimported into Outlook?
If no such tool exists, I will create my own tool. Would XSLT be suitable for such a task?
/Roger
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 15 (filtered medium)">
<style><!-- /* Font Definitions */ @font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;} a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;} span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;} .MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;} @page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;} div.WordSection1
{page:WordSection1;} --></style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="" /> </o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal>Hello, world<o:p></o:p></p></div></body></html>