
  • From: Tomos Hillman <yamahito@g...>
  • To: "xml-dev@lists.xml.org" <xml-dev@l...>, "Costello, Roger L." <costello@m...>
  • Date: Tue, 1 Oct 2019 09:27:41 +0100

You can do it in XSLT, but it's probably not the easiest way, at least not as a first step.

Because the snippet of HTML is not well-formed XML, you'd have to treat it as text and essentially write an HTML parser. Some other possible approaches:
  • Use something like HTML Tidy as a first step to convert to well-formed XHTML (I am sure many other such tools exist)
  • Use the REx parser generator to create an HTML parser in XSLT and process the input as text
  • Use Steven Pemberton's Invisible XML parser
TMTOWTDI!
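
To illustrate the "treat it as text" route outside XSLT, here is a minimal sketch in Python using only the standard library. It is not a full HTML parser: it assumes the tags already nest properly (as they do in the sample quoted below) and only quotes attributes, self-closes a partial list of void elements, and passes comments through verbatim so that the conditional comments and every xmlns declaration survive. The names `XhtmlWriter`, `VOID` and `to_xhtml` are made up for this example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never take an end tag in HTML (a partial list, for illustration).
VOID = {"meta", "br", "hr", "img", "link", "input"}


class XhtmlWriter(HTMLParser):
    """Re-serialise tag soup as well-formed markup, keeping comments verbatim."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _start(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. "nowrap") get their own name as the value.
            parts.append(name + "=" + quoteattr(name if value is None else value))
        self.out.append("<" + " ".join(parts) + (" />" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, tag in VOID)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, True)

    def handle_endtag(self, tag):
        # Void elements were already closed when their start tag was written.
        if tag not in VOID:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(escape(data))

    def handle_comment(self, data):
        # Conditional comments arrive here untouched, so their v:/o: content survives.
        self.out.append("<!--" + data + "-->")


def to_xhtml(html):
    """Return the input re-serialised with quoted attributes and closed void tags."""
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)
```

Calling `to_xhtml(outlook_html)` returns a string that an XML parser (and therefore XSLT) should be able to read as a second step. Note that the parser lowercases tag and attribute names, and that whether Outlook accepts the result on reimport is untested here.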

Thanks,
Tom
On 30 Sep 2019, 18:40 +0100, Costello, Roger L. <costello@m...>, wrote:

Hi Folks,

 

At the bottom of this message I show HTML that was produced by an Outlook email message (a “Hello, world” email message). The HTML has some interesting features. For example, it has a comment containing namespace-qualified elements and attributes:

 

<!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]-->

 

The v namespace prefix is used only in that comment and nowhere else. I have tried a couple of tools that convert HTML to XHTML, and apparently they don’t look inside the comment, because they remove the namespace declaration:

 

<html xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:w="urn:schemas-microsoft-com:office:word"
            xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
            xmlns="http://www.w3.org/TR/REC-html40">
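
A rough way to check which declared prefixes appear only inside comments is sketched below in Python. It is a regular-expression heuristic over the raw text, not a parser, and the function name `prefix_usage` and its three labels are invented for this example.

```python
import re


def prefix_usage(html):
    """Classify each xmlns-declared prefix as 'markup', 'comment-only' or 'unused'."""
    declared = set(re.findall(r"xmlns:([A-Za-z_][\w.-]*)\s*=", html))
    comments = "".join(re.findall(r"<!--(.*?)-->", html, flags=re.S))
    outside = re.sub(r"<!--.*?-->", "", html, flags=re.S)

    usage = {}
    for prefix in declared:
        # A use looks like "v:ext" or "<o:p"; the lookbehind skips "xmlns:v" itself
        # and matches that are merely the tail of a longer name or URN segment.
        pattern = re.compile(r"(?<![\w:-])" + re.escape(prefix) + r":[A-Za-z_]")
        if pattern.search(outside):
            usage[prefix] = "markup"
        elif pattern.search(comments):
            usage[prefix] = "comment-only"
        else:
            usage[prefix] = "unused"
    return usage
```

Any prefix reported as "comment-only" is one that a converter ignoring comment content would consider undeclared-safe to drop, which matches the behaviour described above for v.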

 

I want to import the XHTML back into Outlook, but unfortunately, once the namespace declarations have been removed, Outlook no longer considers the XHTML valid.

 

Is there a tool that can convert the HTML generated by Outlook to XHTML, such that the XHTML can be reimported into Outlook?

 

If no such tool exists, I will create my own tool. Would XSLT be suitable for such a task? /Roger

 

<html xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:w="urn:schemas-microsoft-com:office:word"
            xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
            xmlns="http://www.w3.org/TR/REC-html40">
           
<head>
                       
<META HTTP-EQUIV="Content-Type"
                                    CONTENT="text/html; charset=us-ascii">
                                   
<meta name=Generator content="Microsoft Word 15 (filtered medium)">
                                               
<style><!--
/* Font Definitions */
@font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
	{mso-style-priority:99;
	color:#0563C1;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{mso-style-priority:99;
	color:#954F72;
	text-decoration:underline;}
span.EmailStyle17
	{mso-style-type:personal-compose;
	font-family:"Calibri",sans-serif;
	color:windowtext;}
.MsoChpDefault
	{mso-style-type:export-only;
	font-family:"Calibri",sans-serif;}
@page WordSection1
	{size:8.5in 11.0in;
	margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
	{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal>Hello, world<o:p></o:p></p></div></body></html>

 

 

 


