  • From: Ihe Onwuka <ihe.onwuka@g...>
  • To: "C. M. Sperberg-McQueen" <cmsmcq@b...>
  • Date: Thu, 17 Nov 2022 00:12:48 -0500



On Wed, Nov 16, 2022 at 7:41 PM C. M. Sperberg-McQueen <cmsmcq@b...> wrote:

Roger L Costello <costello@m...> writes:

> Michael Kay wrote:
>
>> the "Barnes & Noble" problem. The number #1 blunder
>> when writing XML is not to bother escaping `<` and `&`
>> if they happen to occur in your input.
>
> Ouch!
>
> You are right Michael.
>
> Upon reflection, I realized that there is an even nastier problem
> lurking than the problem of converting & and < in the input record
> data into &amp; and &lt; in the output XML.
>
> ...
>
> To implement the character conversions in AWK would be a monumental task.
>
> Eeeeeeek!
>
> Lesson Learned: Don't use AWK to convert records to XML.

Well, you may be right, and I believe many on this list share my
preference for performing such conversions in XSLT and/or XQuery, but I
have to say that the lesson you suggest seems a slightly broader
conclusion than is warranted by the experience you describe.


Agreed.
 
A couple points of detail:

  - Your downstream tools are likely to be somewhat happier if you
    convert the data to UTF-8 or UTF-16, but unless I am mistaken you
    are not in fact required to do so, in order to turn the data into
    XML.  XML does allow encoding declarations.

  - If you do want to convert the encoding it would surprise me a bit if
    awk had no constructs suitable for the work.  It would surprise me
    even more if a system with awk did not have the iconv utility for
    converting textual data from one encoding to another.

        iconv --from-code=WINDOWS-1252 --to-code=UTF-8 < myinput > output.utf8


awk does not need such a construct. iconv and awk are both part of the Unix-like shell ecosystem, so iconv can pre-process or post-process an awk conversion through an ordinary shell pipeline.
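
A minimal sketch of such a pipeline, under assumptions that are mine and not from Roger's data: the input is pipe-delimited with two fields per record, and the element names and the function name records_to_xml are made up for illustration. iconv handles the encoding, and awk only has to escape the markup characters:

```shell
# Hypothetical helper: reads WINDOWS-1252 records on stdin, writes UTF-8 XML on stdout.
# Assumed record layout (illustration only): name|city
records_to_xml() {
    iconv -f WINDOWS-1252 -t UTF-8 |
    awk -F'|' '
    # Escape XML markup characters. "&" must be handled first so that the
    # "&" in the entities produced for "<" and ">" is not escaped again.
    # In a gsub replacement a bare "&" means "the matched text", hence "\\&".
    function esc(s) {
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    BEGIN {
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        print "<records>"
    }
    {
        printf "  <record><name>%s</name><city>%s</city></record>\n", esc($1), esc($2)
    }
    END { print "</records>" }
    '
}
```

It would be called as `records_to_xml < myinput > output.xml`. Characters that XML does not allow at all (most control characters, for instance) would still need separate handling; this sketch does not attempt that.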

