Subject: RE: Tokenizing and transforming a CSV file
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 25 Feb 2009 16:53:28 -0000
I would use xsl:analyze-string rather than tokenize(), with a regex such as
(,"[^"]*")|(,[^,]*)
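
A minimal sketch of how that suggestion might be applied to the stylesheet quoted below. It is one possible reading, not a tested or definitive solution. It assumes a comma is prepended to each line (so that the first field also begins with the "," the regex expects), that blank lines are skipped, and that the surrounding quotes are stripped with substring(); none of these details are spelled out in the suggestion itself. As in the original stylesheet, the template matches "/" and so still needs some source document to run against.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Same input file as in the original post -->
  <xsl:variable name="filedata" select="unparsed-text('test.csv')"/>

  <xsl:template match="/">
    <result>
      <!-- Assumption: skip blank lines, e.g. the empty token after a trailing newline -->
      <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space(.)]">
        <record>
          <!-- Assumption: prepend a comma so every field, including the first,
               starts with the ',' that the regex expects -->
          <xsl:analyze-string select="concat(',', .)"
                              regex='(,"[^"]*")|(,[^,]*)'>
            <xsl:matching-substring>
              <field>
                <xsl:choose>
                  <!-- Group 1: quoted field; drop the leading comma and both quotes -->
                  <xsl:when test="regex-group(1) != ''">
                    <xsl:value-of select="substring(regex-group(1), 3,
                                          string-length(regex-group(1)) - 3)"/>
                  </xsl:when>
                  <!-- Group 2: unquoted field; drop the leading comma -->
                  <xsl:otherwise>
                    <xsl:value-of select="substring(regex-group(2), 2)"/>
                  </xsl:otherwise>
                </xsl:choose>
              </field>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Note that this regex does not handle doubled quotes ("") used as an escape inside a quoted field, nor quoted fields that span more than one line; either case would need a more elaborate pattern.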
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Mukul Gandhi [mailto:gandhi.mukul@xxxxxxxxx]
> Sent: 25 February 2009 16:44
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Tokenizing and transforming a CSV file
>
> Hi all,
> I have a CSV file (named test.csv) as follows (two example
> lines/records are shown below):
>
> hi,"this is a long string, please tokenize me",hello,world
> hello,please tokenize me,hi there
>
> I want this to be transformed to the following XML:
>
> <result>
>   <record>
>     <field>hi</field>
>     <field>this is a long string, please tokenize me</field>
>     <field>hello</field>
>     <field>world</field>
>   </record>
>   <record>
>     <field>hello</field>
>     <field>please tokenize me</field>
>     <field>hi there</field>
>   </record>
> </result>
>
> i.e., each line/record should be tokenized on commas, with the
> restriction that a comma inside a double-quoted string should
> not be treated as a delimiter.
>
> Below is my attempt up to now.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="2.0">
>
>   <xsl:output method="xml" indent="yes" />
>
>   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />
>
>   <xsl:template match="/">
>     <result>
>       <xsl:for-each select="tokenize($filedata, '\r?\n')">
>         <record>
>           <xsl:for-each select="tokenize(., ',')">
>             <field>
>               <xsl:value-of select="." />
>             </field>
>           </xsl:for-each>
>         </record>
>       </xsl:for-each>
>     </result>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> The above stylesheet produces the following output:
>
> <result>
>   <record>
>     <field>hi</field>
>     <field>"this is a long string</field>
>     <field> please tokenize me"</field>
>     <field>hello</field>
>     <field>world</field>
>   </record>
>   <record>
>     <field>hello</field>
>     <field>please tokenize me</field>
>     <field>hi there</field>
>   </record>
> </result>
>
> As per my requirement, the following output fragment
>
> <field>"this is a long string</field>
> <field> please tokenize me"</field>
>
> is wrong.
>
> This should actually appear as:
>
> <field>this is a long string, please tokenize me</field>
>
> I would appreciate any help regarding this problem.
>
> I am using XSLT 2.0 with Saxon 9.x.
>
>
> --
> Regards,
> Mukul Gandhi