
  • From: Hans-Juergen Rennau <hrennau@y...>
  • To: "C. M. Sperberg-McQueen" <cmsmcq@b...>
  • Date: Tue, 15 Nov 2022 15:09:27 +0000 (UTC)

Wholeheartedly agreed, Michael; I was about to post the same remark when I saw that you had already done so. A small supplement: with the csv:doc function you need not read the file content yourself, and the separator is controlled by the 'separator' option. Thus:

csv:doc($uri, map{'separator':'tab', 'header':'yes'})

gives the parsed document. It is a huge advantage to have your cake and eat it too: you get the parsed document, and you are already in XQuery, the master language for evaluating tree-structured information. For example, to get a frequency distribution of the "date" field:

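declare variable $uri external;
(: parse the tab-separated file; the first line supplies the element names :)
csv:doc($uri, map{'separator':'tab', 'header':'yes'})
(: group the date elements by value and report each value with its count :)
! (for $dateElem in //date
   group by $date := $dateElem order by $date
   return $date||' #'||count($dateElem))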

=>

1988 #1
2019 #2
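
Where the BaseX csv module is not available, the same kind of frequency distribution can probably be obtained with standard XQuery 3.1 functions only, along the lines of the unparsed-text-lines approach quoted below. This is only a minimal sketch: the external variable $field and its default value 'date' are assumptions made for illustration, and the code assumes a tab-separated file whose first line holds the column names, without quoted or escaped fields.

```xquery
(: Sketch: frequency distribution of one column of a tab-separated file,
   using only standard XQuery 3.1 functions (no csv module). :)
declare variable $uri external;
(: hypothetical parameter: name of the column to be counted :)
declare variable $field external := 'date';

let $lines := unparsed-text-lines($uri)
(: the header line gives the column names :)
let $names := tokenize(head($lines), '&#x9;')
(: position of the requested column; empty if the name does not occur :)
let $pos := index-of($names, $field)[1]
for $line in tail($lines)
let $value := tokenize($line, '&#x9;')[$pos]
(: after grouping, $line holds all lines sharing the same value :)
group by $value
order by $value
return $value || ' #' || count($line)
```

The csv:doc variant remains shorter and deals with the parsing details itself; the sketch merely shows that the grouping step does not depend on it.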


Kind regards,
Hans-Jürgen

On Tuesday, 15 November 2022 at 15:54:16 CET, C. M. Sperberg-McQueen <cmsmcq@b...> wrote:



Hans-Juergen Rennau <hrennau@y...> writes:

> Roger, I would find it interesting to compare an awk solution with an
> XQuery one, also considering aspects like clarity and
> extensibility. Especially interesting as the potential of XQuery for
> tool building is by and large ignored.

Agreed!

> ...
>
> PS. Example of an XQuery-based solution:
>
> declare variable $uri external;
> declare variable $sep external := '&#x9;';
> <document>{
>    let $lines := unparsed-text-lines($uri)
>    let $names := $lines => head() => tokenize($sep)
>    for $line in tail($lines) return
>    <row>{
>        for $field at $pos in tokenize($line, $sep) return
>            element {$names[$pos]} {$field}
>    }</row>
> }</document>

This is good (and should work anywhere), but after spending a little
time on my own CSV parsing routines I realized that in BaseX, the
simplest thing to do is just to call

    csv:parse(unparsed-text($uri), map { 'header': 'yes'})

That is for comma-separated values; I think for tab-separated values one
would have to specify an additional option.

I don't have time to check, but I have a dim recollection that eXist
also has a function for reading CSV.

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC

