[Home] [By Thread] [By Date] [Recent Entries]

Subject: Re: Approaches for validating database transformations using XML/XSLT tools?
From: "Jeffrey Greif jgreif@xxxxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 15 Mar 2026 23:13:35 -0000
Another approach:
Execute suitable SQL on each database writing a text file (such as CSV) for
each database such that the files are supposed to match exactly (or differ
according to some expected pattern) and run the Unix diff command with
suitable arguments.  If the files are not supposed to be identical, filter
out the expected differences in the pipeline preceding the call to diff.
Ideally, the diff output should be empty if everything matches.

A similar approach could be taken using the python unit tests to generate
and compare outputs.

The ability to filter out expected differences allows for whatever "process
the data" means in the description of the python code being tested.

On Sun, Mar 15, 2026 at 1:01b/PM BR Chrisman brchrisman@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> The database/client will not natively export XML?  If not, it might be
> simpler to use a database-conversion tool to import the database into an
> rdbms that will export XML... Going CSV -> XML from a table structure is a
> real pain.
>
> On Sun, Mar 15, 2026 at 11:49b/AM Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Folks,
>>
>> I have been given a very short-term task (40 hours total) to do some
>> testing and I wanted to get your input before starting.
>>
>> A person whom I will call Jane wrote Python code that extracts data from
>> tables in a Workday database, processes the data, and then inserts the
>> processed data into a user database called the *Semantic* database from
>> which users will make their queries. The structure of the Workday database
>> is considered too complicated for users to query directly. The Semantic
>> database is simpler (e.g., table names and column headers are more
>> user-friendly) and contains less data (users do not need all the details
>> that are in the Workday tables).
>>
>> My task is to test whether the Python codebs transformation is
*complete,
>> consistent, and accurate*.
>>
>> My plan is to leverage XML tools.
>>
>> I can use SQL to obtain the data that the Python code extracted from the
>> Workday database. The front-end application to Workday and Semantic allows
>> me to run SQL commands. The application can display the retrieved data in
a
>> raw form or export it as a *CSV file*.
>>
>> Here are two possible approaches that I could take:
>>
>> *1. Export to CSV and convert to XML via Excel*
>>
>> Get a CSV file from querying the Workday database. Do the same for the
>> Semantic database. Open each CSV file in Excel and then *File b Save As
>> XML*. Now I have XML. (Yes!) I can then create an XML Schema for the
>> Workday data and an XML Schema for the Semantic data. I can also create a
>> Schematron schema to check constraints between the two datasets. Validate,
>> validate, validate.
>>
>> The advantage of this approach is that I can get the data into XML
>> quickly.
>>
>> The disadvantage is that Excel, when generating XML, might silently
>> change some databfor example, dropping non-printable characters or
>> converting data from the character set used by Workday into UTF-8. Do you
>> know whether Excel performs such behind-the-scenes transformations? Such
>> transformations would be undesirablebI want full knowledge of what data
is
>> in the Workday database so I can see if any data is missing or altered in
>> the Semantic database.
>>
>> *2. Extract raw text and transform with XSLT*
>>
>> I could query the Workday database using the front-end application,
>> capture the raw data displayed by the application into a text file
(select,
>> copy, paste), and then use the XSLT unparsed-text() function to read the
>> data and generate XML (while carefully inspecting for non-printable
>> characters and character-set issues). I would do the same for the Semantic
>> database.
>>
>> The advantage of this approach is that I would have more control over how
>> the XML is generated and there would be no silent, behind-the-scenes
>> conversions.
>>
>> The disadvantage is that writing XSLT to carefully inspect the data and
>> generate XML will take time, and I do not have much time.
>>
>> Perhaps there are other approaches?
>>
>> What do you recommend?
>>
>> /Roger
>> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
>> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/965995> (by
>> email)
>>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/1067740> (by
> email <>)

Current Thread
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member