Subject: [Part 2] XML Design for Data Science Analysis
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 10 May 2026 18:40:41 -0000
A big simplification in scaling comes from designing the XML so that each
feature's values are already grouped together as one vector:
<Workplace>
    <Feature name="Age">
        <Value person="A">36</Value>
        <Value person="B">37</Value>
        <Value person="C">22</Value>
    </Feature>

    <Feature name="Kids">
        <Value person="A">3</Value>
        <Value person="B">2</Value>
        <Value person="C">0</Value>
    </Feature>

    <Feature name="Income" currency="USD">
        <Value person="A">100000</Value>
        <Value person="B">80000</Value>
        <Value person="C">101000</Value>
    </Feature>
</Workplace>
Now the scaling logic can be generic. It works for any feature because every
feature has the same structure:
$feature/Value
The generic min-max transformation can be written in XPath 3.0 as follows:
for $feature in /Workplace/Feature
return
  let $values := $feature/Value ! xs:decimal(.),
      $min := min($values),
      $max := max($values)
  return
    for $value in $values
    return
      (: a constant feature has max = min; map it to 0 rather than divide by zero :)
      if ($max = $min) then 0
      else ($value - $min) div ($max - $min)
That is a significant improvement.
You no longer need separate expressions for:
/Workplace/Staff/Age
/Workplace/Staff/Kids
/Workplace/Staff/Income
Instead, every variable has the same shape:
Feature/Value
So the algorithm becomes:
for each feature:
    scale all of its values
That is much closer to how data science thinks:
for each column:
    scale the values in that column
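The column loop above can be sketched in Python against the same sample
document. This is only an illustration, not part of the XPath solution: the
function name scale_features and the choice to map a constant feature to 0.0
are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the feature-oriented design above.
XML = """
<Workplace>
    <Feature name="Age">
        <Value person="A">36</Value>
        <Value person="B">37</Value>
        <Value person="C">22</Value>
    </Feature>
    <Feature name="Kids">
        <Value person="A">3</Value>
        <Value person="B">2</Value>
        <Value person="C">0</Value>
    </Feature>
    <Feature name="Income" currency="USD">
        <Value person="A">100000</Value>
        <Value person="B">80000</Value>
        <Value person="C">101000</Value>
    </Feature>
</Workplace>
"""


def scale_features(root):
    """Min-max scale the Value children of every Feature element.

    Returns {feature name: {person: scaled value}}.
    (Hypothetical helper; the name and return shape are assumptions.)
    """
    scaled = {}
    for feature in root.findall("Feature"):
        # Every feature has the same shape, so one path works for all of them.
        values = {v.get("person"): float(v.text) for v in feature.findall("Value")}
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        # Assumption: a constant feature (span == 0) is mapped to 0.0.
        scaled[feature.get("name")] = {
            person: (x - lo) / span if span else 0.0
            for person, x in values.items()
        }
    return scaled


result = scale_features(ET.fromstring(XML))
for name, column in result.items():
    print(name, column)
```

For the sample data, Age scales to 14/15 (about 0.933) for A, 1 for B, and 0
for C. Adding a fourth Feature element to the document would need no change
to the function.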
So the best XML design for this kind of analysis is not merely
"column-oriented."
It is regular, metadata-driven, feature-oriented XML.
The key design principle is:
Make every feature look structurally identical.
That is what makes the scaling logic dramatically simpler and more reusable.
/Roger
