fn:elements-to-maps: Robustness #1646

ChristianGruen · 2024-12-06T22:49:38Z

Copied from #1592 (comment) and #1592 (comment):

[USER2] More user feedback:

It’s confusing that the following function calls lead to completely different outputs:

elements-to-maps(
  <person>
    <name>Akila</name>
    <age>34</age>
  </person>
)

{"person":{"name":"Akila","age":"34"}}

elements-to-maps(
  <person>
    <name>Akila</name>
    <name>Jaha</name>
    <age>34</age>
  </person>
)

{"person":[{"name":"Akila"},{"name":"Jaha"},{"age":"34"}]}

The initial feedback I have gathered so far is that the function works fine if the input is regular and uniform, but as soon as there are slight deviations, it can get wild. Here are some plain examples of how a small change to the input results in fairly different output:

<xml>
  <info>X</info>
  <address>A</address><address>B</address>
</xml>
→ { "xml": ["A", "B"] }

<xml>
  <info>X</info>
  <address>A</address>
  <address>B</address>
</xml>
→ { "xml": [{ "info": "X" }, { "address": "A" }, { "address": "B" }] }

<xml id='id0'>
  <address>A</address>
  <address>B</address>
</xml>
→ { "xml": { "@id": "id0", "address": ["A", "B"] } }

Possible solutions:

Enable uniform by default (performance considerations should not outweigh usability concerns)
Change the rules for record from all-different(*!node-name()) to not(all-equal(*!node-name()))
Editorial changes: Stress in the introduction that robustness is a secondary requirement.
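
To make the second proposal concrete, here is a minimal Python sketch that compares the two name tests on lists of child element names. It models only the `all-different` / `all-equal` part of the rule and ignores every other condition the specification applies when choosing a layout; the helper names (`is_record_current`, `is_record_proposed`) are hypothetical and exist only for this illustration.

```python
from typing import Sequence


def all_different(names: Sequence[str]) -> bool:
    """True if no child element name occurs more than once."""
    return len(set(names)) == len(names)


def all_equal(names: Sequence[str]) -> bool:
    """True if all child elements share one name (vacuously true when empty)."""
    return len(set(names)) <= 1


def is_record_current(names: Sequence[str]) -> bool:
    """Name test as quoted in the issue: all-different(*!node-name())."""
    return all_different(names)


def is_record_proposed(names: Sequence[str]) -> bool:
    """Proposed name test: not(all-equal(*!node-name()))."""
    return not all_equal(names)


if __name__ == "__main__":
    samples = [
        ["name", "age"],          # first <person> example
        ["name", "name", "age"],  # second <person> example
        ["address", "address"],   # repeated children only
        ["name"],                 # single child
    ]
    for names in samples:
        print(names, is_record_current(names), is_record_proposed(names))
```

Under these assumptions, `["name", "name", "age"]` fails the current test but passes the proposed one, so both `<person>` inputs would satisfy the same name test. Note that the two tests also disagree on a single child such as `["name"]`, which would need to be considered separately.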


michaelhkay · 2024-12-07T01:11:50Z

All XML-to-JSON converters suffer from this problem, and we have put in many features to minimise its impact (unlike the many popular one-shot converters, where the user simply has to live with it).

I'm open to other opinions on the default for "uniform"; it's a trade-off.

ChristianGruen · 2024-12-07T11:01:44Z

In some ways, the new function differs a lot from the other functions we have: its output depends heavily on the specifics of the input. This makes it hard to guess what you will get, and this (I believe) is exactly what will be confusing in practice, and even dangerous in production environments if you do not analyse your output very carefully.

Imagine we have a great number of homogeneous elements, with one exception:

<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
</xml>

The result is certainly not what you would expect:

{
  "xml": [
    ["X", "Y"],
    ["X", "Y"],
    ["X", "Y"],
    { "a": "X", "info": "" },
    ["X", "Y"],
    ["X", "Y"]
  ]
}

My experience is that a lot of real XML data is 90% homogeneous, but not 100%. It is exactly the advantage of XML to be more flexible than tables and, for example, to be able to omit empty elements or add new ones without changing your XPath expressions.

It's true that online converters give you no guarantee, but it is a general property of such tools that you do not know what you will get. As we are aiming to provide a well-defined language feature, I wonder whether we should focus on robustness first and offer the dynamic/heuristic features as options instead.

Uniform layouts would be a cautious step in that direction (unless there are other aspects, apart from performance, that speak against uniformity by default?).

michaelhkay · 2024-12-07T13:16:26Z

Whatever set of rules you adopt, it's not going to match the semantics of the data model 100% of the time. The aim is to get it right automatically most of the time, and to enable the user to intervene to fix the cases where we don't get it right. I think the current design achieves that.

Perhaps we should do some experiments on significant samples of real-world XML to see how well it copes.
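
One rough way to run such an experiment is sketched below in Python. It is an illustration only: for each element name it counts how many instances have children whose names are all equal, all different, or mixed, and reports the element names whose instances fall into more than one category. These categories are a simplification and not the layout rules of the specification, and the function names (`shape`, `shape_census`, `inconsistent`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def shape(elem: ET.Element) -> str:
    """Classify an element by the names of its child elements."""
    names = [child.tag for child in elem]
    if not names:
        return "leaf"
    if len(set(names)) == len(names):
        return "all-different"
    if len(set(names)) == 1:
        return "all-equal"
    return "mixed"


def shape_census(xml_text: str) -> dict:
    """Map each element name to a Counter of the shapes its instances take."""
    census = defaultdict(Counter)
    for elem in ET.fromstring(xml_text).iter():
        census[elem.tag][shape(elem)] += 1
    return dict(census)


def inconsistent(census: dict) -> dict:
    """Keep only element names whose instances take more than one shape."""
    return {name: counts for name, counts in census.items() if len(counts) > 1}


# The mostly homogeneous sample from earlier in this thread.
SAMPLE = """
<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
</xml>
"""

if __name__ == "__main__":
    for name, counts in inconsistent(shape_census(SAMPLE)).items():
        print(name, dict(counts))
```

For the sample above, only the `_` element is reported: five instances have children with equal names and one has children with different names. Running the same census over larger real-world documents would give a first impression of how often such near-homogeneous data occurs.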

ChristianGruen added the XQFO and Editorial labels on Dec 6, 2024
