fn:elements-to-maps: Robustness #1646

Open
ChristianGruen opened this issue Dec 6, 2024 · 3 comments
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. XQFO An issue related to Functions and Operators

Comments

@ChristianGruen
Contributor

Copied from #1592 (comment) and #1592 (comment):

[USER2] More user feedback:

It’s confusing that the following function calls lead to completely different outputs:

elements-to-maps(
  <person>
    <name>Akila</name>
    <age>34</age>
  </person>
)

{"person":{"name":"Akila","age":"34"}}

elements-to-maps(
  <person>
    <name>Akila</name>
    <name>Jaha</name>
    <age>34</age>
  </person>
)

{"person":[{"name":"Akila"},{"name":"Jaha"},{"age":"34"}]}

The initial feedback I have gathered so far is that the function works fine if the input is regular and uniform, but as soon as there are slight deviations, it can get wild. Here are some plain examples of how a small change to the input results in fairly different output:

<xml>
  <info>X</info>
  <address>A</address><address>B</address>
</xml>
→ { "xml": ["A", "B"] }

<xml>
  <info>X</info>
  <address>A</address>
  <address>B</address>
</xml>
→ { "xml": [{ "info": "X" }, { "address": "A" }, { "address": "B" }] }

<xml id='id0'>
  <address>A</address>
  <address>B</address>
</xml>
→ { "xml": { "@id": "id0", "address": ["A", "B"] } }

Possible solutions:

  • Enable uniform by default (performance considerations should not outweigh usability concerns)
  • Change the rules for record from all-different(*!node-name()) to not(all-equal(*!node-name()))
  • Editorial changes: Stress in the introduction that robustness is a secondary requirement.
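
To make the second option concrete, here is a minimal Python sketch that compares the two name-based conditions on the child-name sequences from the examples above. It models only the part of the rule quoted in this issue; any further conditions in the specification are ignored, and the helper names `record_current` and `record_proposed` are hypothetical.

```python
from typing import Sequence


def all_different(names: Sequence[str]) -> bool:
    """True if no name occurs twice (vacuously true for an empty sequence)."""
    return len(set(names)) == len(names)


def all_equal(names: Sequence[str]) -> bool:
    """True if every name is the same (vacuously true for an empty sequence)."""
    return len(set(names)) <= 1


def record_current(names: Sequence[str]) -> bool:
    # Condition as quoted in the issue: all-different(*!node-name())
    return all_different(names)


def record_proposed(names: Sequence[str]) -> bool:
    # Proposed condition: not(all-equal(*!node-name()))
    return not all_equal(names)


# Child element names taken from the examples in this issue.
CASES = {
    "name, age": ["name", "age"],
    "name, name, age": ["name", "name", "age"],
    "info, address, address": ["info", "address", "address"],
    "address, address": ["address", "address"],
}

if __name__ == "__main__":
    for label, names in CASES.items():
        print(f"{label:24} current={record_current(names)!s:5} "
              f"proposed={record_proposed(names)!s:5}")
```

In this sketch the two conditions disagree whenever some, but not all, child names repeat (such as name, name, age). They also disagree for an element with a single child, which the current condition accepts and the proposed one rejects, so that case would need a decision of its own.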
@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. labels Dec 6, 2024
@michaelhkay
Contributor

All XML-to-JSON converters suffer from this problem, and we have put in many features to minimise its impact (unlike the many popular one-shot converters, where the user simply has to live with it).

I'm open to other opinions on the default for "uniform"; it's a trade-off.

@ChristianGruen
Contributor Author

In some ways, the new function differs a lot from the other functions we have: its output depends heavily on the specifics of the input. This makes it hard to predict what you will get, and this (I believe) is exactly what will be confusing in practice, and even dangerous in production environments if you do not analyse your output very carefully.

Imagine we have a great number of homogeneous elements, with one exception:

<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
</xml>

The result is certainly not what you would expect:

{
  "xml": [
    ["X", "Y"],
    ["X", "Y"],
    ["X", "Y"],
    { "a": "X", "info": "" },
    ["X", "Y"],
    ["X", "Y"]
  ]
}
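
One way to spot such outliers before converting is to compare the child-name pattern of sibling elements. The following Python sketch does this with the standard library's `xml.etree.ElementTree`. The pattern labels and the `find_outliers` helper are hypothetical and informal; they are not the layout names or the selection algorithm of the specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
</xml>"""


def name_pattern(elem: ET.Element) -> str:
    """Informal label for how the names of an element's children relate."""
    names = [child.tag for child in elem]
    if len(names) >= 2 and len(set(names)) == 1:
        return "all-equal"       # e.g. <a/><a/>
    if len(set(names)) == len(names):
        return "all-different"   # e.g. <a/><info/>
    return "mixed"               # some names repeat, others do not


def find_outliers(root: ET.Element) -> list[tuple[int, str]]:
    """Return (index, pattern) for children whose pattern is not the most common one."""
    patterns = [name_pattern(child) for child in root]
    if not patterns:
        return []
    most_common = Counter(patterns).most_common(1)[0][0]
    return [(i, p) for i, p in enumerate(patterns) if p != most_common]


if __name__ == "__main__":
    for index, pattern in find_outliers(ET.fromstring(XML)):
        print(f"child {index} deviates: {pattern}")
```

For the input above, this reports only the fourth child (index 3), whose children `a` and `info` have different names while all of its siblings contain two `a` elements.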

My experience is that a lot of real XML data is 90% homogeneous, but not 100%. It is exactly the advantage of XML to be more flexible than tables and, for example, to be able to omit empty elements or add new ones without changing your XPath expressions.

It's true that online converters give you no guarantee, but it is a general property of such tools that you do not know what you will get. As we are aiming to provide a well-defined language feature, I wonder whether we should focus on robustness first and offer the dynamic/heuristic features as options.

Uniform layouts would be a cautious step in that direction (unless there are other aspects, apart from performance, that speak against uniformity by default?).

@michaelhkay
Contributor

Whatever set of rules you adopt, it's not going to match the semantics of the data model 100% of the time. The aim is to get it right automatically most of the time, and to enable the user to intervene to fix the cases where we don't get it right. I think the current design achieves that.

Perhaps we should do some experiments on significant samples of real-world XML to see how well it copes.
