fn:elements-to-maps: Robustness #1646
All XML-to-JSON converters suffer from this problem, and we have put in many features to minimise its impact (unlike the many popular one-shot converters, where the user simply has to live with it). I'm open to other opinions on the default for "uniform"; it's a trade-off.
In some ways, the new function differs a lot from the other functions we have: its output depends heavily on the specifics of the input. This makes it hard to guess what you will get, and this (I believe) is exactly what will be confusing in practice, and even dangerous in production environments if you do not analyse your output very carefully. Imagine we have a great number of homogeneous elements, with one exception:

<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
</xml>

The result is certainly not what you would expect:

{
  "xml": [
    ["X", "Y"],
    ["X", "Y"],
    ["X", "Y"],
    { "a": "X", "info": "" },
    ["X", "Y"],
    ["X", "Y"]
  ]
}

My experience is that a lot of real XML data is 90% homogeneous, but not 100%. It is exactly the advantage of XML to be more flexible than tables and, for example, to be able to omit empty elements or add new ones without changing your XPath expressions. It's true that online converters give you no guarantee, but it is a general property of tools that you do not know what you get. As we are aiming to provide a well-defined language feature, I wonder whether we should focus on robustness first and offer dynamic/heuristic features as options instead. Uniform layouts would be a cautious step in that direction (unless there are other aspects, apart from performance, that speak against uniformity by default?).
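To make the effect easier to reproduce outside an XQuery processor, here is a minimal Python sketch of a per-element layout choice. It is a deliberately simplified model, not the rules of fn:elements-to-maps: the function name `to_json_like`, the three-way decision, and the fallback for mixed child names are all assumptions made for illustration. It only shows how deciding the layout from each element's own children lets a single deviating element change shape.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample above: one row deviates from the rest.
SAMPLE = """<xml>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><a>Y</a></_>
  <_><a>X</a><info/></_>
  <_><a>X</a><a>Y</a></_>
</xml>"""


def to_json_like(elem):
    """Hypothetical, simplified layout choice made per element."""
    children = list(elem)
    if not children:
        # Leaf element: use its text, or an empty string if it has none.
        return elem.text or ""
    names = [child.tag for child in children]
    if len(set(names)) == len(names):
        # All child names differ: record-like layout (a map).
        return {child.tag: to_json_like(child) for child in children}
    if len(set(names)) == 1:
        # All child names are equal: list-like layout (an array).
        return [to_json_like(child) for child in children]
    # Mixed names: assumed fallback, an array of single-entry maps.
    return [{child.tag: to_json_like(child)} for child in children]


root = ET.fromstring(SAMPLE)
result = {root.tag: to_json_like(root)}
for row in result["xml"]:
    print(row)
```

In this model the third row comes out as a map while its siblings come out as arrays, which is the same kind of shape change as in the output above. A consumer that indexes every row by position would break on that one row.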
Whatever set of rules you adopt, it's not going to match the semantics of the data model 100% of the time. The aim is to get it right automatically most of the time, and to enable the user to intervene to fix the cases where we don't get it right. I think the current design achieves that. Perhaps we should do some experiments on significant samples of real-world XML to see how well it copes.
Copied from #1592 (comment) and #1592 (comment):
[USER2] More user feedback:
The initial feedback I have gathered so far is that the function works fine if the input is regular and uniform, but as soon as there are slight deviations, it can get wild. Here are some plain examples of how a small change to the input results in fairly different output:
Possible solutions:
uniform by default (performance considerations should not outweigh usability concerns)
record: from all-different(*!node-name()) to not(all-equal(*!node-name()))
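The two conditions named in the second proposal agree on many inputs but not on all of them. The sketch below uses plain Python stand-ins for all-different and all-equal over a list of child element names; the helper names and the sample name lists are assumptions for illustration, and the sketch says nothing about how a processor would then lay out the result.

```python
def all_different(names):
    """True if no name occurs more than once (also true for an empty list)."""
    return len(set(names)) == len(names)


def all_equal(names):
    """True if at most one distinct name occurs (also true for an empty list)."""
    return len(set(names)) <= 1


# Child-name sequences of a hypothetical parent element.
cases = [
    ["a", "a"],          # homogeneous children
    ["a", "info"],       # every child has its own name
    ["a", "a", "info"],  # partly repeated, partly distinct
    ["a"],               # a single child
]

for names in cases:
    print(names, all_different(names), not all_equal(names))
```

The conditions diverge on the last two cases: a partly repeated sequence such as a, a, info fails all-different but satisfies not(all-equal), while a single child satisfies all-different but fails not(all-equal).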