-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1651 Ordered Maps #1609
1651 Ordered Maps #1609
Conversation
Impressive. First feedback:
|
That sounds overly complicated With a hidden property, the user forgets which kind of map he has. Then the users gets very confused when the order is not what he expects. Or worse, there is a map with ordering "random", but the user thinks it is "sorted", tests his query, and it randomly produces the correct output, because the keys come out in sorted order by happenstance. But then the next day the query fails, because it is randomly a different order One could just make all maps in fifo order (insertion order) Javascript and Python did that afaik If everything is fifo, there can still be a map:sort function to sort the keys ( without affecting keys that are inserted later) Reminds me of the |
I confess, my knee-jerk reaction to this issue is, "maps aren't ordered. If you care about the order don't put it in a map." An unpopular opinion, I'm sure, and one that's not consonant with other languages that already preserve insertion order. I assume that main argument against always using an order-preserving implementation is one of performance. Is that the case? How bad is the performance hit? How bad a hit would you be willing to accept as a trade-off for the simplicity of not having two kinds of maps? I guess a kind of counter argument would be, if most users want order-preserving maps most of the time, so that's what they end up always using out of habit and cutting-and-pasting from previous code, and because the results are never a surprise, then does the performance benefit of having unordered maps available justify the increased complexity and maintenance cost imposed on implementors to support a rarely used feature? |
From Christian: All things that I considered, and where I don't feel strongly that my solution is the only one acceptable.
"insertion order" is widely used, but what if we want some day to introduce "insert at start" rather than "insert at end"
Indeed so. But the term "hashed random" is widely used. Well, it used to be.
Yes, seems reasonable.
Yes, probably. I also wondered about how to get the last key in the range without retrieving all of them, perhaps
Yes, I changed my mind while writing so it's inconsistent. |
Norm:
Well there are two kinds of ordered maps - sorted, and fifo, and both are useful. I don't know what the performance hit is, but probably measurable (space as well as time). Also sorted maps restrict the keys you can use to those that have a defined ordering. |
@ndw Difficult to tell in general, but too bad to make it the default. Immutable maps (such as the HAMT that we have implemented) are very efficient, but pretty complex data structures, at least compared to conventional mutable maps, and it would be a considerable effort to enhance it to support orderedness. We would probably introduce a new data structure and use it as a fallback for ordered contents. As far as I know, Haskell doesn’t use HAMT for ordered maps either. |
Benibela:
I've never seen that being a problem in Java, where you choose a map implementation at the time of instantiation and that affects its subsequent behavior. And of course, for most interactions with the map, it doesn't matter because if you assume the order of keys is unpredictable, your code will always work. The only thing that will fail is if you try to put the wrong kind of key into a sorted map, and I think that will be rather uncommon - people tend to know what kind of keys a map is designed to contain. |
I hate to be hesitant, but after my nightly reflection, I think we should take one step at a time, and consider whether we need the full range of features proposed in this PR. We have two challenges that are closely related on the one hand, but that are completely different on the other:
Challenge 1 is something we cannot simulate with the existing map. I can see various additional use cases apart from representing JSON data. One common way to process ordered map entries is to create sequences of singleton entries; it would be great to be able to represent them in a single map. Challenge 2 is something that I clearly regard as a convenience feature. It can easily be simulated by iterating over the map entries: let $map := map:build(1 to 1_000_000, string#1)
return map:keys($map)
=> sort()
=> filter(fn { . >= '5' and . < '6' })
=!> $map() In addition, …
Adding multiple map variant is not only yet-another-non-trivial-4.0-todo for implementers. Its implications will also be challenging for users. It is not obvious/intuitive to understand that the parsing of JSON data that retains the order leads to a FIFO map. Most people would rather call the result “ordered” (which would be in line with the proposed The proposal itself is definitely impressive: it does a great job of trying to solve multiple problems at once. Still, I hope we will spend some more time understanding its long-term consequences before we accept and merge it. Personally, I would be perfectly fine if we only introduced an “ordered map” that takes the insertion order into account. |
One observation I would make is that, given the availability of libraries providing a variety of persistent collections with similar APIs, it's actually very easy to implement. The difference between creating a random, sorted, or insertion-ordered map can be a single line of code at the point where you create the underlying data structure. (OK, I'm exaggerating slightly: a sorted map requires an appropriate comparator, which is slightly different from the context-sensitive comparators used for fn:sort() - but it's not far off). |
Building an unsorted map, and then sorting the keys on retrieval is fine if you only have to do it once. The gain comes if you do it repeatedly; and of course it gives you a basis for functionality like finding "nearby" values. |
It’s probably no coincidence that we have so few implementations today ;·), so I would only tend to be persuaded only if at least 1, 2 other implementers (possibly non-Java) agreed on it… In terms of our processor, I know it is no trivial extension. It might be easier indeed to drop our own implementation and resort to a library, but there have been various reasons why we did it by ourselves (at least back then). One general question that arises frequently should be asked again: Do we want the specs to be a document for Saxon and BaseX, or do we believe in embracing other implementers as well?
It would really be interesting to learn more about the use cases that have been encountered so far. Which applications would be really hard to solve with unordered maps? If performance has been an issue here in the past (it hasn’t been for us), couldn’t it as well be a chance to to tweak existing unordered-map implementations? |
The design is inspired by Saxon's range-key maps, which were introduced to meet a specific customer requirement. https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map It's sufficiently long ago that I forget the details, but it was a publishing requirement along the lines of dividing an encyclopedia into sections covering alphabetic ranges. I also recollect a customer using this for ranges of dates: find all the appointments in a given week (or find the next appointment after a given date/time). The only other way to achieve that is with a serial search. |
Sorry for cross-posting. I have moved my subsequent comment on sorted maps to #564 (comment) ff. |
0fa6594
to
b4b2720
Compare
I have revised the PR in response to discussion at the WG meeting yesterday. Main changes:
|
df0b2b2
to
e3c87ad
Compare
PR now revised to drop sorted maps. |
<xtermref spec="DM40" ref="dt-entry-order">entry order</xtermref> | ||
is <termref def="implementation-dependent"/>. | ||
</fos:value> | ||
<fos:value value="insertion"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<fos:value value="insertion"> | |
<fos:value value="true"> |
…maybe just ordered
instead of retain-order
, to describe the result of the function call and not the procedural part?
I have decided to change the title of the PR to reference #1651. |
Fix #564
Introduces ordered maps: specifically, sorted maps which return entries in order sorted by key, and fifo maps which return entries in the order of insertion.
Although this has been on the TODO-list for a long time and has many useful applications, raising a PR at this stage is particularly motivated by comments on the elements-to-maps() function pointing out that having a predictable order of properties in serialized JSON can be very useful, and that many existing XML-to-JSON converters achieve this. This gives the opportunity, for example, to parse JSON into a representation that retains input order, delete and/or add some properties, and then serializate the JSON with the order retained.