Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

150bis revised proposal for fn:ranks #1062

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

michaelhkay
Copy link
Contributor

This proposal is an amended/alternative proposal for the fn:ranks function, taking into account the work done on the original issue #150 and the PR #1027 and the comments raised. Acknowledgements to the original author for the idea and for a lot of good work on examples etc.

It amends the previous proposal as follows:

(a) the signature and the semantics are aligned with fn:sort. This adds some functionality (multiple sort keys, ascending/descending) and also removes some complexity (two different collations for comparing input items and result items)

(b) the style of exposition is changed editorially for consistency with other functions

@michaelhkay michaelhkay mentioned this pull request Mar 5, 2024
@dnovatchev
Copy link
Contributor

This proposal is an amended/alternative proposal for the fn:ranks function, taking into account the work done on the original issue #150 and the PR #1027 and the comments raised. Acknowledgements to the original author for the idea and for a lot of good work on examples etc.

It amends the previous proposal as follows:

(a) the signature and the semantics are aligned with fn:sort. This adds some functionality (multiple sort keys, ascending/descending) and also removes some complexity (two different collations for comparing input items and result items)

(b) the style of exposition is changed editorially for consistency with other functions

I find both of these "amends" a step backwards from the original proposal.

What was intentionally pretty simple to understand, and call, function - is now loaded with huge and unnecessary luggage, like multiple key() functions and multiple collations.

Despite this awful increase in complexity, some functionality present in the original fn:ranks proposal has been eliminated - users do want to have distinct items within a rank - and this is not supported in this "amended" proposal.

For me the original fn:ranks and the function proposed here are two entirely different functions, that should not be mixed together. I firmly believe that it is better to go with the simpler one - easier to use and understand and still giving the user the option of getting distinct results in the ranks, which the new proposal eliminated completely.

If someone wants to suppress any good proposal coming from other people, then why we have this so called "community group" at all?

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Mar 5, 2024

I think the proposal is simpler than yours because almost all the semantics are copied and reused from fn:sort: it adds capability while reducing the number of words in the spec, and that's generally a good indicator.

I took out the functionality involving multiple collations and different comparisons for ordering and equality comparison because after strenuous efforts, I simply couldn't find a good way to take advantage of it.

Don't take it personally to have your ideas taken in a different direction by other people. It happens to me all the time, and the result is usually better than what I would have achieved on my own. That's precisely why we have a community group, to throw ideas around until a consensus crystalises. It's always good when a requirement gets creative input and alternative proposed solutions from different people.

@dnovatchev
Copy link
Contributor

dnovatchev commented Mar 5, 2024

Don't take it personally to have your ideas taken in a different direction by other people. It happens to me all the time, and the result is usually better than what I would have achieved on my own. That's precisely why we have a community group, to throw ideas around until a consensus crystalises. It's always good when a requirement gets creative input and alternative proposed solutions from different people.

I am not "taking it personally", but I have a few questions:

  1. Why should people who want to have distinct items within a rank be prevented from specifying this on a function call?

  2. Why complicate the reader's understanding of this function (difficulties in understanding may result in the user's decision not to use the function at all) with:

  • sequence of many key functions vs. a single function

  • sequence of many collations vs. a single collation

  • multiple directions of ordering vs. specifying this within the key function

  • unnatural order of the arguments, forcing the user always to provide on a call at least one collation (even if there is no need for a collation) before being able to specify the key function.

These are the changes that I see in the new proposal and these are the immediate questions that they raise. In case these issues are eliminated, I would have nothing against the proposal - this is a question of providing the best and simplest to use functionality - not "taking it personally".

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Mar 5, 2024

In résponse to your questions:

  1. Why should people who want to have distinct items within a rank be prevented from specifying this on a function call?

It may well be my limited understanding, but I haven't seen use cases that explain to me why this functionality is needed. It wasn't clearly identified in the original issue #150, and seems to have crept in later through a kind of "feature creep". Your proposal has two examples that use this functionality, but they only provide input and output, no motivating statement of requirements. In the original exposition of the issue, you pointed to functions available in MS SQL and in particular to the DENSE_RANK() function. As far as I can see the capability delivered in my version of fn:ranks is functionally equivalent to what DENSE_RANK() does. But I have no experience with MS SQL, and the web page you pointed to is not well written, so I may well have missed something.

  1. Why complicate the reader's understanding of this function with...:

My belief is that the similarity of the function to fn:sort will make it easier for users to grasp how to use the function, rather than making it more difficult. There is less to learn because their existing knowledge and experience is transferable.

In addition, we discovered from user experience that the 3.1 version of fn:sort lacked features that users needed; I don't see any obvious reason why they should be needed in fn:sort and not in fn:ranks. I see that the MS SQL DENSE_RANK() function does indeed allow multiple sort keys, with the option for any of them to be descending.

I also see (on some closer reading) that DENSE_RANK() allows separate keys for partitioning, and for sorting within a partition. As far as I can see, such use cases can be satisified by using fn:partition and fn:sort in combination. But if you think there is a requirement that cannot be satisfied this way, then please share it, so we can work on it.

@dnovatchev
Copy link
Contributor

In résponse to your questions:

  1. Why should people who want to have distinct items within a rank be prevented from specifying this on a function call?

It may well be my limited understanding, but I haven't seen use cases that explain to me why this functionality is needed. It wasn't clearly identified in the original issue #150, and seems to have crept in later through a kind of "feature creep". Your proposal has two examples that use this functionality, but they only provide input and output, no motivating statement of requirements. In the original exposition of the issue, you pointed to functions available in MS SQL and in particular to the DENSE_RANK() function. As far as I can see the capability delivered in my version of fn:ranks is functionally equivalent to what DENSE_RANK() does. But I have no experience with MS SQL, and the web page you pointed to is not well written, so I may well have missed something.

In SQL everything is a set. Thus every record (row) returned from a query is distinct from all others, so there is no need for further disambiguation. On the other side we still don't have the set datatype in XPath and this is why we still need to rely on functions like fn:distinct-values and kill many trees by having to repeat in the rules of most of our functions that "the order of the results is implementation-defined" - which is totally eliminated when we have and use the set type.

  1. Why complicate the reader's understanding of this function with...:

My belief is that the similarity of the function to fn:sort will make it easier for users to grasp how to use the function, rather than making it more difficult. There is less to learn because their existing knowledge and experience is transferable.

I quite disagree here. If the sort function is rather incomprehensible and difficult to grasp, and we base the ranks function on it, then the ranks function would be equally incomprehensible and difficult to grasp.

In addition, we discovered from user experience that the 3.1 version of fn:sort lacked features that users needed; I don't see any obvious reason why they should be needed in fn:sort and not in fn:ranks. I see that the MS SQL DENSE_RANK() function does indeed allow multiple sort keys, with the option for any of them to be descending.

The proposed function ranks is not a blind copy of any of the MS SQL ranking functions. It is rather an intentional attempt to provide the maximum simple and easiest to use such function to the user.

I also see (on some closer reading) that DENSE_RANK() allows separate keys for partitioning, and for sorting within a partition. As far as I can see, such use cases can be satisified by using fn:partition and fn:sort in combination. But if you think there is a requirement that cannot be satisfied this way, then please share it, so we can work on it.

Not a requirement that cannot be satisfied - just that all this becomes too-complicated for a sane, everyday developer.

The main idea in fn:ranks is that in its simplest form it will (in the most frequent case) be called specifying just two arguments: the $input and a single $key function. If a user wants to do something more sophisticated, they can always specify this within the key function.

@ChristianGruen ChristianGruen added the Blocked PR is blocked (has merge conflicts, doesn't format, etc.) label May 21, 2024
@ChristianGruen ChristianGruen changed the title 150bis - revised proposal for fn:ranks 150bis revised proposal for fn:ranks Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked PR is blocked (has merge conflicts, doesn't format, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants