-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
982 Rewrite of scan-left and scan-right #1296
base: master
Are you sure you want to change the base?
Conversation
First of all, thank you for finding the minor typo in the example. I have submitted a pull request (1297) that corrects this. As explained in the "xpath-ng" channel of slack (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719185521735999?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the results in the spec are correct if the mistyped expression:
is corrected to what has really been intended:
Again, as explained in the same thread (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719188848301309?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the correct result is provided:
Please, read the explanation there, and I would be glad to answer any additional questions. As for whether or not the initial accumulator value should be provided, this is being done in accordance with how the function is defined in Haskell, and the initial accumulator value is definitely needed to be provided in case the input is the empty sequence. As the pull request #1297 fixes the typo and the questions raised have been answered, can we close this pull request (#1296) now? |
@@ -31345,7 +31345,7 @@ path with an explicit <code>file:</code> scheme.</p> | |||
<fos:proto name="scan-left" return-type="array(*)*"> | |||
<fos:arg name="input" type="item()*" usage ="navigation"/> | |||
<fos:arg name="zero" type="item()*"/> | |||
<fos:arg name="action" type="fn(item()*, item()) as item()*" usage="inspection"/> | |||
<fos:arg name="action" type="fn(item()*, item(), xs:integer) as item()*" usage="inspection"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A position argument is not needed and is artificial burden on the user's cognitive, learning and understanding process.
@@ -31354,11 +31354,16 @@ path with an explicit <code>file:</code> scheme.</p> | |||
<fos:property>focus-independent</fos:property> | |||
</fos:properties> | |||
<fos:summary> | |||
<p>Produces the complete (ordered) sequence of all partial results from every new value | |||
the accumulator is assigned to during the evaluation of fn:fold-left.</p> | |||
<p>Produces the complete (ordered) sequence of all intermediate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More precisely:
<p>Produces the complete (ordered) sequence of all intermediate
results of an evaluation of fn:fold-left immediately after an item
in the sequence has been processed.</p>
<p>The function is equivalent to the following implementation in XPath (return clause added in comments for completeness):</p> | ||
<p>The result of the function is the value of the expression:</p> | ||
<eg>(1 to count($input)) ! | ||
array{ slice($input, end := .) => fold-left($zero, $action) }</eg> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't obvious at all if this expression is equivalent to the current XPath 3.1 executable code.
What are any pressing reasons to replace the XPath 3.1 - executable code with this one?
given in the rules above is provided purely for formal specification | ||
purposes.</p> | ||
<p>Each intermediate result is placed in a separate array. The number of arrays | ||
in the result is the same as the number of items in <code>$input</code>.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanl
function.
This is necessary for the case when the input is the empty sequence. If nothing is produced, then the function fails to provide the steps of processing - the decision to produce and return as final result the initial-accumulator-value.
</fos:notes> | ||
<fos:examples> | ||
<fos:example> | ||
<fos:test> | ||
<fos:expression><eg>scan-left(1 to 5, 0, op('+'))</eg></fos:expression> | ||
<fos:result>[ 0 ], [ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result> | ||
<fos:result>[ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed out above, we do need the initial-accumulator-value included.
<p>Produces the complete (ordered) sequence of all partial results from every new value | ||
the accumulator is assigned to during the evaluation of fn:fold-right.</p> | ||
<p>Produces the complete (ordered) sequence of all intermediate | ||
results of an evaluation of <code>fn:fold-right</code>.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More precisely:
<p>Produces the complete (ordered) sequence of all intermediate
results of an evaluation of `fn:fold-right` immediately after an item
in the sequence has been processed. The intermediate result of processing the
k-th item of the input sequence is immediately followed by
the intermediate result of processing the k+1th item in the input sequence</p>
<fos:examples> | ||
<fos:example> | ||
<fos:test> | ||
<fos:expression><eg>scan-right(1 to 5, 0, op('+'))</eg></fos:expression> | ||
<fos:result><eg>[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], | ||
[ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]</eg></fos:result> | ||
<fos:result><eg>[ 5 ], [ 9 ], [ 12 ], [ 14 ], [ 15 ]</eg></fos:result> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is wrong.
The correct result (also produced by Haskell) is:
[15], [14], [12], [9], [5], [0]
The original example may be better in showing the details of how scan-right works:
scan-right(1 to 5, 0, op('+'))
produces:
[55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], [ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]
|
||
<eg>reverse(1 to count($input)) ! | ||
array{ slice($input, start := .) => fold-right($zero, $action) }</eg> | ||
<!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed below, this produces the wrong result
@@ -31381,36 +31386,34 @@ let $scan-left := fn( | |||
$scan-left-inner($input, $zero, $action, $scan-left-inner) | |||
} | |||
(: return $scan-left(1 to 10, 0, op('+')) :) | |||
]]></eg> | |||
]]></eg>--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are any pressing reasons to comment out the XPath 3.1 - executable code?
|
||
<eg>reverse(1 to count($input)) ! | ||
array{ slice($input, start := .) => fold-right($zero, $action) }</eg> | ||
<!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't obvious at all if this expression is equivalent to the current XPath 3.1 executable code.
What are any pressing reasons to replace the XPath 3.1 - executable code with this one?
@@ -31477,37 +31484,34 @@ let $scan-right := function( | |||
$scan-right-inner($input, $zero, $f, $scan-right-inner) | |||
} | |||
(: return $scan-right(1 to 10, 0, op('+')) :) | |||
]]></eg> | |||
]]></eg> --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are any pressing reasons to replace the XPath 3.1 - executable code with this one?
the first argument is any item in the sequence <code>$input</code>, and the second is either | ||
the value of <code>$zero</code> or the result of a previous application of | ||
<code>$action</code>.</p> | ||
<p>See <code>fn:fold-left</code>: errors are raised in the same situations.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean:
See fn:fold-right
: errors are raised in the same situations.
purposes.</p> | ||
<p>Each intermediate result is placed in a separate array. The number of arrays | ||
in the result is the same as the number of items in <code>$input</code>.</p> | ||
<p>The fact that the function has the same signature as <code>fn:fold-right</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanr
function.
This is necessary for the case when the input is the empty sequence. If nothing is produced, then the function fails to provide the steps of processing - the decision to produce and return as final result the initial-accumulator-value.
<p>The fact that the function has the same signature as <code>fn:fold-right</code> | ||
means that this function can conveniently be used to study the behavior of | ||
an call on <code>fn:fold-right</code> with the same arguments, perhaps for | ||
diagnostic purposes.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more common use-case is for producing the running totals of a continuous computation
</olist> | ||
</fos:notes> | ||
<p>A practical implementation might be expected to evaluate the result | ||
incrementally in a single right-to-left pass of the input; the equivalent expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a single right-to-left-pass of the input" is not exactly precise and could be misleading.
There is initially a "left-to-right" pass in which the items are fixed and waiting for the remaining computation on the right to be performed.
<fos:expression><eg>scan-right(1 to 3, 0, op('-'))</eg></fos:expression> | ||
<fos:result>[ 2 ], [ -1 ], [ 3 ], [ 0 ]</fos:result> | ||
<fos:expression><eg>scan-right(1 to 5, 0, op('-'))</eg></fos:expression> | ||
<fos:result>[ 5 ], [ -1 ], [ 4 ], [ -2 ], [ 3 ]</fos:result> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. The correct result is:
[3], [-2], [4], [-1], [5], [0]
Thanks for review. I'll pick up your corrections to the examples. I think that leaves three substantive points. Firstly, the question of whether to include the zero value in the result. It does seem to be a convention that the zero value is included, though I have to say I can't see why. But I'll back out that change and include it since it's conventional. Secondly the position argument. If it were just this function I wouldn't have a strong argument to make for it. But consistency is absolutely essential. If we have this argument on the fold functions, then we must have it on the scan functions. It's completely unacceptable to make them different. We made a decision to include a position argument uniformly on all the item-by-item callbacks and unless we want to reverse that decision, we should respect it. Thirdly the style of specification. I agree with you that it's hard to verify that my formulation is equivalent to yours. But the reason that it's hard to verify is that your formulation is very hard to reason about. The way I've written it (at least for scan-left) it's very clear that the formal exposition corresponds directly to the informal explanation: In fact, we could express it clearly in narrative prose as "the function returns a sequence of arrays in which the value of the Nth array is the result of applying fold-left, with the same $zero and $action arguments, to the subsequence of $input starting at the first item and ending at the Nth item". By contrast, working out what the function actually does from the current 20 lines of recursive code essentially involves executing the algorithm in your head. The recursive exposition would be a lot clearer if we wrote it in XQuery, as we do with other functions:
Perhaps we should go with that one. (Though adding the position argument would require reintroduction of a helper function, as is done with fold-left.) Finally, I notice that I introduced a change to the form of the output. For the third example, your code (ignoring the stray ']' at the end) returns
whereas mine returns
Either would work, but I feel that the second result is more usable (for example, it can be rendered as JSON). This can be achieved simply by changing |
I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication. |
I greatly support the suggestion in your next comment to drop this position-argument from both folds and scans! 💯 💯 💯
OK, why not provide both? The pure XPath code is intended for people who need to use the functions right now as we speak - not willing to wait for two more years until they get these functions.
We could go either way. My understanding is that the on each step we get a single result of type |
A great decision, which I absolutely support. This allows me to skip writing my own proposal for removing the position-argument from the folds! 💯 💯 💯 |
1209589
to
2b68b96
Compare
(: return $scan-left(1 to 10, 0, op('+')) :) | ||
]]></eg> | ||
) as item()* { | ||
array{$zero}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to construct the array as:
[$zero]
This is more understandable, because we treat the $action function as producing a single result - even when this result is a sequence.
In light of the fact that we have discovered a use-case for the positional fold-left within our own code, and in the light of Christian's comments, I have reinstated the 3-argument callback for the fold and scan functions. I found a new way to formulate the semantics that I hope is conceptually clearer and less error-prone, by defining the arity-3 version in each case in terms of the arity-2 version. Certainly for fold-right and scan-right, I think it makes it much clearer how the position argument works. |
The fact that it required so much effort and fantasy just to construct a single use-case, only further shows that adding a position-argument to folds and scans is absolutely unnecessary and, with good will, can be avoided. Let's adhere to the KISS principle in our design work and try to learn from other great sources, such as Microsoft. In case someone so badly needs the position-argument, this could perfectly be added as a separate, new function, keeping clean the original ideas behind folds and scans. As for the "uniformity" ... "Where all think alike, no one thinks very much." |
I feared this would cause new controversy.
|
I am not replying, because I have already said everything on this subject, and all the above arguments do not hold. I wouldn't be discussing the X-languages if I was strict Javascript fan. And I will hate the waste of time, confusion and possible errors due to this, having to read longer than necessary documentation describing an argument that I would never use and that I should never use. The ridiculous attempts to find use-cases for having this argument in folds are good evidence, we don't need to waste our time on arguing further. |
OK, following this logic let us then add a dozen more meaningless and unnecessary arguments - they will be completely optional and no one would be forced to use them. |
Here’s a (simplified) example for using the positional argument in folds that I just come across today. The code creates intersections of multiple ID sequences: let $id-arrays := [
(1, 3, 5, 10, 15, 21),
(4, 7, 10, 15),
(8, 10, 15, 18)
]
return array:fold-left($id-arrays, (), fn($result, $ids, $pos) {
if($pos = 1) then $ids else $ids[. = $result]
}) Of course, it will only have a chance to convince those who are not immune to open discussions in principle. |
Alas, this is another contrived example. Here is a proper and do note: simpler, fold solution without any positional argument: let $id-arrays := [
(1, 3, 5, 10, 15, 21),
(4, 7, 10, 15),
(8, 10, 15, 18)
]
return
array:fold-left($id-arrays, $id-arrays[1], fn($result, $ids) {
$ids[. = $result]
}) @ChristianGruen So, who is immune or not to open discussions? |
In the original use case, it’s not that easy as more operations are performed on the input sequence in the HOF body before the final intersection is created. Next, your solution requires the input to be bound to a variable. You can’t do things like
In general, people who have already decided what’s right or wrong, no matter what others will say. |
There was no "original use case" specified - just an artificial example that uses unnecessarily a positional argument - just for the sake to show that having the positional argument is useful - when in fact it is not at all.
Hmm... ??? This is not "my solution" . This is exactly the solution that was originally provided by @ChristianGruen , having been optimized by just throwing out the unneeded 3-argument action-function and replacing it by a simpler 2-argument action function that has 2 times less operations. As for:
This statement is completely wrong. We can have
As for performing "more operations", the 2-argument action function performs 2 times less operations as the whole "or" has been eliminated - it indeed was not needed and superficially added.
It is not at all matter who says what. It is a matter of facts and evidence. The fact is that for years there has not been even a single example specified for the necessity of a 3-argument action function, and the fact that so many wise people could not provide for a significantly long period any non-contrived example. Thus everyone can make the conclusion just based on the facts. |
Let’s stop it; we should let others speak. Everything else is waste of time and energy. |
Absolutely agreed! The facts are there - let everyone make their conclusion. |
An index argument is extremely useful I need it all the time But too many variables make the code hard too read. And the implementation becomes slow, when it has to handle too many arguments. Especially with function coercion adding further type checks XPath had the best solution with the |
I can't see why people are getting so emotional about this. It's a simple design trade-off, of the kind that software designers have to make all the time: do we design this function to handle only the use cases that we can identify, or do we make it consistent with other functions? There are arguments for both approaches and the decision is a subjective one. In general I'm a strong believer in orthogonality as a language design principle: that means making different constructs work the same way, even if the effect is to provide functionality for which there is no known use case. The arguments in favour of that are that it's easier for users to remember how things work if everything works the same way, and it's often easier for implementors as well because they can reuse the same code and don't have to deal with so many special cases. In addition, it's often the case that when the language designers can't think of a use case, the users will soon find one; we don't have the benefit of having a complete requirements statement from our users. There are exceptions where a non-orthogonal design decision is appropriate. For example, in general we want all numeric functions to operate on all numeric types, but we made a decision that trigonometric functions should only operate on double-precision floating point, not on single-precision or decimals. That decision is justified both (a) by the absence of use cases, and (b) by considerations of implementation complexity (libraries for trigonometric functions on decimals are not readily available). As a group, we can't make decisions on a matter like this unless everyone starts by recognising that there are valid arguments for and against both options; we need to cut out the kind of comment that asserts that either option is obviously correct or obviously wrong. We should also (and this is much more difficult) avoid spending more time on it than the matter deserves. We could go either way without upsetting many users - we need a bit more of a sense of perspective. |
If only that were true! At least in XSLT, it's not possible to determine by static analysis whether |
A nice simple use case for positional arguments on fold-left is to implement something like string-join() or intersperse() that uses separators:
Of course, that doesn't prove that the positional fold is a must-have, but it perhaps illustrates how it might be useful. |
In principle, I see three types of use cases: 1. Special treatment of first inputExpressed with Michael’s example, and the one before for intersecting sequences. Code like… let $input := 1 to 5
return fold-left(tail($input), head($input), fn($a, $b) { $ac '-', $b }) …can instead be written as: fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p > 1], $b }), The input must only be specified once here, which is e.g. helpful when chaining function calls. 2. Positional computationsThe availability of the position can be helpful when it contributes to the result: fold-left((10, 1, 12, 2, 13, 3), 0, fn($acc, $n, $p) {
$acc + (if($p mod 2) then $n else -$n)
}) It also allows us to write irregular join patterns: fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p mod 5 = 0], $b }), 3. DebuggingWhen developing code, or when writing logs, it can be helpful to output intermediate results along with the number of folds that have already been performed: fold-left(1 to 5, 0, fn($a, $b, $p) { trace($a + $b, $p || '. ') }) As so often, real use cases are too complex to be represented with simple examples. I experienced that’s a general challenge when teaching folds to newcomers. All examples could also be written with an additional argument in a recursive function. All folds can be written recursively in general. |
Of course, this is simpler written as a call to the classic left fold with a simpler, 2-arguments action function: fold-left(tail($input), $input[1], fn($accum, $next){$accum, $separator, $next}) And this call is noticeably shorter and simpler than the one that uses a 3-arguments action-function. No conditions, no braces, no need to make it more complex. So, how can we say that "it might be useful" to enable the user to produce more complex and difficult to understand solutions? |
@ChristianGruen, Thank you for the provided classification. Please, see my previous comment to the example by @michaelhkay that falls into group 1. To repeat it here:
For the other 2 cases, we have a general and extremely useful solution provided by @michaelhkay ( thanks Mike! ). This is a 2-step solution. The first step produces a map with entries In the 2nd step this map is passed as the $input argument of a function call to fold-left, that uses just a 2-argument action-function. I strongly support this great way of specifying "positional computation" - it is more readable and understandable. It is less complex. This is a very helpful discussion that has strongly convinced me even more that Microsoft was right not to add any overloads with a 3-argument action-function to methods of the Enumerable class that are folds or results of executing folds: Aggregate, Any, All, Min, Max, MaxBy, Average. Was the specification of these 7 methods (without an action function that takes a position-argument) an accidental omission? Obviously not! |
I agree it’s helpful. We had the chance to read a lot about different personal preferences and convictions. Personally, I have learnt to appreciate the enhanced positional flexibility in the past weeks. This is my own practical experience; it’s obviously not up to me to judge about the perception of anyone else. I also know I have no chance to deliver arguments that are convincing enough to everyone reading this. Similarly, we have no objective criteria to decide whether Microsoft or JavaScript/Kotlin have got it right. I rather think a majority vote may help us here. What I think we all can do is gather more opinions and interview other users (ideally, without bias). At least for me, it has always been constructive to get to know which recently added features are embraced, rejected and ignored by our user community. The judgements didn’t necessarily reflect my own opinion. As I have already mentioned, the positional argument for |
Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function. I would be happy if we have such a function in case this is a new, different function from the well-established folds. Having such a separate function will give us a better visibility into its actual frequency of usage - thus we will have really objective data. |
Currently, this is the consistent solution, as the other HOF functions have been enhanced with position arguments as well. In this matter, Microsoft and JavaScript have a commonality: They have both made a consistent design choice. If we believe that the addition of position arguments should be discussed again, the most productive solution could be to discuss it in a separate issue. |
Yes, as I demonstrated at our meeting this week, the folds definitions either produce totally wrong result or even compile-time errors. (Maybe because of the attempt to add too much complexity and new "functionality" ?) And yes, this is a problem concerning all folds and therefore scans. Folds are the base for scans, thus we need this resolved first for folds. Discussing this in isolation just for scans is not too-meaningful. |
This was definitely an editorial lack of attention on my part. I remember I did not spend enough much time on it. Sorry for that.
Not just folds indeed: It affects the full range of sequence functions (more than 10) that have function parameters. It’s been a while ago when we have included it to the spec after the proposal was accepted in the weekly meeting – but you could certainly create a new issue for that to discuss it anew. |
Hasn't this discussion been going on for the last 2-3 weeks? Aren't we really in the midst of this discussion? And yes, folds are some specific group of functions, more differentiated than the general group of all sequence functions. |
If I got it right, the discussion in this PR is not positional arguments in general. I believe that moving it into a separate issue would be more productive. Sorry for closing the PR in between (the Close button is too easy to reach). |
Fix #982
The "equivalent expression" is replaced with one that is much shorter and hopefully easier to understand, though hopelessly inefficient as an actual implementation.
The result no longer includes the zero value. This seems simpler, and is consistent with other expositions I have read, e.g. of the Scala functions.
The signature of scan-left and scan-right is now identical to fold-left and fold-right, which apart from having the virtue of consistency, makes it much easier to specify one in terms of the other. The change is that the callback function now allows a position argument.