Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

982 Rewrite of scan-left and scan-right #1296

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

michaelhkay
Copy link
Contributor

Fix #982

  1. The "equivalent expression" is replaced with one that is much shorter and hopefully easier to understand, though hopelessly inefficient as an actual implementation.

  2. The result no longer includes the zero value. This seems simpler, and is consistent with other expositions I have read, e.g. of the Scala functions.

  3. The signature of scan-left and scan-right is now identical to fold-left and fold-right, which apart from having the virtue of consistency, makes it much easier to specify one in terms of the other. The change is that the callback function now allows a position argument.

@michaelhkay michaelhkay requested a review from dnovatchev June 23, 2024 22:57
@michaelhkay michaelhkay added Enhancement A change or improvement to an existing feature Tests Needed Tests need to be written or merged XQFO An issue related to Functions and Operators labels Jun 23, 2024
@dnovatchev
Copy link
Contributor

dnovatchev commented Jun 24, 2024

First of all, thank you for finding the minor typo in the example. I have submitted a pull request (1297) that corrects this.

As explained in the "xpath-ng" channel of slack (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719185521735999?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the results in the spec are correct if the mistyped expression:

  scan-right(1 to 5, 0, op('+'))

is corrected to what has really been intended:

  scan-right(1 to 10, 0, op('+'))

Again, as explained in the same thread (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719188848301309?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the correct result is provided:

[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], [ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]

Please, read the explanation there, and I would be glad to answer any additional questions.

As for whether or not the initial accumulator value should be provided, this is being done in accordance with how the function is defined in Haskell, and the initial accumulator value is definitely needed to be provided in case the input is the empty sequence.

As the pull request #1297 fixes the typo and the questions raised have been answered, can we close this pull request (#1296) now?

@@ -31345,7 +31345,7 @@ path with an explicit <code>file:</code> scheme.</p>
<fos:proto name="scan-left" return-type="array(*)*">
<fos:arg name="input" type="item()*" usage ="navigation"/>
<fos:arg name="zero" type="item()*"/>
<fos:arg name="action" type="fn(item()*, item()) as item()*" usage="inspection"/>
<fos:arg name="action" type="fn(item()*, item(), xs:integer) as item()*" usage="inspection"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A position argument is not needed and is artificial burden on the user's cognitive, learning and understanding process.

@@ -31354,11 +31354,16 @@ path with an explicit <code>file:</code> scheme.</p>
<fos:property>focus-independent</fos:property>
</fos:properties>
<fos:summary>
<p>Produces the complete (ordered) sequence of all partial results from every new value
the accumulator is assigned to during the evaluation of fn:fold-left.</p>
<p>Produces the complete (ordered) sequence of all intermediate
Copy link
Contributor

@dnovatchev dnovatchev Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely:

    <p>Produces the complete (ordered) sequence of all intermediate 
    results of an evaluation of fn:fold-left immediately after an item 
    in the sequence has been processed.</p>

<p>The function is equivalent to the following implementation in XPath (return clause added in comments for completeness):</p>
<p>The result of the function is the value of the expression:</p>
<eg>(1 to count($input)) !
array{ slice($input, end := .) => fold-left($zero, $action) }</eg>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't obvious at all if this expression is equivalent to the current XPath 3.1 executable code.
What are any pressing reasons to replace the XPath 3.1 - executable code with this one?

given in the rules above is provided purely for formal specification
purposes.</p>
<p>Each intermediate result is placed in a separate array. The number of arrays
in the result is the same as the number of items in <code>$input</code>.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanl function.

This is necessary for the case when the input is the empty sequence. If nothing is produced, then the function fails to provide the steps of processing - the decision to produce and return as final result the initial-accumulator-value.

</fos:notes>
<fos:examples>
<fos:example>
<fos:test>
<fos:expression><eg>scan-left(1 to 5, 0, op('+'))</eg></fos:expression>
<fos:result>[ 0 ], [ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result>
<fos:result>[ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out above, we do need the initial-accumulator-value included.

<p>Produces the complete (ordered) sequence of all partial results from every new value
the accumulator is assigned to during the evaluation of fn:fold-right.</p>
<p>Produces the complete (ordered) sequence of all intermediate
results of an evaluation of <code>fn:fold-right</code>.</p>
Copy link
Contributor

@dnovatchev dnovatchev Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely:

    <p>Produces the complete (ordered) sequence of all intermediate 
    results of an evaluation of `fn:fold-right`  immediately after an item 
    in the sequence has been processed. The intermediate result of processing the
     k-th item of the input sequence is immediately followed by 
     the intermediate result of processing the k+1th item in the input sequence</p>

<fos:examples>
<fos:example>
<fos:test>
<fos:expression><eg>scan-right(1 to 5, 0, op('+'))</eg></fos:expression>
<fos:result><eg>[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ],
[ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]</eg></fos:result>
<fos:result><eg>[ 5 ], [ 9 ], [ 12 ], [ 14 ], [ 15 ]</eg></fos:result>
Copy link
Contributor

@dnovatchev dnovatchev Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is wrong.

The correct result (also produced by Haskell) is:

[15], [14], [12], [9], [5], [0]

The original example may be better in showing the details of how scan-right works:

scan-right(1 to 5, 0, op('+'))

produces:

[55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], [ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]


<eg>reverse(1 to count($input)) !
array{ slice($input, start := .) => fold-right($zero, $action) }</eg>
<!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed below, this produces the wrong result

@@ -31381,36 +31386,34 @@ let $scan-left := fn(
$scan-left-inner($input, $zero, $action, $scan-left-inner)
}
(: return $scan-left(1 to 10, 0, op('+')) :)
]]></eg>
]]></eg>-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are any pressing reasons to comment out the XPath 3.1 - executable code?


<eg>reverse(1 to count($input)) !
array{ slice($input, start := .) => fold-right($zero, $action) }</eg>
<!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't obvious at all if this expression is equivalent to the current XPath 3.1 executable code.
What are any pressing reasons to replace the XPath 3.1 - executable code with this one?

@@ -31477,37 +31484,34 @@ let $scan-right := function(
$scan-right-inner($input, $zero, $f, $scan-right-inner)
}
(: return $scan-right(1 to 10, 0, op('+')) :)
]]></eg>
]]></eg> -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are any pressing reasons to replace the XPath 3.1 - executable code with this one?

the first argument is any item in the sequence <code>$input</code>, and the second is either
the value of <code>$zero</code> or the result of a previous application of
<code>$action</code>.</p>
<p>See <code>fn:fold-left</code>: errors are raised in the same situations.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean:

See fn:fold-right: errors are raised in the same situations.

purposes.</p>
<p>Each intermediate result is placed in a separate array. The number of arrays
in the result is the same as the number of items in <code>$input</code>.</p>
<p>The fact that the function has the same signature as <code>fn:fold-right</code>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanr function.

This is necessary for the case when the input is the empty sequence. If nothing is produced, then the function fails to provide the steps of processing - the decision to produce and return as final result the initial-accumulator-value.

<p>The fact that the function has the same signature as <code>fn:fold-right</code>
means that this function can conveniently be used to study the behavior of
an call on <code>fn:fold-right</code> with the same arguments, perhaps for
diagnostic purposes.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more common use-case is for producing the running totals of a continuous computation

</olist>
</fos:notes>
<p>A practical implementation might be expected to evaluate the result
incrementally in a single right-to-left pass of the input; the equivalent expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a single right-to-left-pass of the input" is not exactly precise and could be misleading.

There is initially a "left-to-right" pass in which the items are fixed and waiting for the remaining computation on the right to be performed.

<fos:expression><eg>scan-right(1 to 3, 0, op('-'))</eg></fos:expression>
<fos:result>[ 2 ], [ -1 ], [ 3 ], [ 0 ]</fos:result>
<fos:expression><eg>scan-right(1 to 5, 0, op('-'))</eg></fos:expression>
<fos:result>[ 5 ], [ -1 ], [ 4 ], [ -2 ], [ 3 ]</fos:result>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. The correct result is:

[3], [-2], [4], [-1], [5], [0]

@michaelhkay
Copy link
Contributor Author

Thanks for review. I'll pick up your corrections to the examples.

I think that leaves three substantive points.

Firstly, the question of whether to include the zero value in the result. It does seem to be a convention that the zero value is included, though I have to say I can't see why. But I'll back out that change and include it since it's conventional.

Secondly the position argument. If it were just this function I wouldn't have a strong argument to make for it. But consistency is absolutely essential. If we have this argument on the fold functions, then we must have it on the scan functions. It's completely unacceptable to make them different. We made a decision to include a position argument uniformly on all the item-by-item callbacks and unless we want to reverse that decision, we should respect it.

Thirdly the style of specification. I agree with you that it's hard to verify that my formulation is equivalent to yours. But the reason that it's hard to verify is that your formulation is very hard to reason about. The way I've written it (at least for scan-left) it's very clear that the formal exposition corresponds directly to the informal explanation: In fact, we could express it clearly in narrative prose as "the function returns a sequence of arrays in which the value of the Nth array is the result of applying fold-left, with the same $zero and $action arguments, to the subsequence of $input starting at the first item and ending at the Nth item". By contrast, working out what the function actually does from the current 20 lines of recursive code essentially involves executing the algorithm in your head.

The recursive exposition would be a lot clearer if we wrote it in XQuery, as we do with other functions:

declare function fn:scan-left($input as item(*), $zero as item(*), $action as fn(....)) {
   [$zero], 
   if (exists($input)) {
       fn:scan-left(tail($input), $action($zero, head($input), $action))
   }
}

Perhaps we should go with that one. (Though adding the position argument would require reintroduction of a helper function, as is done with fold-left.)

Finally, I notice that I introduced a change to the form of the output. For the third example, your code (ignoring the stray ']' at the end) returns

[ () ], [ 2 ], [ (2, 4) ], [ (2, 4, 6) ]

whereas mine returns

[], [ 2 ], [2, 4], [2, 4, 6]

Either would work, but I feel that the second result is more usable (for example, it can be rendered as JSON). This can be achieved simply by changing [$zero] to array{$zero}.

@michaelhkay
Copy link
Contributor Author

I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication.

@dnovatchev
Copy link
Contributor

Thanks for review. I'll pick up your corrections to the examples.

I think that leaves three substantive points.

Firstly, the question of whether to include the zero value in the result. It does seem to be a convention that the zero value is included, though I have to say I can't see why. But I'll back out that change and include it since it's conventional.
👍 👍 👍

Secondly the position argument. If it were just this function I wouldn't have a strong argument to make for it. But consistency is absolutely essential. If we have this argument on the fold functions, then we must have it on the scan functions. It's completely unacceptable to make them different. We made a decision to include a position argument uniformly on all the item-by-item callbacks and unless we want to reverse that decision, we should respect it.

I greatly support the suggestion in your next comment to drop this position-argument from both folds and scans!

💯 💯 💯

Thirdly the style of specification. I agree with you that it's hard to verify that my formulation is equivalent to yours. But the reason that it's hard to verify is that your formulation is very hard to reason about. The way I've written it (at least for scan-left) it's very clear that the formal exposition corresponds directly to the informal explanation: In fact, we could express it clearly in narrative prose as "the function returns a sequence of arrays in which the value of the Nth array is the result of applying fold-left, with the same $zero and $action arguments, to the subsequence of $input starting at the first item and ending at the Nth item". By contrast, working out what the function actually does from the current 20 lines of recursive code essentially involves executing the algorithm in your head.

The recursive exposition would be a lot clearer if we wrote it in XQuery, as we do with other functions:

declare function fn:scan-left($input as item(*), $zero as item(*), $action as fn(....)) {
   [$zero], 
   if (exists($input)) {
       fn:scan-left(tail($input), $action($zero, head($input), $action))
   }
}

Perhaps we should go with that one.

OK, why not provide both? The pure XPath code is intended for people who need to use the functions right now as we speak - not willing to wait for two more years until they get these functions.

Finally, I notice that I introduced a change to the form of the output. For the third example, your code (ignoring the stray ']' at the end) returns

[ () ], [ 2 ], [ (2, 4) ], [ (2, 4, 6) ]

whereas mine returns

[], [ 2 ], [2, 4], [2, 4, 6]

Either would work, but I feel that the second result is more usable (for example, it can be rendered as JSON). This can be achieved simply by changing [$zero] to array{$zero}.

We could go either way. My understanding is that the on each step we get a single result of type item()* - and this is a single sequence that we put in an array, in order to model "a sequence of sequences".

@dnovatchev
Copy link
Contributor

I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication.

A great decision, which I absolutely support.

This allows me to skip writing my own proposal for removing the position-argument from the folds!

💯 💯 💯

@michaelhkay michaelhkay force-pushed the 982-scan-left-right branch from 1209589 to 2b68b96 Compare June 25, 2024 20:17
(: return $scan-left(1 to 10, 0, op('+')) :)
]]></eg>
) as item()* {
array{$zero},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to construct the array as:

[$zero]

This is more understandable, because we treat the $action function as producing a single result - even when this result is a sequence.

@michaelhkay michaelhkay removed the Blocked PR is blocked (has merge conflicts, doesn't format, etc.) label Jun 27, 2024
@michaelhkay
Copy link
Contributor Author

In light of the fact that we have discovered a use-case for the positional fold-left within our own code, and in the light of Christian's comments, I have reinstated the 3-argument callback for the fold and scan functions. I found a new way to formulate the semantics that I hope is conceptually clearer and less error-prone, by defining the arity-3 version in each case in terms of the arity-2 version. Certainly for fold-right and scan-right, I think it makes it much clearer how the position argument works.

@dnovatchev
Copy link
Contributor

dnovatchev commented Jun 27, 2024

The fact that it required so much effort and fantasy just to construct a single use-case, only further shows that adding a position-argument to folds and scans is absolutely unnecessary and, with good will, can be avoided.

Let's adhere to the KISS principle in our design work and try to learn from other great sources, such as Microsoft.

In case someone so badly needs the position-argument, this could perfectly be added as a separate, new function, keeping clean the original ideas behind folds and scans.

As for the "uniformity" ...

"Where all think alike, no one thinks very much."

Walter Lippman

@ChristianGruen
Copy link
Contributor

I feared this would cause new controversy.

  • No one is forced to use the index argument, it’s completely optional.
  • The index argument is also available in JavaScript.
  • Kotlin, which is loved for its conciseness and consistency, provides a dedicated reduceIndexed function for that purpose. Adding an extra function would obviously be contrary to our approach to add the position argument to existing HOF functions.

@dnovatchev
Copy link
Contributor

I feared this would cause new controversy.

* No one is forced to use the index argument, it’s completely optional.

* The index argument is also available in JavaScript.

* Kotlin, which is loved for its conciseness and consistency, provides a dedicated `reduceIndexed` function for that purpose. Adding an extra function would obviously be contrary to our approach to add the position argument to existing HOF functions.

I am not replying, because I have already said everything on this subject, and all the above arguments do not hold. I wouldn't be discussing the X-languages if I was strict Javascript fan.

And I will hate the waste of time, confusion and possible errors due to this, having to read longer than necessary documentation describing an argument that I would never use and that I should never use.

The ridiculous attempts to find use-cases for having this argument in folds are good evidence, we don't need to waste our time on arguing further.

@dnovatchev
Copy link
Contributor

I feared this would cause new controversy.

* No one is forced to use the index argument, it’s completely optional.

OK, following this logic let us then add a dozen more meaningless and unnecessary arguments - they will be completely optional and no one would be forced to use them.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Jul 11, 2024

Here’s a (simplified) example for using the positional argument in folds that I just come across today. The code creates intersections of multiple ID sequences:

let $id-arrays := [
  (1, 3, 5, 10, 15, 21),
  (4, 7, 10, 15),
  (8, 10, 15, 18)
]
return array:fold-left($id-arrays, (), fn($result, $ids, $pos) {
  if($pos = 1) then $ids else $ids[. = $result]
})

Of course, it will only have a chance to convince those who are not immune to open discussions in principle.

@dnovatchev
Copy link
Contributor

dnovatchev commented Jul 11, 2024

Of course, it will only have a chance to convince those who are not immune to open discussions in principle.

Alas, this is another contrived example.

Here is a proper and do note: simpler, fold solution without any positional argument:

let $id-arrays := [
  (1, 3, 5, 10, 15, 21),
  (4, 7, 10, 15),
  (8, 10, 15, 18)
]
return
  array:fold-left($id-arrays, $id-arrays[1], fn($result, $ids) {
    $ids[. = $result]
  })

@ChristianGruen So, who is immune or not to open discussions?

@ChristianGruen
Copy link
Contributor

Here is a proper and do note: simpler, fold solution without any positional argument:

In the original use case, it’s not that easy as more operations are performed on the input sequence in the HOF body before the final intersection is created. Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

@ChristianGruen So, who is immune or not to open discussions?

In general, people who have already decided what’s right or wrong, no matter what others will say.

@dnovatchev
Copy link
Contributor

dnovatchev commented Jul 11, 2024

Here is a proper and do note: simpler, fold solution without any positional argument:

In the original use case, it’s not that easy as more operations are performed on the input sequence in the HOF body before the final intersection is created.

There was no "original use case" specified - just an artificial example that uses unnecessarily a positional argument - just for the sake to show that having the positional argument is useful - when in fact it is not at all.

Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

Hmm... ??? This is not "my solution" . This is exactly the solution that was originally provided by @ChristianGruen , having been optimized by just throwing out the unneeded 3-argument action-function and replacing it by a simpler 2-argument action function that has 2 times less operations.

As for:

Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

This statement is completely wrong. We can have

fold-left(ANY_EXPR_PRODUCING_SEQUENCE, ...)

As for performing "more operations", the 2-argument action function performs 2 times less operations as the whole "or" has been eliminated - it indeed was not needed and superficially added.

@ChristianGruen So, who is immune or not to open discussions?

In general, people who have already decided what’s right or wrong, no matter what others will say.

It is not at all matter who says what. It is a matter of facts and evidence. The fact is that for years there has not been even a single example specified for the necessity of a 3-argument action function, and the fact that so many wise people could not provide for a significantly long period any non-contrived example.

Thus everyone can make the conclusion just based on the facts.

@ChristianGruen
Copy link
Contributor

Thus everyone can make the conclusion just based on the facts.

Let’s stop it; we should let others speak. Everything else is waste of time and energy.

@dnovatchev
Copy link
Contributor

Thus everyone can make the conclusion just based on the facts.

Let’s stop it; we should let others speak. Everything else is waste of time and energy.

Absolutely agreed!

The facts are there - let everyone make their conclusion.

@benibela
Copy link

An index argument is extremely useful

I need it all the time

But too many variables make the code hard too read. And the implementation becomes slow, when it has to handle too many arguments. Especially with function coercion adding further type checks

XPath had the best solution with the position() function. You get the index if you need it and it if you do not need it, you do not have to deal with it

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Jul 15, 2024

I can't see why people are getting so emotional about this. It's a simple design trade-off, of the kind that software designers have to make all the time: do we design this function to handle only the use cases that we can identify, or do we make it consistent with other functions? There are arguments for both approaches and the decision is a subjective one.

In general I'm a strong believer in orthogonality as a language design principle: that means making different constructs work the same way, even if the effect is to provide functionality for which there is no known use case. The arguments in favour of that are that it's easier for users to remember how things work if everything works the same way, and it's often easier for implementors as well because they can reuse the same code and don't have to deal with so many special cases. In addition, it's often the case that when the language designers can't think of a use case, the users will soon find one; we don't have the benefit of having a complete requirements statement from our users.

There are exceptions where a non-orthogonal design decision is appropriate. For example, in general we want all numeric functions to operate on all numeric types, but we made a decision that trigonometric functions should only operate on double-precision floating point, not on single-precision or decimals. That decision is justified both (a) by the absence of use cases, and (b) by considerations of implementation complexity (libraries for trigonometric functions on decimals are not readily available).

As a group, we can't make decisions on a matter like this unless everyone starts by recognising that there are valid arguments for and against both options; we need to cut out the kind of comment that asserts that either option is obviously correct or obviously wrong. We should also (and this is much more difficult) avoid spending more time on it than the matter deserves. We could go either way without upsetting many users - we need a bit more of a sense of perspective.

@michaelhkay
Copy link
Contributor Author

XPath had the best solution with the position() function. You get the index if you need it and it if you do not need it, you do not have to deal with it

If only that were true! At least in XSLT, it's not possible to determine by static analysis whether position() and last() will be used on a particular sequence, so they have to be maintained "just in case". The XQuery solution of binding a position variable explicitly is much better from that point of view.

@michaelhkay
Copy link
Contributor Author

A nice simple use case for positional arguments on fold-left is to implement something like string-join() or intersperse() that uses separators:

declare function fn:intersperse($input as item()*, $separator as item()*) as item()* {
   fold-left($input, (), fn($accum, $next, $pos){$accum, if($pos gt 1) {$separator}, $next})
}

Of course, that doesn't prove that the positional fold is a must-have, but it perhaps illustrates how it might be useful.

@ChristianGruen
Copy link
Contributor

In principle, I see three types of use cases:

1. Special treatment of first input

Expressed with Michael’s example, and the one before for intersecting sequences. Code like…

let $input := 1 to 5
return fold-left(tail($input), head($input), fn($a, $b) { $ac '-', $b })

…can instead be written as:

fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p > 1], $b }),

The input must only be specified once here, which is e.g. helpful when chaining function calls.

2. Positional computations

The availability of the position can be helpful when it contributes to the result:

fold-left((10, 1, 12, 2, 13, 3), 0, fn($acc, $n, $p) {
  $acc + (if($p mod 2) then $n else -$n)
})

It also allows us to write irregular join patterns:

fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p mod 5 = 0], $b }),

3. Debugging

When developing code, or when writing logs, it can be helpful to output intermediate results along with the number of folds that have already been performed:

fold-left(1 to 5, 0, fn($a, $b, $p) { trace($a + $b, $p || '. ') })

As so often, real use cases are too complex to be represented with simple examples. I experienced that’s a general challenge when teaching folds to newcomers.

All examples could also be written with an additional argument in a recursive function. All folds can be written recursively in general.

@dnovatchev
Copy link
Contributor

A nice simple use case for positional arguments on fold-left is to implement something like string-join() or intersperse() that uses separators:

declare function fn:intersperse($input as item()*, $separator as item()*) as item()* {
   fold-left($input, (), fn($accum, $next, $pos){$accum, if($pos gt 1) {$separator}, $next})
}

Of course, that doesn't prove that the positional fold is a must-have, but it perhaps illustrates how it might be useful.

Of course, this is simpler written as a call to the classic left fold with a simpler, 2-arguments action function:

 fold-left(tail($input), $input[1], fn($accum, $next){$accum, $separator, $next})

And this call is noticeably shorter and simpler than the one that uses a 3-arguments action-function.

No conditions, no braces, no need to make it more complex.

So, how can we say that "it might be useful" to enable the user to produce more complex and difficult to understand solutions?

@dnovatchev
Copy link
Contributor

@ChristianGruen, Thank you for the provided classification.

Please, see my previous comment to the example by @michaelhkay that falls into group 1. To repeat it here:

The re-write using just the 2-argument action-function is noticeably shorter and simpler than the one that uses a 3-arguments action-function.

No conditions, no braces, no need to make it more complex.

So, how can we say that "it might be useful" to enable the user to produce more complex and difficult to understand solutions?

For the other 2 cases, we have a general and extremely useful solution provided by @michaelhkay ( thanks Mike! ). This is a 2-step solution. The first step produces a map with entries {"item": $input[$pos], "pos": $pos}.

In the 2nd step this map is passed as the $input argument of a function call to fold-left, that uses just a 2-argument action-function.

I strongly support this great way of specifying "positional computation" - it is more readable and understandable. It is less complex.

This is a very helpful discussion that has strongly convinced me even more that Microsoft was right not to add any overloads with a 3-argument action-function to methods of the Enumerable class that are folds or results of executing folds: Aggregate, Any, All, Min, Max, MaxBy, Average.

Was the specification of these 7 methods (without an action function that takes a position-argument) an accidental omission? Obviously not!

@ChristianGruen
Copy link
Contributor

This is a very helpful discussion that has strongly convinced me even more

I agree it’s helpful. We had the chance to read a lot about different personal preferences and convictions. Personally, I have learnt to appreciate the enhanced positional flexibility in the past weeks. This is my own practical experience; it’s obviously not up to me to judge about the perception of anyone else. I also know I have no chance to deliver arguments that are convincing enough to everyone reading this.

Similarly, we have no objective criteria to decide whether Microsoft or JavaScript/Kotlin have got it right. I rather think a majority vote may help us here.

What I think we all can do is gather more opinions and interview other users (ideally, without bias). At least for me, it has always been constructive to get to know which recently added features are embraced, rejected and ignored by our user community. The judgements didn’t necessarily reflect my own opinion. As I have already mentioned, the positional argument for fn:for-each has already been appreciated and utilized by someone who writes productive code with our implementation (and I was surprised to learn that fn:for-each is used at all, as I always use for or !).

@dnovatchev
Copy link
Contributor

This is a very helpful discussion that has strongly convinced me even more

I agree it’s helpful. We had the chance to read a lot about different personal preferences and convictions. Personally, I have learnt to appreciate the enhanced positional flexibility in the past weeks. This is my own practical experience; it’s obviously not up to me to judge about the perception of anyone else. I also know I have no chance to deliver arguments that are convincing enough to everyone reading this.

Similarly, we have no objective criteria to decide whether Microsoft or JavaScript/Kotlin have got it right. I rather think a majority vote may help us here.

What I think we all can do is gather more opinions and interview other users (ideally, without bias). At least for me, it has always been constructive to get to know which recently added features are embraced, rejected and ignored by our user community. The judgements didn’t necessarily reflect my own opinion. As I have already mentioned, the positional argument for fn:for-each has already been appreciated and utilized by someone who writes productive code with our implementation (and I was surprised to learn that fn:for-each is used at all, as I always use for or !).

This is a very helpful discussion that has strongly convinced me even more

I agree it’s helpful. We had the chance to read a lot about different personal preferences and convictions. Personally, I have learnt to appreciate the enhanced positional flexibility in the past weeks. This is my own practical experience; it’s obviously not up to me to judge about the perception of anyone else. I also know I have no chance to deliver arguments that are convincing enough to everyone reading this.

Similarly, we have no objective criteria to decide whether Microsoft or JavaScript/Kotlin have got it right. I rather think a majority vote may help us here.

What I think we all can do is gather more opinions and interview other users (ideally, without bias). At least for me, it has always been constructive to get to know which recently added features are embraced, rejected and ignored by our user community. The judgements didn’t necessarily reflect my own opinion. As I have already mentioned, the positional argument for fn:for-each has already been appreciated and utilized by someone who writes productive code with our implementation (and I was surprised to learn that fn:for-each is used at all, as I always use for or !).

Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function.

I would be happy if we have such a function in case this is a new, different function from the well-established folds.

Having such a separate function will give us a better visibility into its actual frequency of usage - thus we will have really objective data.

@ChristianGruen
Copy link
Contributor

Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function.

Currently, this is the consistent solution, as the other HOF functions have been enhanced with position arguments as well. In this matter, Microsoft and JavaScript have a commonality: They have both made a consistent design choice.

If we believe that the addition of position arguments should be discussed again, the most productive solution could be to discuss it in a separate issue.

@dnovatchev
Copy link
Contributor

Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function.

Currently, this is the consistent solution, as the other HOF functions have been enhanced with position arguments as well. In this matter, Microsoft and JavaScript have a commonality: They have both made a consistent design choice.

If we believe that the addition of position arguments should be discussed again, the most productive solution could be to discuss it in a separate issue.

Yes, as I demonstrated at our meeting this week, the folds definitions either produce totally wrong result or even compile-time errors. (Maybe because of the attempt to add too much complexity and new "functionality" ?)

And yes, this is a problem concerning all folds and therefore scans. Folds are the base for scans, thus we need this resolved first for folds. Discussing this in isolation just for scans is not too-meaningful.

@ChristianGruen
Copy link
Contributor

Yes, as I demonstrated at our meeting this week, the folds definitions either produce totally wrong result or even compile-time errors. (Maybe because of the attempt to add too much complexity and new "functionality" ?)

This was definitely an editorial lack of attention on my part. I remember I did not spend enough much time on it. Sorry for that.

And yes, this is a problem concerning all folds and therefore scans.

Not just folds indeed: It affects the full range of sequence functions (more than 10) that have function parameters. It’s been a while ago when we have included it to the spec after the proposal was accepted in the weekly meeting – but you could certainly create a new issue for that to discuss it anew.

@dnovatchev
Copy link
Contributor

dnovatchev commented Jul 18, 2024

And yes, this is a problem concerning all folds and therefore scans.

Not just folds indeed: It affects the full range of sequence functions (more than 10) that have function parameters. It’s been a while ago when we have included it to the spec after the proposal was accepted in the weekly meeting – but you could certainly create a new issue for that to discuss it anew.

Hasn't this discussion been going on for the last 2-3 weeks? Aren't we really in the midst of this discussion?

And yes, folds are some specific group of functions, more differentiated than the general group of all sequence functions.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Jul 18, 2024

Hasn't this discussion going on for the last 2-3 weeks? Aren't we really in the midst of this discussion?

If I got it right, the discussion in this PR is not positional arguments in general. I believe that moving it into a separate issue would be more productive.

Sorry for closing the PR in between (the Close button is too easy to reach).

@ChristianGruen ChristianGruen added the Blocked PR is blocked (has merge conflicts, doesn't format, etc.) label Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked PR is blocked (has merge conflicts, doesn't format, etc.) Enhancement A change or improvement to an existing feature Tests Needed Tests need to be written or merged XQFO An issue related to Functions and Operators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scan-left, scan-right: position argument, array functions
4 participants