Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing and contradicting coverage of "result" flag and multiple end/final nodes in process graphs #547

Open
soxofaan opened this issue Oct 29, 2024 · 2 comments
Milestone

Comments

@soxofaan
Copy link
Member

soxofaan commented Oct 29, 2024

Some excerpts of the spec (current draft branch): L298 L326 L474

Technically, a process graph is defined to be a graph of connected processes with exactly one node returning the final result
...
.. multiple end nodes are possible

"exactly one node returning the final result" is in conflict with the statement that multiple end nodes are possible, as discussed here as well:

One of the nodes in a map of processes (the final one) MUST have the result flag set to true, all the other nodes can omit it as the default value is false.

It's unclear if the "One of the nodes" here means "exactly one" or "at least one". I guess the intention is "exactly one", but the current wording still allows the (wrong) interpretation that multiple nodes can have result=true.
Also, as discussed at Open-EO/openeo-processes#279 (comment), the presence of a result node is actually only necessary for "child" processes, so the spec could also specify "at most one".

Another thing to nitpick about in that sentence is "the final one":

  • seems to suggest there is a single "final" node, which is not true if you heave multiple "end" nodes in your graph
  • it suggests "final node" and "result node" are synonyms, but I don't think we should establish such an equivalence: you can have multiple "final" nodes, but only one "result" node. Even the spec itself suggest that:

    Please be aware that the result node (result set to true) is not necessarily the last node that is executed. The author of the process graph may choose to set a non-end node to the result node!

Having such a node is important as multiple end nodes are possible, but in most use cases it is important to exactly specify the return value to be used by other processes. Each child process graph must also specify a result node similar to the "main" process graph.

First, a nitpicking note: the first sentence here introduces "end node". So at this point, the spec is mixing usage of "result node", "final node" and "end node", in some places it vaguely suggest these are equivalent, while in other places it contradicts this.

Regardless, the message of these two sentences is a bit unclear to me. The first sentence seems to start talking about the top-level graph (where multiple end nodes are a valid use case), but then with the "used by other processes" it seems to switch over to child process graphs. But then actually the next sentence talks about "child process graphs" explaining the need for a result node based on the "main" process graph. While it's actually the other way around: child process graphs need a single result node, while the "main" process graph doesn't.

@m-mohr
Copy link
Member

m-mohr commented Nov 7, 2024

Thanks for writing this up in detail. I agree with most of this.

We shouldn't make it "at most" because that could be breaking for some (I think for all of the JS ecosystem at least), so we should keep the result node also on the top-level. Also, the result node at the top-level is only really irrelevant when submitted for execution. If stored as a UDP, it actually has a meaning as it's the return value of the UDP. So it must be "exactly one".

Other than that, happy to review a PR. Otherwise we'll keep it on the agenda for 1.3.0.

@m-mohr m-mohr added this to the 1.3.0 milestone Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants