Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Immediate start of federates with STP offset under decentralized coordination & fix target code STP_offset #2368

Merged
merged 12 commits into from
Jul 23, 2024

Conversation

Depetrol
Copy link
Collaborator

@Depetrol Depetrol commented Jul 15, 2024

In decentralized coordination, the behavior used to be to wait for STP_offset before starting the worker threads. With this change, the federates start immediately without waiting for STP_offset. A test has been added for this change.

When the user sets STP_offset to FOREVER, it enables dataflow-like behavior, where a federate will wait until all network input ports have received a message with tag g or greater before advancing its tag to g.

This PR is linked to this reactor-c PR.

This PR also includes a fix to lfc federated target that allows target language expressions such as {= FOREVER =} as STP_offset. Without this fix, lfc would throw an error when STP_offset is any {= ... =} expression:

lfc: fatal error: Error running generator
java.lang.NullPointerException: Cannot read field "unit" because "time" is null
        at org.lflang.generator.c.CTypes.getTargetTimeExpr(CTypes.java:69)

@Depetrol Depetrol requested a review from edwardalee July 15, 2024 21:07
@Depetrol Depetrol added enhancement Enhancement of existing feature bugfix labels Jul 15, 2024
cmnrd
cmnrd previously requested changes Jul 16, 2024
Copy link
Collaborator

@cmnrd cmnrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change only works for the C target, and it relies on inspection of target code. The LF compiler does not understand C and interpreting target code should be left to the target compiler.

time is a type in LF, and we need to make valid time literals part of the language. There are two ways to achieve this:

  • Make all target code expressions valid time values and leave the type checking to the target compiler. Coincidently, Ability to use of target code expressions for time values in C++ #2369 implemented this recently for the C++ target. We could also enable this for C, but it would come at the cost of loosing static analyzability.
  • Introduce a time literal in the LF syntax that represents the C value FOREVER. Since this would be a change in the language, I think it should go through the RFC process. It would also need to be implemented for all targets.

Could you explain what your PR is trying to achieve? Dataflow like behavior is quite vague.

@Depetrol
Copy link
Collaborator Author

Without this change, setting STP_offset = {=FOREVER=} is valid syntax accepted by the compiler, but would cause lfc to crash, throwing a not user friendly error message: java.lang.NullPointerException: Cannot read field "unit" because "time" is null.
I think fundamentally resolving this issue would require us to modify LF syntax and introduce a time literal without the {==} braces so that we set STP_offset = FOREVER.
The current change inspects target code, but it inspects for a special symbol in LF syntax. It works for Python target as shown in the test file, and I think this should work for other targets as well because the change is made in ASTUtils. This could be a temporary fix that would later be removed after the RFC process is complete.

Regarding the dataflow-like behavior, in some cases the user wishes to use LF as dataflow reactions, as shown in the test file Dataflow.lf. Specifically, the logical time stays the same while microstep increases, and the reaction acts like a synchronous dataflow application. However to keep the logical time unadvanced, STP_offset needs to be set to FOREVER. This could be useful when the user is running a loop with logical delay of 0 in decentralized coordination.

@cmnrd
Copy link
Collaborator

cmnrd commented Jul 17, 2024

Without this change, setting STP_offset = {=FOREVER=} is valid syntax accepted by the compiler, but would cause lfc to crash, throwing a not user friendly error message: java.lang.NullPointerException: Cannot read field "unit" because "time" is null.

This is a bug in the LF compiler (more precisely, the Python code generator). STP_offset is given as a target code expression, not as a time literal. So the code generator should not invoke getLiteralTimeValue as the given expression is not a valid time value (also see the comment above the function). A simple fix would be to change the Python generator. Instead of calling getLiteralTimeValue on a target code expression, the generator could just insert this expression verbatim, like we do with all target code expressions. This is robust, as then the Python interpreter evaluates the expression and we don't make assumptions about its meaning.

The current change inspects target code, but it inspects for a special symbol in LF syntax.

FOREVER is not LF syntax. That is why it needs to be escaped using {=...=}. It is a value defined in the C runtime and in the Python wrapper. It has no special meaning to LF, and it is not defined in other runtimes.

The current change inspects target code, but it inspects for a special symbol in LF syntax.

This is brittle as the LF compiler is neither a C compiler nor a Python interpreter. If a user places FOO=FOREVER in the preamble, wouldn't it be reasonable to expect that we can use {= FOO =} as the STP offset?

It is also brittle, because now the LF compiler defines what FOREVER means when used in a target code expression. What if we adjust the value in the runtimes? Then the LF compiler would magically reinterpret this value.

This could be a temporary fix that would later be removed after the RFC process is complete.

We have many such "temporary fixes" in the code base, which accumulate to a significant amount of technical debt. To avoid more accumulation of technical debt, we made a commitment to evaluate PRs based on relevance and soundness (see CONTRIBUTING) and I do not consider the proposed solution to be sound for the reasons stated.

Regarding the dataflow-like behavior, in some cases the user wishes to use LF as dataflow reactions, as shown in the test file Dataflow.lf. Specifically, the logical time stays the same while microstep increases, and the reaction acts like a synchronous dataflow application. However to keep the logical time unadvanced, STP_offset needs to be set to FOREVER. This could be useful when the user is running a loop with logical delay of 0 in decentralized coordination.

I do not understand how setting the STP offset to FOREVER supports this use case. Doesn't an STP offset of FOREVER literally mean that the federate has to wait forever until it can be sure that there are no other earlier events? Also, I don't understand how the STP_offset relates to the advancement of logical time in your program. In your program, you only use microstep delays. So also with any other value for STP_offset your program should never advance beyond (t_0, N), where N is the maximum microstep and t_0 is the timestamp of the start tag.

Also note that if you use microsteps in this way, you have to be careful, as you will eventually run out of microstep values.

@Depetrol
Copy link
Collaborator Author

Depetrol commented Jul 17, 2024

A simple fix would be to change the Python generator. Instead of calling getLiteralTimeValue on a target code expression, the generator could just insert this expression verbatim, like we do with all target code expressions. This is robust, as then the Python interpreter evaluates the expression and we don't make assumptions about its meaning.

Thanks for the comment -- I implemented your proposed fix by inserting the expression verbatim if the expression is a code implementation. This implementation indeed fixes the bug for STP_offset in a more robust way. However this implementation will not work for the STP(STAA) value.

So also with any other value for STP_offset your program should never advance beyond (t_0, N), where N is the maximum microstep and t_0 is the timestamp of the start tag.

I've updated the test. The previous test was setting STP_offset, but the one that should be set is STP. The intention is to set STAA(STP) to FOREVER (currently set to 1000s because FOREVER is not supported) so that each reactor will wait until all messages have arrived. Otherwise STP violations would occur. However in this case, simply inserting the expression verbatim will not work, because the value of STAA(STP) is used in function findMaxSTP in compile time. To support FOREVER value in STAA(STP), the only option is to add FOREVER into Lingua Franca Syntax, or to do special handling for the {=FOREVER=} keyword.

Also note that if you use microsteps in this way, you have to be careful, as you will eventually run out of microstep values.

Microstep overflow is a concern in this dataflow fashion. If the application runs at 120 loops per second, a 32-bit unsigned integer would overflow in ~400 days. This could be resolved with changing the microstep to a 64-bit unsigned integer.

@Depetrol
Copy link
Collaborator Author

I've created a GitHub issue #2374 for parsing STP target code. This is a bigger change and perhaps should be addressed in another PR. I kept inserting the expression verbatim for STP_offset in this PR because it seems like a valid fix.

@Depetrol Depetrol requested a review from cmnrd July 19, 2024 17:57
Copy link
Collaborator

@cmnrd cmnrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adjusting the PR. I will dismiss my request for changes, but I will leave approving to Edward as I cannot say I understand the meaning of setting the STP offset to forever. For me, this indicates that federates should wait forever before processing an event. If the idea is to avoid waiting, why not set the offset to 0? Either way, this should be documented appropriately and explained to users.

@cmnrd cmnrd dismissed their stale review July 23, 2024 09:50

Concerns have been addressed.

@Depetrol
Copy link
Collaborator Author

I think a more accurate description of this PR that reflects code change is immediate start of federates with STP offset under decentralized coordination & a fix that allows target code as STP_offset. Setting the STP to a large time is a custom user behavior that could utilize this change.

@Depetrol Depetrol changed the title Allow dataflow-like behavior when setting STP_offset to FOREVER Immediate start of federates with STP offset under decentralized coordination & fix target code STP_offset Jul 23, 2024
Copy link
Collaborator

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited the description to clarify why it is useful to set the STP_offset to {= FOREVER =}.

@edwardalee edwardalee added this pull request to the merge queue Jul 23, 2024
Merged via the queue into lf-lang:master with commit ee141f8 Jul 23, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix enhancement Enhancement of existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants