Mutations #1290

lhstrh (Maintainer) · 2022-07-12T00:00:14Z

This discussion is meant as a depository of ideas discussed during meetings around the topic of mutations. Please feel welcome to edit.

Related discussions: #1191
Related issues: ...
Related PRs: ...

Top-level mutations in a federation

In a federation (i.e., a top-level federated reactor), each reactor instance in the federated reactor gets mapped to its own process to become a federate. Each such instance is instantiated only once; other instances that it communicates with reside in other processes and are interacted with through the network.

It would seem natural to take a similar approach (i.e., have a unique representation of each element) with other entities in the federated reactor, such as reactions, mutations, actions, state, etc. However, one such element might interact with more than one federate (let's call it a shared element), which raises the question: which federate should it be mapped to? We've come up with some options:

N case

- prohibit the existence of shared elements
- map shared elements and reactor instances that share them to the same federate

N+1 case

- create a separate federate to host shared elements

Instantaneous effects

WIP...

petervdonovan (Maintainer) · 2023-06-04T00:41:24Z

Here I begin to discuss the implementation of ideas that I have previously stated only vaguely regarding modal models, causality interfaces, low-level cross-reaction optimization, and the question of what information should appear in an ideal IR for LF.

This is not (yet) a proposal. This is intended to be a living document with the potential (but not the promise) to evolve into a proposal. Read statements in this document as being prefixed by "imagine if" or "don't you think it would be nice if."

User-visible objectives

Recursive instantiation of modal reactors should be possible, together with the memory management/garbage collection features that are associated with recursive instantiation. This should be implemented in such a way as to provide as much power as mutations, and more static analyzability. (The question of why I like recursive instantiation is a separate discussion; here I focus on implementation only.)
- Note that some implementation ideas described here might also be compatible with mutations, for the crux of implementing either alternative is finding a way to make an arbitrarily large change to a subprogram at runtime without breaking the whole program or triggering costly global restructuring.
- There should also be no stop-the-world pause, proportional in length to the number of instantiated reactors, when one small part of the program undergoes a mutation/mode change.
There should be an ABI for reactors.

Developer-visible objectives

Local (re-)scheduling of subprograms should be possible in later releases of the implementation without compromising the correctness of the whole program, subject to a reasonably liberal set of requirements on the interfaces of the subprograms. This type of flexibility toward optimization is an alternative to pursuing excellent performance from the start.
The initial implementation should make scalable sparsity in balanced tree-shaped structures an explicit goal, but not scalable sparsity with respect to bank and multiport widths; such properties are important, but they can be addressed later. The initial implementation should perhaps not even allow banks.
Down to a reasonably low level of the implementation, interactions between reactors should be local. That is, a reactor should have direct interactions only with its immediate children, and centralized, global program state should be minimized (eliminated?). In particular, the reactors should not all interact with the state of one common scheduler abstraction, nor should the implementation rely on tables with global information about all reactors in the program, such as is_present fields, startup reactions, shutdown reactions, or modal reactors.
Compilation logic should also be local. For instance, the level assignment and cycle detection algorithms should not access any more details about the internals of a reactor than necessary after the relevant information has already been exposed from that reactor.
There should be no global level assignment. One reason for this is that it is not scalable to recompute a global level assignment at runtime in the presence of mutations or recursively instantiated modal reactors.

High-level implementation details

The design described here is actually quite similar to the current design because I (mostly) like the current design. The difference is that many things which are global in the current design are recast to be reactor-local.

Lowered/specialized reactor representations

Lowered/specialized reactor representations should come with information (TPOs) about the precedence relations among the reactors' ports as an alternative to exposing more fine-grained information about their contents. The lowest-level compile-time representation of a reactor class should include information not only about the semantically explicit precedence relations, but also about the spurious precedence relations. This is an extension of the "generics" concept, which specializes reactors (thereby changing their interfaces) according to their type parameters: we can further specialize reactors, and further restrict the applicability of their interfaces, by associating them with their spurious precedence relations. This is a generalization of Ani's TPO concept from federates to arbitrary reactors.
To schedule the lowered representation of a given reactor $A$ (where $A$ is specialized up to a desired TPO and concrete types), select any deadlock-free combination of TPOs of $A$'s children that respects the semantically explicit precedence relations among $A$'s children's ports. Those TPOs will be schedulable by the induction hypothesis. The base case is when $A$ has no children.
The semantically explicit precedence relations of a reactor constitute an interface that is the LUB of what its interface would be when it is in any one of its possible modes. In other words, a reactor should expose the least conservative overapproximation of all of its modes of operation. Such an interface defines a set of valid TPOs and is an overapproximation iff its corresponding set of valid TPOs is a subset of the semantically explicit interface's set of valid TPOs.
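
As a minimal sketch of the LUB described above, suppose (purely for illustration) that the interface of a mode is modeled as a set of ordered port pairs `(p, q)` meaning "`p` precedes `q`". Under that assumption, the LUB over all modes is the union of the per-mode sets, and a candidate interface overapproximates the explicit one when it contains every explicit pair. The names `lub_interface` and `is_overapproximation` are hypothetical and do not come from any existing LF code.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: an interface is a set of (earlier_port, later_port) pairs.
Precedence = Tuple[str, str]
Interface = FrozenSet[Precedence]


def lub_interface(mode_interfaces: Iterable[Interface]) -> Interface:
    """Union of the precedence relations of every mode."""
    combined = set()
    for interface in mode_interfaces:
        combined |= interface
    return frozenset(combined)


def is_overapproximation(candidate: Interface, explicit: Interface) -> bool:
    """More precedence pairs admit fewer TPOs, so containment is enough here."""
    return explicit <= candidate


# Two hypothetical modes of the same reactor.
mode_a = frozenset({("in1", "out")})
mode_b = frozenset({("in2", "out")})
exposed = lub_interface([mode_a, mode_b])
```

With these two modes, `exposed` contains both pairs, so it is safe to use regardless of which mode is active.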

Executing local levels

Define the executables of a reactor $A$ to be the reactions, actions, timers, and startup/shutdown triggers of $A$, as well as the input ports of the immediate children of $A$ and the output ports of $A$.
Any given reactor $A$ should have local level assignments. These level assignments should specify the levels of the executables of $A$. They should also specify the levels of all exposed ports of the immediate children of $A$. In this level assignment, only the reactions and ports should have levels greater than zero. There is also a special level that is later than all levels that have executables in them.
The execution of a level $L$ of $A$ should consist of executing those executables of $A$ which are at $A$'s level $L$. It should also involve execution of callbacks registered by $A$'s children in $A$'s level $L$.
The execution of a level $L$ of $A$ happens when $A$ receives information from its containing reactor about which of its levels are ready to execute.
Execution of an output port of $A$ or an input port of a child reactor of $A$ should mean the same as informing the parent of $A$ or child reactor of $A$, respectively, that the data on the executed port is known and ready for processing.
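
The following is a rough sketch of what "executing a local level" could look like, assuming that executables and child callbacks are plain Python callables stored per level. The class `LocalLevels` and its method names are hypothetical; the point is only that a parent drives a child through a registered callback rather than through shared global state.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class LocalLevels:
    """Per-reactor level table; nothing here is shared between reactors."""

    def __init__(self) -> None:
        self.executables: DefaultDict[int, List[Callable[[], None]]] = defaultdict(list)
        self.child_callbacks: DefaultDict[int, List[Callable[[], None]]] = defaultdict(list)

    def add_executable(self, level: int, run: Callable[[], None]) -> None:
        self.executables[level].append(run)

    def register_child_callback(self, level: int, callback: Callable[[], None]) -> None:
        self.child_callbacks[level].append(callback)

    def execute_level(self, level: int) -> None:
        # First the reactor's own executables at this level...
        for run in self.executables[level]:
            run()
        # ...then the callbacks that children registered at this level.
        for callback in self.child_callbacks[level]:
            callback()


log: List[str] = []
parent = LocalLevels()
child = LocalLevels()
child.add_executable(0, lambda: log.append("child reaction"))
parent.add_executable(0, lambda: log.append("parent reaction"))
# The child asks to be told when the parent reaches the parent's level 1.
parent.register_child_callback(1, lambda: child.execute_level(0))
parent.add_executable(2, lambda: log.append("parent output port"))
for level in range(3):
    parent.execute_level(level)
```

The parent never inspects the child's level table; it only runs whatever callback the child registered.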

MLAAs: Initiating level execution

A reactor $A$ receives information from its containing reactor $B$ about which of its levels are ready to execute when $B$ executes a callback that is registered by $A$.
$A$ knows where to register its callback because its parent $B$ tells $A$ the location of the callback array of $B$ that is associated with the first level of $B$ that is greater than the first $k$ levels of the ports of $A$, for all $k$.
A child reactor $C$ of $A$ guarantees that its output port $q$ is set when it is notified that all inputs having level lower than that of $q$ in the TPO of $C$ are known.

TAGs: Initiating tag execution

Any given reactor $A$ should have a local event queue.
When a logical action is scheduled, it should go on a reactor's local event queue. The reactor should also register a callback in its container's event queue corresponding to the earliest event on its local queue.
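
A minimal sketch of local event queues, assuming tags are `(time, microstep)` tuples and that a child registers a wake-up callback in its container's queue for its earliest pending event. `ReactorQueue` and its methods are hypothetical names. For simplicity the sketch tolerates redundant wake-ups; a real implementation would likely avoid them.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Tag = Tuple[int, int]  # (time, microstep); an assumption for this sketch


class ReactorQueue:
    def __init__(self, parent: Optional["ReactorQueue"] = None) -> None:
        self.parent = parent
        self._heap: List[Tuple[Tag, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, tag: Tag, callback: Callable[[], None]) -> None:
        earliest_before = self._heap[0][0] if self._heap else None
        heapq.heappush(self._heap, (tag, self._seq, callback))
        self._seq += 1
        # Only a new earliest event needs a reminder in the container.
        if self.parent is not None and (earliest_before is None or tag < earliest_before):
            self.parent.schedule(tag, lambda t=tag: self.run_until(t))

    def run_until(self, tag: Tag) -> None:
        while self._heap and self._heap[0][0] <= tag:
            _, _, callback = heapq.heappop(self._heap)
            callback()
        # Re-register for whatever is now the earliest local event.
        if self._heap and self.parent is not None:
            next_tag = self._heap[0][0]
            self.parent.schedule(next_tag, lambda t=next_tag: self.run_until(t))


log: List[str] = []
root = ReactorQueue()
child = ReactorQueue(parent=root)
child.schedule((3, 0), lambda: log.append("child@3"))
child.schedule((5, 0), lambda: log.append("child@5"))
root.schedule((4, 0), lambda: log.append("root@4"))
root.run_until((10, 0))
```

Only the root queue is driven from outside; the child's events are interleaved correctly because the child keeps exactly the reminders it needs in the root.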

Implementing federated execution

Observe that the API exposed by a given reactor allows full control of time and level advancement, as well as receipt of messages.
Wrap a federate (which is a reactor exposing such an API) with a program that controls its time and level advancement, but that implements the communication protocol for federated execution, which uses sockets and which permits asynchrony.
Let the container of the federate use a proxy in place of the federate which abstracts the asynchrony between itself and the wrapper of the federate.
Let there be two message types: TAG and NET, where TAG is not a tag advance grant but a tag-and-local-level advance grant. The local level referenced is relative to the TPO of the current federate; it is up to the container of that federate, which acts like a code-generated RTI, to interpret that local level in the context of its other child reactors and output ports. Furthermore, the NET refers to the conservative next event tag-and-local-level which it would have in the presence of all possible network inputs. This eliminates the need for an LTC because, like an LTC, this type of NET would be sufficient to unconditionally guarantee the absence of all activity before a given logical time-and-local-level. Tag-and-local-level advance grants may also obviate the need for null messages/"port absent" messages.
Design should be based on the principle that the more fine-grained TAG (including level) is sufficient to eliminate the deadlock problem because it describes the level of detail on which there are no cycles. PTAGs are not necessary.

Lower-level implementation details

The execution of an input port $p$ of a reactor $A$ may involve the execution of one or many reactions that are associated with $p$; this is an implementation detail of $A$ that is encoded in a callback that is provided by $A$ and that need not be known by the parent of $A$. This is a non-required performance optimization that is closely related to the "last enabling reaction" optimization currently implemented in the C runtime.
Only lowered/specialized reactors can be shipped in object files; however, one object file can include a broad collection of many implementations corresponding to different combinations of types and TPOs. The ability to use object files at all is a positive feature that not all languages that are similar to LF can have. "Due largely to its complex semantics, the Esterel language has no mechanism for separate compilation or pre-compiled component libraries" (write Potop-Butucaru, Edwards, and Berry).
There is no requirement that the callback passed from a child to a parent originated at that child, nor is there a requirement that the callback passed from a parent to a child originated at that parent. This allows zero-overhead hierarchy, like what we currently have in the C target (but not, to my knowledge, in the C++ target). It has been argued that zero-overhead hierarchy is a premature optimization, but I do not reject the possibility that it could be very valuable.
There should exist a common ABI for composable simulations. Reactors should implement this ABI, and it should be possible to implement "native" reactors that are not written in LF, but that are usable in LF because they implement the ABI. Furthermore, the ABI should allow the mixing of reactors that may be implemented using completely separate toolchains, e.g. because different toolchains are implemented as part of separate research efforts. Concretely, this ABI might be a struct format that provides some collection of function pointers; this must be used instead of globally defining functions because of namespace issues.
The backing representation in memory of data types of a reactor and of its state variables, as seen by users of the reactor that interact with it via the common ABI, must be specified at the language level. There is no requirement that this backing representation be used, but it must be possible to make it appear as though it were used.
The lower-level representation of the interfaces of reactors should not have any notion of time, but this lower-level representation should expose information about some number of "ticks" that can be interpreted by application code as time. Concretely: a reactor's lower-level interface may be specified by a TPO of the form ((p0 p1 p2 ...)*)*, where p0, p1, p2, ... are ports; this means that the ports can be present in the specified order in successive sub-sub ticks corresponding to levels for an arbitrary number of sub-ticks corresponding to microsteps, and all this can happen for an arbitrary number of super-ticks corresponding to nanoseconds. The notion of time should come from interpreting the super-ticks as nanoseconds, and it should be connected to reality by writing code that externally manages the execution of a given reactor so that it advances in sync with physical time. Furthermore, it may be desirable to leave the option open to allow TPOs to take other forms, e.g. with multiple levels of hierarchy below microsteps.
Time advancement of the top-level reactor should be managed by an "execution shell" like what is used in most implementations of Esterel.
Using the ABI it should be possible to
- Advance forward by k ticks (levels, microsteps, or nanoseconds), where k is a natural number.
- Conclude the current repeated group ( ... )* of ticks.
- Register events at the current tick, e.g. by executing the callback of an input port.
- Ask how much memory is required in order to instantiate a reactor.
- Instantiate with the current time and get input callbacks.
- Provide output callbacks.
- Provide a callback for registering an event on an event queue.
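
To make the list above slightly more concrete, here is a toy sketch of such an ABI, written in Python for brevity even though the real thing would presumably be a C-style struct of function pointers. Every name here (`ReactorABI`, `make_doubler`, the port names `in` and `out`) is hypothetical, the tick is assumed to be a three-element list `[super, sub, sub_sub]`, and the memory query returns a placeholder. Concluding a repeated group and registering events on an event queue are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

PortCallback = Callable[[int], None]


@dataclass
class ReactorABI:
    """Hypothetical record of entry points, analogous to a struct of function pointers."""
    memory_required: Callable[[], int]
    instantiate: Callable[[List[int], Dict[str, PortCallback]], Dict[str, PortCallback]]
    advance: Callable[[int, int], None]
    current_tick: Callable[[], List[int]]


def make_doubler() -> ReactorABI:
    """A 'native' reactor that forwards twice its input to its output."""
    state = {"tick": [0, 0, 0], "out": None}

    def memory_required() -> int:
        return 0  # placeholder; a real reactor would report a byte count

    def instantiate(tick: List[int], outputs: Dict[str, PortCallback]) -> Dict[str, PortCallback]:
        state["tick"] = list(tick)
        state["out"] = outputs["out"]
        return {"in": lambda value: state["out"](2 * value)}

    def advance(depth: int, k: int) -> None:
        # Advancing at one depth resets all finer-grained counters.
        state["tick"][depth] += k
        for finer in range(depth + 1, len(state["tick"])):
            state["tick"][finer] = 0

    return ReactorABI(memory_required, instantiate, advance, lambda: list(state["tick"]))


received: List[int] = []
abi = make_doubler()
inputs = abi.instantiate([0, 0, 0], {"out": received.append})
inputs["in"](21)      # register an event by executing the input callback
abi.advance(2, 3)     # three levels forward
abi.advance(0, 1)     # one super-tick forward
```

Nothing in `ReactorABI` mentions nanoseconds; whoever calls `advance` decides what a super-tick means.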

Speculations that I still need to think about more

In the long run, it should be possible to implement dynamic joining and leaving of federates in the form of mode changes in arbitrarily nested reactors. It should be abstracted whether a given reactor is just a regular reactor, or if it is in fact coordinating with child reactors that are federates. Such a reactor would likely have extra logic, much like an RTI, and could be implemented in a general-purpose language taking care that it meets the common ABI.
In local optimizations that flatten hierarchical subprograms or otherwise cross syntactic reactor boundaries by "unrolling" contained reactors, static information must be known about the structure of the subprogram that is unrolled. Such information can be computed for each possible combination of modes in a reactor and its children, or for the combinations that are anticipated to be the worst case or common case or that otherwise are anticipated to be the most performance-critical.
The interface of a partially lowered reactor need not be limited to the minimum information necessary to define the set of TPOs. Other information could be included that might be useful for static analysis (see this discussion). But this is too complicated to be considered in the near future.

Banks might be implemented as trees that flatten themselves at initialization time. A bank of $k$ reactors $A$ might be expressed as new Bank<A, k> where Bank is a reactor.
Common patterns that use banks might alternatively be implemented extralinguistically as "parallel patterns" in reactors that are written in general-purpose languages but that implement the ABI and accept reactors that implement the ABI.

Possible optimizations

A given reactor can have separate queues for different kinds of reactions. Instead of using a reaction function pointer, the reactor can execute the reaction function on every element of the queue when the queue's level is reached.
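
A small sketch of this optimization, assuming each queue is tied to one level and one reaction function, so queue entries need to carry only the data the reaction operates on. `ReactionKindQueue` is a hypothetical name.

```python
from typing import Callable, Dict, List


class ReactionKindQueue:
    """One queue per kind of reaction; entries hold state, not function pointers."""

    def __init__(self, level: int, reaction: Callable[[Dict[str, int]], None]) -> None:
        self.level = level
        self.reaction = reaction
        self.pending: List[Dict[str, int]] = []

    def enqueue(self, reactor_state: Dict[str, int]) -> None:
        self.pending.append(reactor_state)

    def run_if_ready(self, current_level: int) -> int:
        """Apply the reaction to every queued element once the level is reached."""
        if current_level != self.level:
            return 0
        ready, self.pending = self.pending, []
        for reactor_state in ready:
            self.reaction(reactor_state)
        return len(ready)


def increment(reactor_state: Dict[str, int]) -> None:
    reactor_state["count"] += 1


states = [{"count": 0}, {"count": 10}]
queue = ReactionKindQueue(level=2, reaction=increment)
for s in states:
    queue.enqueue(s)
ran_early = queue.run_if_ready(1)
ran_on_time = queue.run_if_ready(2)
```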

Addressing potential concerns

Code size and compilation time: Our generics implementation is akin to that found in Rust and C++ ("stenciling"). There are other ways to get behavior similar to generics, such as the Java approach, which is similar to the void* approach. The downside is that code size can be increased. Also, similar pieces of code can get compiled multiple times, which is redundant work.

I concede that code size is an important problem for embedded platforms. However, this is a problem that can be solved later, even after a stenciling approach is fully implemented. One approach would be to keep a content-addressed store of IR that is duplicated, and just reuse the same functions for subsequent usages of the same piece of IR.
Compilation times are not currently at the forefront of my attention, to my knowledge. Our current implementation does expensive things, including checking a fully expanded graph representing all reactor instances and their ports for cycles and in some cases generating relatively large amounts of code for initializeTriggerObjects (although this latter issue was much alleviated by heroic efforts by Edward). Performance optimization should focus on the biggest problems first.

Overall, the code size and compilation time issues introduced by this approach might be small in comparison to other code size and compilation time issues. Indeed, they might even solve some of these other issues. For example, they would enable compositional cycle detection, which can result in asymptotically better performance in programs with repeated structure (and even infinitely better performance in the case of infinite programs -- programs with recursive instantiation are infinite). They could also reduce code size by simplifying the startup code.
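
The content-addressed store of IR mentioned above could look roughly like the following. This is only a sketch: `ContentAddressedStore` is a hypothetical name, the IR is assumed to be available as canonical text, and the "compiler" is a stand-in function.

```python
import hashlib
from typing import Callable, Dict


class ContentAddressedStore:
    """Compile each distinct piece of IR once and reuse the result afterwards."""

    def __init__(self, compile_fn: Callable[[str], str]) -> None:
        self._compile = compile_fn
        self._artifacts: Dict[str, str] = {}
        self.compile_count = 0

    def get(self, ir_text: str) -> str:
        key = hashlib.sha256(ir_text.encode("utf-8")).hexdigest()
        if key not in self._artifacts:
            self._artifacts[key] = self._compile(ir_text)
            self.compile_count += 1
        return self._artifacts[key]


# Stand-in for lowering IR to machine code.
store = ContentAddressedStore(lambda ir: f"compiled<{ir}>")
first = store.get("reaction body A")
second = store.get("reaction body A")   # duplicate IR, no recompilation
third = store.get("reaction body B")
```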

Heap space: This approach involves creating more events than would be needed if there were just one central event queue. In particular, it requires child reactors to create extra events in their parent reactors to remind their parents to activate them again when they have something to do.

This problem can be mitigated by flattening subgraphs to reduce the number of layers of hierarchy.
The interaction between the program and the memory system is more complex than just the amount of memory used. With a central event queue accessible to all cores, there is cache coherence overhead. Pieces of the event queue are duplicated across the L1 caches of the different cores, or perhaps across scratchpads if this is FlexPRET (though this is highly speculative). So if the cost of data structure replication is considered, it is possible that modularizing in this way could effectively reduce the amount of memory overhead.

Performance: This proposal introduces abstractions. Abstractions can introduce a cost. In particular, if communications between sibling reactors have to go through the common parent, then this extra indirection can introduce cost.

This can also be mitigated by flattening subgraphs.
An overarching idea in this approach is to make extensive use of callbacks. It is true that callbacks should be exchanged only between child and parent. However, when a callback is passed from reactor $A$ to reactor $B$ (where one of these is a child, and the other a parent), there is no requirement that $A$ is the originator of the callback. There is not even a requirement that $A$ knows where the callback came from -- that is the abstraction. All $A$ knows is that "this callback is what has to happen when $B$ does $x$." The result is that we can get all the benefits of abstraction, and in certain cases we can ensure that the indirection cost is only incurred on startup or perhaps on a mode change, when callbacks are passed around.
Abstractions are necessary in order to prove that optimization-enabling program transformations do not change the observable behavior of subgraphs. They reduce the problem to demonstrating that changes do not cross the abstraction boundary.
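
The callback-passing idea can be illustrated with a toy example in which containers hand callbacks through without wrapping them, so the indirection is paid only while wiring things up. The classes `Leaf` and `Container` are hypothetical and stand in for reactors at different levels of hierarchy.

```python
from typing import Callable, List


class Leaf:
    def __init__(self) -> None:
        self.received: List[int] = []
        self._out: Callable[[int], None] = lambda value: None

    def input_callback(self) -> Callable[[int], None]:
        return self.received.append

    def set_output_callback(self, callback: Callable[[int], None]) -> None:
        self._out = callback

    def produce(self, value: int) -> None:
        self._out(value)


class Container:
    """Forwards callbacks unchanged; it does not need to know their origin."""

    def __init__(self, inner) -> None:
        self.inner = inner

    def input_callback(self) -> Callable[[int], None]:
        return self.inner.input_callback()

    def set_output_callback(self, callback: Callable[[int], None]) -> None:
        self.inner.set_output_callback(callback)


source = Leaf()
sink = Leaf()
nested_sink = Container(Container(sink))   # two layers of hierarchy
nested_source = Container(source)
# Wiring goes through the containers once...
nested_source.set_output_callback(nested_sink.input_callback())
# ...but at runtime the source calls the sink's callback directly.
source.produce(7)
```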

Advantages over alternative designs

I believe that the design sketched above offers solutions to several architectural problems that we have faced. If one does not follow the design sketched above, then one must consider alternative solutions to these problems. Here are some alternative solutions; I will explain why I am dissatisfied with them.

Using mutations instead of mode changes

It is not yet clear to me exactly what mutations should mean. However, they seem to be associated with almost arbitrary changes to program structure. Reactors can add children within themselves and connect the children to any of their other children, and they can add siblings and connect to their siblings.

Mode changes are distinguished by the fact that there is only a limited, pre-defined set of possible changes that can be made within any single reactor. The possible local states can be enumerated concisely at compile time. Since this allows for a combinatorially large number of global program states, this should not place fundamental limitations on expressiveness. It should, however, allow invariants regarding the local program structure to be trivially proved by simply checking that they hold true for each of the possible modes. Perhaps the most important invariant is whether a given pair of ports will ever have a zero-delay path between them, but it is possible to imagine other interesting invariants such as whether the presence of a given port at a given time guarantees the presence of another port at another time.
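
As a sketch of "checking that an invariant holds for each of the possible modes", assume each mode is described by its set of zero-delay edges between ports. The invariant "there is never a zero-delay path from `in` to `out`" then reduces to one reachability check per mode. The mode names and the helper functions are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Edges = Set[Tuple[str, str]]


def has_zero_delay_path(edges: Edges, source: str, target: str) -> bool:
    """Depth-first search over the zero-delay edges of a single mode."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for src, dst in edges if src == node)
    return False


def holds_in_every_mode(modes: Dict[str, Edges], invariant: Callable[[Edges], bool]) -> bool:
    return all(invariant(edges) for edges in modes.values())


modes: Dict[str, Edges] = {
    "bypass": {("in", "out")},
    "delayed": {("in", "buffer")},   # the buffer breaks the zero-delay path
}
never_in_to_out = holds_in_every_mode(
    modes, lambda edges: not has_zero_delay_path(edges, "in", "out"))
never_out_to_in = holds_in_every_mode(
    modes, lambda edges: not has_zero_delay_path(edges, "out", "in"))
```

Here the first invariant fails because of the `bypass` mode, while the second holds in both modes.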

Supporting mutations with widely spaced levels

Perhaps the most fundamental problem with dynamic program structure is the need to recompute a global level assignment at runtime. One way to avoid this is to start with levels that are widely spaced; then, if reactions are added, give them levels that are in between the levels that are already used.

There are two problems with this approach. The first problem -- that this makes it impractical to use a statically allocated array of pointers to statically allocated arrays of enqueued reactions -- is easily circumvented, although it may be worth keeping in mind since such a data structure is central to some of the most efficient execution strategies that we currently support.

The second problem -- that this approach doesn't really work in the sense that it cannot in general obviate the need to do global level reassignments at runtime -- is more grave. It may seem that because a 64-bit number can be astronomically large, there is plenty of space to add more levels in between the existing ones. However, consider the case when we want to add a reaction whose level is between two levels. Reactions can be added before it or after it, so we may wish to put its level halfway between the two existing levels. In such a scheme, the minimum spacing between levels can in the worst case decrease by a factor of two for each reaction that is added! By the time 64 reactions are added, we must recompute the global level assignment. It is not necessary to prototype the "widely spaced levels" strategy in order to find out whether this will be a problem; we know it will be a problem, and we already understand it well, so let's not waste time unless we know of a solution.
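
The halving argument can be checked with a few lines of arithmetic. The sketch below assumes levels are unsigned 64-bit integers and that an adversarial sequence of insertions always lands in the lower half of the remaining gap; the function name is hypothetical.

```python
def insertions_until_exhausted(lo: int, hi: int) -> int:
    """Count midpoint insertions between lo and hi until no integer level fits."""
    count = 0
    while hi - lo >= 2:              # there is still an integer strictly in between
        mid = lo + (hi - lo) // 2
        hi = mid                     # worst case: the next insertion goes below mid
        count += 1
    return count


# Two existing levels at opposite ends of the unsigned 64-bit range.
worst_case = insertions_until_exhausted(0, 2**64 - 1)
```

Under these assumptions `worst_case` is 63, so the 64th insertion already forces a global reassignment even when starting from the widest possible gap.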

Supporting mutations with run-time cycle detection

Another problem that must be solved when programs can change their structure at run-time is that in the absence of guaranteed invariants that are sufficient to prove cycle-freedom, one must contend with cycles that appear at runtime.

This is related to the level assignment/scheduling problem in that it is meant to discover programs that are not schedulable. The design space is large and dependent on the scheduling strategy. However, it is in any case highly desirable to ensure that errors relating to cycles happen at compile time or initialization time rather than accepting that they may occur at any time, or -- even worse -- failing to detect cycles at all. Furthermore, every cycle detection strategy that has been proposed so far, other than the one sketched above, would a) occur at run-time and b) have a cost that scales linearly in the number of instantiated reactors and in the number of mutations executed.
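
For comparison, a compositional check in the spirit of the design sketched earlier might look as follows. It assumes each child exposes only a set of `(input, output)` zero-delay dependencies, so the container can look for cycles using those interfaces plus its own connections, without looking inside the children. The function names and the `child.port` naming scheme are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def compose(child_interfaces: Dict[str, Set[Tuple[str, str]]],
            connections: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build a port-level graph from child interfaces and local connections."""
    graph: Dict[str, List[str]] = {}
    for child, pairs in child_interfaces.items():
        for inp, out in pairs:
            graph.setdefault(f"{child}.{inp}", []).append(f"{child}.{out}")
    for src, dst in connections:
        graph.setdefault(src, []).append(dst)
    return graph


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Three-colour depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


loop = [("a.out", "b.in"), ("b.out", "a.in")]
zero_delay = {"a": {("in", "out")}, "b": {("in", "out")}}
with_delay = {"a": {("in", "out")}, "b": set()}   # b's output does not depend on its input
cyclic = has_cycle(compose(zero_delay, loop))
acyclic = has_cycle(compose(with_delay, loop))
```

The cost of each check depends only on the number of immediate children and connections of one reactor class, not on the number of instantiated reactors.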

Supporting federated execution with LF programs that are annotated with TPO

I think that the TPO solution that we are currently moving toward is preferable to spawning many threads to support unordered reactions, and I am glad that it guarantees deadlock freedom. I also think that fed-gen -- generating separate LF programs for each federate -- is an improvement over the preceding design.

However, I think that the architecture proposed here is a desirable refinement on the design that we are moving toward now. It allows an RTI-style coordination of time advancement without creating a single point of failure and without requiring one central process (an RTI) to know the structure of all communicating federates. This is because it should be possible to compile a black-box reactor that implements the ABI and that coordinates the time advancement of only the subprogram that it contains, rather than coordinating the time advancement of any whole program. Similarly, it should be possible to compile a wrapper around the instantiated federate that handles network messages and regulates the time advancement of the instantiated federate; this allows a compiled reactor that implements the ABI to be used in a federate without changing the runtime implementation of the compiled reactor.

Supporting custom timescales

We currently support only the nanosecond timescale. This is part of the semantics of LF in the sense that the smallest time units in LF are nanoseconds. However, it is also part of the low-level implementation. This has caused problems for us when we have considered using microseconds instead of nanoseconds on 32-bit platforms. In the current runtime implementation for both C and C++, changes would have to be made deep within the runtime to change the base time unit to any unit other than nanoseconds.

This proposal allows us to remove the timescale from the low-level implementation by moving the code for waiting for the next event on the event queue and for assigning explicit meaning to the fields of tag objects. Rather than keeping this code in the heart of the runtime, it can be moved out and into the execution shell.

An added benefit of moving such code into the execution shell is that it makes compiled reactors less platform-specific.
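
A minimal sketch of an execution shell that owns the timescale, assuming the compiled reactor only counts super-ticks and the shell decides how many physical nanoseconds one super-tick represents. `ExecutionShell` and its parameters are hypothetical; the clock and sleep functions are injectable so that the example can run against a fake clock.

```python
import time
from typing import Callable, List


class ExecutionShell:
    """Maps abstract super-ticks to physical time; the reactor never sees units."""

    def __init__(self,
                 ns_per_super_tick: int,
                 clock_ns: Callable[[], int] = time.monotonic_ns,
                 sleep_ns: Callable[[int], None] = lambda ns: time.sleep(ns / 1e9)) -> None:
        self.ns_per_super_tick = ns_per_super_tick
        self._clock_ns = clock_ns
        self._sleep_ns = sleep_ns
        self._start_ns = clock_ns()

    def deadline_ns(self, super_tick: int) -> int:
        return self._start_ns + super_tick * self.ns_per_super_tick

    def wait_until(self, super_tick: int) -> None:
        remaining = self.deadline_ns(super_tick) - self._clock_ns()
        if remaining > 0:
            self._sleep_ns(remaining)


# Fake clock frozen at zero; sleeps are recorded instead of performed.
slept: List[int] = []
microsecond_shell = ExecutionShell(1_000, clock_ns=lambda: 0, sleep_ns=slept.append)
microsecond_shell.wait_until(5)
nanosecond_shell = ExecutionShell(1, clock_ns=lambda: 0, sleep_ns=slept.append)
nanosecond_shell.wait_until(5)
```

Switching from nanoseconds to microseconds changes one constructor argument in the shell and nothing in the reactor.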

Using LF programs in non-LF-based simulations

Although we have users who wish to use other simulations (MQSim) in LF programs, we are not currently aware of users who wish to use LF code in other simulations. This is probably because LF is new enough that there do not currently exist any complex, renowned simulations like MQSim that are implemented in LF. However, this will change if LF is successful.

In order to integrate a simulation written using one framework into a simulation written in another framework, it is necessary to manage the time advancement of the nested simulation. In current implementations of Lingua Franca, it would be necessary to make changes deep within the runtime; for example, in the C implementation, the wait_until function would likely need to be changed to allow it to wait on a different kind of condition.

By contrast, if an LF program consists of a main reactor that has some ABI, combined with an execution shell that interacts with the ABI to manage time advancement, then all that would be required in order to integrate the LF program into a containing simulation would be to discard the execution shell and adapt the containing simulation to interact directly with the ABI.

Integrating multiple target languages into one LF program

Although this has not been done yet, it is certainly possible to write LF programs involving many target languages; this can be trivially done by having LF programs use C, C++, or Rust code that calls into compiled code in other languages. However, to take advantage of this in the current implementation, either the LF maintainers or their users must maintain complex build systems that integrate multiple toolchains. It is not as simple as providing object files with reaction functions, because the reaction functions have to be compatible with certain self structs and port structs, which must be defined in generated header files, which must in turn be translated between languages. This is possible and tools exist for it, but it may not prove to be highly convenient. In any case, it forces reaction functions to operate on very specific data structures, which might not be idiomatic for different target languages and which may not come with the same compile-time guarantees that would be possible when larger pieces of code are compiled to or for e.g. Rust.

Construction of multilingual programs could be greatly simplified by defining an ABI for reactors and allowing reactors (compiled down from a TPO-specifying JSON IR using arbitrary toolchains) to be distributed in object files. Certainly not all aspects of the proposal here are required for defining an ABI -- for example, any current implementation of LF that compiles reactors to a collection of procedures in machine code could be documented in detail so that the various procedures could in principle be used as a well-defined ABI -- but to my knowledge this has not yet been done, and for the C target at least I know that it would be difficult due to the complex initialization in initializeTriggerObjects, which would be difficult to describe as a clean interface to any subprogram.

Implementation steps

Define a JSON format that can represent reactor definitions. Reactor definitions should have TPOs (that are just hierarchical ticks, sub-ticks, sub-sub-ticks, etc. without explicit notions of nanoseconds and microsteps), lists of instantiated contained reactors, and lists of connections, where each connection involves a single port on its left and a single port on its right.
To minimize the complexity of an initial working implementation, do not implement state variables or reactions. Also do not implement actions (but do implement local event queues). Reactors with these features must be blackboxed reactors that implement the ABI.
Write an interpreter for the JSON format by treating each reactor as a component that implements the basic interface, which is close to the interface of the ABI.
Make the JSON format easier to work with if needed, e.g. by allowing it to be converted to and from a more human-readable language. Avoid discussions about this human-readable language. Anyone who does not like the human-readable language should develop their own human-readable representation of the canonical JSON core intermediate language.
Implement modular compile-time cycle detection and make it work with modes and recursive instantiation.
Manually port some existing LF programs to this JSON format.
Lower the JSON IR to binaries without attempting any optimization.
Look for ways to introduce state variables and reactions into the core JSON language.
Look for ways to introduce actions into the core JSON language while making them analyzable, e.g. by constraining them to certain well-defined time smears.
Possibly add lifetime annotations in lieu of implementing tokens or smart pointers.
Find a way to lower a subset of LF to the JSON IR.
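
To give a feel for the first step, here is one possible shape for such a JSON reactor definition, built and checked from Python. All field names (`ports`, `tpo`, `repeat`, `instances`, `connections`, `left`, `right`) are invented for this sketch and are not a proposed schema; the nested `repeat` groups are one hypothetical encoding of a TPO of the form `((in out)*)*`.

```python
import json
from typing import Dict, List

reactor_definition = {
    "name": "Pipeline",
    "ports": {"inputs": ["in"], "outputs": ["out"]},
    # Hypothetical encoding of the TPO ((in out)*)* as nested repeated groups.
    "tpo": {"repeat": {"repeat": ["in", "out"]}},
    "instances": [
        {"name": "a", "class": "Stage"},
        {"name": "b", "class": "Stage"},
    ],
    # Each connection has a single port on its left and on its right.
    "connections": [
        {"left": "in", "right": "a.in"},
        {"left": "a.out", "right": "b.in"},
        {"left": "b.out", "right": "out"},
    ],
}


def connection_errors(definition: dict, class_ports: Dict[str, List[str]]) -> List[str]:
    """Report connection endpoints that name no known port."""
    known = set(definition["ports"]["inputs"]) | set(definition["ports"]["outputs"])
    for instance in definition["instances"]:
        for port in class_ports[instance["class"]]:
            known.add(f"{instance['name']}.{port}")
    errors = []
    for connection in definition["connections"]:
        for side in ("left", "right"):
            if connection[side] not in known:
                errors.append(f"unknown port: {connection[side]}")
    return errors


round_tripped = json.loads(json.dumps(reactor_definition))
errors = connection_errors(round_tripped, {"Stage": ["in", "out"]})
```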
