Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing a TLA+ Specification: EWD998Chan #75

Merged
merged 26 commits into from
Apr 6, 2024

Commits on Apr 10, 2023

  1. Implementation of EWD998 based on the TLA+ spec EWD998Chan.

    The implementation took two hours and did not involve running the code.
    
    Disclaimer: The code first used a home-grown wire format that we replaced
    with Json to remove error-prone ad-hoc parsing.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    0d341ee View commit details
    Browse the repository at this point in the history
  2. This node possesses the token, and therefore must retain it unchanged.

    The resolution of this translation error took approximately 15 minutes and required the execution of the code using a debugger.
    
    [Bug]
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    2e24bf9 View commit details
    Browse the repository at this point in the history
  3. Add (Json) logging.

    Notice that logging happes at two points in time when the system is in a
    consistent state; a) when a message is received^1 and b) when a message
    is sent.
    
    1 A message is conceptually received, not when the receiver thread accepts
      the message from the ServerSocket, but when the message is taken out
      of the inbox.  While it might be tempting to log in the receiver thread,
      it would break the happen-before relationship of some log statements.
      For example, it could appear as if a node first received a payload
      message that activated the node, and it then passes the token (which
      a node is only allowed to do when it is inactive).
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    69f2708 View commit details
    Browse the repository at this point in the history
  4. A first naive trace validation mapping:

    The approach by Ron Pressler [1] does not work well for validating a trace that is extremely partial, i.e., it only contains messages that cross the network.  EWD998's trace neither has the values of all variables nor logs every state of the system.  Using non-determinism to define the missing values and states is not practical, as we end up in a TLA+ fragment that TLC cannot handle, such as the following sub-formula of the next-state relation, where the first conjunct defines an infinite set of functions from which the second conjunction filters an infinite subset of functions.
    
    ```tla
    SomeAct ==
        /\ ...
        /\ inbox' \in [Node -> Seq(Msg)]
        /\ \E idx \in DOMAIN inbox[Initiator]':
                inbox[Initiator][idx]' = Trace[TLCGet("level")].msg
        /\ ...
    ```
    
    A workaround is to define TLC state and action constraints that match each line in the trace to the current state of a prefix of some behavior defined by the original spec.  The trace can only be mapped to a behavior iff TLC can match every line in the trace to a state in the prefix.  If not, a TLC postcondition will be violated the unmatched logline.
    
    [1] https://www.youtube.com/watch?v=TP3SY0EUV2A
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    5e0b9f5 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    b501a91 View commit details
    Browse the repository at this point in the history
  6. A fundamental problem of distributed systems is the absence of a glob…

    …al clock, which is why the trace/log file is unordered. For example, the receiver of a message might log the receipt of a message before the sender logs the send event. The canonical solution in distributed systems is a vector clock.
    
    Note that EWD998ChanID models a vector clock to interface with https://bestchai.bitbucket.io/shiviz/
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    61d6be7 View commit details
    Browse the repository at this point in the history
  7. Include the node's identify in the json logging to be compatible with…

    … ShiViz (https://bestchai.bitbucket.io/shiviz).
    
    Regex:
    
    ^{"event":"(<|>|d)","node":(?<host>[0-9]?),"pkt":{("snd":(?<src>[0-9]?),"rcv":(?<rcv>[0-9]?),"msg":(?<event>({"type":"tok","q":-?[0-9]*,"color":"(white|black)"}|{"type":"(pl|trm|w|d)"})),|)"vc":(?<clock>.*)}}
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    7897628 View commit details
    Browse the repository at this point in the history
  8. The naive approach faces a challenge because the high-level spec EWD9…

    …98Chan models token passing as a single, atomic step, but in a real system, passing the token is not atomic. The token is transmitted via the network.
    
    To handle this problem, this technique introduces (explicit) stuttering steps, where the variables remain unchanged, but the length of the behavior increases.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    98c4d40 View commit details
    Browse the repository at this point in the history
  9. The trace validation still faces challenges, including the fact that …

    …nodes deactivating are not logged in the implementation. This leads to a rejected trace when a seemingly active node passes the token.
    
    TLC's limitation in supporting the composition of actions (compare section 7.3 (page 76ff) of Lamport's Specifying Systems) prevents the  TraceSpec  from defining its next-state relation by conjoining  Deactivate \cdot PassToken.  However, this limitation can be overcome by defining a suitable action that deactivates the node when it passes the token.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    ad8988f View commit details
    Browse the repository at this point in the history
  10. The previous commit showed how (manually) composed actions can handle…

    … missing state changes in the trace/log file, and we see no reason why this approach won't generalize beyond this particular example.
    
    With the EWD998 implementation, however, we can (consistently) log when a node deactivates, thereby increasing the coverage of trace validation by including more system state.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    55e9a64 View commit details
    Browse the repository at this point in the history
  11. Trace validation has begun to reject traces due to the presence of te…

    …rmination ("trm") messages in the implementation, which are not included in the EWD998 model. This represents the first discrepancy that has been discovered between the specifications and the actual code, but it is something that could have easily been spotted through a manual comparison between the two.
    
    To resolve this issue, the TraceNextConstraint has been defined to allow for "trm" messages and is based on finite stuttering, similar to the way the IsRecvToken is handled.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    7f157c8 View commit details
    Browse the repository at this point in the history
  12. Trace validation detects another divergence.

    The SendMsg action in the high-level specification does not allow a message to be sent from a node to itself.  While self-messaging is safe and does not violate the properties of the specification, the implementation will be changed instead.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    ff7a49a View commit details
    Browse the repository at this point in the history
  13. Configuration menu
    Copy the full SHA
    75673c4 View commit details
    Browse the repository at this point in the history
  14. Configuration menu
    Copy the full SHA
    3ec089e View commit details
    Browse the repository at this point in the history
  15. Configuration menu
    Copy the full SHA
    27d721c View commit details
    Browse the repository at this point in the history
  16. Configuration menu
    Copy the full SHA
    bb52145 View commit details
    Browse the repository at this point in the history
  17. Trace validation found another divergence within ~30 seconds after ge…

    …nerating approximately 10k states.
    
    The initiator initiates another token round even when the previous token round is conclusive. The system is in this state after a token round with all nodes white and inactive except for an active but white initiator.
    
    This is a first attempt at a bugfix.
    
    ```
    << "Failed matching the trace to (a prefix of) a behavior:",
       [ pkt |->
             [ vc |->
                   [ 1 |-> 342,
                     3 |-> 300,
                     0 |-> 216,
                     2 |-> 341,
                     5 |-> 313,
                     4 |-> 313,
                     6 |-> 320,
                     7 |-> 413 ],
               msg |-> [type |-> "pl"],
               snd |-> 0,
               rcv |-> 7 ],
         event |-> "<" ],
       "TLA+ debugger breakpoint hit count 2576" >>  FALSE
    Error: Assumption line 212, col 5 to line 215, col 84 of module EWD998ChanTrace is false.
    ...
    ```
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    5665744 View commit details
    Browse the repository at this point in the history
  18. Revert "Trace validation found another divergence within ~30 seconds

    after generating approximately 10k states." and finally implement the
    EWD998 specification.
    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    4da7373 View commit details
    Browse the repository at this point in the history
  19. Ignore Toolbox files

    lemmy committed Apr 10, 2023
    Configuration menu
    Copy the full SHA
    3dc8465 View commit details
    Browse the repository at this point in the history

Commits on Apr 12, 2023

  1. Add EWD998ChanTrace.tla spec to CI.

    Signed-off-by: Markus Alexander Kuppe <[email protected]>
    lemmy committed Apr 12, 2023
    Configuration menu
    Copy the full SHA
    15a360d View commit details
    Browse the repository at this point in the history

Commits on Apr 24, 2023

  1. Refactor (shorten) EWD998ChanTrace by reusing existing sub-actions of…

    … EWD998Chan where possible.
    
    Signed-off-by: Markus Alexander Kuppe <[email protected]>
    lemmy committed Apr 24, 2023
    Configuration menu
    Copy the full SHA
    72e6509 View commit details
    Browse the repository at this point in the history

Commits on Jun 28, 2023

  1. Configuration menu
    Copy the full SHA
    7a77cf3 View commit details
    Browse the repository at this point in the history

Commits on Nov 13, 2023

  1. Configuration menu
    Copy the full SHA
    f6914f9 View commit details
    Browse the repository at this point in the history

Commits on Apr 5, 2024

  1. Configuration menu
    Copy the full SHA
    c8c9159 View commit details
    Browse the repository at this point in the history
  2. jsonschema.exceptions.ValidationError: Additional properties are not …

    …allowed ('config' was unexpected)
    
    [Build]
    lemmy committed Apr 5, 2024
    Configuration menu
    Copy the full SHA
    88faa37 View commit details
    Browse the repository at this point in the history

Commits on Apr 6, 2024

  1. Mirror PR into markdown note.

    lemmy committed Apr 6, 2024
    Configuration menu
    Copy the full SHA
    115c50c View commit details
    Browse the repository at this point in the history