Add spec of the Disruptor concurrency library. #150

Merged
merged 7 commits into from
Sep 18, 2024

Conversation

nicholassm
Contributor

The Disruptor is a concurrency library originally developed and open sourced by LMAX Exchange for low latency communication via a ring buffer between producer and consumer threads.

This PR adds a spec of the Disruptor lib and verifies that data races do not occur.

Signed-off-by: Nicholas Schultz-Møller <[email protected]>
@ahelwer
Collaborator

ahelwer commented Sep 13, 2024

Nice spec! Thanks for the contribution.

@nicholassm
Contributor Author

Thanks, you're welcome. :-)

@nicholassm
Contributor Author

I can see a lot of checks failing due to a different number of states, distinct states, etc. compared to what I get when running the Toolbox. Could it be a TLC version issue? Or do symmetry vs. non-symmetry sets for model values differ somehow?

Kind regards
Nicholas

@ahelwer
Collaborator

ahelwer commented Sep 14, 2024

Those json fields are optional, although useful as a regression test. I will take a look tomorrow.

@nicholassm
Contributor Author

It looks like it's not my model that fails but a module called MCtcp that results in an exit code 1... What gives?

@ahelwer
Collaborator

ahelwer commented Sep 14, 2024

Sorry, the spec checking has debug output enabled so it's very verbose. You want to search for the text ERROR:root in the raw log. @lemmy should we keep debug logging enabled for model-checking in the CI? It makes it difficult to sift through the output.

Anyway here is the result:

2024-09-13T21:10:08.0204980Z INFO:root:specifications/Disruptor/Disruptor_MPMC.cfg
2024-09-13T21:10:08.2670270Z INFO:root:specifications/Disruptor/Disruptor_MPMC.cfg in 0.3s vs. 10s expected
2024-09-13T21:10:08.2671430Z ERROR:root:Model specifications/Disruptor/Disruptor_MPMC.cfg expected result success but got 255
2024-09-13T21:10:08.2674970Z ERROR:root:java -enableassertions -Dtlc2.TLC.ide=Github -Dutil.ExecutionStatisticsCollector.id=abcdef60f238424fa70d124d0c77ffff -XX:+UseParallelGC -cp deps/tools/tla2tools.jar:deps/apalache/bin/apalache-mc/lib/apalache.jar:deps/community/modules.jar:deps/tlapm/library tlc2.TLC specifications/Disruptor/Disruptor_MPMC.tla -config specifications/Disruptor/Disruptor_MPMC.cfg -workers auto -lncheck final -cleanup
2024-09-13T21:10:08.2677880Z TLC2 Version 2.20 of Day Month 20?? (rev: caf0c33)
2024-09-13T21:10:08.2680380Z Running breadth-first search Model-Checking with fp 31 and seed -471696565524114047 with 3 workers on 3 cores with 1593MB heap and 64MB offheap memory [pid: 5082] (Mac OS X 14.6.1 aarch64, Eclipse Adoptium 17.0.12 x86_64, MSBDiskFPSet, DiskStateQueue).
2024-09-13T21:10:08.2682070Z Error: TLC threw an unexpected exception.
2024-09-13T21:10:08.2682650Z This was probably caused by an error in the spec or model.
2024-09-13T21:10:08.2683300Z See the User Output or TLC Console for clues to what happened.
2024-09-13T21:10:08.2683890Z The exception was a tlc2.tool.ConfigFileException
2024-09-13T21:10:08.2684460Z : TLC found an error in the configuration file at line 5
2024-09-13T21:10:08.2685000Z It was expecting }, but did not find it.
2024-09-13T21:10:08.2685510Z Finished in 00s at (2024-09-13 21:10:08)
2024-09-13T21:10:08.2685800Z 
2024-09-13T21:10:08.2686070Z INFO:root:specifications/Disruptor/Disruptor_MPMC_liveliness.cfg
2024-09-13T21:10:08.5685730Z INFO:root:specifications/Disruptor/Disruptor_MPMC_liveliness.cfg in 0.3s vs. 10s expected
2024-09-13T21:10:08.5687240Z ERROR:root:Model specifications/Disruptor/Disruptor_MPMC_liveliness.cfg expected result success but got 255
2024-09-13T21:10:08.5691920Z ERROR:root:java -enableassertions -Dtlc2.TLC.ide=Github -Dutil.ExecutionStatisticsCollector.id=abcdef60f238424fa70d124d0c77ffff -XX:+UseParallelGC -cp deps/tools/tla2tools.jar:deps/apalache/bin/apalache-mc/lib/apalache.jar:deps/community/modules.jar:deps/tlapm/library tlc2.TLC specifications/Disruptor/Disruptor_MPMC.tla -config specifications/Disruptor/Disruptor_MPMC_liveliness.cfg -workers auto -lncheck final -cleanup
2024-09-13T21:10:08.5695900Z TLC2 Version 2.20 of Day Month 20?? (rev: caf0c33)
2024-09-13T21:10:08.5698330Z Running breadth-first search Model-Checking with fp 130 and seed -5831432232768886009 with 3 workers on 3 cores with 1593MB heap and 64MB offheap memory [pid: 5083] (Mac OS X 14.6.1 aarch64, Eclipse Adoptium 17.0.12 x86_64, MSBDiskFPSet, DiskStateQueue).
2024-09-13T21:10:08.5700840Z Error: TLC threw an unexpected exception.
2024-09-13T21:10:08.5701540Z This was probably caused by an error in the spec or model.
2024-09-13T21:10:08.5702440Z See the User Output or TLC Console for clues to what happened.
2024-09-13T21:10:08.5703240Z The exception was a tlc2.tool.ConfigFileException
2024-09-13T21:10:08.5704050Z : TLC found an error in the configuration file at line 5
2024-09-13T21:10:08.5704850Z It was expecting }, but did not find it.
2024-09-13T21:10:08.5705510Z Finished in 00s at (2024-09-13 21:10:08)
2024-09-13T21:10:08.5705920Z 
2024-09-13T21:10:08.5706210Z INFO:root:specifications/Disruptor/Disruptor_SPMC.cfg
2024-09-13T21:10:08.8707050Z INFO:root:specifications/Disruptor/Disruptor_SPMC.cfg in 0.3s vs. 10s expected
2024-09-13T21:10:08.8708400Z ERROR:root:Model specifications/Disruptor/Disruptor_SPMC.cfg expected result success but got 255
2024-09-13T21:10:08.8712860Z ERROR:root:java -enableassertions -Dtlc2.TLC.ide=Github -Dutil.ExecutionStatisticsCollector.id=abcdef60f238424fa70d124d0c77ffff -XX:+UseParallelGC -cp deps/tools/tla2tools.jar:deps/apalache/bin/apalache-mc/lib/apalache.jar:deps/community/modules.jar:deps/tlapm/library tlc2.TLC specifications/Disruptor/Disruptor_SPMC.tla -config specifications/Disruptor/Disruptor_SPMC.cfg -workers auto -lncheck final -cleanup
2024-09-13T21:10:08.8716690Z TLC2 Version 2.20 of Day Month 20?? (rev: caf0c33)
2024-09-13T21:10:08.8719090Z Running breadth-first search Model-Checking with fp 9 and seed 5823395496300575269 with 3 workers on 3 cores with 1593MB heap and 64MB offheap memory [pid: 5084] (Mac OS X 14.6.1 aarch64, Eclipse Adoptium 17.0.12 x86_64, MSBDiskFPSet, DiskStateQueue).
2024-09-13T21:10:08.8721110Z Error: TLC threw an unexpected exception.
2024-09-13T21:10:08.8721820Z This was probably caused by an error in the spec or model.
2024-09-13T21:10:08.8722650Z See the User Output or TLC Console for clues to what happened.
2024-09-13T21:10:08.8723490Z The exception was a tlc2.tool.ConfigFileException
2024-09-13T21:10:08.8724220Z : TLC found an error in the configuration file at line 6
2024-09-13T21:10:08.8724960Z It was expecting }, but did not find it.
2024-09-13T21:10:08.8726340Z Finished in 00s at (2024-09-13 21:10:08)
2024-09-13T21:10:08.8726770Z 

I believe the reason for the failure is that in your model files you have many sets defined as:

Writers = { w1 w2 }

but this needs to be comma-delimited like:

Writers = { w1, w2 }

I suggest running TLC locally against these model files (will probably need to be done using the command line) to save time on the debug loop; debug-via-CI-run is very time-consuming!
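
As a rough sketch of the fix, a model file with comma-delimited sets might look like the fragment below. Writers, Readers and MaxPublished are the constants declared in the spec under review, and w1 and w2 come from the example above; the model values r1 and r2, the bound 4, and the name Spec are illustrative assumptions, not values taken from the PR.

```tla
\* Sketch of a TLC .cfg file; Spec is an assumed specification name.
SPECIFICATION Spec

CONSTANTS
    \* Set elements must be separated by commas.
    Writers = { w1, w2 }
    Readers = { r1, r2 }
    \* Illustrative bound only.
    MaxPublished = 4
```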

Collaborator

@muenchnerkindl left a comment

Thank you for the nice contribution. You'll find a few comments in the individual files (comments for the multi-writer version are analogous to those for the single-writer version). I am looking forward to seeing this spec added to the collection!

specifications/Disruptor/RingBuffer.tla (resolved)
specifications/Disruptor/RingBuffer.tla (outdated, resolved)
specifications/Disruptor/RingBuffer.tla (resolved)
CONSTANTS
Writers, (* Writer/publisher thread ids. *)
Readers, (* Reader/consumer thread ids. *)
MaxPublished, (* Max number of published events. *)
Collaborator

I presume this constant is only relevant for model checking? It would be cleaner to separate the logical spec from the bounds imposed for model checking and either add a state constraint such as published < MaxPublished in the cfg file or write a MC version of the spec that adds extra guards to actions.

Contributor Author

Yes and no. I've investigated, and the constant does bound the model. So I agree that it would be better to have it as a state constraint. But then I can't model check the liveness property in the model (that all consumers eventually always read all published events), as liveness cannot be verified when state or action constraints are specified.
Is there a "clean" third option that you know of?

Member

Beware that "hardcoding" MaxPublished doesn't change anything semantically WRT liveness checking except that it makes TLC not print the warning about action and state-constraints.

Contributor Author

@nicholassm Sep 16, 2024

I'm not sure I understand. If I replace MaxPublished with a state constraint, I cannot make my liveness property fail (by adding an error to the spec).
As I read the relevant pages in Lamport's book, this is because with a state constraint the action a is still enabled at the bound but never taken, so WF_x(a) is false for those behaviors, which in turn means the liveness property never fails.
However, if I add the check against the MaxPublished constant as part of an action, then the action is not enabled, and therefore the liveness property can fail (if it's wrong).

Collaborator

Apologies for the late reply: now I am confused. First, I would rather have defined the liveness property as

Liveliness == \A r \in Readers : \A i \in 1 .. MaxPublished :
  <>[](i \in 1 .. published => Len(consumed[r]) >= i /\ consumed[r][i] = i-1)

(with the second conjunct on the right-hand side being optional). The reason is that the property that you assert will not hold if writers are allowed to continue publishing even after reaching the (artificial) bound, since readers would then be able to update their consumed sequence as well, whereas the above property should always hold. (Even more generally, the bound on i could be Nat, but that would require an override of Nat during model checking so that TLC doesn't complain about an infinite quantifier bound, and that would be a little heavy.)

I then tried commenting out the guard next < MaxPublished in BeginWrite and adding the constraint

StateConstraint == published <= MaxPublished

for model checking. This appears to work just fine, and when I comment out the fairness condition on BeginRead, TLC gives me the expected failure of the temporal property.

I would even suggest removing the fairness condition on BeginWrite from the spec, since why should writers be required to publish items forever? The above liveness property (but of course not the original one) continues to hold for the modified spec.

Of course, it's up to you to decide what you intend as your specification.
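
A minimal sketch of that setup, assuming the bound lives in a separate model-checking module: the module name MC_Disruptor_SPMC is hypothetical, while published, MaxPublished, StateConstraint and Liveliness are the names used in this thread. It also assumes MaxPublished stays declared as a constant in Disruptor_SPMC. The cfg lines are shown as comments.

```tla
---- MODULE MC_Disruptor_SPMC ----
\* Hypothetical model-checking wrapper; the logical spec itself stays unbounded.
EXTENDS Disruptor_SPMC

\* TLC does not explore beyond states that violate this predicate.
StateConstraint == published <= MaxPublished
====

\* In the accompanying .cfg file (sketch):
\*   CONSTRAINT StateConstraint
\*   PROPERTY   Liveliness
```

Keeping the bound out of BeginWrite means the action guards describe only the protocol, and the finite bound is purely a property of the model.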

Member

Stephan's Liveliness property would allow consumed to be modeled as a counter instead of a sequence (for each reader). This observation also led me to realize that the SPMC specification seems to describe the multicast configuration, since all readers consume every value. It might be helpful to include this in a comment for clarity.
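
To illustrate the counter idea, here is a sketch of how the property might read if consumed[r] were a natural-number count of the events read by reader r rather than a sequence. This is an assumption about a possible refactoring, not the merged spec, and ConsumedCountLiveness is a hypothetical name.

```tla
\* Assumes consumed \in [Readers -> Nat], incremented on each completed read.
ConsumedCountLiveness ==
    \A r \in Readers : \A i \in 1 .. MaxPublished :
        <>[](i \in 1 .. published => consumed[r] >= i)
```

The counter drops the optional conjunct about event contents, so it trades some checking power for a smaller state representation.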

Member

If consumed remains a sequence, you could add the following "action property" (with IsPrefix defined at https://github.com/tlaplus/CommunityModules/blob/master/modules/SequencesExt.tla#L219-L225):

\* We only ever append to the history variable consumed.
Increments ==
    [][\A r \in Readers: IsPrefix(consumed[r], consumed'[r])]_vars
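
For reference, a sketch of what wiring this property in might involve, assuming the CommunityModules are on TLC's library path (the CI command earlier in this thread includes deps/community/modules.jar). The existing EXTENDS list shown here is an assumption about the spec.

```tla
\* In the spec, add SequencesExt (from CommunityModules) to the EXTENDS list,
\* e.g. assuming the module already extends Naturals and Sequences:
EXTENDS Naturals, Sequences, SequencesExt

\* In the .cfg file, check the action property alongside the others:
\*   PROPERTY Increments
```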

Contributor Author

> Stephan's Liveliness property would allow to model consumed as a counter instead of a sequence (for each reader). This observation also led me to realize that the SPMC specification seems to describe the multicast configuration, since all readers are consuming every value. It might be helpful to include this in a comment for clarity.

Hi @lemmy, I wrote at the top that it's a Single Producer Multiple Consumer Disruptor; that's where I thought I had made it clear which configuration of the Disruptor it is. But perhaps I am assuming everyone knows that all consumers read all events (i.e. "multicast" behaviour), and that is not common knowledge?

Contributor Author

> Apologies for the late reply: now I am confused. First, I would rather have defined the liveness property as
>
> Liveliness == \A r \in Readers : \A i \in 1 .. MaxPublished :
>   <>[](i \in 1 .. published => Len(consumed[r]) >= i /\ consumed[r][i] = i-1)
>
> (with the second conjunct on the right-hand side being optional). The reason is that the property that you assert will not hold if writers are allowed to continue publishing even after reaching the (artificial) bound since readers would then be able to update their consumed sequence as well, whereas the above property should always hold. (Even more general, the bound on i could be Nat but that would require an override of Nat during model checking so that TLC doesn't complain about an infinite quantifier bound, and that would be a little heavy.)
>
> I then tried commenting out the guard next < MaxPublished in BeginWrite and adding the constraint
>
> StateConstraint == published <= MaxPublished
>
> for model checking. This appears to work just fine, and when I comment out the fairness condition on BeginRead, TLC gives me the expected failure of the temporal property.
>
> I would even suggest removing the fairness condition on BeginWrite from the spec, since why should writers be required to publish items forever? The above liveness property (but of course not the original one) continues to hold for the modified spec.
>
> Of course, it's up to you to decide what you intend as your specification.

I made the changes (see a new PR) and I quite like the result: getting rid of the artificial model constraint in the BeginWrite action, removing the requirement that producers publish forever, and moving the bounding of the model to a state constraint. Elegant, and a better model.

I still have some work to do for the MPMC model, as it's more complex to write the liveness property: there are multiple producers, and hence it is more difficult to express which sequence number a consumer can actually read. (More details will follow.)

Collaborator

Glad to hear that it worked out! I didn't look very much at the MPMC model, please let me know if you want me to.

specifications/Disruptor/Disruptor_SPMC.cfg (outdated, resolved)
specifications/Disruptor/Disruptor_SPMC.cfg (outdated, resolved)
specifications/Disruptor/Disruptor_SPMC.cfg (resolved)
specifications/Disruptor/Disruptor_SPMC.tla (resolved)
@nicholassm
Contributor Author

Hi @muenchnerkindl and @ahelwer - super nice with all the feedback - much appreciated.
I'll get busy incorporating your suggestions.

Signed-off-by: Nicholas Schultz-Møller <[email protected]>
@nicholassm
Contributor Author

Hi @muenchnerkindl, @ahelwer and @lemmy, I've fixed all issues related to running the model-checking and I think I've added all your good suggestions. Let me know if you think I can improve the contribution any further. :-)

@nicholassm
Contributor Author

Hi guys, anything else I can do to improve the contribution? While it's fresh in my head :-)
Many thanks.

@ahelwer merged commit 7ebd914 into tlaplus:master Sep 18, 2024
7 checks passed
published, (* Write cursor. One for the producer. *)
read, (* Read cursors. One per consumer. *)
consumed, (* Sequence of all read events by the Readers. *)
pc (* Program Counter of each Writer/Reader. *)
Member

Nit: Since the value for each process only alternates between Advance and Access, I would consider renaming the variable to something more descriptive, like hasAccess.

4 participants