synchronise churn with outbound governor #4854

Merged
merged 17 commits into from
Apr 22, 2024

@coot (Contributor) commented Apr 17, 2024

  • flake.nix: set -Werror GHC option for all project packages

  • testing-utils: added set properties

  • policies: declare {min,max}ChainSyncTimeout in the policies module

  • peer-selection: extended PeerSelectionCounters

  • churn: implemented explicit synchronisation

  • churn: added two simple testnet tests

  • peer-selection-tests: order imports

  • peer-selection: renamed peerStateToCounters

  • peer-selection: removed localRoots from counters

  • peer-selection: removed counters cached

  • churn: added ChurnCounters tracer

  • peer-selection: use HasCallStack where pickPeers is used

  • peer-selection: use peerSelectionStateToCounters

  • peer-selection: remove not needed subtraction

  • peer-selection: introduced PeerSelectionView

  • check-stylish: added ignore file

  • Branch

    • Updated changelog files.
    • Commit sequence broadly makes sense
    • Commits have useful messages
    • The documentation has been properly updated
    • New tests are added if needed and existing tests are updated
    • If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
  • Pull Request

    • Self-reviewed the diff
    • Useful pull request description at least containing the following information:
      • What does this PR change?
      • Why were these changes needed?
      • How does this affect downstream repositories and/or end-users?
      • Which ticket does this PR close (if any)? If it does, is it linked?
    • Reviewer requested

@coot requested review from newhoggy and a team as code owners April 17, 2024 09:27
@coot changed the title from coot/churn to synchronise churn with outbound governor Apr 17, 2024
@coot added the churn label (Issues / PRs related to churn) Apr 17, 2024
@coot self-assigned this Apr 17, 2024
@coot (Contributor Author) commented Apr 17, 2024

Effectiveness of PeerSelection tests.

Each comment describes a modification of the following expression and the test failures it caused. If there were many failures, I only noted how many tests failed. To execute the tests I ran:

```
cabal run ouroboros-network:sim-tests -- -p '/PeerSelection/'
```
```haskell
peerSelectionStateToView
  :: Ord peeraddr
  => PeerSelectionState peeraddr peerconn
  -> PeerSelectionSetsWithSizes peeraddr
peerSelectionStateToView
    PeerSelectionState {
        knownPeers,
        establishedPeers,
        activePeers,
        publicRootPeers,
        localRootPeers,
        inProgressPromoteCold,
        inProgressPromoteWarm,
        inProgressDemoteWarm,
        inProgressDemoteHot
      }
    =
    PeerSelectionView {
      viewRootPeers                          = size rootPeersSet,

      viewKnownPeers                         = size   knownPeersSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- failures
      viewAvailableToConnectPeers            = size $ availableToConnectSet
                                                      Set.\\ bigLedgerSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure
      viewColdPeersPromotions                = size $ inProgressPromoteCold
                                                      Set.\\ bigLedgerSet,
      viewEstablishedPeers                   = size   establishedPeersSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure
      viewWarmPeersDemotions                 = size $ inProgressDemoteWarm
                                                      Set.\\ bigLedgerSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure (exception when generating test case)
      -- ```
      -- *** Failed! (after 61 tests and 27 shrinks):
      -- Exception while generating shrink-list:
      --   Map.!: given key is not an element in the map
      --   CallStack (from HasCallStack):
      --     error, called at libraries/containers/containers/src/Data/Map/Internal.hs:622:17 in containers-0.6.8-7acc:Data.Map.Internal
      -- Exception thrown while showing test case:
      --   Map.!: given key is not an element in the map
      --   CallStack (from HasCallStack):
      --     error, called at libraries/containers/containers/src/Data/Map/Internal.hs:622:17 in containers-0.6.8-7acc:Data.Map.Internal
      -- ```
      viewWarmPeersPromotions                = size $ inProgressPromoteWarm
                                                      Set.\\ bigLedgerSet,
      -- `activePeersSet -> establishedPeersSet`
      -- * as above
      -- * ledger peers progresses towards established target (from above)
      -- * ledger peers progresses towards active target (from below)
      viewActivePeers                        = size $ activePeersSet,
      -- Removed `` `Set.intersection` inProgressDemoteHot ``
      -- * ledger peers progresses towards active target (from above)
      viewActivePeersDemotions               = size $ activePeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      -- Using Set.empty
      -- * PeerSelectionView invariant
      -- * safety no excess busyness
      viewKnownBigLedgerPeers                = size   bigLedgerSet,
      -- Removed `` `Set.intersection` bigLedgerSet ``
      -- 35 out of 80 tests failed :)
      viewAvailableToConnectBigLedgerPeers   = size $ availableToConnectSet
                                                      `Set.intersection` bigLedgerSet,
      -- Removed `` `Set.intersection` bigLedgerSet ``
      -- * big ledger peers progresses towards established target (from below)
      viewColdBigLedgerPeersPromotions       = size $ bigLedgerSet
                                                      `Set.intersection` inProgressPromoteCold,
      -- `establishedBigLedgerPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      -- * ...
      -- 33 out of 80 tests failed
      viewEstablishedBigLedgerPeers          = size   establishedBigLedgerPeersSet,
      -- Removed `` `Set.intersection` bigLedgerSet ``
      -- * PeerSelectionView invariant
      viewWarmBigLedgerPeersDemotions        = size $ inProgressDemoteWarm
                                                      `Set.intersection` bigLedgerSet,
      -- Removed `` `Set.intersection` bigLedgerSet ``
      -- * PeerSelectionView invariant
      viewWarmBigLedgerPeersPromotions       = size $ inProgressPromoteWarm
                                                      `Set.intersection` bigLedgerSet,
      -- `activeBigLedgerPeersSet -> establishedBigLedgerPeersSet`
      -- * PeerSelectionView invariant
      viewActiveBigLedgerPeers               = size   activeBigLedgerPeersSet,
      -- Removed `` `Set.intersection` inProgressDemoteHot ``
      -- * PeerSelectionView invariant
      viewActiveBigLedgerPeersDemotions      = size $ bigLedgerSet
                                                      `Set.intersection` inProgressDemoteHot,


      -- `knownBootstrapPeersSet -> Set.empty`
      -- * PeerSelectionView invariant
      viewKnownBootstrapPeers                = size   knownBootstrapPeersSet,
      -- `knownBootstrapPeersSet -> knownPeersSet`
      -- * PeerSelectionView invariant
      viewColdBootstrapPeersPromotions       = size $ knownBootstrapPeersSet
                                                      `Set.intersection` inProgressPromoteCold,
      -- `establishedBootstrapPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      viewEstablishedBootstrapPeers          = size   establishedBootstrapPeersSet,
      -- Removed `` `Set.intersection` inProgressDemoteWarm ``
      -- * PeerSelectionView invariant
      viewWarmBootstrapPeersDemotions        = size $ establishedBootstrapPeersSet
                                                      `Set.intersection` inProgressDemoteWarm,
      -- `establishedBootstrapPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      viewWarmBootstrapPeersPromotions       = size $ establishedBootstrapPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      -- `activeBootstrapPeersSet -> establishedBootstrapPeersSet`
      -- * PeerSelectionView invariant
      viewActiveBootstrapPeers               = size   activeBootstrapPeersSet,
      -- `activeBootstrapPeersSet -> establishedBootstrapPeersSet`
      -- PASSED
      viewActiveBootstrapPeersDemotions      = size $ activeBootstrapPeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      -- `establishedLocalRootsPeersSet -> activeLocalRootsPeersSet`
      -- 35 out of 80 tests failed
      viewEstablishedLocalRootPeers          = size   establishedLocalRootsPeersSet,
      -- Removed `` `Set.intersection` inProgressPromoteWarm ``
      -- 28 out of 80 tests failed
      viewWarmLocalRootPeersPromotions       = size $ establishedLocalRootsPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      -- `activeLocalRootsPeersSet -> establishedLocalRootsPeersSet`
      -- 34 out of 80 tests failed
      viewActiveLocalRootPeers               = size   activeLocalRootsPeersSet,
      -- Removed `` `Set.intersection` inProgressDemoteHot ``
      -- 35 out of 80 tests failed
      viewActiveLocalRootPeersDemotions      = size $ activeLocalRootsPeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      viewKnownSharedPeers                   = size   knownSharedPeersSet,
      viewColdSharedPeersPromotions          = size $ knownSharedPeersSet
                                                      `Set.intersection` inProgressPromoteCold,
      viewEstablishedSharedPeers             = size   establishedSharedPeersSet,
      viewWarmSharedPeersDemotions           = size $ establishedSharedPeersSet
                                                      `Set.intersection` inProgressDemoteWarm,
      viewWarmSharedPeersPromotions          = size $ establishedSharedPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      viewActiveSharedPeers                  = size   activeSharedPeersSet,
      viewActiveSharedPeersDemotions         = size $ activeSharedPeersSet
                                                      `Set.intersection` inProgressDemoteHot
    }
  -- (`where` clause defining `size`, `rootPeersSet`, `bigLedgerSet` and the other sets is omitted in this excerpt)
```
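The failures marked "PeerSelectionView invariant" above come from consistency checks between related fields of the view. A minimal sketch of that kind of check, under simplifying assumptions: a set of peers is split into its non-big-ledger part (`Set.\\ bigLedgerSet`) and its big-ledger part (`` `Set.intersection` bigLedgerSet ``), and the hypothetical `partitionInvariant` requires the two parts to be disjoint with sizes that add up. This is not the invariant used in the test suite; `splitByBigLedger`, `mutantSplit` and `partitionInvariant` are illustrative names.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- | Split a set of peers into its non-big-ledger part and its big-ledger
-- part, mirroring the `Set.\\ bigLedgerSet` / `Set.intersection` pairs above.
splitByBigLedger :: Ord peer => Set peer -> Set peer -> (Set peer, Set peer)
splitByBigLedger bigLedger s =
  (s Set.\\ bigLedger, s `Set.intersection` bigLedger)

-- | Hypothetical mutant that forgets to remove big ledger peers from the
-- first component, like the "Removed `Set.\\ bigLedgerSet`" modifications.
mutantSplit :: Ord peer => Set peer -> Set peer -> (Set peer, Set peer)
mutantSplit bigLedger s = (s, s `Set.intersection` bigLedger)

-- | Hypothetical invariant: the two parts are disjoint and their sizes add
-- up to the size of the original set.
partitionInvariant
  :: Ord peer
  => (Set peer -> Set peer -> (Set peer, Set peer))
  -> Set peer -> Set peer -> Bool
partitionInvariant split bigLedger s =
     Set.disjoint other big
  && Set.size other + Set.size big == Set.size s
  where
    (other, big) = split bigLedger s
```

`partitionInvariant splitByBigLedger (Set.fromList [8 .. 12]) (Set.fromList [1 .. 10 :: Int])` is `True`, while the same call with `mutantSplit` is `False`, because the mutant counts peers 8 to 10 in both parts.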

@coot added the outbound-governor label (Issues / PRs related to outbound-governor) Apr 18, 2024
@coot removed a link to an issue Apr 18, 2024
@coot linked an issue Apr 18, 2024 that may be closed by this pull request
Comment on lines +564 to 577
```diff
+      mbCounters <- atomically $ do
+        -- Update counters
+        counters <- readTVar countersVar
+        let !counters' = peerSelectionStateToCounters decisionState
+        if counters' /= counters
+          then writeTVar countersVar counters'
+            >> return (Just counters')
+          else return Nothing
+
+      -- Trace counters
+      traverse_ (traceWith countersTracer) mbCounters
+
       traverse_ (traceWith tracer) decisionTrace
-      traceWithCache countersTracer
-                     (countersCache decisionState)
-                     newCounters
```
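The new code stores the freshly computed counters in `countersVar` and traces them only when they differ from the stored value. A self-contained sketch of that pattern, assuming a hypothetical three-field `Counters` record in place of `PeerSelectionCounters` and an `IORef` that collects traced values in place of `countersTracer`; STM comes from `GHC.Conc` in `base`.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Foldable (traverse_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Hypothetical stand-in for PeerSelectionCounters.
data Counters = Counters
  { numKnown :: !Int, numEstablished :: !Int, numActive :: !Int }
  deriving (Eq, Show)

-- | Store the new counters and return them only if they changed.
-- The bang forces the new value before comparing, like `let !counters'`.
updateCounters :: TVar Counters -> Counters -> IO (Maybe Counters)
updateCounters countersVar !counters' = atomically $ do
  counters <- readTVar countersVar
  if counters' /= counters
    then writeTVar countersVar counters' >> return (Just counters')
    else return Nothing

-- | One governor step: store, then trace on change. The IORef stands in
-- for a real tracer and collects traced values (newest first).
step :: TVar Counters -> IORef [Counters] -> Counters -> IO ()
step countersVar traced counters' = do
  mbCounters <- updateCounters countersVar counters'
  traverse_ (\c -> modifyIORef' traced (c :)) mbCounters

-- | Run steps starting from all-zero counters; return traces oldest first.
runSteps :: [Counters] -> IO [Counters]
runSteps steps = do
  countersVar <- newTVarIO (Counters 0 0 0)
  traced      <- newIORef []
  mapM_ (step countersVar traced) steps
  reverse <$> readIORef traced
```

For distinct `c1` and `c2`, `runSteps [c1, c1, c2]` traces `[c1, c2]`: the repeated value is neither stored nor traced a second time.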

@crocodile-dentist (Contributor) commented Apr 19, 2024

Maybe we could rethink how we trace these things. If my understanding is correct, each decision changes only a few things in the state, but here it looks like we recompute all the counters just to trace them. Maybe we could recompute only the counters that changed in this step, and the tracer could combine the unchanged parts with the ones we had to recalculate.

Maybe this could be a separate issue, since the patch looks correct.

Contributor Author

We don't know which things have changed, so we would need to recompute them anyway. Maybe there's something clever I am missing, though.

Contributor

Just thinking out loud, but maybe the job/monitoring action that records a change could return some sort of lens that references the fields it has touched, along with the entire counters record (most of which would remain unevaluated). The code here could then apply this lens to the left- and right-hand sides just to see whether there is indeed a difference, and also use the lens to update only the changed counters in countersVar.
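The idea can be sketched without a lens library by pairing a getter with a setter for each field a job touched. Everything below (`Counters`, `Field`, `field`, `mergeTouched`) is hypothetical and ignores the dependencies between `PeerSelectionCounters` fields; only the touched fields are compared and copied, so the untouched fields of the job's record are never forced.

```haskell
-- Hypothetical counters record (lazy fields on purpose).
data Counters = Counters { hot :: Int, warm :: Int, cold :: Int }
  deriving (Eq, Show)

-- | A poor man's lens over one field of 'Counters'.
data Field = Field
  { fieldEq  :: Counters -> Counters -> Bool      -- ^ does the field agree?
  , fieldSet :: Counters -> Counters -> Counters  -- ^ copy it from new into old
  }

field :: Eq a => (Counters -> a) -> (a -> Counters -> Counters) -> Field
field get set = Field
  { fieldEq  = \new old -> get new == get old
  , fieldSet = \new old -> set (get new) old
  }

hotF, warmF, coldF :: Field
hotF  = field hot  (\x c -> c { hot  = x })
warmF = field warm (\x c -> c { warm = x })
coldF = field cold (\x c -> c { cold = x })

-- | Compare only the touched fields; if any differs, copy just those
-- fields into the stored record and return it.
mergeTouched :: [Field] -> Counters -> Counters -> Maybe Counters
mergeTouched touched new old
  | all (\f -> fieldEq f new old) touched = Nothing
  | otherwise = Just (foldr (\f acc -> fieldSet f new acc) old touched)
```

With `old = Counters 1 2 3` and `new = Counters 1 5 undefined`, `mergeTouched [warmF] new old` is `Just (Counters 1 5 3)` without ever evaluating `cold new`.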

Contributor Author

A fancy idea: we could use NoThunks (or its approach) to build a monoid that combines two records, picking the evaluated thunks.
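Detecting already-evaluated fields requires inspecting thunks at runtime, which is what the NoThunks approach relies on. A substitute that avoids runtime introspection, swapped in here purely for illustration, makes "recomputed" explicit: a partial record with `Maybe` fields, a left-biased `Semigroup` to combine steps, and an overlay onto the previous counters. `PartialCounters` and `applyPartial` are hypothetical names, and the sketch assumes the fields are independent of each other.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

data Counters = Counters { hot :: Int, warm :: Int, cold :: Int }
  deriving (Eq, Show)

-- | Fields a step recomputed are 'Just'; the others are 'Nothing'.
data PartialCounters = PartialCounters
  { pHot :: Maybe Int, pWarm :: Maybe Int, pCold :: Maybe Int }
  deriving (Eq, Show)

-- | Left-biased: the left (newer) recomputation wins.
instance Semigroup PartialCounters where
  PartialCounters a b c <> PartialCounters a' b' c' =
    PartialCounters (a <|> a') (b <|> b') (c <|> c')

instance Monoid PartialCounters where
  mempty = PartialCounters Nothing Nothing Nothing

-- | Overlay the recomputed fields on the previous counters.
applyPartial :: PartialCounters -> Counters -> Counters
applyPartial p c = Counters
  { hot  = fromMaybe (hot c)  (pHot p)
  , warm = fromMaybe (warm c) (pWarm p)
  , cold = fromMaybe (cold c) (pCold p)
  }
```

Combining two steps and applying them, `applyPartial (newer <> older) previous`, keeps every field that neither step recomputed.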

Contributor Author

We have quite a few relationships between PeerSelectionCounters record fields which we'd need to preserve to really avoid recomputing things. This would require some fancy machinery to derive the right instance, which might be a nice challenge. Here's a basic idea which seems to work when there are no dependencies between fields.

Testing will also be non-trivial, because the constructed record will not be free of thunks (so NoThunks cannot be used), simply because this requires Generics, which will leave us with fields that are thunks themselves (although they point to computed values).

Contributor Author

BTW, I don't think the nothunks approach is something we'd want in production, because it relies on unsafePerformIO and a rather fancy GHC runtime API. If we see an unacceptable performance regression, we'll look into caching parts of the data structure as you propose. I hope this won't be a problem, because these structures are rather small (fewer than 1000 entries, and in a standard configuration fewer than 100).

Contributor

The method you propose is very clever, but as you say, maybe it won't be a problem. We could open a separate issue to run some benchmarks and convince ourselves that the impact is indeed negligible; if it isn't, we can address it separately.


coot added 6 commits April 19, 2024 19:17
`PeerSelectionCounters` now provide raw data in terms of sizes of the active
/ established / known sets.  Added the `PeerSelectionCountersHWC` pattern
synonym, which calculates sizes in terms of hot / warm / cold sets.

The counters include:
* public roots (excluding big ledger peers)
* big ledger peers
* bootstrap peers
* local roots
* shared peers (e.g. peers received through peer sharing)

Co-authored-by: Armando Santos (@bolt12)
Co-authored-by: Marcin Szamotulski (@coot)
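Because active peers are a subset of established peers, which are a subset of known peers, hot / warm / cold sizes follow from the raw sizes by subtraction. A minimal sketch of a read-only pattern synonym doing that, assuming a simplified three-field record; `Counters` and `CountersHWC` are hypothetical stand-ins for `PeerSelectionCounters` and `PeerSelectionCountersHWC`, whose real fields cover many more peer groups.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns    #-}

-- | Raw sizes: active is a subset of established, which is a subset of known.
data Counters = Counters
  { numberOfKnown       :: Int
  , numberOfEstablished :: Int
  , numberOfActive      :: Int
  } deriving (Eq, Show)

-- | hot = active, warm = established but not active,
-- cold = known but not established.
toHWC :: Counters -> (Int, Int, Int)
toHWC (Counters known established active) =
  (active, established - active, known - established)

-- | Read-only view of 'Counters' in hot / warm / cold terms.
pattern CountersHWC :: Int -> Int -> Int -> Counters
pattern CountersHWC hot warm cold <- (toHWC -> (hot, warm, cold))
{-# COMPLETE CountersHWC #-}

describe :: Counters -> String
describe (CountersHWC hot warm cold) =
  "hot=" ++ show hot ++ " warm=" ++ show warm ++ " cold=" ++ show cold
```

`describe (Counters 10 4 1)` yields `"hot=1 warm=3 cold=6"`. Making the pattern read-only (`<-`) keeps the raw sizes as the single source of truth.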
Churn now explicitly synchronises with the outbound governor using
`PeerSelectionCounters`.  Each churn action can time out.

Co-authored-by: Armando Santos (@bolt12)
Co-authored-by: Marcin Szamotulski (@coot)
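One way to picture this synchronisation: a churn action changes a target and then blocks in STM until the counters published by the governor reach it, giving up after a timeout. A minimal sketch, assuming a single `Int` counter in a `TVar` and hypothetical helpers `waitForCounter` and `simulateGovernor`; the real churn waits on `PeerSelectionCounters` with its own timeouts.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, retry, writeTVar)
import System.Timeout (timeout)

-- | Block until the counter published by the governor satisfies @done@,
-- or give up after @micros@ microseconds (returning 'Nothing').
waitForCounter :: Int -> TVar Int -> (Int -> Bool) -> IO (Maybe Int)
waitForCounter micros countersVar done =
  timeout micros $ atomically $ do
    n <- readTVar countersVar
    unless (done n) retry   -- sleep until countersVar changes, then re-check
    return n

-- | Stand-in for the outbound governor: publish a new value after a delay.
simulateGovernor :: TVar Int -> Int -> Int -> IO ()
simulateGovernor countersVar delayMicros newValue = do
  _ <- forkIO $ do
    threadDelay delayMicros
    atomically (writeTVar countersVar newValue)
  return ()

-- | The governor reaches 5 in time; 0 is never reached, so the second
-- wait times out.
demo :: IO (Maybe Int, Maybe Int)
demo = do
  countersVar <- newTVarIO 10
  simulateGovernor countersVar 10000 5
  reached  <- waitForCounter 2000000 countersVar (<= 5)
  timedOut <- waitForCounter 10000 countersVar (<= 0)
  return (reached, timedOut)
```

`demo` should return `(Just 5, Nothing)`. Using `retry` lets the transaction sleep until the `TVar` changes instead of polling it.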
coot added 10 commits April 19, 2024 19:17
Also export it from the `Governor` module; it is useful in `cardano-node`.
`localRoots` didn't count local connections but the local root targets.  We
don't expose other targets in EKG metrics, so there's no reason to include
local root targets.
Since `PeerSelectionCounters` are stored in a `TVar` we don't need to
cache them in `PeerSelectionState`.
Use `peerSelectionStateToCounters` to compute the numbers of peers over
which the outbound governor makes decisions.
Local roots are always disjoint from big ledger peers.  This is ensured
both when we add new big ledger peers and when the local roots change,
so there's no need to subtract them in `EstablishedPeers.aboveTargetOther`.
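The argument rests on a simple set identity: if two sets are disjoint, subtracting one from the other changes nothing. A tiny sketch with `Data.Set`; `subtractionIsNoop` is a hypothetical name, and the property is stated as an implication so it holds for every pair of sets.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- | Whenever local roots and big ledger peers are disjoint, removing the
-- big ledger peers from the local roots leaves the local roots unchanged.
subtractionIsNoop :: Ord peer => Set peer -> Set peer -> Bool
subtractionIsNoop localRoots bigLedger =
  not (Set.disjoint localRoots bigLedger)
    || (localRoots Set.\\ bigLedger) == localRoots
```

For example, `Set.fromList [1, 2] Set.\\ Set.fromList [3, 4]` is just `Set.fromList [1, 2]`.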
PeerSelectionView is a generalisation of PeerSelectionCounters useful
internally in the outbound governor.  It lets us avoid duplicating the
logic of computing counters separately for churn and the outbound
governor, a duplication that could easily introduce bugs.
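A common way to share such logic is to parameterise the record by its field type: build it once from sets, then `fmap` to get sizes. A sketch under that assumption; `View`, `toCounters` and `withSizes` are hypothetical and simplified to three fields, not the actual `PeerSelectionView` definition.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import           Data.Set (Set)
import qualified Data.Set as Set

-- | A view parameterised by what each field holds: sets, sizes, or both.
data View a = View
  { viewKnown       :: a
  , viewEstablished :: a
  , viewActive      :: a
  } deriving (Eq, Show, Functor)

-- | Counters are derived from the sets in one place, so two consumers
-- (e.g. churn and the governor) cannot compute them differently.
toCounters :: View (Set peer) -> View Int
toCounters = fmap Set.size

-- | Each set paired with its size.
withSizes :: View (Set peer) -> View (Set peer, Int)
withSizes = fmap (\s -> (s, Set.size s))
```

Since both functions are `fmap`s over the same view, `fmap snd (withSizes v)` always equals `toCounters v`.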
`Ouroboros.Network.PeerSelection.Governor.Type` is excluded from
`check-stylish` to preserve the large export list of record names in pattern
synonyms.
@coot added this pull request to the merge queue Apr 22, 2024
@github-merge-queue (bot) removed this pull request from the merge queue due to no response for status checks Apr 22, 2024
@coot added this pull request to the merge queue Apr 22, 2024
Merged via the queue into master with commit 46a691f Apr 22, 2024
13 checks passed
@coot deleted the coot/churn branch April 22, 2024 17:35
Labels: churn (Issues / PRs related to churn), outbound-governor (Issues / PRs related to outbound-governor)

Development: successfully merging this pull request may close these issues:

Outbound Governor Counters Synchronisation in churn used by outbound governor
2 participants