synchronise churn with outbound governor #4854

coot · 2024-04-17T09:27:30Z

flake.nix: set -Werror GHC option for all project packages
testing-utils: added set properties
policies: declare {min,max}ChainSyncTimeout in the policies module
peer-selection: extended PeerSelectionCounters
churn: implemented explicit synchronisation
churn: added two simple testnet tests
peer-selection-tests: order imports
peer-selection: renamed peerStateToCounters
peer-selection: removed localRoots from counters
peer-selection: removed counters cached
churn: added ChurnCounters tracer
peer-selection: use HasCallStack where pickPeers is used
peer-selection: use peerSelectionStateToCounters
peer-selection: remove not needed subtraction
peer-selection: introduced PeerSelectionView
check-stylish: added ignore file
Branch
- Updated changelog files.
- Commit sequence broadly makes sense
- Commits have useful messages
- The documentation has been properly updated
- New tests are added if needed and existing tests are updated
- If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
Pull Request
- Self-reviewed the diff
- Useful pull request description at least containing the following information:
  - What does this PR change?
  - Why these changes were needed?
  - How does this affect downstream repositories and/or end-users?
  - Which ticket does this PR close (if any)? If it does, is it linked?
- Reviewer requested

Stolen from `ouroboros-consensus`.

coot · 2024-04-17T17:07:07Z

Effectiveness of `PeerSelection` tests.

Each comment describes a modification of the following expression and a list of test failures it caused. If there were a lot of failures, I only mentioned how many tests failed. To execute the tests I run:

cabal run ouroboros-network:sim-tests -- -p '/PeerSelection/'

peerSelectionStateToView
  :: Ord peeraddr
  => PeerSelectionState peeraddr peerconn
  -> PeerSelectionSetsWithSizes peeraddr
peerSelectionStateToView
    PeerSelectionState {
        knownPeers,
        establishedPeers,
        activePeers,
        publicRootPeers,
        localRootPeers,
        inProgressPromoteCold,
        inProgressPromoteWarm,
        inProgressDemoteWarm,
        inProgressDemoteHot
      }
    =
    PeerSelectionView {
      viewRootPeers                          = size rootPeersSet,

      viewKnownPeers                         = size   knownPeersSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- failures
      viewAvailableToConnectPeers            = size $ availableToConnectSet
                                                      Set.\\ bigLedgerSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure
      viewColdPeersPromotions                = size $ inProgressPromoteCold
                                                      Set.\\ bigLedgerSet,
      viewEstablishedPeers                   = size   establishedPeersSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure
      viewWarmPeersDemotions                 = size $ inProgressDemoteWarm
                                                      Set.\\ bigLedgerSet,
      -- Removed `Set.\\ bigLedgerSet`
      -- only PeerSelectionView invariant failure (exception when generating test case)
      -- ```
      -- *** Failed! (after 61 tests and 27 shrinks):
      -- Exception while generating shrink-list:
      --   Map.!: given key is not an element in the map
      --   CallStack (from HasCallStack):
      --     error, called at libraries/containers/containers/src/Data/Map/Internal.hs:622:17 in containers-0.6.8-7acc:Data.Map.Internal
      -- Exception thrown while showing test case:
      --   Map.!: given key is not an element in the map
      --   CallStack (from HasCallStack):
      --     error, called at libraries/containers/containers/src/Data/Map/Internal.hs:622:17 in containers-0.6.8-7acc:Data.Map.Internal
      -- ```
      viewWarmPeersPromotions                = size $ inProgressPromoteWarm
                                                      Set.\\ bigLedgerSet,
      -- `activePeerSet -> establishedPeerSet`
      -- * as above
      -- * ledger peers progresses towards established target (from above)
      -- * ledger peers progresses towards active target (from below)
      viewActivePeers                        = size $ activePeersSet,
      -- Removed ``Set.intersection` inProgressDemoteHot`
      -- * ledger peers progresses towards active target (from above)
      viewActivePeersDemotions               = size $ activePeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      -- Using Set.empty
      -- * PeerSelectionView invariant
      -- * safety no excess busyness
      viewKnownBigLedgerPeers                = size   bigLedgerSet,
      -- Removed ``Set.intersection` bigLedgerSet`
      -- 35 out of 80 tests failed :) 
      viewAvailableToConnectBigLedgerPeers   = size $ availableToConnectSet
                                                      `Set.intersection` bigLedgerSet,
      -- Removed ``Set.intersection` bigLedgerSet`
      -- * big ledger peers progresses towards established target (from below)
      viewColdBigLedgerPeersPromotions       = size $ bigLedgerSet
                                                      `Set.intersection` inProgressPromoteCold,
      -- `establishedBigLedgerPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      -- * ...
      -- 33 out of 80 tests failed
      viewEstablishedBigLedgerPeers          = size   establishedBigLedgerPeersSet,
      -- Removed ``Set.intersection` bigLedgerSet`
      -- * PeerSelectionView invariant
      viewWarmBigLedgerPeersDemotions        = size $ inProgressDemoteWarm
                                                      `Set.intersection` bigLedgerSet,
      -- Removed ``Set.intersection` bigLedgerSet`
      -- * PeerSelectionView invariant
      viewWarmBigLedgerPeersPromotions       = size $ inProgressPromoteWarm
                                                      `Set.intersection` bigLedgerSet,
      -- `activeBigLedgerPeerSet -> establishedBigLedgerPeerSet`
      -- * PeerSelectionView invariant
      viewActiveBigLedgerPeers               = size   activeBigLedgerPeersSet,
      -- Removed ``Set.intersection` inProgressDemoteHot`
      -- * PeerSelectionView invariant
      viewActiveBigLedgerPeersDemotions      = size $ bigLedgerSet
                                                      `Set.intersection` inProgressDemoteHot,


      -- `knownBootstrapPeersSet -> Set.empty`
      -- * PeerSelectionView invariant
      viewKnownBootstrapPeers                = size   knownBootstrapPeersSet,
      -- `knownBootstrapPeers -> knownPeersSet`
      -- * PeerSelectionView invariant
      viewColdBootstrapPeersPromotions       = size $ knownBootstrapPeersSet
                                                      `Set.intersection` inProgressPromoteCold,
      -- `establishedBootstrapPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      viewEstablishedBootstrapPeers          = size   establishedBootstrapPeersSet,
      -- Removed ``Set.intersection` inProgressDemoteWarm`
      -- * PeerSelectionView invariant
      viewWarmBootstrapPeersDemotions        = size $ establishedBootstrapPeersSet
                                                      `Set.intersection` inProgressDemoteWarm,
      -- `establishedBootstrapPeersSet -> establishedPeersSet`
      -- * PeerSelectionView invariant
      viewWarmBootstrapPeersPromotions       = size $ establishedBootstrapPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      -- `activeBootstrapPeersSet -> establishedBootstrapPeersSet`
      -- * PeerSelectionView invariant
      viewActiveBootstrapPeers               = size   activeBootstrapPeersSet,
      -- PASSED
      -- `activeBootstrapPeersSet -> establishedBootstrapPeersSet`
      viewActiveBootstrapPeersDemotions      = size $ activeBootstrapPeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      -- `establishedLocalRootsPeersSet -> activeLocalRootsPeersSet`
      -- 35 out of 80 tests failed
      viewEstablishedLocalRootPeers          = size $ activeLocalRootsPeersSet,
      -- Removed ``Set.intersection` inProgressPromoteWarm`
      -- 28 out of 80 tests failed
      viewWarmLocalRootPeersPromotions       = size $ establishedLocalRootsPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      -- `activeLocalRootsPeersSet -> establishedLocalRootsPeersSet`
      -- 34 out of 80 tests failed
      viewActiveLocalRootPeers               = size   establishedLocalRootsPeersSet,
      -- Removed ``Set.intersection` inProgressDemoteHot`
      -- 35 out of 80 tests failed
      viewActiveLocalRootPeersDemotions      = size $ activeLocalRootsPeersSet
                                                      `Set.intersection` inProgressDemoteHot,

      viewKnownSharedPeers                   = size   knownSharedPeersSet,
      viewColdSharedPeersPromotions          = size $ knownSharedPeersSet
                                                      `Set.intersection` inProgressPromoteCold,
      viewEstablishedSharedPeers             = size   establishedSharedPeersSet,
      viewWarmSharedPeersDemotions           = size $ establishedSharedPeersSet
                                                      `Set.intersection` inProgressDemoteWarm,
      viewWarmSharedPeersPromotions          = size $ establishedSharedPeersSet
                                                      `Set.intersection` inProgressPromoteWarm,
      viewActiveSharedPeers                  = size   activeSharedPeersSet,
      viewActiveSharedPeersDemotions         = size $ activeSharedPeersSet
                                                      `Set.intersection` inProgressDemoteHot
    }
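
The experiment above is a manual form of mutation testing: change one field and see which properties notice. As a rough illustration of why dropping `Set.\\ bigLedgerSet` can trip a view invariant, here is a minimal sketch. `MiniView`, `toView`, `toViewMutant` and `invariant` are hypothetical names for this example only; the real `PeerSelectionView` invariant is more involved.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical, cut-down view: known peers split into big ledger peers
-- and everything else.
data MiniView = MiniView
  { viewKnownOther     :: Int
  , viewKnownBigLedger :: Int
  , viewKnownTotal     :: Int
  } deriving (Eq, Show)

-- Correct version: the two categories partition the known set.
toView :: Ord a => Set a -> Set a -> MiniView
toView known bigLedger = MiniView
  { viewKnownOther     = Set.size (known Set.\\ bigLedger)
  , viewKnownBigLedger = Set.size (known `Set.intersection` bigLedger)
  , viewKnownTotal     = Set.size known
  }

-- Mutant: `Set.\\ bigLedger` removed, as in the experiments above.
toViewMutant :: Ord a => Set a -> Set a -> MiniView
toViewMutant known bigLedger =
  (toView known bigLedger) { viewKnownOther = Set.size known }

-- Hypothetical invariant: the categories must add up to the total.
invariant :: MiniView -> Bool
invariant v = viewKnownOther v + viewKnownBigLedger v == viewKnownTotal v
```

With at least one big ledger peer in the known set, the mutant double counts it and the invariant fails, which matches the "only PeerSelectionView invariant failure" entries.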

ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/Types.hs

ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/EstablishedPeers.hs

crocodile-dentist · 2024-04-19T10:28:42Z

ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor.hs

+      mbCounters <- atomically $ do
+        -- Update counters
+        counters <- readTVar countersVar
+        let !counters' = peerSelectionStateToCounters decisionState
+        if counters' /= counters
+          then writeTVar countersVar counters'
+            >> return (Just counters')
+          else return Nothing
+
+      -- Trace counters
+      traverse_ (traceWith countersTracer) mbCounters
+
      traverse_ (traceWith tracer) decisionTrace
-      traceWithCache countersTracer
-                     (countersCache decisionState)
-                     newCounters
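
The added lines follow a "write only on change" pattern: compare inside the transaction, trace outside of it. A minimal standalone sketch of that pattern with plain `stm` is given here; `updateIfChanged` is a hypothetical helper, not a function from this patch.

```haskell
import Control.Concurrent.STM

-- | Store the new value only if it differs from the current one.
-- Returns 'Just' the new value when a write happened, so the caller can
-- trace it outside of the STM transaction.
updateIfChanged :: Eq a => TVar a -> a -> STM (Maybe a)
updateIfChanged var new = do
  old <- readTVar var
  if new /= old
    then writeTVar var new >> return (Just new)
    else return Nothing
```

Returning `Maybe` keeps the trace call out of `atomically` and avoids emitting a trace event when the counters did not change.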



Maybe we could rethink how we trace these things. Overall, if my understanding is correct, each decision changes only a few things in the state, but here it looks like we re-compute everything to trace. Maybe we could recompute only the things that did change in this step, and the tracer could take the parts that did not change and add those parts that we had to re-calculate.

Maybe this could be a separate issue, since the patch looks correct.

But we don't know which things have changed, so we'll need to recompute them anyway. But maybe there's something clever I am missing.

Just thinking out loud, but maybe the job/monitoring action that records a change could return some sort of lens that references which fields it has touched, along with the entire counters record (most of which would remain unevaluated). Then code here could apply this lens to left and right hand sides just to see if there is indeed a difference, and also use the lens to update just the changed counters in countersVar.
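
A rough sketch of what that could look like, assuming a cut-down `Counters` record and a hand-rolled getter/setter pair instead of a lens library. All names here are hypothetical, and it ignores the dependencies between fields that the real record has.

```haskell
-- Hypothetical, cut-down counters record.
data Counters = Counters
  { numKnown       :: Int
  , numEstablished :: Int
  , numActive      :: Int
  } deriving (Eq, Show)

-- A poor man's lens: a getter and a setter for one 'Int' field.
data Field = Field
  { getF :: Counters -> Int
  , setF :: Int -> Counters -> Counters
  }

knownF, establishedF, activeF :: Field
knownF       = Field numKnown       (\x c -> c { numKnown = x })
establishedF = Field numEstablished (\x c -> c { numEstablished = x })
activeF      = Field numActive      (\x c -> c { numActive = x })

-- | Copy only the touched fields from the freshly computed record into the
-- stored one.  Untouched fields of 'new' are never forced.  Returns
-- 'Nothing' when none of the touched fields actually changed.
applyTouched :: [Field] -> Counters -> Counters -> Maybe Counters
applyTouched touched new old
  | changed   = Just (foldr copy old touched)
  | otherwise = Nothing
  where
    changed = any (\f -> getF f new /= getF f old) touched
    copy f  = setF f (getF f new)
```

A job that only touched established peers would return `[establishedF]` together with the lazily built record, and the caller would store the result of `applyTouched` only when it is a `Just`.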

A fancy idea: we could use NoThunks (or its approach) to build a monoid that combines two records, picking the evaluated thunks.
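
The thunk-inspecting version needs runtime support. As a simpler stand-in (explicitly not the NoThunks approach itself), the combining step can be illustrated by making "recomputed or not" explicit with `Maybe`. `Delta`, `Counters` and `applyDelta` are hypothetical names for this sketch.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Hypothetical partial update: 'Nothing' means "not recomputed in this step".
data Delta = Delta
  { dKnown       :: Maybe Int
  , dEstablished :: Maybe Int
  , dActive      :: Maybe Int
  } deriving (Eq, Show)

-- Left-biased: a value recomputed on the left wins over the right.
instance Semigroup Delta where
  Delta a b c <> Delta a' b' c' = Delta (a <|> a') (b <|> b') (c <|> c')

instance Monoid Delta where
  mempty = Delta Nothing Nothing Nothing

-- Hypothetical full record of counters.
data Counters = Counters
  { numKnown       :: Int
  , numEstablished :: Int
  , numActive      :: Int
  } deriving (Eq, Show)

-- | Fill the gaps of a delta from the previously stored counters.
applyDelta :: Delta -> Counters -> Counters
applyDelta (Delta a b c) (Counters k e h) =
  Counters (fromMaybe k a) (fromMaybe e b) (fromMaybe h c)
```

This trades the runtime trick for extra bookkeeping in every place that produces a `Delta`.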

We have quite a few relationships between `PeerSelectionCounters` record fields which we'd need to preserve to really avoid recomputing things. This would require some fancy machinery to derive the right instance, which might be a nice challenge. Here's a basic idea which seems to work when there are no dependencies between fields.

Testing will also be non-trivial, because the constructed record will not be free of thunks (so NoThunks cannot be used): this approach requires Generics, and `to` will leave us with fields which are thunks themselves (although they will point to computed values).

btw, I don't think that the nothunks approach is something we'd like to include in production because it relies on unsafePerformIO and some rather fancy GHC runtime API. If we have a performance regression which cannot be accepted, we'll look into how to cache part of the data structure as you propose. I hope that this won't be a problem because these structures are rather small (e.g. less than 1000 entries and in standard configuration even less than 100).

The method you propose is very clever, but as you say maybe it won't be a problem. We could create a separate issue to perform some benchmarks to convince ourselves that the impact is indeed negligible, otherwise we can try to resolve it separately.

ouroboros-network-testing/src/Ouroboros/Network/Testing/Utils.hs

ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/Types.hs

@bolt12

`PeerSelectionCounters` now provide raw data in terms of sizes of active / established / known sets. Added `PeerSelectionCountersHWC` pattern synonym, which calculates sizes in terms of hot / warm / cold sets.

The counters include:
* public roots (excluding big ledger peers)
* big ledger peers
* bootstrap peers
* local roots
* shared peers (e.g. peers received through peer sharing)

Co-authored-by: Armando Santos (@bolt12)
Co-authored-by: Marcin Szamotulski (@coot)
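
For readers unfamiliar with the technique, here is a minimal sketch of a view-pattern based pattern synonym that derives hot / warm / cold sizes from raw set sizes. The record, `toHWC` and `CountersHWC` are hypothetical, and the arithmetic assumes active is a subset of established, which is a subset of known; the real `PeerSelectionCountersHWC` covers many more fields.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns    #-}

-- Hypothetical, cut-down counters: raw sizes of nested sets
-- (active is a subset of established, which is a subset of known).
data Counters = Counters
  { numKnown       :: Int
  , numEstablished :: Int
  , numActive      :: Int
  } deriving (Eq, Show)

-- | Derive (hot, warm, cold) sizes from the raw set sizes.
toHWC :: Counters -> (Int, Int, Int)
toHWC (Counters known established active) =
  (active, established - active, known - established)

-- | Unidirectional pattern synonym: match on hot / warm / cold directly.
pattern CountersHWC :: Int -> Int -> Int -> Counters
pattern CountersHWC hot warm cold <- (toHWC -> (hot, warm, cold))

{-# COMPLETE CountersHWC #-}

describe :: Counters -> String
describe (CountersHWC hot warm cold) =
  "hot=" ++ show hot ++ " warm=" ++ show warm ++ " cold=" ++ show cold
```

Consumers that think in hot / warm / cold terms can keep pattern matching that way while the record itself stores only the raw sizes.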

@bolt12

Churn now explicitly synchronises with the outbound governor using `PeerSelectionCounters`. Each churn action can time out.

Co-authored-by: Armando Santos (@bolt12)
Co-authored-by: Marcin Szamotulski (@coot)
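
The general shape of "wait for the counters to reach a condition, but give up after a timeout" can be sketched with plain `stm` and `System.Timeout`. This is only an illustration under those assumptions: `waitForCounters` and the cut-down `Counters` are hypothetical, and the actual churn code may be structured differently.

```haskell
import Control.Concurrent.STM
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- Hypothetical, cut-down counters.
data Counters = Counters
  { numEstablished :: Int
  , numActive      :: Int
  } deriving (Eq, Show)

-- | Block until the counters satisfy the predicate, or give up after the
-- given number of microseconds.  Returns 'True' if the condition was met.
waitForCounters :: Int -> TVar Counters -> (Counters -> Bool) -> IO Bool
waitForCounters micros countersVar p = do
  r <- timeout micros . atomically $ do
         counters <- readTVar countersVar
         check (p counters)   -- retries until another thread updates the TVar
  return (isJust r)
```

A churn step would first adjust the targets and then call something like `waitForCounters` with a predicate describing the state it expects the governor to reach.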

Also export it from the `Governor` module; it is useful in `cardano-node`.

`localRoots` didn't count local connections, but the targets. We don't expose other targets in EKG metrics, so there's no reason to actually include local root targets.

Since `PeerSelectionCounters` are stored in a `TVar` we don't need to cache them in `PeerSelectionState`.

Use `peerSelectionStateToCounters` to compute numbers of peers over which outbound-governor is making decisions.

Local roots are always disjoint from big ledger peers. This is ensured both when we add new big ledger peers and when the local roots change, so there's no need to subtract them in `EstablishedPeers.aboveTargetOther`.
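
The set-level fact behind this commit, as a small checkable property (`subtractionIsNoOp` is a hypothetical name for this sketch):

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- | If two sets are disjoint, subtracting one from the other changes nothing.
-- The property holds vacuously when the sets overlap.
subtractionIsNoOp :: Ord a => Set a -> Set a -> Bool
subtractionIsNoOp localRoots bigLedgerPeers =
  not (Set.disjoint localRoots bigLedgerPeers)
    || localRoots Set.\\ bigLedgerPeers == localRoots
```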

`PeerSelectionView` is a generalisation of `PeerSelectionCounters` which is useful internally in the outbound governor. It lets us avoid duplicating the logic of computing counters separately for churn and for the outbound governor; such duplication could easily introduce bugs.
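
A minimal sketch of the idea, assuming a three-field view parameterised over what each field holds. `View`, `SetsWithSizes`, `Counters`, `toSetsWithSizes` and `toCounters` are hypothetical, cut-down stand-ins for the real types.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical, cut-down view, parameterised over what each field holds.
data View a = View
  { viewKnown       :: a
  , viewEstablished :: a
  , viewActive      :: a
  } deriving (Eq, Show, Functor)

-- The governor can work with the sets (and their sizes) ...
type SetsWithSizes peer = View (Set peer, Int)
-- ... while churn and tracing only need the numbers.
type Counters = View Int

withSize :: Set peer -> (Set peer, Int)
withSize s = (s, Set.size s)

-- | Single place where the sets are computed.
toSetsWithSizes :: Set peer -> Set peer -> Set peer -> SetsWithSizes peer
toSetsWithSizes known established active =
  View (withSize known) (withSize established) (withSize active)

-- | Counters are derived from the same view, not recomputed separately.
toCounters :: SetsWithSizes peer -> Counters
toCounters = fmap snd
```

Because the counters are a projection of the same value the governor uses, the two cannot drift apart.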

`Ouroboros.Network.PeerSelection.Governor.Types` is excluded from `check-stylish` to preserve the large export list of record names in pattern synonyms.

flake.nix: set -Werror GHC option for all project packages

30b7588

Stolen from `ouroboros-consensus`.

coot requested review from newhoggy and a team as code owners April 17, 2024 09:27

coot changed the title ~~coot/churn~~ synchronise churn with outbound governor Apr 17, 2024

coot added the churn Issues / PRs related to churn label Apr 17, 2024

coot self-assigned this Apr 17, 2024

This was linked to issues Apr 17, 2024

Outbound Governor Counters #4845

Closed

Synchronisation in churn used by outbound governor #4617

Closed

coot added the outbound-governor Issues / PRs related to outbound-governor label Apr 18, 2024

coot removed a link to an issue Apr 18, 2024

Outbound Governor Counters #4845

Closed

coot linked an issue Apr 18, 2024 that may be closed by this pull request

Outbound Governor Counters #4845

Closed

crocodile-dentist reviewed Apr 18, 2024

ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/Types.hs Outdated

coot force-pushed the coot/churn branch from e9d3b67 to c2eb5b6 April 19, 2024 08:21