Stop relying on ChannelMonitor persistence after manager read #3322

Open · wants to merge 10 commits into main
Conversation

TheBlueMatt (Collaborator)

When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users write the full `ChannelMonitor` to disk to ensure
that updated information made it.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Instead, we rewrite the MPP claim replay logic to use only (new) data included in the `ChannelMonitor`s, which has the nice side effect of teeing up future `ChannelManager`-non-persistence features and making our `PaymentClaimed` event generation much more robust.
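
As a rough illustration of the approach (hypothetical names and field choices, not the PR's actual types): the core idea is that each `ChannelMonitor` durably records the preimage plus *all* parts of the MPP payment, so a restart can finish a partial claim without any `ChannelManager` data.

```rust
/// One HTLC that formed part of the MPP payment, located by channel and HTLC id.
/// (Illustrative sketch only.)
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

/// What a monitor would need to persist for a claim to be replayable on load.
struct ReplayableClaim {
    payment_preimage: [u8; 32],
    /// Every part of the payment, including parts in other channels' monitors.
    mpp_parts: Vec<MppPart>,
    /// Enough metadata to regenerate a correct `PaymentClaimed` event.
    amount_msat: u64,
}
```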

@arik-so (Contributor) commented Sep 24, 2024

A bunch of tests are still trying to call provide_payment_preimage()

In aa09c33 we added a new secret
in `ChannelManager` with which to derive inbound `PaymentId`s. We
added read support for the new field, but forgot to add write
support for it. Here we fix this oversight.
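
Purely to illustrate the bug class (a hypothetical struct and hand-rolled serialization, not LDK's actual TLV macros): a field that is read on deserialization but never written silently disappears on the next reload, which a simple round-trip test catches.

```rust
#[derive(Default, PartialEq, Debug)]
struct ManagerState {
    // Stand-in for the new inbound-PaymentId-derivation secret.
    inbound_payment_id_secret: Option<[u8; 32]>,
}

fn write(state: &ManagerState, out: &mut Vec<u8>) {
    // The fix: actually serialize the secret instead of dropping it.
    match &state.inbound_payment_id_secret {
        Some(secret) => {
            out.push(1);
            out.extend_from_slice(secret);
        },
        None => out.push(0),
    }
}

fn read(mut input: &[u8]) -> ManagerState {
    let present = input[0] == 1;
    input = &input[1..];
    let inbound_payment_id_secret = if present {
        let mut secret = [0u8; 32];
        secret.copy_from_slice(&input[..32]);
        Some(secret)
    } else {
        None
    };
    ManagerState { inbound_payment_id_secret }
}

#[test]
fn round_trip_preserves_secret() {
    let state = ManagerState { inbound_payment_id_secret: Some([42; 32]) };
    let mut bytes = Vec::new();
    write(&state, &mut bytes);
    assert_eq!(read(&bytes), state);
}
```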
When we started tracking which channels had MPP parts claimed
durably on-disk in their `ChannelMonitor`, we did so with a tuple.
This was fine in that it was only ever accessed in two places, but
as we will start tracking it through to the `ChannelMonitor`s
themselves in the coming commit(s), it is useful to have it in a
struct instead.
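
A before/after sketch of that refactor (illustrative names, not the actual LDK types): the tuple's fields gain names so their meaning survives as the data starts flowing into the `ChannelMonitor`s.

```rust
// Before: which counterparty/channel an MPP part was claimed against, as a tuple.
type ClaimedPartTuple = (/* counterparty_node_id */ [u8; 33], /* channel_id */ [u8; 32]);

// After: the same data with named fields.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ClaimedMppPart {
    counterparty_node_id: [u8; 33],
    channel_id: [u8; 32],
}

impl From<ClaimedPartTuple> for ClaimedMppPart {
    fn from((counterparty_node_id, channel_id): ClaimedPartTuple) -> Self {
        Self { counterparty_node_id, channel_id }
    }
}
```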
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we take the first step, building a list of MPP parts and
metadata in `ChannelManager` and passing it through to
`ChannelMonitor` in the `ChannelMonitorUpdate`s.
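
A rough sketch of this step, using stand-in types rather than LDK's real `ChannelMonitorUpdate`: when the manager claims an MPP payment it builds the full part list once, then includes that list (plus the payment metadata) in the preimage update sent to every involved channel.

```rust
#[derive(Clone)]
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

#[derive(Clone)]
struct ClaimMetadata {
    payment_hash: [u8; 32],
    amount_msat: u64,
}

/// A stand-in for the per-channel monitor update carrying the preimage.
struct PreimageUpdate {
    channel_id: [u8; 32],
    payment_preimage: [u8; 32],
    /// New in this series: all MPP parts plus metadata, copied into each update.
    all_parts: Vec<MppPart>,
    metadata: ClaimMetadata,
}

fn build_updates(
    payment_preimage: [u8; 32],
    metadata: ClaimMetadata,
    parts: Vec<MppPart>,
) -> Vec<PreimageUpdate> {
    parts
        .iter()
        .map(|part| PreimageUpdate {
            channel_id: part.channel_id,
            payment_preimage,
            all_parts: parts.clone(),
            metadata: metadata.clone(),
        })
        .collect()
}
```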
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we store the required MPP parts and metadata in
`ChannelMonitor`s and make them available to `ChannelManager` on
load.
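
A sketch of the read side (hypothetical accessors and types): on load, each monitor reports which payment parts it has durably claimed and what the full part list was, letting the manager work out which sibling parts still need the preimage replayed.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, PartialEq, Eq, Hash)]
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

struct StoredClaim {
    payment_hash: [u8; 32],
    payment_preimage: [u8; 32],
    /// The full part list; every involved monitor stores the same list.
    all_parts: Vec<MppPart>,
    /// The part this particular monitor has already durably claimed.
    claimed_part: MppPart,
}

/// Combine every monitor's view and return, per payment, the parts for which
/// no monitor has the claim on disk yet (i.e. what must be replayed).
fn missing_parts(stored: &[StoredClaim]) -> HashMap<[u8; 32], Vec<MppPart>> {
    let mut claimed: HashMap<[u8; 32], HashSet<MppPart>> = HashMap::new();
    let mut all: HashMap<[u8; 32], Vec<MppPart>> = HashMap::new();
    for claim in stored {
        claimed.entry(claim.payment_hash).or_default().insert(claim.claimed_part.clone());
        all.entry(claim.payment_hash).or_insert_with(|| claim.all_parts.clone());
    }
    all.into_iter()
        .map(|(hash, parts)| {
            let done = &claimed[&hash];
            (hash, parts.into_iter().filter(|p| !done.contains(p)).collect())
        })
        .collect()
}
```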
In a coming commit we'll use the existing `ChannelManager` claim
flow to claim HTLCs which we found partially claimed on startup,
necessitating having a full `ChannelManager` when we go to do so.

Here we move the re-claim logic later in the `ChannelManager`-read
logic so that a fully-constructed `ChannelManager` is available
when it runs.
Here we wrap the logic which moves claimable payments from
`claimable_payments` to `pending_claiming_payments` in a new
utility function on `ClaimablePayments`. This will allow us to call
this new logic during `ChannelManager` deserialization in a few
commits.
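
A simplified stand-in for that helper (the map names come from the text above, but the struct and method names are illustrative): move a payment from the claimable set to the pending-claiming set in one place, so the normal claim path and the startup replay can share a single entry point.

```rust
use std::collections::HashMap;

struct ClaimablePayment {
    amount_msat: u64,
}

#[derive(Default)]
struct ClaimablePayments {
    claimable_payments: HashMap<[u8; 32], ClaimablePayment>,
    pending_claiming_payments: HashMap<[u8; 32], ClaimablePayment>,
}

impl ClaimablePayments {
    /// Marks the payment as being claimed and returns it, or `None` if it is
    /// unknown (or was never claimable in the first place).
    fn begin_claiming_payment(&mut self, payment_hash: [u8; 32]) -> Option<&mut ClaimablePayment> {
        let payment = self.claimable_payments.remove(&payment_hash)?;
        Some(self.pending_claiming_payments.entry(payment_hash).or_insert(payment))
    }
}
```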
In the next commit we'll start using (much of) the normal HTLC
claim pipeline to replay payment claims on startup. In order to do
so, however, we have to properly handle cases where we get a
`DuplicateClaim` back from the channel for an inbound-payment HTLC.

Here we do so, handling the `MonitorUpdateCompletionAction` and
allowing an already-completed RAA blocker.
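
A sketch of the duplicate-claim tolerance described above (hypothetical enum, not the real channel API): when replaying claims on startup, a channel may report the HTLC was already claimed, and that must be treated as success so the completion actions still run.

```rust
enum ClaimResult {
    /// The preimage was newly applied and a monitor update was generated.
    Claimed { update_id: u64 },
    /// The channel already had this HTLC claimed; nothing new to persist.
    DuplicateClaim,
}

fn handle_claim_result(result: ClaimResult) {
    match result {
        ClaimResult::Claimed { update_id } => {
            // Track the in-flight monitor update as usual.
            println!("queued monitor update {update_id}");
        },
        ClaimResult::DuplicateClaim => {
            // Already durable: skip the monitor update but still run the
            // completion action / release the RAA blocker.
            println!("duplicate claim, completing immediately");
        },
    }
}
```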
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we finally implement claiming using the new MPP part list and
metadata stored in `ChannelMonitor`s. In doing so, we use much more
of the existing HTLC-claiming pipeline in `ChannelManager`,
utilizing the on-startup background events flow as well as properly
re-applying the RAA-blockers to ensure preimages cannot be lost.
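
An end-to-end sketch of the replay, heavily simplified and with hypothetical types: on startup, any still-missing MPP parts become queued background monitor updates, and `PaymentClaimed` is surfaced once per payment from the stored metadata.

```rust
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

enum BackgroundEvent {
    /// Re-apply the preimage to a channel whose monitor missed the claim.
    ReplayPreimage { part: MppPart, payment_preimage: [u8; 32] },
}

enum Event {
    PaymentClaimed { payment_hash: [u8; 32], amount_msat: u64 },
}

fn replay_partial_claim(
    payment_hash: [u8; 32],
    payment_preimage: [u8; 32],
    amount_msat: u64,
    missing_parts: Vec<MppPart>,
    background_events: &mut Vec<BackgroundEvent>,
    events: &mut Vec<Event>,
) {
    for part in missing_parts {
        background_events.push(BackgroundEvent::ReplayPreimage { part, payment_preimage });
    }
    // The metadata stored alongside the parts lets us generate the event even
    // though the original ChannelManager payment state was never persisted.
    events.push(Event::PaymentClaimed { payment_hash, amount_msat });
}
```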
@TheBlueMatt (Collaborator, Author)

Ugh, sorry, fixed. Also rebased and fixed an issue introduced in #3303

When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users write the full `ChannelMonitor` to disk to ensure
that updated information made it.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Over the past few commits we dropped the need to pass HTLCs
directly to the `ChannelMonitor`s, using the background events to
provide `ChannelMonitorUpdate`s instead.

Thus, here we finally drop the requirement to resilver
`ChannelMonitor`s on startup.
Because the new startup `ChannelMonitor` persistence semantics rely
on new information stored in `ChannelMonitor`s only for claims made
by the upgraded code, users upgrading from previous versions of LDK
must apply the old `ChannelMonitor` persistence semantics at least
once (as the old code will be used to handle partial claims).

codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 90.65657% with 37 lines in your changes missing coverage. Please review.

Project coverage is 89.66%. Comparing base (a661c92) to head (b0fa756).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning/src/ln/channelmanager.rs | 89.36% | 23 Missing and 9 partials ⚠️ |
| lightning/src/chain/channelmonitor.rs | 96.00% | 2 Missing ⚠️ |
| lightning/src/ln/reload_tests.rs | 88.88% | 0 Missing and 2 partials ⚠️ |
| lightning/src/ln/channel.rs | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3322      +/-   ##
==========================================
- Coverage   89.68%   89.66%   -0.02%     
==========================================
  Files         126      126              
  Lines      103168   103370     +202     
  Branches   103168   103370     +202     
==========================================
+ Hits        92522    92686     +164     
- Misses       7934     7971      +37     
- Partials     2712     2713       +1     
```
Flag Coverage Δ
89.66% <90.65%> (-0.02%) ⬇️


TheBlueMatt added this to the 0.1 milestone Oct 1, 2024
@TheBlueMatt (Collaborator, Author)

Tagging 0.1 because of the first commit.
