Stop relying on ChannelMonitor persistence after manager read #3322

Open · wants to merge 10 commits into main
Conversation

TheBlueMatt (Collaborator)

When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users write the full `ChannelMonitor` to disk to ensure
that updated information made it.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Instead, we rewrite the MPP claim replay logic to use only (new) data included in the `ChannelMonitor`s, which has the nice side effect of teeing up future `ChannelManager`-non-persistence features and making our `PaymentClaimed` event generation much more robust.
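
As a rough illustration of the approach (hypothetical names and field choices, not the PR's actual types): the core idea is that each `ChannelMonitor` durably records the preimage plus *all* parts of the MPP payment, so a restart can finish a partial claim without any `ChannelManager` data.

```rust
/// One HTLC that formed part of the MPP payment, located by channel and HTLC id.
/// (Illustrative sketch only.)
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

/// What a monitor would need to persist for a claim to be replayable on load.
struct ReplayableClaim {
    payment_preimage: [u8; 32],
    /// Every part of the payment, including parts in other channels' monitors.
    mpp_parts: Vec<MppPart>,
    /// Enough metadata to regenerate a correct `PaymentClaimed` event.
    amount_msat: u64,
}
```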

@arik-so (Contributor) commented Sep 24, 2024

A bunch of tests are still trying to call provide_payment_preimage()

In aa09c33 we added a new secret
in `ChannelManager` with which to derive inbound `PaymentId`s. We
added read support for the new field, but forgot to add write
support for it. Here we fix this oversight.
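
Purely to illustrate the bug class (a hypothetical struct and hand-rolled serialization, not LDK's actual TLV macros): a field that is read on deserialization but never written silently disappears on the next reload, which a simple round-trip test catches.

```rust
#[derive(Default, PartialEq, Debug)]
struct ManagerState {
    // Stand-in for the new inbound-PaymentId-derivation secret.
    inbound_payment_id_secret: Option<[u8; 32]>,
}

fn write(state: &ManagerState, out: &mut Vec<u8>) {
    // The fix: actually serialize the secret instead of dropping it.
    match &state.inbound_payment_id_secret {
        Some(secret) => {
            out.push(1);
            out.extend_from_slice(secret);
        },
        None => out.push(0),
    }
}

fn read(mut input: &[u8]) -> ManagerState {
    let present = input[0] == 1;
    input = &input[1..];
    let inbound_payment_id_secret = if present {
        let mut secret = [0u8; 32];
        secret.copy_from_slice(&input[..32]);
        Some(secret)
    } else {
        None
    };
    ManagerState { inbound_payment_id_secret }
}

#[test]
fn round_trip_preserves_secret() {
    let state = ManagerState { inbound_payment_id_secret: Some([42; 32]) };
    let mut bytes = Vec::new();
    write(&state, &mut bytes);
    assert_eq!(read(&bytes), state);
}
```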
When we started tracking which channels had MPP parts claimed
durably on-disk in their `ChannelMonitor`, we did so with a tuple.
This was fine in that it was only ever accessed in two places, but
as we will start tracking it through to the `ChannelMonitor`s
themselves in the coming commit(s), it is useful to have it in a
struct instead.
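
A before/after sketch of that refactor (illustrative names, not the actual LDK types): the tuple's fields gain names so their meaning survives as the data starts flowing into the `ChannelMonitor`s.

```rust
// Before: which counterparty/channel an MPP part was claimed against, as a tuple.
type ClaimedPartTuple = (/* counterparty_node_id */ [u8; 33], /* channel_id */ [u8; 32]);

// After: the same data with named fields.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ClaimedMppPart {
    counterparty_node_id: [u8; 33],
    channel_id: [u8; 32],
}

impl From<ClaimedPartTuple> for ClaimedMppPart {
    fn from((counterparty_node_id, channel_id): ClaimedPartTuple) -> Self {
        Self { counterparty_node_id, channel_id }
    }
}
```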
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we take the first step, building a list of MPP parts and
metadata in `ChannelManager` and passing it through to
`ChannelMonitor` in the `ChannelMonitorUpdate`s.
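
A rough sketch of this step, using stand-in types rather than LDK's real `ChannelMonitorUpdate`: when the manager claims an MPP payment it builds the full part list once, then includes that list (plus the payment metadata) in the preimage update sent to every involved channel.

```rust
#[derive(Clone)]
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

#[derive(Clone)]
struct ClaimMetadata {
    payment_hash: [u8; 32],
    amount_msat: u64,
}

/// A stand-in for the per-channel monitor update carrying the preimage.
struct PreimageUpdate {
    channel_id: [u8; 32],
    payment_preimage: [u8; 32],
    /// New in this series: all MPP parts plus metadata, copied into each update.
    all_parts: Vec<MppPart>,
    metadata: ClaimMetadata,
}

fn build_updates(
    payment_preimage: [u8; 32],
    metadata: ClaimMetadata,
    parts: Vec<MppPart>,
) -> Vec<PreimageUpdate> {
    parts
        .iter()
        .map(|part| PreimageUpdate {
            channel_id: part.channel_id,
            payment_preimage,
            all_parts: parts.clone(),
            metadata: metadata.clone(),
        })
        .collect()
}
```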
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we store the required MPP parts and metadata in
`ChannelMonitor`s and make them available to `ChannelManager` on
load.
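
A sketch of the read side (hypothetical accessors and types): on load, each monitor reports which payment parts it has durably claimed and what the full part list was, letting the manager work out which sibling parts still need the preimage replayed.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, PartialEq, Eq, Hash)]
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

struct StoredClaim {
    payment_hash: [u8; 32],
    payment_preimage: [u8; 32],
    /// The full part list; every involved monitor stores the same list.
    all_parts: Vec<MppPart>,
    /// The part this particular monitor has already durably claimed.
    claimed_part: MppPart,
}

/// Combine every monitor's view and return, per payment, the parts for which
/// no monitor has the claim on disk yet (i.e. what must be replayed).
fn missing_parts(stored: &[StoredClaim]) -> HashMap<[u8; 32], Vec<MppPart>> {
    let mut claimed: HashMap<[u8; 32], HashSet<MppPart>> = HashMap::new();
    let mut all: HashMap<[u8; 32], Vec<MppPart>> = HashMap::new();
    for claim in stored {
        claimed.entry(claim.payment_hash).or_default().insert(claim.claimed_part.clone());
        all.entry(claim.payment_hash).or_insert_with(|| claim.all_parts.clone());
    }
    all.into_iter()
        .map(|(hash, parts)| {
            let done = &claimed[&hash];
            (hash, parts.into_iter().filter(|p| !done.contains(p)).collect())
        })
        .collect()
}
```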
In a coming commit we'll use the existing `ChannelManager` claim
flow to claim HTLCs which we found partially claimed on startup,
necessitating having a full `ChannelManager` when we go to do so.

Here we move the re-claim logic later in the `ChannelManager`-read
logic so that a fully-constructed `ChannelManager` is available
when it runs.
Here we wrap the logic which moves claimable payments from
`claimable_payments` to `pending_claiming_payments` in a new
utility function on `ClaimablePayments`. This will allow us to call
this new logic during `ChannelManager` deserialization in a few
commits.
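
A simplified stand-in for that helper (the map names come from the text above, but the struct and method names are illustrative): move a payment from the claimable set to the pending-claiming set in one place, so the normal claim path and the startup replay can share a single entry point.

```rust
use std::collections::HashMap;

struct ClaimablePayment {
    amount_msat: u64,
}

#[derive(Default)]
struct ClaimablePayments {
    claimable_payments: HashMap<[u8; 32], ClaimablePayment>,
    pending_claiming_payments: HashMap<[u8; 32], ClaimablePayment>,
}

impl ClaimablePayments {
    /// Marks the payment as being claimed and returns it, or `None` if it is
    /// unknown (or was never claimable in the first place).
    fn begin_claiming_payment(&mut self, payment_hash: [u8; 32]) -> Option<&mut ClaimablePayment> {
        let payment = self.claimable_payments.remove(&payment_hash)?;
        Some(self.pending_claiming_payments.entry(payment_hash).or_insert(payment))
    }
}
```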
In the next commit we'll start using (much of) the normal HTLC
claim pipeline to replay payment claims on startup. In order to do
so, however, we have to properly handle cases where we get a
`DuplicateClaim` back from the channel for an inbound-payment HTLC.

Here we do so, handling the `MonitorUpdateCompletionAction` and
allowing an already-completed RAA blocker.
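
A sketch of the duplicate-claim tolerance described above (hypothetical enum, not the real channel API): when replaying claims on startup, a channel may report the HTLC was already claimed, and that must be treated as success so the completion actions still run.

```rust
enum ClaimResult {
    /// The preimage was newly applied and a monitor update was generated.
    Claimed { update_id: u64 },
    /// The channel already had this HTLC claimed; nothing new to persist.
    DuplicateClaim,
}

fn handle_claim_result(result: ClaimResult) {
    match result {
        ClaimResult::Claimed { update_id } => {
            // Track the in-flight monitor update as usual.
            println!("queued monitor update {update_id}");
        },
        ClaimResult::DuplicateClaim => {
            // Already durable: skip the monitor update but still run the
            // completion action / release the RAA blocker.
            println!("duplicate claim, completing immediately");
        },
    }
}
```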
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we finally implement claiming using the new MPP part list and
metadata stored in `ChannelMonitor`s. In doing so, we use much more
of the existing HTLC-claiming pipeline in `ChannelManager`,
utilizing the on-startup background events flow as well as properly
re-applying the RAA-blockers to ensure preimages cannot be lost.
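
An end-to-end sketch of the replay, heavily simplified and with hypothetical types: on startup, any still-missing MPP parts become queued background monitor updates, and `PaymentClaimed` is surfaced once per payment from the stored metadata.

```rust
struct MppPart {
    channel_id: [u8; 32],
    htlc_id: u64,
}

enum BackgroundEvent {
    /// Re-apply the preimage to a channel whose monitor missed the claim.
    ReplayPreimage { part: MppPart, payment_preimage: [u8; 32] },
}

enum Event {
    PaymentClaimed { payment_hash: [u8; 32], amount_msat: u64 },
}

fn replay_partial_claim(
    payment_hash: [u8; 32],
    payment_preimage: [u8; 32],
    amount_msat: u64,
    missing_parts: Vec<MppPart>,
    background_events: &mut Vec<BackgroundEvent>,
    events: &mut Vec<Event>,
) {
    for part in missing_parts {
        background_events.push(BackgroundEvent::ReplayPreimage { part, payment_preimage });
    }
    // The metadata stored alongside the parts lets us generate the event even
    // though the original ChannelManager payment state was never persisted.
    events.push(Event::PaymentClaimed { payment_hash, amount_msat });
}
```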
@TheBlueMatt (Collaborator, Author)

Ugh, sorry, fixed. Also rebased and fixed an issue introduced in #3303

When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users write the full `ChannelMonitor` to disk to ensure
that updated information made it.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Over the past few commits we dropped the need to pass HTLCs
directly to the `ChannelMonitor`s, using the background events to
provide `ChannelMonitorUpdate`s instead.

Thus, here we finally drop the requirement to resilver
`ChannelMonitor`s on startup.
Because the new startup `ChannelMonitor` persistence semantics rely
on new information stored in `ChannelMonitor`s only for claims made
by the upgraded code, users upgrading from previous versions of LDK
must apply the old `ChannelMonitor` persistence semantics at least
once (as the old code will be used to handle partial claims).

codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 90.65657% with 37 lines in your changes missing coverage. Please review.

Project coverage is 89.66%. Comparing base (a661c92) to head (b0fa756).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning/src/ln/channelmanager.rs | 89.36% | 23 Missing and 9 partials ⚠️ |
| lightning/src/chain/channelmonitor.rs | 96.00% | 2 Missing ⚠️ |
| lightning/src/ln/reload_tests.rs | 88.88% | 0 Missing and 2 partials ⚠️ |
| lightning/src/ln/channel.rs | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3322      +/-   ##
==========================================
- Coverage   89.68%   89.66%   -0.02%     
==========================================
  Files         126      126              
  Lines      103168   103370     +202     
  Branches   103168   103370     +202     
==========================================
+ Hits        92522    92686     +164     
- Misses       7934     7971      +37     
- Partials     2712     2713       +1     
```
Flag Coverage Δ
89.66% <90.65%> (-0.02%) ⬇️


TheBlueMatt added this to the 0.1 milestone Oct 1, 2024
@TheBlueMatt (Collaborator, Author)

Tagging 0.1 because of the first commit.
