Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include the descriptor in keychain::Changeset #1203

Merged
merged 13 commits into from
May 9, 2024

Conversation

danielabrozzoni
Copy link
Member

@danielabrozzoni danielabrozzoni commented Nov 10, 2023

Fixes #1101

  • Moves keychain::ChangeSet inside keychain/txout_index.rs as now the ChangeSet depends on miniscript
  • Slightly cleans up tests by introducing some constant descriptors
  • The KeychainTxOutIndex's internal SpkIterator now uses DescriptorId
    instead of K. The DescriptorId -> K translation is made at the
    KeychainTxOutIndex level.
  • The keychain::Changeset is now a struct, which includes a map for last
    revealed indexes, and one for newly added keychains and their
    descriptor.

Changelog notice

API changes in bdk:

  • Wallet::keychains returns a impl Iterator instead of BTreeMap
  • Wallet::load doesn't take descriptors anymore, since they're stored in the db
  • Wallet::new_or_load checks if the loaded descriptor from db is the same as the provided one

API changes in bdk_chain:

  • ChangeSet is now a struct, which includes a map for last revealed
    indexes, and one for keychains and descriptors.
  • KeychainTxOutIndex::inner returns a SpkIterator<(DescriptorId, u32)>
  • KeychainTxOutIndex::outpoints returns a BTreeSet instead of &BTreeSet
  • KeychainTxOutIndex::keychains returns a impl Iterator instead of
    &BTreeMap
  • KeychainTxOutIndex::txouts doesn't return a ExactSizeIterator anymore
  • KeychainTxOutIndex::last_revealed_indices returns a BTreeMap
    instead of &BTreeMap
  • KeychainTxOutIndex::add_keychain has been renamed to KeychainTxOutIndex::insert_descriptor, and now it returns a ChangeSet
  • KeychainTxOutIndex::reveal_next_spk returns Option
  • KeychainTxOutIndex::next_unused_spk returns Option
  • KeychainTxOutIndex::unbounded_spk_iter returns Option
  • KeychainTxOutIndex::next_index returns Option
  • KeychainTxOutIndex::reveal_to_target returns Option
  • KeychainTxOutIndex::revealed_keychain_spks returns Option
  • KeychainTxOutIndex::unused_keychain_spks returns Option
  • KeychainTxOutIndex::last_revealed_index returns Option
  • KeychainTxOutIndex::keychain_outpoints returns Option
  • KeychainTxOutIndex::keychain_outpoints_in_range returns Option
  • KeychainTxOutIndex::last_used_index returns None if the keychain has never been used, or if it doesn't exist

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

@danielabrozzoni danielabrozzoni changed the title Issue/1101 Include the descriptor in keychain_txout_index::Changeset Nov 10, 2023
@danielabrozzoni danielabrozzoni changed the title Include the descriptor in keychain_txout_index::Changeset Include the descriptor in keychain::Changeset Nov 13, 2023
@danielabrozzoni danielabrozzoni self-assigned this Nov 13, 2023
@danielabrozzoni danielabrozzoni added this to the 1.0.0-alpha.3 milestone Nov 13, 2023
@danielabrozzoni danielabrozzoni marked this pull request as ready for review November 13, 2023 09:03
Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this forward. This is my initial round of comments!

crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Show resolved Hide resolved
@evanlinjin
Copy link
Member

Oh and it seems like you need to rebase.

@danielabrozzoni
Copy link
Member Author

Hey @evanlinjin and @LLFourn, thanks a lot for your reviews. I've given some thoughts about them:

  1. I agree that having a BTreeMap with everything inside is not optimal, and a structure with both last_revealed and descriptor would be better. That being said, we need to make sure that users don't pass in a (keychain, index) without the correspondent descriptor.
  2. The more I think about it, the more I convince myself that we shouldn't let the user modify the descriptor - otherwise, what's the point of having a db checksum? I think that changing it silently (as my code is doing right now), or making the changeset monotone as lloyd suggested, might be potential footguns. (I might not have fully comprehended lloyd's proposal though!)

It seems to me that it would be optimal to have append return a Result, and error in the two cases; but the Append trait doesn't specify any return value for append. How should I go about this?

@evanlinjin
Copy link
Member

Just quoting @LLFourn's response to adding an error to Append trait methods:

Because we should design things so that there should be no possible error

The errors would have to be programmer errors which should be handled via panic or just carry on with best effort.

I think there would most of the time be ways to re-architect the changeset to avoid errors completely. That's the trade off. Either design it so there can't be errors (may mean bigger more monotone changesets) or just accept the error.

@evanlinjin
Copy link
Member

evanlinjin commented Dec 5, 2023

What do we think about using debug_assert! in the Append implementation for now? This will check if we are mutating the descriptor of an already-existing keychain. In release mode, the consecutive changes to a keychain's descriptor should just be ignored.

The add_keychain method should error if a user is trying to modify an already-existing keychain. The changesets are returned by KeychainTxOutIndex methods. The KeychainTxOutIndex should never create consecutive changesets that modify the same keychain's descriptor. So I think my suggestion is fair.

Copy link
Member

@notmandatory notmandatory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to think of how to keep ChangeSet<K> monotone and I think we should not change the keychain::ChangeSet and instead enforce that descriptors don't change at the bdk::wallet::ChangeSet level. This puts responsibility on the devs who who don't use bdk::wallet to keep their data consistent. They can either follow how we do it in Wallet or go their own way if they know what they're doing. I created #1234 to demonstrate this approach, would love some feed back.

crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/tests/test_keychain_txout_index.rs Outdated Show resolved Hide resolved
@LLFourn
Copy link
Contributor

LLFourn commented Dec 7, 2023

I've been trying to think of how to keep ChangeSet<K> monotone and I think we should not change the keychain::ChangeSet and instead enforce that descriptors don't change at the bdk::wallet::ChangeSet level. This puts responsibility on the devs who who don't use bdk::wallet to keep their data consistent. They can either follow how we do it in Wallet or go their own way if they know what they're doing. I created #1234 to demonstrate this approach, would love some feed back.

This is actually becoming my intuition too but have an idea a bit different #1234. I think the descriptor has to be in the keychain changeset because that's where it's stored in memory and it has to be there to produce new addresses. Having it elsewhere adds friction and removing that is the point of this effort.

We can keep KeychainTxOutIndex simple and keep changesets montone by removing the type parameter K from it. There is a perfectly coherent bit of functionality with its own persistence mechanism that cannot conflict with anything else that we can carve out here and leave labeling to later. For now we can let Wallet handle its own labeling with the KeychainKind enum in its own addition to its changeset.

So here's my suggestion:

  1. Use descriptor hash to identify descriptor (as @notmandatory does in his approach). Let's call it DescriptorId and make it type DescriptorId = sha256::Hash for now.
  2. Rename KeychainTxOutIndex to DescriptorTxOutIndex
  3. Define it as follows:
#[derive(Clone, Debug)]
pub struct DescriptorTxOutIndex {
    inner: SpkTxOutIndex<(DescriptorId, u32)>,
    // descriptors of each keychain
    descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
    // last revealed indexes
    last_revealed: BTreeMap<DescriptorId, u32>,
    // lookahead settings for each descriptor
    lookahead: BTreeMap<DescriptorId, u32>,
}

#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet {
    revealed: BTreeMap<DescriptorId, u32>
    added: BTreeSet<Descriptor<DescriptorPublicKey>
}

Methods in DescriptorTxOutIndex should take a DescriptorId.

This changeset is monotone and error free and gives us most of the useful things that KeychainTxOutIndex did. We could of course keep KeychainTxOutIndex and implement with a DescriptorTxOutIndex internally but I think we should just do Wallet first and add that later if we want it. I guess the examples won't be too hard to change (it might even simplify them a bit!).

So here's what Wallet changeset would become:

/// The changes made to a wallet by applying an [`Update`].
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize, Default)]
pub struct ChangeSet {
    /// Changes to the [`LocalChain`].
    ///
    /// [`LocalChain`]: local_chain::LocalChain
    pub chain: local_chain::ChangeSet,

    /// Changes to [`IndexedTxGraph`].
    ///
    /// [`IndexedTxGraph`]: bdk_chain::indexed_tx_graph::IndexedTxGraph
    pub indexed_tx_graph: indexed_tx_graph::ChangeSet<
        ConfirmationTimeHeightAnchor,
        descriptor::ChangeSet,
    >,

    /// Stores they keychains of the wallet and their descriptors.
    /// This should only be non-empty in the first changeset of the wallet.
    keychains: BTreeMap<KeychainKind, DescriptorId>.

    /// Stores the network type of the wallet.
    pub network: Option<Network>,   
}

Of course this is non-monotone but it's better to have these hacks up here than down in bdk_chain. Since our API doesn't allow you to mutate descriptors it could also be fixed by #1193 i.e. setting keychains and network in the header. There is a question about whether keychains should have a DescriptorId or a Descriptor in it. If you use DescriptorId you always have to make sure the Descriptor is in the index and have an error case where a DescriptorId gets added to keychains but not indexed_tx_graph. Using the Descriptor in keychains redundantly stores the descriptors but remove the error case. I think I lean to just have DescriptorId since I don't see it causing too much trouble.

We'll probably want to add a descriptor_id() to DescriptorExt in bdk_chain.

Thoughts?

@LLFourn
Copy link
Contributor

LLFourn commented Dec 8, 2023

Thoughts?

So after sleeping on it I have changed my view a little bit. The design above is the most in line with the overall design philosophy and removes all possible invalid representations at the bdk_chain level. But it is a very nice feature to label descriptors and we use it in examples and almost every user will want to attach some kind of application semantics to the descriptor. So I want to modify the above proposal to keep KeychainTxOutIndex but have it internally function mostly like what was suggested above so that we remove most of the surface area for footguns.

#[derive(Clone, Debug)]
pub struct KeychainTxOutIndex<K> {
    inner: SpkTxOutIndex<(DescriptorId, u32)>,
    // descriptors of each keychain
    descriptors: BTreeMap<DescriptorId, Descriptor<DescriptorPublicKey>>,
    // last revealed indexes
    last_revealed: BTreeMap<DescriptorId, u32>,
    // lookahead settings for each descriptor
    lookahead: BTreeMap<DescriptorId, u32>,

    keychains: BTreeMap<K, DescriptorId> 
}

#[derive(Clone, Debug, PartialEq, serde::Deserialize, serde::Serialize)]
pub struct ChangeSet<K> {
    revealed: BTreeMap<DescriptorId, u32>,
    keychains_added: BTreeMap<K, Descriptor<DescriptorPublicKey>
}

This means the non-montonicity is restricted to keychains_added field where we can implement keychains_added by just replacing descriptors with the most recent one (i.e. just use BTreeMap::append). Replacing a keychain will not be a permanent break of anything -- You could just re-replace it with the correct one and all the other values like last_revealed would return to their correct values. In fact this even makes it possible to replace a keychain with another descriptor while not losing the old coins. Since the last revealed is associated with the descriptor rather than keychain it will also always load all the old spks. So this turns a bug into a feature in a relatively safe way -- it's probably not a feature that you should ever use though!

@ValuedMammal
Copy link
Contributor

@LLFourn Thanks for the detailed explanation. It helped clarify for me why persisting descriptor data at the wallet level is perhaps not enough. I think this direction is well aligned with the related effort of separating out a "header" changeset.

@notmandatory
Copy link
Member

notmandatory commented Dec 8, 2023

I'll take a crack at implementing your new KeychainTxOutIndex<K> design, but if @danielabrozzoni gets to it first in this PR that's fine too. At least I'll be a better reviewer if I play with it a bit. @danielabrozzoni has this one, I'll still help review.

@notmandatory
Copy link
Member

Please rebase now that #1183 has been merged.

@evanlinjin
Copy link
Member

@LLFourn very nice!

@notmandatory
Copy link
Member

I fixed some new CI issues with #1247, it just needs to be ACKd and merged and then this PR rebased.

danielabrozzoni and others added 8 commits May 8, 2024 15:49
...secret keys in the wallet in Wallet::load
We only need to loop though entries of `other`. The logic before was
wasteful because we were also looping though all entries of `self` even
if we do not need to modify the `self` entry.
This fixes the bug with changesets not being monotone. Previously, the
result of applying changesets individually v.s. applying the aggregate
of changesets may result in different `KeychainTxOutIndex` states.

The nature of the changeset allows different keychain types to share the
same descriptor. However, the previous design did not take this into
account. To do this properly, we should keep track of all keychains
currently associated with a given descriptor. However, the API only
allows returning one keychain per spk/txout/outpoint (which is a good
API).

Therefore, we rank keychain variants by `Ord`. Earlier keychain variants
have a higher rank, and the first keychain will be returned.
@danielabrozzoni
Copy link
Member Author

After a call with @evanlinjin we decided to go with his solution of using Ord by Keychain for methods that return a K (one thing I didn't understand, is that methods that take in a K will continue to work normally).

Then, either of us will open an issue in order to discuss further improvements (I don't like this solution that much and proposed one where we'd return only the latest keychain added, using a Vec instead of a set, but it still needs some discussion).

At the moment, replacing descriptors is still allowed, and having multiple keychains for the same descriptor is still allowed, even when using insert_descriptor.

@evanlinjin evanlinjin force-pushed the issue/1101 branch 2 times, most recently from f3c463a to 9822040 Compare May 9, 2024 06:32
Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 86711d4

/// a situation where a descriptor has no associated keychain(s), and relevant [`TxOut`]s,
/// [`OutPoint`]s and [`Script`]s (of that descriptor) will not be return by [`KeychainTxOutIndex`].
/// Therefore, reassigning the descriptor of a single keychain is not recommended.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good explanation!

@notmandatory notmandatory dismissed their stale review May 9, 2024 17:14

Re-reviewing.

Copy link
Member

@notmandatory notmandatory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b990293

Great job everyone sticking with this one! it was a real journey.

@notmandatory notmandatory dismissed LLFourn’s stale review May 9, 2024 18:03

Daniella and Evan have incorporated review comments.

@notmandatory notmandatory merged commit fb7ff29 into bitcoindevkit:master May 9, 2024
12 checks passed
keychains: BTreeMap<K, Descriptor<DescriptorPublicKey>>,
inner: SpkTxOutIndex<(DescriptorId, u32)>,
// keychain -> (descriptor, descriptor id) map
keychains_to_descriptors: BTreeMap<K, (DescriptorId, Descriptor<DescriptorPublicKey>)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Descriptor does not need to be here. You can just look it up in descriptor_ids_to_descriptors.

@LLFourn
Copy link
Contributor

LLFourn commented May 10, 2024

POST merge ACK. Thanks for just one of the non-perfect options and pushing this one through. Let's see how we feel about it after using it.

I left one comment that I think should be addressed.

@notmandatory notmandatory mentioned this pull request May 10, 2024
28 tasks
@LLFourn LLFourn mentioned this pull request Jun 6, 2024
6 tasks
evanlinjin added a commit that referenced this pull request Jun 13, 2024
8dd1744 refactor(chain): compute txid once for `KeychainTxOutIndex::index_tx` (志宇)
639d735 refactor(chain): change field names to be more sane (志宇)
5a02f40 docs(chain): fix docs (志宇)
c77e12b refactor(chain): `KeychainTxOutIndex` use `HashMap` for fields (志宇)
4d3846a chore(chain): s/replenish_lookahead/replenish_inner_index/ (LLFourn)
8779afd chore(chain): document insert_descriptor invariants better (LLFourn)
69f2a69 refactor(chain): improve replenish lookeahd internals (LLFourn)
5a584d0 chore(chain): Fix Indexed and KeychainIndexed documentaion (Lloyd Fournier)
b8ba5a0 chore(chain): Improve documentation of keychain::ChangeSet (LLFourn)
101a09a chore(chain): Standardise KeychainTxOutIndex return types (LLFourn)
bce070b chore(chain): add type IndexSpk, fix clippy type complexity warning (Steve Myers)
4d2442c chore(chain): misc docs and insert_descriptor fixes (LLFourn)
bc2a8be refactor(keychain): Fix KeychainTxOutIndex range queries (LLFourn)
3b2ff0c Write failing test for keychain range querying (LLFourn)

Pull request description:

  Fixes #1459

  This reverts part of the changes in #1203. There the `SpkTxOutIndex<(K,u32)>` was changed to `SpkTxOutIndex<(DescriptorId, u32>)`. This led to a complicated translation logic in  `KeychainTxOutIndex` (where the API is based on `K`) to transform calls to it to calls to the underlying `SpkTxOutIndex` (which now indexes by `DescriptorId`). The translation layer was broken when it came to translating range queries from the `KeychainTxOutIndex`. My solution was just to revert this part of the change and remove the need for a translation layer (almost) altogether. A thin translation layer remains to ensure that un-revealed spks are filtered out before being returned from the `KeychainTxOutIndex` methods.

  I feel like this PR could be extended to include a bunch of ergonomics improvements that are easier to implement now. But I think that's the point of #1451 so I held off and should probably go and scope creep that one instead.

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### Bugfixes:

  * [x] This pull request breaks the existing API
  * [x] I've added tests to reproduce the issue which are now passing
  * [x] I'm linking the issue being fixed by this PR

ACKs for top commit:
  evanlinjin:
    ACK 8dd1744

Tree-SHA512: 283e6b6d4218902298e2e848fe847a6c85e27af4eee3e4337e3dad6eacf9beaa08ac99b1dce7b6fb199ca53931e543ea365728a81c41567a2e510cce77b12ac0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Reintroduce descriptor data to bdk::Wallet persistence
6 participants