Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SpEnoteStore serializable #25

Open
wants to merge 37 commits into
base: seraphis_lib
Choose a base branch
from

Conversation

DangerousFreedom1984
Copy link

@DangerousFreedom1984 DangerousFreedom1984 commented Nov 22, 2023

The idea is to make the SpEnoteStore class serializable so we can store/load it into/from files, as it is really important for the wallet to recover the owned/spent enotes.
I tried to keep the serialization pattern used in seraphis (using binary_archive) and I did the following modifications:

  • Added new ser_* structs into serialization_demo_types.h
  • Added the corresponding functions at serialization_demo_utils
  • Modified the checkpoint_cache.* and removed the const of the variables there
  • Modified the enote_store:
    • added setter/getter functions to modify the private variables
  • Added the comparison operator for all the used serializable structs/classes
  • Added one unit_test to test the enote_store serialization

Depends upon:
monero-project#9069
monero-project#9077

Copy link

@rbrunner7 rbrunner7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took quite some work to grind through all those classes, nice that you took that on you, so we can just profit and have it comfortable :)

Nice code, but I had a number of question marks popping up.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question to further my knowledge: What's the connection between serialization and equality comparison operators? Are they needed for serialization in one way or another?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are not needed for the serialization. The only purpose is testing. Maybe they could be useful too later.

/// setters
void set_from_serializable(const serialization::ser_CheckpointCache &serializable);
// void set_min_checkpoint_index(std::uint64_t min_checkpoint_index) {m_min_checkpoint_index = min_checkpoint_index;}
bool operator==(const CheckpointCache&other);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little nitpick: CheckpointCache&other looks strange to me without any blank. Saw it alread in one or two other places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will fix it.


/// stored checkpoints
std::map<std::uint64_t, rct::key> m_checkpoints;
serializable_map<std::uint64_t, rct::key> m_checkpoints;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure my understanding is correct, but isn't turning this into serializable_map a "creeping in" of serialization into the library class that we want to avoid, with great effort, by using special serialization classes like here ser_CheckpointCache?

Not sure how it's already done as a pattern in other similar cases, and not sure whether having to copy maps one element at a time because of different type between this class and ser_CheckpointCache is a good trade-off for absolute "purity" here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to avoid the classes from serialization/containers.h ? We are still using binary_archive with them. What do we want avoid actually?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I am not the best person to ask, because I myself as the "master" over the Seraphis library probably wouldn't go through all the complications with those ser_ classes and add serialization instrumentation directly to the Seraphis library struct and classes ...

But now, as we do separate, I think we could well go the extra mile and keep any serialization stuff out of the Seraphis library, as a matter of principle, and for not mixing things.

Maybe one day somebody will want to serialize Seraphis library structures with a completely different technology than binary_archive where, at least in theory, such a serializable_map would be not helpful in a good case and a problem with some bad luck.

@@ -105,19 +106,22 @@ class SpEnoteStore final
bool has_enote_with_key_image(const crypto::key_image &key_image) const;
/// get the legacy [ legacy identifier : legacy intermediate record ] map
/// - note: useful for collecting onetime addresses and viewkey extensions for key image recovery
const std::unordered_map<rct::key, LegacyContextualIntermediateEnoteRecordV1>& legacy_intermediate_records() const
const serializable_unordered_map<rct::key, LegacyContextualIntermediateEnoteRecordV1>& legacy_intermediate_records() const

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question, of course, of using serializable things in Seraphis library objects as in the checkpoint cache.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer :p

std::vector<unsigned char> tx_extra;

BEGIN_SERIALIZE_OBJECT();
FIELD(block_index);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we, quite in general, version all these classes with a version field? I can imagine smaller changes in the Seraphis enote origin context class that alone will not yet lead to SpEnoteOriginContextV2 but will lead to small changes in ser_SpEnoteOriginContextV1 and its serialization that we absolutely want to handle cleanly with the help of a version field?

Consider that the very simple serialization format used here has no meta info stored together with the data, thus without version fields you would have to resort to ugly heuristics in such cases.

I wonder whether we should treat any new such SERIALIZE_OBJECT block without VERSION_FIELD as deffective and have version fields strictly mandatory ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. If we change the type of one variable for example then we wouldn't know that in the serialization. On the other hand, if you dont break the serialization you dont need to know if changes were made. Since it would be deserializing correctly again. I can't think of an example of 'small changes' now that would really require adding a version field. If the changes are big (for example adding another field element) then we would need to have another ser_struct. Can you think of an example to better support the use of a version field?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, imagine you need one more uint64 in that struct. Or origin_status changes from char to int. Or something similar. The issue is: How would you read "old" files? Files that do not yet have have that extra int? How would you know, without seeing "Ah, this is still version 0, not yet version 1 with the new int", that you have to read the old data differently?

See here for a nice example in the existing codebase.

END_SERIALIZE();
};

using ser_LegacyEnoteVariant = tools::variant<ser_LegacyEnoteV1,ser_LegacyEnoteV2, ser_LegacyEnoteV3, ser_LegacyEnoteV4, ser_LegacyEnoteV5>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may misremember, but I have something at the back of my mind that either @jeffro256 or @j-berman once explained that before variants correctly serialize with binary_archive you have to add some additional declarations somewhere for them. Stuff like this that is now in the file cryptonote_basic.h:

VARIANT_TAG(binary_archive, cryptonote::txin_to_key, 0x2);

As I do not see you adding something like that: Am I mistaken? Is that not needed in your cases of variants?


Just now saw your own question regarding this: "(why) do I need to use the VARIANT_TAG when serializing variants?" A guess: It works without such tags, but only up to the first change, when you have to support more variants?

ser_enote.view_tag = enote.view_tag.data;
serializable_enote_variant_out = ser_enote;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all my own programming I use to add asserts at the end of such constructs, here with a final else, so if we ever have LegacyEnoteV6 and I forget to adjust this code to handle it, in addition to maybe 20 other places that have to deal with that 6th type, I will immediately notice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I agree. I will add the final conditional.

enote_record_out.amount = enote_record.amount;
enote_record_out.amount_blinding_factor = enote_record.amount_blinding_factor;
enote_record_out.key_image = enote_record.key_image;
memcpy(enote_record_out.address_index.bytes, enote_record.address_index.bytes, sizeof(enote_record.address_index));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That memcpy looks suspicious to me, and I wonder why such a blunt "hammer" should be necessary to just copy something, but I don't know the codebase enough to really judge. Maybe just have a good look again?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure. I didnt think much about it and just followed the way koe did it. I do also think that there is a better way to do it.

make_serializable_sp_enote_spent_context_v1(enote_record.spent_context, enote_record_out.spent_context);
}
//-------------------------------------------------------------------------------------------------------------------
void make_serializable_checkpointcacheconfig(const CheckpointCacheConfig &cache_config, ser_CheckpointCacheConfig &cache_config_out)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be make_serializable_checkpoint_cache_config?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think so as the struct has just one big name. But yeah, looks better with underlines.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, you test converting between the "real" classes and the ser_ classes, which is certainly nice, but you do not test the actual serialization to bytes itself. Wouldn't it be important to test that as well, with a binary_archive and a memory stream for example? I mean, just one forgotten FIELD line, and all goes wrong, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. Actually my code is not working as it is now. I just spotted many mistakes which I will correct soon. Definitely I need more unit_tests. I will work on them.

@jeffro256
Copy link

jeffro256 commented Nov 28, 2023

Hey @DangerousFreedom1984! I wrote a PR here monero-project#9077 that would allow you to serialize/deserialize the STL containers directly, without needing the serializable_* wrappers. This will let you address @rbrunner7's comment about making the SpEnoteStore too tightly coupled with the specific type of serialization.

You can simply use FIELD() etc with std::map, std::set, std::vector, etc, no wrapper needed.

@DangerousFreedom1984
Copy link
Author

Hey @DangerousFreedom1984! I wrote a PR here monero-project#9077 that would allow you to serialize/deserialize the STL containers directly, without needing the serializable_* wrappers. This will let you address @rbrunner7's comment about making the SpEnoteStore too tightly coupled with the specific type of serialization.

You can simply use FIELD() etc with std::map, std::set, std::vector, etc, no wrapper needed.

Thank you @jeffro256 ! I think it will look much better with the changes that you propose. I will rebuild this considering that your changes will be approved.
One question from my side: do you think that we need a file that defines the serializable types (like the seraphis_impl/serialization_demo_types.h) or should we define the serializable structs directly where they are defined? What would be the advantages/disadvantages of them?

It looks clean to me as it is now because if I want to serialize something, I just need to search for the serializable definition and then apply the corresponding functions to do it. On the other hand, all structs (or almost all) should be serializable and it should not be a big deal to call the serialization functions on them. It also seems to me that it is more efficient to do that directly. From my understanding now, I would vote to not have these ser_* structs, what do you think?

@rbrunner7
Copy link

I would vote to not have these ser_* structs, what do you think?

It was the decision of @UkoeHB to not directly equip the Seraphis library classes with serialization capability, under the impression of existing serialization code that pretty much degenerated into a mess over time, so it may indeed be a good idea to make sure that this never happens to parts of the Seraphis library.

I for one have the tendency to grant him as the architect a strong vote in such matters that we shouldn't overrule lightly.

And as I argued already elsewhere, there are a number of serialization technologies around, that also may get used for the Seraphis classes, which would be another argument for being "agnostic" in the library itself.

jeffro256 added a commit to jeffro256/monero that referenced this pull request Nov 28, 2023
Some downstream code (most notably PR UkoeHB#25) wants to use the src/serialization lib
for storing information persistently. When one builds classes/machines wishing to serialize containers, they must use
the `serializable_*` container classes. In this case, this makes the Seraphis library code unnecessarily tightly coupled
with the src/serialization code since one cannot swap out their type of storage format without major refactoring of class
field types. By serializing STL containers directly, we can abstract the serialization details away, making for much
cleaner design. Also small bonus side effect of this change is that STL containers with custom Comparators, Allocators,
and Hashers are serializable. `std::multimap` is added to the list of serializable containers.

Depends upon monero-project#9069.
@UkoeHB UkoeHB force-pushed the seraphis_lib branch 2 times, most recently from d493a6d to 279be2b Compare February 9, 2024 00:11
@DangerousFreedom1984 DangerousFreedom1984 marked this pull request as ready for review February 17, 2024 08:48
// EnoteStore serialization

// LegacyEnote types
struct ser_LegacyEnoteV1 final

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to point out that after PR monero-project#9069, you can actually write free functions for serialization, which means that you can serialize directly from that function and don't have to create new types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! So I can use something like that for (almost) all structs in seraphis?

BLOB_SERIALIZER(struct);

serialization::dump_binary(struct, blob_struct));
serialization::parse_binary(blob_struct, struct_recovered));

I will try to do so and simplify this PR. Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can do that. And for non-blob types, you can write a do_serialize() function. I might make changes to the serialization header to make this an easier process.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it might be better to leave as-is for now so it matches with the rest of the file's serialization. I will probably refactor all these classes out later

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been working on that all day. Now I can serialize the enote_store and I dont use the serialization_demo_*. I'm trying to make a tx serializable now without using the serialization_demo_*. I have to fix some bugs. I will post my changes as soon as I fix it.

UkoeHB and others added 12 commits March 21, 2024 09:48
* SpTxCoinbaseV1: remove block_reward field

Not storing/serializing `block_reward` saves us a few bytes on coinbase transactions, and makes it so that you can't
initialize a coinbase transaction that has a block reward not matching its output sum.
* direct & compact tx serialization

txs are [de]serialized directly from their classes and sizes of containers are not serialized
if they can be implied.
Implement async wallet scanner.

Adds a new functional test for direct wallet2 -> live RPC daemon
interactions. This sets up a framework to test pointing the
Seraphis wallet lib to a live daemon.

Tests sending and scanning:
- a normal transfer
- a sweep single (0 change)
- to a single subaddress
- to 3 subaddresses (i.e. scanning using additional pub keys)

* scan machine: option to force reorg avoidance increment first pass

- when pointing to a daemon that does not support returning empty
blocks when the client requests too high of a height, we have to
be careful in our scanner to always request blocks below the chain
tip, in every request.
- by forcing the reorg avoidance increment on first pass, we make
sure clients will always include the reorg avoidance increment
when requesting blocks from the daemon, so the client can expect
the request for blocks should *always* return an ok height.

* core tests: check conversion tool on all legacy enote version types

Stil TODO:
- check complete scanning on all enote types
- hit every branch condition for all enote versions

* conn pool mock: epee http client connection pool

- Enables concurrent network requests using the epee http client.
- Still TODO for production:

1) close_connections
2) require the pool respect max_connections

* enote finding context: IN LegacyUnscannedChunk, OUT ChunkData

- finds owned enotes by legacy view scanning a chunk of blocks

* async: function to remove minimum element from token queue

- Useful when we want to remove elements of the token queue in an
order that is different than insertion order.

* async scanner: scan via RPC, fetching & scanning parallel chunks

*How it works*

Assume the user's wallet must start scanning blocks from height
5000.

1. The scanner begins by launching 10 RPC requests in parallel to
fetch chunks of blocks as follows:

```
request 0: { start_height: 5000, max_block_count: 20 }
request 1: { start_height: 5020, max_block_count: 20 }
...
request 9: { start_height: 5180, max_block_count: 20 }
```

2. As soon as any single request completes, the wallet immediately
parses that chunk.
  - This is all in parallel. For example, as soon as request 7
  responds, the wallet immediately begins parsing that chunk in
  parallel to any other chunks it's already parsing.
3. If a chunk does not include a total of max_block_count blocks,
and the chunk is not the tip of the chain, this means there was a
"gap" in the chunk request. The scanner launches another parallel
RPC request to fill in the gap.
  - This gap can occur because the server will **not** return a
  chunk of blocks greater than 100mb (or 20k txs) via the
  /getblocks.bin` RPC endpoint
  ([`FIND_BLOCKCHAIN_SUPPLEMENT_MAX_SIZE`](https://github.com/monero-project/monero/blob/053ba2cf07649cea8134f8a188685ab7a5365e5c/src/cryptonote_core/blockchain.cpp#L65))
  - The gap is from `(req.start_height + res.blocks.size())` to
  `(req.start_height + req.max_block_count)`.
4. As soon as the scanner finishes parsing the chunk, it
immediately submits another parallel RPC request.
5. In parallel, the scanner identifies a user's received (and
spent) enotes in each chunk.
  - For example, as soon as request 7 responds and the wallet
  parses it, the wallet scans that chunk in parallel to any other
  chunks it's already scanning.
6. Once a single chunk is fully scanned locally, the scanner
launches a parallel task to fetch and scan the next chunk.
7. Once the scanner reaches the tip of the chain (the terminal
chunk), the scanner terminates.

*Some technical highlights*

- The wallet scanner is backwards compatible with existing daemons
(though it would need to point to an updated daemon to realize the
perf speed-up).
- On error cases such as the daemon going offline, the same wallet
errors that wallet2 uses (that the wallet API expects) are
propagated up to the higher-level Seraphis lib.
- The implementation uses an http client connection pool (reusing
the epee http client) to support parallel network requests
([related](seraphis-migration/wallet3#58)).
- A developer using the scanner can "bring their own blocks/network
implementation" to the scanner by providing a callback function of
the following type as a param to the async scanner constructor:
`std::function<bool(const cryptonote::COMMAND_RPC_GET_BLOCKS_FAST::request, cryptonote::COMMAND_RPC_GET_BLOCKS_FAST::response)>`

---------

Co-authored-by: jeffro256 <[email protected]>
This PR removes "universal"-style indexing for legacy CLSAG rings, and replaces it with a reference set scheme that uses
(amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and is how
the current Monero Core LMDB database is referenced. Doing things this way means that the database will not have to be re-indexed,
saving at a very minimum 1.6 GB (100M on-chain enotes * (16 bytes for extra table keys)) of storage space, and an expensive
database migration involving moving all existing enote data to a new table. We change the MockLedgerContext to support this indexing
scheme.

In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed clever-ly, and assuming
most ring members will RingCT enotes.

We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep tracked of scanned legacy enotes under
the legacy indexing scheme.

Co-authored-by: SNeedlewoods <[email protected]>
@DangerousFreedom1984 DangerousFreedom1984 force-pushed the serialize_enote_store branch 2 times, most recently from d8d2fed to de93f63 Compare May 30, 2024 08:51
@DangerousFreedom1984 DangerousFreedom1984 marked this pull request as ready for review May 31, 2024 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants