blockchain sync: reduce disk writes from 2 to 1 per tx #9135

Open · wants to merge 1 commit into base: master
Conversation

@jeffro256 (Contributor) commented Jan 24, 2024

Summary

Pros:

  • During sync, instead of performing 1 write, then 1 read, then 1 write for each tx in the chain, we just write once. This increases the lifespan of the disk and speeds up badly buffered / unbuffered I/O. On a newer NVMe drive and with a Ryzen 9 3900X, blockchain sync was around 3-4% faster. Differences will be more pronounced for systems bottlenecked by disk speed.
  • This PR is backwards compatible to receive NOTIFY_NEW_BLOCK commands, but the code paths between handle_notify_new_block and handle_notify_new_fluffy_block are merged for less code surface and review time.

Cons:

  • Complicated review

Hopefully this will move monerod towards being slightly more workable for hard drives in the future.

Design

New: cryptonote::ver_non_input_consensus()

I have created a function cryptonote::ver_non_input_consensus() in tx_verification_utils that checks all consensus rules for a group of transactions besides the checks in Blockchain::check_tx_inputs(). For Blockchain::handle_block_to_main_chain, this is the condition that txs must satisfy before their inputs are checked and they are added to blocks. This function is the most important component: it MUST be correct, or chain splits / inflation could occur. To audit its correctness, start at the function cryptonote::core::handle_incoming_txs() in the old code and step through all of the rules checked until the end of the function cryptonote::tx_memory_pool::add_tx(). cryptonote::ver_non_input_consensus() should cover all of those rules.

Modified: core::handle_incoming_tx[s]()

Before, cryptonote::core::handle_incoming_txs() was responsible for parsing all txs (inside blocks and for pool), checking their semantics, passing those txs to the mempool, and notifying ZMQ. Now, cryptonote::core::handle_incoming_txs() is deleted and there is only cryptonote::core::handle_incoming_tx(). cryptonote::core::handle_incoming_tx() is now basically just a wrapper around tx_memory_pool::add_tx(), additionally triggering ZMQ events, and is only called for new transaction notifications from the protocol handler (not block downloads).

Modified: tx_memory_pool::add_tx()

All of the consensus checks besides Blockchain::check_tx_inputs() inside of add_tx() were removed and replaced with a call to cryptonote::ver_non_input_consensus(). The relay checks remain the same.

Modified: Blockchain::add_block()

add_block() now takes a structure called a "pool supplement" which is simply a map of TXIDs to their corresponding cryptonote::transaction and transaction blob. When handle_block_to_main_chain attempts to take transactions from the transaction pool while adding a new block and that fails, it falls back on taking txs from the pool supplement. The pool supplement has all the non-input consensus rules checked after the PoW check is done. If the block ends up getting handled in Blockchain::handle_alternative_block, then the pool supplement transactions are added to the tx_memory_pool after their respective alt PoW checks.

Modified: t_cryptonote_protocol_handler::handle_notify_new_fluffy_block()

The main difference with this function now is that we construct a pool supplement and pass that to core::handle_incoming_block() instead of calling core::handle_incoming_txs() to add everything to the mempool first.

Modified: t_cryptonote_protocol_handler::try_add_next_blocks()

The changes are very similar to the changes made to handle_notify_new_fluffy_block.

Modified: t_cryptonote_protocol_handler::handle_notify_new_block()

Before, this function had separate handling logic, but now we just convert the NOTIFY_NEW_BLOCK request into a NOTIFY_NEW_FLUFFY_BLOCK request and call handle_notify_new_fluffy_block with it. This saves us from having to make the same changes to both code paths.

@jeffro256 (Contributor Author) commented Jan 25, 2024

I'm thinking about having core::handle_incoming_txs basically do nothing except pass the tx to tx_memory_pool::add_tx, which then passes the transaction through the verify_pool_supplement tests. This would make the code so much more robust against future discrepancy between changes to the pool rules and the verify_pool_supplement rules, but it would require some more refactoring.

@vtnerd (Contributor) left a comment:

I stopped my review because I think I found a breaking change to ZMQ - this no longer broadcasts a transaction first seen in a new block. This case is explicitly mentioned in the docs. It looks like txes are still broadcast while chain sync is occurring, so this breaking change makes things really inconsistent.

I think you'll have to carry around a ZMQ object until handle_main_chain to get this working. This could arguably improve how alternate block txes are handled (by not broadcasting them), but then there is the reorg case where txes are seen for the first time on the reorg.

I'm not certain how hard this is to hack together, and I hope we don't have to revert the docs (and thereby make it hell on downstream projects).

Resolved review threads: contrib/epee/include/net/levin_protocol_handler.h, src/cryptonote_core/blockchain.cpp, src/cryptonote_core/cryptonote_core.cpp
fee = tx.rct_signatures.txnFee;
}

const uint64_t fee = get_tx_fee(tx);
Contributor:

This now throws on error instead of returning false. Worth a try/catch (or is this verified elsewhere before this) ?

Contributor Author:

It should be verified inside ver_non_input_consensus(), but it's worth double checking

Contributor:

I didn't see another check for the fee, just a check for overflow on inputs and outputs, done separately for each.

I'm not certain how an exception escaping this function alters the behavior of the code (you'd probably have a better idea than me at this point).

Contributor Author:

In core::check_tx_semantic, we check input overflow, output overflow, but also that inputs sum > outputs sum for v1 transactions.

Resolved review threads: src/cryptonote_core/tx_pool.cpp, src/cryptonote_core/tx_verification_utils.cpp, src/cryptonote_core/cryptonote_core.cpp (2 threads)
@jeffro256 force-pushed the bc_sync_skip_mempool branch 2 times, most recently from 10b0c2b to fe370e9 on January 26, 2024
@vtnerd (Contributor) left a comment:

A little more confident that this will work. But I will probably do a third pass after your responses to these questions.

Resolved review threads: src/cryptonote_core/cryptonote_core.cpp (3 threads)
fullConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
}
LOG_DEBUG_CC(context, "PEER SUPPORTS FLUFFY BLOCKS - RELAYING THIN/COMPACT WHATEVER BLOCK");
fluffyConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
@vtnerd (Contributor) commented Jan 28, 2024:

This is forcing fluffy blocks on someone that explicitly requested no fluffy blocks. But losing chain sync until they disable the flag is basically the same thing with more steps.

@0xFFFC0000 (Collaborator) left a comment:

@jeffro256 Putting the benchmark results I sent in DM here, until we find which operations are actually causing the slow-down. results-2500blocks-5iter.txt

@0xFFFC0000 (Collaborator):

New performance results: the performance problem of pop_blocks specifically related to this PR has been fixed in the new push. The only remaining part is we still have a little bit of performance drop for sync operation. I am attaching the file in case anyone wants to check it.

results-2500blocks-5iter-v2.txt

@jeffro256 (Contributor Author):

I think I've found a reason why the sync time of this PR looks slower than the sync time of master in that test script: between the call to pop_blocks and flush_txpool, which is several seconds in some cases, the master node can use the popped txs already inside the mempool to skip most of the checks (especially Blockchain::check_tx_inputs) before validating a block, which gives it a significant boost. This state won't happen organically during normal sync, so this test script doesn't quite capture the normal behavior during sync when you didn't already have the txs in the mempool.

To fix the script, instead of doing:

  1. pop_blocks
  2. flush_txpool
  3. Wait for sync

You could do:

  1. Start monerod offline
  2. pop_blocks
  3. flush_txpool
  4. stop_daemon
  5. Start monerod online
  6. Wait for sync

This does have the downside of including the start-up time in the sync time, and the choice of peers on the new instance may affect the speed at which it syncs, but you could minimize these effects by increasing the number of popped blocks.

@vtnerd (Contributor) left a comment:

This is looking pretty good. Mainly curious about your response to one more ZMQ related thing - I think we'll have to accept it as a new "feature" of the design.

Resolved review threads: src/cryptonote_core/cryptonote_core.cpp, src/cryptonote_core/blockchain.cpp (2 threads)
@@ -1196,7 +1198,7 @@ bool Blockchain::switch_to_alternative_blockchain(std::list<block_extended_info>
block_verification_context bvc = {};

// add block to main chain
bool r = handle_block_to_main_chain(bei.bl, bvc, false);
Contributor:

Marker for me to remember to investigate further. The false bool was to prevent notifications of a failed reorg - hopefully the new code retains the same consideration.

Contributor:

Actually I have an open question about this - see #6347 . But it looks like notify was being ignored, and this is just cleanup on that?

Contributor Author:

notify was being ignored with the newest versions of master, this PR reflects that behavior, but I don't know if this was the original intended behavior... it looks like it isn't

Resolved review threads: src/cryptonote_core/blockchain.cpp (3 threads)
@@ -390,20 +329,29 @@ namespace cryptonote
++m_cookie;

MINFO("Transaction added to pool: txid " << id << " weight: " << tx_weight << " fee/byte: " << (fee / (double)(tx_weight ? tx_weight : 1)) << ", count: " << m_added_txs_by_id.size());
if (tvc.m_added_to_pool && meta.matches(relay_category::legacy))
m_blockchain.notify_txpool_event({txpool_event{
Contributor:

I think this now notifies during a "return" to txpool call, where it wasn't being notified in that situation previously. The documentation doesn't list anything about this particular case, so we may have to accept this change. It's a rather rare edge case anyway.

Contributor Author:

I can make this conditional on !kept_by_block which would prevent notifications on return to txpool and reorgs.

Contributor Author:

Nevermind that comment, this would cause alt block handling to not notify


const crypto::hash &tx_hash = new_block.tx_hashes[tx_idx];

blobdata tx_blob;
std::vector<blobdata> tx_blobs;
Contributor:

Small nitpick on performance, you can move tx_blobs and missed_txs before the loop, and .clear() right here.

Contributor Author:

This is a readability thing for me, but I personally don't like making the scope of variables any wider than it needs to be, especially for such an already complex function. If there's evidence that it measurably impacts performance, however, I would definitely be okay changing it to what you're suggesting.

relevant link

@jeffro256 (Contributor Author):

Okay @vtnerd the last commits should hopefully handle ZMQ tx notifications better. We only notify A) when an incoming relayed transaction is new and added to the pool, B) a tx from a pool supplement was used to add a block, or C) an alt block contained a new tx and it was added to the pool.

@j-berman (Collaborator) left a comment:

This is looking solid -- I have mostly nits + 1 comment on the latest zmq change


// Cache the hard fork version on success
if (verified)
ps.nic_verified_hf_version = hf_version;
Collaborator:

The ps is const?

Contributor Author:

nic_verified_hf_version is marked mutable


const std::unordered_set<crypto::hash> blk_tx_hashes(blk.tx_hashes.cbegin(), blk.tx_hashes.cend());

Collaborator:

I think it would be nice to have a check here that blk_entry.txs.size() == blk_tx_hashes.size() && blk_tx_hashes.size() == blk.tx_hashes.size()

This guarantees there aren't duplicates and that all blk_tx_hashes will map 1-to-1 with tx_entries. I can't find if this exact check is done somewhere else (probably is), but figure this would be a good early place for it anyway (either here or immediately after make_pool_supplement_from_block_entry inside try_add_next_blocks).

Contributor Author:

There's a check in make_pool_supplement_from_block_entry that all deserialized transactions belong to that block.

Collaborator:

!blk_tx_hashes.count(tx_hash) in the make_pool_supplement_from_block_entry above this one checks that for all tx_entries, there's at least 1 matching block hash. Strictly going off that check (and ignoring all other code), it appears there could still be duplicates in this section's blk_entry.txs and blk.tx_hashes, and separately blk.tx_hashes could also have more hashes than are present in blk_entry.txs (which is the expected case when the make_pool_supplement_from_block_entry above handles tx_entries from a new fluffy block, not when syncing a block). In combination with the check you mentioned above, making sure all the container sizes are equal after constructing the set in this function immediately makes sure that when syncing a block, there aren't duplicate blk_entry.txs and that blk.tx_hashes captures all blk_entry.txs 1-to-1.

I don't see anything wrong with not doing the size check here, but it's a bit of a pain to verify there aren't issues surrounding this, and it seems an easy thing to check here.

Contributor Author:

Okay, yeah, you were right; I mistakenly thought that making sure each tx is bound to the block would prevent dups. Technically, that also doesn't check for dups. See the latest commit for the update: we check that for all pool supplements, the number of tx entries is less than or equal to the number of hashes. For full blocks, we check that they are equal.

Collaborator:

One edge case: if there's a dup in blk.tx_hashes, the equivalent dup in blk_entry.txs, and an extra hash in blk.tx_hashes, then the function would still return true with the dup included in the pool_supplement

Also checking blk_tx_hashes.size() == blk.tx_hashes.size() should prevent that

fullConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
}
LOG_DEBUG_CC(context, "PEER SUPPORTS FLUFFY BLOCKS - RELAYING THIN/COMPACT WHATEVER BLOCK");
fluffyConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
Collaborator:

--no-fluffy-blocks meant a node wouldn't send fluffy blocks to its peers, not that a node wouldn't be relayed fluffy blocks from its peers (can quickly sanity check with monerod --no-fluffy-blocks --log-level 1 and see that new blocks are still received as fluffy blocks). Someone would have to manually build a v18 monerod that sets the final bit of m_support_flags to 0 in order to ask peers not to relay fluffy blocks to their node.

So to be clear, this PR as is shouldn't prevent current nodes on the network using any v18 release version of monerod from syncing/relaying/processing blocks, even nodes with the --no-fluffy-blocks flag (which still receive fluffy blocks today anyway).

Maybe the log could say "RELAYING FLUFFY BLOCK TO PEER" instead of "PEER SUPPORTS FLUFFY BLOCKS" because it's no longer checking if the peer supports fluffy blocks via the support_flags.

Resolved review threads: src/rpc/core_rpc_server.cpp, src/cryptonote_core/cryptonote_core.cpp (2 threads), src/cryptonote_core/blockchain.cpp, src/cryptonote_core/tx_pool.cpp
@j-berman (Collaborator) left a comment:

I've been running the latest for weeks now, running smooth on my end.

I've also combed through these changes many times now -- thanks for your work on this.

Minor comments in this latest review round; I'm ready to approve after this.

Resolved review threads: src/blockchain_utilities/blockchain_import.cpp, src/cryptonote_protocol/cryptonote_protocol_handler.inl, tests/functional_tests/p2p.py
res = daemon2.get_transactions([txid])
assert len(res.txs) == 1
tx_details = res.txs[0]
assert not tx_details.in_pool
@j-berman (Collaborator) commented Mar 26, 2024:

This test fails sporadically on this line on my local. I investigated, it looks like an existing bug unrelated to this PR.

If the transfer includes an output in its ring that unlocked in block N, after popping blocks to N-2, that tx is no longer a valid tx because that output isn't unlocked yet (it fails here). You'd expect that once the chain advances in daemon2.generateblocks above, then the tx becomes valid again and should therefore be included in a later block upon advancing, but it looks like this if statement is incorrect:

//if we already failed on this height and id, skip actual ring signature check
if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
return false;

And it should instead be:

//if we already failed on this height and id, skip actual ring signature check
if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height >= m_blockchain.get_current_blockchain_height())
  return false;

The ring sigs can become valid again if we're at a higher height than when the tx originally failed, so it should pass that if statement and continue on to the check_tx_inputs step again if so.

EDIT: slight edit to support a reorg making a tx valid

Contributor Author:

You're right about that check being wrong. However, even your proposed changes aren't conservative enough if you want to handle popped blocks: if the chain purely goes backwards in time (which only normally happens when pop_blocks is called), a transaction output with a custom unlock_time might actually UNLOCK. This is because Blockchain::get_adjusted_time() is not monotonic, so an output that is unlocked now may become locked again in a future block.

Collaborator:

Interesting, so this should be good:

if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height == m_blockchain.get_current_blockchain_height()-1)

@jeffro256 (Contributor Author) commented Apr 10, 2024:

Yes that should be good. If we wanted to get incredibly pedantic, we would also have to check that the hard fork version is greater than or equal to HF_VERSION_DETERMINISTIC_UNLOCK_TIME, since your system's wall-time might also not be monotonic, and consensus validation of a tx with a ring containing an output with a UNIX-interpreted unlock_time isn't necessarily deterministic. But I don't think we should worry about that case.

Collaborator:

Your call

Contributor Author:

We should go with if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height == m_blockchain.get_current_blockchain_height()-1) IMO, since that wall-time thing won't affect any future or past transactions, it's only a technicality.

Resolved review thread: tests/functional_tests/p2p.py
@j-berman (Collaborator) left a comment:

Approving changes that include this commit: jeffro256@c388e12

Seems to be a GitHub bug that the PR doesn't include that commit.

@selsta (Collaborator) commented Apr 18, 2024:

I applied this pull request locally and the commit mentioned in the comments is missing... not sure what's going on

@vtnerd (Contributor) left a comment:

This is really close, but I had a few questions, in tx_pool.cpp in particular.

@@ -5349,6 +5433,12 @@ void Blockchain::set_user_options(uint64_t maxthreads, bool sync_on_blocks, uint
m_max_prepare_blocks_threads = maxthreads;
}

void Blockchain::set_txpool_notify(TxpoolNotifyCallback&& notify)
{
std::lock_guard<decltype(m_txpool_notifier_mutex)> lg(m_txpool_notifier_mutex);
Contributor:

We might want to use boost::lock_guard instead, as we typically use boost for thread related things. I don't think it matters in this case; the suggestion is mostly for aesthetics/consistency.

Contributor Author:

I used std::lock_guard to not further cement Boost dependencies, and since std::mutex and std::lock_guard are already used within the codebase, I think it shouldn't affect binary size. However, I'm not incredibly opinionated either way.

Contributor:

We're already using it all over the place.

@@ -5367,6 +5457,22 @@ void Blockchain::add_miner_notify(MinerNotifyCallback&& notify)
}
}

void Blockchain::notify_txpool_event(std::vector<txpool_event>&& event)
{
std::lock_guard<decltype(m_txpool_notifier_mutex)> lg(m_txpool_notifier_mutex);
Contributor:

Same here.

if (find_tx_failure)
MERROR_VER("BUG: Block with id: " << id << " failed to take tx: " << tx_id);
}
else if (extra_block_txs.txs_by_txid.count(tx_id)) // next try looking in the block supplement
Contributor:

This really should be a .find with a .end() check. This implementation does 4 lookups (.count(), .operator[] (twice), and .erase).

@jeffro256 (Contributor Author) commented Aug 13, 2024:

Done in the latest commit


bool find_tx_failure = true;
// we lock to keep consistent state between have_tx() and take_tx()
epee::critical_region_t<tx_memory_pool> txpool_lock_guard(m_tx_pool);
Contributor:

Instead of the lock+have_tx+take_tx, why not perform a single take_tx call first, and then look in extra_block_txs when that fails? The only downside is a log statement in take_tx.

@jeffro256 (Contributor Author) commented Aug 13, 2024:

Check the latest commit. I added a flag in take_tx called suppress_missing_msgs which defaults to false, but is true here. That removed the extra locking and call to have_tx without spamming the console with thousands of bogus messages.

const blobdata &tx_blob = extra_block_tx.second.second;

tx_verification_context tvc{};
CRITICAL_REGION_LOCAL(m_tx_pool);
Contributor:

Why not put this lock before the loop? This is held through the bulk of the work anyway.

Contributor Author:

Done in latest commit


if(txd.last_failed_id != null_hash && m_blockchain.get_current_blockchain_height() > txd.last_failed_height && txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
return false;//we already sure that this tx is broken for this height
if (txd.last_failed_id == top_block_hash)
Contributor:

I think this change is incorrect. You need something like:

  if (txd.last_failed_height && txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
    return false;

The first check is needed because null_hash is technically a valid value (but exceptionally rare). I think the original code should've included this.

The m_blockchain.get_current_blockchain_height() > txd.last_failed_height check can probably be removed.

However, the last check is the most important - this appears to be tracking/caching whether a tx inputs are invalid after a certain height. Your change here will force a check_tx_inputs check every new block, instead of only after a reorg.

@jeffro256 (Contributor Author) commented Aug 13, 2024:

Your change here will force a check_tx_inputs check every new block, instead of only after a reorg.

Yes, this was the intended goal. The act of checking tx unlock times against a non-monotonic moving value of get_adjusted_time() makes it so that a transaction can pass check_tx_inputs at block X, but fail at block Y>X, and then re-pass at block Z>Y. This is discussed further at monero-project/meta#966 (comment).

@@ -1139,9 +1069,6 @@ namespace cryptonote

time_t start_time;

std::unordered_set<crypto::hash> bad_semantics_txes[2];
Contributor:

This has been around since 2017, and its removal isn't strictly needed for this PR. I would keep it, and analyze later in a separate PR.

The locations of the inserts/finds will have to move slightly, but it's still possible.

Contributor Author:

Would it be harder to review its removal, or to review that the new updates to the code are correct?

Contributor:

My initial thought would be that it is harder to review with it removed. I'd have to dig into why it was added to make sure that its removal isn't going to affect anything.

}
}
//if we here, transaction seems valid, but, anyway, check for key_images collisions with blockchain, just to be sure
if(m_blockchain.have_tx_keyimges_as_spent(lazy_tx()))
Contributor:

This check was dropped, but it looks like both uses of is_transaction_ready_to_go also do this check. Is that the reason this was dropped?

Contributor Author:

Yes, and inside check_tx_inputs

@jeffro256 (Contributor Author):

Rebased to fix conflicts on CORE_RPC_VERSION_MINOR definition

@vtnerd (Contributor) left a comment:

I'm really liking this review; it cleaned up a lot, but I think I found one last issue (maybe).

{
try
{
m_txpool_notifier(event);
Contributor:

This should be std::move(event); otherwise a copy occurs here.


const blobdata &tx_blob = extra_block_tx.second.second;

tx_verification_context tvc{};
if ((!m_tx_pool.have_tx(txid, relay_category::all) &&
@vtnerd (Contributor) commented Sep 6, 2024:

I think this changed from existing behavior. This might be intended.

The existing code calls have_tx(txid, relay_category::legacy) in core::handle_incoming_txs, and then does an add_tx with relay_method::block. The tx metadata then gets "upgraded" to relay_method::block from all other relay method types. This new relay_category::all check prevents an upgrade from happening.

This is problematic because the RPC channels (amongst other places) will continue "hiding" the tx if in relay_method::stem mode, when it needs to begin reporting the tx as seen.

6 participants