Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log gRPC startup listening when it is listening #482

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

emersonian
Copy link
Contributor

@emersonian emersonian commented Apr 24, 2024

UPDATE: this PR includes only the second change mentioned below, see #484 for the first change:

Lightwalletd can take upwards of 20 minutes to start when a block cache exists on a magnetic hard drive. When lightwalletd starts up for the first time, it is able to respond to gRPC nearly immediately while the block cache synchronizes in the background. It is confusing that upon second launch the time-to-first-listen is very slow.

This pull request contains two changes:

  1. Load and maintain synchronization of the block cache in the background without blocking gRPC's first listen
  2. Only announce that gRPC is listening once it is actually listening

This is a draft open to community review. I will test this in a staging environment for a few weeks before requesting a merge.

A Docker Hub image of this branch is available for testing at: emersonian/zcash-lwd:v0.4.17-cache1


Please ensure this checklist is followed for any pull requests for this repo. This checklist must be checked by both the PR creator and by anyone who reviews the PR.

  • Relevant documentation for this PR has to be completed before the PR can be merged
  • A test plan for the PR must be documented in the PR notes and included in the test plan for the next regular release

As a note, all CI tests need to be passing and all appropriate code reviews need to be done before this PR can be merged

@emersonian
Copy link
Contributor Author

This improvement has been stable for me on a few test severs, I consider it ready to review and merge. Thanks.

@emersonian emersonian marked this pull request as ready for review April 29, 2024 19:32
@LarryRuane
Copy link
Collaborator

LarryRuane commented May 2, 2024

There's a better way to fix this, I believe. Can you check the startup performance when using this patch (applied to master):

+++ a/common/cache.go
+++ b/common/cache.go
@@ -244,12 +244,6 @@ func NewBlockCache(dbPath string, chainName string, startHeight int, syncFromHei
                offset += int64(length) + 8
                c.starts = append(c.starts, offset)
-               // Check for corruption.
-               block := c.readBlock(c.nextBlock)
-               if block == nil {
-                       Log.Warning("error reading block")
-                       c.recoverFromCorruption(c.nextBlock)
-                       break
-               }
                c.nextBlock++
        }
        c.setDbFiles

In other words, remove the deserialization and checking of each block when reading the blocks disk cache during startup. I added this check as part of #186 (2022). I remember it adding only a couple of seconds to the startup time. Either I measured incorrectly, or it's taking much longer to parse blocks, or both. This was before the sandblasting attack (creating extremely large blocks), so that most likely is the cause of the startup time increase.

When I run this patch, the log messages that say "Reading nnnnn blocks from disk cache" and "Done reading" are within the same second (as shown by their timestamps), so this should be fast enough. But please test this and let me know what you find.

This check isn't really needed because it detects corruption in the blocks file, which probably never happens. If it does get corrupted, the length fields (before each individual block) will likely be corrupted as well (the sanity check for that kind of corruption remains). The functional effect of removing this check is that if there is corruption in a serialized block, it won't be detected during startup, but is detected when we later try to read the block using a gRPC like GetBlock. That's perfectly acceptable.

@LarryRuane
Copy link
Collaborator

Closing this now because I've merged #484 which I think is a better fix. Please reopen (or open an issue) if you disagree, thanks.

@LarryRuane LarryRuane closed this Jun 13, 2024
@emersonian
Copy link
Contributor Author

Okay thanks for the startup speed improvement. Can you also merge the change from this pull request's diff that moves "Starting gRPC server version" to the line where the gRPC server is actually starting? The logging is confusing and inaccurate as-is, thanks!

@LarryRuane
Copy link
Collaborator

LarryRuane commented Jun 20, 2024

Let's reopen this, and if you don't mind, can you update the PR accordingly? Thanks!
(Or, if you'd prefer, I can make a new PR.)

@LarryRuane LarryRuane reopened this Jun 20, 2024
@LarryRuane LarryRuane changed the title Reduce gRPC startup time Log gRPC startup listening when it is listening Jun 27, 2024
@LarryRuane
Copy link
Collaborator

Force pushed to reduce this PR to just the second commit (correct logging)

@LarryRuane
Copy link
Collaborator

@emersonian - Could you review this PR now? It should take only a minute. Thanks.

@emersonian
Copy link
Contributor Author

It looks good to me, thanks! When should we anticipate the next release?

@LarryRuane
Copy link
Collaborator

@emersonian - I changed it slightly; can you review again? I think it's good to log the version and build information early (like we've always done), but just don't say that the gRPC server is starting. Then later, when we do start it, log another message saying that. Some things happen before starting the gRPC server, such as trying to execute an RPC to the backend zcashd (or zebrad), and if that fails, it would be good to know (if someone ships me a log file) what version and build they were running. Here's what the log file looks like now (with the latest force-push) if I start lightwalletd while zcashd is still starting up (note the now-separated buildDate and Starting gRPC server log lines):

$ ./lightwalletd --zcash-conf-path ~/.zcash/zcash.conf --no-tls-very-insecure --log-file /dev/stdout
{"app":"lightwalletd","buildDate":"2024-07-10","buildUser":"larry","gitCommit":"6d801b8c8f78d05d8c3c14a453454d3c6c640f78","level":"info","msg":"Starting lightwalletd process version v0.4.17-6-g6d801b8","time":"2024-07-10T11:36:37-06:00"}
{"app":"lightwalletd","level":"warning","msg":"Starting insecure no-TLS (plaintext) server","time":"2024-07-10T11:36:37-06:00"}
Starting insecure server
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":1,"time":"2024-07-10T11:36:37-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":2,"time":"2024-07-10T11:36:52-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":3,"time":"2024-07-10T11:37:12-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":4,"time":"2024-07-10T11:37:37-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":5,"time":"2024-07-10T11:38:07-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":6,"time":"2024-07-10T11:38:42-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":7,"time":"2024-07-10T11:39:22-06:00"}
{"app":"lightwalletd","error":"-28: Rewinding blocks if needed...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":8,"time":"2024-07-10T11:40:07-06:00"}
{"app":"lightwalletd","level":"warning","msg":"getblockchaininfo RPC successful","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Got sapling height 419200 block height 2566379 chain main branchID c2d6d0b4","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Reading 0 blocks (since Sapling activation) from disk cache ...","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Done reading 0 blocks from disk cache","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Starting gRPC server on 127.0.0.1:9067","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Adding block to cache 419369 0000000002959f10f2df11187a7316a977a256e0e2e5c73dca90431120efdede","time":"2024-07-10T11:41:01-06:00"}
{"app":"lightwalletd","level":"info","msg":"Adding block to cache 419817 0000000001513d6001d2ebadd058251889fb35125bc0b79b9bf7d5352127bcf3","time":"2024-07-10T11:41:05-06:00"}

As for another release, I think we should do one right away with this and a few other good changes we've had recently.

@LarryRuane LarryRuane self-assigned this Jul 17, 2024
@LarryRuane LarryRuane merged commit 033d54e into zcash:master Jul 17, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants