Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid shell out tar for genesis archive creation #3079

Merged
merged 3 commits into from
Oct 11, 2024

Conversation

steviez
Copy link

@steviez steviez commented Oct 4, 2024

Problem

Spawning a subprocess to utilize tar incurs some overhead. Moreso, it introduces some variability based on what version of tar the caller happens to have available in PATH

Summary of Changes

Use the tar crate instead of invoking tar in a subprocess.

Fixes #

@steviez
Copy link
Author

steviez commented Oct 4, 2024

I believe this change is the right thing to do regardless. However, the motivation for looking at this was that many people had reported issues running solana-test-validator, seemingly related to variation on what version of tar is available (per solana-labs#34625).

I was unable to reproduce the issue running solana-test-validator locally, with both v1.18 and v2.0 versions. My local setup:

$ sysctl -a | grep machdep.cpu.brand_string
machdep.cpu.brand_string: Apple M3 Max

$ sw_vers
ProductName:		macOS
ProductVersion:		14.6.1
BuildVersion:		23G93

$ tar --help | head -n 1
tar(bsdtar): manipulate archive files

It seems that I'm not the only one who was unable to reproduce, such as #2838 (comment). I'm probably going to try to track someone down who was able to reproduce so I can validate that the test validator issue is resolved with this change

@ferric-sol
Copy link

ferric-sol commented Oct 5, 2024

@steviez I can reproduce:

➜  rocket sysctl -a | grep machdep.cpu.brand_string

machdep.cpu.brand_string: Apple M1 Pro
➜  rocket sw_vers
ProductName:		macOS
ProductVersion:		15.0
BuildVersion:		24A335
➜  rocket tar --help | head -n 1
tar(bsdtar): manipulate archive files
➜  rocket
➜  rocket solana-test-validator

Notice! No wallet available. `solana airdrop` localnet SOL after creating one

Ledger location: test-ledger
Log: test-ledger/validator.log
Error: failed to start validator: Failed to create ledger at test-ledger: io error: Error checking to unpack genesis archive: Archive error: extra entry found: "._genesis.bin" Regular
➜  rocket rm -rf test-ledger
➜  rocket solana-test-validator

Notice! No wallet available. `solana airdrop` localnet SOL after creating one

Ledger location: test-ledger
Log: test-ledger/validator.log
Error: failed to start validator: Failed to create ledger at test-ledger: io error: Error checking to unpack genesis archive: Archive error: extra entry found: "._genesis.bin" Regular

oddly works on the master branch and the steviez:bstore_no_tar_shellout branch:

➜  solana-no-tar git:(master) rm -rf test-ledger
➜  solana-no-tar git:(master) solana-test-validator

Notice! No wallet available. `solana airdrop` localnet SOL after creating one

Ledger location: test-ledger
Log: test-ledger/validator.log
Error: failed to start validator: Failed to create ledger at test-ledger: io error: Error checking to unpack genesis archive: Archive error: extra entry found: "._genesis.bin" Regular


➜  solana-no-tar git:(master) validator/solana-test-validator
+ exec cargo run --manifest-path=validator/Cargo.toml --bin solana-test-validator --
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.97s
     Running `target/debug/solana-test-validator`

Notice! No wallet available. `solana airdrop` localnet SOL after creating one

Ledger location: test-ledger
Log: test-ledger/validator.log
⠂ Initializing...                                                                                                                                                                                                                                              Waiting for fees to stabilize 1...
Identity: 5My4TkDSM7ZkSRqCeT4KPdSmFtV3AyjRhhgxdmhTVbSK
Genesis Hash: 6fLRk1NmfA4Mwm5sJYdKFZgtCjNCxj8zqhsdBJMeMVfL
Version: 2.1.0
Shred Version: 38337
Gossip Address: 127.0.0.1:1024
TPU Address: 127.0.0.1:1027
JSON RPC URL: http://127.0.0.1:8899
WebSocket PubSub URL: ws://127.0.0.1:8900
⠋ 00:00:12 | Processed Slot: 20 | Confirmed Slot: 20 | Finalized Slot: 0 | Full Snapshot Slot: - | Incremental Snapshot Slot: - | Transactions: 19 | ◎499.999905000

@steviez
Copy link
Author

steviez commented Oct 8, 2024

Thanks for trying this out @ferric-sol ! I am a bit confused as well as to what we would have changed between v1.18.25 and the tip of master. Per our DM's, it sounds like you are willing to git bisect this one. If so, thank you very much and lemme know when you're able to get to it / how it goes

@steviez
Copy link
Author

steviez commented Oct 10, 2024

After digging around, solana-labs#35213 looked like a probable culprit for change in behavior; @ferric-sol did the git bisect and confirmed this to be the case. This change was pushed to master after v1.18 was cut; the reports I have seen of the issue have been with <= v1.18.

It is still a mystery to me as to why the issue would show on an M1, but not my M3. In any case, the issue goes away when we do NOT shell out, so this change is still good to go

Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Are there tests for genesis archive creation? Or iow, how was this change tested?

@steviez
Copy link
Author

steviez commented Oct 11, 2024

Oops, never hit enter to send my response

Code looks good. Are there tests for genesis archive creation? Or iow, how was this change tested?

The function create_new_ledger() unpacks the archive; however, it doesn't check the actual contents of what was unpacked. The multinode demos exercise this by way of having additional validators download and unpack genesis.tar.bz2; if the additional nodes were unable to reconstruct proper genesis, those tests would fail.

That being said, I extended an existing unit test to simply read both genesis.bin and genesis.tar.bz2 and compare to the genesis config we used to create ledger. That is done with 0f22dac

Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@steviez steviez merged commit 8f0465f into anza-xyz:master Oct 11, 2024
49 checks passed
@steviez steviez deleted the bstore_no_tar_shellout branch October 11, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants