[ARCHIVED] Lodestar Planning Meetings
## ⚠️ This section has been deprecated in favor of https://github.com/ChainSafe/lodestar/wiki/Lodestar-Planning-&-Standup-Meetings
The Lodestar team hosts planning meetings weekly on Fridays at 9:45am Eastern Time. These meetings allow the team to plan for the week, prioritise tasks, and discuss the roadmap and outlook for Lodestar.
Previous 2021 Planning Meetings
Cayman
- Yamux implementation progressing, they want to deprecate mplex.
- Performance, but especially security will be top of mind
- Libp2p-gossipsub released. Opening up PR with libp2p update progress.
Lion
- Release strategy improvements. Seems solid.
- Developing tools for research and security to use on Lodestar
- Resource leak found on Validator. It was a typo: callback being un-subscribed is not the same as the one being subscribed
- Remote key signer API implementation. Last step to getting DappNode support.
- PR to deprecate accounts CLI. Request for comments.
- `validator` namespace is new for key management.
- Goal to make Lodestar compatible with the Gnosis Chain
- We need to find ways to change params without changing the source.
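The validator resource leak Lion describes above (the callback being un-subscribed is not the one that was subscribed) is a common listener bug. A minimal sketch of the pattern follows. `SlotEmitter`, `LeakyService` and `FixedService` are hypothetical names for illustration, not Lodestar's actual code.

```typescript
type Listener = (slot: number) => void;

// Hypothetical minimal emitter, for illustration only.
class SlotEmitter {
  private readonly listeners = new Set<Listener>();

  on(fn: Listener): void {
    this.listeners.add(fn);
  }

  off(fn: Listener): void {
    this.listeners.delete(fn);
  }

  get listenerCount(): number {
    return this.listeners.size;
  }
}

class LeakyService {
  private readonly emitter: SlotEmitter;

  constructor(emitter: SlotEmitter) {
    this.emitter = emitter;
  }

  start(): void {
    this.emitter.on((slot) => this.onSlot(slot));
  }

  stop(): void {
    // Bug: this arrow function is a new object, not the one passed to on(),
    // so nothing is removed and the original listener leaks.
    this.emitter.off((slot) => this.onSlot(slot));
  }

  private onSlot(_slot: number): void {
    // handle the slot
  }
}

class FixedService {
  private readonly emitter: SlotEmitter;
  // Keep one stable reference so on() and off() receive the same function.
  private readonly onSlot: Listener = (_slot) => {
    // handle the slot
  };

  constructor(emitter: SlotEmitter) {
    this.emitter = emitter;
  }

  start(): void {
    this.emitter.on(this.onSlot);
  }

  stop(): void {
    this.emitter.off(this.onSlot);
  }
}
```

After `start()` followed by `stop()`, the leaky variant still has one listener registered, while the fixed variant has none.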
Tuyen
- Update `beacon_aggregate_and_proof` gossip scoring params
- Gossipsub heartbeat: Improving performance test
- Skip checking known participants when publishing ContributionAndProof
Gajinder
- Eth1 Deposit Tracker issue on Ropsten. We were affected by the same issue as Lighthouse and Prysm.
- Our cache was not up to date, even though while voting we were applying the right time ranges for figuring out eth1 deposits.
- MEV mergemock PR changes are coming. Rebasing and WIP.
Dadaepo
- Doppelganger:
- Started testing it in Prater. It wasn't picking up indexes included via attestations in the block.
- Fix for that was breaking the benchmark test
- Updating the data structure we use to track that is slow because there are a lot of attestations to include.
- Will fix and continue with real life testing.
Phil
- Audit Block 1 starts June 7.
- Pitching for call at 4pm UTC
- Audit Points for Inclusion
- Lion created milestones to add for audit inclusion Batch 1 due June 7
- Review docs this weekend + Monday ideally to ensure instructions they can pull are not stale
- Due to supply chain attack risks, we should not encourage anyone to install from NPM, except for experimentation.
- Batch 2 inclusions for Audit due June 27
- Once these are merged, likely no other large refactors for the validator client
- Once remote key manager is merged, we can continue with DappNode integration
- `v0.38.0`: In RC2 testing, we've added critical fixes to Ropsten. Now in beta for regression testing. Release decision scheduled for Monday.
- https://twitter.com/lodestar_eth credentials to be in our possession shortly for project-specific content.
- QA Engineer job posting is finalized. Open for applications starting next week.
- Grant page for Gitcoin GR14 CLR round for ETH Core Infra is now up: https://gitcoin.co/grants/6034/lodestar-typescript-eth-consensus-client-by-chains
- Good target for v1.0.0 is pre-merge.
- Supporting Gnosis chain: When we load SSZ types, if there's anything that imports from `lodestar/types`, it will automatically bake in the preset mainnet vs minimal. In order to support the Gnosis chain, we have to create another preset because they change some of the parameters which breaks. How do we support it in a way where it makes sense?
Post v1.0.0 Priorities (To be updated on our Roadmap Issue):
- Any DOS Issues and Networking
- Example: Discv5 rate limiting
- Fixing Vision Strategy
- Example: We probably wouldn't survive a high forking environment
- Testing improvements
- Light Client
- API and spec implementation
- Beacon API
- Compatible and conforming to the `beacon-APIs` standard. See issue here.
- Lion: We should always be aiming for whatever the latest released spec is. Currently we should strive for https://github.com/ethereum/beacon-APIs/releases/tag/v2.3.0
- Opportunity to finalize structure to prevent major breaking changes after `v1.0.0` release.
Should update the Lodestar Roadmap with these.
Cayman
- Feature test group is now testing gossipsub.
- Performance is probably worse, but we will see better scores.
- Libp2p has some additional metrics we can turn on. Will take a look to see if there's anything useful.
Tuyen
- There was an issue with handling unknown topics in gossipsub metrics. Merged today to fix metrics API: https://github.com/ChainSafe/lodestar/pull/3877
- Gossipsub scoring is looking better
- Consistently more than 20 mesh peers = good
- Experiencing performance issues with more peers & better scores now that more messages are sent and received.
- Tested on contabo: we had to limit mesh peers (Dhi, D, Dlo) to 4, 3, 2, whereas the default is 12, 8, 6.
- For contabo-19, I can only have 20 validators, whereas we usually have 50.
- Gossipsub peer PR needs some more cleanup and will be ready to merge.
- Checking profiles when the performance is really bad is a problem. Previously, when there were fewer performance issues, we could run the node and check the profile at the same time. Any ideas?
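For reference, the mesh degree settings Tuyen mentions can be written as a plain options object. This is a sketch only. `MeshParams` and `isValidMeshParams` are hypothetical names, the values are the ones quoted in the notes, and the `Dlo <= D <= Dhi` ordering check is an assumption based on how gossipsub relates the three parameters (lower bound, target, upper bound).

```typescript
interface MeshParams {
  Dlo: number; // lower bound on mesh peers per topic
  D: number; // target number of mesh peers per topic
  Dhi: number; // upper bound on mesh peers per topic
}

// Defaults as quoted in the notes (Dhi, D, Dlo = 12, 8, 6).
const defaultMesh: MeshParams = {Dlo: 6, D: 8, Dhi: 12};

// Constrained values used on the resource-limited host (Dhi, D, Dlo = 4, 3, 2).
const constrainedMesh: MeshParams = {Dlo: 2, D: 3, Dhi: 4};

// Assumed sanity check: the target must sit between the two bounds.
function isValidMeshParams(p: MeshParams): boolean {
  return p.Dlo > 0 && p.Dlo <= p.D && p.D <= p.Dhi;
}
```

Smaller values mean fewer full-message peers per topic, which lowers the message load at the cost of less redundancy in the mesh.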
Gajinder
- Added approval to the standard keymanager API PR. Merged!
- Will refocus on running libp2p in the browser
- Troubleshooting with Lighthouse to decode our snappy and call data
- Small changes in Goerli shadow fork 2. Going well so far on consensus side.
Dadaepo
- Doppelganger Protection
- Adding tests, once it's done will open for first reviews
Phil
- v0.34.3 hotfix release
- Metrics look fine with Tuyen's analysis
- Will be reviewed by Cayman as well and will release if good
- Testing infrastructure + QA Automation engineer
- Dade: For testing infrastructure, issues don't usually show up unless we run for a couple of days, could affect velocity. How should we manage this?
- Phil: We should define what an ideal testing infrastructure should look like. We want to put the work into a good process, but make sure it's accurately giving us metrics/data that helps find potential issues.
- Phil: Do we need a controlled devnet environment to do testing in where we can influence parameters to simulate potential issues? Or is that too much work which doesn't reflect realities of public testnets and mainnet?
- Cayman: Devnets are too small, things scale with validators and it's difficult to reproduce this on our own devnet. Should we focus on how a devnet would be beneficial for preliminary type tests such as sending/receiving messages and not getting banned immediately, Beacon API endpoints, etc? Would this infrastructure be worth it? It will take additional servers and such.
- Cayman: Maybe there's a way to set slots per second higher? Compress the amount of work? Change chain params to have it happen quicker?
- Dade: The problem is not necessarily functional bugs, but rather performance regressions. These are usually seen over a longer period of time. We should invest more in metrics monitoring. There should be some balance of resources between setting up better metrics and testing infrastructure.
- Phil: True. We should ensure the work we put into a testing infrastructure yields good data and results for it to be worth the work. If we do set up a testnet in a controlled environment, it should probably focus more on functional type testing to ensure our Beacon APIs are not broken, for example. And push more of the testing infrastructure to help test against a more real-world environment like Prater. It should be focused on relieving some grunt work (automation).
Cayman
- 0.34.2 tagged and released
- Gossip hotfix included
- All older versions deprecated on NPM
- If there's a dependency on ESM, everything needs to be updated.
- Already a blocker for libp2p.
- Libp2p interfaces 4.0, we need to upgrade to ESM.
- Issue #3863 opened to update to ES modules
- No significant differences so far in performance from preliminary tests
- This will be a major bump for all.
- Do we need to persist the peerStore at all?
- It is important for figuring out bad peers from good ones. Especially if being DDOSed and reconnecting to the same bad peers.
- Multiple issues here for networking:
- Urgent issue of rebooting and it slowing down peer discovery
- Figuring out peer persistence strategy
- Mixing libp2p persistence & our own metadata system which has none.
- Lion: Any opposition to SSZ upgrade merge?
- Cayman: Does it create more problems?
- Lion: I think the problems we're seeing are not related to SSZ.
- Tuyen: Don't think it's related to gossip issues
- Cayman: It seems like we're fixing the underlying gossip issues and what's causing our scores to break down. Chance of failure there is low.
- Lion: There's also a clear path of how we can figure out our own scores from other peers.
- Cayman: Relook at API for things such as naming. Good opportunity to get the interface right.
Lion
- Gossipsub stuff
- Adding metrics and deploying
- Suspicion that P7 penalties are causing this
- 97% of messages are late, not broken. Might be some slow I/O bottleneck delaying processing of packets.
- Promise time is set at 3000ms.
- Branch to test trying not to penalize broken promises and see how that goes.
- Investigating P3 penalties if that's true and why it's happening.
- As soon as as-sha256 is converted to Typescript, Lodestar will be free of vanilla JavaScript. Tracking PR: ssz/#244
- API tests can use improvement in short-medium term:
- Have a real beacon node, with real data, with real state and query things from it like the API.
- Fast e2e test. No stubs, no fake data.
- Tracking in Add good full coverage beacon API tests
- Cayman: Is there a testing suite/tool for teams for base level of compliance?
- Lion: I don't think this exists for HTTP APIs.
Tuyen
- Stabilized go-gossipsub tests
- Working on CI: browser test failed
- Libp2p migration: cannot rename peerstore dir should be done in this sprint.
- We should do a try catch, if you cannot rename then just delete. There's no value in keeping a backup.
- Tuyen: We cannot delete unless we do a recursive. Must test.
- Piggyback control
- CI
- Fixing low gossipsub score test, missing array in the code. Made PR to add strict boolean expression
- Libp2p: After we handle peer connect event, causing our test to fail. Emit peer connect event after libp2p after adding it to the peer connection manager.
- Finished AgentVersion PR in Lodestar.
- One concern with naming: network-global
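The peerstore migration fallback discussed above (try to rename the old directory, and if that fails delete it, recursively since it is a directory) could look roughly like this. It is a sketch under assumptions: `retireOldPeerstore` and `FsLike` are hypothetical names, and `FsLike` only mirrors the two `fs` functions used, so Node's own `fs` module is expected to satisfy it.

```typescript
// Subset of Node's `fs` API used here, declared locally so the logic
// can be exercised with a fake file system.
interface FsLike {
  renameSync(oldPath: string, newPath: string): void;
  rmSync(path: string, options: {recursive: boolean; force: boolean}): void;
}

type MigrationResult = "renamed" | "deleted";

function retireOldPeerstore(fs: FsLike, oldDir: string, backupDir: string): MigrationResult {
  try {
    fs.renameSync(oldDir, backupDir);
    return "renamed";
  } catch {
    // No value in keeping a backup: drop the old directory instead.
    // A directory with contents requires a recursive delete.
    fs.rmSync(oldDir, {recursive: true, force: true});
    return "deleted";
  }
}
```

Passing the file system in as a parameter keeps the fallback path testable without touching the disk, which matters here because the notes call out that the recursive delete must be tested.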
Gajinder
- ERC20 balances now work on the light client browser demo
- DAI contract on kiln network and tested balances and updates on the browser
- Integrated functionality into merge script
- Deploying contracts,
- Lion: Should look into Consensus optimistic updates through P2P from Nimbus
Dadaepo
- Doppelganger Protection
- Added endpoints to get the liveness status of the validator indices
- Moving onto the validator logic to use data from validator indices
- Some open PRs to finish
Lion
- We should come up with a better way to permanently fix issues
- Networking layer is untested (discv5 + libp2p), also much harder to test
- Shared Investigating Node.js Performance: Event Loop and Network I/O
- Could explain why our REST API is so slow
- We should investigate this soon with more metrics to test hypothesis with queue being broken or having a bug
- Charts gap: The gap is there because we are dividing by 0. Sometimes we process 0 jobs, so the computed rate has no value and the chart shows a gap.
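The chart gap Lion describes comes down to a per-job average with a zero denominator. The notes do not say whether the division happens in a dashboard query or in code, so the following is only a generic sketch with hypothetical function names.

```typescript
// Naive average: with 0 jobs and 0 total time this is 0 / 0, which is NaN,
// and a NaN sample renders as a gap in most charting tools.
function avgSecondsPerJob(totalSeconds: number, jobs: number): number {
  return totalSeconds / jobs;
}

// Guarded variant: report 0 when no jobs were processed in the interval.
function avgSecondsPerJobOrZero(totalSeconds: number, jobs: number): number {
  return jobs === 0 ? 0 : totalSeconds / jobs;
}
```

Whether an idle interval should show as 0 or as a gap is a presentation choice. The guarded variant just makes that choice explicit.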
Tuyen
- SSZ-v2 investigation of performance of gossip job wait time
- Not an issue with BLS investigation
- Next epoch proposer duties with Dade, reviewing some feedback
- Gossipsub:
- PR to use dev dependencies of Libp2p 0.36.1 build issue. Requires Cayman to review.
- Heap snapshots boosting to 9GB with node still running
- Invalid gossip messages & mesh peers are better
Gajinder
- Pari has some templates for running Kurtosis in CI, we should run it too.
- Next devnet version likely starting next week Tuesday/Wednesday
- EL/CLs will constantly call each other on a heartbeat to keep matching on whether or not TTD has been hit and logging it to the user
- JWT auth completed
- Renaming of `RANDOM` to `PREVRANDAO` complete
- CI is running well with Nethermind
Dadaepo
- PR reviews: 50% with keymanager
- Responded to comments in review for next epoch validator duties
- Working on Add retry mechanism in executionEngine for executePayload
- Lion: Would be awesome if you can help with extending the contributions guide to include information you discovered that is only informally known within the team. Whitespace, comments, etc. We need to persist this info.
- Please hold off on the PR until we merge SSZ.
Short Term Goals:
- Continue merge ready initiatives
- Troubleshoot core libp2p peering issues. Status update on Tuesday.
- Merge libp2p
- Test + Release v0.35
- Merge SSZ
- Test + Release v0.36
- Rocketpool Integration
- Light client demo ideas to pursue:
- Query data from execution PoC
- Light client on P2P from Etan
- We should aim to have a production ready v1.0.0 deployment for the merge. #goals
Cayman
- Added multi-arch build to the CI for Rocketpool Integration
- `docker buildx` handles it for you
- Docker build is breaking though because blst isn't prebuilt for arm64. Resolve blst-ts#51 then apply PR #3734 to disable arm64 docker build
- Talking to @zsfelfoldi, from LES Geth who is working on a Geth-eth2 light client
- Used our API and helped out with a bug
- Onboarding Marin to help with libp2p side
- Pushed more commits on Tuyen's libp2p branch: https://github.com/ChainSafe/lodestar/pull/3661
- Tuyen: Ran into issues with old peerstore. Potentially libp2p data store structure has changed.
- Lion: Can we just use a new key and abandon the old data? No requirement to prune that.
Lion
- SSZ integrated in Lodestar, PR is massive.
- Will deploy SSZ branch in another repo and complete integration
Tuyen
- Gossip process delay seems to be bigger than before: https://github.com/ChainSafe/lodestar/issues/3732
- It went from 2 seconds to around 4 seconds
- Will continue investigating.
- Working with Dade to demonstrate how gossip works.
- Looking at Lighthouse to see how we can improve peerScore on the gossip side
Gajinder
- Added missing Bellatrix spec test and to spec runner
- Debugging the missed proposal issue with duplicate log. Merged in PR #3716 to check, validate and skip deposit events already present in DB.
- CL 1.1.9 spec (Kiln Testnet launch this week/next week). Tracker: https://github.com/ChainSafe/lodestar/issues/3731
- Will work with Marek and Nethermind to test.
- Lion: On our validator, duplicate log issue caused by Infura being down and our node lost track of the deposits, threw errors and could not propose block. Checks implemented in PR #3716 but what should the node do if the duplicate log is actually incorrect?
- If it's a reset, no issues - everything should go great.
- It makes sense for catastrophic error if it is a re-org because you're on the wrong chain.
- Post-merge, will the follow distance be reduced?
- Kintsugi has 16 blocks of follow distance
- Will discuss async
Dadaepo
- Keymanager logic to purge duties when signer keys are removed and adding tests
- Working with Tuyen on gossipsub
- Got metrics setup on local server
- Investigating missing validator keys
- The beacon chain knows of a public key that the validator is not using, while the validator is using a public key that the beacon chain does not know about.
- Investigation inconclusive: Will recreate the key and try again.
- Digging into p2p specs to understand the network and other beacon/validator specs
Phil
- Tracking Ethereum bug bounty website update: https://github.com/ethereum/ethereum-org-website/pull/5361
- Weak subjectivity checkpoints discussion: To make the network safer and more robust from a long-range attack perspective, WSS checkpoints should be distributed by client teams.
- https://github.com/ChainSafe/lodestar/issues/3696
- Looking for input from others!
- Releasing SSZ code share on ChainSafe Youtube which explains some of the updates from Dapplion.
Cayman
- Reviewed libraries cleaning up dependencies
- ETH crypto library released 1.0
- Got it audited and worked with Nomic Labs to re-release it.
- Need to see performance impact on switching from bcrypto
- They have hex to bytes and bytes to hex utility functions we may use
- SSZ pieces:
- Fixing proofs and reviewing PRs
- CIs are passing
Lion SSZ:
- Path for hashing structs can be optimized further but is maybe not necessary
- Got to the point where I can run spec tests
Gajinder
- Spec 1.1.9 is out, Created tracker: https://github.com/ChainSafe/lodestar/issues/3677
- Fixed nightly builds: https://github.com/ChainSafe/lodestar/issues/3676
- Increase default timeout of api instances except for validator duties: https://github.com/ChainSafe/lodestar/pull/3684
- Update year notice on CLI: https://github.com/ChainSafe/lodestar/pull/3681
- No issues with testing proposer boost: https://github.com/ChainSafe/lodestar/issues/3678
- Backfill optimisation
- Removing proxies when SSZ refactor is merged
- `WeakMap`: ES6 data structure that you can index by object. It is a way of associating data with objects without actually attaching the data to them.
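A short sketch of the `WeakMap` pattern mentioned above: a cache keyed by the object itself, so the object stays unmodified and its cache entry can be garbage collected with it. `BlockLike`, `expensiveRoot` and `getRoot` are hypothetical names, and the "root" is a placeholder string rather than a real hash.

```typescript
interface BlockLike {
  slot: number;
}

// Side table keyed by object identity. Entries do not keep the key alive.
const rootCache = new WeakMap<BlockLike, string>();
let computeCount = 0;

// Placeholder for an expensive computation such as hashing.
function expensiveRoot(block: BlockLike): string {
  computeCount++;
  return `root-for-slot-${block.slot}`;
}

function getRoot(block: BlockLike): string {
  let root = rootCache.get(block);
  if (root === undefined) {
    root = expensiveRoot(block);
    rootCache.set(block, root);
  }
  return root;
}
```

Calling `getRoot` twice with the same object computes the value once, and the object itself gains no extra properties.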
Dadaepo
- Checked added keys are actually used for the validatorDuties
- Writing tests for the implementation
- Final review from Lion to wrap up
Phil
- Metrics for `v0.34.0` look stable. Going forward with release.
- Lion: Some block delays. Blocks coming to gossip much later (~5s late) on Prater. Other versions seem to be showing the same behaviour, so it might just be a network issue.
- Gajinder will investigate.
- Ethereum Foundation is adding us to their consensus layer bug bounty program
- Light client summit planned for DevConnect Amsterdam on April 20, 2022. More details to follow as planning continues.
- Rocketpool Integration: Doppelganger protection + multi-arch Docker images required before integration into testing.
Cayman
- Helping out with SSZ integration.
- Proof generation, various refactorings, added comments
- Documentation is looking good, cross-linking documentation in your IDE
- Lion: SSZ may not have as large of a performance boost as we would like, could be good for memory (ex. state cache)
- Libp2p-crypto: Think about looking at different crypto implementations where we may have an opportunity to optimize.
- Worked with Dade to go over peerDiscovery flow
Lion
- SSZ Integration:
- PR for Lodestar is at 6000 lines.
- Test all possible proofs
- Branchnodestruct proof requires all the data
- Cayman: One thing we can do is request one part of a branchnodestruct and it'll return you all the siblings of that
- The only thing
- Light client demo site is down. Looking to get ChainSafe devops to maintain this.
Tuyen
- Libp2p migration
- js-libp2p v0.36.0 released
- Use datastore-core 7.0.1
- Need to test on a node
- Completed the LightClientUpdate headers fix: https://github.com/ChainSafe/lodestar/pull/3656
- Lower peer scores issue
- Adrian from Lighthouse mentioned most peers are positive and hit the max of 100. Need to investigate why peer scores are so low in Lodestar
- https://github.com/ChainSafe/lodestar/issues/3555
- We just need to update logLevel when we pass gossip attestations to fork choice.
Gajinder
- PR to add flag for kintsugi network
- Optimized backfill sync PR
- https://github.com/ChainSafe/lodestar/pull/3669
- Profiler from Tuyen said it takes 9% of CPU time for `hashTreeRoot()` which is now gone.
- Insecura Testing:
- Blank db: Synced up to the network
- Deposit root bug resolved!
- Duplicate deposit logs issue raised
Dadaepo
- Working on peerDiscovery
- Possible to use dev command to form network
- In libp2p, if we are not using any peer discovery mechanisms, we should not even configure it. It's confusing.
- Keymanager:
- Make sure that the code in validator packages is Node.js agnostic. It should work in both browser and server environments.
- Confirm that the keys that are added are actually used for validator signing.
- Lion: Validator will need to have a server to serve the routes of the keymanager. We want to keep validator code strictly agnostic, there must be some wrapping somewhere.
- Right now we wrap in the CLI. CLI is aware of the keystores in filesystem, but when it has key material required, it passes it to the validator instance.
- Now we need this validator server that's getting more complex. Putting it in the CLI would be ugly. We really want to have validator metrics at some point. Where is the best place to put this logic?
- Should we create a validator package? Best architecture? How can we name these packages?
- Keymanager tells the validator what keys are available and how they will sign messages (local vs remote url to call for signatures)
- We want to add metrics soon as well
- Putting everything together in CLI is still an option. The CLI will in the future manage the metrics server for the validator as well.
- Giving `v0.34.0` the weekend to collect more test metrics before releasing
- Nightly is still breaking.
- CLI is pulling previous version, not nightly version
- https://github.com/ChainSafe/lodestar/issues/3676
- Retro: Planning usually gets pushed because updates take long. Starting next week we will flip planning and updates and timebox planning ahead.
Cayman
- Reviewing SSZ PR, adding documentation
- Adding readme on why we're doing things this way
- Phil: Suggest a blog post as well
- Libp2p: working with Vasco to push a new release with async peerStore
- Libp2p Typescript rewrite ongoing
- Libp2p Plan is to update all the libraries that are interoperable to v4 of interfaces > TypeScript work > Upgrade to interfaces v5.
- Next version of libp2p coming up
- We can upgrade or wait until TypeScript rewrite
- We will have to figure out how to avoid the queue that is built into pubsub by default
- Gossipsub: Update causing tests to start breaking. Made no progress for now.
- Tests aren't failing locally, only fail in the CI and don't always fail the same way.
- Things like timers aren't actually correct. Timeout milliseconds are not accurate.
- Discv5 packet filter work ongoing.
- Tuyen cut a release of discv5 and is working on getting performance metrics.
- Tuyen: In order to use latest version of gossipsub, we need the new version of libp2p as well.
- Libp2p Interface has a breaking change
- Cayman: We will need to upgrade in lockstep, that change cascades down to all others who depend on libp2p.
Lion
- Push new beta with discv5 update
- Remote validator signer support merged!
- Cleaned up and adding documentation to SSZ.
- Tried the new SSZ with Lodestar locally.
- Thinking about getting rid of SSZ in params and config. PR to come.
- Should we get rid of the rule where interfaces should be prefixed with `i`?
- Some regular objects are getting prefixed with `i` because of this linting rule. (ex. `ibeaconconfig`)
- We should drop the rule.
Tuyen
- Had issue with low peers on a Prater node.
- New Discv5 version released - looks better, more peers than previously.
- Ansible now works with Mac and can deploy with it on testnets and nightly
- Submitted PR for trigger block search when receiving unknown block root gossip attestations. Please review.
- Upgrade the libp2p interface.
- Cayman: Have you noticed same performance issues with Discv5 update?
- Checked profiler and see no issues so far. Good so far, will leave node running for another day.
Gajinder
- Bug discovered when setting up 2 Lodestar beacon nodes: at the Bellatrix fork the nodes were diverging.
- A block produced by one node did not reach the other
- Blocks were published to the correct topics but no peers were subscribed to them: https://github.com/ChainSafe/lodestar/issues/3639
- Pari moved our nodes to v1.1.8 spec and node froze there
- Nightly builds were failing because we had an issue with `:next` pushes
- PR merged to fix: https://github.com/ChainSafe/lodestar/pull/3637
- We need to make sure the next release is an annotated tag
- Lion: If we have 0 peers, it should push an error like Lighthouse. If you are publishing a block, people should be picking it up. If no one is subscribed we should throw that error too.
- Cayman: We need to update libp2p interfaces > Update gossipsub > Then libp2p publish will return the number of peers it published to, then in Lodestar if recipients = 0 we can see that.
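Once publish reports how many peers a message went to, the zero-recipient check Lion and Cayman describe could be a thin wrapper. This is a sketch only: `PublishFn` is a hypothetical signature standing in for the updated libp2p publish call, and `NoPeersSubscribedError`, `assertHasRecipients` and `publishOrThrow` are illustrative names.

```typescript
class NoPeersSubscribedError extends Error {
  constructor(topic: string) {
    super(`Published to 0 peers on topic ${topic}`);
    this.name = "NoPeersSubscribedError";
  }
}

// Hypothetical publish signature: resolves to the number of peers
// the message was sent to.
type PublishFn = (topic: string, data: Uint8Array) => Promise<number>;

// Surface "nobody received this" as an error instead of a silent success.
function assertHasRecipients(topic: string, recipients: number): number {
  if (recipients === 0) {
    throw new NoPeersSubscribedError(topic);
  }
  return recipients;
}

async function publishOrThrow(publish: PublishFn, topic: string, data: Uint8Array): Promise<number> {
  const recipients = await publish(topic, data);
  return assertHasRecipients(topic, recipients);
}
```

A block proposer would then see an explicit error when a block reaches nobody, which is the failure mode from the diverging Bellatrix nodes above.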
Dadaepo
- Working on keymanager API, did work on some of the endpoints
- DELETE and POST completed
- TODO: Persistent slashing protection data
- Remote signer contribution merge conflicts need to be resolved
- Monitoring with docker-compose (Prometheus + Grafana).
- Configuration for Mac doesn't work out of the box
- Will work with Tuyen on this.
Phil
- EF's Eth2 client bug bounty requires security assessment. Estimated completion April/May timeframe.
- For now we are going to have our own bug bounty with ChainSafe and raise some funds via Gitcoin to contribute to bounties. Excess funds after we transition to the Eth2 client bug bounty will be reinvested back into Gitcoin community focused on client diversity initiatives.
- Callout for website update ideas. We have until April.
- Dade will post issue for us to collaborate
- Ideas include:
- Node operators running Lodestar
- Some additional metrics
- What are some of our unique propositions compared to the other clients?
- Accessibility of Typescript (browser-friendliness/compatibility, large community of developers)
- Light client focus (browser-based decentralized applications)
- Initiate conversations to get Rocketpool to include Lodestar as a client for their node operators
- We should major bump our production releases to `v1.0.0` shortly to signal client maturity.
- Thinking of sometime after gossipsub and new SSZ implemented.
Cayman
- Made some final edits & merged Tuyen's fast msg id
- We can cut a release now and include in lodestar
- Trying to unblock https://github.com/ChainSafe/lodestar/pull/3534
- Proposed a temporary workaround while upgrading to the latest libp2p-interfaces is blocked
- Working on debugging libp2p-noise v5.0.1, current status is the issue is narrowed down to the PR, but root cause not yet found
- Scoped a libp2p webrtc project with ChainSafe solutions team that ChainSafe will be working on in a few months
- Started work on the discv5 packet filter
- Code reviews
Tuyen
- Gossipsub-js: latest update is to support undefined fastMsgId function, Cayman merged it.
- Reprocess attestations: works well in contabo-20, need to rebase due to conflicts with the other PR
- Pass gossip attestations to forkchoice: fixed type issue, merged
- Look into ssz v2 to get familiar with TreeView
- Follow proposer boost spec and lighthouse implementation
- Figured out that we need to trigger block search if unknown gossip attestations are received. Created #3613, will work on that next
Gajinder
- Implemented merge spec upgrades for v1.1.6 and v1.1.7
- PR in for v1.1.8 spec
- Working on proposer boost, looking to merge early next week.
Lion
- Continuing on SSZ. Spec tests have passed. Released PR: https://github.com/ChainSafe/ssz/pull/223
Cayman
- Working also on computing message-id once per message for js-libp2p-gossipsub
- Getting libp2p-pubsub upstream and found the type issue.
- Internal chainsafe meeting about libp2p-webrtc project next week
- Discv5 rateLimiter next big task
Lion
- SSZ Refactor almost complete
- Good ergonomics, fast
- Two modes: Behaves like current implementation, one "performance mode" for state transitions.
- Gajinder to review one of the algorithms
- Cayman: How do we plan to test this?
- Lion: Lots of unit tests to ensure functionality. We need to link library to Lodestar, change the stateTransition to the new format, make sure spec tests pass, do the PR, observe benchmarks.
- We should setup a session when complete to introduce the idea.
Tuyen
- Kintsugi logs showed a duplicate attestation error
- Incident of non-finality in Prater:
- Why did we have a lot of epochs in checkpointStateCache? Implemented short term solution PR (#3586)
- Lion: We were not calling prune. Bigger issue is that it was not properly bounded. There can be so much that it causes OOM.
- Lion: We force the finalized state to reside in the cache (Not ideal for periods of long non-finality - We should define a threshold for reorgs)
- Lion: Lighthouse is persisting the states that are not finalized every epoch. If you have a deep reorg, you can just go to the DB and not keep the state in cache.
- Lion: Generating hashing cache from scratch is better than having an old finalized state and doing regen on many slots.
- For some nodes, the node is so busy it can't even handle ping request of other peers
- Loses a lot of peers and takes a while to recover
- Adding gossip attestations to fork choice. We haven't worked on that in a while.
- Lion: We know we should do it, but chose not to because we want the nodes to work.
- Lion: We want to run the branch and benchmark it.
- Lion: Need to understand the costs of this because we don't have multi-threading.
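Lion's point about the checkpoint state cache (prune was never called, and the cache was not bounded, so it could grow until OOM) can be illustrated with a cache that enforces its bound on every insert. This is a generic sketch, not the fix from PR #3586: `BoundedEpochCache` is a hypothetical name, and keying by epoch alone is a simplification.

```typescript
class BoundedEpochCache<T> {
  private readonly byEpoch = new Map<number, T>();
  private readonly maxEpochs: number;

  constructor(maxEpochs: number) {
    this.maxEpochs = maxEpochs;
  }

  add(epoch: number, state: T): void {
    this.byEpoch.set(epoch, state);
    // Pruning on every insert means the bound holds even if no caller
    // remembers to prune explicitly.
    this.prune();
  }

  get(epoch: number): T | undefined {
    return this.byEpoch.get(epoch);
  }

  get size(): number {
    return this.byEpoch.size;
  }

  // Drop the oldest epochs until at most maxEpochs remain.
  private prune(): void {
    if (this.byEpoch.size <= this.maxEpochs) return;
    const epochs = Array.from(this.byEpoch.keys()).sort((a, b) => a - b);
    for (const epoch of epochs.slice(0, epochs.length - this.maxEpochs)) {
      this.byEpoch.delete(epoch);
    }
  }
}
```

During long non-finality such a bound trades memory for regeneration cost: evicted states have to be rebuilt or loaded from the DB, which is the trade-off raised in the Lighthouse comparison above.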
Gajinder
- Multiple reviews on error handling and backfill sync
- Adding another validation
- Add 1.1.6 and 1.1.8 spec changes in queue
- Add CL spec 1.1.7 correction to proposer boost
- Following Kintsugi testnet issues 3-way fork
Phil
- Pushing forward with nodeJS consulting
- Push RELEASES.md for persisting process
- In discussions with Ethereum Foundation about adding Lodestar to their bug bounty program for ETH consensus clients (eth2)
- Be specific about what types of bugs are covered and not covered in a ChainSafe bug bounty for Lodestar.
Cayman
- External contributor d4nte from Status.im is using our ENR and discv5 for Waku. Extending our ENR to include multiaddr keys. Asking to push upstream.
- Haven't merged some PRs with discv5
- Asking for multiple ENR requests, so we can get more results at once. We aren't doing that currently.
- Fixed up small paragraph to consensus specs (`init proof` piece).
- Lion: Contribute what you think makes sense. Let's make recommendations.
- Example: "Soft consensus" example is the rateLimiting discussion. There are some things that you are absolutely mandated to do in a certain way or you would fork away. "Soft consensus" rules you can technically not follow, but if you don't follow them, bad things happen. Light clients would follow this spec. You should follow this but you don't have to. Start with that.
Lion
- Light clients overview:
- Implemented a basic consensus spec level-like client that fetches the head.
- It's still dependent on a server.
- Problem: We can get the head, we cannot prove execution.
- Portal Network is only solving the networking part of the problem.
- If we still have to rely on Patricia merkle proofs, it's still heavy. You need the full bytecode constantly.
- It won't be immediately useful, but will other people use our layer to make it useful for them?
- Proof of concept Metamask with light client is not something we can deliver on our own. It will require extensive research with execution side.
- We could provide a light client that follows the head, with good networking guarantees, spectacular security.
- We should focus on doing network experimentations and solving the head problem. Going beyond that is way out of scope.
- Discv5 consumer layer needs improvement. Currently implementing fixes.
- Hopefully that will give us more metrics, more visibility into what's going on, and quicker peer discovery.
- Reached out to Nimbus to see how they do their networking. We need to do more metrics research on this.
- Cayman: The number of peers will keep growing, including sync committee subnets and eventually sharding subnets.
- We should experiment and quantify what the cost of having a peer in Lodestar is. More metrics!
- We have the lowest peer counts of all the clients.
- Cayman: One thing that will help is the discv5 topic advertisement & discovery.
Tuyen
- Performance in pithos network:
- Lodestar-Geth, subscribed to all subnets, performance is good with 1000 validators.
- Investigating performance issue
- Improving validator performance
- PR for pre-compute. Lion please look at discussion: Precompute epoch transition #3383
- Started draft PR Validator to submit attestations when seeing new head
- Lion: I want to get a bunch of metrics to see why the subnet subscriptions are so erratic.
- Something valuable that can be done is to improve/expand the validator monitor so it can show whether the heartbeat was not sent, the head was wrong, or it was late.
- What we did in Phase 0, let's do it again.
- We need to find a way to do it and not slow the performance down.
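The validator monitor idea above (telling apart "not sent", "wrong head", and "late") could be sketched roughly as follows. This is a hypothetical illustration, not Lodestar's validator monitor: the `DutyObservation` shape, the `classifyDuty` and `summarize` names, and the lateness threshold are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not taken from the Lodestar codebase.
type DutyOutcome = "not_sent" | "wrong_head" | "late" | "ok";

interface DutyObservation {
  /** Whether the validator's message was seen at all. */
  sent: boolean;
  /** Head root the validator voted for (hex), if a message was seen. */
  votedHeadRoot?: string;
  /** Head root the monitoring node considers canonical for that slot (hex). */
  canonicalHeadRoot: string;
  /** Milliseconds between slot start and when the message was seen. */
  delayMs?: number;
}

// Assumed threshold, chosen for illustration only.
const LATE_THRESHOLD_MS = 6000;

function classifyDuty(obs: DutyObservation): DutyOutcome {
  if (!obs.sent) return "not_sent";
  if (obs.votedHeadRoot !== obs.canonicalHeadRoot) return "wrong_head";
  if ((obs.delayMs ?? 0) > LATE_THRESHOLD_MS) return "late";
  return "ok";
}

// Plain counters are cheap to update, which helps keep monitoring
// from slowing down the node.
function summarize(observations: DutyObservation[]): Record<DutyOutcome, number> {
  const counts: Record<DutyOutcome, number> = {not_sent: 0, wrong_head: 0, late: 0, ok: 0};
  for (const obs of observations) counts[classifyDuty(obs)]++;
  return counts;
}
```

Each counter could then be exported as a metric, so a dashboard shows which failure mode dominates rather than a single "missed" number.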
Gajinder
- Working on weak subjectivity checks
- Working on backfill sync from an anchor checkpoint state
- Our validator was able to interop with Lighthouse 🎊
Phil
- Hiring ongoing. More candidates being screened.
- CSCON[1] presentation to include Lisbon talk with additional updates.
- Connect light client to mainnet
- Make it more accessible and in a more stable environment
- Sync to head?
- Some networking experiments?
- Lightweight type of explorer? Compare getting data through us instead of beaconcha.in?
Light client takeaways:
- Full dream is a long way away. Problems with the execution layer are different from the consensus layer.
- Execution side has lots of legacy crap and problems we are unfamiliar with.
- Make our light client robust, productionized, secure. What are the major features or things that we want to create?
- Head sync
- Networking
- Exploration
- Security
- Specs, standard
- Make it sustainable
- No mechanism for trust when consuming data.
- Trust someone else's node or your own node.
- There are still many other points of untrusted data proofs. Light clients only solve one between node <> light client
- Is there any benefit to us putting a light client server into Lodestar?
- Should we build some kind of universal plugin for running a light client server that works across clients?
Cayman
- Init proof into consensus proof
- Implement PR for the beacon API
Lion
- Mostly working on SSZ
- Optimizations that have fork everything upstream to SSZ
- Overall design simpler, easier for devs
- Cayman: Anything I can look at?
- Lion: You can take a look at my branch
- Filtering questions for first interview, take-home assignments
Tuyen
- Setup new testnet
- PR for ESLint
- Fixed small issue with scoring params
- Persistent merkle tree, apply it, etc.
- 1:1 on reorg to fix proposing issues
- Lion: Open issue for proposing on the 1st block. Last slot on epoch, if no block received within 8 seconds, it moves to transition to the next head
Gajinder
- Worked on casingMap
- Worked on ssz bytelist
- Working on validator interop with Lighthouse beacon.
- Casing issue is resolved; a few other issues are coming up. By hacking the code a bit, was able to hook up the validator, but there are problems in producing blocks.
- Created a master tracker for the same
Phil
- Create updated planning roadmap for Lodestar. Will update Github with updated issue.
Cayman
- Light client refactoring
- Init proof is keyed on the database
- Keying them by block root
- Issue for Amphora up with milestones
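Keying init proofs by block root, as noted above, might look something like the sketch below. It uses an in-memory `Map` as a stand-in for a database bucket; `InitProofStore`, `toHex`, and the generic `Proof` type are hypothetical names for illustration, not the actual Lodestar repository classes.

```typescript
// Hypothetical in-memory stand-in for a database bucket; names are illustrative.
type Root = Uint8Array;

// Byte arrays compare by reference, so the root is hex-encoded to get a stable key.
function toHex(bytes: Uint8Array): string {
  return "0x" + Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

class InitProofStore<Proof> {
  private readonly byBlockRoot = new Map<string, Proof>();

  put(blockRoot: Root, proof: Proof): void {
    this.byBlockRoot.set(toHex(blockRoot), proof);
  }

  get(blockRoot: Root): Proof | undefined {
    return this.byBlockRoot.get(toHex(blockRoot));
  }
}
```

Because the key is derived from the root's contents, a lookup works with any byte array holding the same root, not only the instance used when storing.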
Tuyen
- Merge spec is done. Double-checked implementation.
- Worked on handling attester duties reorg
- Merged improve computeDeltas
- Investigating multiple epoch transitions per epoch
Gajinder
- Going through API test specs
- Monorepo release workflow: Experimented on own forked repo. Changelog was not updating as expected
- Waiting on Cayman's response for
- Finishing pending issues
- Updating SSZ in main lodestar including casing map update
- Updating BLS in main lodestar including update from signature functions
- Working on adding ByteList type
- Self-Audit:
- Lion did a validator audit to ensure no slashing will happen.
- Lion: Would be better for us to focus on the other priorities before we get to a code cleanup.
- Mostly will be done online
- Geth should be ready to interop with us
- Run the testnets and keep finding issues
- Networking is only tiny updates, just a couple extra conditions for the merge.
- Spec tests have passed
- Not tested:
- Execution engine
**M0:**
- Complete
**M1:**
- First part complete. Second part should be done ideally today or tomorrow (Pre-interop)
**M2:**
- Ideal goal to complete before Interop
- Test against Geth, mergemock or on your own and play around with different settings, Besu, Nethermind
- We can do a lot of this parallelized
**M3/M4/M5:**
- Expected to be completed during the week.
Tuyen
- Big refactor this week. Lion to review Range sync PR.
- Tried fix locally and able to sync to head
- Validator double vote issue:
- If running docker on different networks, validator db is always fixed so it makes an invalid attestation double vote when switching from one network to another (#3219)
- getAttesterDuties should throw error instead of returning unreliable duties when head epoch is behind (#3242)
- PR open for adding unknown block sync metrics to Grafana (#3244)
- Improve the way we process attestations in the block (#3084)
- Unknown block sync
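The `getAttesterDuties` item above (#3242) suggests failing loudly instead of returning duties computed from a stale head. A minimal sketch of such a guard follows. The `HeadBehindError` and `assertHeadCanServeDuties` names and the one-epoch tolerance are assumptions for illustration; this is not the code from the actual PR.

```typescript
// Hypothetical guard; the class and function names are illustrative, not Lodestar's API.
class HeadBehindError extends Error {
  readonly headEpoch: number;
  readonly requestedEpoch: number;

  constructor(headEpoch: number, requestedEpoch: number) {
    super(`Head epoch ${headEpoch} is too far behind requested epoch ${requestedEpoch}`);
    this.name = "HeadBehindError";
    this.headEpoch = headEpoch;
    this.requestedEpoch = requestedEpoch;
  }
}

// Assumed tolerance: duties may be requested at most one epoch past the head.
const MAX_LOOKAHEAD_EPOCHS = 1;

function assertHeadCanServeDuties(headEpoch: number, requestedEpoch: number): void {
  if (requestedEpoch > headEpoch + MAX_LOOKAHEAD_EPOCHS) {
    // Throwing makes the caller retry later instead of acting on unreliable duties.
    throw new HeadBehindError(headEpoch, requestedEpoch);
  }
}
```

A duties handler would call the guard before computing anything, so a validator client sees an explicit error while the node is still syncing.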
Gajinder
- Feedback changes for declarative case, case mapping in SSZ.
- getState investigation: not printing data in metrics
- Once we get data, we can decide then to separate
- G2 issue that was raised by EF
- False positive: it was 4 validator processes calling the same beacon node at the same time. (#3201)
- How is this impacting the performance?
- Included recommendations
- Investigating BlocksByRange (#2556) and Backfill issues (#2637).
Phil
- Monday will be speaking with ChainSafe hiring to further define goals for hiring 2 core devs.
- Looking to schedule interviews in mid-October.
- Gajinder: Do you find the testnet updates meaningful?
- Phil: For me, yes as it does allow me to compare what I'm seeing locally to our other nodes.
- Any feedback from other team members?
Cayman
- Fix gossipsub concurrency of processing pubsub rpc messages.
- Can merge two discv5 PRs and release
- Next up: Forkable light client that can sync up to the head
- Still working on BLS monorepo, hiccup with dependencies.
Lion
- Greg's nodes are doing 100% right now according to beaconcha.in. Getting everything right on the head.
- How do our nodes perform in bad environments is the next step.
- Node is stable so focused on the merge
- 5 out of 7 items implemented
- Types, forkChoice, forkProcedure and stateTransition.
- BlockReduction is currently being worked on
- Small update on P2P layer and should be good to test.
Tuyen
- Gossip block issue
- Validator is still voting for the incorrect head
- Fixed the heartbeat in gossip regarding the routing issue
- Improving processing. 3x improvement and 5-7x improvement in Altair
- Removing balanceList proxy in Lodestar
Gajinder
- Did some metrics dashboard cleanup
- Draft PR for exemplar check via JSONPath inspection of our lodestar.json config for the Grafana dashboard (had thought we could use JSONPath for more validation later on, plus it's more semantic and meaningful); however, will incorporate Lion's feedback on doing it via simple regex 3.
- Thought over and suggested a container type declaration signature for providing casingMap with couple of optimization paths. Will try to push this one out over the weekend.
Phil
- Node Setup Article publishing on Monday
Afri
- Shared strategy structure approach to Lodestar
- We are not running all consensus spec tests
- Which tests are available
- In the consensus-specs repo there is a folder called test/formats
- Created gist
- Lion: What we accept as params in beacon state transition is:
- beaconState
- cacheBeaconState
- epochContext
What we should do in the future is accept a tree view of the beacon state type and that's it. Not the beacon state and epoch context.
Highest Impact Node Issues
- Validator monitor metrics: 20-30% we are missing the head
- Haven't been able to propose blocks
Cayman Goals
- Light client sync coordination and demo update
Lion Goals
- Completing the merge
Tuyen Goals
- Keeping the node stable
Cayman
- Never shrink your root partition
- Implementation notes distributed to the telegram chat.
- Nothing yet from Terrence. Feedback from Alex. Take the document and make it more spec-focused. We can mold the endpoints to libp2p.
- Consensus spec into the ethereum p2p network.
- Put up some PRs for SSZ & BLS monorepos.
Lion
- 60-70% less size on memory consumption
- Closing issues, overall maintenance - reviving some PRs
- Moving towards the merge.
Tuyen
- Error starting docker resolved with fix on libp2p dependencies
- Investigating errors from deserializing weak subjectivity state. Related to larger issue of invalid state root #3066
Gajinder
- PR to extend the case transformation in container data fill. Needs review.
- Removing failing e2e tests on PR: no default read of the configFile and paramsFile; persistence of these files removed
Afri
- Docker issue is fixed: Images are in the correct version.
- Lodestar Zenhub board has been cleaned and updated with proper sorting
- Lodestar will start doing 2 week sprint cycles aimed towards a 2 week release cycle.
Discussion:
- Lion: Invalid state root, is that across all nodes?
- Lion: Should we release v0.30.0?
- Tuyen: I'm not confident to release without fixing it
Recommendations from Afri for longevity of the project:
- Breaking up the monorepo to allow for more modular deployment
- Break release cycle of different modules from the main client = better way to evaluate each module's state.
- Pedantic on all module dependencies
- Dependency bot seems overwhelmed from complexity of the monorepos
- Refactoring or rewriting towards more stable releases
- Allocate more resources to this team.
Priority 1: Light Client Initiative
Purpose: To lead and improve light client demo capabilities.
- Light Client API standardization #3079
- Reorg-capable light client #3078
- Light client integration into p2p network #3077
- Draft PR for Eth2 API spec repo when spec ready
- LC Demo: Add link to our LC Discord channel on the demo
- LC Demo: Add node protection so that user doesn't kill it accidentally
Priority 2: Implement Merge Spec + Spec Tests
Purpose: To prepare Lodestar for The Merge interop with other clients.
- Implement state transition
- Implement block production
- Implement networking updates
- Implement ExecutionEngine via REST
- Test interop with Geth Catalyst
Priority 3: The Road to Stability
Purpose: Tighten up internal rules and QA to stabilize production builds and ensure they are safe for release every push.
- Continuous production deployment of Lodestar
- Monitoring + Notification system to alert when code doesn't work in production
- Setup a server and use watchtower to redeploy on every commit.
- Set up Grafana to hook into Discord and pick out some metrics that can alert us.
- PM will continuously monitor
Priority 4: Lodestar Team Expansion
Purpose: To find motivated, talented and eager engineers to join the Lodestar team. Ideally 1-2 more full-time engineers.
- Brainstorm technical assignment for candidates
- Create job description + posting under ChainSafe
- Not mentioning blockchain may help with candidate search
- Strong emphasis on proficiency with TypeScript
- Experience in complex distributed systems (networking, nodejs)
- Cryptography (nice to have)
- Security (nice to have)
- If not experienced with open-source - we need strong communication skills
- Takes initiative, self-starter, accountable
- Having an active presence in non-blockchain communities such as TypeScript, JavaScript circles.
- Further advance community building initiatives with articles, tutorials, guides, etc.
- Look into furthering the process for an Apprenticeship Program.