Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of CheckBridge and forked Tendermint #357

Closed
troggy opened this issue Nov 5, 2019 · 20 comments
Closed

Get rid of CheckBridge and forked Tendermint #357

troggy opened this issue Nov 5, 2019 · 20 comments

Comments

@troggy
Copy link
Member

troggy commented Nov 5, 2019

Bounty

We are using tendermint fork with extra home-grown ABCI called CheckBridge. It is supposed to be called every 32 blocks and return 1, otherwise the proposal is deferred (consensus engine is stopped). On a leap-node side we use checkBridge handler to ensure there is a period submitted. This is the ultimate reason for all the clusterfuck — ensure that period is submitted.

Three One problem with this approach:

  1. We need forked tendermint, which makes sync with upstream harder.
  2. It won't be working right because of the CAS — there is no period every 32nd block anymore with CAS — it is submitted later, once there are enough period votes. Solved in fix: check bridge always return success #358
  3. As of now it doesn't seem to work yet the networks are working. Solved in fix: check bridge always return success #358

This bounty attempts to address the problem purely on ABCI app side, so we can use vanilla tendermint.

Scope

  • once a period is submitted, add [periodMerkleRoot;prevPeriodHash;txHash;slotId] tuple to the pendingPeriods list in the bridgeState. If the list was empty store submission time in bridgeState.periodSubmissionTime.
  • create a block middleware to take the first item from the pendingPeriods and check if the period is landed on the root chain. Skip if the current node is not a validator or slotId of the pending period is not matching the node's slot.
    • If the period is on the root chain, remove the item from the pendingPeriods list. If pendingPeriods is not empty, set periodSubmissionTime to the current time.
    • if the period submission failed (e.g. due to out of money) — resubmit the transaction and update periodSubmissionTime
    • If the period is not on the root chain yet (tx is not mined), check periodSubmissionTime — if it is older then 10 minutes, resubmit the period transaction with higher gas price.
  • use vanilla tendermint (stick to version 0.31.7 for starters — this is the one we've forked)
  • test scenarios:
    • multiple periods submitted one-by-one
    • underpriced period submission is resubmitted successfully
    • resubmission stuck as well and it is time to submit a next period — eventually both are submitted successfully
    • submission stuck, node restarted → submission tracking should resume
  • make sure integration tests are passing

Deliverables

  • updated leap-node
  • writeup about the change (for peeps and for newsletter)

Gain for the project

  • easier code base (no forked tendermint)
  • underpriced period resubmissions

Publicly visible delivery

writeup for newsletter

Roles

bounty gardener: @troggy / 90 DAI
bounty worker: name / share
bounty reviewer: name / share

@johannbarbie
Copy link
Member

how to prevent that tendermint runs too far ahead of period submission?

@sunify
Copy link
Member

sunify commented Nov 5, 2019

Why it is bad? (asking for a friend)

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

how to prevent that tendermint runs too far ahead of period submission?

I can imagine it is possible to have temporary gaps in the period list. We are not really chaining periods anyway — prevHash in Period object is not used anyhow. Or am I missing something?

@johannbarbie
Copy link
Member

johannbarbie commented Nov 5, 2019

Why it is bad?

  • can't construct proofs
    • fast exit handler can become veeery slow
  • unordered submission problems
  • parallel submission problems

@johannbarbie
Copy link
Member

johannbarbie commented Nov 5, 2019

prevHash in Period object is not used anyhow

  • used here
  • prevHash will be critical to implement block.timestamp/now in Plasma runtime

@pinkiebell
Copy link
Contributor

I think a threshold of max. 3 pending periods cause no problems?

As of now it doesn't seem to work yet the networks are working.

That's odd. 🤔 The last time I had the leap-node running, I had those checkBridge call in the debug logs.

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

ok, we keep prevHash :) doesn't really stops us from having some temporary gaps

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

also, @sunify and me discussed an option to "suspend" the app if there is no period landed after 32 blocks. Similar like we do with CheckBridge, but done on the app side: just need to stop proxying txs to Tendermint RPC.

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

unordered submission problems

@johannbarbie you mean tracking of nonce?

@johannbarbie
Copy link
Member

johannbarbie commented Nov 5, 2019

unordered submission problems

@johannbarbie you mean tracking of nonce?

i was only referring to prevHash

@pinkiebell
Copy link
Contributor

pinkiebell commented Nov 5, 2019

also, @sunify and me discussed an option to "suspend" the app if there is no period landed after > Similar like we do with CheckBridge, but done on the app side: just need to stop proxying txs to Tendermint RPC.

What if a malicious- or bad node sends to the tendermint daemon? 🙂

Edit:
I think implementing a method to completely halt the consensus and/or mempool engine would work

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

to completely halt the consensus

that's what we doing with CheckBridge. @pinkiebell Do you know how to do that without changing Tendermint code?

@pinkiebell
Copy link
Contributor

to completely halt the consensus

that's what we doing with CheckBridge. @pinkiebell Do you know how to do that without changing Tendermint code?

Sadly, no idea. you can check the ABCI endpoints but I'm sure something like that isn't there.
Another idea without running a fork is to kill the tendermint daemon and restart once periods looking fine 😁

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

It won't be working right because of the CAS

maybe I've over-exaggerated — checkBridge checks not the current, but the previous submission which should happen at that point even with CAS.

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

unordered submission problems

I think this is solved, if all the validators keep a pendingPeriods list — proposer can just use the hash of the last period as a prevHash.

@johannbarbie
Copy link
Member

overall gut feeling:

  • super scared that it will break in unforeseen ways
  • happy to trust you guys with the architecture decision

:)

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

can't construct proofs
fast exit handler can become veeery slow

Both of these are relevant right now as well: if there is no period and checkBridge stops the consensus you can't construct proofs as well.

@troggy
Copy link
Member Author

troggy commented Nov 5, 2019

super scared that it will break in unforeseen ways

Me too. I also scared that the node is working in unforeseen way 😄

@troggy
Copy link
Member Author

troggy commented Nov 7, 2019

I see two quite separate concerns here in this issue:

  • checking that periods are landed on the root chain
  • using patched tendermint to stop the consensus in case if period submission didn't happen in time

The bounty definition as it is now addresses only the first concern assuming the second one is not needed and should be removed. While it is still not clear should we allow the chain to run once there are missed period submissions, I will split this bounty in two:

  1. More reliable period tracking (like described in this bounty). We can keep patched Tendermint and checkBridge feature.
  2. Replace checkBridge hack with something on the app side (this is to be defined) or remove the hack completely. Switch to upstream vanilla Tendermint.

@troggy
Copy link
Member Author

troggy commented Jan 31, 2020

@troggy troggy closed this as completed Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants