Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jailed validators (from downtime) attempt to jail every block #163

Closed
Reecepbcups opened this issue Apr 15, 2024 · 5 comments
Closed

jailed validators (from downtime) attempt to jail every block #163

Reecepbcups opened this issue Apr 15, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@Reecepbcups
Copy link
Member

Reecepbcups commented Apr 15, 2024

Built into #160, when a validator is jailed the logic continues to try and set jailed state.

There does not seem to be a technical issue with it, but jailing a validator every block for an entire unbonding period is no bueno

Solutions

  • Fix the issue so there is only a single 1 off jail event (ideal). Probably some key / power index to set to make it happy
  • Instant unbond jailed validators (and removal) rather than re-jailing
@Reecepbcups Reecepbcups added the bug Something isn't working label Apr 15, 2024
@Reecepbcups Reecepbcups self-assigned this Apr 15, 2024
@alexanderbez
Copy link

@Reecepbcups what makes it attempt to re-jail every block? Do we need to simply need to mark the validator as jailed? Or perhaps, there is an x/staking or x/slashing method we can override?

@Reecepbcups
Copy link
Member Author

Reecepbcups commented Apr 15, 2024

@alexanderbez here is where it's sort of stemming from: #160 (comment)

Issue: x/staking panics if pulling a jailed validator from the validator set on ApplyValUpdates (staking endblock). So we have to override the validator from being jailed so that the last ApplyValUpdates can be set as they begin unbonding. We can't just delete the validator outright (DeleteLastValidatorPower) from the CometBFT updates since that would be 2x abci.ValUpdate updates (x/staking) from a single val, resulting in a consensus panic

I think a solution here is yet another cache where:

  • Block 1 x/poa: we set the val.Jailed = false, add to cache.
  • Block 1 x/staking / x/slashing, jails the validator & slashes
  • Block 2 x/poa: sees a val in the toJail cache, and deletes power / sets to real Jail
  • Block 2 x/staking: val is really jailed, abci persisted

pretty much a delay of jail by h+1 rather than at h. This way x/staking and x/slashing can handle all their logic as normal, and so teams do not have to fork the SDK / staking

@Reecepbcups
Copy link
Member Author

Reecepbcups commented Apr 16, 2024

Update, removing this from the upstream SDK would solve the issue. Unsure why we panic vs continuing since we iter all validators anyways.

func (k Keeper) ApplyAndReturnValidatorSetUpdates(ctx context.Context) (updates []abci.ValidatorUpdate, err error) {
               ...
		if validator.Jailed {
			// panic("should never retrieve a jailed validator from the power store") // replace this line w/ continue
			continue
		}
		...
}

cosmos/cosmos-sdk#20059

@alexanderbez
Copy link

Yup, makes sense. Good catch and patch.

@Reecepbcups
Copy link
Member Author

resolved in #165

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants