Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: update docs for jail throttling v2 #1443

Merged
merged 12 commits into from
Nov 29, 2023
Merged

Conversation

mpoke
Copy link
Contributor

@mpoke mpoke commented Nov 22, 2023

Description

Closes: NA

  • refactor ADR 002 for better understanding
  • refactor ADR 008 for better understanding
  • update throttling params in ICS docs
  • improve comments in throttling tests
  • add v3.2.0 to RELEASES.md
  • add throttling with retries to FEATURES.md

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct docs: prefix in the PR title
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • Confirmed the correct docs: prefix in the PR title
  • Confirmed all author checklist items have been addressed
  • Confirmed that this PR only changes documentation
  • Reviewed content for consistency
  • Reviewed content for spelling and grammar
  • Tested instructions (if applicable)
  • Checked that the documentation website can be built and deployed successfully (run make build-docs)

@mpoke mpoke requested a review from a team as a code owner November 22, 2023 18:44
@github-actions github-actions bot added C:Testing Assigned automatically by the PR labeler C:Docs Assigned automatically by the PR labeler C:ADR Assigned automatically by the PR labeler labels Nov 22, 2023
Copy link
Contributor

@sainoe sainoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactoring! It helps a lot in the understanding of the intuition behind the Slash Throttling feature. Reviewed the half, still have to go through the Throttle retries.

Copy link
Contributor

@MSalopek MSalopek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of the ADR changes.

* If two slash packets are at the head of the queue, the consumer will send the first slash packet, and then wait for a success ack from the provider before sending the second slash packet. This seems like it'd simplify implementation.
* VSC matured packets at the head of the queue (ie. NOT trailing a slash packet) can be sent immediately, and do not block any other packets in the queue, since the provider always handles them immediately.
* Slash packets will always be sent to the provider once they're at the head of the queue.
However, once sent, the consumer will not send any trailing `VSCMaturedPackets` from the queue until the provider responds with an ack that the `SlashPacket` has been handled (ie. validator was jailed).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: having a visual aid would be helpful to explain what a trailing VSCMaturedPacket is and how the sub-protocol operates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced trailing VSCMaturedPacket with subsequent VSCMaturedPacket. Does that help?

To prevent the provider from having to keep track of what slash packets have been rejected, the consumer will have to retry the sending of slash packets over some period of time. This can be achieved with an on-chain consumer param. The suggested param value would probably be 1/2 of the provider's `SlashMeterReplenishmentPeriod`, although it doesn't matter too much as long as the param value is sane.
To prevent the provider from having to keep track of what `SlashPackets` have been rejected, the consumer will have to retry the sending of `SlashPackets` over some period of time.
This can be achieved with an on-chain consumer param, i.e., `RetryDelayPeriod`.
The suggested param value would probably be 1/2 of the provider's `SlashMeterReplenishmentPeriod`, although it doesn't matter too much as long as the param value is sane.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer does not know about the ShashMeterReplenishmentPeriod.

The word "sane" is open to interpretation, maybe consider different phrasing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that the consumer knows about this param as it can be queried on the provider. The ShashMeterReplenishmentPeriod is not something that will be change regularly. I agree with replacing the word "sane" though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the recommendation to set RetryDelayPeriod to 1/2 of SlashMeterReplenishmentPeriod it's weird, especially as in the code both defaults are 1 hour. So I changed to this:

To reduce the amount of redundant re-sends, we recommend setting RetryDelayPeriod ~ SlashMeterReplenishmentPeriod, i.e., waiting for the provider slash meter to be replenished before resending the rejected SlashPacket.

docs/docs/adrs/adr-008-throttle-retries.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-008-throttle-retries.md Outdated Show resolved Hide resolved
docs/docs/adrs/adr-008-throttle-retries.md Outdated Show resolved Hide resolved

We can improve the append time for this queue by converting it from a protobuf-esq list, to a queue implemented with sdk-esq code. The idea is to persist an uint64 index that will be incremented each time you queue up a packet. You can think of this as storing the tail of the queue. Then, packet data will be keyed by that index, making the data naturally ordered byte-wise for sdk's iterator. The index will also be stored in the packet data value bytes, so that the index can later be used to delete certain packets from the queue.
We can improve the append time for this queue by converting it from a protobuf-esq list, to a queue implemented with sdk-esq code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this benchmarked or in any way tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of, but it's quite intuitive.

This feature will implement consumer changes in [#1024](https://github.com/cosmos/interchain-security/pull/1024). Note these changes should be deployed to prod for all consumers before the provider changes are deployed to prod. That is the consumer changes in #1024 are compatible with the current ("v1") provider implementation of throttling that's running on the Cosmos Hub as of July 2023.
This feature will implement consumer changes in [#1024](https://github.com/cosmos/interchain-security/pull/1024).

❗***These changes should be deployed to production for all consumers before the provider changes are deployed to production.***
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is difficult to enforce.

Does this mean that we cannot actually have this on the provider any time soon?

@mpoke

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to release the throttling v2 changes in two releases: v3.2.0 will contain only the consumer-side changes and it will be backwards compatible with previous versions; v4.0.0 will the provider-side changes and it will not be compatible with consumer running versions < v3.2.0. For more details, see 2d6b877


Once all consumers have deployed the changes in #1024, the provider changes from (TBD) can be deployed to prod, fully enabling v2 throttling.
Once all consumers have deployed the changes in #1024, the provider changes from [#1321](https://github.com/cosmos/interchain-security/pull/1321) can be deployed to production, fully enabling v2 throttling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is feasible. The provider schedule should not depend on the consumers or vice versa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it should not. But it's hard to modify existing logic and add new logic that affect both provider and consumer without needing some coordination.

@sainoe sainoe self-requested a review November 28, 2023 09:41
@mpoke mpoke merged commit b73884f into main Nov 29, 2023
11 of 14 checks passed
@mpoke mpoke deleted the marius/throttling-adr-refactor branch November 29, 2023 09:31
MSalopek added a commit that referenced this pull request Dec 1, 2023
* refactor adr 002 for better understanding

* refactor ADR 008

* update throttling params

* add docstring to TestBasicSlashPacketThrottling

* update features and releases

* add comments to TestMultiConsumerSlashPacketThrottling

* Update docs/docs/adrs/adr-008-throttle-retries.md

Co-authored-by: MSalopek <[email protected]>

* add review suggestions

* replace trailing with subsequent

* add upcoming versions

* add notes on backward compatibility

---------

Co-authored-by: MSalopek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:ADR Assigned automatically by the PR labeler C:Docs Assigned automatically by the PR labeler C:Testing Assigned automatically by the PR labeler
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants