Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcast Loops #573

Open
Stebalien opened this issue Aug 14, 2024 · 16 comments
Open

Broadcast Loops #573

Stebalien opened this issue Aug 14, 2024 · 16 comments
Assignees

Comments

@Stebalien
Copy link
Member

In F3, we've run into some pretty severe broadcast loops. From what we can tell, the time cache is basically useless.

The core issue is that we have NO TTLs and/or expiration on messages. Personally, I'd recommend a simple expiration to all messages that's strictly less than the time cache duration. We can even treat the time cache duration as a network parameter, refusing to propagate messages that have an expiration more than that duration in the future.

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

I completely agree.

I think we should have ttls in utc seconds, and the signature should probably include them when present.

I think we need a pubsub spec change for this, i'll be happy to support you in the interest group in specs.

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

actually expiration in the envelope.

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

hrm, including in the signature maybe backwards incompatible, so it can only be advisory as it may be tampered with.

@Stebalien
Copy link
Member Author

Make it a per-topic option.

@Stebalien
Copy link
Member Author

Specifically, repurpose the seqno?

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

I think this would break some systems that rely on it.

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

Make it a per-topic option.

Sure.

@Stebalien
Copy link
Member Author

I think this would break some systems that rely on it.

Rely on it being sequential?

@Stebalien
Copy link
Member Author

But yeah, making this a per-topic option would go a long way.

Stebalien added a commit to filecoin-project/lotus that referenced this issue Aug 14, 2024
We were using the default first-seen strategy which means we'd loop
messages every 2 minutes (when they expire from the cache). This new
strategy keeps the message in the cache until we haven't seen it for 2
minutes, ensuring that it eventually drops off the network (with high
likelihood).

The correct fix is message expiration:
libp2p/go-libp2p-pubsub#573, but that requires
deeper protocol changes.
@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

I think this would break some systems that rely on it.

Rely on it being sequential?

Rather monotonically increasing.

@Stebalien
Copy link
Member Author

Rather monotonically increasing.

unixnanos? unixnanos with a counter in case we try to go backwards?

@vyzo
Copy link
Collaborator

vyzo commented Aug 14, 2024

Rather monotonically increasing.

unixnanos? unixnanos with a counter in case we try to go backwards?

unixnano init and then sequentially increasing. I don't see anything wrong with allowing an initializer, but we just do unixnano without an initializer for now i think.

@Stebalien
Copy link
Member Author

Well, specifically I was thinking about having a guaranteed monotonically increasing clock. I.e.:

  1. Get unix nanos.
  2. Look at the last sequence number.
  3. If less than or equal to the last sequence number, use the last sequence number plus 1.

We aren't going to broadcast one message per nanosecond, but I like being extra paranoid.

@vyzo
Copy link
Collaborator

vyzo commented Aug 15, 2024

Yes -- the part that is missing is knowledge of the last seqno basically.
The application would need to store it somewhere, load it on restart, and pass it in the constructor.

@Stebalien
Copy link
Member Author

The application would need to store it somewhere, load it on restart, and pass it in the constructor.

Can we not just initialize to the current time on reboot? According to kuba, that's what we already do. To deal with fast reboots, we can insert a very tiny sleep before sending the first message.

@vyzo
Copy link
Collaborator

vyzo commented Aug 15, 2024

sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants