Retry fetching seed list when none of the seeds seem to be responding #11

Open
rkrzewski opened this issue Oct 6, 2015 · 2 comments

Comments

@rkrzewski
Owner

Currently a follower node fetches the list of seeds once and attempts to join the cluster using that address list. If the seed list is out of date because of a leader malfunction or cluster partitioning, this operation may "hang" indefinitely.
The Discovery actor should use a timer: if joining the cluster does not succeed within a specified time, it should cancel the ongoing join process (invoking Cluster.joinSeedNodes(Seq()) does this) and re-fetch the seed list from etcd, hoping that a (new) leader will eventually publish a correct list.

@rkrzewski
Owner Author

Implemented in 7016917, but tests are still needed. That's a bit tricky since timeouts are involved.

@rkrzewski
Owner Author

I've learned from further reading of the ClusterCoreDaemon source that Cluster.joinSeedNodes(Seq()) would in fact be ignored. However, invoking joinSeedNodes with a non-empty list will interrupt the current SeedNodeProcess, which means we can simply fetch another list and retry.
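
For illustration, the fetch-and-retry loop could be modelled roughly as below. This is a minimal sketch in plain Scala with no Akka or etcd dependency, not the code from 7016917. The names `joinWithRetry`, `fetchSeeds`, `tryJoin` and `attemptsLeft` are hypothetical: `fetchSeeds` stands in for reading the seed list from etcd, and `tryJoin` stands in for calling joinSeedNodes and waiting up to the join timeout. The attempt limit is also an assumption, added only so the example terminates.

```scala
import scala.annotation.tailrec

object SeedRetrySketch {
  // fetchSeeds and tryJoin are hypothetical stand-ins (see the text above).
  // Returns the seed list that led to a successful join, or None if all
  // attempts were used up.
  @tailrec
  def joinWithRetry(
      fetchSeeds: () => Seq[String],
      tryJoin: Seq[String] => Boolean,
      attemptsLeft: Int
  ): Option[Seq[String]] =
    if (attemptsLeft <= 0) None
    else {
      val seeds = fetchSeeds()
      // An empty list is skipped, since joinSeedNodes(Seq()) would be ignored.
      // A non-empty list replaces whatever join attempt was in progress.
      if (seeds.nonEmpty && tryJoin(seeds)) Some(seeds)
      else joinWithRetry(fetchSeeds, tryJoin, attemptsLeft - 1)
    }

  def main(args: Array[String]): Unit = {
    // Simulated etcd contents: a stale list, an empty list, then a fresh one.
    val published = Iterator(
      Seq("akka.tcp://sys@stale:2551"),
      Seq.empty[String],
      Seq("akka.tcp://sys@fresh:2551")
    )
    // Simulated join: only succeeds when the list contains the fresh node.
    val result = joinWithRetry(
      () => published.next(),
      seeds => seeds.exists(_.contains("fresh")),
      attemptsLeft = 5
    )
    println(result)
  }
}
```

In an actor the wait inside `tryJoin` would more likely be a scheduled timeout message than a blocking call, which is also what makes the behaviour awkward to test.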
