Retry fetching seed list when none of the seeds seem to be responding #11

rkrzewski · 2015-10-06T23:54:03Z

Currently a follower node fetches the seed list once and attempts to join the cluster using that address list. If the seed list is out of date because of a leader malfunction or cluster partitioning, this operation may "hang" indefinitely.
The discovery actor should use a timer: if joining the cluster does not succeed within a specified time, it should cancel the ongoing joining process (invoking Cluster.joinSeedNodes(Seq()) does this) and re-fetch the seed list from etcd, hoping that a (new) leader will eventually publish a correct list.

rkrzewski · 2015-10-19T22:40:05Z

Implemented in 7016917 but tests are needed. That's a bit tricky since timeouts are involved.

rkrzewski · 2015-10-19T22:46:58Z

After reading further into the ClusterCoreDaemon source, I've learned that Cluster.joinSeedNodes(Seq()) would in fact be ignored. However, invoking joinSeedNodes with a non-empty list interrupts the current SeedNodeProcess, which means we can simply fetch another list and retry.
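
For illustration only, the retry logic described above could be modelled as a small state machine. This is a dependency-free sketch, not the code from 7016917: every name in it (SeedRetrySketch, the events, the effects) is hypothetical. It assumes the real actor would translate JoinSeedNodes into a Cluster.joinSeedNodes call with a non-empty list, FetchSeeds into an etcd read, and the timer effects into scheduler calls.

```scala
// Hypothetical model of the discovery retry loop; names are illustrative only.
object SeedRetrySketch {
  sealed trait Event
  final case class SeedsFetched(seeds: List[String]) extends Event // etcd returned a seed list
  case object JoinTimeout extends Event                            // join timer fired
  case object MemberUp extends Event                               // this node joined the cluster

  sealed trait Effect
  case object FetchSeeds extends Effect                              // re-read the seed list from etcd
  final case class JoinSeedNodes(seeds: List[String]) extends Effect // assumed to map to Cluster.joinSeedNodes(seeds)
  case object StartJoinTimer extends Effect
  case object CancelJoinTimer extends Effect

  sealed trait State
  case object Fetching extends State
  case object Joining extends State
  case object Joined extends State

  // Pure transition function: returns the next state and the side effects to perform.
  def step(state: State, event: Event): (State, List[Effect]) = (state, event) match {
    case (Fetching, SeedsFetched(seeds)) if seeds.nonEmpty =>
      (Joining, List(JoinSeedNodes(seeds), StartJoinTimer))
    case (Fetching, SeedsFetched(_)) =>
      // An empty list would be ignored by joinSeedNodes, so just fetch again.
      (Fetching, List(FetchSeeds))
    case (Joining, JoinTimeout) =>
      // No explicit cancel: the next non-empty joinSeedNodes call interrupts the old attempt.
      (Fetching, List(FetchSeeds))
    case (Joining, MemberUp) =>
      (Joined, List(CancelJoinTimer))
    case (s, _) =>
      (s, Nil) // ignore events that do not apply in the current state
  }
}
```

Keeping the transitions pure might also make the timeout behaviour easier to test, since a test can feed JoinTimeout directly instead of waiting for a real timer. A real implementation would probably also delay before re-fetching rather than polling etcd immediately.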

rkrzewski added the kind/enhancement and area/discovery labels Oct 6, 2015

rkrzewski added the size/S label Oct 17, 2015

rkrzewski added a commit that referenced this issue Oct 19, 2015

#11 split seedsFetch and seedsJoin timeouts, implemented join timeout

7016917

rkrzewski added the status/tests needed label Oct 19, 2015
