draft: fixes fencing and hot-reload in 2.11+ #29

Merged
merged 6 commits into from
Oct 18, 2023

ochaton
Member

@ochaton ochaton commented Sep 9, 2023

  • Fixes start/restart and reload for Tarantool 2.11. The valid path to retrieve default_cfg and template_cfg is {mt_call, reload_cfg, reconfig_modules}
  • Adds wall-clock timings for the etcd_load call to simplify investigation of issues like "something happened to tarantool-etcd"
  • Fencing: now steps down immediately when another master is discovered in ETCD. Previously it waited until the previous fencing_timeout was exhausted
  • config/etcd:request now respects an optional deadline argument, which is used to adjust the request timeout in listings
  • Finally, introduces an integration test setup to ensure the safety of further patches to config
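
The fencing change in the list above can be illustrated with a small sketch. It is written in Python for illustration only and is not the moonlibs/config code (which is Lua). The names `FencingState` and `should_step_down`, and the lease-renewal details, are assumptions; only the "step down immediately when another master is seen in ETCD" rule comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FencingState:
    # All field names are hypothetical and exist only for this sketch.
    my_name: str            # name of this instance
    fencing_timeout: float  # seconds the instance may stay rw without confirmation
    deadline: float         # time at which the current rw lease expires


def should_step_down(state: FencingState, etcd_master: Optional[str], now: float) -> bool:
    """Return True when the instance must switch to read-only.

    etcd_master is the master name read from ETCD, or None when ETCD
    could not be read during this fencing iteration.
    """
    if etcd_master is not None and etcd_master != state.my_name:
        # Another master is recorded in ETCD: step down immediately
        # instead of waiting for the lease to run out.
        return True
    if etcd_master is None:
        # ETCD is unreachable: stay rw only until the lease expires.
        return now >= state.deadline
    # ETCD still names this instance as master: renew the lease (assumed behaviour).
    state.deadline = now + state.fencing_timeout
    return False
```

With a lease expiring at t=100, seeing another master at t=50 returns True right away, while an unreadable ETCD at t=50 keeps the instance rw until t=100.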

@Mons, please have a look. And do not merge yet :)

Vladislav Grubov added 3 commits August 20, 2023 14:05
	* Due to built-in retries on non-200 responses,
	config.etcd:list() takes N*timeout to complete
	instead of just timeout.
	Fencing is a time-bound algorithm, so listings
	should not overcommit fencing_timeout.
	* This patch introduces a basic test suite
	for the most common topologies: etcd.cluster.master
	and etcd.instance.single.
	The test suite starts 1/2/3 Tarantool instances against 3
	etcd nodes and checks that the configuration is valid.
	The test suite also checks that the config is reloadable.
	For now the test suite runs on 1.10.15, 2.8.4, 2.10.6,
	2.11.0 and 2.11.1.
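
The N*timeout problem described in the first commit message can be sketched as follows. This is an illustrative Python sketch, not the actual config.etcd implementation. The function `request_with_deadline` and its parameters are hypothetical. It only shows the general idea of capping each retry by the time left until a shared deadline, so that the whole call never exceeds it.

```python
import time
from typing import Callable, Optional


def request_with_deadline(
    do_request: Callable[[float], Optional[str]],
    timeout: float,
    deadline: float,
    retries: int = 3,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Retry do_request, but never run past the absolute deadline.

    do_request(t) is a hypothetical callable that performs one attempt with
    a per-attempt timeout t and returns None on failure.
    """
    for _ in range(retries):
        remaining = deadline - clock()
        if remaining <= 0:
            # The time budget is spent: give up instead of overcommitting.
            break
        # Each attempt gets at most the time left before the deadline.
        result = do_request(min(timeout, remaining))
        if result is not None:
            return result
    return None
```

Without the `min(timeout, remaining)` cap, three failing attempts with a 2-second timeout would take 6 seconds; with a deadline 3 seconds away, the sketch stops after roughly 3 seconds.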
@ochaton
Member Author

ochaton commented Sep 9, 2023

fixes #27

Vladislav Grubov added 2 commits September 13, 2023 09:43
	* feat: support bootstrap_strategy (2.11)
	* feat: derive load_cfg/default_cfg
	* fix: ensure options of box.cfg are always cleared before being passed
	  to box.cfg (or wrap_box_cfg, or boxcfg)
	* feat: added annotations for config and config.etcd
	* feat: use module-aware logger (2.11 feature)
	* feat: expose load_cfg/{default_cfg,...} to config._load_cfg
  * This method is needed mainly by autofailover mechanisms to enforce
    read_only=true during the loading phase of Tarantool.

  Before this patch, a race condition existed between autofailover and
  the recovery behaviour of moonlibs/config.

  If the master crashes but restarts quickly, it enters a long-running
  loading phase. The master recovers as read_only=true, but after returning
  from box.cfg moonlibs/config retrieves the config from ETCD and rechecks
  the read_only option.

  The race happens when autofailover changes the configuration in ETCD, but
  the master returns from the loading phase just in time to apply the
  outdated configuration. This leads the cluster to split-brain.

  With the config.enforce_ro method, an external coordinator can first
  call enforce_ro on the loading leader and receive confirmation that the
  leader will not be promoted to rw until the next configuration reload.

  Tarantool can be enforced to be ro only when all of the following conditions
  are met:
	1) Tarantool is recovering from a snapshot (it was already
	   bootstrapped)
	2) The client's code does not override box.cfg by passing args.boxcfg
	3) args.tidy_load is enabled (the default, but can be overridden by
	   the client)
	4) config uses ETCD to retrieve the topology.
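
The four conditions above can be read as a single predicate. The following Python sketch is illustrative only. `LoadContext` and `can_enforce_ro` are hypothetical names, not part of moonlibs/config, and the real check lives in the Lua module.

```python
from dataclasses import dataclass


@dataclass
class LoadContext:
    # Hypothetical flags mirroring conditions 1-4 of the commit message.
    recovering_from_snapshot: bool  # 1) instance was already bootstrapped
    custom_boxcfg: bool             # 2) client passed its own args.boxcfg
    tidy_load: bool = True          # 3) args.tidy_load (enabled by default)
    uses_etcd: bool = False         # 4) topology is retrieved from ETCD


def can_enforce_ro(ctx: LoadContext) -> bool:
    """True only when all four conditions hold at the same time."""
    return (
        ctx.recovering_from_snapshot
        and not ctx.custom_boxcfg
        and ctx.tidy_load
        and ctx.uses_etcd
    )
```

A coordinator following this sketch would treat a False result as "the instance cannot promise to stay ro" and fall back to whatever safety measure it used before.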
@ochaton ochaton merged commit b540dd9 into master Oct 18, 2023
10 checks passed