maintainer, scheduler: support dynamic merge and split #698

Merged on Dec 25, 2024 (24 commits)

@CharlesCheung96 CharlesCheung96 commented Dec 19, 2024

Group Checker

GroupChecker checks whether the scheduling state of all tasks in a group meets the specified constraints based on the task status.

type GroupChecker[T ReplicationID, R Replication[T]] interface {
	AddReplica(replication R)
	RemoveReplica(replication R)
	UpdateStatus(replication R)

	Check(batch int) GroupCheckResult
	Name() string
	Stat() string
}
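To make the interface concrete, here is a minimal sketch of a toy checker that satisfies it. Everything except the GroupChecker interface itself is an assumption made for illustration: the definitions of ReplicationID, Replication and GroupCheckResult are hypothetical stand-ins (the PR description does not show them), overloadChecker, spanID and span are invented names, and treating batch as "maximum number of results per call" is a guess at its meaning.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins: the real definitions are not shown in this PR description.
type ReplicationID interface {
	comparable
	String() string
}

type Replication[T ReplicationID] interface {
	GetID() T
}

type GroupCheckResult any

// GroupChecker is copied from the PR description.
type GroupChecker[T ReplicationID, R Replication[T]] interface {
	AddReplica(replication R)
	RemoveReplica(replication R)
	UpdateStatus(replication R)

	Check(batch int) GroupCheckResult
	Name() string
	Stat() string
}

// spanID and span are illustrative types, not part of the PR.
type spanID string

func (s spanID) String() string { return string(s) }

type span struct {
	id   spanID
	load float64
}

func (s *span) GetID() spanID { return s.id }

// overloadChecker is a toy checker that reports tracked spans whose load exceeds limit.
type overloadChecker struct {
	limit float64
	tasks map[spanID]*span
}

func newOverloadChecker(limit float64) *overloadChecker {
	return &overloadChecker{limit: limit, tasks: make(map[spanID]*span)}
}

func (c *overloadChecker) AddReplica(r *span)    { c.tasks[r.GetID()] = r }
func (c *overloadChecker) RemoveReplica(r *span) { delete(c.tasks, r.GetID()) }
func (c *overloadChecker) UpdateStatus(r *span)  { c.tasks[r.GetID()] = r }

// Check returns at most batch overloaded span IDs, sorted for determinism.
// Interpreting batch as a result cap is an assumption.
func (c *overloadChecker) Check(batch int) GroupCheckResult {
	var ids []spanID
	for id, s := range c.tasks {
		if s.load > c.limit {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	if batch >= 0 && len(ids) > batch {
		ids = ids[:batch]
	}
	return ids
}

func (c *overloadChecker) Name() string { return "overloadChecker" }
func (c *overloadChecker) Stat() string { return fmt.Sprintf("tracked=%d", len(c.tasks)) }

// Compile-time check that the toy checker satisfies the interface.
var _ GroupChecker[spanID, *span] = (*overloadChecker)(nil)

func main() {
	c := newOverloadChecker(100)
	c.AddReplica(&span{id: "a", load: 150})
	c.AddReplica(&span{id: "b", load: 20})
	fmt.Println(c.Name(), c.Stat(), c.Check(10))
}
```

The compile-time assertion is the useful part here: it shows how the two type parameters are bound (an ID type and a replication type that returns that ID) without depending on any real scheduler state.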

This PR implements two basic checkers:

  1. hotSpanChecker focuses on identifying and managing "hot spans", which are spans with high throughput.

  2. rebalanceChecker focuses on rebalancing the load across nodes so that it is evenly distributed. It evaluates global system conditions and decides whether to merge or rebalance spans based on load thresholds and imbalance ratios. The constraints are:
    a. Node Constraint: trigger MergeAndSplit if len(s.allTasks) < len(allNodes)*MinSpanNumberCoefficient
    b. Hard Imbalance Constraint: trigger MergeAndSplit if maxLoad-minLoad >= float64(s.hardWriteThreshold) && maxLoad/minLoad > s.hardImbalanceThreshold
    c. Soft Imbalance Constraint: trigger MergeAndSplit if softRebalanceScore accumulates up to s.softRebalanceScoreThreshold
    d. Soft Merge Constraint: trigger Merge if softMergeScore accumulates up to s.softMergeScoreThreshold
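The four constraints can be sketched as a single decision function. This is an illustrative sketch under stated assumptions, not the code in this PR: the evaluation order, the value of minSpanNumberCoefficient, the way the soft scores accumulate (plus one per check while a condition holds, reset otherwise), and the fields softImbalanceThreshold and softMergeLoadThreshold are all hypothetical, since the description only says that the scores "accumulate".

```go
package main

import "fmt"

type action int

const (
	actionNone action = iota
	actionMerge
	actionMergeAndSplit
)

func (a action) String() string {
	return [...]string{"None", "Merge", "MergeAndSplit"}[a]
}

// Hypothetical value; the PR only names MinSpanNumberCoefficient.
const minSpanNumberCoefficient = 2

type rebalanceSketch struct {
	hardWriteThreshold     float64
	hardImbalanceThreshold float64
	softImbalanceThreshold float64 // hypothetical: ratio that feeds softRebalanceScore
	softMergeLoadThreshold float64 // hypothetical: load below which softMergeScore grows

	softRebalanceScore, softRebalanceScoreThreshold int
	softMergeScore, softMergeScoreThreshold         int
}

// check evaluates constraints a-d in order for one round.
func (s *rebalanceSketch) check(taskCount int, nodeLoads []float64) action {
	if len(nodeLoads) == 0 {
		return actionNone
	}
	// a. Node Constraint: too few spans for the number of nodes.
	if taskCount < len(nodeLoads)*minSpanNumberCoefficient {
		return actionMergeAndSplit
	}
	minLoad, maxLoad := nodeLoads[0], nodeLoads[0]
	for _, l := range nodeLoads[1:] {
		if l < minLoad {
			minLoad = l
		}
		if l > maxLoad {
			maxLoad = l
		}
	}
	// Float division by zero yields +Inf or NaN in Go rather than panicking.
	ratio := maxLoad / minLoad
	// b. Hard Imbalance Constraint: large absolute gap and large ratio.
	if maxLoad-minLoad >= s.hardWriteThreshold && ratio > s.hardImbalanceThreshold {
		return actionMergeAndSplit
	}
	// c. Soft Imbalance Constraint (accumulation rule is an assumption).
	if ratio > s.softImbalanceThreshold {
		s.softRebalanceScore++
	} else {
		s.softRebalanceScore = 0
	}
	if s.softRebalanceScore >= s.softRebalanceScoreThreshold {
		s.softRebalanceScore = 0
		return actionMergeAndSplit
	}
	// d. Soft Merge Constraint (accumulation rule is an assumption).
	if maxLoad < s.softMergeLoadThreshold {
		s.softMergeScore++
	} else {
		s.softMergeScore = 0
	}
	if s.softMergeScore >= s.softMergeScoreThreshold {
		s.softMergeScore = 0
		return actionMerge
	}
	return actionNone
}

func main() {
	s := &rebalanceSketch{
		hardWriteThreshold: 100, hardImbalanceThreshold: 3,
		softImbalanceThreshold: 1.5, softMergeLoadThreshold: 10,
		softRebalanceScoreThreshold: 3, softMergeScoreThreshold: 2,
	}
	fmt.Println(s.check(3, []float64{50, 50}))   // few spans for two nodes
	fmt.Println(s.check(8, []float64{400, 100})) // large gap and ratio
	fmt.Println(s.check(8, []float64{200, 100})) // mild imbalance, first round
}
```

Separating the hard constraint (acts immediately) from the soft ones (act only after several consecutive rounds) is what keeps a transient load spike from triggering a split, at the cost of reacting more slowly to sustained mild imbalance.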

Manual Tests

Hot Span && Node Constraint

image

Rebalance Spans

image

Merge Spans

image

[TODO] Improve check strategies

There are several shortcomings in the current implementation, and possible improvements include:

  • For hot spans with low throughput (e.g., less than 32MB/s), implement multi-level hot span groups to reduce the need for frequent splitting.
    • With the enable-table-across-nodes configuration enabled, this ensures that only sufficiently large tables are split.
    • With the enable-table-across-nodes configuration disabled, the scheduler ensures basic load balancing across nodes.
  • For groups with a low imbalance score, implement a lightweight dynamic splitting mechanism that minimizes overhead while improving load distribution. This approach should adaptively adjust the split granularity to balance performance and resource utilization.
  • Design more effective scoring algorithms to accurately identify unbalanced states.
  • Dynamically calculate the number of subSpans during rebalancing to achieve better balancing results through smaller splitting granularity.

@CharlesCheung96 CharlesCheung96 changed the title from "Add more comments" to "maintainer, scheduler: support dynamic merge and split" on Dec 24, 2024
@CharlesCheung96 CharlesCheung96 merged commit 955b924 into pingcap:master Dec 25, 2024
7 of 8 checks passed