fix: Miscalc and inefficient db access patterns of session concurrency limits #3064
resolves #2177
Key Changes

- [...] `ai.backend.models.resource_policy`
- [...] `concurrency_used` tracker, with explicit Redis key prefix.
- [...] `concurrency_used` values, but the session concurrency limit requires special attention because the session state change (PENDING → SCHEDULED) happens at a different time from the predicate checks, so the concurrency limit may be exceeded when multiple sessions are scheduled together within a single scheduler loop.
- Add `.with_for_update()` query options to reduce serialization failures by explicitly locking the affected rows in update transactions.

Notice on refactored `RootContext` in Manager

You may be overwhelmed by the number of changed files, but don't worry: most changes under `src/ai/backend/manager/api` are relocations of root context attributes:

- `root_ctx.{pidx,local_config,shared_config}` → `root_ctx.c.{...}` ("c" from configurations)
- `root_ctx.{db,redis_*}` → `root_ctx.h.{...}` ("h" from halfstack clients)
- `root_ctx.{concurrency_tracker,event_*,hook_*,*_monitor,...}` → `root_ctx.g.{...}` ("g" from global singletons)

For this PR, I had to pass the `ConcurrencyTracker` object to the `AgentRegistry` instance, and realized that we should parametrize the halfstack client objects for both. This led to restructuring the root context attributes as listed above.

This makes it easier to spot mis-constructed abstractions that mix `.h` access and `.g` access in the manager API layer. The manager model layer should implement the abstract operations used by the manager API layer, and `.g.concurrency_tracker` is the first example added this way from the beginning.

For future reference:

- Move `shared_config` methods to the `models.config` module. Actually, the migration of container registries from etcd (shared config) to postgres is in line with this direction.
- Avoid using `.h` in the API layer in most cases, replacing such uses with explicit model-layer APIs written as stateful objects in `.g` or stateless functions imported from `ai.backend.models`.
- `AgentRegistry` is treated specially; it does not belong to `.g` but just sits at `RootContext` directly. I haven't decided where to put it... 🤔

Checklist: (if applicable)
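
For reviewers who want a mental model of the `root_ctx.c` / `.h` / `.g` grouping described in the notice above, here is a minimal, self-contained sketch. It is not the code in this PR: the class names `ConfigGroup`, `HalfstackGroup`, `GlobalGroup`, the `redis` field (standing in for the `redis_*` clients), the `registry` attribute name, and `build_root_context()` are all hypothetical, and the real classes carry far more state. It only illustrates the idea that both the tracker and the registry receive the same halfstack clients as parameters.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConfigGroup:
    """Hypothetical stand-in for `root_ctx.c` ("c" from configurations)."""
    pidx: int
    local_config: dict[str, Any]
    shared_config: dict[str, Any]


@dataclass
class HalfstackGroup:
    """Hypothetical stand-in for `root_ctx.h` ("h" from halfstack clients)."""
    db: Any
    redis: Any  # placeholder for the several redis_* clients


class ConcurrencyTracker:
    """Toy tracker that only depends on the halfstack clients passed in."""

    def __init__(self, h: HalfstackGroup) -> None:
        self.h = h


@dataclass
class GlobalGroup:
    """Hypothetical stand-in for `root_ctx.g` ("g" from global singletons)."""
    concurrency_tracker: ConcurrencyTracker


class AgentRegistry:
    """Toy registry: receives the same clients and tracker explicitly."""

    def __init__(self, h: HalfstackGroup, concurrency_tracker: ConcurrencyTracker) -> None:
        self.h = h
        self.concurrency_tracker = concurrency_tracker


@dataclass
class RootContext:
    c: ConfigGroup
    h: HalfstackGroup
    g: GlobalGroup
    registry: AgentRegistry  # sits directly on the root context, not in `.g`


def build_root_context() -> RootContext:
    """Wire the groups so the tracker and registry share one set of clients."""
    c = ConfigGroup(pidx=0, local_config={}, shared_config={})
    h = HalfstackGroup(db=object(), redis=object())
    tracker = ConcurrencyTracker(h)
    g = GlobalGroup(concurrency_tracker=tracker)
    registry = AgentRegistry(h, tracker)
    return RootContext(c=c, h=h, g=g, registry=registry)


root_ctx = build_root_context()
```

With this shape, API-layer code that reaches into `root_ctx.h` directly is easy to find by searching for `.h.`, while model-layer objects such as the tracker are reached through `root_ctx.g`.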
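
The concurrency-limit problem mentioned under Key Changes can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the implementation in this PR: `InMemoryConcurrencyTracker`, its `try_reserve()` method, the key prefix `example.concurrency_used`, and both `schedule_*` helpers are hypothetical names, and a plain dict stands in for Redis. It contrasts a predicate check against a count that only changes after the loop with one possible remedy, counting each session at check time.

```python
from collections import defaultdict


class InMemoryConcurrencyTracker:
    """Hypothetical in-memory stand-in for a Redis-backed counter."""

    def __init__(self, key_prefix: str = "example.concurrency_used") -> None:
        # The prefix is an assumption; the PR only says it is explicit.
        self._prefix = key_prefix
        self._counts: dict[str, int] = defaultdict(int)

    def _key(self, access_key: str) -> str:
        return f"{self._prefix}.{access_key}"

    def used(self, access_key: str) -> int:
        return self._counts[self._key(access_key)]

    def try_reserve(self, access_key: str, limit: int) -> bool:
        """Check the limit and count the session in a single step."""
        key = self._key(access_key)
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True

    def release(self, access_key: str) -> None:
        """Give back one slot, never going below zero."""
        key = self._key(access_key)
        self._counts[key] = max(0, self._counts[key] - 1)


def schedule_naive(pending: list[str], used: int, limit: int) -> list[str]:
    """Every check sees the same stale count, so all sessions pass."""
    return [session for session in pending if used < limit]


def schedule_with_reservation(
    pending: list[str],
    tracker: InMemoryConcurrencyTracker,
    access_key: str,
    limit: int,
) -> list[str]:
    """Each accepted session is counted before the next check runs."""
    return [s for s in pending if tracker.try_reserve(access_key, limit)]


pending_sessions = ["s1", "s2", "s3"]

# One session already running, limit of two: the naive loop admits all three.
naive_result = schedule_naive(pending_sessions, used=1, limit=2)

tracker = InMemoryConcurrencyTracker()
tracker.try_reserve("AK", limit=2)  # the session that is already running
reserved_result = schedule_with_reservation(pending_sessions, tracker, "AK", limit=2)
```

In a real deployment the check-and-increment would have to be atomic on the shared store, and a reservation would need to be released if the session later fails to start; both are left out of this sketch.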