
Added inflight_requests_limit #2356

Merged (5 commits, Oct 6, 2024)

Conversation

GilboaAWS
Collaborator

Inflight Requests Limit

This PR introduces a new feature that limits the maximum number of concurrent (inflight) requests sent to Valkey through each connection of a Glide client. This feature is implemented in the Glide infrastructure and is currently supported for the Python client, with plans to extend support to other language clients in the future.

Motivation

Limiting the number of inflight requests per connection can help prevent the queues in the Glide infrastructure from running out of memory (OOM) due to excessive queuing or blocking. When the Glide infrastructure receives more requests than it can handle, it may start queuing requests, leading to increased memory usage and potential OOM situations in the queues. By controlling the number of inflight requests, this feature aims to mitigate the risk of the Glide queues running out of memory and improve overall system stability and reliability.

Implementation

The inflight requests limit is implemented using an atomic counter mechanism. Each connection in a Glide client has an associated atomic counter that keeps track of the number of inflight requests for that connection.

When a new request is initiated, the client attempts to increment the atomic counter associated with the connection. If the counter value after incrementing exceeds the configured inflight requests limit for that connection, the request is rejected with an error, indicating that the maximum inflight requests limit has been reached.

If the counter value after incrementing is within the limit, the request proceeds, and the counter is decremented when the request is completed.
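The reserve/release flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual GLIDE code; the type and method names (`InflightLimiter`, `try_reserve`, `release`) are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Illustrative per-connection inflight-request limiter.
struct InflightLimiter {
    counter: Arc<AtomicUsize>,
    limit: usize,
}

impl InflightLimiter {
    fn new(limit: usize) -> Self {
        Self {
            counter: Arc::new(AtomicUsize::new(0)),
            limit,
        }
    }

    /// Try to reserve a slot for a new request. Returns false (request
    /// rejected) if the limit has been reached; the counter is only
    /// incremented when the reservation actually succeeds.
    fn try_reserve(&self) -> bool {
        self.counter
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |current| {
                if current < self.limit {
                    Some(current + 1)
                } else {
                    None // abort the update: request is rejected
                }
            })
            .is_ok()
    }

    /// Release the slot once the request completes.
    fn release(&self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}
```

A caller would invoke `try_reserve()` before dispatching a request, return an error to the user on `false`, and call `release()` when the response arrives or the request fails.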

Configuration

The inflight requests limit is configurable and has a default value of 1000 requests per connection. This default value can be overridden by client-specific configuration options, which will be documented separately for each language client.

For the Python client, the inflight_requests_limit option has been added to the GlideClusterClientConfiguration and GlideClientConfiguration classes, allowing users to specify the desired limit when creating a new client instance.

Limitations and Considerations

  • The inflight requests limit applies to each individual connection in a Glide client, not to the entire client instance.
  • Requests that exceed the inflight requests limit are rejected with an error and are not queued or processed later.
  • Setting a low inflight requests limit might negatively impact the client's performance, as it can lead to more requests being rejected.
  • It's recommended to set the inflight requests limit based on the expected load, available resources, and the desired balance between performance and resource utilization.

Future Work

  • Extend support for the inflight requests limit feature to other language clients (e.g., Node, Java, etc.).
  • Provide monitoring and metrics around the inflight requests limit to help users better understand the client's behavior and tune the configuration accordingly.

@GilboaAWS GilboaAWS added python Rust core protobuf Protobuf, proto and protoc labels Sep 26, 2024
@GilboaAWS GilboaAWS requested a review from a team as a code owner September 26, 2024 09:42
Collaborator

@Yury-Fridlyand Yury-Fridlyand left a comment


Thank you for the detailed description!
How are multi-node requests counted? A single request between the client and the GLIDE core is converted into multiple requests to server nodes.

@GilboaAWS
Collaborator Author

#2356 (review)

How are multi-node requests counted?

The inflight requests limit counts multi-node requests as a single command. This design decision was made to keep the implementation simple and provide a straightforward configuration option for users.

This approach strikes a balance between providing a useful feature for controlling the number of concurrent requests and maintaining a reasonable level of complexity in the implementation. However, we recognize the potential need for more granular control and visibility into the request flow.

As part of our future roadmap, we plan to introduce telemetry and monitoring capabilities that will provide deeper insights into the request handling process, including the ability to track and analyze multi-node requests in greater detail. This will enable users to better understand the request flow and make more informed decisions regarding the configuration of the inflight requests limit or other performance-related settings.

@ikolomi
Collaborator

ikolomi commented Sep 29, 2024

  1. Link the issue Limit the maximum number of concurrent (inflight) requests sent to Valkey through each connection of a Glide client #1253; the full description should live there
  2. We have to deliver this for all officially supported wrappers (Java, Node)

@ikolomi
Collaborator

ikolomi commented Sep 30, 2024

There is absolutely no need to use fetch_update() when fetch_sub() is sufficient: fetch_update is based on compare_exchange, which is expensive and prone to retries and the ABA problem

@GilboaAWS
Collaborator Author

Thanks @ikolomi for the review!
Regarding your comment:

prone to repetitiveness and ABA problem

The ABA problem is not a concern in this case, as the operation is performed on an AtomicUsize rather than a pointer. Even if the value changes, fetch_update will retry the operation (the repetitiveness you referred to) and update the value accordingly.

no need to use fetch_update() when fetch_sub() is sufficient

Using fetch_sub() and fetch_add() could let the counter exceed the defined limit. In the current implementation, exceeding the limit is undesirable.
It also opens an edge case: Glide returns a response and the counter is decremented, yet a new request still cannot be sent because the counter remains above the defined limit.
This could leave the request pipeline underutilized, short of its maximum capacity.

By employing fetch_update(), we update the counter only upon actual usage, where we count real requests that proceed to the request pipeline.
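The difference between the two approaches can be illustrated with a minimal sketch (the constant and function names are hypothetical, not taken from the GLIDE code):

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

const LIMIT: isize = 2;

/// fetch_add-based reservation: increment first, then check. On rejection
/// the counter transiently exceeds LIMIT before being rolled back, and
/// other threads can observe that overshoot.
fn reserve_fetch_add(counter: &AtomicIsize) -> bool {
    let prev = counter.fetch_add(1, Ordering::SeqCst);
    if prev >= LIMIT {
        // Over the limit: roll back the speculative increment.
        counter.fetch_sub(1, Ordering::SeqCst);
        return false;
    }
    true
}

/// fetch_update-based reservation: the counter is modified only when the
/// request is actually admitted, so it never exceeds LIMIT.
fn reserve_fetch_update(counter: &AtomicIsize) -> bool {
    counter
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |c| {
            if c < LIMIT {
                Some(c + 1)
            } else {
                None // reject without touching the counter
            }
        })
        .is_ok()
}
```

With fetch_update, a rejected reservation leaves the counter untouched; with fetch_add, the counter briefly overshoots the limit on every rejection.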

@ikolomi
Collaborator

ikolomi commented Sep 30, 2024

In such a scenario, we may encounter an edge case where glide returns a response, but a new request cannot be sent because the counter has been decreased, even though the counter still exceeds the defined limit.

This is simply not true. A counter holding a negative value for an infinitesimally short period will have no negative impact on performance. On the contrary, using fetch_update() increases latency due to its more complex memory-bus interaction
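The alternative being argued for here can be sketched as a count-down of remaining slots, where the counter starts at the limit and a brief negative value on rejection is tolerated. This is an illustrative sketch with hypothetical names, not the actual GLIDE implementation:

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Count-down variant: `allowed` holds the number of remaining slots and
/// is initialized to the configured limit. A single fetch_sub both
/// reserves a slot and tells us whether one was available.
fn try_reserve(allowed: &AtomicIsize) -> bool {
    // fetch_sub returns the previous value. If no slot was left, roll
    // back; the counter may briefly dip below zero here, which is
    // harmless as long as every failed reservation is undone.
    if allowed.fetch_sub(1, Ordering::SeqCst) <= 0 {
        allowed.fetch_add(1, Ordering::SeqCst);
        return false;
    }
    true
}

/// Return the slot once the request completes.
fn release(allowed: &AtomicIsize) {
    allowed.fetch_add(1, Ordering::SeqCst);
}
```

The trade-off debated in this thread is exactly the transient under/overshoot of this variant versus the retry loop and compare_exchange cost of the fetch_update variant.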

@GilboaAWS
Collaborator Author

GilboaAWS commented Sep 30, 2024

A counter holding a negative value for an infinitesimally short period will have no negative impact on performance.

I believe this one will illustrate this negative impact.

[image omitted: timing diagram of four threads contending on the inflight counter]

Thread No. 4 could have continued without returning an error, but it could not, because the counter exceeded the limit at that moment.
Under high throughput, this can lead to the loss of many requests.

@@ -29,6 +29,7 @@ pub const DEFAULT_CONNECTION_ATTEMPT_TIMEOUT: Duration = Duration::from_millis(2
pub const DEFAULT_PERIODIC_TOPOLOGY_CHECKS_INTERVAL: Duration = Duration::from_secs(60);
pub const INTERNAL_CONNECTION_TIMEOUT: Duration = Duration::from_millis(250);
pub const FINISHED_SCAN_CURSOR: &str = "finished";
pub const DEFAULT_MAX_INFLIGHT_REQUESTS: u32 = 1000;
Collaborator


add a comment explaining why it is 1000

@@ -102,6 +105,8 @@ pub enum ClientWrapper {
pub struct Client {
internal_client: ClientWrapper,
request_timeout: Duration,
// Setting this counter to limit the inflight requests, in case of any queue is blocked, so we return error to the customer.
inflight_requests_counter: Arc<AtomicIsize>,
Collaborator


this is not what it counts, so it should be "inflight_requests_allowed"

@@ -409,6 +414,28 @@ impl Client {
Err(err)
}
}

pub fn reserve_inflight_request(&self) -> bool {
if self.inflight_requests_counter.load(Ordering::SeqCst) < 0 {
Collaborator


if "inflight_requests_allowed", then <= 0 should be used

if self.inflight_requests_counter.load(Ordering::SeqCst) < 0 {
return false;
} else {
if self
Collaborator


comment that the value might change from other tasks

@GilboaAWS GilboaAWS force-pushed the inflight_requests_limit branch 2 times, most recently from 125abe3 to 89b5347 Compare October 6, 2024 06:32
@GilboaAWS GilboaAWS merged commit fba6c07 into valkey-io:main Oct 6, 2024
41 checks passed
@GilboaAWS GilboaAWS deleted the inflight_requests_limit branch October 6, 2024 09:19
@GilboaAWS GilboaAWS restored the inflight_requests_limit branch October 6, 2024 09:19