Recently, we launched the WLM subfeature, multi-tenant search resiliency, which enables managing multi-tenant environments in OpenSearch. However, the feature still relies on an external hint being sent along with each request via an HTTP header.
This becomes cumbersome for programmatic access, and without proper planning it can lead to unmanageable multi-tenant access. A more efficient solution would allow users to define rules that determine the appropriate tenant for certain types of requests (e.g., requests from a specific user or targeting a specific index). For this to work, these rules (both the system index and the in-memory copies) must be kept consistent and up to date across all nodes in the cluster, and the synchronization approach must be both consistent and efficient.
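To make the rule concept concrete, the sketch below shows one possible shape for such a rule, assuming a flat attribute-to-value map (e.g., a username or an index pattern) and a target tenant. The names here are illustrative only; the actual schema will be defined by the WLM rule APIs.

```java
import java.util.Map;

// Hypothetical shape of a tenant-assignment rule, for illustration only;
// the real rule schema is defined by the WLM feature itself.
public record TenantRule(
        String id,                      // unique rule identifier
        Map<String, String> attributes, // e.g. {"username": "alice"} or {"index_pattern": "logs-*"}
        String tenant) {                // tenant the matching request is routed to
}
```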
Assumptions
Rule Changes Are Infrequent: Rule updates are expected to be rare relative to search traffic.
Rules Storage: The rules for determining the tenant are stored in a system index, and each node maintains an in-memory rule Trie for faster lookups (see the sketch after this list).
Need for Consistency: It's crucial that all nodes have the same set of rules (index and in-memory) at any given time to ensure consistent decision-making when processing requests.
Performance Concerns: The synchronization mechanism must not introduce significant overhead, especially in large clusters.
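As a rough illustration of the in-memory rule Trie mentioned above (not the actual WLM data structure), the sketch below keys the trie on a single attribute value, such as an index name, and resolves the tenant of the longest matching rule prefix:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal prefix-trie sketch: maps an attribute value (e.g. an index name)
// to the tenant configured for the longest matching rule prefix.
public class RuleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String tenant; // non-null only if a rule terminates at this node
    }

    private final Node root = new Node();

    public void insert(String prefix, String tenant) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.tenant = tenant;
    }

    // Longest-prefix match: walk the trie along the key and remember the last
    // tenant seen, so "logs-2024" matches a rule defined on "logs-".
    public Optional<String> resolve(String key) {
        Node node = root;
        String match = null;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) break;
            if (node.tenant != null) match = node.tenant;
        }
        return Optional.ofNullable(match);
    }
}
```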
Synchronization Approaches
Refresh-Based Synchronization
Description: Each node periodically refreshes its local Trie from the central system index. A refresh pulls the latest set of rules from the node that holds the up-to-date system index data and rebuilds the local Trie (a sketch follows this section).
Advantages:
Simpler implementation.
User can control the refresh interval.
Disadvantages:
Synchronization Delay: Because nodes refresh periodically, there will be some delay before a rule change propagates to all nodes, which can lead to short periods of inconsistency.
Performance Impact: Frequent refreshes could impose additional load on nodes.
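A minimal sketch of refresh-based synchronization, reusing the TenantRule and RuleTrie sketches above. A plain ScheduledExecutorService stands in for OpenSearch's scheduling machinery and a supplier stands in for a search against the system index; the single "index_pattern" attribute is a simplifying assumption.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Periodic refresh sketch: every node re-reads the rule set on a fixed,
// user-configurable interval and rebuilds its local trie from scratch.
public class RuleRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Supplier<List<TenantRule>> ruleSource; // stand-in for querying the system index
    private volatile RuleTrie trie = new RuleTrie();

    public RuleRefresher(Supplier<List<TenantRule>> ruleSource, long intervalSeconds) {
        this.ruleSource = ruleSource;
        scheduler.scheduleWithFixedDelay(this::refresh, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    private void refresh() {
        RuleTrie fresh = new RuleTrie();
        for (TenantRule rule : ruleSource.get()) {
            // Assumes a single "index_pattern" attribute for simplicity.
            fresh.insert(rule.attributes().getOrDefault("index_pattern", ""), rule.tenant());
        }
        trie = fresh; // atomic reference swap; readers never see a half-built trie
    }

    public RuleTrie currentTrie() {
        return trie;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```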
Push to Sync
Description: When a rule change occurs, the coordinator node saves the rule in the system index and updates its local Trie, then pushes the updated rule to all other nodes so that they can update their in-memory Tries. This requires a custom mechanism to notify all nodes of the change (a sketch follows this section).
Advantages:
Real-time synchronization: As soon as a rule changes, all nodes can be immediately updated.
Ensures consistency across nodes with minimal delay.
Avoids unnecessary overhead of continuously polling the system index.
Disadvantages:
Complexity: Requires additional infrastructure to manage rule change notifications, and push mechanisms may introduce additional development overhead.
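A sketch of the push flow on the coordinator, again building on the TenantRule and RuleTrie sketches. Plain interfaces stand in for the system-index write and the per-node notification (in OpenSearch this broadcast would be a custom transport action); the point is only to show the ordering of the three steps described above.

```java
import java.util.List;

// Push-to-sync sketch: persist the rule, update the local trie, then fan the
// update out to every other node. The interfaces are illustrative stand-ins.
public class RuleCoordinator {

    public interface RuleStore {          // stand-in for writing to the system index
        void save(TenantRule rule);
    }

    public interface NodeClient {         // stand-in for a per-node notification channel
        void applyRule(TenantRule rule);  // the receiving node updates its in-memory trie
    }

    private final RuleStore store;
    private final RuleTrie localTrie;
    private final List<NodeClient> otherNodes;

    public RuleCoordinator(RuleStore store, RuleTrie localTrie, List<NodeClient> otherNodes) {
        this.store = store;
        this.localTrie = localTrie;
        this.otherNodes = otherNodes;
    }

    public void upsertRule(TenantRule rule) {
        store.save(rule);                                                    // 1. persist in the system index
        localTrie.insert(rule.attributes().getOrDefault("index_pattern", ""), rule.tenant()); // 2. update local trie
        for (NodeClient node : otherNodes) {                                 // 3. push to every other node
            node.applyRule(rule);
        }
    }
}
```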
Pull to Sync
Description: Nodes pull the latest rule set from the node hosting the system index when they detect a need to synchronize (e.g., when they receive a request or when a specific trigger fires). A sketch follows this section.
Disadvantages:
Increased Load: Because queries arrive continuously, checking or pulling rules on the request path adds significant pressure to the node hosting the system index.
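A sketch of pull-on-demand synchronization, assuming a hypothetical lightweight rule-set version counter that nodes can compare cheaply. Even so, the version check sits on the request path, which is where the extra load described above comes from.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Pull-to-sync sketch: before resolving a tenant, the node compares a rule-set
// version counter against the version of its cached trie and pulls the full
// rule set only when stale. Names and version mechanism are illustrative.
public class OnDemandRuleSync {
    private final LongSupplier remoteVersion;  // stand-in for a lightweight version lookup
    private final Supplier<RuleTrie> fullPull; // stand-in for pulling rules and rebuilding the trie
    private volatile long localVersion = -1;
    private volatile RuleTrie trie = new RuleTrie();

    public OnDemandRuleSync(LongSupplier remoteVersion, Supplier<RuleTrie> fullPull) {
        this.remoteVersion = remoteVersion;
        this.fullPull = fullPull;
    }

    public RuleTrie trieForRequest() {
        long latest = remoteVersion.getAsLong(); // this remote call happens on every request
        if (latest != localVersion) {
            trie = fullPull.get();
            localVersion = latest;
        }
        return trie;
    }
}
```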
Full Replication with Local Sync
Description: Each node keeps a full copy of the system index data, and each node's Trie is updated from its local copy (see the sketch after this section).
Advantages:
Data is fully localized, avoiding cross-node network requests.
Disadvantages:
Against the Principles of Distributed Systems: Forcing all data to be stored on every node can lead to redundant data and increase the storage burden on each node.
Data Synchronization Delays: Even though data exists locally, if the system index replicas on different nodes are not synchronized, it can cause data inconsistency. The update delay can also increase, especially in larger clusters.
Cost and Storage Overhead: Each node needs to store all rule data, which can create significant storage pressure and increase the cost.
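For completeness, one way to get a full local copy of the rule data on every node would be to let the rules system index auto-expand its replicas to all nodes via the index.auto_expand_replicas setting. The snippet below is a sketch of such index settings (assuming the OpenSearch server Settings API), not a recommendation.

```java
import org.opensearch.common.settings.Settings;

// Sketch of index settings that would fully replicate the rules index.
public final class FullReplicationSettings {
    private FullReplicationSettings() {}

    public static Settings ruleIndexSettings() {
        return Settings.builder()
            .put("index.number_of_shards", 1)            // rules are small; one shard is enough
            .put("index.auto_expand_replicas", "0-all")  // keep a replica on every node
            .build();
    }
}
```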
Conclusion
After evaluating the synchronization approaches, we recommend adopting the Push to Sync method. This approach guarantees that as soon as a rule is updated, all nodes are immediately notified and update their local Trie, maintaining consistency across the cluster without delay. In addition, it generates minimal system load, since rule updates are rare.
While Refresh-Based Synchronization is simpler to implement and gives the user control over the refresh interval, it introduces a risk of synchronization delays and incurs the unnecessary overhead of periodically refreshing rules that rarely change. These delays could leave some nodes with outdated rules temporarily, which could lead different nodes to return inconsistent results for similar queries. Since consistency is important in our use case, this approach is not ideal.
In summary, Push to Sync is the optimal synchronization mechanism to meet our needs for consistency, low latency, and minimal system impact.
Supporting References
#16813
#16797
Related component
Search