-
Notifications
You must be signed in to change notification settings - Fork 702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: stats and keyspace notifications about lazy expiration #1547
Open
proost
wants to merge
32
commits into
valkey-io:unstable
Choose a base branch
from
proost:feat-keyspace-expiration-misses
base: unstable
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
feat: stats and keyspace notifications about lazy expiration #1547
proost
wants to merge
32
commits into
valkey-io:unstable
from
proost:feat-keyspace-expiration-misses
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1547 +/- ##
============================================
+ Coverage 70.85% 70.96% +0.11%
============================================
Files 120 120
Lines 64991 65066 +75
============================================
+ Hits 46050 46175 +125
+ Misses 18941 18891 -50
|
proost
changed the title
feat: keyspace expiration misses
feat: stats and keyspace notifications about lazy expiration
Jan 17, 2025
Signed-off-by: proost <[email protected]>
Signed-off-by: proost <[email protected]>
Signed-off-by: proost <[email protected]>
This PR replaces dict with hashtable in the ZSET datatype. Instead of mapping key to score as dict did, the hashtable maps key to a node in the skiplist, which contains the score. This takes advantage of hashtable performance improvements and saves 15 bytes per set item - 24 bytes overhead before, 9 bytes after. Closes valkey-io#1096 --------- Signed-off-by: Rain Valentine <[email protected]> Signed-off-by: Viktor Söderqvist <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: proost <[email protected]>
Skip logreqres on tests for the HELLO command Signed-off-by: Rueian <[email protected]> Signed-off-by: proost <[email protected]>
Resolves issue with valkey-cli not auto exiting from subscribed mode on reaching zero pub/sub subscription (previously filed on Redis) redis/redis#12592 --------- Signed-off-by: Nikhil Manglore <[email protected]> Signed-off-by: proost <[email protected]>
- remove old ignores - fix a "new" typo 🎁 Signed-off-by: Viktor Szépe <[email protected]> Signed-off-by: proost <[email protected]>
This PR introduces improvements to the hashtable iterator, implementing prefetching technique described in the blog post [Unlock One Million RPS - Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/) . The changes lay the groundwork for further enhancements in use cases involving iterators. Future PRs will build upon this foundation to improve performance and functionality in various iterator-dependent operations. In the pursuit of maximizing iterator performance, I conducted a comprehensive series of experiments. My tests encompassed a wide range of approaches, including processing multiple bucket indices in parallel, prefetching the next bucket upon completion of the current one, and several other timing and quantity variations. Surprisingly, after rigorous testing and performance analysis, the simplest implementation presented in this PR consistently outperformed all other more complex strategies. ## Implementation Each time we start iterating over a bucket, we prefetch data for future iterations: - We prefetch the entries of the next bucket (if it exists). - We prefetch the structure (but not the entries) of the bucket after the next. This prefetching is done when we pick up a new bucket, increasing the chance that the data will be in cache by the time we need it. ## Performance The data below was taken by conducting keys command on 64cores Graviton 3 Amazon EC2 instance with 50 mil keys in size of 100 bytes each. The results regarding the duration of “keys *” command was taken from “info all” command. ``` +--------------------+------------------+-----------------------------+ | prefetching | Time (seconds) | Keys Processed per Second | +--------------------+------------------+-----------------------------+ | No | 11.112279 | 4,499,529 | | Yes | 3.141916 | 15,913,862 | +--------------------+------------------+-----------------------------+ Improvement: Comparing the iterator without prefetching to the one with prefetching, we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster. ``` ### Save command improvment #### Setup: - 64cores Graviton 3 Amazon EC2 instance. - 50 mil keys in size of 100 bytes each. - Running valkey server over RAM file system. - crc checksum and comperssion off. #### Results ``` +--------------------+------------------+-----------------------------+ | prefetching | Time (seconds) | Keys Processed per Second | +--------------------+------------------+-----------------------------+ | No | 28 | 1,785,700 | | Yes | 19.6 | 2,550,000 | +--------------------+------------------+-----------------------------+ Improvement: - Reduced SAVE time by 30% (8.4 seconds faster) - Increased key processing rate by 42.8% (764,300 more keys/second) ``` Signed-off-by: NadavGigi <[email protected]> Signed-off-by: proost <[email protected]>
This PR is to cleanup the `SERVER_TEST` compiler flag from cmake compile definitions, as it is no longer required in the new unit test framework, see valkey-io#428. Signed-off-by: Karthick Ariyaratnam <[email protected]> Signed-off-by: proost <[email protected]>
We need to add a hash tag in cluster mode. Fixes valkey-io#1531. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
The fix that Redis gave us for the CVE-2024-46981 was freeing lctx.lua, and I didn't merge it correctly. We made some changes so that we are able to async free the lua context, so we need to free the passed in context. This was applied correctly on the two released versions (8.0 and 7.2) just not on unstable. Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: proost <[email protected]>
Fixes valkey-io#1538 Signed-off-by: Harkrishn Patro <[email protected]> Signed-off-by: proost <[email protected]>
…lkey-io#1535) In valkey-io#1441, we found a assert, and decided remove this assert and instead just free the newly created node and close the link, since if we cannot get the IP from the link it probably means the connection was closed. ``` === VALKEY BUG REPORT START: Cut & paste starting from here === 17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED === 17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true ------ STACK TRACE ------ 17847 valkey-server * src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634] src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de] /__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e] src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea] src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547] /lib64/libc.so.6(+0x40c8) [0x7f083985a0c8] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b] src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5] ``` But it also introduces another assert. The reason is that this new node is not added to the cluster nodes dict. ``` 17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED === 17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true ------ STACK TRACE ------ 17128 valkey-server * src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4] src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2] src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618] /__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278] src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09] src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5] src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e] ``` This closes valkey-io#1527. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
… is 0 (valkey-io#1541) When latency-monitor-threshold is set to 0, it means the latency monitor is disabled, and in VM_LatencyAddSample, we wrote the condition incorrectly, causing us to record latency when latency was turned off. This bug was introduced in the very first day, see e3b1d6d, it was merged in 2019. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
…flicts (valkey-io#1018) When multiple primary nodes fail simultaneously, the cluster can not recover within the default effective time (data_age limit). The main reason is that the vote is without ranking among multiple replica nodes, which case too many epoch conflicts. Therefore, we introduced into ranking based on the failed primary shard-id. Introduced a new failed_primary_rank var, this var means the rank of this myself instance in the context of all failed primary list. This var will be used in failover and we will do the failover election packets in order based on the rank, this can effectively avoid the voting conflicts. If a single primary is down, the behavior is the same as before. If multiple primaries are down, their replica election initiation time will be delayed by 500ms according to the ranking. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
… the FAIL (valkey-io#1191) Imagine we have a cluster, for example a three-shard cluster, if shard 1 doing a CLUSTER RESET HARD, it will change the node name, and then other nodes will mark it as NOADR since the node name received by PONG has changed. In the eyes of other nodes, there is one working primary node left but with no address, and in this case, the address report in MOVED will be invalid and will confuse the clients. And in the same time, the replica will not failover since its primary is not in the FAIL state. And the cluster looks OK to everyone. This leaves a cluster that appears OK, but with no coverage for shard 1, obviously we should do something like CLUSTER FORGET to remove the node and fix the cluster before using it. But the point in here, we can mark the NOADDR node as FAIL to advance the cluster state. If a node is NOADDR means it does not have a valid address, so we won't reconnect it, we won't send PING, we won't gossip it, it seems reasonable to mark it as FAIL. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
When the cluster changes, we need to persist the cluster configuration, and these file IO operations may cause latency. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
The commands used in valkey-cli tests are not important the reply schema validation. Skip them to avoid the problem if tests hanging. This has failed lately in the daily job: ``` [TIMEOUT]: clients state report follows. sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription Killing still running Valkey server 33357 ``` These test cases use a special valkey-cli command `:get pubsub` command, which is an internal command to valkey-cli rather than a Valkey server command. This command hangs when compiled with with logreqres enabled. Easy solution is to skip the tests in this setup. The test cases were introduced in valkey-io#1432. Signed-off-by: Viktor Söderqvist <[email protected]> Signed-off-by: proost <[email protected]>
After valkey-io#1545 disabled some tests for reply schema validation, we now have another issue that ECHO is not covered. ``` WARNING! The following commands were not hit at all: echo ERROR! at least one command was not hit by the tests ``` This patch adds a test case for ECHO in the unit/other test suite. I haven't checked if there are more commands that aren't covered. Signed-off-by: Viktor Söderqvist <[email protected]> Signed-off-by: proost <[email protected]>
This PR replaces dict with the new hashtable data structure in the HASH datatype. There is a new struct for hashtable items which contains a pointer to value sds string and the embedded key sds string. These values were previously stored in dictEntry. This structure is kept opaque so we can easily add small value embedding or other optimizations in the future. closes valkey-io#1095 --------- Signed-off-by: Rain Valentine <[email protected]> Signed-off-by: proost <[email protected]>
In some cases unix groups could have whitespace and/or `\` in them. One example is my workstation. It's a MacOS in an Active Directory domain. So my user has group `LD\Domain Users`. Running `make test` on `unstable` and `8.0` branches fails with: I'm not sure if we need to fix this in 8.0. But it seems that it should be fixed in unstable. Signed-off-by: secwall <[email protected]> Signed-off-by: proost <[email protected]>
Introduced at https://github.com/valkey-io/valkey/pull/1165/files Signed-off-by: naglera <[email protected]> Signed-off-by: proost <[email protected]>
`sds` is a typedef of `char *`. `const sds` means `char * const`, i.e. a const-pointer to non-const content. More often, you would want `const char *`, i.e. a pointer to const-content. Until now, it's not possible to express that. This PR adds `const_sds` which is a pointer to const-content sds. To get a const-pointer to const-content sds, you can use `const const_sds`. In this PR, some uses of `const sds` are replaced by `const_sds`. We can use it more later. Fixes valkey-io#1542 --------- Signed-off-by: Viktor Söderqvist <[email protected]> Signed-off-by: proost <[email protected]>
Add `paused_actions` and `paused_timeout_milliseconds` for INFO Clients to inform users about if clients are paused. --------- Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: proost <[email protected]>
Adds filter options to CLIENT LIST: * USER <username> Return clients authenticated by <username>. * ADDR <ip:port> Return clients connected from the specified address. * LADDR <ip:port> Return clients connected to the specified local address. * SKIPME (YES|NO) Exclude the current client from the list (default: no). * MAXAGE <maxage> Only list connections older than the specified age. Modifies the ID filter to CLIENT KILL to allow multiple IDs * ID <client-id> [<client-id>...] Kill connections by client ids. This makes CLIENT LIST and CLIENT KILL accept the same options. For backward compatibility, the default value for SKIPME is NO for CLIENT LIST and YES for CLIENT KILL. The MAXAGE comes from CLIENT KILL, where it *keeps* clients with the given max age and kills the older ones. This logic becomes weird for CLIENT LIST, but is kept for similary with CLIENT KILL, for the use case of first testing manually using CLIENT LIST, and then running CLIENT KILL with the same filters. The `ID client-id [client-id ...]` no longer needs to be the last filter. The parsing logic determines if an argument is an ID or not based on whether it can be parsed as an integer or not. Partly addresses: valkey-io#668 --------- Signed-off-by: Sarthak Aggarwal <[email protected]> Signed-off-by: proost <[email protected]>
Just like spell-check workflow, we should allow to trigger it in push events, so that the forks repo can notice the format thing way before submitting the PR. Signed-off-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
…d other commands(valkey-io#1517) Some commands that use unix-time, such as `EXPIREAT` and `SET EXAT`, should include the deleted keys in the `expired_keys` statistics if the specified time has already expired, and notifications should be sent in the manner of expired. --------- Signed-off-by: Ray Cao <[email protected]>
…1312) This commit creates a new compilation unit for the scripting engine code by extracting the existing code from the functions unit. We're doing this refactor to prepare the code for running the `EVAL` command using different scripting engines. This PR has a module API change: we changed the type of error messages returned by the callback `ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a `ValkeyModuleString` (aka `robj`); This PR also fixes valkey-io#1470. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: proost <[email protected]>
…1563) This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types. --------- Signed-off-by: Harkrishn Patro <[email protected]> Signed-off-by: Harkrishn Patro <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
When processing a cluster bus PING extension, there is a memory leak when adding a new key to the `nodes_black_list` dict. We now make sure to free the key `sds` if the dict did not take ownership of it. Signed-off-by: Pierre Turin <[email protected]> Signed-off-by: proost <[email protected]>
Update comments and log message in `cluster_legacy.c`. Follow-up from valkey-io#1441. Signed-off-by: Pierre Turin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: proost <[email protected]>
proost
force-pushed
the
feat-keyspace-expiration-misses
branch
from
January 17, 2025 14:32
e3452dd
to
e9a9894
Compare
…eat-keyspace-expiration-misses Signed-off-by: proost <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
resolve: #1507