Delete Stale Keys from Disk AND RBM #13

Merged
merged 22 commits into feature/tiered-caching on Dec 19, 2023

Conversation

kiranprakash154 (Collaborator)

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kiranprakash154 kiranprakash154 self-assigned this Dec 5, 2023

diskCleanupKeyToCountMap
.computeIfAbsent(shardId, k -> ConcurrentCollections.newConcurrentMap())
.merge(cleanupKey.readerCacheKeyId, 1, Integer::sum);
Owner

Maybe add a comment explaining this since it's a bit hard to read? My understanding is add a new concurrent map as the value for shardId if it doesn't yet exist. Then take this inner map and set its value for readerCacheKeyId to 1 if it's empty, or increment it by 1 if it already exists.

Collaborator Author

Your understanding is right.
I already have code comments on top of this method that I thought would be self-explanatory.
Let me know if you didn't see them or think they weren't clear enough.
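For readers skimming the thread, a rough line-by-line expansion of the chained call (the variable name countsForShard is illustrative, not from the PR):

// Get (or lazily create) the inner map for this shard, then set the count for
// this readerCacheKeyId to 1, or add 1 to the existing count.
ConcurrentMap<String, Integer> countsForShard =
    diskCleanupKeyToCountMap.computeIfAbsent(shardId, k -> ConcurrentCollections.newConcurrentMap());
countsForShard.merge(cleanupKey.readerCacheKeyId, 1, Integer::sum);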

return;
}

if (cleanupKey.readerCacheKeyId == null) {
Owner

When will the reader cache key ID be null? Is this when the shard is totally closed?

Collaborator Author

It will be null when the cache clear API is called, which calls this clear() method.
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/indices/IndicesRequestCache.java#L132

When the shard is closed, we call the onClose() method of the generated cleanupKey, which will have the readerCacheKeyId.

So essentially: closing a shard, or moving to a new reader (on refresh), will always know what the current readerCacheKeyId is.
But for the clear API, it's more like: I don't care what the current readerCacheKeyId is, clean up everything.
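A minimal sketch of the two paths described above; the shape follows the diff elsewhere in this PR, but the surrounding context is assumed:

// Clear API path: no specific reader, so the cleanup key carries a null
// readerCacheKeyId and means "clean everything for this entity".
keysToClean.put(new CleanupKey(entity, null), new CleanupStatus());

// Shard close / reader refresh path: the current readerCacheKeyId is known,
// so only entries for that reader become stale.
keysToClean.put(new CleanupKey(entity, readerCacheKeyId), new CleanupStatus());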

public static final Setting<Boolean> INDICES_ID_FIELD_DATA_ENABLED_SETTING = Setting.boolSetting(
"indices.id_field_data.enabled",
true,
Property.Dynamic,
Property.NodeScope
);

public static final Setting<String> INDICES_REQUEST_CACHE_DISK_CLEAN_THRESHOLD_SETTING =
Owner

Could we instead use Setting.doubleSetting() from 0 to 100 (or 0.0 to 1.0), which has built-in min value and max value arguments?

Collaborator Author

But that wouldn't accept "50%" and would only work for 0.5.
Having a string is more flexible. This follows the same pattern as CLUSTER_ROUTING_ALLOCATION_LOW_DISK_WATERMARK_SETTING --- link
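A sketch of why the String form is more flexible here: the parser can accept either a "%"-suffixed percentage or a plain ratio, in the spirit of the disk watermark settings. The helper below is illustrative, not the actual OpenSearch utility:

// Illustrative parser: accepts "50%" or "0.5" and normalizes to a fraction in [0, 1].
static double parseCleanupThreshold(String value) {
    String trimmed = value.trim();
    if (trimmed.endsWith("%")) {
        return Double.parseDouble(trimmed.substring(0, trimmed.length() - 1)) / 100.0;
    }
    return Double.parseDouble(trimmed);
}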


// When entire disk tier is stale, test whether cache cleaner cleans up everything from disk
public void testDiskTierInvalidationByCacheCleanerEntireDiskTier() throws Exception {
int thresholdInMillis = 4_000;
Owner

Set it lower so the test doesn't take so long to run? Or does it flake if you do that?

Collaborator Author

It flakes. I'm setting the threshold low enough so that by the time the cleaner runs, we have inserted 2 entries,
and then putting the test to sleep until the cache cleaner runs its job. When the test thread wakes up, the entries will have been deleted.
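Sketch of the timing the test relies on; the exact assertions are elided and the sleep padding is an assumption:

int thresholdInMillis = 4_000;                 // cleaner interval used by the test
// ... index two entries and refresh so they become stale on the disk tier ...
Thread.sleep(thresholdInMillis + 1_000);       // sleep past the interval so the cleaner job has run
// ... assert the stale disk-tier entries are gone ...

A less sleep-heavy alternative could be OpenSearchTestCase's assertBusy(...), polling until the cleanup is observed, if the cleaner schedule allows it.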

client().prepareIndex("index").setId("1").setSource("k", "good bye");
ensureSearchable("index");

for (int i = 0; i < 6; i++) { // index 5 items with the new readerCacheKeyId
Owner

Isn't this 6 items?

Collaborator Author

Yep, I'm considering going back to kindergarten with my son.


double result = indicesRequestCache.diskCleanupKeysPercentage();

assertEquals(0, result, 0);
Owner

Add a small delta?

Collaborator Author

Why? We want this to be absolutely zero, since we are checking the case where the disk cache is empty. That's why I didn't add any delta.
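For context, a sketch of the percentage the assertion is checking; the field name follows the diff above, but the helper and exact method body are assumptions:

// Returns 0 when the disk tier holds no keys, which is why the test can
// assert an exact 0 with no delta.
double diskCleanupKeysPercentage() {
    int totalKeysOnDisk = countKeysOnDisk();   // hypothetical helper
    if (totalKeysOnDisk == 0) {
        return 0;
    }
    return ((double) staleKeysInDiskCount.get() / totalKeysOnDisk) * 100;
}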

@@ -215,18 +215,56 @@ public class IndicesService extends AbstractLifecycleComponent
private static final Logger logger = LogManager.getLogger(IndicesService.class);

public static final String INDICES_SHARDS_CLOSED_TIMEOUT = "indices.shards_closed_timeout";
public static final String INDICES_REQUEST_CACHE_DISK_CLEAN_THRESHOLD_SETTING_KEY = "cluster.request_cache.disk.cleanup_threshold_percentage";
Owner

Why does this one start with "cluster" instead of "indices" like the rest?

Collaborator Author

We want this to be a cluster-level setting.

@kiranprakash154 kiranprakash154 changed the title [Draft] Delete Stale Keys from Disk AND RBM Delete Stale Keys from Disk AND RBM Dec 18, 2023
}
}

public static class CleanDiskCacheTests {
Owner

These don't run as part of the test suite in IntelliJ for me - it says it's because it's part of the nested class, but it might also be because of the JUnit setup. Should probably refactor this, but I'm going to go ahead and benchmark without those changes.

Collaborator Author

I see that too, let me address this, thanks!
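Not from this PR, but one plain-JUnit 4 option for making static nested test classes discoverable is the Enclosed runner; whether it plays nicely with OpenSearch's randomized test runner would need checking, and the outer class name below is assumed:

import org.junit.Test;
import org.junit.experimental.runners.Enclosed;
import org.junit.runner.RunWith;

@RunWith(Enclosed.class)                 // runs all static nested classes as test classes
public class IndicesRequestCacheTests {  // hypothetical outer class name

    public static class CleanDiskCacheTests {
        @Test
        public void cleansStaleKeysFromDisk() {
            // existing test bodies would move here unchanged
        }
    }
}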

@peteralfonsi peteralfonsi merged commit f375874 into feature/tiered-caching Dec 19, 2023
12 of 14 checks passed
Comment on lines +127 to +128
private final ConcurrentMap<ShardId, ConcurrentMap<String , Integer>> diskCleanupKeyToCountMap = ConcurrentCollections.newConcurrentMap();
private final AtomicInteger staleKeysInDiskCount = new AtomicInteger(0);

I think these variables should be extracted out and moved to some other class, or made generic so that they can be extended to other tiers if needed. Otherwise this tightly couples the cache with disk properties, irrespective of whether tiered caching (disk) is enabled or not.

Here diskCleanupKeyToCountMap serves as a mechanism to calculate stale keys, which in itself is like a "stats" value associated with the disk tier. So it should be attached to the disk tier itself rather than hardcoded here. If the disk tier is enabled, this stat would be automatically calculated and available; otherwise not.
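For illustration, a rough sketch of the kind of extraction being suggested (the class and method names below are hypothetical, not from the PR):

// Hypothetical per-tier stats holder that a disk tier could own, instead of
// keeping these fields directly in IndicesRequestCache.
class StaleKeyTracker {
    private final ConcurrentMap<ShardId, ConcurrentMap<String, Integer>> cleanupKeyToCountMap =
        ConcurrentCollections.newConcurrentMap();
    private final AtomicInteger staleKeyCount = new AtomicInteger(0);

    void recordCleanupKey(ShardId shardId, String readerCacheKeyId) {
        // count how many pending cleanup keys exist per shard and reader
        cleanupKeyToCountMap
            .computeIfAbsent(shardId, k -> ConcurrentCollections.newConcurrentMap())
            .merge(readerCacheKeyId, 1, Integer::sum);
    }

    int staleKeyCount() {
        return staleKeyCount.get();
    }
}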

keysToClean.add(new CleanupKey(entity, null));
CleanupKey cleanupKey = new CleanupKey(entity, null);
keysToClean.put(cleanupKey, new CleanupStatus());
updateStaleKeysInDiskCount(cleanupKey);

This shouldn't be called if the disk tier is turned off.

Collaborator Author

I'll wait to address this until your framework changes are in; it will give me a better picture of how to move all of this into each cache layer. Ideally, yes, I also want the cleanup logic of each cache to live in its own class instead of IndicesRequestCache.java, which will also address this.
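In the meantime, a minimal sketch of the kind of guard being asked for, assuming a hypothetical isDiskTierEnabled() check:

CleanupKey cleanupKey = new CleanupKey(entity, null);
keysToClean.put(cleanupKey, new CleanupStatus());
if (isDiskTierEnabled()) {   // hypothetical guard, not in the PR
    // only track disk-tier staleness when the disk tier is actually in use
    updateStaleKeysInDiskCount(cleanupKey);
}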

we need to make sure we clean the disk cache as well
hence passing threshold as 0
*/
cleanDiskCache(0);

Again, calling it explicitly doesn't make sense, as the disk tier might be turned off.

Collaborator Author

I'll wait to address this until your framework changes are in; it will give me a better picture of how to move all of this into each cache layer. Ideally, yes, I also want the cleanup logic of each cache to live in its own class instead of IndicesRequestCache.java, which will also address this.

@kiranprakash154 kiranprakash154 added the enhancement New feature or request label Jan 31, 2024
peteralfonsi pushed a commit that referenced this pull request Apr 18, 2024
Labels
enhancement New feature or request
3 participants