
Eviction not working as expected #3

Open
ghz-max opened this issue Apr 22, 2023 · 22 comments
Labels
question Further information is requested

Comments

@ghz-max

ghz-max commented Apr 22, 2023

First of all, awesome work! Very promising.

That being said, I may have found a bug in the following situation:

Steps to reproduce

	package main

	import (
		"fmt"
		"time"

		theine "github.com/Yiling-J/theine-go"
	)

	func main() {
		var cacheSize int64 = 1000
		client, err := theine.NewBuilder[string, string](cacheSize).RemovalListener(func(key string, value string, reason theine.RemoveReason) {
			fmt.Println("removal", key, value, reason)
		}).Build()
		if err != nil {
			panic(err)
		}
		client.Set("test_key", "test_value", 500)
		fmt.Println(client.Get("test_key"))
		client.Set("test_2_key", "test_2_value", 500)
		fmt.Println(client.Get("test_2_key"))
		client.Set("test_3_key", "test_3_value", 600)
		fmt.Println(client.Get("test_3_key"))
		// wait for the eviction to proceed
		time.Sleep(time.Second)
		fmt.Println(client.Get("test_3_key"))
	}

Expected

test_value true
test_2_value true
test_3_value true
removal test_key test_value 1
removal test_2_key test_2_value 1
test_3_value true

Actually returned

test_value true
test_2_value true
test_3_value true
removal test_3_key test_3_value 1
 false

This particular behavior seems to happen only when the sum of the keys' costs equals cacheSize.

@Yiling-J
Owner

Yiling-J commented Apr 23, 2023

The Get buffer size is 64 and you only insert 3 keys, so the frequency sketch has not been updated yet: all 3 keys have frequency 0 (if it had been updated, all would be 1, still equal, so it doesn't matter). Then, when key-3 is inserted and the cache is already full, Theine compares key-3's frequency to that of the LRU tail, key-1. Because both are 0 and the current behavior only admits a new key if new frequency > tail frequency, key-3 is the one that gets removed. If I change it to new frequency >= tail frequency, then key-1 and key-2 will be removed instead. Both seem OK to me, which one do you think is better?
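A minimal sketch of that admission check (illustrative Go, not Theine's actual code; names are made up):

```go
package main

import "fmt"

// admit decides whether a candidate entering the main space may evict the
// LRU victim. With a strict ">" the candidate loses ties, which is what the
// reproduction above runs into: both frequencies are 0, so key-3 is rejected.
func admit(candidateFreq, victimFreq int, allowTies bool) bool {
	if allowTies {
		return candidateFreq >= victimFreq
	}
	return candidateFreq > victimFreq
}

func main() {
	fmt.Println(admit(0, 0, false)) // false: key-3 is dropped, key-1/key-2 stay
	fmt.Println(admit(0, 0, true))  // true: key-3 is admitted and the victim is evicted
}
```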

@Yiling-J Yiling-J changed the title Eviction not wokring Eviction not working as expected Apr 23, 2023
@ghz-max
Author

ghz-max commented Apr 23, 2023

Never evicting entries is an issue to me. Even if it's an edge case that rarely occurs, evicting old entries to make room for a new one is what I'd expect, because as it stands no new entries will ever be added.

@Yiling-J
Owner

Yiling-J commented Apr 23, 2023

I want to confirm first: does this affect your real use case? Given that frequency is updated dynamically, if your entry is popular enough it will be kept in the cache. In the test case you provided, cacheSize is small and the cost is large, so the buffer in each shard won't work. I will also take a look at how Caffeine handles this.

I think I should add some details to the README. Adaptive Window-TinyLFU means Theine starts as an approximate LFU cache, but will switch between LRU and LFU dynamically depending on your workload pattern. The switch is based on historical hit ratios and adjusts every 10x cache size requests, so the 6 requests in your test won't trigger any adaptive adjustment.
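A rough sketch of that cadence, only to show why 6 requests never reach an adjustment; the names, the step value, and the hill-climbing detail are made up, not Theine's code:

```go
package main

import "fmt"

// adaptor samples the hit ratio over 10x the cache capacity requests; each
// completed sample period decides the direction of the next window-size
// adjustment (hill climbing).
type adaptor struct {
	capacity  int
	requests  int
	hits      int
	prevRatio float64
	step      float64
}

func (a *adaptor) record(hit bool) {
	a.requests++
	if hit {
		a.hits++
	}
	if a.requests < 10*a.capacity {
		return // sample period not finished; 6 requests never reach this point
	}
	ratio := float64(a.hits) / float64(a.requests)
	if ratio < a.prevRatio {
		a.step = -a.step // the last adjustment hurt the hit ratio, so reverse it
	}
	a.prevRatio = ratio
	a.requests, a.hits = 0, 0
}

func main() {
	a := &adaptor{capacity: 1000, step: 0.05}
	for i := 0; i < 6; i++ {
		a.record(i%2 == 0)
	}
	fmt.Println(a.requests) // 6: far short of the 10,000-request sample period
}
```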

@ghz-max
Author

ghz-max commented Apr 23, 2023

I'm building a disk cache layer to back a networked filesystem for a large-scale project, and libraries like Theine and Ristretto would make my life easier thanks to the cost feature.

The cacheSize would be the size of the cache disk (a fast SSD/NVMe), ranging from a few hundred MiB to a few TiB, and each cache entry would cost the filesystem IO size (ranging from 4KiB to 8MiB in my case).

A worst-case scenario would be a cache size of 512MiB and 64 cached IOs of 8MiB. It would be an odd case, but not entirely out of reach. In my case I would prefer even random evictions rather than no eviction at all.
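For illustration, a sketch of how such a disk-cache layer could map byte sizes onto the builder API used in the repro above. The module path github.com/Yiling-J/theine-go, the key name, and the sizes are assumptions:

```go
package main

import (
	"fmt"

	theine "github.com/Yiling-J/theine-go" // assumed module path
)

const (
	MiB       = int64(1) << 20
	cacheSize = 512 * MiB // capacity of the fast SSD/NVMe tier, in bytes
)

func main() {
	client, err := theine.NewBuilder[string, []byte](cacheSize).Build()
	if err != nil {
		panic(err)
	}
	// Each cached IO is stored with its size in bytes as the cost, so 64
	// entries of 8MiB exactly fill the 512MiB capacity described above.
	buf := make([]byte, 8*MiB)
	client.Set("volume1/block42", buf, int64(len(buf)))
	_, ok := client.Get("volume1/block42")
	fmt.Println(ok)
}
```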

@ben-manes

Favoring recency can be a disadvantage in workloads that perform scans, such as database, search, or analytical queries. Using > to restrict admission, rather than >= to allow it, lets the cache be scan resistant in a loopy access pattern. A modestly sized cache will have enough space for an admission window that captures recent arrivals (delaying the filtering), and the adaptive behavior lets the cache tune itself dynamically. We tend to favor optimizing for longer-running, modestly sized caches instead of tiny caches or short sequences, because those are usually easy to fix, less critical, or a strawman that doesn't represent the application.

Caffeine's policy choices were data driven by simulating a diverse set of workloads. When it underperformed other eviction policies, we analyzed the cause, experimented, and found solutions. Currently the policy is either the leader or competitive across all of these workloads, but more are very welcome. A problem we run into is that research papers will sometimes claim a substantially better hit rate, but the authors cannot provide the data to let us reproduce and understand what they observed. That makes it difficult to justify improvement ideas, as it is probably more harmful to make well-intentioned but blind changes.

If you could provide traces of your target workloads then we can see the actual performance and how that behavior compares to optimal and other leading policies.

@Yiling-J
Owner

Thanks @ben-manes. Just as Ben said, @ghz-max, optimizing for your small cache may harm longer-running, modestly sized caches. In your 512/8 case I think what you need is an exact LRU policy? If that's true and you know it in advance, I could make the policy configurable, but you would have to choose LRU manually. If your IO size changes dramatically, meaning 512MiB may contain thousands of entries or just a dozen, I think it's very hard to optimize for that automatically.

@Yiling-J
Owner

Hi @ben-manes, is the minimum window size in Caffeine 1? In Theine the window lives in each shard, so the deque size in each shard is capacity/100/shardCount, and the minimum is actually 0 right now. I'm making the deque cost-aware (it currently counts entries only) and plan to change the minimum to 1, but if I do that the sum of the windows won't be 1/100 of the capacity any more when the cache capacity is small.
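A worked example of that arithmetic (hypothetical helper, not Theine's code), showing how a small capacity rounds the per-shard window down to 0 and how clamping to 1 inflates the summed window:

```go
package main

import "fmt"

// windowSize mirrors the per-shard window arithmetic described above:
// capacity/100/shardCount, optionally clamped to a minimum of 1.
func windowSize(capacity, shardCount int, minimumOne bool) int {
	size := capacity / 100 / shardCount
	if minimumOne && size < 1 {
		size = 1
	}
	return size
}

func main() {
	// With a small cache the per-shard window rounds down to 0 ...
	fmt.Println(windowSize(1000, 16, false)) // 0
	// ... and clamping to 1 makes the summed window 16, not capacity/100 = 10.
	fmt.Println(windowSize(1000, 16, true) * 16) // 16
}
```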

@ben-manes

Yes, it is 1, but there is no algorithmic justification. It was to reduce this type of question. It comes up either because someone has an assumption in their unit test or is manually exploring the library. A tiny minimum capacity doesn't impact the hit rates, so it seemed safe if it avoids some confusion. Once you shard and allow entries to take varying amounts of capacity this trick doesn't work, but that is not as commonly used.

I tried to avoid sharding the policy in Caffeine as that can result in hot spots due to the skewed access distribution. That's why a sharded LRU does not perform very well: while the data is uniformly distributed, the requests are not and contend on a hot segment's lock. I think you tried to mitigate that, though, so it's probably not a problem. It was nice since it also means that the number of shards does not dictate the maximum size of an entry.

fwiw, I use the term weight instead of cost, as "cost" in the literature usually means that the policy takes into account the miss penalty. This is sometimes referred to as a latency-aware caching policy. I have some ideas for how to incorporate that into the admission filter but unfortunately there are no public real world traces to validate an approach.

The literature refers to weight as the entry's size (size-aware, e.g. GDSF), but I thought that was confusing terminology. One has the cache's capacity (maximum size), the hash table's capacity and current size, and the entry size. Thus I decided to use the term weight since it seemed clear and gave a nice name to the function (weigher).

@Yiling-J
Owner

Thanks @ben-manes, I just took a look at GDSF and I agree weight is better. But since Ristretto is popular in the Go community, I think I will just use the same term. Also, this reminds me that I should expose the counter configuration, because the sketch is based on entry count.

@ben-manes

For a weighted cache I adjust the sketch's size at runtime. It hasn't been a problem since the query is cheap and usually a no-op. It also protects against someone trying to make an unbounded cache from over-allocating, e.g. if embedded and all the user has is a config value to set. It does cause a little skew in simulations, so initialCapacity will pre-allocate the sketch for non-weighted traces.
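A rough sketch of that lazy-resize idea (purely illustrative, not Caffeine's implementation; the growth rule and names are assumptions):

```go
package main

import "fmt"

// sketch is a stand-in frequency sketch sized by entry count. ensureCapacity
// grows it when the live entry count outpaces the current table, which is the
// lazy approach described above for weighted caches, where the entry count
// cannot be derived from the capacity up front.
type sketch struct {
	table []uint64
}

func (s *sketch) ensureCapacity(entries int) {
	need := nextPowerOfTwo(entries)
	if len(s.table) >= need {
		return // usually a no-op, so the check is cheap on the hot path
	}
	s.table = make([]uint64, need) // counts start over after a resize
}

func nextPowerOfTwo(n int) int {
	p := 1
	for p < n {
		p <<= 1
	}
	return p
}

func main() {
	s := &sketch{}
	s.ensureCapacity(3)
	fmt.Println(len(s.table)) // 4
	s.ensureCapacity(1000)
	fmt.Println(len(s.table)) // 1024
}
```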

btw, I rewrote the sketch to be optimized for the cpu's cache line. It wasn't necessary and was just for fun.

@Yiling-J
Owner

Yiling-J commented May 3, 2023

Hi @ben-manes, I'm optimizing the sketch and have a question. The block mask is (table.length >>> 3) - 1 and you choose a block based on blockHash: block = (blockHash & blockMask) << 3. That means if the table size is 100, the block starts would be 0, 8, 16, 24, ... Would it be better if blocks could overlap? In that case the block mask would be table.length - 8 and block = blockHash & blockMask.
My sketch originally used an exact 4-row sketch (choosing 1 counter in each row). After switching to the Caffeine optimization, the hit ratio on S3 drops 1.2%; I also tried my overlap optimization, and the hit ratio drop is 0.9%, slightly better.
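For reference, a small illustration of the aligned scheme described above. It uses a power-of-two table of 128 counters rather than 100, since the mask arithmetic assumes a power-of-two length; the loop is just to enumerate the possible block starts:

```go
package main

import "fmt"

func main() {
	// Aligned blocks: blockMask = (128>>3)-1 = 15 and
	// block = (blockHash & 15) << 3, so block starts are 0, 8, 16, ..., 120
	// and blocks never overlap.
	tableLen := 128
	blockMask := uint32(tableLen>>3) - 1
	starts := map[uint32]bool{}
	for h := uint32(0); h < 1024; h++ {
		starts[(h&blockMask)<<3] = true
	}
	fmt.Println(len(starts)) // 16 distinct, non-overlapping 8-counter blocks
	// Letting blocks overlap would allow any start in [0, tableLen-8],
	// i.e. up to 121 possible blocks here instead of 16.
}
```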

@ben-manes

oh that is unfortunate. I don't recall observing any degradations in my tests, but I'll see if I can reproduce that. You also have the doorkeeper, was that also adjusted likewise? I'd suspect a worse result would occur if the error rate was increased in this new design, e.g. due to bias, causing the sketch to not be fully utilized. This wasn't a needed change for performance, it was a fun micro optimization that I hopefully didn't screw up.

Ideally the blocks are being selected uniformly so with a good hash function and enough counters it wouldn't make a difference. However since the goal of hashing is to make a repeatable randomized output, changing the collisions will cause some gains and losses across traces. Typically that is noise of less than 1%, which is too small to claim any approach is better or worse than another. Instead you have to look across the trends of different workloads to make a determination. I'd imagine that the overlapping blocks would just be noise as a net difference, so not bad if otherwise beneficial but probably not a bonus in itself.

@Yiling-J
Owner

Yiling-J commented May 3, 2023

@ben-manes I don't enable the doorkeeper in my hit ratio benchmarks. I think the old design in Caffeine was "counters are uniformly distributed", which means picking 4 counters randomly from the whole table, not one counter per row? Another possibility is that I don't spread/rehash before calculating the block index; I will try that.

@ben-manes

I have an old spreadsheet of data from when I was originally investigating eviction policies. It can serve as a baseline to ensure nothing seems amiss. I took W-TinyLFU (1%), which is non-adaptive.

| Policy | 100,000 | 200,000 | 300,000 | 400,000 | 500,000 | 600,000 | 700,000 | 800,000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Caffeine_new | 11.85 | 22.98 | 33.01 | 42.51 | 50.91 | 58.71 | 64.82 | 70.18 |
| Caffeine_old | 11.72 | 23.11 | 33.19 | 43.00 | 51.46 | 59.10 | 65.39 | 70.67 |
| W-TinyLfu | 12.14 | 23.26 | 33.51 | 42.79 | 51.12 | 58.92 | 65.05 | 70.22 |

From this the differences are around 0.5%, slightly worse on average. I think that is close enough to be noise, so I didn't worry about it. I would say there is not a strong argument for or against the block-based filter, so I decided to keep it for my own enjoyment and to share a neat optimization trick.

@ben-manes

btw, I used the 3-round hash function from hash-prospector. I should probably add that as a doc reference. I thought using the first two rounds for the block and adding the third for the counter would be a reasonable tradeoff of cpu time and randomness.
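For context, a Go transcription of the 3-round xorshift-multiply hash published as triple32 in hash-prospector; the split into spread/rehash below only mirrors the first-two-rounds/third-round idea described above and is a sketch, not Caffeine's source:

```go
package main

import "fmt"

// spread applies the first two xorshift-multiply rounds; its output can
// select the block.
func spread(x uint32) uint32 {
	x ^= x >> 17
	x *= 0xed5ad4bb
	x ^= x >> 11
	x *= 0xac4c1b51
	x ^= x >> 15
	return x
}

// rehash applies the third round; its output can select the counters inside
// the chosen block.
func rehash(x uint32) uint32 {
	x *= 0x31848bab
	x ^= x >> 14
	return x
}

func main() {
	key := uint32(0xdeadbeef)
	blockHash := spread(key)
	counterHash := rehash(blockHash)
	fmt.Printf("block hash %#x, counter hash %#x\n", blockHash, counterHash)
}
```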

@Yiling-J
Owner

Yiling-J commented May 3, 2023

@ben-manes thanks, I will take a look

@Yiling-J Yiling-J added the question Further information is requested label May 7, 2023
@ben-manes

I was skimming over the code thanks to @vmg's work on bringing it into Vitess. 🔥

You might want to verify whether your implementation protects against hash flooding attacks, as I didn't see anything explicit in the eviction logic. The adaptive sizing might partially cover this, as it was a problem we mitigated in Caffeine before that feature existed. The idea is to add a tiny amount of jitter (code and tests). The intent is to avoid an attacker leveraging the deterministic ordering of the LRU lists, as described below (copied from internal docs); Ristretto was not impacted since it uses a sampled policy instead.

The historic usage is retained in a compact popularity sketch, which uses hashing to probabilistically estimate an item's frequency. This exposes a flaw where an adversary could use hash flooding [4] to artificially raise the frequency of the main space's victim and cause all candidates to be rejected. In the worst case, by exploiting hash collisions, an attacker could cause the cache to never hit and hold only worthless items, resulting in a denial-of-service attack against the underlying resource. This is mitigated by introducing jitter, allowing candidates that are at least moderately popular to have a small, random chance of being admitted. This causes the victim to be evicted, but in a way that marginally impacts the hit rate.

[4] Denial of Service via Algorithmic Complexity Attack
https://www.usenix.org/legacy/events/sec03/tech/full_papers/crosby/crosby.pdf
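A minimal sketch of the jitter idea described in the quote above; the popularity threshold and the admission probability are illustrative values, not Caffeine's actual ones:

```go
package main

import (
	"fmt"
	"math/rand"
)

// admitWithJitter gives a candidate that is at least moderately popular a
// small random chance of being admitted even when it does not beat the
// victim's frequency, so an attacker cannot pin a hash-flooded victim in
// place forever.
func admitWithJitter(candidateFreq, victimFreq int, rng *rand.Rand) bool {
	if candidateFreq > victimFreq {
		return true
	}
	// ~1-in-256 chance for moderately popular candidates.
	return candidateFreq >= 4 && rng.Intn(256) == 0
}

func main() {
	rng := rand.New(rand.NewSource(1))
	admitted := 0
	for i := 0; i < 10000; i++ {
		if admitWithJitter(5, 100, rng) {
			admitted++
		}
	}
	fmt.Println(admitted) // a small number, roughly 10000/256
}
```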

@Yiling-J
Owner

@ben-manes @vmg Excited! I will take a look at the hash flooding attack problem and the Vitess PR this week.

@vmg

vmg commented Sep 12, 2023

Damn @ben-manes you really are on top of things! I just marked the PR ready for review last night and you're already looking into it. Thank you so much for your dedication and for the suggestion to use Theine in Vitess. It's been working great so far!

@Yiling-J: Thanks too for Theine! So far the hit rate has been looking good in our synthetic benchmarks, although I had to make quite a few changes to support the functionality that our plan cache depends on. You can see the details in the PR. Please feel free to chime in.
