Support for surfacing cache status to caller #44

Open
mnutt opened this issue Feb 1, 2024 · 1 comment

Comments


mnutt commented Feb 1, 2024

I'm looking into using galaxycache to replace an existing API caching system. One of the things I was hoping to do was to log detailed caching info in my request logs. My origin API is particularly sensitive to duplicates, so I field a lot of questions like "why did these two requests that came in so close together both result in cache misses?", and having request logs helps me debug that.

I see that there is built-in tracing support; is there any way to extract a cache status from that? I'm imagining that, from a Get() call, I'd like to know a) whether the current node was authoritative, and b) whether the result was a local_miss, peer_miss, peer_hit, maincache_hit, hotcache_hit, etc. From looking through the source, I think we have most of this information through authoritative and hitLevel; the only thing I don't see is how to distinguish between a peer hit and a peer miss.

I'm not wedded to any particular implementation, but was imagining something like this:

type Metadata struct {
	Level              hitLevel
	LocalAuthoritative bool
	PeerErr            error
	LocalErr           error
}

func (g *Galaxy) GetWithMetadata(ctx context.Context, key string, dest Codec) (*Metadata, error) { }
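
To show the intended use, here's a rough sketch of how I'd plug that into a request handler for logging. GetWithMetadata and the Metadata fields are just the proposal above; none of this exists in galaxycache today, and the helper below is purely hypothetical:

package cachedemo

import (
	"context"
	"log/slog"

	gc "github.com/vimeo/galaxycache"
)

// lookupAndLog fetches a key and records the proposed per-call metadata in
// the request log. Hypothetical: GetWithMetadata is the method proposed above.
func lookupAndLog(ctx context.Context, g *gc.Galaxy, key string, dest gc.Codec, logger *slog.Logger) error {
	meta, err := g.GetWithMetadata(ctx, key, dest)
	if err != nil {
		logger.Error("cache lookup failed", "key", key, "err", err)
		return err
	}
	// The detail I'd like in my request logs: which tier served the value,
	// and whether this node was authoritative for the key.
	logger.Info("cache lookup",
		"key", key,
		"hit_level", meta.Level,
		"local_authoritative", meta.LocalAuthoritative,
	)
	return nil
}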

I opened this issue to make sure that I wasn't missing an obvious way to handle this, and to see if adding it would be something that would be in line with the project's goals. Thanks!


dfinkel commented Feb 5, 2024

You didn't miss an obvious way to handle this.

We generally haven't exposed this data because tracing provides a better view, and it's pretty rare that you'd want that much fidelity in request logs (distributed tracing is quite useful). For our own services, we've been removing fine-grained request logs because we rarely used them and they get expensive, depending on the request rate and hosting setup.

With that said, I am open to extending the Galaxy type's interface a bit to provide a more extensible form of the Get call. It's probably about time to think about how generics should change the Galaxy type (or at least a wrapper). Providing an Info return value is definitely something that's worth exploring.
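
To make "an Info return value" slightly more concrete, here is a very rough shape, written as if inside the galaxycache package; none of these names are settled, and hitLevel would probably need to be exported as part of this:

// Rough, hypothetical sketch only; Info and GetWithInfo don't exist today.
type Info struct {
	Level              hitLevel // hot cache / main cache / peer / backend
	LocalAuthoritative bool     // was this node the owner for the key?
	PeerHit            bool     // only meaningful when the value came from a peer
}

func (g *Galaxy) GetWithInfo(ctx context.Context, key string, dest Codec) (Info, error) {
	// Would reuse the existing load path, returning the hitLevel and
	// authoritative flag it already tracks internally.
	return Info{}, nil
}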

Currently, neither the HTTP nor the gRPC transport is set up to plumb back hit information, so figuring out whether it was a remote hit would require some plumbing. That's possible, and possibly useful, but since both of them currently call galaxy.Get directly, they'd need a method like the one you're proposing in order to plumb it back (which makes it doable, but may require an interface change).

It would definitely require a change (or at least an optional interface extension) to the RemoteFetcher interface, since there's no place for telemetry in its current signature:

Fetch(context context.Context, galaxy string, key string) ([]byte, error)
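
For illustration, an "optional interface extension" could look something like the following, again written as if inside the galaxycache package; RemoteFetcher and Fetch are the existing pieces, everything else here is hypothetical:

// Hypothetical per-fetch detail a transport could report back.
type FetchInfo struct {
	PeerHit bool // did the peer answer from its own caches, or load from the backend?
}

// Hypothetical optional extension; a transport that can report hit/miss
// detail implements this alongside the existing RemoteFetcher.
type RemoteFetcherWithInfo interface {
	RemoteFetcher
	FetchWithInfo(ctx context.Context, galaxy string, key string) ([]byte, FetchInfo, error)
}

// The galaxy side could stay backwards compatible by type-asserting:
// use the extension when the transport provides it, fall back otherwise.
func fetchFromPeer(ctx context.Context, rf RemoteFetcher, galaxy, key string) ([]byte, FetchInfo, error) {
	if ext, ok := rf.(RemoteFetcherWithInfo); ok {
		return ext.FetchWithInfo(ctx, galaxy, key)
	}
	data, err := rf.Fetch(ctx, galaxy, key)
	return data, FetchInfo{}, err
}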

The other complication that may be worth considering is that single-flighting, on both the authoritative host and the current host, can end up looking pretty close to a cache hit if a request joins an in-flight load close enough to its completion, so it's not obvious how those should be reported.

I'm going to have to think about the various use-cases and the return value of a new method a bit more before I propose adding another Get-ish method. (It would be good to open the door to a Peek method while we're at it, particularly since we'd have to extend both transports anyway.)
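
For reference, by a Peek method I mean something roughly along these lines; this is only a sketch of one possible reading, and the exact semantics (in particular whether it consults the authoritative peer at all) would need to be pinned down:

// Hypothetical: report whether a value is already cached without triggering
// a backend load. Whether it should also consult the authoritative peer's
// caches is exactly the kind of question to settle before proposing it.
func (g *Galaxy) Peek(ctx context.Context, key string, dest Codec) (found bool, err error) {
	return false, nil
}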
