Issue with Query Caching in Trino Using Hive Connector (fs.cache.enabled=true) #24459

Open
narayanbhawar10 opened this issue Dec 12, 2024 · 0 comments

narayanbhawar10 commented Dec 12, 2024

I'm facing an issue with caching in Trino (version 460) when querying data from Hive with the following cache settings in hive.properties:

fs.cache.enabled=true
fs.cache.directories=/data/trino0,/data/trino1
fs.cache.max-disk-usage-percentages=80,80
fs.cache.ttl=2d
fs.cache.preferred-hosts-count=2
fs.cache.page-size=15MB

4 worker nodes
1 coordinator node
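
As a quick sanity check on the settings above, here is a minimal sketch that parses the quoted properties and looks for obvious inconsistencies. The two rules it applies are assumptions for illustration, not confirmed Trino validation logic: that each cache directory should have its own disk usage percentage, and that a preferred-hosts count larger than the number of workers is worth flagging. The helper names `parse_properties` and `check_cache_config` are hypothetical.

```python
# Values copied from the hive.properties snippet in this post.
HIVE_PROPERTIES = """\
fs.cache.enabled=true
fs.cache.directories=/data/trino0,/data/trino1
fs.cache.max-disk-usage-percentages=80,80
fs.cache.ttl=2d
fs.cache.preferred-hosts-count=2
fs.cache.page-size=15MB
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_cache_config(props, worker_count):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    if props.get("fs.cache.enabled", "").lower() != "true":
        warnings.append("fs.cache.enabled is not set to true")

    dirs = [d for d in props.get("fs.cache.directories", "").split(",") if d]
    pcts = [p for p in props.get("fs.cache.max-disk-usage-percentages", "").split(",") if p]
    # Assumption: one percentage is expected per cache directory.
    if len(dirs) != len(pcts):
        warnings.append(f"{len(dirs)} cache directories but {len(pcts)} usage percentages")

    preferred = props.get("fs.cache.preferred-hosts-count")
    # Assumption: asking for more preferred hosts than workers deserves a look.
    if preferred is not None and int(preferred) > worker_count:
        warnings.append(f"preferred-hosts-count {preferred} exceeds {worker_count} workers")
    return warnings


props = parse_properties(HIVE_PROPERTIES)
warnings = check_cache_config(props, worker_count=4)
print(warnings or "no obvious inconsistencies found")
```

With the values from this post and 4 workers, none of these checks fire, so a plain syntax or count mismatch in the file does not look like the cause.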

These properties are meant to cache the data read by Hive queries so that fewer GetObject requests go to S3 object storage. Data is indeed being cached, but I'm noticing unexpected behavior.
When I run the same query multiple times, the cache is only partially used. The first run issues GET requests to S3 for around 700 objects, the second run for approximately 400 objects, and the third run for about 200 objects. The number of requests drops with each run, but the query still goes to S3 for some data instead of reading it from the cache.
Has anyone encountered a similar issue? Why does Trino continue to go to S3 for some data even though caching is enabled and data from the earlier runs has already been cached? Is there a misconfiguration or a setting that I might have missed?
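
To put the reported numbers in perspective, the following rough sketch estimates how much of the cold-run traffic each run avoided. It assumes that the first run was fully cold and that every run touches the same set of objects. The counts are the approximate figures from this post, and `estimated_hit_ratios` is a hypothetical helper name.

```python
# Approximate S3 GET counts per run, as reported in this post.
gets_per_run = [700, 400, 200]


def estimated_hit_ratios(gets):
    """Share of the first (cold) run's GETs that each run avoided.

    Assumes run 1 hit no cache at all and all runs read the same objects.
    """
    baseline = gets[0]
    return [1 - g / baseline for g in gets]


for run, ratio in enumerate(estimated_hit_ratios(gets_per_run), start=1):
    print(f"run {run}: ~{ratio:.0%} of cold-run GETs avoided")
# run 1: ~0% of cold-run GETs avoided
# run 2: ~43% of cold-run GETs avoided
# run 3: ~71% of cold-run GETs avoided
```

Under those assumptions the cache appears to warm up gradually across runs rather than being filled completely by the first one.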

Could this be due to the multiple worker nodes, partial cache misses, or something in the internal Alluxio native library?
