I'm facing an issue with caching in Trino (version 460) when querying data from Hive with the following cache settings in hive.properties:
fs.cache.enabled=true
fs.cache.directories=/data/trino0,/data/trino1
fs.cache.max-disk-usage-percentages=80,80
fs.cache.ttl=2d
fs.cache.preferred-hosts-count=2
fs.cache.page-size=15MB
Cluster: 4 worker nodes and 1 coordinator node.
These properties are configured to cache the data read by Hive queries, in order to reduce the number of "GETOBJECT" requests to S3 object storage. Data is indeed being cached, but I'm noticing unexpected behavior.
When I run the same query multiple times, the cache is not fully used. The first run issues "GET" requests to S3 for around 700 objects, the second run retrieves approximately 400 objects, and the third about 200. So the number of requests drops with each run, but the query still goes to S3 for some data instead of reading it from the cache.
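To put the numbers above in perspective, here is a rough sketch of the implied cache hit ratio per run. It assumes the first run is fully cold and that every run needs the same ~700 objects; both are assumptions for illustration, not something measured, and the function name is made up for this example.

```python
def cache_hit_ratios(s3_gets_per_run):
    """Estimate the fraction of objects served from cache in each run.

    Assumption: the first run is a fully cold cache, so its GET count
    equals the total number of objects the query needs.
    """
    baseline = s3_gets_per_run[0]
    return [1 - gets / baseline for gets in s3_gets_per_run]


# Approximate S3 GET counts observed for three runs of the same query.
observed = [700, 400, 200]
ratios = cache_hit_ratios(observed)

for run, (gets, ratio) in enumerate(zip(observed, ratios), start=1):
    print(f"run {run}: {gets} S3 GETs, ~{ratio:.0%} served from cache")
```

Under these assumptions, roughly 43% of the objects come from cache on the second run and roughly 71% on the third.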
Why does Trino continue to go to S3 for some data even though caching is enabled and the data is already stored in the cache? Could this be due to multiple nodes, partial cache misses, or something in the internal Alluxio native library? Has anyone encountered a similar issue, and is there a misconfiguration or setting that I might have missed?
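To make the "multiple nodes" hypothesis concrete, here is a toy model, not a description of how Trino actually schedules splits. It assumes each worker keeps its own local cache and that every read of a given object is routed uniformly at random to one of its `fs.cache.preferred-hosts-count` preferred workers. An object then misses on a run only if the chosen worker has not read it in any earlier run. The function name is hypothetical.

```python
def expected_misses(total_objects, preferred_hosts, runs):
    """Expected S3 GETs per run under the toy model described above.

    Assumption (not confirmed Trino behavior): each object is read by one
    of `preferred_hosts` workers chosen uniformly at random on every run,
    and each worker only sees its own local cache.
    """
    # Probability that a given run lands on a specific "wrong" set of hosts:
    # the chosen host differs from the host picked in one earlier run.
    miss_probability = (preferred_hosts - 1) / preferred_hosts
    # Run k (0-based) misses only if all k earlier runs picked other hosts.
    return [total_objects * miss_probability ** run for run in range(runs)]


# 700 objects, preferred-hosts-count=2, three runs of the same query.
print(expected_misses(700, 2, 3))
```

With two preferred hosts this model gives a halving pattern (700, 350, 175), which is in the same range as the 700, 400, 200 I observed. That does not prove this is the cause; it only shows that per-node caches combined with more than one preferred host could produce a gradual warm-up like this.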