// Anything else is not a supported DeltaLake column
throw new TrinoException(
        GENERIC_INTERNAL_ERROR,
        format("Unable to parse value [%s] from column %s with type %s", valueString, column.baseColumnName(), column.baseType()));
Which gives this stack trace in practice:
2024-12-16T14:54:24.199Z ERROR Query-20241216_145422_23066_5cgep-43942 io.trino.cost.CachingStatsProvider Error occurred when computing stats for query 20241216_145422_23066_5cgep
io.trino.spi.TrinoException: Unable to parse value [\x00\x03\x01\x0f\x932\xd3\xf5\xe8\x11\x9cY\xd0\xd6\x92v\xa7c,\xd8] from column hash with type varbinary
    at io.trino.plugin.deltalake.transactionlog.TransactionLogParser.deserializeColumnValue(TransactionLogParser.java:246)
    at io.trino.plugin.deltalake.transactionlog.statistics.DeltaLakeJsonFileStatistics.deserializeStatisticsValue(DeltaLakeJsonFileStatistics.java:144)
    at io.trino.plugin.deltalake.transactionlog.statistics.DeltaLakeJsonFileStatistics.lambda$getMinColumnValue$4(DeltaLakeJsonFileStatistics.java:136)
    at java.base/java.util.Optional.flatMap(Optional.java:289)
    at io.trino.plugin.deltalake.transactionlog.statistics.DeltaLakeJsonFileStatistics.getMinColumnValue(DeltaLakeJsonFileStatistics.java:136)
    at io.trino.plugin.deltalake.statistics.FileBasedTableStatisticsProvider.getTableStatistics(FileBasedTableStatisticsProvider.java:174)
    at io.trino.plugin.deltalake.DeltaLakeMetadata.getTableStatistics(DeltaLakeMetadata.java:899)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getTableStatistics(ClassLoaderSafeConnectorMetadata.java:354)
    at com.dune.trino.metastore.AbstractDelegatingConnectorMetadata.getTableStatistics(AbstractDelegatingConnectorMetadata.kt:262)
    at io.trino.tracing.TracingConnectorMetadata.getTableStatistics(TracingConnectorMetadata.java:331)
    at io.trino.metadata.MetadataManager.getTableStatistics(MetadataManager.java:477)
    at io.trino.tracing.TracingMetadata.getTableStatistics(TracingMetadata.java:311)
    at io.trino.cost.CachingTableStatsProvider.getTableStatistics(CachingTableStatsProvider.java:46)
    at io.trino.cost.TableScanStatsRule.doCalculate(TableScanStatsRule.java:60)
    at io.trino.cost.TableScanStatsRule.doCalculate(TableScanStatsRule.java:36)
    at io.trino.cost.SimpleStatsRule.calculate(SimpleStatsRule.java:37)
    at io.trino.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:83)
    at io.trino.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:71)
    at io.trino.cost.CachingStatsProvider.getStats(CachingStatsProvider.java:87)
    at io.trino.sql.planner.optimizations.DeterminePartitionCount.lambda$getPartitionCountBasedOnRows$3(DeterminePartitionCount.java:227)
    at java.base/java.util.stream.ReferencePipeline$6$1.accept(ReferencePipeline.java:263)
    at java.base/java.util.Collections$2.tryAdvance(Collections.java:5074)
    at java.base/java.util.Collections$2.forEachRemaining(Collections.java:5082)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
    at java.base/java.util.stream.DoublePipeline.collect(DoublePipeline.java:541)
    at java.base/java.util.stream.DoublePipeline.sum(DoublePipeline.java:450)
    at io.trino.sql.planner.optimizations.DeterminePartitionCount.getSourceNodesOutputStats(DeterminePartitionCount.java:299)
    at io.trino.sql.planner.optimizations.DeterminePartitionCount.getPartitionCountBasedOnRows(DeterminePartitionCount.java:227)
    at io.trino.sql.planner.optimizations.DeterminePartitionCount.determinePartitionCount(DeterminePartitionCount.java:174)
    at io.trino.sql.planner.optimizations.DeterminePartitionCount.optimize(DeterminePartitionCount.java:120)
    at io.trino.sql.planner.optimizations.StatsRecordingPlanOptimizer.optimize(StatsRecordingPlanOptimizer.java:41)
    at io.trino.sql.planner.LogicalPlanner.runOptimizer(LogicalPlanner.java:303)
    at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:266)
    at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:238)
    at io.trino.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:233)
    at io.trino.execution.SqlQueryExecution.doPlanQuery(SqlQueryExecution.java:498)
    at io.trino.execution.SqlQueryExecution.planQuery(SqlQueryExecution.java:478)
    at io.trino.execution.SqlQueryExecution.start(SqlQueryExecution.java:416)
    at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:272)
    at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:150)
    at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:134)
    at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:570)
    at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
    at io.trino.$gen.Trino_435_3370_ga3323af____20241216_070309_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
When this happens, all stats for the query are ignored:
        log.error(e, "Error occurred when computing stats for query %s", session.getQueryId());
        return PlanNodeStatsEstimate.unknown();
    }
    throw e;
}
This makes it impossible for Trino to optimise based on stats at all, even if stats for other columns are available.
We've noticed that some implementations (namely delta_rs) write stats for more types than Trino supports; in the stack trace above, for example, the failing stat belongs to a VARBINARY column.
Until Trino adds official support for using these stats, could we treat stats as unavailable for these specific columns, rather than throwing and discarding all stats for the query?
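To make the proposal concrete, here is a minimal, self-contained sketch of the "skip the unsupported column" behaviour. It is not Trino code: `LenientColumnStats`, `ColumnType`, `parseStrict`, `parseLenient` and `collectMinValues` are hypothetical names invented for illustration, and the set of supported types is an assumption, not the actual list in `TransactionLogParser`.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class LenientColumnStats {
    // Hypothetical stand-in for connector column types (not Trino's Type hierarchy).
    enum ColumnType { BIGINT, VARCHAR, VARBINARY }

    // Assumption for this sketch: only these types have parseable stats.
    static final Set<ColumnType> SUPPORTED = EnumSet.of(ColumnType.BIGINT, ColumnType.VARCHAR);

    // Strict parser: throws for unsupported types, like the behaviour described above.
    static Object parseStrict(ColumnType type, String raw) {
        switch (type) {
            case BIGINT:
                return Long.parseLong(raw);
            case VARCHAR:
                return raw;
            default:
                throw new IllegalStateException(
                        String.format("Unable to parse value [%s] with type %s", raw, type));
        }
    }

    // Lenient variant: an unsupported type yields "no stat" instead of an exception.
    // Malformed values for supported types still fail loudly.
    static Optional<Object> parseLenient(ColumnType type, String raw) {
        if (!SUPPORTED.contains(type)) {
            return Optional.empty();
        }
        return Optional.of(parseStrict(type, raw));
    }

    // Collects min values per column, dropping only the columns whose stats cannot be used.
    static Map<String, Object> collectMinValues(Map<String, ColumnType> schema, Map<String, String> rawMins) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : rawMins.entrySet()) {
            ColumnType type = schema.get(entry.getKey());
            if (type == null) {
                continue; // stat for a column that is not in the schema
            }
            parseLenient(type, entry.getValue()).ifPresent(value -> result.put(entry.getKey(), value));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("block_number", ColumnType.BIGINT);
        schema.put("hash", ColumnType.VARBINARY);

        Map<String, String> rawMins = new LinkedHashMap<>();
        rawMins.put("block_number", "100");
        rawMins.put("hash", "\u0000\u0003\u0001");

        // Only block_number survives; hash is treated as having no stats.
        System.out.println(collectMinValues(schema, rawMins));
    }
}
```

With this shape, a VARBINARY stat written by another implementation costs only that column's statistics, and the planner can still use the stats of the remaining columns.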
Delta Lake entries contain column statistics that can be used by the Trino optimiser to speed up queries.
Only a subset of data types are supported in stats:
trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogParser.java
Lines 193 to 236 in e369f66
When a stat belongs to a column whose type is not one of these, we throw an exception:
trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogParser.java
Lines 243 to 246 in e369f66
When this happens, all stats for the query are ignored:
trino/core/trino-main/src/main/java/io/trino/cost/CachingStatsProvider.java
Lines 91 to 97 in e369f66