Feature Request / Improvement
TestMetadataTableReadableMetrics currently hardcodes the expected sizes into the metrics rows rather than checking the sizes from the underlying data. This means that every time the Parquet version changes (or the compression, or similar), the test needs to be updated. See b8c2b20.
Ideally we would either change this so that the expected values we check are only those that do not depend on the Parquet version, or change the test to check against the actual values.
See #11462 for an instance where this is complicating things
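As a rough sketch of the second option (not the existing test code; the helper name is made up), the expected per-column sizes could be derived at runtime from the data files Iceberg wrote, so the assertions track whatever the current Parquet writer and compression actually produced:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

// Sketch only: sum the per-column sizes recorded for the table's data files,
// instead of asserting hardcoded byte counts in the expected metrics rows.
static Map<Integer, Long> expectedColumnSizes(Table table) throws IOException {
  Map<Integer, Long> sizes = new HashMap<>();
  try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
    for (FileScanTask task : tasks) {
      task.file().columnSizes()
          .forEach((fieldId, size) -> sizes.merge(fieldId, size, Long::sum));
    }
  }
  return sizes;
}
```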
Query engine
None
Willingness to contribute
I can contribute this improvement/feature independently
I would be willing to contribute this improvement/feature with guidance from the Iceberg community
I cannot contribute this improvement/feature at this time
Can we use the SQL select column_sizes from table.files to get the right size?
I would prefer @RussellSpitzer's suggestion to directly check the parquet file sizes. Otherwise we might end up using the same abstraction to get the expected data and the test data.
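If the test goes the "check the Parquet file directly" route, something along these lines might work (a hedged sketch using parquet-mr's footer API; the path argument is illustrative), so the expected values never pass through Iceberg's own metadata:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

// Sketch only: read the per-column on-disk sizes straight from the Parquet
// footer of a written data file.
static Map<String, Long> footerColumnSizes(String parquetPath) throws IOException {
  Map<String, Long> sizes = new HashMap<>();
  try (ParquetFileReader reader = ParquetFileReader.open(
      HadoopInputFile.fromPath(new Path(parquetPath), new Configuration()))) {
    for (BlockMetaData block : reader.getFooter().getBlocks()) {
      for (ColumnChunkMetaData column : block.getColumns()) {
        sizes.merge(column.getPath().toDotString(), column.getTotalSize(), Long::sum);
      }
    }
  }
  return sizes;
}
```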
Maybe you are right. I have a question about this link: https://iceberg.apache.org/docs/1.6.0/spark-queries/?h=readable_metrics#files. Under Inspecting tables -- Files, the SQL SELECT * FROM prod.db.table.files; shows that a Parquet file may contain multiple columns. If we only have the file size, how do we know the column-level sizes?
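For what it's worth, the column_sizes field in the files metadata table is already a per-column map keyed by column field id, so column-level sizes are available without looking at the whole file size. A minimal sketch, assuming a running SparkSession and a placeholder table name db.tbl:

```java
import java.util.List;
import java.util.Map;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch only: column_sizes is a map<field id, bytes> with one entry per
// column; "db.tbl" is a placeholder table name.
static void printColumnSizes(SparkSession spark) {
  List<Row> rows =
      spark.sql("SELECT file_path, column_sizes FROM db.tbl.files").collectAsList();
  for (Row row : rows) {
    Map<Integer, Long> columnSizes = row.getJavaMap(1);
    columnSizes.forEach(
        (fieldId, bytes) -> System.out.printf("field %d -> %d bytes%n", fieldId, bytes));
  }
}
```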