
Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes #11465

Open

RussellSpitzer opened this issue Nov 4, 2024 · 3 comments
Labels: good first issue (Good for newcomers), improvement (PR that improves existing functionality)

Comments

@RussellSpitzer (Member)

Feature Request / Improvement

TestMetadataTableReadableMetrics currently hardcodes the expected sizes in the metrics rows rather than checking the sizes from the underlying data. This means the test needs to be updated every time the Parquet version (or compression, or similar) changes. See b8c2b20

Ideally we would either change this so that the only expected values we check are those that do not depend on the Parquet version, or change the test to check against the actual values.

See #11462 for an instance where this is complicating things
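One possible direction is a rough sketch like the following (not the actual test code; the class and method names here are made up for illustration): read the expected per-column sizes from the footer of the Parquet file the test wrote, instead of keeping them as hardcoded constants.

```java
// Sketch only: derive expected per-column sizes from the Parquet footer of the
// written data file, so the expected values track whatever the writer produced.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetColumnSizes {

  // Sums the on-disk (compressed) size of each column's chunks across all row
  // groups; these are the values the test currently hardcodes per Parquet version.
  public static Map<String, Long> columnSizes(String parquetFilePath) throws IOException {
    Map<String, Long> sizes = new HashMap<>();
    HadoopInputFile input =
        HadoopInputFile.fromPath(new Path(parquetFilePath), new Configuration());
    try (ParquetFileReader reader = ParquetFileReader.open(input)) {
      for (BlockMetaData rowGroup : reader.getFooter().getBlocks()) {
        for (ColumnChunkMetaData column : rowGroup.getColumns()) {
          sizes.merge(column.getPath().toDotString(), column.getTotalSize(), Long::sum);
        }
      }
    }
    return sizes;
  }
}
```

The test could then build its expected readable_metrics rows from this map (or stop asserting on the size fields entirely), so a Parquet or compression upgrade no longer breaks it.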

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
@RussellSpitzer added the "improvement" and "good first issue" labels on Nov 4, 2024
@davidyuan1223

can we use the sql select column_sizes from table.files to get the right size?

@pvary (Contributor) commented Nov 5, 2024

can we use the sql select column_sizes from table.files to get the right size?

I would prefer @RussellSpitzer's suggestion to directly check the parquet file sizes. Otherwise we might end up using the same abstraction to get the expected data and the test data.

@davidyuan1223

can we use the sql select column_sizes from table.files to get the right size?

I would prefer @RussellSpitzer's suggestion to directly check the parquet file sizes. Otherwise we might end up using the same abstraction to get the expected data and the test data.

Maybe you are right. I have a question about this link: https://iceberg.apache.org/docs/1.6.0/spark-queries/?h=readable_metrics#files. Under "Inspecting tables -- Files", the SQL SELECT * FROM prod.db.table.files; shows that a Parquet file may contain multiple columns. If we get the file size, how do we know the column-level size?
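For reference, column_sizes in the files metadata table is already a per-column map (column field id -> bytes), not a single file size, and the same information is reachable through the Java API. A minimal sketch (the table variable and the column name are placeholders, not part of the existing test):

```java
// Sketch only: column_sizes is keyed by the column's field id, so the per-column
// size is available directly; no need to split up the file-level size.
// `table` is assumed to be an org.apache.iceberg.Table loaded elsewhere.
import java.io.IOException;
import java.util.Map;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

class ColumnSizeLookup {

  static void printColumnSize(Table table, String columnName) throws IOException {
    int fieldId = table.schema().findField(columnName).fieldId();
    try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
      for (FileScanTask task : tasks) {
        DataFile file = task.file();
        Map<Integer, Long> columnSizes = file.columnSizes(); // field id -> size in bytes
        System.out.printf("%s: %s -> %d bytes%n",
            file.path(), columnName, columnSizes.get(fieldId));
      }
    }
  }
}
```

That said, per the comment above, for the test itself reading the sizes straight from the Parquet footer avoids comparing Iceberg's metadata against values produced by the same abstraction.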
