Add support to read parquet file metadata through deephaven #6126

malhotrashivam · 2024-09-25T15:04:23Z

This would help with remotely debugging Parquet files and understanding their structure.
We could follow an API similar to DuckDB's: https://duckdb.org/docs/data/parquet/overview

read_parquet
parquet_file_metadata
parquet_kv_metadata
parquet_schema

malhotrashivam · 2024-09-25T15:05:44Z

One approach that @rcaudy suggested in the meantime:

If you have a raw source table in Groovy, you should be able to:

1. .initialize() it
2. Get its columnSourceManager field.
3. Get the Table result of the CSM’s locationTable()
4. Get the K-V metadata for each file by applying an update("KV = ((io.deephaven.parquet.table.location.ParquetTableLocation) _TableLocation).getParquetKey().getMetadata().getFileMetaData().getKeyValueMetaData()")

devinrsmith · 2024-09-25T19:47:04Z

It may be useful to write a small standalone utility that prints the FileMetaData as JSON; I've found this snippet helpful:

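        // Serialize the Thrift FileMetaData struct into an in-memory buffer as human-readable JSON.
        try (final TMemoryBuffer buffer = new TMemoryBuffer(128)) {
            fileMetaData.write(new TSimpleJSONProtocol(buffer));
            buffer.flush();
            System.out.println(buffer.toString(StandardCharsets.UTF_8));
        } catch (TException e) {
            // Ignored: this is a best-effort debugging aid.
        }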
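
For a standalone utility, the footer has to be located before any FileMetaData can be decoded. The sketch below shows one way to do that with only the JDK, relying on the Parquet file layout: an unencrypted file ends with the Thrift-encoded footer, a 4-byte little-endian footer length, and the magic bytes PAR1. The class and method names (ParquetFooterLocator, readFooterBytes) are hypothetical and are not part of Deephaven or parquet-java; the sketch also assumes a plaintext footer and does not handle encrypted files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Deephaven or parquet-java. */
public final class ParquetFooterLocator {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    private ParquetFooterLocator() {}

    /** Returns the raw, still Thrift-encoded footer bytes of an unencrypted Parquet file. */
    public static byte[] readFooterBytes(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            final long fileLength = file.length();
            // Smallest possible file: leading magic (4) + footer length (4) + trailing magic (4).
            if (fileLength < 12) {
                throw new IOException("Too short to be a Parquet file: " + path);
            }
            // The last 8 bytes hold the footer length followed by the trailing magic.
            final byte[] tail = new byte[8];
            file.seek(fileLength - 8);
            file.readFully(tail);
            if (!Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
                throw new IOException("Missing trailing PAR1 magic: " + path);
            }
            final int footerLength =
                    ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            final long footerStart = fileLength - 8 - footerLength;
            // The footer cannot overlap the 4-byte leading magic.
            if (footerLength < 0 || footerStart < 4) {
                throw new IOException("Invalid footer length " + footerLength + ": " + path);
            }
            final byte[] footer = new byte[footerLength];
            file.seek(footerStart);
            file.readFully(footer);
            return footer;
        }
    }
}
```

The returned bytes would still need to be handed to a Thrift decoder to obtain a FileMetaData instance, which could then be printed as JSON.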

malhotrashivam added the feature request, parquet, and s3 labels Sep 25, 2024

malhotrashivam added this to the Backlog milestone Sep 25, 2024

malhotrashivam self-assigned this Sep 25, 2024
