Per the documentation:

> The dictionary encoding builds a dictionary of values encountered in a given column. The dictionary will be stored in a dictionary page per column chunk. The values are stored as integers using the RLE/Bit-Packing Hybrid encoding. If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding. The dictionary page is written first, before the data pages of the column chunk.
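As a rough illustration of the encoding described in the quoted documentation, here is a minimal, self-contained sketch. It is not Parquet's implementation: `DictionarySketch`, `Encoded`, and the `maxEntries` limit are hypothetical names, the indices are kept as a plain `int[]` rather than RLE/bit-packed, and an empty `Optional` stands in for the fallback to plain encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DictionarySketch {
    /** Hypothetical result type: distinct values in first-seen order, plus one index per input value. */
    static final class Encoded {
        final List<String> dictionary;
        final int[] indices;

        Encoded(List<String> dictionary, int[] indices) {
            this.dictionary = dictionary;
            this.indices = indices;
        }
    }

    /**
     * Dictionary-encodes a column of strings. Returns an empty Optional when the
     * dictionary would exceed maxEntries (an assumed limit), which stands in for
     * the fallback to plain encoding.
     */
    static Optional<Encoded> encode(List<String> column, int maxEntries) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int[] indices = new int[column.size()];
        for (int i = 0; i < column.size(); i++) {
            String value = column.get(i);
            Integer id = ids.get(value);
            if (id == null) {
                if (ids.size() >= maxEntries) {
                    return Optional.empty(); // dictionary too big: caller falls back to plain
                }
                id = ids.size();
                ids.put(value, id);
            }
            indices[i] = id;
        }
        return Optional.of(new Encoded(new ArrayList<>(ids.keySet()), indices));
    }

    public static void main(String[] args) {
        List<String> column = List.of("RED", "GREEN", "RED", "BLUE", "GREEN");
        Encoded encoded = encode(column, 16).get();
        System.out.println(encoded.dictionary);               // [RED, GREEN, BLUE]
        System.out.println(Arrays.toString(encoded.indices)); // [0, 1, 0, 2, 1]
        System.out.println(encode(column, 2).isPresent());    // false
    }
}
```

A column with few distinct values yields a small dictionary and small indices, which is the case where this encoding pays off.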
Dictionary encoding seems like a good fit for Parquet enum types, which are written as strings under the hood. `ParquetWriter.Builder` has a `.withDictionaryEncoding(String columnPath, boolean enableDictionary)` method that we could invoke for all enum ParquetFields.
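One possible shape for this, sketched without the Parquet dependency so it runs on its own. Everything here is an assumption for illustration: `DictionaryBuilder` is a hypothetical stand-in that only mirrors the `withDictionaryEncoding(String, boolean)` signature mentioned above, and `FieldInfo` and `RecordingBuilder` are invented for the example. In real code the builder would be the actual `ParquetWriter.Builder` and the field list would come from this project's own field metadata.

```java
import java.util.ArrayList;
import java.util.List;

public class EnumDictionarySketch {
    /** Hypothetical stand-in for the single builder method named above; not a Parquet type. */
    interface DictionaryBuilder<B> {
        B withDictionaryEncoding(String columnPath, boolean enableDictionary);
    }

    /** Hypothetical field descriptor: a dotted column path and whether the field is an enum. */
    static final class FieldInfo {
        final String columnPath;
        final boolean isEnum;

        FieldInfo(String columnPath, boolean isEnum) {
            this.columnPath = columnPath;
            this.isEnum = isEnum;
        }
    }

    /** Enables dictionary encoding for every enum field and leaves other columns untouched. */
    static <B extends DictionaryBuilder<B>> B enableForEnums(B builder, List<FieldInfo> fields) {
        B current = builder;
        for (FieldInfo field : fields) {
            if (field.isEnum) {
                current = current.withDictionaryEncoding(field.columnPath, true);
            }
        }
        return current;
    }

    /** Fake builder that records which column paths had dictionary encoding enabled. */
    static final class RecordingBuilder implements DictionaryBuilder<RecordingBuilder> {
        final List<String> enabled = new ArrayList<>();

        @Override
        public RecordingBuilder withDictionaryEncoding(String columnPath, boolean enableDictionary) {
            if (enableDictionary) {
                enabled.add(columnPath);
            }
            return this;
        }
    }

    public static void main(String[] args) {
        List<FieldInfo> fields = List.of(
                new FieldInfo("status", true),
                new FieldInfo("name", false),
                new FieldInfo("address.kind", true));
        RecordingBuilder builder = enableForEnums(new RecordingBuilder(), fields);
        System.out.println(builder.enabled); // [status, address.kind]
    }
}
```

Only enum columns are opted in here, so the setting for every other column stays whatever the writer is already configured with.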