Enable reading string view by default from Parquet
alamb committed Aug 22, 2024
1 parent ed2b222 commit bc5d7f7
Showing 2 changed files with 9 additions and 5 deletions.
10 changes: 7 additions & 3 deletions datafusion/common/src/config.rs
@@ -487,9 +487,13 @@ config_namespace! {
/// data frame.
pub maximum_buffered_record_batches_per_stream: usize, default = 2

-/// (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`,
-/// and `Binary/BinaryLarge` with `BinaryView`.
-pub schema_force_string_view: bool, default = false
+/// (reading) If true (the default), parquet reader will read text and
+/// binary columns using Arrow byte view types. DataFusion has
+/// specialized processing using the Arrow `Utf8View` type for columns
+/// that could also be read as `Utf8/Utf8Large` and using the Arrow
+/// `BinaryView` type for columns that could also be read as
+/// `Binary/BinaryLarge`.
+pub schema_force_string_view: bool, default = true
}
}
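
To make the effect of the flag concrete, here is a minimal, self-contained sketch of the type substitution the doc comment describes. It is not DataFusion's implementation: the `DataType` enum is a simplified stand-in for Arrow's type of the same name, and `reader_type` is a hypothetical helper introduced only for illustration. Arrow spells the large variants `LargeUtf8` and `LargeBinary` (written `Utf8Large` and `BinaryLarge` in the doc comment above).

```rust
// Simplified stand-in for Arrow's `DataType`: only the variants needed here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    LargeBinary,
    BinaryView,
}

/// Hypothetical helper (not a DataFusion API): returns the type a Parquet
/// column would be read as, given the type it would otherwise have and the
/// value of `schema_force_string_view`.
fn reader_type(file_type: &DataType, schema_force_string_view: bool) -> DataType {
    if !schema_force_string_view {
        // Flag off: columns keep the type they would otherwise be read as.
        return file_type.clone();
    }
    match file_type {
        // Text columns are read with the string view type.
        DataType::Utf8 | DataType::LargeUtf8 => DataType::Utf8View,
        // Binary columns are read with the binary view type.
        DataType::Binary | DataType::LargeBinary => DataType::BinaryView,
        // Every other type is unaffected by the flag.
        other => other.clone(),
    }
}

fn main() {
    let columns = [DataType::Int64, DataType::Utf8, DataType::LargeBinary];
    for force in [true, false] {
        let read_as: Vec<DataType> =
            columns.iter().map(|c| reader_type(c, force)).collect();
        println!("schema_force_string_view = {force}: {read_as:?}");
    }
}
```

With the new default of `true`, only text and binary columns change type in this model; numeric and other columns pass through untouched.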

4 changes: 2 additions & 2 deletions datafusion/sqllogictest/test_files/information_schema.slt
@@ -201,7 +201,7 @@ datafusion.execution.parquet.metadata_size_hint NULL
datafusion.execution.parquet.pruning true
datafusion.execution.parquet.pushdown_filters false
datafusion.execution.parquet.reorder_filters false
-datafusion.execution.parquet.schema_force_string_view false
+datafusion.execution.parquet.schema_force_string_view true
datafusion.execution.parquet.skip_metadata true
datafusion.execution.parquet.statistics_enabled page
datafusion.execution.parquet.write_batch_size 1024
@@ -291,7 +291,7 @@ datafusion.execution.parquet.metadata_size_hint NULL (reading) If specified, the
datafusion.execution.parquet.pruning true (reading) If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file
datafusion.execution.parquet.pushdown_filters false (reading) If true, filter expressions are be applied during the parquet decoding operation to reduce the number of rows decoded. This optimization is sometimes called "late materialization".
datafusion.execution.parquet.reorder_filters false (reading) If true, filter expressions evaluated during the parquet decoding operation will be reordered heuristically to minimize the cost of evaluation. If false, the filters are applied in the same order as written in the query
-datafusion.execution.parquet.schema_force_string_view false (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`, and `Binary/BinaryLarge` with `BinaryView`.
+datafusion.execution.parquet.schema_force_string_view true (reading) If true (the default), parquet reader will read text and binary columns using Arrow byte view types. DataFusion has specialized processing using the Arrow `Utf8View` type for columns that could also be read as `Utf8/Utf8Large` and using the Arrow `BinaryView` type for columns that could also be read as `Binary/BinaryLarge`.
datafusion.execution.parquet.skip_metadata true (reading) If true, the parquet reader skip the optional embedded metadata that may be in the file Schema. This setting can help avoid schema conflicts when querying multiple parquet files with schemas containing compatible types but different metadata
datafusion.execution.parquet.statistics_enabled page (writing) Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting
datafusion.execution.parquet.write_batch_size 1024 (writing) Sets write_batch_size in bytes