Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: C++ code redesign for TypeInfo #33

Open
maozguttman opened this issue Jul 13, 2021 · 1 comment
Open

enhancement: C++ code redesign for TypeInfo #33

maozguttman opened this issue Jul 13, 2021 · 1 comment

Comments

@maozguttman
Copy link

Hi,

I'm using pguint package (can be found in github) for supporting unsigned integers in postgres. pguint package adds following postgres data types:

  1. int1 (8 bits signed integer)
  2. uint1 (8 bits unsigned integer)
  3. uint2 (16 bits unsigned integer)
  4. uint4 (32 bits unsigned integer)
  5. uint8 (64 bits unsigned integer)

I modified parquet_fdw code to support these new datatypes.

My proposal is to make TypeInfo to behave more as a C++ class than a C structure. It eliminates the C switches on parquet datatypes (maybe will work a little bit faster) and makes it easier to add support for new datatypes.
TypeInfo will be a base class. There will be derived classes from it for each parquet data type, for example: BoolTypeInfo, Int8TypeInfo, DoubleTypeInfo, TimestampTypeInfo, etc. TypeInfo base class will have (virtual) interface methods that are implemented in the derived classes.

Example on read_primitive_type function:
Add following interface:

virtual Datum TypeInfo::read_primitive_type(arrow::Array *array, int64_t i) const = 0;

Implement it in derived classes:

virtual Datum BoolTypeInfo::read_primitive_type(arrow::Array *array, int64_t i) const
{
  arrow::BooleanArray *boolarray = (arrow::BooleanArray *) array;
  Datum res = BoolGetDatum(boolarray->Value(i));
  return res;
}

And modify ParquetReader::read_primitive_type accordingly:

Datum ParquetReader::read_primitive_type(arrow::Array *array, const TypeInfo *typinfo, int64_t i)
{
  if (typinfo  == nullptr)
  {
    throw Error("parquet_fdw: unsupported column type");
  }
  Datum res = typinfo->read_primitive_type(array, i);
  ...
}

A "special" code that does not seem to fit above proposal is row_group_matches_filter function where it calls bytes_to_postgres_type function since there is no TypeInfo involved there (maybe it can be changed).

Thanks,
Maoz

@zilder
Copy link
Contributor

zilder commented Jul 14, 2021

Hi Maoz,

thanks for the suggestion. TBH i doubt this will work faster. This would require using virtual functions which implies extra memory access (to vtable). And one extra function call. Which may be significant for such trivial work, that read_primitive_type does.
Anyway this requires benchmarking. Unfortunately i currently don't have much time as there are higher priority tasks. Maybe later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants