feat: integrate iceberg initial load support via jvm grpc #1792

Open
wants to merge 35 commits into base: main

Conversation

iamKunalGupta
Member

@iamKunalGupta iamKunalGupta commented Jun 5, 2024

  • add quarkus with grpc for jvm
  • expose required routes for appends, changes and setup tables

Gradle is currently used for build and development. Via Quarkus tasks, it also:

  • Generates the corresponding Java files from the proto definitions
  • Generates the GrpcService interfaces to extend and inject

TODOS:

  • Iceberg partitioning support
  • UI Changes
  • Table namespace support planned for later
  • More support
    • Catalogs
      • JDBC
      • Hive
    • FileIO
      • S3
      • GCS - supported via S3 + HMAC based access
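
The "GCS via S3 + HMAC" item above can be sketched roughly as follows. This is a minimal illustration, not the configuration this PR actually ships: the `GCSHMACConfig` type and the `S3FileIOProperties` function are hypothetical names, and it assumes the catalog is configured through the standard Iceberg `S3FileIO` property keys, pointed at the GCS S3-compatible endpoint.

```go
package main

import "errors"

// GCSHMACConfig is a hypothetical holder for the HMAC credentials of a
// GCS service account; it is not a type from this PR.
type GCSHMACConfig struct {
	AccessID string // HMAC access ID
	Secret   string // HMAC secret
}

// gcsInteropEndpoint is the GCS endpoint that accepts S3-style requests
// signed with HMAC keys.
const gcsInteropEndpoint = "https://storage.googleapis.com"

// S3FileIOProperties builds Iceberg catalog properties that select S3FileIO
// and point it at GCS. Which of these keys the PR actually sets is an
// assumption.
func S3FileIOProperties(cfg GCSHMACConfig) (map[string]string, error) {
	if cfg.AccessID == "" || cfg.Secret == "" {
		return nil, errors.New("HMAC access ID and secret are both required")
	}
	return map[string]string{
		"io-impl":              "org.apache.iceberg.aws.s3.S3FileIO",
		"s3.endpoint":          gcsInteropEndpoint,
		"s3.access-key-id":     cfg.AccessID,
		"s3.secret-access-key": cfg.Secret,
	}, nil
}
```

The resulting map would be passed to the JVM side as catalog properties; no GCS-specific FileIO is needed under this approach.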

- add quarkus with grpc for jvm
- expose required routes for appends, changes and setup tables
- Introduce significant enhancements to Iceberg support, including the handling of Iceberg tables and improved JVM compatibility.
- Refactor codebase to improve readability and maintainability, including the extraction of common functionality into separate methods and the removal of redundant code.
- Implement new methods for handling Iceberg tables, including table creation, record appending, and table dropping.
- Introduce a new `RecordWriterFactory` to manage record writing tasks.
- Update `build.gradle` and `application.yaml` files to reflect changes in the project setup.
- Remove unused test files and update the logging system for better debugging.
- Update Go files in the `flow/connectors/iceberg` directory to improve Iceberg handling.
- Add new protobuf messages in `protos/peers.proto` for Google Cloud Storage (GCS) configuration.
- Update `protos/flow-jvm.proto` to include an idempotency key in the `AppendRecordsRequest` message.
@iamKunalGupta iamKunalGupta force-pushed the feat/iceberg-support branch 2 times, most recently from 079d627 to e008421 Compare June 11, 2024 04:41
@iamKunalGupta iamKunalGupta marked this pull request as ready for review June 11, 2024 05:12
@iamKunalGupta iamKunalGupta requested review from serprex, iskakaushik and heavycrystal and removed request for serprex June 11, 2024 05:27
@serprex
Contributor

serprex commented Jun 11, 2024

PR description should outline why gradle files need to be brought in

optional int32 client_pool_size = 4;
optional bool cache_enabled = 5;
// This helps in testing, where we can pass additional properties to the catalog
// map<string, string> additional_properties = 6;
Contributor

should this be kept in final code?

@iamKunalGupta iamKunalGupta force-pushed the feat/iceberg-support branch 2 times, most recently from 10f7d7f to 9b93886 Compare June 13, 2024 00:43
- Modify the condition in IcebergService to correctly check if a namespace exists before creating it.
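
The check-before-create pattern that this commit message describes can be sketched as below. The real `IcebergService` lives on the JVM side; this Go version only illustrates the control flow, and `NamespaceCatalog`, `EnsureNamespace`, and `memCatalog` are hypothetical names.

```go
package main

import "fmt"

// NamespaceCatalog is a hypothetical, minimal view of a catalog.
type NamespaceCatalog interface {
	NamespaceExists(ns string) (bool, error)
	CreateNamespace(ns string) error
}

// EnsureNamespace creates ns only when it does not exist yet and reports
// whether it created it. The check and the create are not atomic, so a real
// implementation may still need to tolerate an "already exists" error.
func EnsureNamespace(c NamespaceCatalog, ns string) (bool, error) {
	exists, err := c.NamespaceExists(ns)
	if err != nil {
		return false, err
	}
	if exists {
		return false, nil
	}
	if err := c.CreateNamespace(ns); err != nil {
		return false, err
	}
	return true, nil
}

// memCatalog is an in-memory stand-in used only for illustration.
type memCatalog struct {
	namespaces map[string]bool
}

func (m *memCatalog) NamespaceExists(ns string) (bool, error) {
	return m.namespaces[ns], nil
}

func (m *memCatalog) CreateNamespace(ns string) error {
	if m.namespaces[ns] {
		return fmt.Errorf("namespace %q already exists", ns)
	}
	m.namespaces[ns] = true
	return nil
}
```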
…y footprint (#1835)

This streams the records instead of sending them all in one go.
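
A rough sketch of why streaming lowers the memory footprint: records are forwarded in bounded batches, so only one batch is held at a time. The PR itself uses gRPC streams; the channel-based version below is a simplified stand-in, and `Record`, `StreamInBatches`, and the `send` callback are hypothetical names.

```go
package main

import "errors"

// Record is a hypothetical stand-in for a row being replicated.
type Record struct {
	ID int
}

// StreamInBatches drains records and passes them to send in slices of at
// most batchSize, so memory use is bounded by one batch rather than by the
// whole input. If send fails, the function returns early and stops reading;
// the producer must handle that (for example via a buffered channel or a
// cancellation signal).
func StreamInBatches(records <-chan Record, batchSize int, send func([]Record) error) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}
	batch := make([]Record, 0, batchSize)
	for r := range records {
		batch = append(batch, r)
		if len(batch) == batchSize {
			if err := send(batch); err != nil {
				return err
			}
			// Allocate a fresh slice so send may keep the previous one.
			batch = make([]Record, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		return send(batch)
	}
	return nil
}
```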
@iamKunalGupta iamKunalGupta changed the title [WIP] feat: integrate iceberg support via jvm grpc [WIP] feat: integrate iceberg intial load support via jvm grpc Jun 17, 2024
@iamKunalGupta iamKunalGupta changed the title [WIP] feat: integrate iceberg intial load support via jvm grpc [WIP] feat: integrate iceberg initial load support via jvm grpc Jun 17, 2024
flow-jvm/README.md (resolved)
@@ -0,0 +1,97 @@
####
Contributor

what's the point of the Dockerfiles in this directory?

Member Author

These are generated by Quarkus; I used them as a reference to create our own under stacks.

protos/peers.proto (resolved)
Amogh-Bharadwaj and others added 2 commits June 18, 2024 09:19
![Screenshot 2024-06-15 at 1 28 50 AM](https://github.com/PeerDB-io/peerdb/assets/65964360/630b0ac8-7cdc-413d-a436-f8340bad3a45)


TODO:
- [x] Add required asterisk
- [ ] Make validate and create work
- [x] Test and fix the wiring of all fields to Flow API
- [x] Add form validation and error toast
- [x] Add tips and helpful links

---------

Co-authored-by: Kunal Gupta <[email protected]>
@iamKunalGupta iamKunalGupta changed the title [WIP] feat: integrate iceberg initial load support via jvm grpc feat: integrate iceberg initial load support via jvm grpc Jun 18, 2024
- Add `AppendAlreadyDoneException` to handle specific scenario when append operation is already done
- Enhance record appending process in `IcebergService` by introducing better error handling and more precise control flow
@iamKunalGupta iamKunalGupta linked an issue Jun 24, 2024 that may be closed by this pull request
7 tasks
@iamKunalGupta iamKunalGupta enabled auto-merge (squash) June 25, 2024 00:11
req *protos.GetTableSchemaBatchInput,
) (*protos.GetTableSchemaBatchOutput, error) {
// TODO implement me
panic("implement me")
Contributor

@serprex serprex Jun 25, 2024

TODO, or remove implementing this interface

Development

Successfully merging this pull request may close these issues.

Iceberg Intial Load
4 participants