feat(java): support overwrite for spark connector #3313

SaintBacchus · 2024-12-30T03:23:05Z

Support overwrite mode for the Lance Spark connector:

df.write
  .format("lance")
  .option("path", "s3://lance/demo.lance")
  .mode("overwrite")
  .save()

eddyxu · 2024-12-31T04:07:47Z

java/spark/src/main/java/com/lancedb/lance/spark/internal/LanceFragmentScanner.java

@@ -54,7 +54,11 @@ public static LanceFragmentScanner create(
      LanceConfig config = inputPartition.getConfig();
      ReadOptions options = SparkOptions.genReadOptionFromConfig(config);
      dataset = Dataset.open(allocator, config.getDatasetUri(), options);
-      fragment = dataset.getFragments().get(fragmentId);
+      fragment =
+          dataset.getFragments().stream()


Is this an O(n) operation? Is performance a concern here?

To get the fragment in O(1), we would have to build a hash table from dataset.getFragments() and store it in the LanceInputPartition.

The LanceInputPartition is serialized by Spark, so the hash table would consume a lot of memory for a large Lance dataset. So I think the O(n) filter is a reasonable choice here.
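
For readers following the thread, here is a minimal sketch of the lookup pattern being discussed: a linear scan that matches on the fragment id instead of indexing into the list by position. It is not the code from this PR. `FragmentLookup`, `Fragment`, and `findById` are hypothetical stand-ins, and the premise that fragment ids need not line up with list positions is an assumption made for illustration.

```java
import java.util.List;
import java.util.Optional;

public class FragmentLookup {
  /** Hypothetical stand-in for a Lance fragment; only the id matters here. */
  public static final class Fragment {
    private final int id;

    public Fragment(int id) {
      this.id = id;
    }

    public int getId() {
      return id;
    }
  }

  /** O(n) scan: returns the first fragment whose id matches, or empty if none does. */
  public static Optional<Fragment> findById(List<Fragment> fragments, int fragmentId) {
    return fragments.stream().filter(f -> f.getId() == fragmentId).findFirst();
  }

  public static void main(String[] args) {
    // Assumed scenario: ids 3, 4, 5 sit at list positions 0, 1, 2.
    List<Fragment> fragments = List.of(new Fragment(3), new Fragment(4), new Fragment(5));

    // Matching on id finds the fragment even though it sits at position 1.
    System.out.println(findById(fragments, 4).isPresent()); // prints "true"

    // No fragment has id 0, although position 0 exists in the list.
    System.out.println(findById(fragments, 0).isPresent()); // prints "false"
  }
}
```

The scan costs O(n) per lookup but keeps nothing extra in the serialized partition, which is the trade-off weighed above.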

feat(java): support overwrite for spark connector

ea4693f

github-actions bot added the enhancement and java labels Dec 30, 2024

SaintBacchus mentioned this pull request Dec 30, 2024

Improve spark data source for lance #3260 (Open, 16 tasks)

SaintBacchus added 2 commits December 30, 2024 11:32

format rust

c37bfe9

Merge branch 'main' into SupportOverwriteForSpark

c6dfb51

eddyxu reviewed Dec 31, 2024

SaintBacchus added 2 commits December 31, 2024 15:00

Merge branch 'main' into SupportOverwriteForSpark

1986ee3

Merge branch 'main' into SupportOverwriteForSpark

ba08560

eddyxu approved these changes Jan 1, 2025

eddyxu merged commit 33c45c8 into lancedb:main Jan 1, 2025
8 checks passed
