Pagination for QueryTableChanges #354

charlenelyu-db · 2023-07-24T22:22:42Z

- Part 1: Pagination for QueryTable at snapshot
- Part 2: Pagination for QueryTable from starting_version
  - Part 3: Pagination for QueryTableChanges
    - Part 4: Client change

Support pagination for QueryTableChanges in the reference server.

This is part 3 of issue #351.

server/src/main/scala/io/delta/standalone/internal/DeltaSharedTableLoader.scala

chakankardb · 2023-07-28T03:06:11Z

+            // Return early if we already have enough files in the current page
+            if (pageSizeOpt.contains(numSignedFiles)) {
+              actions.append(tokenGenerator(v, idx))
+              return actions.toSeq


Question: It seems that we first generate all the changes and then apply pagination, so pagination will not save any processing on the server side, right? Is the same true for the DS server in UC as well? @linzhou-db

We do save some work: the start version is taken from the startingVersion in the page token, so we won't unpack Delta logs for versions that were already processed in previous pages.
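To make the resume behavior concrete, here is a minimal, self-contained sketch of token-based pagination with an early return. `PageToken`, `Page`, `PaginationSketch.paginate`, and the in-memory `changes` map are hypothetical names chosen for illustration; they are not the reference server's actual types, and the map simply stands in for changes that have already been unpacked.

```scala
// Hypothetical types for illustration only; not the reference server's API.
final case class PageToken(startingVersion: Long, startingIndex: Int)
final case class Page(files: Seq[String], nextToken: Option[PageToken])

object PaginationSketch {
  /** Returns at most `pageSize` files, resuming from `token` if one is given. */
  def paginate(
      changes: Map[Long, Seq[String]], // version -> files changed in that version
      startingVersion: Long,
      endingVersion: Long,
      pageSize: Int,
      token: Option[PageToken]): Page = {
    require(pageSize > 0, "pageSize must be positive")
    // Resume from the token so versions served in earlier pages are skipped.
    val startVersion = token.map(_.startingVersion).getOrElse(startingVersion)
    val startIndex = token.map(_.startingIndex).getOrElse(0)
    val files = Seq.newBuilder[String]
    var numFiles = 0
    var v = startVersion
    while (v <= endingVersion) {
      val versionFiles = changes.getOrElse(v, Seq.empty)
      // Only the first version of a resumed page starts mid-version.
      var idx = if (v == startVersion) startIndex else 0
      while (idx < versionFiles.length) {
        if (numFiles == pageSize) {
          // Page is full: return early with a token pointing at the next file.
          return Page(files.result(), Some(PageToken(v, idx)))
        }
        files += versionFiles(idx)
        numFiles += 1
        idx += 1
      }
      v += 1
    }
    // Reached endingVersion without overflowing the page: no further pages.
    Page(files.result(), None)
  }
}
```

A caller would pass the returned `nextToken` back in on the next request and stop once it is `None`.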

However, we still generate all changes up to the endingVersion each time. I guess this part can be optimized in UC. Two options I can think of:

- We could push the actionListener down into getChanges(), so that we read logs and process actions together and can return partway through.
- We keep the interface of getChanges() unchanged, call it to unpack one version at a time, and process version by version.

WDYT?
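As a rough illustration of the second option (unpacking one version at a time), the sketch below uses a lazy iterator so that a version is only loaded when the page still has room. `LazyChangesSketch`, `firstPage`, and the `loadVersion` callback are hypothetical stand-ins, not the actual getChanges() interface, and resuming from a token is left out to keep the example short.

```scala
// Hypothetical sketch of version-by-version processing; names are illustrative.
object LazyChangesSketch {
  /**
   * Collects up to `pageSize` files, calling `loadVersion` only for the
   * versions needed to fill the page. Returns the files together with the
   * (version, index) position to resume from, if more files remain.
   */
  def firstPage(
      startingVersion: Long,
      endingVersion: Long,
      pageSize: Int,
      loadVersion: Long => Seq[String]): (Seq[String], Option[(Long, Int)]) = {
    require(pageSize > 0, "pageSize must be positive")
    // The iterator is lazy: a version is unpacked only when iteration reaches it.
    val it = (startingVersion to endingVersion).iterator.flatMap { v =>
      loadVersion(v).iterator.zipWithIndex.map { case (file, idx) => (v, idx, file) }
    }.buffered
    val files = Seq.newBuilder[String]
    var n = 0
    while (n < pageSize && it.hasNext) {
      files += it.next()._3
      n += 1
    }
    // Peeking may unpack the following version(s) when the current one is
    // exhausted; that is needed to know whether another page exists.
    val resume = if (it.hasNext) {
      val (v, idx, _) = it.head
      Some((v, idx))
    } else None
    (files.result(), resume)
  }
}
```

With a page size smaller than the first version's file count, only that first version is ever loaded, which is the saving the option is after.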

I think both options should work; the first one is similar to what we do for a normal query, so it may be preferable for consistency.
In the future, we will likely process JSON files in parallel as well (similar to checkpoint parquet).

Let's open a ticket for now to address this in UC.

Let's have a separate discussion about how to handle this in UC.

chakankardb

Perhaps wait for Lin's review as well.

This was referenced Jul 24, 2023

Pagination for QueryTable at snapshot #352

Merged

Pagination for QueryTable from starting_version #353

Merged

charlenelyu-db requested review from linzhou-db and chakankardb and removed request for linzhou-db July 24, 2023 22:29

charlenelyu-db force-pushed the SC-135923-3 branch from 31c8e18 to 388be39 Compare July 25, 2023 21:54

charlenelyu-db mentioned this pull request Jul 26, 2023

Query table pagination in Delta Sharing client #356

Merged

charlenelyu-db force-pushed the SC-135923-3 branch from 388be39 to ecd6bf4 Compare July 28, 2023 01:31

chakankardb reviewed Jul 28, 2023

pagination for queryTableChanges

c7271be

charlenelyu-db force-pushed the SC-135923-3 branch from ecd6bf4 to c7271be Compare July 28, 2023 20:06

charlenelyu-db requested a review from chakankardb July 28, 2023 20:11

chakankardb approved these changes Jul 28, 2023

linzhou-db approved these changes Jul 31, 2023

charlenelyu-db merged commit 7a5523c into delta-io:main Jul 31, 2023
4 checks passed

This was referenced Jul 31, 2023

Fix failing python unit tests #360

Merged

Introduce EndStreamAction and return minUrlExpirationTimestamp for paginated request #362

Merged

Fix tests in DeltaSharingRestClientDeltaSuite #366

Merged
