Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ListOffsets returns earliest offset locally, not taking remote into account #14

Open
ivanyu opened this issue Apr 13, 2023 · 5 comments
Assignees
Labels
bug Stale tiered storage Issues related to Tiered Storage

Comments

@ivanyu
Copy link
Member

ivanyu commented Apr 13, 2023

Assuming the beginning of the log is on remote

kcat -b localhost:9092 -C -t topic1 -o beginning -e -f "%o\n" | head

fetches only the locally available records, but

kcat -b localhost:9092 -C -t topic1 -o 0 -e -f "%o\n" | head

fetches records from the remote storage.

This Python

tp = TopicPartition("topic1", 0)
kc.assign([tp])
kc.seek_to_beginning(tp)
r = kc.poll(timeout_ms=1)

fetches from the remote.

The issue is that when we request the earliest offset from the broker (ListOffsets request type / beginning_offsets in Python)), it returns the earliest available locally, i.e. not counting remote.

@ivanyu ivanyu added the tiered storage Issues related to Tiered Storage label Apr 13, 2023
@mdedetrich mdedetrich assigned ivanyu and mdedetrich and unassigned ivanyu May 22, 2023
@mdedetrich
Copy link

So I have created an implementation locally, wasn't too hard but I need to verify whether the returning offsets need to be ordered in some way.

@mdedetrich
Copy link

Here is the patch so far, going to confirm how ordering is meant to work.

Make_ListOffsets_work_with_Tiered_Storage.patch

@HenryCaiHaiying
Copy link

Are we moving forward with the patching? It seems it might be important for some consumers.

@ivanyu
Copy link
Member Author

ivanyu commented Jun 9, 2023

@mdedetrich did the patch work for you? It doesn't for me, the kcat test.

At least log.logStartOffset as a parameter to findOffsetByTimestamp will not produce anything, because log.logStartOffset is always (currently) higher than anything in the remote storage.

Also, based on how oldStyleOffsets is used later, it should always be 0 or 1 element, so I guess ++ won't work.

@mdedetrich
Copy link

Indeed this is correct, I was trying to write a test to validate this and I noticed the same. I was attempting to upstream this first and the core implementation of ListOffsets has changed compared to whats in 3.3.

@ivanyu ivanyu added the bug label Jun 29, 2023
@github-actions github-actions bot added the Stale label Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Stale tiered storage Issues related to Tiered Storage
Projects
None yet
Development

No branches or pull requests

3 participants