Checklist
- I have included information about relevant versions
- I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce
- If a topic partition is not empty: low watermark = minimum/earliest available offset, and high watermark = maximum/latest available offset + 1.
- If a topic partition is empty: low watermark = high watermark.
A changelog topic can become empty as a result of a Kafka cleanup policy (e.g., time- or size-based retention).
The case where the topic is empty is not handled properly in Faust recovery.
The recovery service needs to replay messages from the low watermark (earliest offset) to the high watermark - 1 (latest offset). Faust does this for both the active and the standby partitions. Afterwards, it runs some consistency checks.
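A minimal sketch of the replay range implied by the watermark definitions above, in plain Python with no Faust or Kafka client involved; `replay_range` is a hypothetical helper, not part of Faust. It shows that once retention has emptied a partition, low == high, so `high - 1` no longer points at any record.

```python
from typing import Optional, Tuple


def replay_range(low: int, high: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) offsets to replay, or None if empty.

    low:  low watermark, i.e. the earliest available offset
    high: high watermark, i.e. the latest available offset + 1
    """
    if low > high:
        raise ValueError(f"low watermark {low} is above high watermark {high}")
    if low == high:
        # Empty partition (e.g. every record was removed by retention):
        # there is no record at high - 1, so there is nothing to replay.
        return None
    return low, high - 1


print(replay_range(3, 8))    # (3, 7)
print(replay_range(42, 42))  # None: empty, although offsets below 42 once existed
```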
Active partitions
Let's start with the active partitions:
faust/faust/tables/recovery.py, lines 387 to 393 in 6588a97
faust/faust/tables/recovery.py, lines 655 to 664 in 6588a97

```python
return {
    # FIXME the -1 here is because of the way we commit offsets
    tp: value - 1 if value is not None else -1
    for tp, value in highwaters.items()
}
```

If the partition is empty, the offset `high - 1` does not exist, and the recovery will fail. There is even a FIXME in building the highwaters. In my opinion, it would be better to also get the low watermarks and use `-1 if high is None or low == high else high - 1`.

faust/faust/tables/recovery.py, lines 396 to 401 in 6588a97
Similarly, for an empty partition, `low - 1` (offset `low` after the +1 adjustment) would not exist:
faust/faust/tables/recovery.py, lines 703 to 713 in 6588a97

```python
# Offsets may have been compacted, need to get to the recent ones
earliest = await consumer.earliest_offsets(*tps)
# FIXME To be consistent with the offset -1 logic
earliest = {
    tp: offset - 1 if offset is not None else None
    for tp, offset in earliest.items()
}
```

In my opinion, this could be `None if offset is None else min(offset, highwaters.get(offset, offset))`.

To summarize, for active partitions Faust finds the latest/max offsets and the min/earliest offsets, runs the consistency checks, and then seeks to the offset.
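A minimal sketch of the two proposals for active partitions (the highwater rule and the earliest-offset rule), written as plain functions over dictionaries rather than Faust's actual recovery code. The function names are hypothetical, partitions are modeled as `(topic, partition)` tuples, and one assumption is made explicit: the issue writes `highwaters.get(offset, offset)`, but since highwaters are keyed by partition, the sketch looks them up by `tp`.

```python
from typing import Dict, Hashable, Optional

# Partitions are modeled as hashable keys such as ("changelog", 0);
# Faust's own TP type is not used here.
Partition = Hashable


def proposed_highwaters(
    lows: Dict[Partition, Optional[int]],
    highs: Dict[Partition, Optional[int]],
) -> Dict[Partition, int]:
    """Last offset to replay per partition; -1 means nothing to replay."""
    return {
        # Empty partition (low == high) or unknown highwater -> -1.
        tp: -1 if high is None or lows.get(tp) == high else high - 1
        for tp, high in highs.items()
    }


def proposed_earliest(
    earliest: Dict[Partition, Optional[int]],
    highwaters: Dict[Partition, int],
) -> Dict[Partition, Optional[int]]:
    """Earliest offsets clamped so they never exceed the highwater.

    Assumption: the highwater lookup is keyed by partition (tp).
    """
    return {
        tp: None if offset is None else min(offset, highwaters.get(tp, offset))
        for tp, offset in earliest.items()
    }


lows = {("changelog", 0): 5, ("changelog", 1): 100}
highs = {("changelog", 0): 10, ("changelog", 1): 100}  # partition 1 is empty
hw = proposed_highwaters(lows, highs)
print(hw)                           # {('changelog', 0): 9, ('changelog', 1): -1}
print(proposed_earliest(lows, hw))  # {('changelog', 0): 5, ('changelog', 1): -1}
```

Under these assumptions, the emptied partition yields -1 for both values rather than an offset that no longer exists, while the non-empty partition keeps its usual range.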
Standby partitions
Moreover, recovering standby partitions has a separate issue in the consistency checks. First, let's look at the sequence of steps for active partitions so that we can draw a parallel.
Active:
faust/faust/tables/recovery.py, lines 387 to 404 in 6588a97
faust/faust/tables/recovery.py, lines 425 to 430 in 6588a97
Standby:
faust/faust/tables/recovery.py, lines 417 to 423 in 6588a97
faust/faust/tables/recovery.py, lines 492 to 526 in 6588a97
The problem is that, after seeking, the offsets may be updated asynchronously, so by the time the consistency checks run, they may no longer hold.
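A toy asyncio model of the ordering problem described above. It is not Faust code: `check_offsets`, `background_fetch`, `recover`, and `ConsistencyError` are hypothetical names. It assumes the check compares each offset against a highwater snapshot fetched before seeking, and that background fetching keeps advancing the offset after a seek (for example because new records arrived after the snapshot). Checking before seeking, as in the active sequence, reads a value that has not moved yet; checking after seeking, as in the standby sequence, can see an offset that has already moved past the snapshot.

```python
import asyncio
from typing import Dict


class ConsistencyError(Exception):
    """Hypothetical stand-in for the error raised when a check fails."""


def check_offsets(offsets: Dict[int, int], highwaters: Dict[int, int]) -> None:
    # Assumed shape of the check: an offset must not exceed the highwater
    # snapshot that was fetched before seeking.
    for tp, offset in offsets.items():
        if offset > highwaters[tp]:
            raise ConsistencyError(f"partition {tp}: {offset} > {highwaters[tp]}")


async def background_fetch(offsets: Dict[int, int], tp: int, records: int) -> None:
    # Simulates the consumer advancing the offset asynchronously after a seek.
    for _ in range(records):
        await asyncio.sleep(0)
        offsets[tp] += 1


async def recover(check_before_seek: bool) -> bool:
    """Return True if the consistency check passed, False if it raised."""
    highwaters = {0: 9}  # snapshot fetched before seeking
    offsets = {0: 9}     # local offset already matches the snapshot
    try:
        if check_before_seek:  # active-style order: check, then seek
            check_offsets(offsets, highwaters)
        fetch = asyncio.create_task(background_fetch(offsets, 0, records=3))
        await fetch  # worst-case timing: all background updates land first
        if not check_before_seek:  # standby-style order: seek, then check
            check_offsets(offsets, highwaters)
    except ConsistencyError:
        return False
    return True


print(asyncio.run(recover(check_before_seek=True)))   # True
print(asyncio.run(recover(check_before_seek=False)))  # False
```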