Cannot recover when any changelog topic partition becomes empty (as a result of some retention policy) #597

cristianmatache opened this issue Dec 29, 2023

Checklist

  • I have included information about relevant versions
  • I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

  • if a topic partition is not empty: low watermark = minimum/earliest available offset, high watermark = maximum/latest available offset + 1
  • if a topic partition is empty: low watermark = high watermark

A changelog topic can become empty as a result of a Kafka cleanup policy (i.e., time/size-based retention).
The case where a topic partition is empty is not handled properly in Faust recovery.

The recovery service needs to replay messages from the low watermark (earliest offset) to the high watermark - 1 (latest offset). Faust does this for both the active and the standby partitions. Afterwards, it runs some consistency checks.
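To make the off-by-one concrete, here is a minimal sketch with made-up watermark values (not Faust code) showing what the `value - 1` adjustment quoted below yields for a non-empty versus an empty partition:

```python
from typing import Optional


def faust_adjusted_highwater(high: Optional[int]) -> int:
    # Mirrors the `value - 1 if value is not None else -1` logic quoted below.
    return high - 1 if high is not None else -1


# Non-empty partition: offsets 3..9 are fetchable, low=3, high=10.
low, high = 3, 10
assert faust_adjusted_highwater(high) == 9  # a real, fetchable offset

# Empty partition (all records deleted by retention): low == high == 10.
low, high = 10, 10
assert faust_adjusted_highwater(high) == 9  # offset 9 no longer exists
```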

Active partitions

Let's start with the active partitions:

  • Building highwaters for active partitions:

    ```python
    self.log.dev("Build highwaters for active partitions")
    await self._wait(
        T(self._build_highwaters)(
            consumer, assigned_active_tps, active_highwaters, "active"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )
    ```

    where `_build_highwaters` starts with:

    ```python
    async def _build_highwaters(
        self, consumer: ConsumerT, tps: Set[TP], destination: Counter[TP], title: str
    ) -> None:
        # -- Build highwater
        highwaters = await consumer.highwaters(*tps)
        highwaters = {
            # FIXME the -1 here is because of the way we commit offsets
            tp: value - 1 if value is not None else -1
            for tp, value in highwaters.items()
        }
    ```

    If the partition is empty, `high - 1` does not exist and the recovery will fail. There is even a FIXME in the highwater-building code. In my opinion, it would be better to also fetch the low watermarks and use `-1 if high is None or low == high else high - 1` (see the sketch after this list).
  • Building earliest offsets for active partitions:

    ```python
    await self._wait(
        T(self._build_offsets)(
            consumer, assigned_active_tps, active_offsets, "active"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )
    ```

    Similarly, if the topic partition is empty, `low - 1` (offset `low` after the `+1` adjustment) would not exist. `_build_offsets` starts with:

    ```python
    async def _build_offsets(
        self, consumer: ConsumerT, tps: Set[TP], destination: Counter[TP], title: str
    ) -> None:
        # -- Update offsets
        # Offsets may have been compacted, need to get to the recent ones
        earliest = await consumer.earliest_offsets(*tps)
        # FIXME To be consistent with the offset -1 logic
        earliest = {
            tp: offset - 1 if offset is not None else None
            for tp, offset in earliest.items()
        }
    ```

    In my opinion, the stored value could be `None if offset is None else min(offset, highwaters.get(tp, offset))` (see the sketch after this list).
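A minimal sketch of the two adjustments suggested above (the helper names and the explicit `low`/`high` arguments are mine, not Faust's; this is an illustration, not a patch):

```python
from typing import Mapping, Optional, Tuple

TP = Tuple[str, int]  # (topic, partition) stand-in for faust.types.TP


def adjusted_highwater(low: Optional[int], high: Optional[int]) -> int:
    # An empty partition has low == high, so `high - 1` is not a real offset;
    # report -1 ("nothing to recover") in that case as well.
    return -1 if high is None or low == high else high - 1


def adjusted_earliest(
    tp: TP, offset: Optional[int], highwaters: Mapping[TP, int]
) -> Optional[int]:
    # Cap the stored earliest offset at the already-adjusted highwater so an
    # empty partition cannot produce an earliest offset past the highwater.
    return None if offset is None else min(offset, highwaters.get(tp, offset))
```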

Standby partitions

Moreover, recovering standby partitions has a separate issue in the consistency checks. First, let's look at the sequence of steps for the active partitions so that we can draw a parallel.

Active:

  • Find the latest/max offsets, find the min/earliest offsets, run the consistency checks, and then seek to the offsets:

    ```python
    self.log.dev("Build highwaters for active partitions")
    await self._wait(
        T(self._build_highwaters)(
            consumer, assigned_active_tps, active_highwaters, "active"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )
    self.log.dev("Build offsets for active partitions")
    await self._wait(
        T(self._build_offsets)(
            consumer, assigned_active_tps, active_offsets, "active"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )
    if self.app.conf.recovery_consistency_check:
        for tp in assigned_active_tps:
            if (
                active_offsets[tp]
                and active_highwaters[tp]
                and active_offsets[tp] > active_highwaters[tp]
            ):
                raise ConsistencyError(
                    E_PERSISTED_OFFSET.format(
                        tp,
                        active_offsets[tp],
                        active_highwaters[tp],
                    ),
                )

    self.log.dev("Seek offsets for active partitions")
    await self._wait(
        T(self._seek_offsets)(
            consumer, assigned_active_tps, active_offsets, "active"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )
    ```
Standby:

  • Find the min/earliest offsets, seek to those offsets, find the max/latest offsets, and then run the consistency checks:

    ```python
    self.log.dev("Build offsets for standby partitions")
    await self._wait(
        T(self._build_offsets)(
            consumer, assigned_standby_tps, standby_offsets, "standby"
        ),
        timeout=self.app.conf.broker_request_timeout,
    )

    if standby_tps:
        self.log.info("Starting standby partitions...")
        self.log.dev("Seek standby offsets")
        await self._wait(
            T(self._seek_offsets)(
                consumer, standby_tps, standby_offsets, "standby"
            ),
            timeout=self.app.conf.broker_request_timeout,
        )
        self.log.dev("Build standby highwaters")
        await self._wait(
            T(self._build_highwaters)(
                consumer,
                standby_tps,
                standby_highwaters,
                "standby",
            ),
            timeout=self.app.conf.broker_request_timeout,
        )
        if self.app.conf.recovery_consistency_check:
            for tp in standby_tps:
                if (
                    standby_offsets[tp]
                    and standby_highwaters[tp]
                    and standby_offsets[tp] > standby_highwaters[tp]
                ):
                    raise ConsistencyError(
                        E_PERSISTED_OFFSET.format(
                            tp,
                            standby_offsets[tp],
                            standby_highwaters[tp],
                        ),
                    )
    ```

The problem is that, after seeking, the offsets may be updated asynchronously, so by the time the consistency checks run they may no longer hold.
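To illustrate the race, here is a minimal, self-contained sketch with made-up offsets (my own illustration, not Faust code, and not a fix the project has agreed on): checking against a snapshot of the offsets taken at seek time stays deterministic, while checking the live mapping can fail spuriously once it has been updated concurrently.

```python
from typing import Dict, Optional, Tuple

TP = Tuple[str, int]  # (topic, partition) stand-in for faust.types.TP


def check_consistency(
    offsets: Dict[TP, Optional[int]], highwaters: Dict[TP, Optional[int]]
) -> None:
    # Same shape as the standby check above: a persisted offset beyond the
    # highwater is treated as inconsistent.
    for tp, offset in offsets.items():
        highwater = highwaters.get(tp)
        if offset and highwater and offset > highwater:
            raise ValueError(f"persisted offset {offset} > highwater {highwater} for {tp}")


standby_offsets: Dict[TP, Optional[int]] = {("table-changelog", 0): 41}
snapshot = dict(standby_offsets)  # values actually used for the seek

# ... seek + build highwaters happen here; meanwhile the live mapping moves on ...
standby_offsets[("table-changelog", 0)] = 58  # concurrent update after seeking
standby_highwaters: Dict[TP, Optional[int]] = {("table-changelog", 0): 50}

check_consistency(snapshot, standby_highwaters)  # passes: 41 <= 50
try:
    check_consistency(standby_offsets, standby_highwaters)
except ValueError as exc:
    print(f"live mapping fails the check: {exc}")  # spurious failure: 58 > 50
```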
