Redesign seed sequencer #117

folbricht · 2019-08-12T22:42:11Z

When running extract with seeds, there are a couple of issues that look unrelated but originate in the same piece of code. Fixing them requires rewriting the code that determines which seed should be used.

The current algorithm tries to find the seed with the longest sequence of matching chunks, regardless of whether that sequence is aligned and can be reflinked. Ideally, the algorithm should always use "clone-able" ranges first and consider "copy-able" ranges only once no clone-able ones remain. This would give the best storage efficiency and performance.
When a seed is corrupt (the hash of a chunk doesn't match), the whole process fails. It would be much better either to evict the whole seed and carry on without it, or to keep going and fetch just the damaged chunks by other means (other seeds or the chunk store).
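To make the two points above concrete, here is a minimal sketch of the selection rule, not the actual sequencer code. The `Match` type, the `pickMatch` function and the `invalid` set are all hypothetical names made up for illustration. It assumes the candidate runs for the current target position have already been computed, and that a seed gets added to `invalid` once one of its chunks fails hash validation.

```go
package main

import "fmt"

// Match is a hypothetical description of a run of target chunks that one
// seed can supply, starting at the current target position.
type Match struct {
	Seed      string // name of the seed offering this run
	Chunks    int    // number of consecutive matching chunks
	Cloneable bool   // true if the run is aligned and can be reflinked
}

// pickMatch prefers the longest clone-able run; only when no usable seed
// offers one does it fall back to the longest copy-able run. Seeds listed
// in invalid (for example after a failed chunk hash check) are skipped.
// The bool result is false when no seed can help and the chunk has to
// come from the chunk store instead.
func pickMatch(candidates []Match, invalid map[string]bool) (Match, bool) {
	var best Match
	found := false
	for _, c := range candidates {
		if invalid[c.Seed] || c.Chunks == 0 {
			continue
		}
		switch {
		case !found:
			best, found = c, true
		case c.Cloneable != best.Cloneable:
			// A clone-able run always beats a copy-able one.
			if c.Cloneable {
				best = c
			}
		case c.Chunks > best.Chunks:
			// Same kind of run: the longer one wins.
			best = c
		}
	}
	return best, found
}

func main() {
	candidates := []Match{
		{Seed: "a", Chunks: 10, Cloneable: false},
		{Seed: "b", Chunks: 3, Cloneable: true},
		{Seed: "c", Chunks: 5, Cloneable: true},
	}
	// Seed "c" is treated as corrupt here and therefore ignored.
	m, ok := pickMatch(candidates, map[string]bool{"c": true})
	fmt.Println(m.Seed, m.Chunks, m.Cloneable, ok)
}
```

Note that the short clone-able run from seed "b" wins over the longer copy-able run from seed "a", which is the opposite of what a pure longest-match rule would choose.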

Getting both of those right while doing it all concurrently may require additional complexity. The self-seed, where chunks from earlier sections of the target file are copied or cloned to later sections, may need to be dropped and replaced with a static execution plan calculated beforehand rather than during execution.
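A rough idea of what "a static execution plan calculated beforehand" could look like, as a sketch only. `Step` and `buildPlan` are hypothetical names, chunk IDs are plain strings rather than real hashes, and each seed is reduced to a set of the chunk IDs it contains. The sketch also ignores alignment and the position of chunks inside the seed, which a real plan would need to track.

```go
package main

import "fmt"

// Step is one entry of a hypothetical precomputed plan: target chunks
// First..Last (inclusive) are filled from Source, which is either a seed
// name or "store" for the chunk store.
type Step struct {
	Source      string
	First, Last int
}

// buildPlan walks the target's chunk IDs once, before any data is written,
// and assigns every chunk to the first seed (in seedNames order) that
// contains it, or to the store if none does. Adjacent chunks with the same
// source are merged into a single step.
func buildPlan(target []string, seedNames []string, seeds map[string]map[string]bool) []Step {
	var plan []Step
	for i, id := range target {
		src := "store"
		for _, name := range seedNames {
			if seeds[name][id] {
				src = name
				break
			}
		}
		if n := len(plan); n > 0 && plan[n-1].Source == src {
			plan[n-1].Last = i // extend the previous step
			continue
		}
		plan = append(plan, Step{Source: src, First: i, Last: i})
	}
	return plan
}

func main() {
	target := []string{"c1", "c2", "c3", "c4"}
	seeds := map[string]map[string]bool{
		"seedA": {"c1": true, "c2": true},
	}
	for _, s := range buildPlan(target, []string{"seedA"}, seeds) {
		fmt.Printf("chunks %d-%d from %s\n", s.First, s.Last, s.Source)
	}
}
```

Because the whole plan exists before execution starts, the steps could be handed to concurrent workers without the sequencer having to make decisions while data is being written. If a seed turns out to be invalid, the plan could be rebuilt with that seed left out of `seedNames`.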


If we have a seed index for a file/device that is corrupted, we should try harder to continue anyway by discarding the invalid seed and falling back to any other seeds and/or the store. Partially addresses folbricht#117 Signed-off-by: Ludovico de Nittis <[email protected]>

folbricht added the enhancement label Aug 12, 2019

RyuzakiKK mentioned this issue Dec 7, 2021

Add option to discard invalid seeds #203

Merged
