When running `extract` with seeds, two issues show up that look unrelated but originate in the same piece of code. Fixing them requires rewriting the code that decides which seed to use.
The current algorithm tries to find the seed with the longest sequence of matching chunks, regardless of whether that sequence is aligned and can be reflinked. Ideally, the algorithm should always use "clone-able" ranges first and consider "copy-able" ranges only once no clone-able ones remain. This would give the best storage efficiency and performance.
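The ordering described above could be sketched roughly as follows. This is not the project's code: `candidate`, `cloneable` and `pickCandidate` are hypothetical names, and the sketch assumes a run counts as clone-able only when its source offset, destination offset and length are all multiples of the filesystem block size.

```go
package main

// candidate is a hypothetical description of a run of consecutive chunks
// that one seed could supply for the target.
type candidate struct {
	seed      string // label of the seed offering this run
	srcOffset uint64 // byte offset of the run in the seed
	dstOffset uint64 // byte offset of the run in the target
	length    uint64 // byte length of the run
}

// cloneable reports whether the run could be reflinked under the assumed
// rule: both offsets and the length are multiples of the block size.
func (c candidate) cloneable(blockSize uint64) bool {
	return blockSize > 0 &&
		c.srcOffset%blockSize == 0 &&
		c.dstOffset%blockSize == 0 &&
		c.length%blockSize == 0
}

// pickCandidate prefers clone-able runs over copy-able ones and only uses
// length to break ties within the same class. The second return value is
// false when there are no candidates at all.
func pickCandidate(cands []candidate, blockSize uint64) (candidate, bool) {
	var best candidate
	found, bestClone := false, false
	for _, c := range cands {
		clone := c.cloneable(blockSize)
		better := !found ||
			(clone && !bestClone) ||
			(clone == bestClone && c.length > best.length)
		if better {
			best, bestClone, found = c, clone, true
		}
	}
	return best, found
}
```

With a 4096-byte block size, a shorter aligned run would win over a longer unaligned one, which is the opposite of the "longest match wins" behaviour described above.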
When a seed is corrupt (the hash of a chunk doesn't match), the whole process fails. It'd be much better to either evict the whole seed and carry on without it, or to keep going and just get the damaged chunks by other means (other seeds or chunk store).
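A minimal sketch of the "evict and carry on" option, under stated assumptions: the `source` interface, `mapSource`, `fetcher` and `newFetcher` are hypothetical and not taken from the codebase, and SHA-512/256 is used as the chunk hash purely for illustration. A seed that returns data with the wrong hash is marked as evicted and the chunk is fetched from the next seed or from the store.

```go
package main

import (
	"crypto/sha512"
	"errors"
)

// chunkID is the hash of a chunk's content (SHA-512/256 assumed here).
type chunkID [sha512.Size256]byte

func hashOf(b []byte) chunkID { return sha512.Sum512_256(b) }

// source is a hypothetical interface for anything that can supply a chunk,
// either a seed or the chunk store.
type source interface {
	Get(id chunkID) ([]byte, error)
}

// mapSource is an in-memory source, used only to keep the sketch runnable.
type mapSource map[chunkID][]byte

func (m mapSource) Get(id chunkID) ([]byte, error) {
	b, ok := m[id]
	if !ok {
		return nil, errors.New("chunk not found")
	}
	return b, nil
}

// fetcher tries the seeds in order and falls back to the store.
type fetcher struct {
	seeds   []source
	evicted map[int]bool // seeds that returned corrupt data
	store   source
}

func newFetcher(seeds []source, store source) *fetcher {
	return &fetcher{seeds: seeds, evicted: make(map[int]bool), store: store}
}

// fetch returns verified chunk data. A hash mismatch evicts the offending
// seed instead of failing the whole operation.
func (f *fetcher) fetch(id chunkID) ([]byte, error) {
	for i, s := range f.seeds {
		if f.evicted[i] {
			continue
		}
		b, err := s.Get(id)
		if err != nil {
			continue // this seed does not have the chunk
		}
		if hashOf(b) == id {
			return b, nil
		}
		f.evicted[i] = true // corrupt seed: stop using it
	}
	b, err := f.store.Get(id)
	if err != nil {
		return nil, err
	}
	if hashOf(b) != id {
		return nil, errors.New("chunk from store failed verification")
	}
	return b, nil
}
```

The other option mentioned above, keeping the seed and only sourcing the damaged chunks elsewhere, would amount to dropping the `evicted` bookkeeping and simply continuing to the next source on a mismatch.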
Getting both of those right while doing it all concurrently may require additional complexity. The self-seed, where chunks from earlier sections of the target file are copied/cloned to later sections, may need to be dropped and replaced with a static execution plan calculated beforehand rather than during execution.
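One way such a precomputed plan might look is sketched below. Everything here is an assumption for illustration: `indexChunk`, `step`, `buildPlan` and the string chunk IDs are hypothetical, and the plan keeps copies from earlier target sections as explicit steps with a dependency, which is only one possible reading of "static execution plan".

```go
package main

// indexChunk is a hypothetical entry of the target index.
type indexChunk struct {
	id     string // chunk ID, a plain string here for brevity
	offset uint64 // byte offset in the target
	size   uint64 // byte length of the chunk
}

type stepKind int

const (
	fromStore  stepKind = iota // fetch the chunk from the chunk store
	fromSeed                   // copy or clone the chunk from a seed
	fromTarget                 // copy or clone from an earlier target section
)

// step is one unit of work in the plan. dependsOn is the index of a step
// that must finish first, or -1 if the step can run at any time.
type step struct {
	kind      stepKind
	dst       indexChunk
	srcOffset uint64 // meaningful for fromSeed and fromTarget
	dependsOn int
}

// buildPlan decides the source of every target chunk before any data is
// written. seed maps chunk IDs to their byte offset in a single seed.
func buildPlan(target []indexChunk, seed map[string]uint64) []step {
	plan := make([]step, 0, len(target))
	first := make(map[string]int) // first step that writes each chunk ID
	for i, c := range target {
		s := step{kind: fromStore, dst: c, dependsOn: -1}
		if j, seen := first[c.id]; seen {
			// Repeated chunk: reuse the earlier copy once step j is done.
			s.kind, s.srcOffset, s.dependsOn = fromTarget, target[j].offset, j
		} else {
			first[c.id] = i
			if off, ok := seed[c.id]; ok {
				s.kind, s.srcOffset = fromSeed, off
			}
		}
		plan = append(plan, s)
	}
	return plan
}
```

Because every dependency is known up front, workers could execute independent steps concurrently and only wait where `dependsOn` is set.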
If we have a seed index for a file/device that is corrupted, we should
try harder to continue anyway by discarding the invalid seed and
falling back to any other seeds and/or to the store.
Partially addresses folbricht#117
Signed-off-by: Ludovico de Nittis <[email protected]>