Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: unexpected ID string #5

Open
apalazzi opened this issue Jan 11, 2021 · 15 comments
Open

panic: unexpected ID string #5

apalazzi opened this issue Jan 11, 2021 · 15 comments

Comments

@apalazzi
Copy link

I'm trying to restore some corrupted backup files and since par2cmd doesn't seem to work (see this bug report ) I've tried par2go, however the process fails with an error:

andrea@marcopolo:~/par2-repair$ ~/go/bin/par v /mnt/casa/storage/1/Full-0005
[0] Loading volume file "/mnt/casa/storage/1/Full-0005" failed: unexpected ID string
panic: unexpected ID string

goroutine 1 [running]:
main.main()
	/home/andrea/go/src/github.com/akalin/gopar/cmd/par/main.go:426 +0xf7c
@akalin
Copy link
Owner

akalin commented Jan 11, 2021

Can you pass in the path to the .par2 file directly, e.g. ~/go/bin/par v /mnt/casa/storage/1/Full-0005.par2? Currently the par1 vs. par2 detection is very dumb and will treat it as par1 unless the file ends in .par2 (which is what's causing the error).

I'd be curious to see if gopar runs into a similar problem as par2cmdline did!

@apalazzi
Copy link
Author

I've run a test and the result is the same as with par2cmdline: the file is not repaired and, to be more precise, the "repaired" file is the same as the original Full-0005 file.

Here is the (cut down) log of the repair operation:

Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=0, byte count=9663672
[1] Loaded volume file "Full-0005.vol000+001.par2"
Loaded recovery packet: exponent=1, byte count=9663672
Loaded recovery packet: exponent=2, byte count=9663672
[2] Loaded volume file "Full-0005.vol001+002.par2"
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[3] Loaded volume file "Full-0005.vol003+004.par2"
Loaded recovery packet: exponent=7, byte count=9663672
Loaded recovery packet: exponent=8, byte count=9663672
Loaded recovery packet: exponent=9, byte count=9663672
Loaded recovery packet: exponent=10, byte count=9663672
Loaded recovery packet: exponent=11, byte count=9663672
[.....]
Loaded recovery packet: exponent=105, byte count=9663672
Loaded recovery packet: exponent=106, byte count=9663672
Loaded recovery packet: exponent=107, byte count=9663672
Loaded recovery packet: exponent=108, byte count=9663672
Loaded recovery packet: exponent=109, byte count=9663672
Loaded recovery packet: exponent=110, byte count=9663672

Is the log supposed to end like this, without a message of "repair terminated" or something?

@akalin
Copy link
Owner

akalin commented Jan 12, 2021

Hmm, the I/O for gopar is quite unoptimized, since it just loads everything into memory, so it's entirely possible it's just running out of memory and getting killed. I'll take a closer look...

@akalin
Copy link
Owner

akalin commented Jan 12, 2021

One thing you can try: it seems it's running out of memory when trying to load all of the recovery packets, but you only need one. If you move all but one or two recovery file out of the way, perhaps it can make further progress?

@apalazzi
Copy link
Author

I noticed that it needs a lot of memory ;) but if it runs out of memory usually either the process is killed and the system reports it, or the system hangs. I'll try with only one file and see what happens.

@apalazzi
Copy link
Author

Here are the results of the test with only one file:

Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[1] Loaded volume file "Full-0005.vol003+004.par2"
Repair error: hash mismatch in reconstructed data

I've also tried with vol000+001 and vol001+2 with the same result.
Does that tells you something?

@akalin
Copy link
Owner

akalin commented Jan 12, 2021

Hunh, interesting. So par2 stores the md5 hash of every file in the set, and that error is returned whenever the md5 of the reconstructed file still doesn't match the stored hash. This seems consistent with the behavior of par2cmdline in the other bug.

In this case, gopar refuses to write out the reconstructed file, whereas it looks like par2cmdline writes it out anyway.

I could perhaps add more output saying what the expected vs. actual hashes are, and also maybe add a switch to force writing out the reconstructed file even if the hashes don't match. Then you can perhaps compare the output of gopar vs. the output of par2cmdline to see if they match (which I expect), and that would mostly rule out a bug in par2cmdline's reconstruction. What do you think?

One more question: did you run gopar on the original corrupted file, or on the output of par2cmdline's reconstruction?

@apalazzi
Copy link
Author

Yes, more output is welcome :) also, if I understand correctly, gopar overwrites the original file with the corrected data: maybe you could make it write a completely new file instead, that way we can rule out any filesystem issue in overwriting the file.

PS: the original file is identical to the one reconstructed by par2cmdline (same md5sum).

@akalin
Copy link
Owner

akalin commented Jan 12, 2021

Ok, I added some more output -- I put it in the issue5-debug branch, so please compile from that. I also made it so it writes files to <filename>.recovered.

Oh, so par2cmdline wrote out a new repaired file, but it's identical to the original file? Interesting...one theory that came to mind is this sequence of events during the creation of the par2 files:

  1. par2cmdline scans Full-0005 to compute its MD5 hash and the other block checksums that par2 computes.
  2. Something else comes along and modifies the block between bytes 18148376016 to 18158039687.
  3. par2cmdline computes the recovery data, but using the modified file.
  4. par2cmdline writes out the par2 data with the old hashes but the new recovery data.

Then any subsequent run of par2cmdline/gopar would detect the corrupted block, but recovery would simply reconstruct the corrupted block.

Anyway, that's just a theory, but it seems to fit with all the observed behaviors. Unfortunately, if that's the explanation I don't think there's any practical way to recover the original data without knowing the structure of the file. :( I'll sleep on it and see if I can come up with another explanation that's more hopeful...

@apalazzi
Copy link
Author

par2cmdline overwrites the original file, but in the end the repaired file is not changed. I'm not sure I understood your theory. Note also that this is not the only file where recovery fails.

Anyway, here is the results: please check also the preliminary steps to confirm I'm correctly running all the modified sources:

andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git pull
warning: L'esecuzione di un pull senza specificare come riconciliare branch
divergenti non è consigliata. È possibile sopprimere questo messaggio
eseguendo uno dei seguenti comandi prima di eseguire il prossimo pull:

  git config pull.rebase false  # merge (strategia predefinita)
  git config pull.rebase true   # rebase
  git config pull.ff only       # esegui solo fast forward

Puoi sostituire "git config" con "git config --global" per impostare una
preferenza predefinita per tutti i repository. Puoi anche passare gli
argomenti --rebase, --no-rebase o --ff-only sulla riga di comando per
eseguire l'override del valore predefinito configurato per una singola
invocazione.

remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 28 (delta 12), reused 22 (delta 12), pack-reused 0
Decompressione degli oggetti in corso: 100% (28/28), 22.64 KiB | 526.00 KiB/s, fatto.
Da https://github.com/akalin/gopar
   93c63e8..e638488  master       -> origin/master
 * [nuovo branch]    issue5-debug -> origin/issue5-debug
Aggiornamento di 93c63e8..e638488
Fast-forward
 cmd/par/main.go      | 59 ++++++++++++++++++++++++++++++++++++++---------------------
 par1/decoder.go      | 31 ++++++++++++-------------------
 par1/decoder_test.go | 24 ++++++++++++------------
 par1/encoder_test.go |  4 ++--
 par2/decoder.go      | 57 ++++++++++++++++++++++-----------------------------------
 par2/decoder_test.go | 24 ++++++++++++------------
 par2/encoder_test.go |  4 ++--
 rsec16/coder.go      | 46 ++++++++++++++++++++++++++++++++++++++++------
 rsec16/coder_test.go | 28 +++++++++++++++++++++-------
 9 files changed, 161 insertions(+), 116 deletions(-)
andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git checkout issue5-debug
Branch 'issue5-debug' impostato per tracciare il branch remoto 'issue5-debug' da 'origin'.
Si è passati a un nuovo branch 'issue5-debug'

andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ go run github.com/akalin/gopar/cmd/par r Full-0005.par2 
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd): 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (a1ceb53101e86333edc69c8e057f5f9e)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[1] Loaded volume file "Full-0005.vol003+004.par2"
DEBUG: hash mismatch error for file Full-0005, 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (27b03b2874e1456da3745bc38c7bf772)[1/1] Wrote data file "Full-0005.recovered" (19327343417 bytes)
Repaired files: [Full-0005]

andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005
a1ceb53101e86333edc69c8e057f5f9e  Full-0005
andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005.recovered 
27b03b2874e1456da3745bc38c7bf772  Full-0005.recovered

@mdnahas
Copy link

mdnahas commented Jan 12, 2021

Hi akalin, I'm Mike Nahas, the designer of the PAR2 spec.

So, apallazi, I think what akalin was saying is: Is it possible that files were changing while generating the original par2 files? If they were, different versions of the input files might have been read at different times. The file data is used (1) to generate md5sums for each file and (2) to generate the recovery data. If those reads were different, the recovery data may be generating a file that doesn't match the md5sum for it.

BTW, if the data was compressed, most compression/archiving programs (ZIP, TAR, etc.) can still regenerate most of the files in them. So, hopefully, you haven't lost all your data. (I know that is small comfort after you used PAR2 just to avoid that issue. Sorry.)

Mike

@akalin
Copy link
Owner

akalin commented Jan 13, 2021

@apalazzi that looks good. But yeah, unless you have any other ideas for things to try, the current theory of Full-0005 being modified during par2 generation seems most likely. Out of curiosity, what format is Full-0005? I wonder if it has some sort of integrity checking itself...

@mdnahas one weirdness is that gopar writes out a new file with a different md5, whereas it sounds like par2cmdline writes out a repaired file that's the same as the original one. Is that expected? Does par2cmdline do something different if it detects that the recovered file doesn't match the hash?

@apalazzi
Copy link
Author

@mdnahas Don't worry, the data should be safe (but see below), I was just double-checking if everything was ok with the backup.
@akalin the file is a bareos backup file, it has some integrity check but not date recovery capabilities.

I've recently discovered that one of the ram modules was bad; even though the backup dates back to 2017, if the ram was already bad at the time that would explain the issue. If this is the case I guess it's impossible to recover the corrupted files... or is there a way to still recover the data?

@mdnahas
Copy link

mdnahas commented Jan 14, 2021

@apalazzi If the memory was unreliable, it would be impossible to recover the data.

@akalin I'm just a designer of the spec. I don't know much about how par2cmdline is supposed to operate.

@akalin
Copy link
Owner

akalin commented Jan 16, 2021

@apalazzi would it be possible to run those integrity checks on the various versions of the file? I'd be interested to see if any of them pass, and if some have more failures than others...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants