[bug] Creating new parity files: "panic: too many shards" #7

Open
brenthuisman opened this issue Jan 14, 2021 · 9 comments


@brenthuisman
Contributor

For larger files (the threshold is somewhere between 9.2 MB and 77 MB) I consistently get this error when I try to create parity. Looking at memory usage, all files (one is 2.7 GB) seem to be loaded into memory in full. The error seems to come right after loading:

[1/1] Loaded data file "bsc.tar.zst" (578352090 bytes)
panic: too many data shards

goroutine 1 [running]:
main.main()
brenthuisman changed the title from "[bug] panic: too many shards" to "[bug] Creating new parity files: 'panic: too many shards'" on Jan 15, 2021
@akalin
Owner

akalin commented Jan 16, 2021

That error is when the number of data shards is >256, which is a par2 limitation. I suspect it has to do with the default block size not being intelligently picked, but fixed at 2000. Can you try with par2 c -s <n> ... where n is larger than 2000 (but a multiple of 4) and see if that fixes it?
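For a rough sense of the numbers, here's a quick sketch of the arithmetic using the file size from the log above (variable names are just for illustration):

	f_size = 578352090                             # bytes, from the "Loaded data file" line
	blocksize = 2000                               # the fixed default block size
	shards = (f_size + blocksize - 1) // blocksize
	print(shards)                                  # 289177 data shards, far over any par2 limit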

@brenthuisman
Contributor Author

It also happens when -s and -c are not specified. Maybe a better default would be nice?

I'm cracking my head a second time over how par2cmdline calculates blocksize and blockcount from a redundancy level (percentage of protection). I believe the blockcount is always 2000 if you specify a redundancy:

https://github.com/brenthuisman/libpar2/blob/master/src/commandline.cpp#L1099

On the other hand, there's a recoveryblockcount being calculated; is that the one I should set for -c?

https://github.com/brenthuisman/libpar2/blob/master/src/commandline.cpp#L1223

In summary, it boils down to this:

	for file in files:
		filesize = sizeof(file)
		sourceblockcount += (filesize + blocksize - 1) // blocksize
	recoveryblockcount = (sourceblockcount * redundancy + 50) // 100
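As a sketch with concrete numbers (taking the file from the panic above and assuming 10% redundancy for illustration):

	blocksize = 2000
	redundancy = 10                                                   # percent, assumed
	filesize = 578352090
	sourceblockcount = (filesize + blocksize - 1) // blocksize        # 289177
	recoveryblockcount = (sourceblockcount * redundancy + 50) // 100  # 28918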

The remaining question is how to get the blocksize. Assuming blockcount = 2000, I think we almost always end up in this block:

https://github.com/brenthuisman/libpar2/blob/master/src/commandline.cpp#L1151

Unfortunately I'm struggling to follow the code there.

Do you have any idea for a heuristic for -s and -c based on the number of files and filesize (I always use per-file parity, so the only variable is filesize)?

@brenthuisman
Contributor Author

brenthuisman commented Jan 16, 2021

This seems pretty robust and does what I think it should:

	def getblocksizecount(self, filename):  # requires "import os" at module level
		f_size = os.path.getsize(filename)
		blocksize_min = f_size // 2**15  # par2 allows at most 2**15 data shards, so blocksize can never be below this
		blocksize_f = (f_size * self.percentage) // 100  # desired total amount of recovery data
		blockcount_max = 2**7 - 1  # cap blockcount to keep overhead for small files under control
		if f_size < 1e6:
			blockcount_max = 2**3 - 1
		elif f_size < 4e6:
			blockcount_max = 2**4 - 1
		elif f_size < 20e6:
			blockcount_max = 2**5 - 1
		if blocksize_f > blocksize_min:
			try:
				blockcount = min(blockcount_max, blocksize_f // blocksize_min)
				blocksize = blocksize_f / blockcount
			except ZeroDivisionError:  # blocksize_min is 0 for files under 2**15 bytes
				blockcount = 1
				blocksize = blocksize_min
		else:
			blockcount = 1
			blocksize = 4
		blocksize = (blocksize // 4 + 1) * 4  # round up to a multiple of 4
		return int(blocksize), int(blockcount)
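For example, this is roughly how I'd call it and hand the result to gopar (hypothetical usage; self.percentage is the redundancy level):

	blocksize, blockcount = self.getblocksizecount("bsc.tar.zst")
	# then invoke gopar with these values: par2 c -s <blocksize> -c <blockcount> ...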

@akalin
Owner

akalin commented Jan 17, 2021

I'll keep this open as a reminder to calculate the parameters a bit more intelligently. The snippet you posted looks plausible; I assume you're gonna calculate that in your external app and pass that in.

(Also, I misspoke above, the shard limit for par2 is 65536, not 256 (which is par1).)

akalin reopened this Jan 17, 2021
@brenthuisman
Contributor Author

brenthuisman commented Jan 18, 2021

OK, good idea. Indeed, this is what I calculate and pass in. Made a small modification to handle very small files.

The shard limit I found in par2cmdline is 2**15 (~32k), not 65536. I tested this, and gopar also showed a threshold there. Hence the 2**15 in the snippet.
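As a quick sketch of what that limit implies for the file from the original report, the smallest usable block size (rounded up to a multiple of 4) would be:

	import math
	f_size = 578352090
	blocksize = math.ceil(f_size / 2**15)  # 17651
	blocksize += -blocksize % 4            # 17652, the next multiple of 4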

@akalin
Owner

akalin commented Jan 18, 2021

Ah, yes you're right! Forgot there was a smaller limit for data shards.

@brenthuisman
Contributor Author

A nicer place for the snippet would be in gopar's own flag handling, of course, but I didn't do that because I felt that having logic different from par2cmdline's for the -r flag could be confusing. On the other hand, maybe that's taking legacy compatibility a bit too far. What's your opinion on that?

@akalin
Owner

akalin commented Jan 18, 2021

Yeah, I don't think there's any real need to implement par2cmdline's computation exactly -- in fact, it seems pretty ad hoc, and if I think about it for a bit I can probably come up with a more systematic way.

The calculation above is only for single files, right? In general, par2 would have to handle multiple files, which might change things a bit.
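For illustration, one possible multi-file generalization (a hypothetical sketch along the lines of your per-file logic, not a worked-out proposal):

	import math, os

	def estimate_params(filenames, percentage, max_shards=2**15):
		# the shard limit applies to the whole recovery set, so derive the
		# block size from the total amount of data (assumption of this sketch)
		total = sum(os.path.getsize(f) for f in filenames)
		blocksize = max(4, math.ceil(total / max_shards))
		blocksize += -blocksize % 4  # round up to a multiple of 4
		sourceblockcount = sum(
			(os.path.getsize(f) + blocksize - 1) // blocksize for f in filenames)
		# per-file rounding can still push the count slightly over the limit
		# for many tiny files, so a real implementation would have to iterate
		recoveryblockcount = (sourceblockcount * percentage + 50) // 100
		return blocksize, recoveryblockcount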

@brenthuisman
Contributor Author

Correct, this is only for single files, and therefore not ready for inclusion. I think par2cmdline takes the largest file as the basis for a first blockcount estimate, but then there's a loop that converges on something; I'm not sure what, or what the goal is there.

I only work with single file parity (that's the whole idea of par2deep).
