Bugfix; incorrect length was used to check for discarded areas #103
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
+ Coverage   66.24%   66.29%   +0.05%
==========================================
  Files           6        6
  Lines        1176     1178       +2
==========================================
+ Hits          779      781       +2
  Misses        278      278
  Partials      119      119
Our issue still seems to persist: attempting to get a larger chunk returns less data than requested (we generated a 100 MB test file for this).
This is expected to return … Running on …
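The commenter's exact snippet did not survive the page extraction. A reproduction along these lines might look roughly like the following sketch, assuming this library's streaming.NewCrypt4GHReader API and headers.DataEditListHeaderPacket layout, and a hypothetical 100 MB test.c4gh file (key loading omitted for brevity):

```go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/neicnordic/crypt4gh/model/headers"
	"github.com/neicnordic/crypt4gh/streaming"
)

func main() {
	f, err := os.Open("test.c4gh") // hypothetical 100 MB encrypted test file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Data edit list: discard the first 10000000 bytes of plaintext,
	// then keep the next 30000000.
	del := headers.DataEditListHeaderPacket{
		PacketType:    headers.PacketType{PacketType: headers.DataEditList},
		NumberLengths: 2,
		Lengths:       []uint64{10000000, 30000000},
	}

	var key [32]byte // reader private key; loading omitted
	r, err := streaming.NewCrypt4GHReader(f, key, &del)
	if err != nil {
		panic(err)
	}

	data, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	// With the bug described in this thread, fewer than the expected
	// 30000000 bytes come back.
	fmt.Println("got", len(data), "bytes")
}
```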
30M, I assume? Is this the exact code that gives that result, or is it something you can share? Similarly, what's the exact size in bytes of the file here (can it be shared easily)? Is this with a separate test program or through sda-download?
The code is the unit test in this PR, but instead of … This is how we used the data edit list to decrypt parts of the file between specific coordinates, and it worked like this up to version … provided us with …

Are you saying the data edit list in and before that version was not working correctly? How would it then be used to read a section between X and Y bytes? And how does it work with your example with 3 Lengths?
Yes, that sounds incorrect. Data edit list handling is described in section 4.2 of the Crypt4GH specification. My "simple" interpretation of that is alternating values of how much to discard and how much to keep, with a missing final keep being interpreted as "copy the rest until the end of file".

Anyway, those values being counters rather than file offsets means the data edit list as above would discard the first 10000000 bytes, provide the next 30000000, and then nothing more, so you should get what is between file offsets 10000000 and 40000000.

But looking at https://github.com/neicnordic/sensitive-data-archive/blob/main/sda-download/api/sda/sda.go#L262-L264, it seems this was done correctly in sda-download. I'll try experimenting a bit by running the unit tests with a larger file size for the sample file (right now I'm seeing a failure because something that should fail doesn't, but that code is in the writer and hasn't been touched for a long time).

For extra clarity: an odd number of lengths in a data edit list means that the rest of the file will be provided after the last discard, so …
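To make the alternating discard/keep interpretation above concrete, here is a small self-contained sketch (not the library's actual implementation) of how a data edit list is applied to the decrypted stream:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// applyDataEditList copies from src to dst, alternating between discarding
// and keeping the given numbers of bytes. The first length is always a
// discard. An odd number of lengths means everything after the last
// discard is kept until the end of the file.
func applyDataEditList(dst io.Writer, src io.Reader, lengths []uint64) error {
	discard := true
	for _, n := range lengths {
		if discard {
			if _, err := io.CopyN(io.Discard, src, int64(n)); err != nil {
				return err
			}
		} else {
			if _, err := io.CopyN(dst, src, int64(n)); err != nil {
				return err
			}
		}
		discard = !discard
	}
	if !discard {
		// The list ended on a discard; the implied final "keep" copies
		// the rest of the stream.
		_, err := io.Copy(dst, src)
		return err
	}
	return nil
}

func main() {
	src := bytes.NewReader(make([]byte, 100000000)) // 100 MB stand-in stream
	var out bytes.Buffer
	// Discard the first 10000000 bytes, keep the next 30000000, then stop:
	// the output is the logical range [10000000, 40000000) of the stream.
	if err := applyDataEditList(&out, src, []uint64{10000000, 30000000}); err != nil {
		panic(err)
	}
	fmt.Println(out.Len()) // 30000000
}
```

Reading a section between byte offsets start and end thus corresponds to the two-length list {start, end - start}, which matches the linked sda-download code.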
So, one of the tests had bad assumptions and has been fixed. There's also a test that (for better or worse) depends on the content being what it is in the repo, and will fail with an empty file. With the fixes in this branch, the tests work fine after extending …
Ok, it makes sense now, thank you! 😅
I think we need to revert this; normal decryption does not work at all. Steps to reproduce: …
Confirmed, decryption hangs without producing anything.
Oops. Yes, I guess it's best to revert for now (I don't think I can look today). I think this fix is good and needed but something else might also be required. (Although I'm a bit surprised I didn't see this while testing.)
This fixes a bug that could lead to incorrect data being returned in some cases.
When doing reads, we determine the largest amount of data we can read in one go, so that we can check that area for holes caused by any edit lists. This is the lower of two numbers: the length of the slice we should store data in, and the amount of readable data left in the buffer.
The computation incorrectly used the number of bytes that had already been read rather than the number of bytes that can be read, so the wrong part of the logical decrypted stream was checked for holes.
Checking an area that is too large is not a problem, but checking an area that is too small can lead to incorrect data being returned: the smaller area may contain no holes while the correctly calculated area (based on the number of bytes that can be read from the decrypted buffer) does.
As an added precaution, this PR also makes sure not to read more than the calculated amount.
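To make the description concrete, here is a schematic, self-contained sketch of the computation; all names here are hypothetical and the actual reader code in this repository differs:

```go
package main

import (
	"bytes"
	"fmt"
)

// reader is a schematic stand-in for the decrypting reader.
type reader struct {
	buf       bytes.Buffer // decrypted-but-unread data
	bytesRead int          // bytes the caller has already consumed
}

func (r *reader) Read(p []byte) (int, error) {
	// The area to check for edit-list holes is the most we can hand out
	// in one go: the smaller of the destination slice and the readable
	// data left in the decrypted buffer.
	checkLen := len(p)
	if buffered := r.buf.Len(); buffered < checkLen {
		checkLen = buffered
	}
	// The bug: using r.bytesRead (bytes already read) instead of checkLen
	// (bytes that can still be read) made the hole check look at the
	// wrong part of the logical decrypted stream.
	//
	// ... consult the edit list for holes within the next checkLen bytes ...

	// Added precaution from this PR: never read more than the checked area.
	n, err := r.buf.Read(p[:checkLen])
	r.bytesRead += n
	return n, err
}

func main() {
	r := &reader{}
	r.buf.WriteString("hello world")
	p := make([]byte, 5)
	n, _ := r.Read(p)
	fmt.Println(n, string(p[:n])) // 5 hello
}
```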