AMBER - in "simplesolvated" directory all files are not compressed #68
@orbeckst Maybe you could comment on this as well. I'm leaning towards keeping all of them as uncompressed files. Yes, bz2 makes the files smaller and so saves some time when downloading the test files, but that time is then spent decompressing the files and writing them to a local file before reading them. At the end of the day, compression is not going to drastically decrease the CI time. Considering that we are pulling the files from … On the other hand, storing the files as bz2 means that the user has to do the decompression, which seems like extra effort to me, to be honest.
I LOVE this suggestion :) I was afraid you preferred to have all files compressed, hence the issue, but I completely agree that it's much easier to create and check tests with uncompressed files!
I'd like to keep them compressed (or compress them if that has not been done already) so that the footprint of the test package stays small. We never decompress files to disk; we decompress on the fly with … Do you have timing information that shows a marked difference between compressed and uncompressed test files?
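To illustrate what decompressing "on the fly" can look like, here is a minimal sketch using only Python's standard library (not necessarily what alchemlyb itself uses): `bz2.open` in text mode (`"rt"`) returns a file object that decompresses incrementally as it is read, so a parser can iterate over lines without ever writing a decompressed copy to disk. The helper `read_lines` and the demo file name are hypothetical.

```python
import bz2
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: yield text lines from a plain or .bz2 file.

    .bz2 files are decompressed incrementally in memory while iterating;
    no decompressed copy is ever written to disk.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt") as fh:
        for line in fh:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Create a small compressed demo file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.out.bz2")
        with bz2.open(path, "wt") as fh:
            fh.write("line 1\nline 2\n")
        print(list(read_lines(path)))  # ['line 1', 'line 2']
```

Dispatching on the file extension lets the same parser accept both compressed and uncompressed test files.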
Ok, I'll compress the files in current and future PRs then!
@orbeckst So I did a test. Reading the bz2 file directly:

```python
import time

from alchemlyb.parsing.amber import extract_u_nk
from alchemtest.amber import load_bace_example

# Compressed (.bz2) AMBER output file shipped with alchemtest
file = load_bace_example().data['solvated']['decharge'][0]

a = time.time()
for i in range(1000):
    extract_u_nk(file, 298)  # T = 298 K
b = time.time()
print(b - a)
```

gives `43.86949706077576`. Decompressing the same file to disk first and reading the plain-text copy:

```python
import bz2

# Decompress once and write the plain-text file to disk
with bz2.open(file, "rt") as bz_file:
    text = bz_file.read()
with open('test.out', 'w') as f:
    f.write(text)

a = time.time()
for i in range(1000):
    extract_u_nk('test.out', 298)
b = time.time()
print(b - a)
```

gives `24.963331937789917`.
With regard to the zip file size, I have created a branch where all bz2 files have been decompressed (https://github.com/xiki-tempula/alchemtest/tree/feat_unzip). The resulting zip file is 82.9 MB, slightly larger than the original zip file of 73.9 MB, but I don't think that is too bad. Plus, it will avoid #71 (comment).
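To put the two timings in perspective, a quick back-of-the-envelope calculation using only the numbers reported above (1000 calls to `extract_u_nk` each). These are single wall-clock measurements on one machine for one file, so the results are indicative only.

```python
# Timings reported above (seconds for 1000 calls of extract_u_nk)
N_CALLS = 1000
t_bz2 = 43.86949706077576     # reading the .bz2 file directly
t_plain = 24.963331937789917  # reading the decompressed copy

per_call_bz2 = t_bz2 / N_CALLS * 1e3      # ms per call
per_call_plain = t_plain / N_CALLS * 1e3  # ms per call
overhead = per_call_bz2 - per_call_plain  # extra ms per call when reading .bz2
speedup = t_bz2 / t_plain                 # how much faster the plain file is
saving = 1 - t_plain / t_bz2              # fraction of parse time saved

print(f"bz2:   {per_call_bz2:.1f} ms/call")
print(f"plain: {per_call_plain:.1f} ms/call")
print(f"overhead: {overhead:.1f} ms/call, speedup: {speedup:.2f}x, saving: {saving:.0%}")
```

This works out to roughly 43.9 ms versus 25.0 ms per parse: about 18.9 ms of extra time per call, a 1.76x speedup, or about 43% less time spent parsing this particular file.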
I agree that the difference in download size is not an issue for alchemlyb, where we take the zip. Maybe that's even true of wheels (whl) etc. (not sure if they are compressed). However, when installed, the difference is sizable, so in order to keep the installed footprint down, I'd really like to keep the data files compressed. (This is not important for CI, but it is for users/developers.) The data that you showed in #68 (comment) indicate overhead, and that's interesting. I don't know if loading the data is the bottleneck for most tests (it might well be), and if this is the case then decompressing the data files might cut CI time by 25% to 50%. However, I think that the GROMACS reader will not be affected as much, as it uses … I ran …
Probably anything labelled setup is related to reading files (possibly also related to alchemistry/alchemlyb#206), but I cannot discern from the output whether we're reading GMX or AMBER files in these tests. @xiki-tempula, is your main argument for decompressing that we want to make CI run faster? If so, then the question is how much it would really speed things up, and then one has to decide which tradeoffs one wants to make.
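One way to frame "how much it would really speed things up" is Amdahl's law: if a fraction f of the total CI time is spent reading test files and that part alone becomes s times faster, the whole run takes (1 - f) + f/s of its original time. The sketch below assumes s comes from the single AMBER timing quoted above (about 1.76); the values of f are made-up inputs, since nobody in this thread has measured them, and s may well differ for other files or for the GROMACS reader.

```python
def ci_time_saving(f, s):
    """Fraction of total CI time saved, by Amdahl's law.

    f: fraction of CI time spent reading test files (assumed, not measured)
    s: speedup of the file-reading part alone
    """
    if not 0.0 <= f <= 1.0 or s <= 0:
        raise ValueError("need 0 <= f <= 1 and s > 0")
    new_time = (1.0 - f) + f / s  # new total time relative to the old total time
    return 1.0 - new_time


s = 43.86949706077576 / 24.963331937789917  # ~1.76, from the timing test above
for f in (0.25, 0.5, 0.75, 1.0):  # hypothetical shares of CI time spent reading files
    print(f"reading = {f:.0%} of CI time -> {ci_time_saving(f, s):.0%} saved")
```

Under these assumptions the saving scales linearly with f and, with this one measured speedup, tops out at about 43% even if reading files were the entire CI run.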
I'll close this issue as "won't fix" for now. Please chime in if you want to re-open the discussion.
All the files needed for testing are compressed as .bz2 files, except the ones inside the `simplesolvated` directory. Is there a reason behind this? We could save some space by compressing these files as well.
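If the files in `simplesolvated` were to be compressed like the rest, a minimal sketch using Python's standard library might look like this. The function name `compress_dir` is hypothetical, and it assumes that every regular file in the given directory not already ending in `.bz2` should be compressed and the uncompressed original removed. It also reports the bytes before and after, which is the installed-footprint difference discussed in this thread.

```python
import bz2
import shutil
from pathlib import Path


def compress_dir(directory):
    """Hypothetical helper: bz2-compress every regular file in `directory`.

    Each `name` becomes `name.bz2` and the uncompressed original is deleted.
    Returns (bytes_before, bytes_after) for the files that were compressed.
    """
    before = after = 0
    # sorted() materializes the listing, so new .bz2 files are not revisited
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix == ".bz2":
            continue  # skip subdirectories and files that are already compressed
        target = path.with_name(path.name + ".bz2")
        with open(path, "rb") as src, bz2.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks instead of reading all at once
        before += path.stat().st_size
        after += target.stat().st_size
        path.unlink()  # remove the uncompressed original
    return before, after
```

For example, `compress_dir("path/to/simplesolvated")`, with the path adjusted to wherever the directory lives in the repository, returns the total size before and after compression.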