AMBER - in "simplesolvated" directory all files are not compressed #68

DrDomenicoMarson · 2022-09-24T16:34:31Z

All the files neded for testing are compressed as .bz2 files, except the ones inside the simplesolvated directory. Is there a reason behind this? We could spare some space compressing also these files.

The text was updated successfully, but these errors were encountered:

xiki-tempula · 2022-09-25T10:12:37Z

@orbeckst Maybe you could comment on this as well.

I'm leaning towards having all of them as uncompressed files. I mean yes bz2 will make the file smaller so save some time in downloading the test files but these time is then spent on unzipping the files and writing it to a local file before reading it. At the end of the day, they are not going to drastically decrease the CI time.

Considering that we are pulling the files from alchemtest/archive/master.zip, file > bz2 > zip might not really shrink much space compared with file > zip.

On the other hand, saving the files as bz2 would mean that the user has to do the unzipping, which seems like extra effort to me to be honest.

DrDomenicoMarson · 2022-09-25T11:03:03Z

I LOVE this suggestion :) I was afraid you preferred to have all compressed files, hence the issue, but I completely agree that's much easier to create and check tests with uncompressed files!

orbeckst · 2022-09-26T17:04:10Z

I'd like to keep them compressed (or compresse them if not done so already) so that the footprint of the test package is small.

We never uncompress files to disk. We decompress on the fly with anyopen() https://github.com/alchemistry/alchemlyb/blob/cb965ad049556dc97445acc16cbb21a2bf3f7f38/src/alchemlyb/parsing/util.py#L20

Do you have timing information that shows a marked difference between compressed and uncompressed test files?

DrDomenicoMarson · 2022-09-27T07:32:10Z

Ok, I'll compress the files in current and future PRs then!

xiki-tempula · 2022-10-02T10:27:10Z

@orbeckst So I did a test with bz2

import time
from alchemlyb.parsing.amber import extract_u_nk
from alchemtest.amber import load_bace_example
file = load_bace_example().data['solvated']['decharge'][0]
a = time.time()
for i in range(1000):
    extract_u_nk(file, 298)
b = time.time()
print(b-a)

gives 43.86949706077576

with bz2.open(file, "rt") as bz_file:
    text = bz_file.read()
    with open('test.out', 'w') as f:
        f.write(text)
a = time.time()
for i in range(1000):
    extract_u_nk('test.out', 298)
b = time.time()
print(b-a)

gives 24.963331937789917
So running bz2 will make it 75% slower.

With regard to the zip file size, I have created a branch where all bz2 files have been unzipped (https://github.com/xiki-tempula/alchemtest/tree/feat_unzip), the zip file has a size of 82.9 MB which is slightly larger than the original zip file of 73.9 MB but I don't think it is too bad?

Plus, it will avoid #71 (comment)

orbeckst · 2022-10-10T19:01:43Z

I agree that the difference in download size is not an issue for alchemlyb where we take the zip. Maybe that's even true of whl etc (not sure if they are compressed). However, when installed, the difference is sizable

 81M	0.6.0.tar.gz
 94M	alchemtest-0.6.0             # as dowloaded
304M	alchemtest-0.6.0-unpacked    # manually unpacked all gz and bz2 files

so in order to keep the installed footprint down, I'd really like to keep datafiles compressed. (This is not important for CI but for users/developers.)

The data that you showed #68 (comment) indicate overhead and that's interesting. I don't know if loading the data is the bottleneck for most tests — it might well be — and if this is the case then unpacking the data files might cut CI time by 25% - 50%. However, I think that the GROMACS reader will not be affected as much as it uses pandas.read_csv(), which is already much faster than our manually written ASCII readers (but I don't know how much it is affected by compression).

I ran pytest -v --durations=20 (on my laptop with SSD) to see where we spend time (we can add --durations to the alchemlyb CI for more information): here are the 20 slowest tests:

============================================================ slowest 20 durations =============================================================
34.28s setup    src/alchemlyb/tests/test_workflow_ABFE.py::Test_automatic_ABFE::test_read
32.75s setup    src/alchemlyb/tests/test_workflow_ABFE.py::Test_manual_ABFE::test_read
9.33s call     src/alchemlyb/tests/test_workflow_ABFE.py::Test_manual_ABFE::test_convergence_method
9.30s call     src/alchemlyb/tests/test_workflow_ABFE.py::Test_automatic_ABFE::test_convergence_method
7.18s call     src/alchemlyb/tests/test_workflow_ABFE.py::Test_manual_ABFE::test_convergence_nosample_u_nk
5.16s setup    src/alchemlyb/tests/test_workflow_ABFE.py::Test_automatic_benzene::test_read
4.61s call     src/alchemlyb/tests/test_visualisation.py::test_plot_dF_state
2.83s setup    src/alchemlyb/tests/test_workflow_ABFE.py::Test_automatic_amber::test_summary
2.74s call     src/alchemlyb/tests/test_workflow_ABFE.py::Test_manual_ABFE::test_estimator_method
2.72s call     src/alchemlyb/tests/test_workflow_ABFE.py::Test_automatic_ABFE::test_estimator_method
1.99s setup    src/alchemlyb/tests/test_fep_estimators.py::TestMBAR::test_mbar[X_delta_f5]
1.96s call     src/alchemlyb/tests/parsing/test_gmx.py::test_u_nk_with_potential_energy
1.88s call     src/alchemlyb/tests/parsing/test_gmx.py::test_u_nk_with_total_energy
1.85s call     src/alchemlyb/tests/test_visualisation.py::test_plot_ti_dhdl
1.81s setup    src/alchemlyb/tests/test_fep_estimators.py::TestBAR::test_bar[X_delta_f1]
1.79s call     src/alchemlyb/tests/parsing/test_gmx.py::test_u_nk_without_energy
1.78s setup    src/alchemlyb/tests/test_fep_estimators.py::TestAutoMBAR::test_mbar[X_delta_f5]
1.73s setup    src/alchemlyb/tests/test_fep_estimators.py::TestAutoMBAR::test_mbar[X_delta_f8]
1.72s setup    src/alchemlyb/tests/test_fep_estimators.py::TestMBAR::test_mbar[X_delta_f8]
1.72s call     src/alchemlyb/tests/test_fep_estimators.py::TestMBAR::test_mbar[X_delta_f4]
=============================================== 383 passed, 3141 warnings in 238.44s (0:03:58) ================================================

Probably anything labelled setup is related to reading files (possibly also related to alchemistry/alchemlyb#206 ) but I cannot discern from the output if we're reading GMX or AMBER in these tests.

@xiki-tempula is your main argument for uncompressing that we want to make CI run faster?

If so, then the question is by how much it would really speed up things. And then one has to decide what tradeoffs one wants to make.

orbeckst · 2022-10-21T18:00:31Z

I'll close this issue as "won't fix" for now. Please chime in if you want to re-open the discussion.

xiki-tempula mentioned this issue Oct 10, 2022

Added the capability of loading a single test case for AMBER #71

Merged

orbeckst added the wontfix label Oct 21, 2022

orbeckst closed this as not planned Won't fix, can't repro, duplicate, stale Oct 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMBER - in "simplesolvated" directory all files are not compressed #68

AMBER - in "simplesolvated" directory all files are not compressed #68

DrDomenicoMarson commented Sep 24, 2022

xiki-tempula commented Sep 25, 2022 •

edited

Loading

DrDomenicoMarson commented Sep 25, 2022

orbeckst commented Sep 26, 2022

DrDomenicoMarson commented Sep 27, 2022

xiki-tempula commented Oct 2, 2022 •

edited

Loading

orbeckst commented Oct 10, 2022

orbeckst commented Oct 21, 2022

AMBER - in "simplesolvated" directory all files are not compressed #68

AMBER - in "simplesolvated" directory all files are not compressed #68

Comments

DrDomenicoMarson commented Sep 24, 2022

xiki-tempula commented Sep 25, 2022 • edited Loading

DrDomenicoMarson commented Sep 25, 2022

orbeckst commented Sep 26, 2022

DrDomenicoMarson commented Sep 27, 2022

xiki-tempula commented Oct 2, 2022 • edited Loading

orbeckst commented Oct 10, 2022

orbeckst commented Oct 21, 2022

xiki-tempula commented Sep 25, 2022 •

edited

Loading

xiki-tempula commented Oct 2, 2022 •

edited

Loading