Added the capability of loading a single test case for AMBER #71

DrDomenicoMarson · 2022-09-25T22:06:40Z

This PR addresses issues #65 and #69. I removed the load_invalidfiles function and added the new load_testfile, putting all the single files that can be loaded inside the directory testfiles.

Currently, I left the files unzipped, but I shrank them by removing unnecessary lines.

As it is, the docs also build fine. I have a doubt though.

To keep everything in line with the other functions, my load_testfile also returns a dictionary with data and DESCR, and I also put a descr.rst file in the testfiles directory, which is read to build the docs.

To have the function return the dictionary, I also created a descr file for each .out file; these are loaded together to populate the dict. In each such descr file I wrote a simple description of the .out file it refers to. As a result the documentation is pretty sparse, as the common descr file is almost empty. I could also write the description of each file inside the common descr file, but this would be redundant. Wouldn't it be better if I just:

- place all the descriptions of the files in the common descr
- make the load_testfile function return just the file itself, without descr
- remove the now useless single descr file for each out file

I don't know if it's mandatory for the loading function to return a dictionary though, in that case maybe it's ok how it is now!

This PR has to be "synchronized" with a PR in alchemlyb as well, because test_amber.py there has to be updated to use this new function (I have this modification ready for when the current PR about the amber parser is accepted).

DrDomenicoMarson · 2022-09-27T08:31:11Z

I changed this a little bit, compressing all the new test files, and accordingly changed the loading function to search for the compressed version of the file!

DrDomenicoMarson · 2022-09-27T08:32:28Z

@xiki-tempula @orbeckst what do you think is the best approach for the descr "problem"? Should I have a single descr file, multiple descr files or both?

orbeckst · 2022-09-27T16:37:29Z

See my #65 (comment)

orbeckst

See #65 (comment), in brief:

- deprecate load_invalidfiles()
- add new load_testfiles(singlefilekey=None) (assuming @xiki-tempula supports the syntax and sees the need for having the function return a single file — I don't quite see the need)
- give the test files better key names

Changes should be made so that alchemlyb tests don't break.

orbeckst · 2022-09-27T16:43:56Z

I don't know if it's mandatory for the loading function to return a dictionary though, in that case maybe it's ok how it is now!

All loading functions should return a Bunch. Consistency in the API is very important.

Please check https://alchemtest.readthedocs.io/en/latest/contributing.html — your code refactor needs to fit the description. We don't want to get into a situation where datasets get added in an arbitrary manner.

In particular

Add an accessor function load_MYDATASET() to the access.py file at the top of the code directory. The accessor function makes the dataset available as a dict under the data key in the Bunch. The data are typically another dict with different parts of a calculation such as Coulomb and VDW parts being different keys in a dictionary. All files that are needed for a single free energy calculation are in a list under the appropriate key. The description text is added as the DESCR key.
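The layout described in the contributing-guide passage above can be sketched as follows. This is a minimal illustration under stated assumptions, not the alchemtest implementation: `Bunch` is a small stand-in class, and `load_mydataset`, the `Coulomb`/`VDW` directory layout, and the `lambda_0.out` file name are hypothetical.

```python
import tempfile
from pathlib import Path


class Bunch(dict):
    """Stand-in for the Bunch container: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err


def load_mydataset(root):
    """Hypothetical accessor: one key per calculation part, each a list of files."""
    root = Path(root)
    data = {
        "Coulomb": sorted(str(p) for p in (root / "Coulomb").glob("*.out")),
        "VDW": sorted(str(p) for p in (root / "VDW").glob("*.out")),
    }
    descr = (root / "descr.rst").read_text()
    return Bunch(data=data, DESCR=descr)


# Build a throwaway dataset directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for part in ("Coulomb", "VDW"):
        (root / part).mkdir()
        (root / part / "lambda_0.out").write_text("")
    (root / "descr.rst").write_text("Example dataset.\n")
    bunch = load_mydataset(root)

print(sorted(bunch.data))   # keys of the data dict
print(bunch.DESCR)          # description text
```

The point of the shape is that every value under `data` is a list of files, even when a key has only one file.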

orbeckst · 2022-09-27T16:49:27Z

Btw, I don't have a problem with renaming files — we are only providing access through the accessors so that means that the filename itself does not matter.

If you think that the invalid-case-N.out.bz2 could be named better, just rename to invalid-case-N-New_description_of_case.out.bz2 and leave the initial part of the file name the same so that it's easy to understand where the files came from.

orbeckst · 2022-09-27T18:35:04Z

For some reason, CI is not running the tests on this PR at the moment. I hope that when PR #72 is merged (and then merged into this PR), that will magically change...

orbeckst · 2022-09-27T19:59:05Z

I added required checks — ~~this seemed to have started the available ones.~~ EDIT: nope, only lists them as "Expected — Waiting for status to be updated". So this will hopefully run after PR #72.

There are missing ones which will be added with PR #72

orbeckst · 2022-09-27T20:23:25Z

Tests are running now — please check for failures.

orbeckst · 2022-09-27T21:02:35Z

Well... looking through the AMBER files (with an eye towards #49), load_bace_example() and load_invalid_files() in particular do not even follow the API:

- load_bace_example() contains another dict where there ought to be a list of files
- load_invalid_files() is just a list of files (and following @xiki-tempula's suggestion would actually become compatible and would be easily testable with the testing base class BaseDatasetTest)

But load_bace_example() admittedly breaks the defined API.

My take is that I'd rather have one documented outlier and strive for a clean API adherence in the future than to declare that "anything goes". But I am open to different arguments.
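The API rule at stake here (a `DESCR` entry plus a `data` dict whose values are lists of files) can be expressed as a small check. This is only a sketch: `follows_api` is a hypothetical helper, plain dicts stand in for the Bunch, and the nested key names are invented for illustration. It is not the `BaseDatasetTest` class mentioned above.

```python
def follows_api(bunch):
    """Return True if bunch has DESCR and a data dict whose values are all lists."""
    data = bunch.get("data")
    return (
        "DESCR" in bunch
        and isinstance(data, dict)
        and all(isinstance(files, list) for files in data.values())
    )


# Conforming shape: dict of lists.
conforming = {"data": {"no_useful_data": ["no_useful_data.out.bz2"]}, "DESCR": "text"}
# A nested dict where a list of files is expected (hypothetical key names).
nested = {"data": {"complex": {"decharge": ["a.out"]}}, "DESCR": "text"}
# A bare list instead of a dict.
flat = {"data": ["a.out", "b.out"], "DESCR": "text"}

for name, candidate in (("conforming", conforming), ("nested", nested), ("flat", flat)):
    print(name, follows_api(candidate))
```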


DrDomenicoMarson · 2022-09-28T15:46:02Z

Dear @orbeckst and @xiki-tempula,
I reformatted the load_testfiles() function, now it returns a Bunch, where data is a dictionary with
key (name of the file without .out.tar.bz2): [path to the file]

So now we can access the single files with load_testfiles().data['file_I_want'][0]

The descr.rst file now lists all the possible files, with a brief description for each of them, like it was previously for load_invalidfiles.

I checked the docs and they seem fine. I added the deprecation to load_invalidfiles (and to the directory which had the invalidfiles), so merging this PR will not break existing tests. I have prepared a new test_amber.py that uses the new function, which I'll submit after this is closed, so after that it will be safe to remove load_invalidfiles and its directory!

codecov · 2022-09-28T16:02:12Z

Codecov Report

Merging #71 (f711a06) into master (57458ef) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   99.34%   99.37%   +0.03%     
==========================================
  Files          11       11              
  Lines         152      160       +8     
  Branches       18       19       +1     
==========================================
+ Hits          151      159       +8     
  Misses          1        1

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `99.37% <100.00%> (+0.03%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| src/alchemtest/amber/__init__.py | `100.00% <100.00%> (ø)` |
| src/alchemtest/amber/access.py | `100.00% <100.00%> (ø)` |


orbeckst

Looking pretty good but please address the following (in addition to the inline comments):

- Let's not have invalidfiles and testfiles directories in parallel. Just remove invalidfiles and change the file names inside load_invalid(). Just make sure that the order of files in the list that was returned corresponds to the previous ordering.
- Add tests, see test_amber.py; you should be able to add load_testfiles() just to the generic TestAmber unit test.

src/alchemtest/amber/access.py

orbeckst · 2022-09-28T16:12:22Z

src/alchemtest/amber/access.py

+
+    .. deprecated:: 0.7
+        substituted by laod_testfiles
+


At the start of the function, issue a DeprecationWarning

warnings.warn("load_invalidfiles() was deprecated in 0.7.0 and will be removed in the following release. Use load_testfiles() instead", DeprecationWarning)
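A runnable sketch of this suggestion is shown here, with the caveat that the function body is a placeholder: the returned payload is hypothetical, and `stacklevel=2` is an addition not present in the review comment (it makes the warning point at the caller rather than at the accessor itself).

```python
import warnings


def load_invalidfiles():
    """Deprecated accessor: warn first, then do the usual work."""
    warnings.warn(
        "load_invalidfiles() was deprecated in 0.7.0 and will be removed in "
        "the following release. Use load_testfiles() instead",
        DeprecationWarning,
        stacklevel=2,  # assumption: attribute the warning to the calling code
    )
    # Placeholder payload; the real function returns the dataset.
    return {"data": [], "DESCR": ""}


# DeprecationWarning is hidden by default in many contexts, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_invalidfiles()

print(caught[0].category.__name__)
print(caught[0].message)
```

In a test suite the same check is usually written with `pytest.warns(DeprecationWarning)`.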

src/alchemtest/amber/access.py

src/alchemtest/amber/testfiles/descr.rst


xiki-tempula · 2022-10-01T21:45:08Z

src/alchemtest/amber/access.py

+        if f.suffix==".bz2":
+            while f.suffix in ('.tar', '.bz2', '.out'):
+                f = f.with_suffix('')
+            data[f.name] = [f_path]


So what I was wondering is: if f has a suffix of '.tar' or '.out', it won't pass the if f.suffix==".bz2": check in the first place?

orbeckst · 2022-10-01T22:17:45Z

Quick comment: there should be a more robust way to split all extensions — something like splitext.


DrDomenicoMarson · 2022-10-01T22:21:50Z

Quick comment: there should be a more robust way to split all extensions — something like splitext.

I searched for a better method, but I found nothing: the default ones only strip the last suffix, i.e. whatever follows the final "."!

orbeckst · 2022-10-01T22:36:17Z

According to https://stackoverflow.com/a/35188296 pathlib.Path.suffixes can give you all suffixes. Maybe it has a function to give you the basename or stem, too? I find it difficult to believe that the common case of multiple suffixes (path/dir/name.tar.bz2) doesn't have pre-built functionality in os.path or pathlib.

DrDomenicoMarson · 2022-10-01T22:42:10Z

I tried, it just gives a list of the basename split at ".", so in the not impossible situation of a "." used inside a filename the output is not what one would expect.

Regarding the comment of @xiki-tempula, it's true, I only checked for "bz2" as the final extension because we agreed to have compressed files, but indeed if ... in ["bz2", "tar", "out"] is better!

DrDomenicoMarson · 2022-10-02T09:13:54Z

According to https://stackoverflow.com/a/35188296 pathlib.Path.suffixes can give you all suffixes. Maybe it has a function to give you the basename or stem, too? I find it difficult to believe that the common case of multiple suffixes (path/dir/name.tar.bz2) doesn't have pre-built functionality in os.path or pathlib.

I just tried again to be sure, these are the results:

In [1]: from pathlib import Path

In [2]: p = Path("sgge/vrvs.sese/bwsbs.osg.e.tar.xz")

In [3]: p.suffixes
Out[3]: ['.osg', '.e', '.tar', '.xz']

In [4]: p.name
Out[4]: 'bwsbs.osg.e.tar.xz'

In [5]: p.stem
Out[5]: 'bwsbs.osg.e.tar'
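The session above shows why `suffixes` and `stem` alone do not recover the base name when dots appear inside it. One way around this, sketched here under assumptions, is to strip only suffixes from a known set, as in the loop from the reviewed diff. `strip_known_suffixes` and the `run.v2` file name are hypothetical.

```python
from pathlib import Path

KNOWN_SUFFIXES = (".out", ".tar", ".bz2")


def strip_known_suffixes(path, known=KNOWN_SUFFIXES):
    """Drop trailing suffixes from the file name for as long as they are in `known`."""
    path = Path(path)
    while path.suffix in known:
        path = path.with_suffix("")
    return path.name


# All three trailing suffixes are known, so only the base name remains.
print(strip_known_suffixes("testfiles/no_useful_data.out.tar.bz2"))
# ".v2" is not a known suffix, so stripping stops there and the dot survives.
print(strip_known_suffixes("testfiles/run.v2.out.bz2"))
```

The trade-off is that the set of known suffixes has to be maintained by hand.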

DrDomenicoMarson · 2022-10-09T15:54:10Z

@xiki-tempula are there any other problems with this PR?

orbeckst

Looks pretty good, just a few minor fixes and a suggestion to generate the keys in a simpler fashion.

src/alchemtest/amber/access.py

src/alchemtest/amber/testfiles/descr.rst

xiki-tempula · 2022-10-10T17:46:32Z

@DrDomenicoMarson My major concern is this part

    data = {}
    for f in testfiles_path.iterdir():
        f_path = str(f)
        suffixes = (".out", ".tar", ".bz2")
        if f.suffix in suffixes:
            while f.suffix in suffixes:
                f = f.with_suffix('')
            data[f.name] = [f_path]

I don't want a function that cleverly trims the file name back to its original extension to be present at the edge of the project. Later on, someone might want the same function for Gromacs or other MD engines, and they will add their own version there, which will make the project hard to maintain.

My suggestion is to either make an explicit statement
data['no_useful_data'] = [module_path / 'testfiles' / 'no_useful_data.out.tar.bz2', ]
So people could look at the code and immediately know which files are available in this function.

Or a very simple one like

data = {}
for f in testfiles_path.glob('*.bz2'):
    data[f.stem.split('.')[0]] = [str(f),]

And discourage people from using . in the file name.

A related thing is that I'm not a huge fan of zipping things up #68 (comment). If we don't create a bz2 file, then this part could just be

data = {}
for f in testfiles_path.glob('*.out'):
   data[f.stem] = [str(f),]
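To make the "discourage people from using . in the file name" point concrete, here is a small runnable sketch of the glob-based variant on a throwaway directory. The file names are hypothetical, and `sorted()` is added only to make the iteration order deterministic.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    testfiles_path = Path(tmp)
    # Two compressed test files (one with an extra dot in its name) and a descr file.
    for name in ("no_useful_data.out.bz2", "run.v2.out.bz2", "descr.rst"):
        (testfiles_path / name).write_text("")

    data = {}
    for f in sorted(testfiles_path.glob("*.bz2")):
        # Key is everything before the first dot; value is a one-element list.
        data[f.stem.split(".")[0]] = [str(f)]

print(sorted(data))
```

The key for `run.v2.out.bz2` comes out as `run`, so two files that differ only after the first dot would silently overwrite each other, which is the maintenance risk the explicit per-file statement avoids.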

DrDomenicoMarson · 2022-10-12T17:10:41Z

Sorry for the late reply, but I made the adjustments and now the key is created "by hand" as requested!

orbeckst

Thank you for all the hard work and many iterations. I think we arrived at a clear way to transition to more expressive AMBER tests. Looks good to me!

xiki-tempula

Thanks for the many iterations. Just change the out.tar.bz2 to out.bz2 and this PR is ready to be merged.

xiki-tempula · 2022-10-12T19:25:09Z

src/alchemtest/amber/access.py

+        DeprecationWarning)
+    module_path = Path(__file__).parent
+    data = [[
+        module_path / 'testfiles' / 'no_useful_data.out.tar.bz2',


Just change these file names from *.out.tar.bz2 to *.out.bz2 so they are consistent with the other files.

xiki-tempula

LGTM. Thanks for all the work.

orbeckst · 2022-10-18T22:44:36Z

@xiki-tempula I think you merged this PR instead of squash-merging so our history now contains lots of small commits. In the future, can you please double-check that you're squashing everything unless the PR has been cleaned up into self-contained commits? Thanks.

xiki-tempula · 2022-10-19T08:56:17Z

@orbeckst Sorry, I was under the impression that the default action would be squash and merge. I guess I have used bitbucket too much recently and will double check next time when I merge a PR.

orbeckst · 2022-10-19T15:59:56Z

I also think that to be the case but it's worthwhile double checking every time.


DrDomenicoMarson added 4 commits September 24, 2022 18:47

added new test files and single file load capabil

c2a4f6a

removed .out not compressed file

61f5288

created the load_testfile function

819b2ad

fixed to comply to sphinx

423c13c

orbeckst mentioned this pull request Sep 27, 2022

Introduced the "extract" function to all the parsers, with relative tests alchemistry/alchemlyb#240

Merged

compressed

7ec895a

orbeckst requested changes Sep 27, 2022


orbeckst mentioned this pull request Sep 27, 2022

CI: add pypi package check #72

Merged

Merge branch 'master' into add-amber-mbar-none-non-finished

5de020b

orbeckst mentioned this pull request Sep 27, 2022

increase coverage to 100% (test ALL AMBER files and all of __init__) #74

Merged

orbeckst and others added 6 commits September 28, 2022 07:56

Merge branch 'master' into add-amber-mbar-none-non-finished

6cb3792

load_testfile now is compliant with the directives

be6b856

Merge branch 'add-amber-mbar-none-non-finished' of github.com:DrDomen…

98b8d4e

…icoMarson/alchemtestDM into add-amber-mbar-none-non-finished

fixed a bug in retriving file name

ea8f255

reintroduced invalidfiles

c761bf1

updated the docs

8957129

orbeckst requested changes Sep 28, 2022


DrDomenicoMarson and others added 2 commits September 28, 2022 23:54

Update src/alchemtest/amber/access.py

32a1094

Co-authored-by: Oliver Beckstein

Update src/alchemtest/amber/testfiles/descr.rst

28b3611

Co-authored-by: Oliver Beckstein

DrDomenicoMarson added 3 commits October 1, 2022 15:51

got rid of listdir

ee0bd39

Merge branch 'add-amber-mbar-none-non-finished' of github.com:DrDomen…

0abec73

…icoMarson/alchemtestDM into add-amber-mbar-none-non-finished

removed a debugging print

e9d32cc

xiki-tempula mentioned this pull request Oct 1, 2022

NEP29 #76

Merged

xiki-tempula reviewed Oct 1, 2022


added 'out' and 'tar' as possible suffixes

b50d98b

xiki-tempula mentioned this pull request Oct 2, 2022

AMBER - in "simplesolvated" directory all files are not compressed #68

Closed

orbeckst requested changes Oct 9, 2022


This was linked to issues Oct 9, 2022

AMBER tests could be clearer #65

Closed

AMBER - more test file are needed #69

Closed

Merge branch 'master' into add-amber-mbar-none-non-finished

958edbe

removed auto key generation

ec32c47

orbeckst approved these changes Oct 12, 2022


xiki-tempula requested changes Oct 12, 2022


changed extension

f711a06

xiki-tempula approved these changes Oct 16, 2022


xiki-tempula merged commit 6407137 into alchemistry:master Oct 16, 2022

DrDomenicoMarson deleted the add-amber-mbar-none-non-finished branch October 16, 2022 09:18

xiki-tempula added a commit to xiki-tempula/alchemtest that referenced this pull request Oct 21, 2022

Merge pull request alchemistry#71 from DrDomenicoMarson/add-amber-mba…

db67bd1

…r-none-non-finished Added the capability of loading a single test case for AMBER
