Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the capability of loading a single test case for AMBER #71

Conversation

DrDomenicoMarson
Copy link
Contributor

This PR addresses issues #65 and #69. I removed the load_invalidfiles function and added the new load_testfile, putting all the single files that can be loaded inside the directory testfiles.

Currently, I left the files unzipped, but I shrank them removing unnecessary lines.

As it is, the docs also build fine. I have a doubt though.

To keep everything in line with the other functions, my load_testfile also returns a dictionary with data and DESCR, but also I put in the testfiles directory a descr.rst file, that is read to build the docs.

To have the function return the dictionary, I also created a descr file for each .out file, that are loaded together to populate the dict. in such descr file I wrote the simple description of the out file it's referring to. In this way the documentation is pretty scarse, as the common descr file is almost empty. I could write the description of each file also inside the common descr file, but this would be redundant. Wouldn't be better if I just:

  • place all the descriptions of the files in the common descr
  • make the load_testfile function returns just the file itself, without descr,
  • remove the now useless single desc file for each out file

I don't know if it's mandatory for the loading function to return a dictionary thought, in that case maybe it's ok how it is now!

This PR has to be "synchronized" with a PR also in alchemlyb, because test_amber.py there have to be updated to use this new function (I have this modification ready for when the current PR about amber parser will be accepted).

@DrDomenicoMarson
Copy link
Contributor Author

I changed this a little bit, compressing all the new test files, and accordingly changed the loading function to search for the compressed version of the file!

@DrDomenicoMarson
Copy link
Contributor Author

@xiki-tempula @orbeckst what do you think it's the best approach for the descr "problem"? Should I have a single descr file, multiple descr files or both?

@orbeckst
Copy link
Member

See my #65 (comment)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #65 (comment), in brief

  • deprecate load_invalidfiles()
  • add new load_testfiles(singlefilekey=None) (assuming @xiki-tempula supports the syntax and sees the need for having the function return a single file — I don't quite see the need)
  • give the test files better key names

Changes should be made so that alchemlyb tests don't break.

@orbeckst
Copy link
Member

orbeckst commented Sep 27, 2022

I don't know if it's mandatory for the loading function to return a dictionary thought, in that case maybe it's ok how it is now!

All loading function should return a bunch. Consistency in the API is very important.

Please check https://alchemtest.readthedocs.io/en/latest/contributing.html — your code refactor needs to fit the description. We don't want to get into a situation where datasets get added in an arbitrary manner.

In particular

Add an accessor function load_MYDATASET() to the access.py file at the top of the code directory. The accessor function makes the dataset available as a dict under the data key in the Bunch. The data are typically another dict with different parts of a calculation such as Coulomb and VDW parts being different keys in a dictionary. All files that are needed for a single free energy calculation are in a list under the appropriate key. The description text is added as the DESCR key.

@orbeckst
Copy link
Member

Btw, I don't have a problem with renaming files — we are only providing access through the accessors so that means that the filename itself does not matter.

If you think that the invalid-case-N.out.bz2 could be named better, just rename to invalid-case-N-New_description_of_case.out.bz2 and leave the initial part of the file name the same so that it's easy to understand where the files came from.

@orbeckst
Copy link
Member

For some reason, CI is not running the tests on this PR at the moment. I hope that when PR #72 is merged (and then merged into this PR), that will magically change...

@orbeckst
Copy link
Member

orbeckst commented Sep 27, 2022

I added required checks — this seemed to have started the available ones. EDIT: nope, only lists them as "Expected — Waiting for status to be updated". So this will hopefully run after PR #72.

There are missing ones which will be added with PR #72

@orbeckst
Copy link
Member

Tests are running now — please check for failures.

@orbeckst
Copy link
Member

Well... looking through the AMBER files (with an eye towards #49 ) and in particular load_bace_example() and load_invalid_files() do not even follow the API.

  • load_bace_example() contains another dict where there ought to be a list of files
  • load_invalid_files() is just a list of files (and following @xiki-tempula 's suggestion would actually become compatible and would be easily testable with the testing base class BaseDatasetTest)

But load_bace_example() admittedly breaks the defined API.

My take is that I'd rather have one documented outlier and strive for a clean API adherence in the future than to declare that "anything goes". But I am open to different arguments.

@DrDomenicoMarson
Copy link
Contributor Author

Dear @orbeckst and @xiki-tempula,
I reformatted the load_testfiles() function, now it returns a Bunch, where data is a dictionary with
key (name of the file without .out.tar.bz2): [path to the file]

So now we can access the single files with load_testfiles().data['file_I_want'][0]

The descr.rst file now lists all the possible files, with a brief description for each of them, like it was previously for load_invalidfiles.

I checked the docs and it seems fine, I added the deprecation to load_invalidfiles (and to the directory which had the invalidfiles), so merging this PR will not break existing tests, while I have prepared a new test_amber.py that uses the new function, which I'll submit after this is closed, so after that, it will be safe to remove load_testfiles and its directory!

@codecov
Copy link

codecov bot commented Sep 28, 2022

Codecov Report

Merging #71 (f711a06) into master (57458ef) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   99.34%   99.37%   +0.03%     
==========================================
  Files          11       11              
  Lines         152      160       +8     
  Branches       18       19       +1     
==========================================
+ Hits          151      159       +8     
  Misses          1        1              
Flag Coverage Δ
unittests 99.37% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/alchemtest/amber/__init__.py 100.00% <100.00%> (ø)
src/alchemtest/amber/access.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good but please address the following (in addition to the inline comments)

  • Let's not have invalidfiles and testfiles directories in parallel. Just remove invalidfiles and change the file names inside load_invalid(). Just make sure that the order of files in the list that was returned corresponds to the previous ordering.
  • Add tests, see test_amber.py; you should be able to add load_testfiles() just to the generic TestAmber unit test.

src/alchemtest/amber/access.py Outdated Show resolved Hide resolved

.. deprecated:: 0.7
substituted by laod_testfiles

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the start of the function, issue a DeprecationWarning

warnings.warn("load_invalidfiles() was deprecated in 0.7.0 and will be removed in the following release. Use load_testfiles() instead", DeprecationWarning)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/testfiles/descr.rst Outdated Show resolved Hide resolved
@xiki-tempula xiki-tempula mentioned this pull request Oct 1, 2022
if f.suffix==".bz2":
while f.suffix in ('.tar', '.bz2', '.out'):
f = f.with_suffix('')
data[f.name] = [f_path]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what I was wondering is that
if f has suffix of '.tar' or '.out', they won't pass the if f.suffix==".bz2": in the first place?

@orbeckst
Copy link
Member

orbeckst commented Oct 1, 2022 via email

@DrDomenicoMarson
Copy link
Contributor Author

Quick comment: there should be a more robust way to split all extensions — something like splitext.

I searched for a better method, but I found nothing, the default ones just split the file removing only the last item after the last "."!

@orbeckst
Copy link
Member

orbeckst commented Oct 1, 2022

According to https://stackoverflow.com/a/35188296 pathlib.Path.suffixes can give you all suffixes. Maybe it has a function to give you the basename or stem, too? I find it difficult to believe that the common case of multiple suffixes (path/dir/name.tar.bz2) doesn't have pre-built functionality in os.path or pathlib.

@DrDomenicoMarson
Copy link
Contributor Author

DrDomenicoMarson commented Oct 1, 2022

I tried, it just gives a list of the basename splitted at ".", so in the not impossible situation of a "." used inside a filename the output is not what one should expect.

Regarding the comment of @xiki-tempula, it's true, I only checked for "bz2" as final extension because we agreed to have compressed files, but indeed if ... in ["bz2", "tar", "out"] it's better!

@DrDomenicoMarson
Copy link
Contributor Author

According to https://stackoverflow.com/a/35188296 pathlib.Path.suffixes can give you all suffixes. Maybe it has a function to give you the basename or stem, too? I find it difficult to believe that the common case of multiple suffixes (path/dir/name.tar.bz2) doesn't have pre-built functionality in os.path or pathlib.

I just tried again to be sure, these are the results:

In [1]: from pathlib import Path

In [2]: p = Path("sgge/vrvs.sese/bwsbs.osg.e.tar.xz")

In [3]: p.suffixes
Out[3]: ['.osg', '.e', '.tar', '.xz']

In [4]: p.name
Out[4]: 'bwsbs.osg.e.tar.xz'

In [5]: p.stem
Out[5]: 'bwsbs.osg.e.tar'

@DrDomenicoMarson
Copy link
Contributor Author

@xiki-tempula are there any other problems with this PR?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, just a few minor fixes and a suggestion to generate the keys in a simpler fashion.

src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/access.py Outdated Show resolved Hide resolved
src/alchemtest/amber/testfiles/descr.rst Outdated Show resolved Hide resolved
src/alchemtest/amber/testfiles/descr.rst Outdated Show resolved Hide resolved
This was linked to issues Oct 9, 2022
@xiki-tempula
Copy link
Collaborator

xiki-tempula commented Oct 10, 2022

@DrDomenicoMarson My major concern is this part

    data = {}
    for f in testfiles_path.iterdir():
        f_path = str(f)
        suffixes = (".out", ".tar", ".bz2")
        if f.suffix in suffixes:
            while f.suffix in suffixes:
                f = f.with_suffix('')
            data[f.name] = [f_path]

I don't want a function that cleverly trims the file to show the original extension to be present at the edge of the project. Later on, someone might want to have a same function for Gromacs or other MD Engines and they will add their own version there and it will make the project hard to maintain.

My suggestion is to either make an explicit statement
data['no_useful_data'] = [module_path / 'testfiles' / 'no_useful_data.out.tar.bz2', ]
So people could look at the code and immediately know what are the available files in this function.

Or a very simple one like

data = {}
for f in testfiles_path.glob('*.bz2'):
    data[f.stem.split('.')[0]] = [str(f),]

And discourage people from using . in the file name.

A related thing is that I'm not a huge fan of zipping things up #68 (comment). if we don't create a bz2 file, then this part could just be

data = {}
for f in testfiles_path.glob('*.out'):
   data[f.stem] = [str(f),]

@DrDomenicoMarson
Copy link
Contributor Author

Sorry for the late reply, but I made the adjustments and now the key is created "by hand" as requested!

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the hard work and many iterations. I think we arrived at a clear way to transition to more expressive AMBER tests. Looks good to me!

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the many iterations. Just change the out.tar.bz2 to out.bz2 and this PR is ready to be merged.

DeprecationWarning)
module_path = Path(__file__).parent
data = [[
module_path / 'testfiles' / 'no_useful_data.out.tar.bz2',
Copy link
Collaborator

@xiki-tempula xiki-tempula Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just change these file names from *.out.tar.bz2 to *.out.bz2 so it is consistent with other files,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for all the work.

@xiki-tempula xiki-tempula merged commit 6407137 into alchemistry:master Oct 16, 2022
@DrDomenicoMarson DrDomenicoMarson deleted the add-amber-mbar-none-non-finished branch October 16, 2022 09:18
@orbeckst
Copy link
Member

@xiki-tempula I think you merged this PR instead of squash-merging so our history now contains lots of small commits. In the future, can you please double-check that you're squashing everything unless the PR has been cleaned up into self-contained commits? Thanks.

@xiki-tempula
Copy link
Collaborator

@orbeckst Sorry, I was under the impression that the default action would be squash and merge. I guess I have used bitbucket too much recently and will double check next time when I merge a PR.

@orbeckst
Copy link
Member

I also think that to be the case but it's worthwhile double checking every time.

xiki-tempula added a commit to xiki-tempula/alchemtest that referenced this pull request Oct 21, 2022
…r-none-non-finished

Added the capability of loading a single test case for AMBER
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AMBER - more test file are needed AMBER tests could be clearer
3 participants