Added the capability of loading a single test case for AMBER #71
I changed this a little bit, compressing all the new test files, and accordingly changed the loading function to search for the compressed version of the file!
@xiki-tempula @orbeckst what do you think is the best approach for the
See my #65 (comment)
See #65 (comment), in brief
- deprecate load_invalidfiles()
- add new load_testfiles(singlefilekey=None) (assuming @xiki-tempula supports the syntax and sees the need for having the function return a single file — I don't quite see the need)
- give the test files better key names
Changes should be made so that alchemlyb tests don't break.
All loading functions should return a Bunch. Consistency in the API is very important. Please check https://alchemtest.readthedocs.io/en/latest/contributing.html; your code refactor needs to fit the description. We don't want to get into a situation where datasets get added in an arbitrary manner. In particular
Btw, I don't have a problem with renaming files: we are only providing access through the accessors, so the filename itself does not matter. If you think that the invalid-case-N.out.bz2 could be named better, just rename to invalid-case-N-New_description_of_case.out.bz2 and leave the initial part of the file name the same so that it's easy to understand where the files came from.
For some reason, CI is not running the tests on this PR at the moment. I hope that when PR #72 is merged (and then merged into this PR), that will magically change...
Tests are running now; please check for failures.
Well... looking through the AMBER files (with an eye towards #49) and in particular
But my take is that I'd rather have one documented outlier and strive for clean API adherence in the future than declare that "anything goes". But I am open to different arguments.
…icoMarson/alchemtestDM into add-amber-mbar-none-non-finished
Dear @orbeckst and @xiki-tempula, we can now access the single files with load_testfiles().data['file_I_want'][0]. I checked the docs and they seem fine. I added the deprecation to load_invalidfiles (and to the directory which had the invalid files), so merging this PR will not break existing tests, while I have prepared a new
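For illustration, the access pattern mentioned above can be sketched as follows. This is only a stand-in under stated assumptions, not the alchemtest implementation: the small Bunch class, the temporary directory, and the file name not_finished_run.out.bz2 are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


class Bunch(dict):
    """Minimal stand-in for a Bunch: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


def load_testfiles(testfiles_path):
    """Hypothetical loader: map each key to a one-element list of file paths."""
    data = {}
    for f in sorted(Path(testfiles_path).glob("*.out.bz2")):
        key = f.name[: -len(".out.bz2")]  # strip the known, fixed extension
        data[key] = [str(f)]
    return Bunch(data=data, DESCR="Hypothetical description of the test files.")


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("no_useful_data.out.bz2", "not_finished_run.out.bz2"):
        (Path(tmp) / name).touch()
    bunch = load_testfiles(tmp)
    single_file = bunch.data["no_useful_data"][0]  # one file, selected by key
    keys = sorted(bunch.data)
```

Keeping each value a list means a single file is still reached through the same data/DESCR structure that the other loaders return.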
Codecov Report
@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   99.34%   99.37%   +0.03%
==========================================
  Files          11       11
  Lines         152      160       +8
  Branches       18       19       +1
==========================================
+ Hits          151      159       +8
  Misses          1        1
Looking pretty good but please address the following (in addition to the inline comments)
- Let's not have invalidfiles and testfiles directories in parallel. Just remove invalidfiles and change the file names inside load_invalid(). Just make sure that the order of files in the list that was returned corresponds to the previous ordering.
- Add tests, see test_amber.py; you should be able to add load_testfiles() just to the generic TestAmber unit test.
.. deprecated:: 0.7
   substituted by load_testfiles
At the start of the function, issue a DeprecationWarning
warnings.warn("load_invalidfiles() was deprecated in 0.7.0 and will be removed in the following release. Use load_testfiles() instead", DeprecationWarning)
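A minimal sketch of what this suggestion could look like in context. The function bodies are placeholders, and having the deprecated function delegate to load_testfiles() is an assumption made for the example, not something decided in this thread.

```python
import warnings


def load_testfiles():
    """Hypothetical replacement loader; returns a placeholder dict here."""
    return {"data": {}, "DESCR": ""}


def load_invalidfiles():
    """Deprecated loader: warn first, then delegate (delegation is an assumption)."""
    warnings.warn(
        "load_invalidfiles() was deprecated in 0.7.0 and will be removed in "
        "the following release. Use load_testfiles() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return load_testfiles()


# Default filters often hide DeprecationWarning, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_invalidfiles()
```

In a test suite, pytest.warns(DeprecationWarning) can be used to assert that the warning is actually emitted.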
Done!
Co-authored-by: Oliver Beckstein <[email protected]>
…icoMarson/alchemtestDM into add-amber-mbar-none-non-finished
src/alchemtest/amber/access.py
Outdated
if f.suffix==".bz2":
    while f.suffix in ('.tar', '.bz2', '.out'):
        f = f.with_suffix('')
    data[f.name] = [f_path]
So what I was wondering is that if f has a suffix of '.tar' or '.out', it won't pass the if f.suffix==".bz2": check in the first place?
Quick comment: there should be a more robust way to split all extensions — something like splitext.
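To make the question concrete, here is a small sketch of how the loop under review behaves. key_for is a hypothetical helper wrapping the same logic, and the file names are made up for the example.

```python
from pathlib import Path

KNOWN_SUFFIXES = ('.tar', '.bz2', '.out')


def key_for(path):
    """Return the dictionary key for a compressed file, or None if it is skipped."""
    f = Path(path)
    if f.suffix != ".bz2":             # only the LAST suffix is tested here
        return None
    while f.suffix in KNOWN_SUFFIXES:  # then peel suffixes from the right
        f = f.with_suffix('')
    return f.name


keys = {name: key_for(name) for name in (
    "case.out.bz2", "case.out.tar.bz2", "case.out", "case.tar",
)}
```

Under these assumptions, files ending in '.out' or '.tar' are indeed skipped by the first check; those two entries in the tuple only matter for the inner suffixes that remain once the trailing '.bz2' has been removed.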
--
Oliver Beckstein
https://becksteinlab.physics.asu.edu
On Oct 1, 2022, at 2:45 PM, Zhiyi Wu wrote:
@xiki-tempula commented on this pull request.
In src/alchemtest/amber/access.py:
> +    Dictionary-like object, the interesting attributes are:
+
+    - 'data' : the data files
+    - 'DESCR': the full description of all the files
+
+    """
+
+    testfiles_path = Path(__file__).parent / 'testfiles'
+
+    data = {}
+    for f in testfiles_path.iterdir():
+        f_path = str(f)
+        if f.suffix==".bz2":
+            while f.suffix in ('.tar', '.bz2', '.out'):
+                f = f.with_suffix('')
+            data[f.name] = [f_path]
I searched for a better method but found nothing; the default ones just split the file name, removing only the part after the last "."!
According to https://stackoverflow.com/a/35188296
I tried, it just gives a list of the basename split at ".", so in the not impossible situation of a "." used inside a filename the output is not what one would expect. Regarding the comment of @xiki-tempula, it's true, I only checked for "bz2" as the final extension because we agreed to have compressed files, but indeed
I just tried again to be sure, these are the results:
In [1]: from pathlib import Path
In [2]: p = Path("sgge/vrvs.sese/bwsbs.osg.e.tar.xz")
In [3]: p.suffixes
Out[3]: ['.osg', '.e', '.tar', '.xz']
In [4]: p.name
Out[4]: 'bwsbs.osg.e.tar.xz'
In [5]: p.stem
Out[5]: 'bwsbs.osg.e.tar'
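Building on the session above, one possible approach (an assumption for illustration, not something settled in this thread) is to strip only a whitelist of trailing suffixes, so that dots inside the base name survive. The helper strip_known_suffixes and its default suffix set are hypothetical.

```python
from pathlib import Path


def strip_known_suffixes(path, known=(".tar", ".xz", ".bz2", ".out")):
    """Drop trailing suffixes that are in `known`; keep any other dots in the name."""
    p = Path(path)
    name = p.name
    for suffix in reversed(p.suffixes):
        if suffix not in known:
            break  # first unknown suffix from the right ends the stripping
        name = name[: -len(suffix)]
    return name


p = Path("sgge/vrvs.sese/bwsbs.osg.e.tar.xz")
naive = p.name.split(".")[0]      # splits at every dot
robust = strip_known_suffixes(p)  # strips only '.tar' and '.xz'
```

The trade-off is that the whitelist has to be maintained by hand, which is the kind of cleverness the review below argues against keeping at the edge of the project.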
@xiki-tempula are there any other problems with this PR?
Looks pretty good, just a few minor fixes and a suggestion to generate the keys in a simpler fashion.
@DrDomenicoMarson My major concern is this part
I don't want a function that cleverly trims the file name to show the original extension to be present at the edge of the project. Later on, someone might want the same function for Gromacs or other MD engines; they will add their own version there, and that will make the project hard to maintain. My suggestion is to either make an explicit statement
Or a very simple one like
And discourage people from using
A related thing is that I'm not a huge fan of zipping things up, see #68 (comment). If we don't create a bz2 file, then this part could just be
Sorry for the late reply, but I made the adjustments and now the
Thank you for all the hard work and many iterations. I think we arrived at a clear way to transition to more expressive AMBER tests. Looks good to me!
Thanks for the many iterations. Just change the out.tar.bz2 to out.bz2 and this PR is ready to be merged.
src/alchemtest/amber/access.py
Outdated
DeprecationWarning)
module_path = Path(__file__).parent
data = [[
    module_path / 'testfiles' / 'no_useful_data.out.tar.bz2',
Just change these file names from *.out.tar.bz2 to *.out.bz2 so it is consistent with the other files.
Done!
LGTM. Thanks for all the work.
@xiki-tempula I think you merged this PR instead of squash-merging so our history now contains lots of small commits. In the future, can you please double-check that you're squashing everything unless the PR has been cleaned up into self-contained commits? Thanks.
@orbeckst Sorry, I was under the impression that the default action would be squash and merge. I guess I have used bitbucket too much recently and will double check next time when I merge a PR.
I also think that to be the case but it's worthwhile double checking every time.
…r-none-non-finished Added the capability of loading a single test case for AMBER
This PR addresses issues #65 and #69. I removed the load_invalidfiles function and added the new load_testfile, putting all the single files that can be loaded inside the directory testfiles.
Currently, I left the files unzipped, but I shrank them by removing unnecessary lines.
As it is, the docs also build fine. I have a doubt though.
To keep everything in line with the other functions, my load_testfile also returns a dictionary with data and DESCR, but I also put a descr.rst file in the testfiles directory, which is read to build the docs.
To have the function return the dictionary, I also created a descr file for each .out file; these are loaded together to populate the dict. In each such descr file I wrote a simple description of the out file it refers to. This way the documentation is pretty scarce, as the common descr file is almost empty. I could also write the description of each file inside the common descr file, but this would be redundant. Wouldn't it be better if I just:
- descr
- load_testfile function returns just the file itself, without descr,
- desc file for each out file
I don't know if it's mandatory for the loading function to return a dictionary though; in that case maybe it's OK how it is now!
This PR has to be "synchronized" with a PR in alchemlyb, because test_amber.py there has to be updated to use this new function (I have this modification ready for when the current PR about the AMBER parser is accepted).