MD return values -- what should they be #588
Comments
@pfebrer your input here would be valuable. I know you have commented on something like this before; here is a more targeted issue. |
I think lists are easy and intuitive for multiple |
I think the main idea of a container is that it allows applying arbitrary functions to all of its items. For instance, one can do:
    gcoll = something.read_geometry[:]()
    sub_gcoll = gcoll.applymap(Geometry.sub, atoms=[1, 2, 3])
to extract some atoms from all geometries into a new collection. The Collection was added in #546; should we revert it and simply return the list? |
I guess I do not see a clear advantage, rather some additional complexity without a clear use case. With lists one can achieve the mapping with [g.sub([1, 2, 3]) for g in gcoll] |
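To make the two styles compared above concrete, here is a minimal sketch. The `Geometry` and `Collection` classes below are simplified stand-ins written only for this illustration (they are not the library's actual classes); only the names `applymap` and `sub` come from the comments.

```python
class Geometry:
    """Hypothetical stand-in: a geometry is just a list of atom labels."""

    def __init__(self, atoms):
        self.atoms = list(atoms)

    def sub(self, atoms):
        # Keep only the atoms at the requested indices.
        return Geometry(self.atoms[i] for i in atoms)


class Collection(list):
    """Hypothetical list subclass that maps a function over its items."""

    def applymap(self, func, *args, **kwargs):
        return Collection(func(item, *args, **kwargs) for item in self)


geoms = [Geometry("ABCDE"), Geometry("VWXYZ")]

# Container style: map an unbound method over every geometry.
sub_coll = Collection(geoms).applymap(Geometry.sub, atoms=[1, 2, 3])

# Plain-list style: the same result with a list comprehension.
sub_list = [g.sub([1, 2, 3]) for g in geoms]
```

Under these assumptions both forms give the same geometries, which is the point made above: for pure mapping, the container adds little over a list.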
For trajectories I think the best approach would be to introduce an extra dimension (MD step) in arrays like the coordinates. It would make everything simpler and more efficient. But as you said perhaps For general collections of geometries, I don't know, because I have no use case in mind. Unless you have a clear use case, I believe lists are fine for now. Whatever I agree that if the only purpose of this collection is to map functions, it is probably not worth it. I would wait until there is a more interesting use in mind before converting this list to a custom class. A list can always be upgraded in the future, whereas needing to "downgrade" from a custom class to a list will most likely cause problems. |
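As a rough illustration of the "extra MD dimension" idea, the sketch below stacks per-step coordinates into one NumPy array. The shapes and the random data are assumptions made for the example, not output from any real reader.

```python
import numpy as np

n_md, n_atoms = 4, 3
rng = np.random.default_rng(0)

# List layout: one (n_atoms, 3) coordinate array per MD step.
xyz_list = [rng.random((n_atoms, 3)) for _ in range(n_md)]

# Stacked layout: a single (n_md, n_atoms, 3) array.
xyz = np.stack(xyz_list)

# Operations across MD steps become single vectorized calls,
# e.g. the displacement of every atom between consecutive steps.
disp = np.diff(xyz, axis=0)  # shape (n_md - 1, n_atoms, 3)

# The trajectory of one atom is a plain slice.
atom0 = xyz[:, 0, :]  # shape (n_md, 3)
```

With the list layout the same displacement would need an explicit loop over pairs of steps.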
So, in the first message of this issue, for trajectories I would prefer something similar to the second snippet. But perhaps that should come as an output of a method called Every time I deal with trajectories I like to organize my data in |
Ok, I see. I can see the benefit of xarray, perhaps now is the time to step into that regime. My proposal would then be to:
|
I didn't know, but See the example here: https://docs.xarray.dev/en/stable/internals/duck-arrays-integration.html#integrating-with-duck-arrays And in principle one can generate the sparse dataset from a dataframe: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.from_dataframe.html#xarray.Dataset.from_dataframe So that would be suitable for the SCF data. When I have time I can try to play with it in |
I am not sure that would be great going forward. That would mean that all scf data is contained in a sparse array, which isn't particularly fast. I think it would be wiser to do a list of Dataset, for easier interaction. I might be wrong, but it just feels weird to use a sparse array to have consecutive data that is truncated at different levels... |
Hmm I don't know, because with a list of Datasets it is not easy to do operations across MD steps. I don't know if the sparse dataset is a good choice, but a pandas dataframe for example seems better than a list of datasets. |
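The dataframe option mentioned above could look roughly like the following sketch, where each MD step has a different number of SCF steps. The energies and the names `imd`, `iscf` and `E` are invented for illustration.

```python
import pandas as pd

# Hypothetical SCF energies: each MD step converges after a
# different number of SCF iterations (ragged data).
scf = {
    0: [-10.0, -10.4, -10.5],
    1: [-10.6, -10.7],
    2: [-10.65, -10.72, -10.74, -10.75],
}

# Long-form table indexed by (MD step, SCF step); no padding needed.
rows = [
    (imd, iscf, energy)
    for imd, energies in scf.items()
    for iscf, energy in enumerate(energies)
]
df = pd.DataFrame(rows, columns=["imd", "iscf", "E"]).set_index(["imd", "iscf"])

# Operations across MD steps are one-liners.
n_scf = df.groupby(level="imd").size()         # SCF steps per MD step
final_E = df.groupby(level="imd")["E"].last()  # converged energy per MD step
```

Such a frame is also the kind of input that the linked `xarray.Dataset.from_dataframe` accepts; that conversion is not attempted here.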
Perhaps one should just not be able to extract MD + SCF at the same time, either MD or SCF. |
Hmm I don't know, I have used the |
Hmm.. I don't know, yeah probably fine with that then. :) |
The problem is that we need to converge on consistent return values in some way.
Currently some methods return
Now, what should we do about multiple returns?
See below for two ideas; which would we prefer?
Things to consider:
- easy to work with
- get performant code (ideally they could be (md, ...) arrays)
- consistency across the outputs
You would prefer this?
Originally posted by @tfrederiksen in #573 (comment)
List of actions:
- Collection and friends, users now need to manually write many geometries