MD return values -- what should they be #588

zerothi · 2023-06-20T19:25:40Z

The problem is that we need to converge the returned values in some way.

Currently some methods returns

lists
collection
geometrycollection

Now what should we do about multiple returns.
See below for two ideas, which would we prefer.

Things to consider:

easy to work with
get performant code (ideally they could be (md, ...) arrays)

consistency across the outputs

        I've addressed all comments except this one as I am not sure we want/need to unpack. For instance, the current behaviour seems logical to me:

traj = stdoutSileVASP(f).read_trajectory[:3]()
for i in range(3):
    print(traj[i].force)
# or
for step in traj:
    print(step.force)

You would prefer this?

traj = stdoutSileVASP(f).read_trajectory[:3]()
for i in range(3):
    print(traj.force[i])
# or
for F in traj.force:
    print(F)

Originally posted by @tfrederiksen in #573 (comment)

List of actions:

removed Collection + GeometryCollection #589 remove Collection and friends, users now need to manually write many geometries
Dataset attempt

The text was updated successfully, but these errors were encountered:

zerothi · 2023-06-20T19:26:03Z

@pfebrer your inputs here would be valuable, I know you have commented on something like this before, here a more targeted issue.

tfrederiksen · 2023-06-20T19:53:34Z

I think lists are easy and intuitive for multiple read_* calls. As long as the individual functions are clear what a single call produces (Geometry, PropertyDict, ndarray, etc) I would let the user do the postprocessing (or provide the postprocess function). Are there any particular concerns with this?

zerothi · 2023-06-20T19:58:48Z

I think the main idea about a container would be that it can allow some arbitrary functions.

For instance one can do:

gcoll = something.read_geometry[:]()
sub_gcoll = gcol.applymap(Geometry.sub, atoms=[1, 2, 3])

to extract some atoms for all geometries in a new collection.

The Collection was added in #546, should we revert it and simply return the list?

tfrederiksen · 2023-06-20T20:49:52Z

I guess I do not see a clear advantage, rather some additional complexity without a clear use case.

With lists one can achieve the mapping with

[g.sub([1, 2, 3]) for g in gcoll]

pfebrer · 2023-06-20T21:41:35Z

For trajectories I think the best would be to introduce an extra dimension (MD step) in arrays like the coordinates. It would make everything simpler and more efficient. But as you said perhaps sisl shouldn't focus on supporting trajectories and it should be managed externally.

For general collections of geometries, I don't know, because I have no use case in mind. Unless you have a clear use case, I believe lists are fine for now. Whatever Collection class that is implemented in the future will probably have all the functionality that a list has (i.e. it will be iterable, you will be able to retreive items using indices...) so the probability of backward compatibility problems is very low.

I agree that if the only purpose of this collection is to map functions it is probably not worth it. I would wait to have some more interesting usage in mind before converting this list to a custom class. The list can always be upgraded in the future, instead needing to "downgrade" from custom class to list will most likely cause problems.

pfebrer · 2023-06-20T21:57:18Z

So, in the first message of this issue, for trajectories I would prefer something similar to the second snippet. But perhaps that should come as an output of a method called read_trajectory, e.g. read_trajectory[:](), not just the general read_geometry[:]().

Every time I deal with trajectories I like to organize my data in xarray.Datasets because they allow me to do lots of operations easily. So traj in the snippet could be a Dataset, then the amount of things you need to implement in sisl gets very reduced. For example, with a Dataset you could also easily loop over all the data by MD step as in the first snippet if you want.

zerothi · 2023-06-21T05:46:01Z

Ok, I see.
Lets delete the Collection class.

I can see the benefit of xarray, perhaps now is the time to step into that regime.
The only problem with the dataset is when requesting data from MD + SCF, then there are variable SCF steps for each MD, and hence it might be a bit problematic in that case? There some other solution is necessary. Perhaps a list [MD] of datasets [SCF].

My proposal would then be to:

read_geometry returns a list
read_* returns a dataset where applicable, otherwise a list of dataset, datasets should ideally be complete with attributes describing origin, full details etc.

pfebrer · 2023-06-21T09:26:41Z

I didn't know, but xarray can have sparse arrays as variables.

See the example here: https://docs.xarray.dev/en/stable/internals/duck-arrays-integration.html#integrating-with-duck-arrays

And in principle one can generate the sparse dataset from a dataframe: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.from_dataframe.html#xarray.Dataset.from_dataframe

So that would be suitable for the SCF data.

When I have time I can try to play with it in outSileSiesta.read_scf, because I don't know how it would actually work.

zerothi · 2023-06-21T09:28:59Z

I am not sure that would be great going forward. That would mean that all scf data is contained in a sparse array, which isn't particularly fast.

I think it would be wiser to do a list of Dataset, for easier interaction. I might be wrong, but it just feels weird to use a sparse array to have consecutive data that is truncated at different levels...

pfebrer · 2023-06-21T10:13:04Z

Hmm I don't know, because with a list of Datasets it is not easy to do operations across MD steps.

I don't know if the sparse dataset is a good choice, but a pandas dataframe for example seems better than a list of datasets.

zerothi · 2023-06-21T10:15:10Z

Perhaps one should just not be able to extract MD + SCF at the same time, either MD or SCF.

pfebrer · 2023-06-21T10:20:25Z

Hmm I don't know, I have used the outSileSiesta.read_scf(as_dataframe=True) and it worked great for me.

zerothi · 2023-06-21T10:27:25Z

Hmm.. I don't know, yeah probably fine with that then. :)
We should just have some pretty simple rules so users don't need to read documentation all the time.

zerothi mentioned this issue Jun 20, 2023

VASP forces from OUTCAR #573

Merged

4 tasks

zerothi added this to the v0.14 milestone Jun 20, 2023

zerothi modified the milestones: v0.14, v0.15 Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MD return values -- what should they be #588

MD return values -- what should they be #588

zerothi commented Jun 20, 2023 •

edited

Loading

zerothi commented Jun 20, 2023

tfrederiksen commented Jun 20, 2023

zerothi commented Jun 20, 2023

tfrederiksen commented Jun 20, 2023

pfebrer commented Jun 20, 2023 •

edited

Loading

pfebrer commented Jun 20, 2023 •

edited

Loading

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

MD return values -- what should they be #588

MD return values -- what should they be #588

Comments

zerothi commented Jun 20, 2023 • edited Loading

zerothi commented Jun 20, 2023

tfrederiksen commented Jun 20, 2023

zerothi commented Jun 20, 2023

tfrederiksen commented Jun 20, 2023

pfebrer commented Jun 20, 2023 • edited Loading

pfebrer commented Jun 20, 2023 • edited Loading

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

pfebrer commented Jun 21, 2023

zerothi commented Jun 21, 2023

zerothi commented Jun 20, 2023 •

edited

Loading

pfebrer commented Jun 20, 2023 •

edited

Loading

pfebrer commented Jun 20, 2023 •

edited

Loading