Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slicing IO read_ routines #586

Merged
merged 7 commits into from
Jun 20, 2023
Merged

Slicing IO read_ routines #586

merged 7 commits into from
Jun 20, 2023

Conversation

zerothi
Copy link
Owner

@zerothi zerothi commented Jun 16, 2023

  • Closes Slicing read_* methods #584
  • Tests added
  • Documentation for functionality
    somewhat, more documentation would be great.
  • User visible changes (including notable bug fixes) are documented in CHANGELOG

@codecov
Copy link

codecov bot commented Jun 16, 2023

Codecov Report

Merging #586 (adb10c5) into main (1dae3e0) will decrease coverage by 0.02%.
The diff coverage is 95.03%.

@@            Coverage Diff             @@
##             main     #586      +/-   ##
==========================================
- Coverage   87.30%   87.28%   -0.02%     
==========================================
  Files         374      375       +1     
  Lines       47506    47564      +58     
==========================================
+ Hits        41474    41518      +44     
- Misses       6032     6046      +14     
Impacted Files Coverage Δ
src/sisl/io/_multiple.py 86.82% <86.82%> (ø)
src/sisl/io/xyz.py 95.78% <90.90%> (-1.81%) ⬇️
src/sisl/io/sile.py 84.82% <96.15%> (-0.68%) ⬇️
src/sisl/io/orca/stdout.py 97.61% <99.16%> (+1.22%) ⬆️
src/sisl/geometry.py 85.89% <100.00%> (+<0.01%) ⬆️
src/sisl/io/openmx/md.py 100.00% <100.00%> (ø)
src/sisl/io/orca/tests/test_stdout.py 100.00% <100.00%> (ø)
src/sisl/io/siesta/ani.py 100.00% <100.00%> (ø)
src/sisl/io/siesta/tests/test_ani.py 100.00% <100.00%> (ø)
src/sisl/io/siesta/tests/test_stdout.py 100.00% <100.00%> (ø)
... and 5 more

... and 4 files with indirect coverage changes

src/sisl/io/_multiple.py Fixed Show fixed Hide fixed
src/sisl/io/_multiple.py Fixed Show fixed Hide fixed
Copy link
Contributor

@tfrederiksen tfrederiksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a bit around with this and it looks really cool. Well done!

src/sisl/io/_multiple.py Show resolved Hide resolved
src/sisl/io/siesta/tests/test_ani.py Show resolved Hide resolved
@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

I don't see a reason to not use this everywhere, i.e. read_energy etc?

Would there be objections to this?

@tfrederiksen
Copy link
Contributor

I don't see a reason to not use this everywhere, i.e. read_energy etc?

Would there be objections to this?

I agree, no objections. I already tested this out on stdoutSileVASP.read_energy(). However, I tend to think it may cause confusion if we change the behaviour everywhere to return the first entry instead of the last (default up to now).

Maybe raise a warning for a while to make users aware that read_energy[-1]() is needed to get the last entry (that I suppose is the typical case)?

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

One can do:

@SileBinder(..., default_slice=-1)
def read_energy(...)

which should ensure that it defaults to return the last one, say read_energy() will then do that.

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

Where should we default to return the last? Everywhere, some of @pfebrer comments related to MD are valid.
However, I assume that those doing MD are aware of their needs and thus does require to return all.
I don't know if all should default to return the last item?

Note, that a local commit ensures that the default slice is written in the documentation of said routine, so it will automatically be added for the method.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

Wouldn't the default slice return the confusion of the arguments we have removed? I advocate for no defaults on this, then "legacy" code will be more likely to keep doing the same thing in the future. If in the future we decide that another default is better, it will be very difficult to debug changes in the defaults in already written code.

I would just issue a warning for some time as @tfrederiksen was suggesting (perhaps keep the current API with a deprecation warning).

Also in my case I would spend more time rechecking the documentation for the defaults of each sile than simply specifying what I want 😅

@zerothi

This comment was marked as resolved.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

By the way I don't know if it would be a good idea to use slices for cases where there is more than one dimension. E.g. in outSileSiesta.read_scf one could do read_scf[:, -1] meaning all MD steps, but just the last scf step.

It looks nice but it's a bit hard to know if it makes sense implementationwise because it would no longer be just a simple external iterator on the reading function.

@pfebrer

This comment was marked as resolved.

@zerothi

This comment was marked as resolved.

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

By the way I don't know if it would be a good idea to use slices for cases where there is more than one dimension. E.g. in outSileSiesta.read_scf one could do read_scf[:, -1] meaning all MD steps, but just the last scf step.

It looks nice but it's a bit hard to know if it makes sense implementationwise because it would no longer be just a simple external iterator on the reading function.

This is a fine idea, I think it could be implemented easily as the subsequent tuple keys are simply applied to the returned arrays. Lets see...

@pfebrer

This comment was marked as resolved.

@zerothi

This comment was marked as resolved.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

Would it be so bad to have a read_last_geometry() method if it makes sense for some Sile?

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

Would it be so bad to have a read_last_geometry() method if it makes sense for some Sile?

I am pretty sure that would make sense for all siles that uses the SileBinder :)
Hmm.. Perhaps you are right.

Ok, so default to only the next item, as it is now. The default_slice should then preferentially be used by extensions not in sisl.

@tfrederiksen
Copy link
Contributor

Would it be so bad to have a read_last_geometry() method if it makes sense for some Sile?

I think this is a good suggestion, making it explicit for the user what is looked up. In particular the warning could suggest users to adapt read_last_energy() in their scripts.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

I am pretty sure that would make sense for all siles that uses the SileBinder

Yeah maybe 😅

I guess if read_* functions only parse lines it will be also easier to combine them. For example you could implement something to read geometries and energies in the same pass through the file.

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

Would it be so bad to have a read_last_geometry() method if it makes sense for some Sile?

I think this is a good suggestion, making it explicit for the user what is looked up. In particular the warning could suggest users to adapt read_last_energy() in their scripts.

If all this is for explicitness, wouldn't the current availability of read_energy[-1] be fine? Is this so bad to ask users to do as compared to read_energy_last?

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

I have thought about it, and I have now added this, what do you think?

read_geometry.last(*args, **kwargs)
read_geometry.next(*args, **kwargs)

to explicitly request the next item, and the last item. Remark that first would have been misleading as in the case of @pfebrer's case above.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

Sounds good!

But what does read_geometry() do?

@zerothi
Copy link
Owner Author

zerothi commented Jun 16, 2023

Sounds good!

But what does read_geometry() do?

same as next so I don't know if that is needed? but more for clarity, as you suggested.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 16, 2023

Hmm it is very difficult to know what users will think about it. But it doesn't seem it could be harmful in the future to maintain these two properties (fingers crossed haha) so we could have them both.

@zerothi
Copy link
Owner Author

zerothi commented Jun 17, 2023

@tfrederiksen you might be interested in the latest commit which changes the orca output to follow this method. I couldn't really safe the all argument, but I hope this is fine. In the end when we get to a release it will be 0.14. Let me know your thoughts.

Probably this should go in ASAP, and then the other siles will come as required.

if len(blocks) > 0:
return blocks[-1]
return None
return read_block(step_to)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'read_block' may be used before it is initialized.
if len(blocks) > 0:
return blocks[-1]
return None
return read_block(step_to)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'step_to' may be used before it is initialized.
@pfebrer
Copy link
Contributor

pfebrer commented Jun 17, 2023

Could read_energy and similars return a numpy array by default instead of a Collection?

@zerothi
Copy link
Owner Author

zerothi commented Jun 17, 2023

Could read_energy and similars return a numpy array by default instead of a Collection?

That would be hard for end users, then the users need to read which indices corresponds to which energies, that was why the property dict was used.
It would be nice if they all allow collection or dataframes.

Copy link
Contributor

@tfrederiksen tfrederiksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great. After merge I will revisit #573.

Fixed documentation issues and enabled this.
Storing sliced objects will also work.
Currently nested slicing is not possible.

enabled nested with of a sile and changed to SileBinder

The sile_read_multiple has been deleted, now the way to
do it is SileBinder.
This is more versatile and allows better handling.
It also patches the documentation.

This changes the behaviour of all files using this.
By default a read_* with the SileBinder will
only read the next entry, however to read
multiple one should use read_*[...] for slicing
the data.
The bound method will try and be smart about only reading
the needed elements, there is a bit of overhead, but
largely it should be fast.

All tests are functional again.

Signed-off-by: Nick Papior <[email protected]>
if the file reads raises something, then announce

Signed-off-by: Nick Papior <[email protected]>
This makes the read-functionalities more
useable to retain other parts of the information.

All tests are functional again.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
@zerothi zerothi merged commit 0b17eec into main Jun 20, 2023
3 of 4 checks passed
@zerothi zerothi deleted the 584-sile-slicer branch June 20, 2023 08:12
zerothi added a commit that referenced this pull request Mar 1, 2024
Added lots of typehints in the code, primarily in the
io code base.

Changed the way the siesta.stdout siles handle data.
They now rely on the SileSlicer and thus can be sliced
upon calling. It makes the code a little simpler but also
highligted some problems.

In particular the problem arise when the return function
needs to do different things depending on whether the file
is done reading, or not.

The defaults for stdoutSileSiesta has changed to read
the next entry in the file. This is in contrast to the
way it was. It will require users to adapt!
the reason for this can be found in the discussion in
#586.

This streamlines the usage across the different parts of the
IO handling.

Signed-off-by: Nick Papior <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slicing read_* methods
3 participants