Integrate stk as a structure provider, and generate shapes for custom bonds #366

Merged: 35 commits merged into lab-cosmo:main on Oct 4, 2024

Conversation

andrewtarzia
Contributor

@andrewtarzia andrewtarzia commented Sep 20, 2024

In this pull request, I implement the "simple" solution from #361 through a Python function and example for others to use.

This builds off an unreleased branch from @Luthaf and so it may require rebasing before merging, but I was hoping to get feedback/discuss some questions anyway in this draft.

Discussion points:

  • Is the location of the function appropriate? I wrote it in Python for ease, but perhaps there is a better place for this processing?
  • I did not want to add dependencies, but I have built the molecules with stk, and extracted known bonds with stk (while also showing some "by-hand" bond additions). I do not think the dependency matters anywhere else, and it is up to the user to install it (pip install stk) if they want to run the example. I did this as an example for you, but I think a better option (one that does not confuse the user) would be to show the example with bonds collected by hand.
  • The current approach is "you have a molecule already, but want to add bonds", but an alternative to this is for me to write a class that contains structure and topology, and reads from .mol/rdkit/stk (for example), and can be used in place of the ase.read statement?

Remaining todo:

  • Allow for user-set colours, or a better default.
  • Allow user to turn on or off "bonds" and "shapes" - although this is handled in the example, not in the new function.
  • Clean up the stk code after discussion
  • Probably adhere to tests and formatting rules that I have not...
  • Rebase previous changes from @Luthaf
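
To make the "bonds as shapes" idea in this description concrete, here is a minimal sketch of turning a bond list into one cylinder description per bond. The helper name `bonds_to_cylinders` and the dictionary keys (`position`, `vector`, `radius`) are assumptions chosen for illustration; they are not taken from the PR and may not match the schema chemiscope actually expects.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float, float]


def bonds_to_cylinders(
    positions: Sequence[Vector],
    bonds: Sequence[Tuple[int, int]],
    radius: float = 0.1,
) -> List[Dict[str, object]]:
    """Describe each bond (i, j) as a cylinder from atom i to atom j.

    Hypothetical helper: the output layout is an assumption, not a real API.
    """
    cylinders = []
    for i, j in bonds:
        start, end = positions[i], positions[j]
        # The cylinder starts at atom i and extends along the i -> j vector.
        vector = [e - s for s, e in zip(start, end)]
        cylinders.append(
            {"position": list(start), "vector": vector, "radius": radius}
        )
    return cylinders


# Three atoms with two bonds, some of which could be added "by hand".
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.5, 0.0)]
bonds = [(0, 1), (1, 2)]
cylinders = bonds_to_cylinders(positions, bonds)
```

Keeping the bond list as plain index pairs means it does not matter whether the pairs come from stk, from a .mol file, or from manual input.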

@Luthaf
Contributor

Luthaf commented Sep 23, 2024

Thanks a lot for sending this contribution!

This builds off an unreleased branch from @Luthaf and so it may require rebasing before merging

Are you referring to 9615954? If yes, it is already merged in the main branch, so rebasing on top of that should be all you need to do!

Probably adhere to tests and formatting rules that I have not...

You can check this on your machine by running tox -e lint and tox -e format for formatting, and running tox to run all tests (including lint tests).


and can be used in place of the ase.read statement?

This is actually the intention behind this part of the code! There is a frames_to_json function that takes arbitrary frame types, and then dispatches to special code for each underlying type. Right now there is only one such underlying type, ase.Atoms, but I'd be happy to also have the same functionality for STK.

Then on top of this, we could add the function that generates bonds as shapes (like we have ase_tensors_to_ellipsoids which is specific to ase).

Overall, the code could look something like this from the user perspective:

structures = stk.read_structures(...) # not sure about the name of the actual function from STK here ^_^

bonds = chemiscope.stk_bonds_to_shapes(structures)
chemiscope.show(frames=structures, properties=..., shapes=bonds)

The good point about this approach is that this will then be an easy change when/if we integrate bonds directly in the dataset given to JavaScript. One would just remove the shapes=bonds, and frames_to_json would directly include the bonds information in the JSON file for STK data.
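
The type-dispatch idea described in this comment can be sketched roughly as follows. Everything here is hypothetical: `FakeAseFrame` and `FakeStkFrame` are stand-ins so the example runs without ase or stk installed, and `frames_to_json_sketch` only illustrates the pattern; it is not chemiscope's actual `frames_to_json`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FakeAseFrame:
    """Stand-in for ase.Atoms (hypothetical, for illustration only)."""

    symbols: List[str]


@dataclass
class FakeStkFrame:
    """Stand-in for an stk molecule (hypothetical, for illustration only)."""

    elements: List[str]


def _ase_to_json(frame: FakeAseFrame) -> Dict[str, Any]:
    return {"names": list(frame.symbols)}


def _stk_to_json(frame: FakeStkFrame) -> Dict[str, Any]:
    return {"names": list(frame.elements)}


# One adapter per supported frame type; adding a provider adds one entry.
_ADAPTERS: Dict[type, Callable[[Any], Dict[str, Any]]] = {
    FakeAseFrame: _ase_to_json,
    FakeStkFrame: _stk_to_json,
}


def frames_to_json_sketch(frames: List[Any]) -> List[Dict[str, Any]]:
    """Convert each frame with the adapter registered for its type."""
    converted = []
    for frame in frames:
        adapter = _ADAPTERS.get(type(frame))
        if adapter is None:
            raise TypeError(f"unsupported frame type: {type(frame).__name__}")
        converted.append(adapter(frame))
    return converted


result = frames_to_json_sketch([FakeStkFrame(["C", "H"])])
```

With this layout, user code stays the same regardless of which library produced the structures.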

@Luthaf
Contributor

Luthaf commented Sep 23, 2024

If we go with the full support for STK data in chemiscope (in addition to ASE), it might make sense to do a separate PR adding just basic support (i.e. atomic names + positions + cell), and keep the bonds => shape transformation functionality separate.

@andrewtarzia
Contributor Author

andrewtarzia commented Sep 24, 2024

Thank you for the response!

You can check this on your machine by running tox -e lint and tox -e format for formatting, and running tox to run all tests (including lint tests).

Works a charm, TY!

Regarding the stk-specific methods, I am very happy with your suggestion. Does that mean you would then add it as a dependency, or as an optional dependency for the user to add?

If we go with the full support for STK data in chemiscope (in addition to ASE), it might make sense to do a separate PR adding just basic support (i.e. atomic names + positions + cell), and keep the bonds => shape transformation functionality separate.

This sounds like a good idea - I might put it all into this draft PR to get it aligned with your code and then "close-reopen" as needed.

@andrewtarzia
Contributor Author

Also, regarding the failed Python 3.9 test, that seems to be an issue with ase?

@Luthaf
Contributor

Luthaf commented Sep 24, 2024

I would leave stk as an optional dependency: chemiscope is not really using stk, but instead providing adapters from stk data, which are only relevant if the user already has and is using stk.
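
One common way to keep a package optional is to look it up only when the feature that needs it is used. The sketch below shows that pattern with two hypothetical helpers, `have_optional` and `require_optional`; these names are assumptions for illustration and this is not how chemiscope necessarily implements it.

```python
import importlib
import importlib.util
from types import ModuleType


def have_optional(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    if not have_optional(module_name):
        raise ImportError(
            f"'{module_name}' is needed for this feature; "
            f"install it with `pip install {module_name}`"
        )
    return importlib.import_module(module_name)


# 'json' is used here only because it is always available;
# an adapter for stk data would call require_optional("stk") instead.
json_module = require_optional("json")
```

Users who never touch the stk adapters never need stk installed, and those who do get a clear error message instead of a bare import failure at package import time.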

The CI failure is actually a linter error (at the end, you can see lint: FAIL and tests: OK). The error message is

/home/runner/work/chemiscope/chemiscope/python/chemiscope/structures/_bonding.py:84:17: B007 Loop control variable 'i' not used within the loop body. If this is intended, start the name with an underscore.

The ASE things printed near the end are only warnings.
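
For readers unfamiliar with B007, here is a small illustration of what the rule flags and the fix the message suggests. The functions are invented for this example; the actual loop in _bonding.py is not shown in this thread.

```python
def count_items_flagged(items):
    """Loop that a B007 check would flag."""
    count = 0
    for i in items:  # 'i' is never used in the loop body
        count += 1
    return count


def count_items_fixed(items):
    """Same loop with the unused variable marked as intentional."""
    count = 0
    for _item in items:  # leading underscore signals "unused on purpose"
        count += 1
    return count
```

Both functions behave identically; only the variable name changes, which is enough to satisfy the linter.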

@andrewtarzia
Contributor Author

The CI failure is actually a linter error (at the end, you can see lint: FAIL and tests: OK). The error message is

Ah OK, I was seeing red after fixing the lint issue and panicked... aha, it's OK on my local machine.

@andrewtarzia
Contributor Author

Updated this, one issue I foresee is that stk molecules do not contain properties, so we can add them through their own dictionary like in the example, or I can write an stk molecule-containing class that also has properties?

Will chemiscope check the validity of a json file (i.e. matching of structures and properties) elsewhere?

@andrewtarzia
Contributor Author

Also, the tests may currently fail because of a bug in the latest version of rdkit (rdkit/rdkit#7841). I have pinned the rdkit version for that reason, but that should be fixed soon.

@Luthaf
Contributor

Luthaf commented Sep 24, 2024

Updated this, one issue I foresee is that stk molecules do not contain properties, so we can add them through their own dictionary like in the example, or I can write an stk molecule-containing class that also has properties?

In this case, the properties should be given separately yes!

Will chemiscope check the validity of a json file (i.e. matching of structures and properties) elsewhere?

Yes, we check this both in Python (before creating the JSON file) and in JavaScript (in case people are not using chemiscope to create the JSON files and they have errors inside).
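
As a rough idea of what such a consistency check involves when properties are passed separately from the structures, consider the sketch below. The function `check_property_sizes` is hypothetical, and the property layout (a `target` of "structure" or "atom" plus a list of `values`) is a simplified assumption; it is not chemiscope's actual validation code.

```python
from typing import Any, Dict


def check_property_sizes(
    n_structures: int,
    n_atoms_total: int,
    properties: Dict[str, Dict[str, Any]],
) -> None:
    """Raise ValueError if a property has the wrong number of values."""
    expected_by_target = {"structure": n_structures, "atom": n_atoms_total}
    for name, prop in properties.items():
        target = prop["target"]
        if target not in expected_by_target:
            raise ValueError(f"property '{name}' has unknown target '{target}'")
        expected = expected_by_target[target]
        actual = len(prop["values"])
        if actual != expected:
            raise ValueError(
                f"property '{name}' has {actual} values, expected {expected}"
            )


# Two structures with five atoms in total: one value per structure is valid.
check_property_sizes(
    2, 5, {"energy": {"target": "structure", "values": [-1.0, -2.0]}}
)
```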

@andrewtarzia andrewtarzia marked this pull request as ready for review September 24, 2024 15:28
@andrewtarzia
Contributor Author

Perfect!

I have marked this ready to review for feedback on the processing, to make sure I understood it all. And if so, I can split into separate PRs if you see fit.

Contributor

@Luthaf Luthaf left a comment

A couple of small comments in passing; I still have to do a more in-depth review of the code.

python/chemiscope/input.py (outdated)
python/chemiscope/jupyter.py
python/chemiscope/structures/__init__.py (outdated)
python/chemiscope/structures/__init__.py (outdated)
python/chemiscope/structures/_stk.py
python/chemiscope/structures/_stk.py (outdated)
python/chemiscope/structures/_stk.py
python/examples/8-showing_custom_bonds.py (outdated)
@ceriottm
Contributor

ceriottm commented Sep 24, 2024

All looks quite nice, but the example does not build into the docs. It is definitely missing dependencies in requirements.txt, and I think the data folder should be hardcoded to be "data/" as in other examples, without trying to guess the path. If one wants to run these manually, they should be run from the examples/ folder. Even after fixing these, I couldn't get it to generate a page.

@andrewtarzia
Contributor Author

Ok, I will try and fix the issues to get the docs building.

I think the data folder should be hardcoded to be "data/" as in other examples, without trying to guess the path. If one wants to run these manually, they should be run from the examples/ folder.

If I understand correctly, the use of pathlib.Path(__file__).resolve().parent is not guessing; it hardcodes the working directory to where the script is, which is examples. This allows the user to run the script from elsewhere without any problems.
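
The two conventions being compared here can be sketched side by side. The helper names are hypothetical, and the script path is passed in as an argument so the example does not depend on `__file__` being defined in the environment that runs it.

```python
from pathlib import Path


def data_dir_next_to(script_path: str) -> Path:
    """Data folder beside the script, independent of the working directory."""
    return Path(script_path).resolve().parent / "data"


def data_dir_relative_to_cwd() -> Path:
    """Data folder relative to wherever the script is launched from."""
    return Path("data")


# In a real script the first form would be called as data_dir_next_to(__file__).
anchored = data_dir_next_to("/tmp/examples/script.py")
relative = data_dir_relative_to_cwd()
```

The first form gives an absolute path that works from any working directory, but it relies on the script's own path being available; the second form is shorter and only works when run from the folder that contains data/.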

@andrewtarzia
Contributor Author

Fixing the documentation requires removing this line anyway it seems, so I will go back to data/

@andrewtarzia
Contributor Author

andrewtarzia commented Sep 25, 2024

As far as I can tell locally, everything in tests and website.yml runs now.

@andrewtarzia
Contributor Author

I am working on the versioning because the latest stk/stko requires Python 3.11 and the website is 3.10. Will update soon.

@Luthaf
Contributor

Luthaf commented Sep 26, 2024

We can update the website building to run on Python 3.11, that's fine with me

@ceriottm
Contributor

Ok, I will try and fix the issues to get the docs building.

I think the data folder should be hardcoded to be "data/" as in other examples, without trying to guess the path. If one wants to run these manually, they should be run from the examples/ folder.

If I understand correctly, the use of pathlib.Path(__file__).resolve().parent is not guessing; it hardcodes the working directory to where the script is, which is examples. This allows the user to run the script from elsewhere without any problems.

Didn't work with tox, and given that the other examples don't do any of this, I'd stick to the local assumption (or change all examples for consistency).

@ceriottm
Contributor

We can update the website building to run on Python 3.11, that's fine with me

If it doesn't break other things, I think we can.

@andrewtarzia
Contributor Author

Ok, I will try and fix the issues to get the docs building.

I think the data folder should be hardcoded to be "data/" as in other examples, without trying to guess the path. If one wants to run these manually, they should be run from the examples/ folder.

If I understand correctly, the use of pathlib.Path(__file__).resolve().parent is not guessing; it hardcodes the working directory to where the script is, which is examples. This allows the user to run the script from elsewhere without any problems.

Didn't work with tox, and given that the other examples don't do any of this, I'd stick to the local assumption (or change all examples for consistency).

Indeed because it did not work with tox, I went back to the existing assumption.

@andrewtarzia
Contributor Author

We can update the website building to run on Python 3.11, that's fine with me

I am happy to test this change before you review this PR.

Contributor

@Luthaf Luthaf left a comment

This looks pretty good! I don't think splitting this into two separate PRs is really necessary, but if you want to keep multiple commits in the history, could you please squash them into logical units? Otherwise I'll squash the whole PR when merging.

docs/requirements.txt
python/README.md (outdated)
Comment on lines 33 to 35
__all__ = [
    "convert_stk_bonds_as_shapes",
]
Contributor

I don't think this is correct. __all__ controls what gets imported when someone does from xxx import *.

If this is a workaround for the linter complaining about an unused import, feel free to use # noqa for this import!
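
The behaviour of `__all__` described in this comment can be checked with a small self-contained experiment. The module `demo_all_module` and its two functions are invented for the demonstration and are built in memory so that no extra file is needed.

```python
import sys
import types

source = '''
__all__ = ["public_helper"]

def public_helper():
    return "public"

def other_helper():
    return "other"
'''

# Build a throwaway module in memory and register it so it can be imported.
demo = types.ModuleType("demo_all_module")
exec(source, demo.__dict__)
sys.modules["demo_all_module"] = demo

# A star import only brings in the names listed in __all__ ...
star_namespace = {}
exec("from demo_all_module import *", star_namespace)

# ... while regular attribute access is unaffected by __all__.
import demo_all_module

still_reachable = demo_all_module.other_helper()
```

So `__all__` limits what a star import exposes but does not hide anything from explicit imports. For an import kept only for re-export, a trailing `# noqa: F401` comment on that line is a common way to silence the unused-import warning.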

Contributor Author

Done.

python/chemiscope/structures/__init__.py (outdated)
python/chemiscope/structures/__init__.py (outdated)
python/chemiscope/structures/__init__.py (outdated)
python/chemiscope/structures/__init__.py (outdated)
python/examples/9-showing_custom_bonds.py (outdated)
python/examples/data/stk_0.mol (outdated)
@andrewtarzia
Contributor Author

Ok, all done through these comments. Thank you for the feedback! I think it's OK for you to squash all commits here upon merging.

Contributor

@Luthaf Luthaf left a comment

Thanks a lot for this work! I'll merge once CI is happy

@andrewtarzia
Contributor Author

Excellent! Thank you! What is the release schedule for chemiscope? (just asking to allow for lazy pip-install usage aha)

@ceriottm
Contributor

ceriottm commented Oct 3, 2024

Maybe we could do a point release after this is merged?

@Luthaf Luthaf merged commit b90df19 into lab-cosmo:main Oct 4, 2024
6 checks passed
@Luthaf
Contributor

Luthaf commented Oct 4, 2024

Yes, we can do a point release!

EDIT: I'll finish up #367 first and also include it in the release

Comment on lines +21 to +30
self.assertEqual(
    data["structures"][0]["x"],
    [
        1.6991195138834223,
        0.7737143493209756,
        -0.41192204250544034,
        -0.7778845126633998,
        -1.1777543806588109,
        -0.10527292738297804,
    ],
Contributor

@andrewtarzia when running this test locally, I get an error:

AssertionError: Lists differ: [1.8856062905183792, 0.8568739943293787, -0.[74 chars]6763] != [1.6991195138834223, 0.7737143493209756, -0.[77 chars]7804]

First differing element 0:
1.8856062905183792
1.6991195138834223

- [1.8856062905183792,
-  0.8568739943293787,
-  -0.4418130132537843,
-  -1.224037530127693,
-  -0.6838309641680245,
-  -0.3927987773106763]
+ [1.6991195138834223,
+  0.7737143493209756,
+  -0.41192204250544034,
+  -0.7778845126633998,
+  -1.1777543806588109,
+  -0.10527292738297804]

Do you think this is because the orientation of the molecule is not fixed? Or that rdkit gives different results here for some reason?

I'm running on an arm64 CPU / macOS if that's relevant

Contributor Author

Mm that is strange. Molecule construction in stk should be constant by setting rdkit seeds and such. That said, I don't think we've ever tested on a mac, I will check with another Mac user. Sorry!

Contributor Author

Yes, so Lukas confirmed that the STK tests fail on a Mac. I assume that the coordinates are self-consistent, but the random numbers used differ across hardware. What's the best course of action? I could provide coordinates to the molecules instead of using rdkit conformer generation? Or can we mute a test on certain hardware?

Contributor

For the tests in chemiscope, I'll just remove the tests about the exact coordinate values, it does not matter too much to test the functionality here!
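
A test that avoids exact coordinate values can still check the layout of the generated data. The sketch below is one possible shape for such a test; `fake_structure` is a hypothetical stand-in for one entry of `data["structures"]`, and the `size` and `names` keys are assumptions about that entry rather than something shown in this thread.

```python
import math
import unittest


def fake_structure():
    """Hypothetical stand-in for data["structures"][0]."""
    return {
        "size": 3,
        "names": ["C", "O", "H"],
        "x": [0.1, 1.2, -0.4],
        "y": [0.0, 0.3, 0.9],
        "z": [0.5, -0.2, 0.0],
    }


class TestStructureLayout(unittest.TestCase):
    def test_layout(self):
        structure = fake_structure()
        self.assertEqual(len(structure["names"]), structure["size"])
        for axis in ("x", "y", "z"):
            # Check counts and finiteness only: these hold on any platform,
            # whereas the generated coordinates themselves may differ.
            self.assertEqual(len(structure[axis]), structure["size"])
            self.assertTrue(all(math.isfinite(v) for v in structure[axis]))
```

Checks like these verify that the adapter produces well-formed output without depending on the conformer a particular machine happens to generate.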

Contributor Author

I agree in this case!

@andrewtarzia andrewtarzia deleted the add_example branch October 4, 2024 11:59
@Luthaf Luthaf changed the title Add example for using bonds to shapes in various forms. Integrate stk as a structure provider, and generate shapes for custom bonds Oct 9, 2024