python bindings for writing out charges #8

Open
danny305 opened this issue Jun 29, 2021 · 6 comments

Comments

@danny305
Contributor

Yo Tomas,

I need your library to write out the cif file (PR I am submitting).

I am creating a branch to do this, however, I think it would be good for us to collaborate and have python bindings to write out all files (pdb, mol2, cif).

Let me know how you want to go about this.

Danny

@krab1k
Member

krab1k commented Jul 1, 2021

Sure, I think it's a rather good idea, and it would be great if the bindings were on par with the overall functionality of the standalone app.

Originally, ChargeFW2 was not designed to work as a library. For example, I guess there are several places where there is an exit() call instead of a throw, which would be a better fit for a library. And there are definitely other things. For example, MoleculeSet and the objects within it are not modified after initialization, which might not suit the Python style (?).

Unfortunately, I am quite busy at the moment (I really have to finish and submit a thesis in a few months), but if you have some time, then please first suggest an idea for the Python interface (which objects/functions to expose in Python), and we can talk this through.

(Without putting too much thought into this...) I see the main design difference between the app and the bindings in the usage. When running an app, all parameters and settings are known in advance from the command line, whereas in Python, we should first create some object with molecules, then initialize some method object (with or without parameters) and finally produce charges to be saved somewhere.
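To make the three steps above concrete, here is a rough sketch of how such an interface could look from the Python side. Every name in it (`Atom`, `Molecules`, `Method`, `calculate_charges`) is hypothetical and invented for illustration; none of it is ChargeFW2's actual API, and the charge calculation is only a placeholder that returns zeros.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Atom:
    element: str
    residue: str


@dataclass
class Molecules:
    """Step 1: a container holding the loaded structures (hypothetical)."""
    atoms_by_molecule: Dict[str, List[Atom]]


@dataclass
class Method:
    """Step 2: a method object, with or without a parameter set (hypothetical)."""
    name: str
    parameters: Optional[str] = None

    def calculate_charges(self, molecules: Molecules) -> Dict[str, List[float]]:
        # Step 3: produce charges. Placeholder only: a real binding would
        # call into the C++ core here instead of returning zeros.
        return {
            mol_name: [0.0] * len(atoms)
            for mol_name, atoms in molecules.atoms_by_molecule.items()
        }


# Toy usage: the molecule name and atoms are made up.
molecules = Molecules({"toy": [Atom("N", "ALA"), Atom("C", "ALA")]})
method = Method("eem", parameters="EEM_10_Cheminf_b3lyp_aim")
charges = method.calculate_charges(molecules)
```

The point of the sketch is only the order of operations: nothing about the method or its parameters has to be known when the molecules are loaded, unlike in the command-line app.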

I am definitely open to suggestions. ;-)

@danny305
Contributor Author

danny305 commented Jul 1, 2021

Yeah I agree with you on the design difference. Good luck writing your thesis!

As for the python interface, I'll think about it some more and then comment here.

One thing I would like your opinion/advice on right now is running chargefw2 on a protein that has cations in it. This is what I am currently working on and would like the bindings to have a solution for.

Specifically, I would like to calculate partial charges for proteins that have cations with the method 'eem' and parameter set 'EEM_10_Cheminf_b3lyp_aim'. I find this method/parameter set gives me the most trustworthy values on a large protein dataset.

The functionality I would like is for it to ignore these cation atoms as if they were not there and run the calculation and then write out the cif file where the cations are left blank ('?') or have their formal charge inserted instead.

The solution I am currently implementing is to check the method for its allowed elements, remove all atoms from the cif file that do not belong to that element set, run the calculation and generate a cif file, and then add the cation atoms back into the output cif file.
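Roughly, the workaround amounts to the sketch below. It works on plain Python tuples rather than real cif data, and `compute` stands in for the actual charge calculation; all names here are made up for illustration and are not part of ChargeFW2.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

AtomRecord = Tuple[str, int]  # (element symbol, formal charge)


def charges_skipping_unsupported(
    atoms: Sequence[AtomRecord],
    supported: Set[str],
    compute: Callable[[Sequence[AtomRecord]], Sequence[float]],
    use_formal_charge: bool = False,
) -> List[Optional[float]]:
    """Run `compute` on supported atoms only, then restore the original order."""
    kept = [i for i, (element, _) in enumerate(atoms) if element in supported]
    computed = compute([atoms[i] for i in kept])
    if len(computed) != len(kept):
        raise ValueError("compute() must return one charge per atom it was given")
    # Start with the fallback value for every atom (None or the formal charge),
    # then overwrite the entries that were actually calculated.
    result: List[Optional[float]] = [
        float(formal) if use_formal_charge else None for _, formal in atoms
    ]
    for index, charge in zip(kept, computed):
        result[index] = charge
    return result


def format_charge(charge: Optional[float]) -> str:
    """Write '?' for atoms that were left out of the calculation."""
    return "?" if charge is None else f"{charge:.4f}"


def toy_compute(subset: Sequence[AtomRecord]) -> List[float]:
    # Stand-in for the real calculation: the same dummy value for every atom.
    return [-0.5] * len(subset)


atoms = [("N", 0), ("C", 0), ("Ca", 2), ("O", 0)]
blank = charges_skipping_unsupported(atoms, {"N", "C", "O"}, toy_compute)
formal = charges_skipping_unsupported(
    atoms, {"N", "C", "O"}, toy_compute, use_formal_charge=True
)
```

With the toy inputs, the calcium entry comes back as `None` (written out as `?`) in the first call and as its formal charge in the second, while the other atoms keep their positions.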

This is very hacky. It would be great if the bindings can ignore the cations (similar to water) for the calculation and leave them blank ('?') or fill in their formal charge in the out file.

Do you know what I would need to do to ChargeFW2 to make this possible/make it an option in the bindings?

@krab1k
Member

krab1k commented Jul 2, 2021

Are those ions classified as HETATMs? Or do they have their own residues like https://www.rcsb.org/ligand/CA or https://www.rcsb.org/ligand/K? What about something similar to --ignore-water but more general, like --ignore-residues CA, and some equivalent form in the binding? But the interface is one thing and the internal representation of these ignored atoms another one.

Anyway, I've been thinking about a similar issue (concerning various metals in protein structures) a while back. The problem is that it is very difficult to obtain the reference QM charges, which are needed to derive a parameter set for a method like EEM. So the majority of parameter sets lack those elements and are therefore not usable for such structures.

Well, I don't know your use case, but to me, the neighbourhood of these ions (i.e., the charges on those atoms) is the important part, not the ions themselves. The problem is that if you omit these ions from the structure and the charge computation, then you lose their influence on the neighbourhood, and thus the charges near these ions might not be accurate.

Actually, I have an idea of a workaround for EEM. I've been thinking about writing a paper on the subject, but I never really got into it. This would need proper testing (some use cases to prove whether the idea is working or not, which I don't have). Maybe an opportunity for a collaboration? But it depends on what your goal with these structures is, that is, whether you are happy with just ignoring these atoms or whether you would benefit from something like what I described.
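As a sketch of what a residue-based filter could do internally: the snippet below splits atom sites into kept and ignored indices by residue name. Note that `--ignore-residues` is only a proposal in this thread, and `AtomSite` and `split_ignored` are hypothetical names used for illustration, not existing ChargeFW2 code.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class AtomSite(NamedTuple):
    record: str   # "ATOM" or "HETATM"
    residue: str  # residue name, e.g. "ALA", "HOH", "CA"
    name: str     # atom name, e.g. "CA" for an alpha carbon


def split_ignored(
    sites: Sequence[AtomSite], ignore_residues: Iterable[str]
) -> Tuple[List[int], List[int]]:
    """Return (kept, ignored) atom indices, matching on residue name only."""
    ignored_names = {residue.upper() for residue in ignore_residues}
    kept: List[int] = []
    ignored: List[int] = []
    for index, site in enumerate(sites):
        if site.residue.upper() in ignored_names:
            ignored.append(index)
        else:
            kept.append(index)
    return kept, ignored


# Toy input: an alanine alpha carbon, a calcium ion and a water oxygen.
sites = [
    AtomSite("ATOM", "ALA", "CA"),
    AtomSite("HETATM", "CA", "CA"),
    AtomSite("HETATM", "HOH", "O"),
]
kept, ignored = split_ignored(sites, ["CA", "HOH"])
```

Matching on the residue name rather than the atom name matters here: the alpha carbon of alanine is also called CA, but it is kept because its residue is ALA. Keeping the ignored indices around, instead of dropping those atoms, leaves the question of how to represent them internally open.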

@danny305
Contributor Author

danny305 commented Jul 2, 2021

So in proteins, cations are listed as HETATMs. I have yet to encounter a protein where they are not. And if they are not listed as HETATMs in a protein, I believe this is an annotation error.

So I work in a protein engineering lab (Andrew Ellington Lab). I build deep learning models trained on protein structures. I am rebuilding our molecular representation and swapping out forcefield partial charges with chargefw2 partial charges.

Right now our molecular representation does not take into account cations nor the polarization of atoms based on the surrounding neighborhood. So you are right that by removing the cations, I lose their influence on the environment but that is still a step up from forcefield partial charges (which treat for example the beta carbon of threonine with the same partial charge across all proteins regardless of its neighborhood).

I would love to hear your idea for the workaround. Like I said, our lab is a protein engineering lab, so almost all of our graduate students and post-docs are experimentalists, and we validate our AI models with real proteins. Our lab also collaborates with Yi Lu's lab (metalloenzyme professor). Yi Lu could come up with specific use cases based on what your idea is (protein engineering at the cation-protein interface).

You should email me ([email protected]) to continue the discussion.

For right now, I am happy with ignoring these atoms in the calculation and replacing their value with their formal charge. I have a lot of other applications (protein-protein interactions/protein-ligand interactions) that do not require cations but are held up because a significant portion of the dataset has cations, which is preventing me from training neural networks. I need to be training nets for our different applications within the next two weeks.

However, for the next iteration of our molecular representation (later this year) I would love to do this right and not ignore cations and their influence in the neighborhood so we can better approach problems that interest Yi Lu.

@danny305
Contributor Author

Shoot me an email whenever you would like to discuss a collaboration. I am really interested in addressing cations with eem.

@krab1k
Member

krab1k commented Aug 13, 2021

Sure. I need to submit the first draft of the thesis next week; after that, I'll contact you by email (hopefully I will remember the hack for EEM by then. :-)).
