Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic guesser features #3753

Merged
merged 295 commits into from
Oct 19, 2024
Merged

Conversation

aya9aladdin
Copy link
Contributor

@aya9aladdin aya9aladdin commented Jul 15, 2022

Fixes #

Changes made in this Pull Request:

  • I have created a guesser package that will contain all upcoming the context aware guessers. it contain the following so far:

a core module that only contain one method that will be used by the guess_topologyAttribute method of the universe
a base module that contains the BaseGusser class that will be inherited by every guesser
a DefaultGuesser module which will handle the currently existing guess methods (for now, I just added the mass guessing to test with one attribute guessing method)

  • I made changes to the universe class to work with the new methodology
  • I removed the mass guessing from the PDBParser to test guessing it at the universe level

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 15, 2022

Hello @aya9aladdin! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 184:80: E501 line too long (101 > 79 characters)
Line 185:80: E501 line too long (118 > 79 characters)
Line 197:80: E501 line too long (107 > 79 characters)
Line 200:80: E501 line too long (90 > 79 characters)
Line 202:1: W293 blank line contains whitespace

Line 3298:1: W293 blank line contains whitespace

Line 387:80: E501 line too long (90 > 79 characters)
Line 589:80: E501 line too long (85 > 79 characters)
Line 1464:80: E501 line too long (92 > 79 characters)
Line 1474:80: E501 line too long (81 > 79 characters)
Line 1474:82: W291 trailing whitespace
Line 1476:1: W293 blank line contains whitespace
Line 1480:80: E501 line too long (87 > 79 characters)
Line 1491:1: W293 blank line contains whitespace
Line 1492:79: W291 trailing whitespace
Line 1498:1: W293 blank line contains whitespace
Line 1507:80: E501 line too long (126 > 79 characters)
Line 1510:52: W291 trailing whitespace
Line 1522:80: E501 line too long (83 > 79 characters)
Line 1525:13: E303 too many blank lines (2)
Line 1528:35: E127 continuation line over-indented for visual indent
Line 1528:52: W291 trailing whitespace
Line 1532:30: E127 continuation line over-indented for visual indent

Line 48:80: E501 line too long (98 > 79 characters)
Line 59:1: W293 blank line contains whitespace
Line 61:1: W293 blank line contains whitespace
Line 69:80: E501 line too long (89 > 79 characters)
Line 70:80: E501 line too long (104 > 79 characters)
Line 71:80: E501 line too long (92 > 79 characters)
Line 71:93: W291 trailing whitespace
Line 73:1: W293 blank line contains whitespace
Line 85:1: W293 blank line contains whitespace
Line 117:1: W293 blank line contains whitespace
Line 133:32: W291 trailing whitespace
Line 141:1: W293 blank line contains whitespace

Line 32:1: E302 expected 2 blank lines, found 1
Line 34:80: E501 line too long (115 > 79 characters)
Line 34:116: W291 trailing whitespace
Line 35:80: E501 line too long (104 > 79 characters)
Line 37:14: W291 trailing whitespace
Line 45:1: W293 blank line contains whitespace
Line 46:80: E501 line too long (86 > 79 characters)
Line 52:1: W293 blank line contains whitespace
Line 57:1: W293 blank line contains whitespace
Line 65:34: E203 whitespace before ':'
Line 88:1: W293 blank line contains whitespace
Line 95:80: E501 line too long (84 > 79 characters)
Line 120:80: E501 line too long (112 > 79 characters)
Line 156:14: E225 missing whitespace around operator
Line 159:14: W291 trailing whitespace
Line 232:80: E501 line too long (80 > 79 characters)
Line 234:80: E501 line too long (80 > 79 characters)
Line 236:80: E501 line too long (80 > 79 characters)
Line 240:80: E501 line too long (82 > 79 characters)
Line 241:80: E501 line too long (80 > 79 characters)
Line 242:80: E501 line too long (83 > 79 characters)
Line 245:80: E501 line too long (80 > 79 characters)
Line 246:80: E501 line too long (82 > 79 characters)
Line 261:80: E501 line too long (80 > 79 characters)
Line 282:80: E501 line too long (81 > 79 characters)
Line 292:42: E713 test for membership should be 'not in'
Line 319:5: E303 too many blank lines (2)
Line 322:80: E501 line too long (93 > 79 characters)
Line 324:80: E501 line too long (85 > 79 characters)
Line 325:27: E128 continuation line under-indented for visual indent
Line 325:80: E501 line too long (81 > 79 characters)
Line 326:1: W293 blank line contains whitespace
Line 355:80: E501 line too long (80 > 79 characters)
Line 356:80: E501 line too long (83 > 79 characters)
Line 365:80: E501 line too long (93 > 79 characters)
Line 405:80: E501 line too long (94 > 79 characters)
Line 406:1: W293 blank line contains whitespace
Line 410:80: E501 line too long (81 > 79 characters)
Line 437:20: E713 test for membership should be 'not in'
Line 446:5: E303 too many blank lines (2)
Line 455:5: E303 too many blank lines (2)
Line 488:1: W293 blank line contains whitespace

Line 334:80: E501 line too long (103 > 79 characters)
Line 341:80: E501 line too long (108 > 79 characters)

Line 293:80: E501 line too long (100 > 79 characters)

Line 122:16: E111 indentation is not a multiple of four

Line 618:80: E501 line too long (90 > 79 characters)
Line 622:80: E501 line too long (105 > 79 characters)
Line 624:1: W293 blank line contains whitespace
Line 629:1: W293 blank line contains whitespace
Line 903:9: E741 ambiguous variable name 'l'
Line 904:80: E501 line too long (97 > 79 characters)
Line 905:80: E501 line too long (91 > 79 characters)

Line 177:23: E127 continuation line over-indented for visual indent
Line 179:80: E501 line too long (89 > 79 characters)
Line 180:80: E501 line too long (106 > 79 characters)
Line 205:80: E501 line too long (95 > 79 characters)

Line 148:80: E501 line too long (83 > 79 characters)

Line 383:1: E302 expected 2 blank lines, found 1
Line 393:1: W293 blank line contains whitespace
Line 400:13: E117 over-indented
Line 403:1: W293 blank line contains whitespace
Line 410:1: W293 blank line contains whitespace
Line 518:5: E301 expected 1 blank line, found 0
Line 755:80: E501 line too long (86 > 79 characters)
Line 1370:80: E501 line too long (86 > 79 characters)
Line 1374:9: E303 too many blank lines (2)

Line 29:1: E302 expected 2 blank lines, found 1
Line 30:1: W293 blank line contains whitespace
Line 35:29: W291 trailing whitespace
Line 38:1: W293 blank line contains whitespace
Line 39:1: W391 blank line at end of file

Line 53:1: E302 expected 2 blank lines, found 1
Line 60:80: E501 line too long (93 > 79 characters)
Line 63:5: E303 too many blank lines (2)
Line 66:25: E201 whitespace after '('
Line 66:80: E501 line too long (91 > 79 characters)
Line 73:80: E501 line too long (121 > 79 characters)
Line 79:80: E501 line too long (80 > 79 characters)
Line 90:80: E501 line too long (84 > 79 characters)
Line 150:51: E231 missing whitespace after ','
Line 150:53: E231 missing whitespace after ','
Line 174:1: E302 expected 2 blank lines, found 1
Line 176:80: E501 line too long (92 > 79 characters)
Line 178:27: E127 continuation line over-indented for visual indent
Line 182:1: E302 expected 2 blank lines, found 1
Line 188:19: E127 continuation line over-indented for visual indent
Line 190:1: E302 expected 2 blank lines, found 1
Line 196:19: E127 continuation line over-indented for visual indent
Line 232:27: E127 continuation line over-indented for visual indent

Line 29:1: W293 blank line contains whitespace

Line 249:1: W293 blank line contains whitespace

Line 259:1: W293 blank line contains whitespace

Comment last updated at 2022-09-12 19:40:16 UTC

@aya9aladdin aya9aladdin marked this pull request as ready for review July 15, 2022 01:58
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aya9aladdin, this is a neat start! I only had time to skim but left a couple comments. The tests are failing because _GUESSERS can't be imported -- it's not in the top level __init__.py yet.

@@ -1436,6 +1439,19 @@ def from_smiles(cls, smiles, sanitize=True, addHs=True,

return cls(mol, **kwargs)

def guess_topoloyAttribute(self, context, to_guess):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def guess_topoloyAttribute(self, context, to_guess):
def guess_TopologyAttr(self, context, to_guess):

There's already a method called add_TopologyAttr, and the class is called a TopologyAttr. Going with the principal of least astonishment that advises a consistent API, could you please rename this method to something that users could easily guess (😛) themselves?

context = 'default'

def __init__(self):
self._guess = {'mass': self.guess_masses}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Instead of this, would it be tidier to have a class-level _guess dictionary like TopologyAttr.transplants? I'm not sure it makes sense for instances of guesser classes to differ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this and see how I could integrate the transplant idea

.format(self.context, a))
return True

def guessTopologyAttribute(self, to_guess):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of type hinting, could you please change the name of the to_guess variables to be clearer? here it's one string, but in is_guessed it's a list of strings -- that is quite confusing. Also, could you please name your methods with conventional snake_case and use TopologyAttr instead of TopologyAttribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to have an overall method for guessing all the attributes at once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm planning to have a method for guessing all the attributes at once, this was just a test for guessing one attribute

values = self._guess[to_guess]()
return values

def setAtoms(self, atoms):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method need to exist? Could __init__ just take atoms instead, like a Writer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why I did it this way tbh, so I'll remove it

package/MDAnalysis/guesser/base.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
@jbarnoud
Copy link
Contributor

@aya9aladdin I recommend that you push your changes often. Do not wait for the code to be finished. By pushing often, we get to see your progress, we get to comment on your latest version instead of a version that may already be obsolete, and we may catch issues earlier, reducing the amount of work you will have to redo.

@aya9aladdin
Copy link
Contributor Author

@aya9aladdin I recommend that you push your changes often. Do not wait for the code to be finished. By pushing often, we get to see your progress, we get to comment on your latest version instead of a version that may already be obsolete, and we may catch issues earlier, reducing the amount of work you will have to redo.

I was planning to push everything when I finish indeed. I don't know if I have to make a pull request at this unmatured level or not?

@jbarnoud
Copy link
Contributor

You already have a pull request, no need to open another one. Instead, update this pull request by pushing your new commits.

@aya9aladdin
Copy link
Contributor Author

I have made some updates:
1- I removed all mass and type guessing from parsers and transferred it to take place inside the universe initiation, so mass and atom types will still be guessed automatically until we are ready to remove it

2- some attributes depend on other attributes to be guessed. if those parent attributes also need to be guessed, this can raise unnecessary errors if we attempted to guess the child attribute before the parent one. So, I added a rank dictionary to the guesser class which will rank each attribute based on its dependency on parent attributes to be guessed. For example ,AtomType depend on atom name then it will have rank 1. While masses depends on atom type which depends on atom name then it will have rank 2. By using this rank atom types will be guessed then masses to avoid errors.
(I hard coded it for now in the DefaultGuesser for testing)

next update will have:
1-finishing the BaseGuesser
2- add guesser metaclass for registration of guessers and other required dictionaries
3- add AtomType attributes to be equal to Element for PDBParser, FHIAIMSParser, TXYZParser, and XYZParser parsers

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a partial look through, I hope the minor comments help.

package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
@aya9aladdin
Copy link
Contributor Author

I'm just more concerned with the logic than the documentation at the moment that's why it's not very accurate

@aya9aladdin
Copy link
Contributor Author

I updated some files as follows:
1- I modified guess_bonds inside the universe to pass a context to it (honestly I didn't get why bond guessing happens inside the AtomGroup, not the universe, so I left it as it is just modified the code to pass the context)
2- I modified the RDKitParser, PDBParser, XYZParser, and FHIAMSParser to add AtomType if it exists, and if not assign it to the elements then names attributes
3- at the point most of the DefualtGuesser is considered complete
4- I added a GuesserMeta to register the guesser classes to be implemented. I was planning to make the _guess dictionary mimic the TopologyAtrribute.transplant but I didn't find it has an application right now, so I postponed it to when I begin working on new guesser classes and see how it could work

What I would do next:
1- update modules where old guessing takes place (hydrogen bond analysis modules as I remember)
2- test and document what I have done so far

aya9aladdin and others added 4 commits September 17, 2024 19:23
- updated guess_TopologyAttrs docs
- fixed some tests
- capitalize atomtypes from elements in RDKitParser
@aya9aladdin
Copy link
Contributor Author

@lilyminium @IAlibay I pushed last updates, let me know if any other updates is needed

@lilyminium
Copy link
Member

Thanks @aya9aladdin!! Will aim to get my review in by this weekend.

@lilyminium
Copy link
Member

Looks like I'd already approved the PR! I did just look over your changes and they seem to address the remaining concerns @IAlibay and I discussed, but I'll also wait on @IAlibay to re-review.

@IAlibay
Copy link
Member

IAlibay commented Sep 22, 2024

I won't be able to re-review for a little while unfortunately, probably end of the week or next weekend.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, I have two remaining questions, but happy to approve.

@lilyminium would it be ok if you took care of shepherding the remaining things?

"""
def __init__(self, topology=None, *coordinates, all_coordinates=False,
format=None, topology_format=None, transformations=None,
guess_bonds=False, vdwradii=None, fudge_factor=0.55,
lower_bound=0.1, in_memory=False,
lower_bound=0.1, in_memory=False, context='default',
to_guess=('types', 'masses'), force_guess=(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to_guess=('types', 'masses'), force_guess=(),
to_guess=('types', 'masses'), force_guess=None,

Could you confirm that keeping () instead of None was intentional?
See: #3753

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilyminium feel free to turn this into an issue, we can fix this in post.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's intended to be that way indeed, for easier manuplation

@@ -303,8 +306,7 @@ def parse(self, **kwargs):
if atomtypes:
attrs.append(Atomtypes(np.array(atomtypes, dtype=object)))
else:
atomtypes = guessers.guess_types(names)
attrs.append(Atomtypes(atomtypes, guessed=True))
atomtypes = np.char.upper(elements)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aya9aladdin @lilyminium - this is assigned but nothing else is done with it afterwards it seems? I feel like I'm missing something, could you please just double check?

@IAlibay
Copy link
Member

IAlibay commented Oct 13, 2024

repinging @lilyminium - we should try to get this (and any subsequent fixes) into 2.8.0

package/CHANGELOG Outdated Show resolved Hide resolved
Co-authored-by: Lily Wang <[email protected]>
@lilyminium
Copy link
Member

lilyminium commented Oct 19, 2024

Thanks, @aya9aladdin, everything looks good to go! Really excited to merge this. The only discussion point left is in which release. We have one coming up (2.8.0) that we could make as the feature freeze is tomorrow (Sunday). IMO this should definitely go in before 3.0, but it may be too fast for 2.8.0 as we wouldn't have a lot of testing before putting it in a published release.

@MDAnalysis/coredevs do you have any opinions on if this PR should get merged into 2.8.0, or wait for a prospective 2.9? The deadline for making it into 2.8.0 is 8 hours from the time of this comment as that's the time I have before the feature freeze sets in (apologies for how fast that is). If there are no comments otherwise I will follow the default current plan as I understand it and merge this into develop 8 hours from now. If there is discussion with different opinions but no consensus I will delay merge until after 2.8.

Below are some pros/cons IMO for merging into 2.8:

Pros:

  • this PR has been open for quite a while and it would be good to have it merged
  • In the event there is no 2.9, we should get deprecation warnings in before changing behaviour in 3.0
  • In the event there is a 2.9, it's still better to have deprecation warnings in earlier to give more time to respond
  • this PR shouldn't result in behaviour changes* (asterisked -- see con below)
  • This is the current plan as I understand it

Cons:

  • this is a major PR and while it passes current tests, there may be unforeseen effects that we don't see that make it into a release. More time in develop means more time to test.

@lilyminium lilyminium mentioned this pull request Oct 19, 2024
5 tasks
@IAlibay
Copy link
Member

IAlibay commented Oct 19, 2024

My take right now is that if we're hesitant on getting this in for 2.8 then let's wait on 2.9.

@lilyminium lilyminium enabled auto-merge (squash) October 19, 2024 13:19
@lilyminium
Copy link
Member

Since there's been no discussion otherwise, I'm calling it -- this should get auto-merged into develop once tests go green. Huge congratulations Aya -- this was a truly monumental PR and you did an amazing job, pushing the project forward for years! I'm really excited to see how this new guesser paradigm will unlock more flexibility for MDAnalysis in handling different molecular simulation contexts going forward -- your contribution here is really fantastic!

@jbarnoud
Copy link
Contributor

That is huge ! Congratulations @aya9aladdin ! I am extremely glad to see this project reach its conclusion 🚀

@lilyminium lilyminium dismissed stale reviews from yuxuanzhuang and jbarnoud October 19, 2024 14:20

Dismissed old review due to being stale.

@lilyminium lilyminium merged commit 9b69745 into MDAnalysis:develop Oct 19, 2024
24 checks passed
@aya9aladdin
Copy link
Contributor Author

I am thrilled to have this merged!! thankful for all your revies and the experience.... looking forward to use the new functionality and contributing more to MDAnalysis!

@IAlibay
Copy link
Member

IAlibay commented Oct 19, 2024

Congrats @aya9aladdin - it's been a long road but definitely a great addition to the library.

@orbeckst
Copy link
Member

Congratulations @aya9aladdin — awesome!!

also big thank you to @lilyminium and @IAlibay for all your support!

@IAlibay
Copy link
Member

IAlibay commented Nov 10, 2024

Apologies for the delay, it's a busy time of year for social obligations.
This is looking good!
Here's my first half of the review: mostly docstring changes.
@aya9aladdin could you confirm that you are ok releasing this using LGPLv2+?

sorry for the late reply... sure I've no problem.

Hi @aya9aladdin - I emailed you about relicensing some of your historical contributions, please let me know if you didn't get the email.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GSoC GSoC project
Projects
None yet
10 participants