basic guesser features #3753

aya9aladdin · 2022-07-15T01:49:41Z

Fixes #

Changes made in this Pull Request:

I have created a guesser package that will contain all the upcoming context-aware guessers. So far it contains the following:

a core module that contains only one method, which will be used by the guess_topologyAttribute method of the universe
a base module that contains the BaseGuesser class, which every guesser will inherit from
a DefaultGuesser module which will handle the currently existing guess methods (for now, I just added the mass guessing to test with one attribute guessing method)

I made changes to the universe class to work with the new methodology
I removed the mass guessing from the PDBParser to test guessing it at the universe level

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

pep8speaks · 2022-07-15T01:49:45Z

Hello @aya9aladdin! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/converters/OpenMMParser.py:

Line 184:80: E501 line too long (101 > 79 characters)
Line 185:80: E501 line too long (118 > 79 characters)
Line 197:80: E501 line too long (107 > 79 characters)
Line 200:80: E501 line too long (90 > 79 characters)
Line 202:1: W293 blank line contains whitespace

In the file package/MDAnalysis/core/groups.py:

Line 3298:1: W293 blank line contains whitespace

In the file package/MDAnalysis/core/universe.py:

Line 387:80: E501 line too long (90 > 79 characters)
Line 589:80: E501 line too long (85 > 79 characters)
Line 1464:80: E501 line too long (92 > 79 characters)
Line 1474:80: E501 line too long (81 > 79 characters)
Line 1474:82: W291 trailing whitespace
Line 1476:1: W293 blank line contains whitespace
Line 1480:80: E501 line too long (87 > 79 characters)
Line 1491:1: W293 blank line contains whitespace
Line 1492:79: W291 trailing whitespace
Line 1498:1: W293 blank line contains whitespace
Line 1507:80: E501 line too long (126 > 79 characters)
Line 1510:52: W291 trailing whitespace
Line 1522:80: E501 line too long (83 > 79 characters)
Line 1525:13: E303 too many blank lines (2)
Line 1528:35: E127 continuation line over-indented for visual indent
Line 1528:52: W291 trailing whitespace
Line 1532:30: E127 continuation line over-indented for visual indent

In the file package/MDAnalysis/guesser/base.py:

Line 48:80: E501 line too long (98 > 79 characters)
Line 59:1: W293 blank line contains whitespace
Line 61:1: W293 blank line contains whitespace
Line 69:80: E501 line too long (89 > 79 characters)
Line 70:80: E501 line too long (104 > 79 characters)
Line 71:80: E501 line too long (92 > 79 characters)
Line 71:93: W291 trailing whitespace
Line 73:1: W293 blank line contains whitespace
Line 85:1: W293 blank line contains whitespace
Line 117:1: W293 blank line contains whitespace
Line 133:32: W291 trailing whitespace
Line 141:1: W293 blank line contains whitespace

In the file package/MDAnalysis/guesser/default_guesser.py:

Line 32:1: E302 expected 2 blank lines, found 1
Line 34:80: E501 line too long (115 > 79 characters)
Line 34:116: W291 trailing whitespace
Line 35:80: E501 line too long (104 > 79 characters)
Line 37:14: W291 trailing whitespace
Line 45:1: W293 blank line contains whitespace
Line 46:80: E501 line too long (86 > 79 characters)
Line 52:1: W293 blank line contains whitespace
Line 57:1: W293 blank line contains whitespace
Line 65:34: E203 whitespace before ':'
Line 88:1: W293 blank line contains whitespace
Line 95:80: E501 line too long (84 > 79 characters)
Line 120:80: E501 line too long (112 > 79 characters)
Line 156:14: E225 missing whitespace around operator
Line 159:14: W291 trailing whitespace
Line 232:80: E501 line too long (80 > 79 characters)
Line 234:80: E501 line too long (80 > 79 characters)
Line 236:80: E501 line too long (80 > 79 characters)
Line 240:80: E501 line too long (82 > 79 characters)
Line 241:80: E501 line too long (80 > 79 characters)
Line 242:80: E501 line too long (83 > 79 characters)
Line 245:80: E501 line too long (80 > 79 characters)
Line 246:80: E501 line too long (82 > 79 characters)
Line 261:80: E501 line too long (80 > 79 characters)
Line 282:80: E501 line too long (81 > 79 characters)
Line 292:42: E713 test for membership should be 'not in'
Line 319:5: E303 too many blank lines (2)
Line 322:80: E501 line too long (93 > 79 characters)
Line 324:80: E501 line too long (85 > 79 characters)
Line 325:27: E128 continuation line under-indented for visual indent
Line 325:80: E501 line too long (81 > 79 characters)
Line 326:1: W293 blank line contains whitespace
Line 355:80: E501 line too long (80 > 79 characters)
Line 356:80: E501 line too long (83 > 79 characters)
Line 365:80: E501 line too long (93 > 79 characters)
Line 405:80: E501 line too long (94 > 79 characters)
Line 406:1: W293 blank line contains whitespace
Line 410:80: E501 line too long (81 > 79 characters)
Line 437:20: E713 test for membership should be 'not in'
Line 446:5: E303 too many blank lines (2)
Line 455:5: E303 too many blank lines (2)
Line 488:1: W293 blank line contains whitespace

In the file package/MDAnalysis/topology/PDBParser.py:

Line 334:80: E501 line too long (103 > 79 characters)
Line 341:80: E501 line too long (108 > 79 characters)

In the file package/MDAnalysis/topology/TOPParser.py:

Line 293:80: E501 line too long (100 > 79 characters)

In the file package/MDAnalysis/topology/TXYZParser.py:

Line 122:16: E111 indentation is not a multiple of four

In the file testsuite/MDAnalysisTests/analysis/test_encore.py:

Line 618:80: E501 line too long (90 > 79 characters)
Line 622:80: E501 line too long (105 > 79 characters)
Line 624:1: W293 blank line contains whitespace
Line 629:1: W293 blank line contains whitespace
Line 903:9: E741 ambiguous variable name 'l'
Line 904:80: E501 line too long (97 > 79 characters)
Line 905:80: E501 line too long (91 > 79 characters)

In the file testsuite/MDAnalysisTests/converters/test_openmm_parser.py:

Line 177:23: E127 continuation line over-indented for visual indent
Line 179:80: E501 line too long (89 > 79 characters)
Line 180:80: E501 line too long (106 > 79 characters)
Line 205:80: E501 line too long (95 > 79 characters)

In the file testsuite/MDAnalysisTests/converters/test_rdkit.py:

Line 148:80: E501 line too long (83 > 79 characters)

In the file testsuite/MDAnalysisTests/core/test_universe.py:

Line 383:1: E302 expected 2 blank lines, found 1
Line 393:1: W293 blank line contains whitespace
Line 400:13: E117 over-indented
Line 403:1: W293 blank line contains whitespace
Line 410:1: W293 blank line contains whitespace
Line 518:5: E301 expected 1 blank line, found 0
Line 755:80: E501 line too long (86 > 79 characters)
Line 1370:80: E501 line too long (86 > 79 characters)
Line 1374:9: E303 too many blank lines (2)

In the file testsuite/MDAnalysisTests/guesser/test_base.py:

Line 29:1: E302 expected 2 blank lines, found 1
Line 30:1: W293 blank line contains whitespace
Line 35:29: W291 trailing whitespace
Line 38:1: W293 blank line contains whitespace
Line 39:1: W391 blank line at end of file

In the file testsuite/MDAnalysisTests/guesser/test_default_guesser.py:

Line 53:1: E302 expected 2 blank lines, found 1
Line 60:80: E501 line too long (93 > 79 characters)
Line 63:5: E303 too many blank lines (2)
Line 66:25: E201 whitespace after '('
Line 66:80: E501 line too long (91 > 79 characters)
Line 73:80: E501 line too long (121 > 79 characters)
Line 79:80: E501 line too long (80 > 79 characters)
Line 90:80: E501 line too long (84 > 79 characters)
Line 150:51: E231 missing whitespace after ','
Line 150:53: E231 missing whitespace after ','
Line 174:1: E302 expected 2 blank lines, found 1
Line 176:80: E501 line too long (92 > 79 characters)
Line 178:27: E127 continuation line over-indented for visual indent
Line 182:1: E302 expected 2 blank lines, found 1
Line 188:19: E127 continuation line over-indented for visual indent
Line 190:1: E302 expected 2 blank lines, found 1
Line 196:19: E127 continuation line over-indented for visual indent
Line 232:27: E127 continuation line over-indented for visual indent

In the file testsuite/MDAnalysisTests/topology/base.py:

Line 29:1: W293 blank line contains whitespace

In the file testsuite/MDAnalysisTests/topology/test_itp.py:

Line 249:1: W293 blank line contains whitespace

In the file testsuite/MDAnalysisTests/topology/test_lammpsdata.py:

Line 259:1: W293 blank line contains whitespace

Comment last updated at 2022-09-12 19:40:16 UTC

lilyminium

Hi @aya9aladdin, this is a neat start! I only had time to skim but left a couple comments. The tests are failing because _GUESSERS can't be imported -- it's not in the top level __init__.py yet.

lilyminium · 2022-07-15T06:37:19Z

package/MDAnalysis/core/universe.py

@@ -1436,6 +1439,19 @@ def from_smiles(cls, smiles, sanitize=True, addHs=True,

        return cls(mol, **kwargs)

+    def guess_topoloyAttribute(self, context, to_guess):


Suggested change

-    def guess_topoloyAttribute(self, context, to_guess):
+    def guess_TopologyAttr(self, context, to_guess):

There's already a method called add_TopologyAttr, and the class is called a TopologyAttr. Going with the principle of least astonishment, which advises a consistent API, could you please rename this method to something that users could easily guess (😛) themselves?

lilyminium · 2022-07-15T07:01:19Z

package/MDAnalysis/guesser/DefaultGuesser.py

+    context = 'default'
+
+    def __init__(self):
+        self._guess = {'mass': self.guess_masses}


Hmm. Instead of this, would it be tidier to have a class-level _guess dictionary like TopologyAttr.transplants? I'm not sure it makes sense for instances of guesser classes to differ.

I'll check this and see how I could integrate the transplant idea
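The class-level mapping suggested in this thread could look roughly like the sketch below. It is only an illustration, not the code in this PR: the constructor argument, the tiny mass table, and the guess_attr helper are assumptions made so the example runs on its own.

```python
class DefaultGuesser:
    """Illustrative stand-in; not the DefaultGuesser from this PR."""

    context = 'default'

    def __init__(self, atom_types):
        # Hypothetical input: a plain list of atom type strings.
        self.atom_types = atom_types

    def guess_masses(self):
        # Tiny made-up lookup table, for demonstration only.
        table = {'H': 1.008, 'C': 12.011, 'O': 15.999}
        return [table.get(t, 0.0) for t in self.atom_types]

    # Class-level dispatch table, shared by every instance.
    # Inside the class body guess_masses is still a plain function,
    # so it has to be called with the instance passed explicitly.
    _guess = {'mass': guess_masses}

    def guess_attr(self, name):
        # Hypothetical helper that dispatches through the shared table.
        return self._guess[name](self)


guesser = DefaultGuesser(['H', 'C', 'X'])
print(guesser.guess_attr('mass'))
```

Because the table lives on the class, two instances cannot end up with different sets of guessable attributes, which is the concern raised in the review comment.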

lilyminium · 2022-07-15T07:04:57Z

package/MDAnalysis/guesser/base.py

+                                 .format(self.context, a))
+        return True
+
+    def guessTopologyAttribute(self, to_guess):


In the absence of type hinting, could you please change the name of the to_guess variables to be clearer? Here it's one string, but in is_guessed it's a list of strings -- that is quite confusing. Also, could you please name your methods with conventional snake_case and use TopologyAttr instead of TopologyAttribute?

Do you plan to have an overall method for guessing all the attributes at once?

Yes I'm planning to have a method for guessing all the attributes at once, this was just a test for guessing one attribute

lilyminium · 2022-07-15T07:05:35Z

package/MDAnalysis/guesser/base.py

+        values = self._guess[to_guess]()
+        return values
+
+    def setAtoms(self, atoms):


Does this method need to exist? Could __init__ just take atoms instead, like a Writer?

I don't remember why I did it this way tbh, so I'll remove it

jbarnoud · 2022-07-21T16:10:22Z

@aya9aladdin I recommend that you push your changes often. Do not wait for the code to be finished. By pushing often, we get to see your progress, we get to comment on your latest version instead of a version that may already be obsolete, and we may catch issues earlier, reducing the amount of work you will have to redo.

aya9aladdin · 2022-07-21T17:05:55Z

I was indeed planning to push everything once I finished. I don't know whether I should make a pull request at this immature stage or not?

jbarnoud · 2022-07-22T05:15:50Z

You already have a pull request, no need to open another one. Instead, update this pull request by pushing your new commits.

aya9aladdin · 2022-07-23T02:13:45Z

I have made some updates:
1- I removed all mass and type guessing from the parsers and moved it into the universe initialization, so masses and atom types will still be guessed automatically until we are ready to remove that.

2- Some attributes depend on other attributes in order to be guessed. If those parent attributes also need to be guessed, attempting to guess the child attribute before the parent can raise unnecessary errors. So I added a rank dictionary to the guesser class, which ranks each attribute by its dependency on parent attributes that must be guessed first. For example, atom types depend on atom names, so they have rank 1, while masses depend on atom types, which in turn depend on atom names, so they have rank 2. Using these ranks, atom types are guessed before masses, which avoids the errors.
(I hard-coded it for now in the DefaultGuesser for testing.)
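The ranking idea can be sketched in a few lines. This is a minimal illustration under assumptions, not the code in the PR: the RANK table, the order_by_rank name, and the choice to give unranked attributes rank 0 are all hypothetical.

```python
# Hypothetical rank table: a lower rank is guessed first.
# 'types' depends on names (rank 1); 'masses' depends on types (rank 2).
RANK = {'types': 1, 'masses': 2}


def order_by_rank(to_guess, rank=RANK):
    """Return the attribute names sorted so that parents precede children.

    Attributes missing from the table are assumed to have no
    dependencies and get rank 0.
    """
    return sorted(to_guess, key=lambda name: rank.get(name, 0))


print(order_by_rank(['masses', 'types']))  # ['types', 'masses']
```

A hard-coded table like this only works while the dependencies form a simple chain; a guesser with richer dependencies would probably need a proper topological sort.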

The next update will include:
1- finishing the BaseGuesser
2- adding a guesser metaclass for registering guessers, along with other required dictionaries
3- setting the AtomType attributes equal to Element for the PDBParser, FHIAIMSParser, TXYZParser, and XYZParser

orbeckst

I had a partial look through, I hope the minor comments help.

aya9aladdin · 2022-07-23T21:54:01Z

I'm more concerned with the logic than the documentation at the moment, which is why it's not very accurate.

aya9aladdin · 2022-07-26T23:46:59Z

I updated some files as follows:
1- I modified guess_bonds inside the universe to pass a context to it (honestly, I didn't understand why bond guessing happens inside the AtomGroup rather than the universe, so I left it as it is and only modified the code to pass the context)
2- I modified the RDKitParser, PDBParser, XYZParser, and FHIAIMSParser to add AtomType if it exists and, if not, to assign it from the elements and then the names attributes
3- at this point most of the DefaultGuesser is considered complete
4- I added a GuesserMeta to register the guesser classes to be implemented. I was planning to make the _guess dictionary mimic TopologyAttr.transplants, but I didn't find an application for it right now, so I postponed it until I begin working on new guesser classes and can see how it could work

What I would do next:
1- update modules where old guessing takes place (hydrogen bond analysis modules as I remember)
2- test and document what I have done so far

aya9aladdin · 2024-09-18T15:43:06Z

@lilyminium @IAlibay I pushed the latest updates; let me know if any other updates are needed

lilyminium · 2024-09-20T00:57:16Z

Thanks @aya9aladdin!! Will aim to get my review in by this weekend.

lilyminium · 2024-09-22T14:33:17Z

Looks like I'd already approved the PR! I did just look over your changes and they seem to address the remaining concerns @IAlibay and I discussed, but I'll also wait on @IAlibay to re-review.

IAlibay · 2024-09-22T15:05:40Z

I won't be able to re-review for a little while unfortunately, probably end of the week or next weekend.

IAlibay

Apologies for the delay, I have two remaining questions, but happy to approve.

@lilyminium would it be ok if you took care of shepherding the remaining things?

IAlibay · 2024-09-29T20:09:40Z

package/MDAnalysis/core/universe.py

    """
    def __init__(self, topology=None, *coordinates, all_coordinates=False,
                 format=None, topology_format=None, transformations=None,
                 guess_bonds=False, vdwradii=None, fudge_factor=0.55,
-                 lower_bound=0.1, in_memory=False,
+                 lower_bound=0.1, in_memory=False, context='default',
+                 to_guess=('types', 'masses'), force_guess=(),


Suggested change

-                 to_guess=('types', 'masses'), force_guess=(),
+                 to_guess=('types', 'masses'), force_guess=None,

Could you confirm that keeping () instead of None was intentional?
See: #3753

@lilyminium feel free to turn this into an issue, we can fix this in post.

Yes, it's intended to be that way indeed, for easier manipulation.
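One possible reading of "easier manipulation" is that an empty tuple can be iterated and concatenated without a None check, and, being immutable, it is safe to use as a default argument. The helper below is purely hypothetical and is not taken from the PR; it only illustrates that point.

```python
def attrs_to_guess(to_guess=('types', 'masses'), force_guess=()):
    """Hypothetical helper: merge both arguments into one ordered list.

    Forced attributes come first, and duplicates are dropped while
    preserving order. No ``if force_guess is None`` branch is needed
    because an empty tuple behaves like any other iterable.
    """
    return list(dict.fromkeys(list(force_guess) + list(to_guess)))


print(attrs_to_guess())                        # ['types', 'masses']
print(attrs_to_guess(force_guess=('bonds',)))  # ['bonds', 'types', 'masses']
```

With a None default, the same function would need an extra normalization step before the two arguments could be combined.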

IAlibay · 2024-09-29T20:13:54Z

package/MDAnalysis/converters/RDKitParser.py

@@ -303,8 +306,7 @@ def parse(self, **kwargs):
        if atomtypes:
            attrs.append(Atomtypes(np.array(atomtypes, dtype=object)))
        else:
-            atomtypes = guessers.guess_types(names)
-            attrs.append(Atomtypes(atomtypes, guessed=True))
+            atomtypes = np.char.upper(elements)


@aya9aladdin @lilyminium - this is assigned but nothing else is done with it afterwards it seems? I feel like I'm missing something, could you please just double check?

IAlibay · 2024-10-13T06:32:28Z

repinging @lilyminium - we should try to get this (and any subsequent fixes) into 2.8.0

lilyminium · 2024-10-19T03:10:33Z

Thanks, @aya9aladdin, everything looks good to go! Really excited to merge this. The only discussion point left is in which release. We have one coming up (2.8.0) that we could make as the feature freeze is tomorrow (Sunday). IMO this should definitely go in before 3.0, but it may be too fast for 2.8.0 as we wouldn't have a lot of testing before putting it in a published release.

@MDAnalysis/coredevs do you have any opinions on if this PR should get merged into 2.8.0, or wait for a prospective 2.9? The deadline for making it into 2.8.0 is 8 hours from the time of this comment as that's the time I have before the feature freeze sets in (apologies for how fast that is). If there are no comments otherwise I will follow the default current plan as I understand it and merge this into develop 8 hours from now. If there is discussion with different opinions but no consensus I will delay merge until after 2.8.

Below are some pros/cons IMO for merging into 2.8:

Pros:

this PR has been open for quite a while and it would be good to have it merged
In the event there is no 2.9, we should get deprecation warnings in before changing behaviour in 3.0
In the event there is a 2.9, it's still better to have deprecation warnings in earlier to give more time to respond
this PR shouldn't result in behaviour changes* (asterisked -- see con below)
This is the current plan as I understand it

Cons:

this is a major PR and while it passes current tests, there may be unforeseen effects that we don't see that make it into a release. More time in develop means more time to test.

IAlibay · 2024-10-19T05:52:03Z

My take right now is that if we're hesitant on getting this in for 2.8 then let's wait on 2.9.

lilyminium · 2024-10-19T13:20:03Z

Since there's been no discussion otherwise, I'm calling it -- this should get auto-merged into develop once tests go green. Huge congratulations Aya -- this was a truly monumental PR and you did an amazing job, pushing the project forward for years! I'm really excited to see how this new guesser paradigm will unlock more flexibility for MDAnalysis in handling different molecular simulation contexts going forward -- your contribution here is really fantastic!

jbarnoud · 2024-10-19T13:24:54Z

That is huge ! Congratulations @aya9aladdin ! I am extremely glad to see this project reach its conclusion 🚀

Dismissed old review due to being stale.

aya9aladdin · 2024-10-19T16:42:11Z

I am thrilled to have this merged!! Thankful for all your reviews and the experience... looking forward to using the new functionality and contributing more to MDAnalysis!

IAlibay · 2024-10-19T16:53:58Z

Congrats @aya9aladdin - it's been a long road but definitely a great addition to the library.

orbeckst · 2024-10-20T03:44:32Z

Congratulations @aya9aladdin — awesome!!

also big thank you to @lilyminium and @IAlibay for all your support!

IAlibay · 2024-11-10T15:13:36Z

Apologies for the delay, it's a busy time of year for social obligations.
This is looking good!
Here's my first half of the review: mostly docstring changes.
@aya9aladdin could you confirm that you are ok releasing this using LGPLv2+?

sorry for the late reply... sure I've no problem.

Hi @aya9aladdin - I emailed you about relicensing some of your historical contributions, please let me know if you didn't get the email.

github-actions bot added Component-Core Component-Topology labels Jul 15, 2022

aya9aladdin marked this pull request as ready for review July 15, 2022 01:58

lilyminium requested changes Jul 15, 2022

aya9aladdin added 5 commits July 23, 2022 05:17

Update core.py

60b24c1

Update core.py

ce68bd1

Update core.py

ae862fc

Merge branch 'develop' into guesser-basics

c5d3453

Update core.py

f6053dc

orbeckst reviewed Jul 23, 2022

orbeckst reviewed Jul 23, 2022

aya9aladdin and others added 8 commits July 25, 2022 21:18

remove guessing from parser

bb04030

remove guessing types and masses from parsers

Update universe.py

91f0d08

Update universe.py

2a013f1

Update base.py

93ffc3f

Update core.py

bf41f72

Update tables.py

9cd3460

Merge branch 'guesser-basics' of https://github.com/aya9aladdin/mdana…

ca1f871

…lysis into guesser-basics

Parsers modification / guess_bond update

6d48520

github-actions bot added Component-Converters Component-Readers labels Jul 26, 2022

aya9aladdin added 2 commits July 27, 2022 01:51

aa

4f2782f

plural attr

6661c36

github-actions bot removed the Component-Converters label Sep 17, 2024

aya9aladdin and others added 4 commits September 17, 2024 19:23

Merge branch 'develop' into guesser-basics

80a1b74

- removed guessing types from masses

8986a97

- updated guess_TopologyAttrs docs - fixed some tests - capitalize atomtypes from elements in RDKitParser

Update CHANGELOG

c0b833e

Update universe.py

00dff7c

Merge branch 'develop' into guesser-basics

1c15f21

IAlibay approved these changes Sep 29, 2024

lilyminium reviewed Oct 17, 2024

Update package/CHANGELOG

7cc0341

Co-authored-by: Lily Wang <[email protected]>

Merge remote-tracking branch 'upstream/develop' into guesser-basics

c225f75

lilyminium mentioned this pull request Oct 19, 2024

[DNM] Update guesser docs #4741

Closed

5 tasks

Merge branch 'develop' into guesser-basics

e7ddaaf

lilyminium enabled auto-merge (squash) October 19, 2024 13:19

lilyminium merged commit 9b69745 into MDAnalysis:develop Oct 19, 2024
24 checks passed

Marcello-Sega mentioned this pull request Oct 20, 2024

[pytim-develop] Failed CI run MDAnalysis/MDAKits#217

Closed
