bad change to mmCIF chain vs segment? #1902

Open
jamesmkrieger opened this issue Jun 4, 2024 · 0 comments

mmCIF files often split different kinds of entities into separate chains and segments, so they have more divisions in the hierarchical view than PDB files.

For example, with 4ake, we get 4 chains instead of 2. We have the option unite_chains, which merges them back and gives behaviour similar to ChimeraX. The issue is what happens when we don't use this option, where we want behaviour closer to PyMOL.

In the released version tag v2.4.1, we get something similar to PyMOL:

In [21]: ag = prody.parseMMCIF('4ake')

In [22]: list(ag.getHierView())
Out[22]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: A from Segment C from 4ake (72 residues, 72 atoms)>,
 <Chain: B from Segment D from 4ake (75 residues, 75 atoms)>]

In our current ProDy master, the chain IDs and segment names come out swapped relative to v2.4.1, and that's probably an issue:

In [2]: ag = prody.parseMMCIF('4ake')

In [3]: list(ag.getHierView())
Out[3]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: C from Segment A from 4ake (72 residues, 72 atoms)>,
 <Chain: D from Segment B from 4ake (75 residues, 75 atoms)>]
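
To make the swap explicit, here is a small sketch that uses only the (chain ID, segment name) pairs copied from the two outputs above. The variable names are just for illustration and nothing here calls ProDy.

```python
# (chain ID, segment name) pairs copied from the 4ake outputs above.
v241_pairs = [("A", "A"), ("B", "B"), ("A", "C"), ("B", "D")]
master_pairs = [("A", "A"), ("B", "B"), ("C", "A"), ("D", "B")]

# Exchange the two roles in the v2.4.1 result.
swapped_v241 = [(segname, chid) for chid, segname in v241_pairs]

# True when master is exactly v2.4.1 with chain and segment exchanged.
roles_exchanged = swapped_v241 == master_pairs

# Number of distinct chain IDs seen by each version.
n_chids_v241 = len({chid for chid, _ in v241_pairs})
n_chids_master = len({chid for chid, _ in master_pairs})

print(roles_exchanged, n_chids_v241, n_chids_master)
```

With a real structure, the same pairs could presumably be collected from the hierarchical view with getChid() and getSegname() on each chain, but I have not included that here so the sketch runs without ProDy or a download.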

There also seems to be a difference related to biomol assemblies: v2.4.1 raises an error when unite_chains=True is combined with biomol=True, whereas master does not. Master does not necessarily give the right result, although I think it does. Here is the example for 1ake:

v2.4.1

In [28]: ag = prody.parseMMCIF('1ake', biomol=True)

In [29]: ag
Out[29]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [30]: [bm.numChains() for bm in ag]
Out[30]: [1, 1]

In [31]: [list(bm.getHierView()) for bm in ag]
Out[31]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [32]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

File ~/software/scipion3/software/em/prody-2.4.1/ProDy/prody/proteins/ciffile.py:125, in parseMMCIF(pdb, **kwargs)
    123 cif.close()
    124 if unite_chains:
--> 125     result.setSegnames(result.getChids())
    126 return result

AttributeError: 'list' object has no attribute 'setSegnames'

In [33]: ag = prody.parseMMCIF('1ake', biomol=True)

In [34]: [list(bm.protein.getHierView()) for bm in ag]
Out[34]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]
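
The AttributeError in the v2.4.1 session above comes from calling setSegnames on the list that is returned when biomol=True. A minimal sketch of one way to guard against that is below. The helper name unite_chains_on and the FakeAtomGroup stand-in are hypothetical, used only so the sketch runs without ProDy; this is not necessarily how master handles it.

```python
class FakeAtomGroup:
    """Hypothetical stand-in exposing only the methods used in this sketch."""

    def __init__(self, chids, segnames):
        self._chids = list(chids)
        self._segnames = list(segnames)

    def getChids(self):
        return list(self._chids)

    def getSegnames(self):
        return list(self._segnames)

    def setSegnames(self, segnames):
        self._segnames = list(segnames)


def unite_chains_on(result):
    """Copy chain IDs into segment names for one group or a list of groups."""
    # With biomol=True the parser returned a list, so normalise to a list first.
    groups = result if isinstance(result, list) else [result]
    for group in groups:
        group.setSegnames(group.getChids())
    return result


# A single group and a list of groups are both handled.
single = unite_chains_on(FakeAtomGroup(["A", "A"], ["A", "C"]))
several = unite_chains_on([
    FakeAtomGroup(["A", "A"], ["1", "1"]),
    FakeAtomGroup(["B", "B"], ["1", "1"]),
])
print(single.getSegnames(), [g.getSegnames() for g in several])
```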

master

In [10]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

In [11]: ag
Out[11]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [12]: [list(bm.getHierView()) for bm in ag]
Out[12]: 
[[<Chain: A1 from Segment A1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B1 from Segment B1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [13]: ag = prody.parseMMCIF('1ake', biomol=True)

In [14]: ag
Out[14]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [17]: [list(bm.getHierView()) for bm in ag]
Out[17]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>,
  <Chain: C from Segment A1 from 1ake biomolecule 1 (1 residues, 57 atoms)>,
  <Chain: E from Segment A1 from 1ake biomolecule 1 (241 residues, 241 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>,
  <Chain: D from Segment B1 from 1ake biomolecule 2 (1 residues, 57 atoms)>,
  <Chain: F from Segment B1 from 1ake biomolecule 2 (137 residues, 137 atoms)>]]

In [18]: [list(bm.protein.getHierView()) for bm in ag]
Out[18]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]
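
As a quick sanity check on the master results, the per-chain counts without unite_chains should add up to the single united chain. The sketch below uses only the numbers copied from Out[12] and Out[17] above. It checks that the counts are consistent, not that the assembly itself is correct.

```python
# (residues, atoms) per chain on master without unite_chains, from Out[17].
master_split = {
    "biomolecule 1": [(214, 1656), (1, 57), (241, 241)],
    "biomolecule 2": [(214, 1656), (1, 57), (137, 137)],
}

# (residues, atoms) of the single chain with unite_chains=True, from Out[12].
master_united = {
    "biomolecule 1": (456, 1954),
    "biomolecule 2": (352, 1850),
}


def totals(chains):
    """Sum residues and atoms over a list of (residues, atoms) tuples."""
    residues = sum(r for r, _ in chains)
    atoms = sum(a for _, a in chains)
    return residues, atoms


# True for a biomolecule when the split chains add up to the united chain.
consistent = {
    name: totals(chains) == master_united[name]
    for name, chains in master_split.items()
}
print(consistent)
```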