The sequence length does not match the number of residues #538

DoubleSheep2 · 2023-08-17T16:23:14Z

I'm trying to create a coarse-grained model for a virus with an atomic model consisting of 60 chains, each having around 500 amino acids. When I use the following command for coarse-graining,

martinize2 -f particle_forCMD_minimize.pdb -o particle_forCMD_minimize.top -x particle_forCMD_minimize_cg.pdb -ff martini3001 -p backbone -maxwarn 1 -mutate HSD:HIS -mutate HSP:HIH -dssp /home/emuser/miniconda2/envs/dssp/bin/mkdssp

I get an error saying, "The sequence length does not match the number of residues. The sequence has 476 elements for 477 residues." This error occurs during the dssp step. I believe this error isn't related to the input model because when I reduced the number of chains, the command worked fine. How can I resolve this issue?

pckroon · 2023-08-18T08:56:57Z

How many residues do you have in your system exactly? How many residues does DSSP find/annotate if you run it on particle_forCMD_minimize.pdb?
If this doesn't shed light, try running with -v. This will preserve any intermediate files, such as the one that we feed to dssp.
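One way to answer the residue-count question before comparing against DSSP is to tally residues per chain directly from the PDB file. The snippet below is only a sketch, not part of martinize2: the function name `summarize_chains` is hypothetical, and it assumes standard fixed-column PDB records in which a new chain is signalled by a change in chain ID or by a `TER` record.

```python
def summarize_chains(lines):
    """Return (chain_id, first_resid, last_resid, n_residues) per chain block.

    Assumption: a new block starts when the chain ID (column 22) changes
    or after a TER record. Residues are told apart by columns 23-27
    (residue number plus insertion code).
    """
    segments = []
    prev_chain = None
    prev_res = None
    for line in lines:
        record = line[:6].strip()
        if record == "TER":
            prev_chain = None  # the next atom opens a new block
            continue
        if record not in ("ATOM", "HETATM"):
            continue
        chain = line[21:22]
        res = line[22:27].strip()
        if chain != prev_chain:
            segments.append([chain, res, res, 1])
            prev_chain = chain
        elif res != prev_res:
            segments[-1][2] = res
            segments[-1][3] += 1
        prev_res = res
    return [tuple(seg) for seg in segments]
```

Running this on both the original input and the intermediate files kept by `-v` should show whether any block has an unexpected start, end, or residue count.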

DoubleSheep2 · 2023-08-18T12:27:15Z

The entire virus consists of 60 identical capsid protein monomers. Each chain contains 477 amino acids (aa. 129-605), for a total of 28,620 amino acids. In debug mode, I inspected the last dssp-generated pdb file (the 34th) before the error. It looks unusual: chains 1-33 start at position 129 and end at 605, while this chain (the 34th) starts at position 544, runs up to 605, then resets and starts again from position 129.
The order of the chains in the PDB file does not change this: the error always occurs when processing the 34th chain. Could the large system size be causing the program to hit something like a stack overflow?
Running environment: mkdssp v3.0.0 (conda) and martinize2 v0.9.3 (conda).

pckroon · 2023-08-18T12:32:04Z

> The order of the chains in PDB file does not affect the occurrence of the error when processing the 34th chain. Hence, could this be due to the large system size causing the program to encounter issues similar to stack overflow problems?

No, I don't think so. If it were, I think the errors would show up differently.

Does it work if you remove the afflicted/suspect chain from your input file? Does it have missing atoms in critical spots? What does the dssp output look like if you feed that specific DSSP input file to it?

DoubleSheep2 · 2023-08-18T16:09:49Z

I think I've identified the cause of the error: it appears to be related to the atom serial numbers. Due to limitations of the PDB format, atom serial numbers can't go beyond 99999, and my system has a total of 230,000 atoms. When I set all atom serial numbers to 99999, dssp threw an error when processing the 7th chain. However, when I numbered the atoms cyclically from 1 to 99999, the error occurred when processing the 42nd chain. Is there a preprocessing approach that can be used for larger systems like this?
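To see where the serial numbers stop being well-behaved in a file like this, a small check along the following lines may help. It is a sketch under assumptions: `find_serial_breaks` is a hypothetical helper, it reads the serial from the fixed PDB columns 7-11, and it only reports where the serial fails to parse or fails to increase. The thread does not establish exactly how mkdssp reacts to such spots.

```python
def find_serial_breaks(lines):
    """Return (line_index, serial_text) for every atom whose serial is
    not an integer or is not larger than the previous atom's serial."""
    breaks = []
    prev = None
    for idx, line in enumerate(lines):
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        field = line[6:11]  # PDB columns 7-11 hold the atom serial
        try:
            serial = int(field)
        except ValueError:
            # e.g. "*****" written by tools when the number overflows
            breaks.append((idx, field.strip()))
            prev = None
            continue
        if prev is not None and serial <= prev:
            breaks.append((idx, field.strip()))
        prev = serial
    return breaks
```

With cyclic numbering from 1 to 99999, each wrap-around shows up as one entry in the returned list, which can then be compared against the chain where the error appears.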

pckroon · 2023-08-21T09:40:45Z

Hmm, I know for sure we've seen this issue before, but I can't remember the fix/workaround. How do the atom numbers look for the 42nd chain? It may be a reasonably quick solution to renumber the atoms when writing the PDB for dssp.

DoubleSheep2 · 2023-08-21T16:29:57Z

Thank you so much for your assistance. I tried numbering each chain's atoms starting from 1, and it resolved the issue. Even in a system with 140 chains, no errors occurred. Hopefully, this solution can help others as well.
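For anyone who wants to try the same workaround, the per-chain renumbering can be sketched roughly as below. This is not the script used in this thread and not martinize2 code: `renumber_atoms_per_chain` is a hypothetical name, and the sketch assumes fixed-column PDB records, chains separated by a change in chain ID or by a `TER` record, and fewer than 100,000 atoms per chain. `CONECT` records are passed through unchanged, so they would no longer match the new serials.

```python
def renumber_atoms_per_chain(lines):
    """Return PDB lines with atom serials restarted at 1 for every chain.

    Assumptions: a new chain starts when the chain ID (column 22) changes
    or after a TER record; no single chain exceeds 99999 atoms.
    """
    out = []
    serial = 0
    prev_chain = None
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            chain = line[21:22]
            if chain != prev_chain:
                serial = 0  # restart numbering for the new chain
                prev_chain = chain
            serial += 1
            # Columns 7-11 hold the serial, right-aligned in 5 characters.
            line = f"{line[:6]}{serial:>5}{line[11:]}"
        elif record == "TER":
            prev_chain = None  # the next atom starts a new chain
        out.append(line)
    return out
```

Reading the input with `readlines()`, passing the list through this function, and writing the result to a new file gives a PDB that can then be handed to martinize2 in place of the original.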

pckroon · 2023-08-22T08:50:36Z

Thanks for confirming that fixes it (and I'm happy you found a workaround). I'll put it on the list to have the DSSP processor renumber atoms before writing the dssp input pdb files.
