Using SIFTS data for renumbering residues to match the Uniprot sequence resids #110

mrauha · 2022-08-18T08:43:50Z

Hi all,

I stumbled upon this paper describing the mapping of PDB residue IDs to the ones in the sequence deposited in UniProt:

Choudhary, P.; Anyango, S.; Berrisford, J.; Varadi, M.; Tolchard, J.; Velankar, S. Unified Access to up-to-Date Residue-Level Annotations from UniProt and Other Biological Databases for PDB Data via PDBx/mmCIF Files. bioRxiv, 2022, 2022.08.10.503473. https://doi.org/10.1101/2022.08.10.503473.

Frustrated by the inconsistencies in numbering, I'm writing some code to output PDB files whose residue IDs match the UniProt sequence, using biopandas for the crunching.

The mmCIF files with the mapped residues can be downloaded from:

https://www.ebi.ac.uk/pdbe/entry-files/download/{pdb_id}_updated.cif
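A minimal sketch of fetching one of these files with the standard library. It only fills in the URL template above; the lower-casing of the PDB ID is an assumption about what the PDBe endpoint expects, and `download_updated_cif` is a hypothetical helper name, not part of any library.

```python
import urllib.request
from pathlib import Path

# URL template from the issue; {pdb_id} is filled in per entry.
UPDATED_CIF_URL = "https://www.ebi.ac.uk/pdbe/entry-files/download/{pdb_id}_updated.cif"


def updated_cif_url(pdb_id: str) -> str:
    """Build the download URL for a SIFTS-updated mmCIF file.

    Assumption: the endpoint expects lower-case PDB IDs.
    """
    return UPDATED_CIF_URL.format(pdb_id=pdb_id.strip().lower())


def download_updated_cif(pdb_id: str, out_dir: str = ".") -> Path:
    """Hypothetical helper: save the updated mmCIF file locally and return its path."""
    out_path = Path(out_dir) / f"{pdb_id.strip().lower()}_updated.cif"
    urllib.request.urlretrieve(updated_cif_url(pdb_id), out_path)
    return out_path
```

The downloaded path can then be handed to biopandas, e.g. `PandasMmcif().read_mmcif(str(path))`.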

The CIF file is read nicely by the biopandas mmCIF parser. The residue number matching the UniProt sequence is in the column pdbx_sifts_xref_db_num, which is None for residues without a mapping to the sequence, e.g. ligands and UNK residues.
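A sketch of collapsing the per-atom pdbx_sifts_xref_db_num values into one UniProt number per residue. It assumes each atom is a dict-like row carrying the standard `_atom_site` columns `auth_asym_id`, `auth_seq_id` and (optionally) `pdbx_PDB_ins_code`. Treating NaN and the mmCIF placeholders `?`/`.` like None is an assumption, since how unmapped values surface may depend on the parser.

```python
import math
from typing import Any, Dict, Iterable, Mapping, Optional, Tuple

# (auth_asym_id, auth_seq_id, pdbx_PDB_ins_code) identifies one residue.
ResidueKey = Tuple[str, int, str]


def _is_missing(value: Any) -> bool:
    """True for None, NaN, or the mmCIF placeholders '?' and '.'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in {"?", "."}


def sifts_residue_map(atoms: Iterable[Mapping[str, Any]]) -> Dict[ResidueKey, Optional[int]]:
    """Collapse per-atom records into one UniProt residue number (or None) per residue."""
    mapping: Dict[ResidueKey, Optional[int]] = {}
    for atom in atoms:
        ins = atom.get("pdbx_PDB_ins_code")
        key = (
            str(atom["auth_asym_id"]),
            int(atom["auth_seq_id"]),
            "" if _is_missing(ins) else str(ins),
        )
        raw = atom.get("pdbx_sifts_xref_db_num")
        # Atoms of the same residue should share one value; keep the first one seen.
        mapping.setdefault(key, None if _is_missing(raw) else int(raw))
    return mapping
```

With biopandas, the rows could come from something like `PandasMmcif().read_mmcif(path).df["ATOM"].to_dict("records")`. That loading step is an assumption about your setup rather than something the sketch depends on.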

This paper (with accompanying Python code and a webserver) describes a similar approach using SIFTS:

Faezov, B.; Dunbrack, R. L., Jr. PDBrenum: A Webserver and Program Providing Protein Data Bank Files Renumbered according to Their UniProt Sequences. PLoS One 2021, 16 (7), e0253411. https://doi.org/10.1371/journal.pone.0253411.

Residues without a mapping are renumbered using an offset of 5000/50000 so that they don't overlap with the new residue numbers of the amino acids.
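A minimal sketch of the offset idea. The text only says that an offset of 5k/50k is used, so adding the offset to the original author residue number is an assumption. The reading that 5000 is meant for PDB-format output, whose residue-number field is four characters wide, and 50000 for mmCIF output is also an assumption.

```python
from typing import List, Optional, Sequence

# Offsets from the issue. Assumption: 5000 for PDB-format output (four-character
# residue-number field), 50000 for mmCIF output.
PDB_OFFSET = 5_000
MMCIF_OFFSET = 50_000


def renumber_with_offset(
    author_nums: Sequence[int],
    uniprot_nums: Sequence[Optional[int]],
    offset: int = PDB_OFFSET,
) -> List[int]:
    """UniProt number where SIFTS gives one; otherwise offset + author residue number.

    Assumption: the offset is added to the original author number.
    """
    if len(author_nums) != len(uniprot_nums):
        raise ValueError("author_nums and uniprot_nums must have the same length")
    mapped = [n for n in uniprot_nums if n is not None]
    # The offset only prevents clashes if it exceeds every mapped UniProt number.
    if mapped and max(mapped) >= offset:
        raise ValueError("offset must exceed every UniProt residue number")
    return [
        uniprot if uniprot is not None else offset + author
        for author, uniprot in zip(author_nums, uniprot_nums)
    ]
```

For example, residues 1, 2, 3 and 401 with UniProt numbers 25, 26, None and None become 25, 26, 5003 and 5401.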

However, occasionally part of a chain consists of UNK residues, so I will implement a way to number these continuously with respect to the UniProt-mapped residue numbers.
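One way the continuous numbering for UNK stretches could look, sketched under these assumptions. The input is one chain's polymer residues in chain order, with ligands already removed. Each UNK residue without a mapping counts forward from the nearest preceding mapped residue. A leading UNK run counts backward from the first mapped residue. Other unmapped residues are left as None, for example for an offset scheme. `fill_unk_runs` is a hypothetical name.

```python
from typing import AbstractSet, List, Optional, Sequence


def fill_unk_runs(
    uniprot_nums: Sequence[Optional[int]],
    resnames: Sequence[str],
    fill_names: AbstractSet[str] = frozenset({"UNK"}),
) -> List[Optional[int]]:
    """Give unmapped UNK residues numbers continuous with their mapped neighbours."""
    if len(uniprot_nums) != len(resnames):
        raise ValueError("uniprot_nums and resnames must have the same length")
    filled = list(uniprot_nums)
    mapped = [i for i, n in enumerate(uniprot_nums) if n is not None]
    if not mapped:
        return filled  # nothing in the chain to anchor the numbering on
    first = mapped[0]
    last_idx = None
    for i, (num, name) in enumerate(zip(uniprot_nums, resnames)):
        if num is not None:
            last_idx = i
        elif name in fill_names:
            if last_idx is not None:
                # Count forward from the closest preceding mapped residue.
                filled[i] = uniprot_nums[last_idx] + (i - last_idx)
            else:
                # Leading run: count backward from the first mapped residue.
                filled[i] = uniprot_nums[first] - (first - i)
    numbered = [n for n in filled if n is not None]
    # A run longer than the UniProt gap would reuse a number; refuse rather than guess.
    if len(numbered) != len(set(numbered)):
        raise ValueError("an UNK run does not fit between its mapped neighbours")
    return filled
```

Raising on collisions is a design choice. The sketch refuses to guess when a UNK run is longer than the gap in the UniProt numbering, which would otherwise produce duplicate residue numbers.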

Work in progress - if there's an already existing way to do this, let me know :)


Ruibin-Liu · 2022-12-20T16:17:33Z

The missing residues are not matched, which is a caveat for some uses.

mrauha changed the title to "Using SIFTS data for renumbering residues to match the Uniprot sequence resids" on Aug 18, 2022
