HMMER3Di - This is a fork of hmmer3.3.2 and easel 0.48 patched to support the Foldseek (3Di) alphabet

This program was used in the work:

Johnson, Sean R., et al. “Sensitive Remote Homology Search by Local Alignment of Small Positional Embeddings from Protein Language Models.” eLife, vol. 12, Feb. 2024. elifesciences.org, https://doi.org/10.7554/eLife.91415.2.

Note that this patched version of HMMER doesn't seem to perform any better on 3Di sequences than the original (amino acid tuned) version. I'm not sure exactly why.

The original HMMER and Easel repositories are both on GitHub.

Install

   % git clone git@github.com:seanrjohnson/hmmer3di.git
   % cd hmmer3di
   % autoconf
   % ./configure --prefix /your/install/path
   % make
   % source copy_executables.sh
   % # copy_executables.sh creates a new directory called hmmer3Di, then copies
   % # hmmalign, hmmbuild, hmmpress, hmmsearch, hmmscan, and phmmer into that directory
   % # with 3Di_ added to the start of their names. From there you can execute them
   % # or copy them to a directory on your $PATH
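
After running copy_executables.sh, it can be useful to confirm that all six renamed programs are in place before putting the directory on your $PATH. The following is a minimal sketch and is not part of the repository: `check_3di_tools` is a hypothetical helper name, and the default directory `hmmer3Di` is taken from the description of the copy step.

```shell
# Hypothetical post-install check (not shipped with hmmer3di): verify that the
# six 3Di_-prefixed programs exist in a directory and are executable.
check_3di_tools() {
  tools_dir="${1:-hmmer3Di}"   # directory created by copy_executables.sh
  tools_missing=0
  for prog in hmmalign hmmbuild hmmpress hmmsearch hmmscan phmmer; do
    if [ ! -x "$tools_dir/3Di_$prog" ]; then
      echo "missing or not executable: $tools_dir/3Di_$prog" >&2
      tools_missing=1
    fi
  done
  return "$tools_missing"      # 0 if all six were found, 1 otherwise
}

# Example, run from the repository root:
#   check_3di_tools hmmer3Di && export PATH="$PWD/hmmer3Di:$PATH"
```

The function only inspects file names and permissions; it does not run any of the programs.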

Background frequencies for 3Di sequences can be found here:

3Di_background_frequencies.txt

Dirichlet priors calculated from 3Di MSAs

To generate a set of 3Di MSAs, we converted the AlphaFold UniProt Foldseek database (Jumper et al., 2021; van Kempen et al., 2023; Varadi et al., 2022) to a 3Di FASTA file. We then looked up every sequence name from the Pfam 35 seed file in the UniProt 3Di FASTA file and, where the corresponding sequence could be identified, extracted the sub-sequence corresponding to the Pfam 35 seed. The 3Di seeds from each profile were aligned using MAFFT. MSA columns with more than 10 rows were used to calculate background frequencies and Dirichlet priors using the Easel program esl-mixdchlet fit (bundled with HMMER3) with options -s 17 9 20. The priors are available in:

pfam_35_3Di_msa_counts_lb_10.mixdchlet.txt
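
The sub-sequence extraction step described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: `extract_seed_region` is a hypothetical helper, seed names are assumed to have the form `id/start-end` with 1-based inclusive coordinates, the first token of each FASTA header is assumed to match `id` exactly, and the 3Di string is assumed to have one letter per residue of the full-length sequence. In practice, matching Pfam seed names to AlphaFold database entries may need an extra mapping step.

```shell
# Hypothetical helper: print the region of one FASTA record named by a
# seed-style identifier such as "P1/3-7" (id, then 1-based start and end).
extract_seed_region() {
  seed_fasta="$1"                 # 3Di FASTA file
  seed_name="$2"                  # "id/start-end"
  seed_id="${seed_name%%/*}"      # text before the first slash
  seed_range="${seed_name#*/}"    # "start-end"
  seed_start="${seed_range%-*}"
  seed_end="${seed_range#*-}"
  awk -v id="$seed_id" -v start="$seed_start" -v end="$seed_end" '
    /^>/ {
      if (seq != "") exit               # record already read; jump to END
      keep = (substr($1, 2) == id)      # compare first header token to id
      next
    }
    keep { seq = seq $0 }               # join wrapped sequence lines
    END {
      # fail if the id was not found or the range runs past the sequence
      if (seq == "" || end > length(seq)) exit 1
      printf ">%s/%d-%d\n%s\n", id, start, end, substr(seq, start, end - start + 1)
    }
  ' "$seed_fasta"
}

# Example:
#   extract_seed_region uniprot_3di.fasta "P1/3-7" >> family_3di.fasta
```

The per-family FASTA files produced this way could then be aligned with MAFFT, for example `mafft --auto family_3di.fasta > family_3di.aln.fasta`; the `--auto` option and the file names here are assumptions, since the MAFFT settings are not stated.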

Changes made to support the 3Di alphabet

A full list of changes can be seen in the following diff: https://github.com/seanrjohnson/hmmer3di/compare/2637afc..87a5d15