hmmer3 and easel patched to support the 3di alphabet.

seanrjohnson/hmmer3di

HMMER3Di - A fork of HMMER 3.3.2 and Easel 0.48 patched to support the Foldseek (3Di) alphabet

This program was used in the work:

Johnson, Sean R., et al. “Sensitive Remote Homology Search by Local Alignment of Small Positional Embeddings from Protein Language Models.” eLife, vol. 12, Feb. 2024. elifesciences.org, https://doi.org/10.7554/eLife.91415.2.

Note that this patched version of HMMER doesn't seem to perform any better on 3Di sequences than the original (amino acid tuned) version. I'm not sure exactly why.

The original hmmer and easel repositories are hosted on GitHub.

Install

   % git clone git@github.com:seanrjohnson/hmmer3di.git
   % cd hmmer3di
   % autoconf
   % ./configure --prefix /your/install/path
   % make
   % source copy_executables.sh
   % # copy_executables.sh creates a new directory called hmmer3Di, then copies
   % # hmmalign, hmmbuild, hmmpress, hmmsearch, hmmscan, and phmmer into that directory
   % # with 3Di_ added to the start of their names. From there you can execute them
   % # or copy them into a directory on your $PATH

Background frequencies for 3Di sequences can be found here:

3Di_background_frequencies.txt
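
As a rough illustration of what a background frequency table represents, the sketch below counts how often each 3Di state occurs in a set of sequences and normalizes the counts to sum to 1. It is a simplification under stated assumptions, not the procedure used to produce the file above: it counts over whole sequences rather than selected MSA columns, the function names `read_fasta` and `background_frequencies` are hypothetical, and the layout of 3Di_background_frequencies.txt is not reproduced. It relies only on the fact that 3Di states are written with the same 20 letters as amino acids.

```python
from collections import Counter

# 3Di states are encoded with the same 20 letters used for amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def background_frequencies(sequences):
    """Return {letter: frequency} over all 3Di letters in the sequences.

    Characters outside the 20-letter alphabet (gaps, X, etc.) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no 3Di residues found in input")
    return {c: counts[c] / total for c in ALPHABET}
```

For example, `background_frequencies(seq for _, seq in read_fasta(open("my_3di.fasta")))` would return one frequency per letter, where `my_3di.fasta` stands in for any 3Di FASTA file.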

Dirichlet priors calculated from 3Di MSAs

To generate a set of 3Di MSAs, we converted the AlphaFold UniProt Foldseek database (Jumper et al., 2021; van Kempen et al., 2023; Varadi et al., 2022) to a 3Di FASTA file. We then looked up every sequence name from the Pfam 35 seed file in the UniProt 3Di FASTA file and, where the corresponding sequence could be identified, extracted the sub-sequence corresponding to the Pfam 35 seed. The 3Di seeds from each profile were aligned using MAFFT. MSA columns with more than 10 rows were used to calculate background frequencies and Dirichlet priors using the HMMER3 program esl-mixdchlet fit with options -s 17 9 20. The resulting priors are in:

pfam_35_3Di_msa_counts_lb_10.mixdchlet.txt
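
The two data-preparation steps described above, slicing a seed region out of a full-length 3Di sequence and turning alignment columns into count vectors, can be sketched as follows. This is an illustrative reconstruction, not the script used for the paper. It assumes seed names follow the `NAME/start-end` convention with 1-based inclusive coordinates, that "more than 10 rows" means more than 10 non-gap residues in a column, and that the count file holds one line of 20 whitespace-separated counts per column. All function names are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 3Di reuses the 20 amino acid letters
INDEX = {letter: i for i, letter in enumerate(ALPHABET)}


def extract_seed_region(full_sequence, seed_name):
    """Slice the region named by 'NAME/start-end' (assumed 1-based, inclusive)."""
    name, _, coords = seed_name.rpartition("/")
    start, end = (int(x) for x in coords.split("-"))
    if not (1 <= start <= end <= len(full_sequence)):
        raise ValueError(f"coordinates out of range for {seed_name}")
    return name, full_sequence[start - 1:end]


def column_counts(aligned_rows, min_rows=10):
    """Return one 20-element count vector per sufficiently populated column.

    A column is kept when it has MORE than min_rows residues; gap characters
    and letters outside the alphabet are not counted (an assumption).
    """
    if not aligned_rows:
        return []
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("aligned rows must all have the same length")
    vectors = []
    for col in range(width):
        counts = [0] * len(ALPHABET)
        for row in aligned_rows:
            i = INDEX.get(row[col].upper())
            if i is not None:
                counts[i] += 1
        if sum(counts) > min_rows:
            vectors.append(counts)
    return vectors


def format_counts(vectors):
    """Render count vectors as text, one column per line (assumed file layout)."""
    return "\n".join(" ".join(str(n) for n in vec) for vec in vectors)
```

In the quoted command `esl-mixdchlet fit -s 17 9 20`, the option `-s 17` sets the random seed, and the two positional numbers correspond to 9 mixture components over a 20-letter alphabet, which is why each count vector here has 20 entries.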

Changes made to support the 3Di alphabet

A full list of changes can be seen in the following diff: https://github.com/seanrjohnson/hmmer3di/compare/2637afc..87a5d15
