RepeatMaskerLib.embl not built (DateRepeats) #150

Open
EricDeveaud opened this issue Jan 31, 2022 · 4 comments

@EricDeveaud

Describe the issue

RepeatMaskerLib.embl is not built when configuring RepeatMasker-4.1.2-p1, but it is required by DateRepeats:

rpm_maker:RepeatMasker/RepeatMasker-4.1.2-p1 > DateRepeats 
Indicate directory with the RepeatMasker repeat libraries near line 136 of /opt/gensoft/exe/RepeatMasker/4.1.2-p1/bin/DateRepeats

Reproduction steps

wget https://www.repeatmasker.org/RepeatMasker/RepeatMasker-4.1.2-p1.tar.gz
tar xf RepeatMasker-4.1.2-p1.tar.gz
mv RepeatMasker RepeatMasker-4.1.2-p1 && cd RepeatMasker-4.1.2-p1
tar xf ${HOME}/RepBaseRepeatMaskerEdition-20181026.tar.gz
wget https://www.dfam.org/releases/Dfam_3.1/families/Dfam.embl.gz
gunzip  -c Dfam.embl.gz > Libraries/Dfam.embl
module load rmblastn/2.10.0 \
            phrap/1.090518 \
            hmmer/3.2.1 \
            trf/4.09
perl configure -rmblast_dir $(dirname $(command -v rmblastn)) \
               -crossmatch_dir $(dirname $(command -v  cross_match)) \
               -hmmer_dir $(dirname $(command -v hmmconvert)) \
               -trf_prgm $(command -v trf) \
               -default_search_engine rmblast

Log output

 -- Setting perl interpreter...
RepeatMasker Configuration Program


Checking for libraries...

Rebuilding RepeatMaskerLib.h5 master library
  - Read in 49011 sequences from /opt/gensoft/src/RepeatMasker/RepeatMasker_full-4.1.2-p1/Libraries/RMRBSeqs.embl
  - Read in 49011 annotations from /opt/gensoft/src/RepeatMasker/RepeatMasker_full-4.1.2-p1/Libraries/RMRBMeta.embl
  Merging Dfam + RepBase into RepeatMaskerLib.h5 library..........................................

File: /opt/gensoft/src/RepeatMasker/RepeatMasker_full-4.1.2-p1/Libraries/RepeatMaskerLib.h5
Database: Dfam withRBRM
Version: 3.3
Date: 2020-11-09

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 51780
Total HMMs: 6915

.
Building FASTA version of RepeatMasker.lib .......................
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:
File: /opt/gensoft/src/RepeatMasker/RepeatMasker_full-4.1.2-p1/Libraries/RepeatMaskerLib.h5
Database: Dfam withRBRM
Version: 3.3
Date: 2020-11-09

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 51780
Total HMMs: 6915


Further documentation on the program may be found here:
  /opt/gensoft/src/RepeatMasker/RepeatMasker_full-4.1.2-p1/repeatmasker.help

However, RepeatMaskerLib.embl is missing from the Libraries directory:

ls Libraries/
Artefacts.embl   RMRBSeqs.embl            RepeatMasker.lib.nsq  RepeatPeps.lib.pin
Dfam.embl        RepeatAnnotationData.pm  RepeatMasker.lib.ntf  RepeatPeps.lib.pot
Dfam.h5          RepeatMasker.lib         RepeatMasker.lib.nto  RepeatPeps.lib.psq
README.RMRBSeqs  RepeatMasker.lib.ndb     RepeatMaskerLib.h5    RepeatPeps.lib.ptf
README.meta      RepeatMasker.lib.nhr     RepeatPeps.lib        RepeatPeps.lib.pto
RMRB.embl        RepeatMasker.lib.nin     RepeatPeps.lib.pdb    RepeatPeps.readme
RMRBMeta.embl    RepeatMasker.lib.not     RepeatPeps.lib.phr    taxonomy.dat

and

./DateRepeats
Indicate directory with the RepeatMasker repeat libraries near line 135 of ./DateRepeats

There is no RepeatMaskerLib.embl, which DateRepeats requires.
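
For anyone who wants to check their own install, the listing above can be turned into an explicit test. This is only a minimal sketch: the `check_libs` helper name is hypothetical, and the file names passed to it are the ones mentioned in this report (RepeatMaskerLib.h5 is present, RepeatMaskerLib.embl is the one DateRepeats asks for).

```shell
#!/bin/sh
# Hypothetical helper (not part of RepeatMasker): report which of the
# given files exist in a Libraries directory.
# Returns non-zero if at least one file is missing.
check_libs() {
    libdir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ -e "$libdir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage, run from the RepeatMasker install directory:
# check_libs Libraries RepeatMaskerLib.h5 RepeatMaskerLib.embl
```

With the directory contents shown above, this would report RepeatMaskerLib.h5 as found and RepeatMaskerLib.embl as missing.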

Environment (please include as much of the following information as you can find out):

perl: 5.30.1
Python: version 3.8.1 (h5py 3.6.0)
rmblastn: version 2.10.0
phrap: version 1.090518
hmmer: version 3.2.1
trf: version 4.09
  • How did you install RepeatMasker?
    manual installation from repeatmasker.org from tar.gz archive

  • Which version of RepeatMasker do you have?

./RepeatMasker -v        
RepeatMasker version 4.1.2-p1
  • Operating system and version. The output of uname -a and lsb_release -a can be used to find this.
 uname -a
Linux 1b305326d2fe 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Apr 8 19:01:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Additional context
The previously installed version 4.1.0 works as expected.

@rmhubley
Member

This is indeed a problem. DateRepeats is quite an old tool and may need some modifications in order to make it work with the new *.h5 database format. I will let you know if I can find a quick workaround.

@galt

galt commented Oct 12, 2022

DateRepeats 4.1.2 is also failing at the UCSC Genome Browser while building our hg38 patch 14. We use it to strip out the human-specific repeats.

I added the famdbfile setting to DateRepeats so that it no longer complains about the famdbfile path not being found:
my $tax = Taxonomy->new( taxonomyDataFile => $taxFile, famdbfile => "$dir/RepeatMaskerLib.h5");

However, it then ran for more than 27 hours, using CPU the whole time, until I killed it.

With RM version 4.1.0, all the small patch chromosomes finished in just about one minute each.

Please let me know if it would be helpful for me to supply the command line and input file for testing.

@galt

galt commented Oct 12, 2022

The hanging command is:
DateRepeats chr5_MU273352v1_fix.txt -query human -comp 'mus musculus'

chr5_MU273352v1_fix.txt
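
Until the hang is resolved, one way to keep a batch build from stalling on a run like this is to put a time limit around the call. A minimal sketch, assuming GNU coreutils `timeout` is available; the `run_with_limit` helper name and the 10-minute limit in the usage line are illustrative choices, not anything provided by RepeatMasker.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of RepeatMasker): run a command under a
# time limit. GNU coreutils `timeout` exits with status 124 when the
# limit is reached and the command is terminated.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit: $*" >&2
    fi
    return "$status"
}

# Example usage with the command from this comment:
# run_with_limit 10m DateRepeats chr5_MU273352v1_fix.txt -query human -comp 'mus musculus'
```

This only stops the runaway process and reports it; it does not address why DateRepeats loops with the .h5 library.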

@rmhubley
Member

rmhubley commented Nov 9, 2022

Thanks Galt. I removed DateRepeats in the latest version (4.1.4) as it needs refactoring. I will make sure this is a high priority for the next release.

@rmhubley rmhubley changed the title RepeatMaskerLib.embl not built RepeatMaskerLib.embl not built (DateRepeats) Sep 11, 2024