Major update. Previous versions of uLTRA had several bottlenecks that made it infeasible to map larger datasets (in terms of number of reads). The most notable updates:
- A faster and more memory-efficient seed finder, namfinder (>10x faster than the previously used MEM finders).
- Reads/SAM files are no longer loaded into memory in several places; the files are streamed instead (previously, a SAM file of alignments was loaded into RAM).
- Compressing intermediate output.
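
As an illustration of the streaming and compression changes listed above, here is a minimal Python sketch. It is not uLTRA's actual code: the function names `stream_sam_records` and `write_compressed` are hypothetical, and it assumes a plain-text SAM file in which header lines start with `@`. The point is only that iterating over the file handle keeps one record in memory at a time, and that writing through `gzip` keeps the intermediate file compressed on disk.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def stream_sam_records(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, raw_line) for each alignment record, one at a time.

    Hypothetical helper: header lines (starting with '@') are skipped, and
    the file is never read into memory as a whole.
    """
    with open(path, "r") as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # SAM header line, not an alignment record
            read_name = line.split("\t", 1)[0]
            yield read_name, line


def write_compressed(records: Iterable[Tuple[str, str]], out_path: str) -> int:
    """Write records to a gzip-compressed text file and return how many were written."""
    count = 0
    with gzip.open(out_path, "wt") as out:
        for _, line in records:
            out.write(line)
            count += 1
    return count
```

Because `stream_sam_records` is a generator, passing it directly to `write_compressed` processes the file record by record, so peak memory does not grow with the number of reads.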
This version has been tested on the datasets I used in the 2021 uLTRA publication. The largest dataset in the evaluation is the IsoSeq Alzheimer dataset (4.5M reads). On the Alzheimer dataset using 19 cores, peak memory usage is now less than 30GB (previously ~100GB), the runtime is 3h 40m (previously 5h 40m), and disk usage has gone down because intermediate files are compressed (I have not measured the size reduction).
The accuracy of v0.1 is only marginally lower than that of the previous version (v0.0.4.2) on the tested simulated datasets. The output differs from previous versions because of the new seed finder. The advantage over other aligners in aligning to, e.g., small exons is retained.