By Pengkai Zhu
Institution: Fujian Agriculture and Forestry University
Email: [email protected]
Cite: Zhu, P., He, T., Zheng, Y., and Chen, L. (2023). The need for masked genomes in gymnosperms. Frontiers in Plant Science 14. doi: 10.3389/fpls.2023.1309744.
Ultra-large genomes often strain computational resources during indexing and alignment, which can stall or break an analysis. Many analyses, however, target specific genomic regions such as exons, introns, UTRs, and key loci, which may account for 50% or less of the total genome size, so aligning against the entire genome wastes resources. I therefore propose masking repetitive regions to shrink the effective reference genome, which makes analysis more efficient and lowers the resource demands of large-genome alignment.
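# Detect repeats de novo with Red, which writes a soft-masked genome (repeats in lowercase)
mkdir -p OUTPUT
Red -gnm /path/to/genome/dir/ -msk ./OUTPUT -rpt ./OUTPUT
# Convert the soft-masked (lowercase) bases to N, leaving FASTA header lines untouched
awk '!/^>/ {gsub(/[atcg]/,"N")} 1' ./OUTPUT/genome.msk > ./OUTPUT/genome.hardmasked.fa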
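# Alternatively, hard-mask annotated LTR regions: convert the GFF3 annotation to BED, then replace those intervals with N
gff2bed < LTR.gff3 > LTR.bed
bedtools maskfasta -fi genome.fa -bed LTR.bed -fo genome.hardmasked.fasta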
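
After hard-masking, it can be useful to check how much of the sequence was actually replaced with N. The sketch below is a minimal illustration and not part of the original workflow: it builds a tiny toy FASTA (the file names `toy.msk` and `toy.hardmasked.fa` are hypothetical stand-ins for a soft-masked genome and its hard-masked output), applies the same awk hard-masking step, and reports masked bases, total bases, and the masked percentage.

```shell
# Toy soft-masked FASTA standing in for a real genome (hypothetical file names).
tmpdir=$(mktemp -d)
printf '>chr1\nACGTacgtAC\n>chr2\nacgtACGTNN\n' > "$tmpdir/toy.msk"

# Hard-mask lowercase bases, leaving header lines untouched.
awk '!/^>/ {gsub(/[atcg]/,"N")} 1' "$tmpdir/toy.msk" > "$tmpdir/toy.hardmasked.fa"

# Count N bases and total bases over all sequence lines.
# length($0) is taken before gsub() strips the Ns; gsub() returns the number of Ns removed.
mask_summary=$(LC_ALL=C awk '!/^>/ {total += length($0); masked += gsub(/[Nn]/,"")}
    END {printf "%d %d %.1f", masked, total, 100 * masked / total}' "$tmpdir/toy.hardmasked.fa")

# Prints: masked_bases total_bases percent_masked
echo "$mask_summary"
```

To run the same check on a real genome, point the second awk command at the hard-masked FASTA produced above. Note that any N already present in the original assembly (gaps) is counted as masked too, so the percentage is an upper bound on what the masking step itself replaced.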