A collection of Python and Bash scripts for various bioinformatics tasks that I've found useful over the years.
FindMeHemes.py -contigs contigs.fna -outDir ./ -mode metagenome
Seqs.py -fasta assembly.fasta
A wrapper for five CheckM commands that generates alignments, trees, and summary stats (completeness/redundancy scores) for a directory of genomes or bins (metagenome-assembled genomes).
checkm-quality.sh .fa path/to/bins/ 16
Takes as input a GFF file that represents a genome. Outputs to stdout a tab-delimited summary of the following fields, in order:
file name, coding density, genome size (in Mb), number of coding genes, number of pseudogenes (if included in the annotation), number of transposases, number of mutS genes, number of mutL genes, number of recA genes
coding-density.py -gff genome.gff
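For readers curious how the first few fields can be derived, here is a minimal sketch. It is not the code in coding-density.py. It assumes that genome size can be read from the `##sequence-region` pragmas in the GFF, that each `CDS` feature counts as one coding gene, and that overlapping CDS intervals are merged so no base is counted twice. The function name `coding_density` is hypothetical.

```python
from collections import defaultdict


def coding_density(gff_lines):
    """Return (coding density, genome size in Mb, number of CDS features).

    Assumptions (not taken from coding-density.py itself):
      - genome size is the sum of all ##sequence-region lengths
      - only features of type "CDS" count as coding
      - overlapping CDS intervals on the same sequence are merged
    """
    genome_size = 0
    n_cds = 0
    intervals = defaultdict(list)  # seqid -> list of (start, end), 1-based inclusive

    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no more features follow
        if line.startswith("##sequence-region"):
            _, _seqid, start, end = line.split()[:4]
            genome_size += int(end) - int(start) + 1
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))
        n_cds += 1

    if genome_size == 0:
        raise ValueError("no ##sequence-region pragmas found; genome size unknown")

    coding_bases = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps the current merged interval
                cur_end = max(cur_end, end)
            else:
                coding_bases += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        coding_bases += cur_end - cur_start + 1

    return coding_bases / genome_size, genome_size / 1e6, n_cds
```

To try it on a file, pass an open file handle, e.g. `coding_density(open("genome.gff"))`. GFF files without `##sequence-region` lines would need the genome size from another source, such as the accompanying FASTA.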
daltons.py -f proteins.faa
masker.py -i alignment.fa -o alignment.masked.fa -m 0.5
ribosome.py -i coding_genes.ffn -o protein_translations.faa -x y
A wrapper for three samtools commands that generates sorted BAM files from a directory of SAM files (with the .sam filename extension).
samtools_looper.sh /path/to/samFiles/
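As a rough illustration of what such a loop looks like, here is a Python sketch. The README does not say which three samtools commands the script runs, so this assumes the common `view`, `sort`, `index` sequence. The output naming (`.bam`, `.sorted.bam`) and the function names are also assumptions, not taken from samtools_looper.sh.

```python
from pathlib import Path
import subprocess


def samtools_commands(sam_path):
    """Build the three commands for one SAM file (assumed: view, sort, index)."""
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_suffix(".sorted.bam")
    return [
        # Convert SAM to BAM
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Sort the BAM by coordinate
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # Index the sorted BAM
        ["samtools", "index", str(sorted_bam)],
    ]


def run_directory(sam_dir):
    """Run the three commands for every .sam file in sam_dir."""
    for sam in sorted(Path(sam_dir).glob("*.sam")):
        for cmd in samtools_commands(sam):
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling `run_directory("/path/to/samFiles/")` requires samtools on the PATH. `samtools_commands` only builds the argument lists, so it can be inspected without running anything.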
Finds and lists (in a generated CSV file) all 16S sequences and all available metadata associated with a provided taxonomic name (at any taxonomic rank).
silva-survey.py -taxa Sodalis -silva_DB /path/to/silva_db.fasta -out_dir ./
Takes as input rRNA sequence reads, maps them to the SILVA database of 16S sequences, and generates a taxonomic summary.
ssuSilva.py -reads rRNA.fasta -silva_DB /path/to/silva_DB.fasta -t 16 -perc_identity 97 -min_aln 100 -out rRNA_taxa_summary
Provides a summary of codon usage for each amino acid. Takes as input gene sequences in nucleotide FASTA format.
codonUsage.py -g genes.ffn -o genes.codonUsage.csv
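The idea behind a per-amino-acid codon summary can be sketched in a few lines of Python. This is an illustration only, not the code in codonUsage.py. It assumes the standard genetic code (NCBI translation table 1), in-frame gene sequences, and that codons containing ambiguous bases are skipped. The names `read_fasta` and `codon_usage` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1) in the canonical TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def codon_usage(sequences):
    """Count codons per amino acid across in-frame nucleotide sequences.

    Returns a dict: amino acid (one-letter, '*' for stop) -> Counter of codons.
    Codons with bases other than A/C/G/T and trailing partial codons are skipped.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            aa = CODON_TABLE.get(codon)
            if aa is not None:
                counts[aa][codon] += 1
    return counts
```

With a file, this could be used as `codon_usage(seq for _, seq in read_fasta(open("genes.ffn")))`. Dividing each codon count by the total for its amino acid gives relative usage.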