BagOfTricks

A collection of Python and Bash scripts for various bioinformatics-related tasks that I've found useful over the years.

Some examples:

FindMeHemes.py

Prediction of heme-binding motifs

FindMeHemes.py -contigs contigs.fna -outDir ./ -mode metagenome

Seqs.py

Prints assembly stats to stdout

Seqs.py -fasta assembly.fasta
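
The README does not say which stats Seqs.py reports, so the following is only a minimal sketch of the kind of summary an assembly-stats script typically prints (contig count, total length, longest contig, N50). The function names `fasta_lengths` and `assembly_stats` are hypothetical and are not taken from the script.

```python
def fasta_lengths(lines):
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Summarize contig lengths (assumed stat set, not necessarily what Seqs.py prints)."""
    if not lengths:
        return {"contigs": 0, "total_bp": 0, "longest": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # longest-first contigs reaches half of the total assembly size.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(ordered), "total_bp": total, "longest": ordered[0], "n50": n50}
```

With a real file, `assembly_stats(fasta_lengths(open("assembly.fasta")))` would return the summary as a dictionary.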

checkm-quality.sh

A wrapper for five CheckM commands to generate alignments, trees, and summary stats (completion/redundancy scores) for a directory of genomes or bins (metagenome-assembled genomes)

checkm-quality.sh .fa path/to/bins/ 16

coding-density.py

Takes as input a GFF file that represents a genome. Outputs to stdout a tab-delimited summary of the following information (in order):

file name - coding density - genome size (in Mb) - number of coding genes - number of pseudogenes (if included in annotation) - number of transposases - number of mutS genes - number of mutL genes - number of recA genes

coding-density.py -gff genome.gff
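
As a rough illustration of the first three output columns, here is a sketch of how coding density could be derived from a GFF3 file. It rests on assumptions not stated in the README: genome size is read from `##sequence-region` directives, only `CDS` features count as coding, and overlapping features are not merged. The function name `coding_density` is hypothetical.

```python
def coding_density(gff_lines):
    """Return (coding density, genome size in Mb, number of CDS features)."""
    genome_bp = 0
    coding_bp = 0
    n_cds = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("##sequence-region"):
            # Directive layout: ##sequence-region <seqid> <start> <end>
            _, _seqid, start, end = line.split()[:4]
            genome_bp += int(end) - int(start) + 1
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        # GFF coordinates are 1-based and inclusive.
        coding_bp += int(fields[4]) - int(fields[3]) + 1
        n_cds += 1
    density = coding_bp / genome_bp if genome_bp else 0.0
    return density, genome_bp / 1e6, n_cds
```

Because overlapping CDS features are summed independently, this sketch can slightly overestimate density for genomes with overlapping genes.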

daltons.py

Calculate exact weights in daltons from protein sequences

daltons.py -f proteins.faa
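
The README does not state which mass convention daltons.py uses, so the sketch below assumes average (not monoisotopic) residue masses: the weight of a peptide is the sum of its residue masses plus one water molecule. The name `protein_mass` is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def protein_mass(sequence):
    """Return the average molecular weight of a protein sequence in daltons.

    A trailing '*' (stop symbol) is ignored. Any other non-standard
    residue raises KeyError rather than being silently skipped.
    """
    seq = sequence.strip().upper().rstrip("*")
    if not seq:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER
```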

masker.py

Mask alignment positions with too many gaps (user-defined fraction)

masker.py -i alignment.fa -o alignment.masked.fa -m 0.5
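
The idea behind gap masking can be sketched as follows. This version assumes that "masking" means dropping any column whose gap fraction exceeds the threshold, and that `-` is the only gap character; masker.py itself may behave differently. The name `mask_alignment` is hypothetical.

```python
def mask_alignment(seqs, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of '-' exceeds max_gap_fraction.

    seqs maps sequence names to aligned sequences of equal length.
    """
    names = list(seqs)
    rows = [seqs[name] for name in names]
    n_rows = len(rows)
    # zip(*rows) walks the alignment column by column.
    keep = [
        i for i, column in enumerate(zip(*rows))
        if column.count("-") / n_rows <= max_gap_fraction
    ]
    return {name: "".join(seqs[name][i] for i in keep) for name in names}
```

With the threshold of 0.5 used in the command above, a column is kept as long as no more than half of the sequences have a gap at that position.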

ribosome.py

Translates user-provided DNA sequences to proteins

ribosome.py -i coding_genes.ffn -o protein_translations.faa -x y
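
For readers unfamiliar with the step, translation amounts to reading the sequence three bases at a time and looking each codon up in a genetic code table. The sketch below assumes the standard genetic code and translates only the first forward reading frame; it is not taken from ribosome.py, and the name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a nucleotide sequence in frame 1; '*' marks a stop codon.

    Codons containing ambiguous bases become 'X'. Trailing bases that do
    not form a complete codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```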

samtools_looper.sh

A wrapper for three samtools commands to generate sorted BAM files for a directory containing SAM files (with .sam filename extension)

samtools_looper.sh /path/to/samFiles/

silva-survey.py

Finds and lists (in a generated CSV file) all 16S sequences and all available metadata associated with a provided taxonomic name (at any taxonomic rank)

silva-survey.py -taxa Sodalis -silva_DB /path/to/silva_db.fasta -out_dir ./
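
The core lookup can be sketched as a scan over FASTA headers. This assumes SILVA-style headers of the form `>accession lineage`, where the lineage is a semicolon-delimited list of ranks; the matching rule (exact, case-insensitive match on any single rank) and the name `find_taxon` are illustrative assumptions, not the script's actual logic.

```python
def find_taxon(fasta_lines, taxon):
    """Return (accession, ranks) for each header whose lineage contains taxon."""
    wanted = taxon.lower()
    hits = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines are irrelevant for the lookup
        header = line[1:].strip()
        accession, _, lineage = header.partition(" ")
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        if any(rank.lower() == wanted for rank in ranks):
            hits.append((accession, ranks))
    return hits
```

Each hit could then be written out as one row with `csv.writer` from the standard library.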

ssuSilva.py

Takes as input rRNA sequence reads, maps them to the SILVA database of 16S sequences, and generates a taxonomic summary

ssuSilva.py -reads rRNA.fasta -silva_DB /path/to/silva_DB.fasta -t 16 -perc_identity 97 -min_aln 100 -out rRNA_taxa_summary

codonUsage.py

Provides a summary of codon usage for each amino acid. Takes as input gene sequences in nucleotide FASTA format.

codonUsage.py -g genes.ffn -o genes.codonUsage.csv
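
One plausible way to compute such a summary is to count codons across all genes and then normalize within each amino acid. The sketch below assumes the standard genetic code, in-frame input sequences, and relative frequencies as the reported value; the actual columns written by codonUsage.py are not described in the README, and the name `codon_usage` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(genes):
    """Return {amino acid: {codon: fraction}} over an iterable of gene sequences.

    Fractions sum to 1 within each amino acid. Codons with ambiguous
    bases and incomplete trailing codons are skipped.
    """
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            codon = gene[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    per_aa = defaultdict(dict)
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]][codon] = n
    usage = {}
    for aa, codons in per_aa.items():
        total = sum(codons.values())
        usage[aa] = {codon: n / total for codon, n in codons.items()}
    return usage
```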


Arkadiy-Garber/BagOfTricks
