Skip to content

bartns/inositol

Repository files navigation

Tested on Ubuntu 18.04

Requirements:

File explanations:

  • RAST_mgmIDs: RAST data files used for this analyses

  • SRR_datasets: corresponding SRR dataset IDs from RAST

  • rename_RastID_to_SRR.sh: Rename file with RASTid to it's corresponding SRR id (only for this dataset)

  • totalReads_samples.tsv: Total reads in sample. (RAST screened and passed fastq)

  • Anaerostipes_bracken_abundance.tsv: Abundances of Anaerostipus in the samples from bracken

  • inositol_pathway_proteins.faa: Protein sequences of the inositol pathway.

  • inositol_SRR_diamond_0.4_counts.tsv: Protein hits from diamond analysis.

  • transform.py: Script to make tables and figures from diamond and bracken data.

DOWNLOAD data and run DIAMOND

#Download all screened fastq files from RAST using a list of RAST ids**
mkdir -p screen.passed.fastq/fasta
cd screen.passed.fastq
cat ../RAST_mgmIDs | parallel -P2 'curl "http://api.metagenomics.anl.gov/1/download/{}?file=299.1" > {}.299.screen.passed.fastq'

#Change file name to their SRR identifiers
sh ../rename_RastID_to_SRR.sh

# Convert fastq to fasta
cat ../SRR_datasets  | parallel -P2 "sed -n '1~4s/^@/>/p;2~4p' {}.299.screen.passed.fastq > fasta/{}.screen.fa"

## Make DIAMOND database from protein file
cd ../
diamond makedb --in inositol_pathway_proteins.faa -d inositol_pathway_proteins

## Run DIAMOND blastx on each read (fasta) file
mkdir diamond_out
cat SRR_datasets | parallel -P1 'diamond blastx -p 1 -e 0.000000001 --db inositol_pathway_proteins -q screen.passed.fastq/fasta/{}.screen.fa -o diamond_out/{}_diamondx_inositol.out'

## Count number of hits in DIAMOND outputs with a cut-off indentity value of at least 40%:
cd diamond_out
awk -F"\t" '{if($3>=40){print $2"\t"FILENAME"\t1" }}' *_diamondx_inositol.out | sort | uniq -c | awk '{print $2"\t"$3"\t"$1}' | sed 's/_diamondx_inositol.out//g' > ../inositol_SRR_diamond_0.4_counts.tsv

KRAKEN and BRACKEN

Set up kraken and bracken in your own enviroment and edit locations of files in commands below to make it work

kraken2-build --use-ftp --standard --threads 1 --db KRAKEN2_STANDARD
cat ../SRR_datasets | parallel -P1 'kraken2 --threads 1 --db KRAKEN2_STANDARD screen.passed.fastq/fasta/{}.screen.fa --output {}_kraken2.out --report {}_kraken2.report'

BRACKEN

bracken-build -d ./KRAKEN2_STANDARD/ -t 1 -k 35 -l 350
cat ../SRR| parallel -P1 'python est_abundance.py -i {}_kraken2.report -k KRAKEN2_STANDARD/database350mers.kmer_distrib -o {}.report.bracken -t 1'

# Lookup taxonID: 207244 (Genus Anaerostipes) in bracken's est_abundance.py report outputs
grep "207244" *_bracken.report  | awk -F"[_|\t|:]" '{print $1"\t"$(4)"\t"$3}' > Anaerostipes_bracken_abundance.tsv

Generate tables and figures from diamond and bracken output

python3 transform.py inositol_SRR_diamond_0.4_counts.tsv Anaerostipes_bracken_abundance.tsv

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published