command.sh from my updated CONVERT_DEPTHS_ALL process:

#!/bin/bash -euo pipefail
gunzip -f MEGAHIT-group-Col-depth.txt.gz

# Determine the number of abundance columns
n_abund=$(awk 'NR==1 {print int((NF-3)/2)}' MEGAHIT-group-Col-depth.txt)

# Generate abundance files for each readset
for i in $(seq 1 $n_abund); do
    col=$((i*2+2))
    bioawk -t '{if (NR > 1) {print $1, $'"$col"'}}' MEGAHIT-group-Col-depth.txt > group-Col_mb2_depth_$i.txt
done

# Create a list of abundance files with full paths, each on a new line
for file in group-Col_mb2_depth_*.txt; do
    echo "$PWD/$file" >> abund_list.txt
done

cat <<-END_VERSIONS > versions.yml
"NFCORE_UNO:UNO:BINNING:CONVERT_DEPTHS_ALL":
    bioawk: $(bioawk --version | cut -f 3 -d ' ' )
END_VERSIONS
@uel3 thanks for this! Could you also provide the context in which you executed the pipeline? You mentioned on Slack that you were doing a co-assembly, if I remember correctly?
Description of the bug
When using nf-core/mag to generate MAGs from a metagenomic co-assembly, I could not produce a high-quality bin for a bacterium known to be present in my samples (the bin scored only 0.25 by DASTool), so with default parameters its bins were excluded from the final refined DASTool bins. I was able to generate a high-quality bin of the same bacterium (0.95 by DASTool scoring) when I ran MaxBin2, MetaBat2, and DASTool with the same parameters in a separate mNGS pipeline, the only difference being that I passed reads_list to MaxBin2 instead of the abund_file as nf-core/mag does. When I inspected the input and output files of the nf-core/mag processes, I noticed that far less depth information is used to generate bins with MaxBin2: only the total depth from the METABAT2_JGISUMMARIZE output is passed as the -abund_file, rather than the sample-wise (per-sample) read depth of each contig.
I believe the issue lies in line 21 of the CONVERT_DEPTHS process used in the BINNING subworkflow:
bioawk -t '{ { if (NR > 1) { { print \$1, \$3 } } } }' ${depth.toString() - '.gz'} > ${prefix}_mb2_depth.txt
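For readers unfamiliar with bioawk, the line above just drops the header and keeps columns 1 (contig name) and 3 (totalAvgDepth). A minimal sketch of the same selection using plain awk in place of bioawk (whose -t flag simply sets tab-separated fields), with a made-up one-sample depth file:

```shell
# Stand-in for a METABAT2_JGISUMMARIZE depth file (tab-separated):
# contigName contigLen totalAvgDepth <sample>.bam <sample>.bam-var
printf 'contigName\tcontigLen\ttotalAvgDepth\ts1.bam\ts1.bam-var\n'  > depth.txt
printf 'k127_1\t451\t7.08638\t4.09302\t1.43798\n'                   >> depth.txt

# Plain-awk equivalent of the CONVERT_DEPTHS bioawk call: skip the
# header and keep only the contig name ($1) and totalAvgDepth ($3)
awk -F'\t' -v OFS='\t' 'NR > 1 { print $1, $3 }' depth.txt > mb2_depth.txt

cat mb2_depth.txt   # k127_1  7.08638
```

All per-sample depth columns ($4 onward) are discarded, which is why MaxBin2 only ever sees one abundance value per contig.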
I worked out how to change the process to provide sample-wise depths and recover my missing high-quality bin. Instead of passing the abund_file produced by CONVERT_DEPTHS, the mNGS reads can be passed directly via the -reads or -reads_list flag, as I did in my separate mNGS pipeline; with this approach nf-core/mag generates the high-quality bin for my known pathogen, but it requires considerably more time and resources. My fix instead keeps the depth information generated by METABAT2_JGISUMMARIZE, but extracts the sample-wise depth of every contig and passes the resulting files to MaxBin2 as -abund_list, which is the solution I offer below.
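The per-sample extraction can be sketched as follows. This is a minimal illustration, not the pipeline's actual module: plain awk stands in for bioawk, and the file and sample names are hypothetical. The JGI depth file has 3 fixed columns followed by a mean/variance column pair per sample, so sample i's mean depth sits in column i*2+2:

```shell
# Hypothetical two-sample depth file with the JGI summary layout
printf 'contigName\tcontigLen\ttotalAvgDepth\ts1.bam\ts1.bam-var\ts2.bam\ts2.bam-var\n'  > depth.txt
printf 'k127_1\t451\t7.0\t4.0\t1.4\t3.0\t0.6\n'                                         >> depth.txt

# Number of samples: drop the 3 fixed columns, two columns per sample
n_abund=$(awk 'NR==1 { print int((NF - 3) / 2) }' depth.txt)

# Write one abundance file per sample and collect their paths into the
# list that would be handed to MaxBin2's -abund_list
: > abund_list.txt
for i in $(seq 1 "$n_abund"); do
    col=$((i * 2 + 2))
    awk -F'\t' -v OFS='\t' -v c="$col" 'NR > 1 { print $1, $c }' depth.txt > "mb2_depth_$i.txt"
    echo "$PWD/mb2_depth_$i.txt" >> abund_list.txt
done
```

With the two-sample file above this produces mb2_depth_1.txt (column 4) and mb2_depth_2.txt (column 6), and abund_list.txt listing both paths.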
Command used and terminal output
$ nextflow run nf-core/mag --coassemble_group --binning_map_mode 'group' --refine_bins_dastool --postbinning_input 'refined_bins_only'
Relevant files
Command.sh from CONVERT_DEPTHS with my data:
The first 10 lines of the files processed by CONVERT_DEPTHS command.sh to show the data transformation:
MEGAHIT-group-Col-depth.txt
contigName contigLen totalAvgDepth MEGAHIT-group-Col-Loopy.bam MEGAHIT-group-Col-Loopy.bam-var MEGAHIT-group-Col-Reinvent.bam MEGAHIT-group-Col-Reinvent.bam-var MEGAHIT-group-Col-Dizzy2.bam MEGAHIT-group-Col-Dizzy2.bam-var MEGAHIT-group-Col-Florid.bam MEGAHIT-group-Col-Florid.bam-var MEGAHIT-group-Col-Usual.bam MEGAHIT-group-Col-Usual.bam-var
k127_1462844 244 0 0 0 0 0 0 0 0 0 0 0
k127_3291397 255 0 0 0 0 0 0 0 0 0 0 0
k127_1097133 238 0 0 0 0 0 0 0 0 0 0 0
k127_2925687 323 6 0 0 0 0 2 0 1 0 3 0
k127_1828555 269 0 0 0 0 0 0 0 0 0 0 0
k127_2559976 451 7.08638 4.09302 1.43798 2.15947 0.621155 0 0 0.833887 0.138981 0 0
k127_1462849 222 0 0 0 0 0 0 0 0 0 0 0
k127_2925689 207 0 0 0 0 0 0 0 0 0 0 0
k127_4022816 444 19.7007 2.28231 0.551426 1.47279 0.25011 2.26531 0.632444 5.80952 3.80658 7.87075 5.80579
group-Col_mb2_depth.txt
k127_1462844 0
k127_3291397 0
k127_1097133 0
k127_2925687 6
k127_1828555 0
k127_2559976 7.08638
k127_1462849 0
k127_2925689 0
k127_4022816 19.7007
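To contrast with group-Col_mb2_depth.txt above, here is what a per-sample abundance file would look like for the first sample (Loopy), whose mean depth is column 4 of the depth table rather than totalAvgDepth. A sketch using two of the non-zero rows shown, with plain awk standing in for bioawk and the remaining samples' columns omitted for brevity:

```shell
# Two rows from the MEGAHIT-group-Col-depth.txt example, abridged to the
# first sample's mean/variance columns (columns 4 and 5)
printf 'contigName\tcontigLen\ttotalAvgDepth\tLoopy.bam\tLoopy.bam-var\n'  > depth.txt
printf 'k127_2559976\t451\t7.08638\t4.09302\t1.43798\n'                   >> depth.txt
printf 'k127_4022816\t444\t19.7007\t2.28231\t0.551426\n'                  >> depth.txt

# Per-sample depth for sample 1: column 4 instead of totalAvgDepth ($3)
awk -F'\t' -v OFS='\t' 'NR > 1 { print $1, $4 }' depth.txt > mb2_depth_1.txt
cat mb2_depth_1.txt
```

Note how different the per-sample values are from the totals (4.09302 vs 7.08638, 2.28231 vs 19.7007): that cross-sample variation is exactly the signal MaxBin2 loses when only totalAvgDepth is passed.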
My Fix: CONVERT_DEPTHS_ALL with my data:
Attached files:
The MaxBin2 log files for the current CONVERT_DEPTHS output (maxbin2_CONVERT_DEPTHS.log) and for my updated CONVERT_DEPTHS_ALL output (maxbin2_CONVERT_DEPTHS_ALL.log); the output of CONVERT_DEPTHS_ALL (abund_list.txt); and the updated CONVERT_DEPTHS_ALL.nf script (convert_depths_all_reads.txt).
abund_list.txt
convert_depths_all_reads.txt
maxbin2_CONVERT_DEPTHS.log
maxbin2_CONVERT_DEPTHS_ALL.log
System information
nextflow/23.10.0
Run on an HPC, executed locally