command.sh from my updated CONVERT_DEPTHS_ALL process:

#!/bin/bash -euo pipefail
gunzip -f MEGAHIT-group-Col-depth.txt.gz

# Determine the number of abundance columns
n_abund=$(awk 'NR==1 {print int((NF-3)/2)}' MEGAHIT-group-Col-depth.txt)

# Generate abundance files for each readset
for i in $(seq 1 $n_abund); do
    col=$((i*2+2))
    bioawk -t '{if (NR > 1) {print $1, $'"$col"'}}' MEGAHIT-group-Col-depth.txt > group-Col_mb2_depth_$i.txt
done

# Create a list of abundance files with full paths, each on a new line
for file in group-Col_mb2_depth_*.txt; do
    echo "$PWD/$file" >> abund_list.txt
done

cat <<-END_VERSIONS > versions.yml
"NFCORE_UNO:UNO:BINNING:CONVERT_DEPTHS_ALL":
    bioawk: $(bioawk --version | cut -f 3 -d ' ' )
END_VERSIONS
@uel3 thanks for this! Could you also provide the context in which you executed the pipeline? You mentioned on Slack that you were doing a co-assembly, if I remember correctly?
Description of the bug
When using nf-core/mag to generate MAGs from a metagenomic co-assembly, I could not produce a high-quality bin for a bacterium known to be present in my samples (the bin scored only 0.25 by DASTool), so with default parameters its bins were excluded from the final refined DASTool bins. I was able to generate a high-quality bin of the same bacterium (0.95 by DASTool scoring) when I ran MaxBin2, MetaBat2, and DASTool with the same parameters in a separate mNGS pipeline, the only difference being that I passed reads_list to MaxBin2 instead of the abund_file as nf-core/mag does. When I inspected the input and output files of the nf-core/mag processes, I noticed that far less depth information is used to generate bins with MaxBin2: only the total depth from the METABAT2_JGISUMMARIZE output is passed as the -abund_file, rather than the sample-wise (per-sample) read depth of each contig.
I believe the issue lies in line 21 of the CONVERT_DEPTHS process used in the BINNING subworkflow:
bioawk -t '{ { if (NR > 1) { { print \$1, \$3 } } } }' ${depth.toString() - '.gz'} > ${prefix}_mb2_depth.txt
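For readers unfamiliar with bioawk, the line above just drops the header and keeps columns 1 (contig name) and 3 (totalAvgDepth). A minimal sketch of the same selection using plain awk in place of bioawk (whose -t flag simply sets tab-separated fields), with a made-up one-sample depth file:

```shell
# Stand-in for a METABAT2_JGISUMMARIZE depth file (tab-separated):
# contigName contigLen totalAvgDepth <sample>.bam <sample>.bam-var
printf 'contigName\tcontigLen\ttotalAvgDepth\ts1.bam\ts1.bam-var\n'  > depth.txt
printf 'k127_1\t451\t7.08638\t4.09302\t1.43798\n'                   >> depth.txt

# Plain-awk equivalent of the CONVERT_DEPTHS bioawk call: skip the
# header and keep only the contig name ($1) and totalAvgDepth ($3)
awk -F'\t' -v OFS='\t' 'NR > 1 { print $1, $3 }' depth.txt > mb2_depth.txt

cat mb2_depth.txt   # k127_1  7.08638
```

All per-sample depth columns ($4 onward) are discarded, which is why MaxBin2 only ever sees one abundance value per contig.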
I worked out how to change the process to provide sample-wise depths and recover my missing high-quality bin. Instead of passing the abund_file produced by CONVERT_DEPTHS, the mNGS reads can be passed directly via the -reads or -reads_list flag, as I did in my separate mNGS pipeline; with this approach nf-core/mag generates the high-quality bin for my known pathogen, but it requires considerably more time and resources. My fix instead keeps the depth information generated by METABAT2_JGISUMMARIZE, but extracts the sample-wise depth of every contig and passes the resulting files to MaxBin2 as -abund_list, which is the solution I offer below.
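The per-sample extraction can be sketched as follows. This is a minimal illustration, not the pipeline's actual module: plain awk stands in for bioawk, and the file and sample names are hypothetical. The JGI depth file has 3 fixed columns followed by a mean/variance column pair per sample, so sample i's mean depth sits in column i*2+2:

```shell
# Hypothetical two-sample depth file with the JGI summary layout
printf 'contigName\tcontigLen\ttotalAvgDepth\ts1.bam\ts1.bam-var\ts2.bam\ts2.bam-var\n'  > depth.txt
printf 'k127_1\t451\t7.0\t4.0\t1.4\t3.0\t0.6\n'                                         >> depth.txt

# Number of samples: drop the 3 fixed columns, two columns per sample
n_abund=$(awk 'NR==1 { print int((NF - 3) / 2) }' depth.txt)

# Write one abundance file per sample and collect their paths into the
# list that would be handed to MaxBin2's -abund_list
: > abund_list.txt
for i in $(seq 1 "$n_abund"); do
    col=$((i * 2 + 2))
    awk -F'\t' -v OFS='\t' -v c="$col" 'NR > 1 { print $1, $c }' depth.txt > "mb2_depth_$i.txt"
    echo "$PWD/mb2_depth_$i.txt" >> abund_list.txt
done
```

With the two-sample file above this produces mb2_depth_1.txt (column 4) and mb2_depth_2.txt (column 6), and abund_list.txt listing both paths.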
Command used and terminal output
$ nextflow run nf-core/mag --coassemble_group --binning_map_mode 'group' --refine_bins_dastool --postbinning_input 'refined_bins_only'
Relevant files
Command.sh from CONVERT_DEPTHS with my data:
The first 10 lines of the files processed by CONVERT_DEPTHS command.sh to show the data transformation:
MEGAHIT-group-Col-depth.txt
contigName contigLen totalAvgDepth MEGAHIT-group-Col-Loopy.bam MEGAHIT-group-Col-Loopy.bam-var MEGAHIT-group-Col-Reinvent.bam MEGAHIT-group-Col-Reinvent.bam-var MEGAHIT-group-Col-Dizzy2.bam MEGAHIT-group-Col-Dizzy2.bam-var MEGAHIT-group-Col-Florid.bam MEGAHIT-group-Col-Florid.bam-var MEGAHIT-group-Col-Usual.bam MEGAHIT-group-Col-Usual.bam-var
k127_1462844 244 0 0 0 0 0 0 0 0 0 0 0
k127_3291397 255 0 0 0 0 0 0 0 0 0 0 0
k127_1097133 238 0 0 0 0 0 0 0 0 0 0 0
k127_2925687 323 6 0 0 0 0 2 0 1 0 3 0
k127_1828555 269 0 0 0 0 0 0 0 0 0 0 0
k127_2559976 451 7.08638 4.09302 1.43798 2.15947 0.621155 0 0 0.833887 0.138981 0 0
k127_1462849 222 0 0 0 0 0 0 0 0 0 0 0
k127_2925689 207 0 0 0 0 0 0 0 0 0 0 0
k127_4022816 444 19.7007 2.28231 0.551426 1.47279 0.25011 2.26531 0.632444 5.80952 3.80658 7.87075 5.80579
group-Col_mb2_depth.txt
k127_1462844 0
k127_3291397 0
k127_1097133 0
k127_2925687 6
k127_1828555 0
k127_2559976 7.08638
k127_1462849 0
k127_2925689 0
k127_4022816 19.7007
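To contrast with group-Col_mb2_depth.txt above, here is what a per-sample abundance file would look like for the first sample (Loopy), whose mean depth is column 4 of the depth table rather than totalAvgDepth. A sketch using two of the non-zero rows shown, with plain awk standing in for bioawk and the remaining samples' columns omitted for brevity:

```shell
# Two rows from the MEGAHIT-group-Col-depth.txt example, abridged to the
# first sample's mean/variance columns (columns 4 and 5)
printf 'contigName\tcontigLen\ttotalAvgDepth\tLoopy.bam\tLoopy.bam-var\n'  > depth.txt
printf 'k127_2559976\t451\t7.08638\t4.09302\t1.43798\n'                   >> depth.txt
printf 'k127_4022816\t444\t19.7007\t2.28231\t0.551426\n'                  >> depth.txt

# Per-sample depth for sample 1: column 4 instead of totalAvgDepth ($3)
awk -F'\t' -v OFS='\t' 'NR > 1 { print $1, $4 }' depth.txt > mb2_depth_1.txt
cat mb2_depth_1.txt
```

Note how different the per-sample values are from the totals (4.09302 vs 7.08638, 2.28231 vs 19.7007): that cross-sample variation is exactly the signal MaxBin2 loses when only totalAvgDepth is passed.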
My Fix: CONVERT_DEPTHS_ALL with my data:
Attached files:
The MaxBin2 log files for the current CONVERT_DEPTHS output (maxbin2_CONVERT_DEPTHS.log) and for my updated CONVERT_DEPTHS_ALL output (maxbin2_CONVERT_DEPTHS_ALL.log); the output of CONVERT_DEPTHS_ALL (abund_list.txt); and the updated CONVERT_DEPTHS_ALL.nf script (convert_depths_all_reads.txt).
abund_list.txt
convert_depths_all_reads.txt
maxbin2_CONVERT_DEPTHS.log
maxbin2_CONVERT_DEPTHS_ALL.log
System information
nextflow/23.10.0
Run on an HPC, executed locally