Preseq failing most of the time #161

ewels · 2020-05-29T10:08:35Z

Anyone running the pipeline will be familiar with this log message:

terminated with an error status (1) -- Error is ignored.

Preseq has a history of failing a lot, especially for small or low-complexity files. But it now seems to be failing even more often, maybe all of the time. This needs investigating.

Phil


bsiranosian · 2021-08-25T23:24:15Z

At the very least, adding an ignore errorStrategy to this process will help your whole run not get killed due to a preseq failure.

/*
 * STEP 9 - preseq
 */
process preseq {
    errorStrategy 'ignore'
    // ... rest of the process definition unchanged
}

ewels · 2021-09-04T05:12:35Z

Yup! The pipeline already has that set as default so you shouldn't need to set that in any additional configs:

methylseq/conf/base.config

Lines 59 to 61 in 03972a6

    
    withName:preseq {
        errorStrategy = 'ignore'
    }

It would be nice to try to get it to fail a little less though 😅

bsiranosian · 2021-09-04T14:56:48Z

Oh good, I didn't notice that. I'm not sure why my whole run failed then.

apeltzer · 2022-11-03T12:58:28Z

Later preseq versions received some updates to fail more gracefully, so if you upgrade the preseq version a bit, you should be fine 👍🏻

ewels · 2022-11-03T23:17:59Z

I think we're already on 3.1.2 which is quite recent. Do you know when those versions went out? I still see the same failures on every test run.

ewels · 2022-11-03T23:20:57Z

It's tempting to update the config to allow the error exit code, so that we don't always get the pipeline report saying that the pipeline completed with errors (which always worries me / others).
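One way to read "allow the error exit code" at the command level is to wrap the preseq call so the task itself exits 0 while still logging the failure. This is only a minimal sketch: `run_tolerant` is a hypothetical helper, not something in the pipeline, and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: run a command, report a non-zero exit status on
# stderr, but always return 0 so the calling task is not marked as failed.
run_tolerant() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "WARNING: '$1' exited with status $status; continuing" >&2
    fi
    return 0
}

# Usage (placeholder file names):
# run_tolerant preseq lc_extrap -v -B sample.bam -o sample.lc_extrap.txt
```

The trade-off is that this hides genuine failures as well as the expected ones, and anything downstream has to cope with a missing output file.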

ewels · 2022-11-03T23:24:21Z

Using BED files instead of BAM, as suggested in #96 (comment), could also potentially help.

Rohit-Satyam · 2023-01-10T08:03:05Z

I tried the BED file as input, but it still fails:

gatk MarkDuplicatesSpark -I ${bam} -O ${sid}.dedup.bam -M ${sid}_markdup_metrics.txt --tmp-dir . -OBI
gatk EstimateLibraryComplexity -I ${bam} -O ${sid}_est_lib_complex_metrics.txt
# convert to BED file with paired-ends (BEDPE format)
bamToBed -i ${sid}.dedup.bam -bedpe > ${sid}.sorted.bed
preseq lc_extrap -v -P ${sid}.sorted.bed -o ${sid}.lc.preseq.txt
preseq c_curve -v -P ${sid}.sorted.bed -o ${sid}.c.preseq.txt

PAIRED_END_BED_INPUT
  TOTAL READS     = 14155
  DISTINCT READS  = 14006
  DISTINCT COUNTS = 5
  MAX COUNT       = 94
  COUNTS OF 1     = 13956
  MAX TERMS       = 2
  OBSERVED COUNTS (95)
  1	13956
  2	46
  3	1
  5	2
  94	1
  
  ERROR:	max count before zero is less than min required count (4) duplicates removed
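
Reading the histogram above against the error message: counts 1, 2 and 3 are observed but count 4 is not, so the largest count before the first gap is 3, which is below the required 4. The sketch below reproduces that check, assuming this is what preseq means by "max count before zero". The helper name is hypothetical, and it expects `count<TAB>frequency` lines on stdin.

```shell
# Hypothetical helper: print the largest count n such that every count
# from 1 to n has a non-zero frequency in the histogram read from stdin.
max_count_before_zero() {
    awk '$2 > 0 { seen[$1] = 1 }
         END { n = 0; while ((n + 1) in seen) n++; print n }'
}

# Usage with the histogram from the log above:
# printf '1\t13956\n2\t46\n3\t1\n5\t2\n94\t1\n' | max_count_before_zero
```

For the histogram in the log this prints 3, which matches the "less than min required count (4)" complaint.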

bounlu · 2023-10-04T16:22:18Z

May I suggest running preseq in defect mode, as suggested by the preseq developer, when the number of reads is >50M?

ERROR: too many defects in the approximation, consider running in defect mode

smithlabcode/preseq#29
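
A rough sketch of how such a fallback could look in a process script: try `lc_extrap` normally and retry in defect mode only if that fails. This assumes `-D` is the defect-mode flag in your preseq version (check `preseq lc_extrap` help output). The function name and file arguments are hypothetical, and this is not what the pipeline currently does.

```shell
# Hypothetical helper: run lc_extrap on a BAM file and, if it fails,
# retry once with -D (assumed to be the defect-mode flag).
lc_extrap_with_defect_fallback() {
    input_bam=$1
    output_txt=$2
    if preseq lc_extrap -v -B "$input_bam" -o "$output_txt"; then
        return 0
    fi
    echo "preseq lc_extrap failed; retrying in defect mode (-D)" >&2
    preseq lc_extrap -v -D -B "$input_bam" -o "$output_txt"
}

# Usage (placeholder file names):
# lc_extrap_with_defect_fallback sample.bam sample.lc_extrap.txt
```

The function returns the exit status of the defect-mode retry, so a run that fails in both modes is still reported as a failure.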

sateeshperi · 2024-09-17T17:05:15Z

@bounlu does defect mode fix this issue?

bounlu · 2024-09-22T16:09:40Z

Sometimes yes, but not always. It may still fail in defect mode.

sateeshperi · 2024-09-23T15:06:31Z

I see. Any recommendations on what can be done, or should we mark this as expected and close the issue?

mahesh-panchal · 2024-10-01T14:00:03Z

Does https://gatk.broadinstitute.org/hc/en-us/articles/360037591931-EstimateLibraryComplexity-Picard accomplish the same thing?

ewels added the bug Something isn't working label May 29, 2020

ewels mentioned this issue May 29, 2020

No internet mode #160

Closed

TomKellyGenetics mentioned this issue Oct 27, 2020

Complexity curve #96

Open

ewels added help wanted Extra attention is needed and removed bug Something isn't working labels Nov 3, 2022

bounlu mentioned this issue Feb 22, 2024

fix annoying PRESEQ failure #382

Closed
