Community feedback needed: (pre)-validation at the Sarek level #798

maxulysse · 2022-10-13T16:42:27Z

Description of feature

Could be good to do whatever can be done when we release to help prepare for validation.
I'm thinking benchmarking, GiAB, checking results...
Could be interesting to talk with multiple collaborators to figure out what can be done and how we can help the community.

FriederikeHanssen · 2022-10-13T17:04:47Z

Tagging @adamrtalbot @c-mertes @ggabernet @asp8200 @jemten as we all are trying to do similar things and could join forces 🚀

c-mertes · 2022-10-13T18:51:20Z

Thank you @FriederikeHanssen for tagging me. Yes, indeed this is a goal of GHGA to automate benchmarking in a kind of CI/CD fashion. We currently do this together with the NGS-Competence-Network. We do have GiaB samples for WES and WGS available and are setting up right now a benchmarking pipeline. This pipeline could be used later on to test SAREK on releases into master also to track the performance over time.

In the case of SAREK we would need to run SAREK with all combinations of alignment tools and variant calling tools to benchmark every combination.

This is actually what we are doing right now to find the best configuration for GHGA. @nickhsmith is doing this.

Happy to join forces here.

nicorap · 2022-10-13T19:17:16Z

So we have at DNGC quite a thorough analysis plan for Germline testing of pipelines. It’s even been implemented in dsl2. I think we can definitely share word and git repo here. For somatic benchmarking, we have little, but some directions at least.

maxulysse · 2022-10-13T19:48:43Z

@apeltzer might you be interested as well?

apeltzer · 2022-10-13T19:49:50Z

Yes, count me in 👍😬

adamrtalbot · 2022-10-14T07:12:39Z

Yep, count me in. Lots of people are trying to solve this problem so might as well share it.

jemten · 2022-10-14T07:27:47Z

Definitely interested! We validate our old germline pipeline by running GIAB samples as well as running a selection of our clinical samples with known pathogenic variants. We can also think about whether we should/could present the validation results in an automated fashion in the readme or release page

adamrtalbot · 2022-10-14T07:33:04Z

What sort of somatic mutations are you looking for?

FriederikeHanssen · 2022-10-14T08:11:30Z

what tool are you all using for the evaluation of the vcfs? hap.py/som.py?

adamrtalbot · 2022-10-14T08:13:04Z

Happy using the RTGeval engine. Use GIAB stratifications for easy or hard regions: https://github.com/genome-in-a-bottle/genome-stratifications

c-mertes · 2022-10-14T08:16:55Z

We also use the hap.py + eval on a GIAB sample with some extra tweaks. Currently, the pipeline is written in snakemake.

FriederikeHanssen · 2022-10-14T08:18:44Z

🙈 forgot @lassefolkersen . He is doing an amazing job benchmarking sarek 3.0 at the moment. (sorry, Lasse not on purpose, but my oversight after a long day yesterday! )

nicorap · 2022-10-14T10:33:58Z

Hehe Good to see that @lassefolkersen is on it. This is work we started at DNGC. But just to confirm, we use Happy on a variety of regions. We also have some tboughs (and code) for benchmarking CNVs. On top of that, we also collect some informations on bams and vcfs. It would be good to have a final module that produces a report , where sarek (or other pipeline ) is compared to dragen, sentieon

adamrtalbot · 2022-10-15T11:03:40Z

Definitely - run comparisons as part of CI/CD and you can track performance over time.

maxulysse · 2022-10-17T09:49:14Z

@drpatelh and @robsyme you should participate in this

FriederikeHanssen · 2022-10-17T10:39:08Z

not sure how easy it would be to compare multiple different workflows
In the long run it would also be great to compare all previous versions and any tool combination. However, I think we then need to have a second workflow that is outside of sarek and triggered on results generation + some DB that would keep all these results around.

As an MVP I started working on a subworkflow now that compares the output that the specific sarek run generates with a truth data set when a benchmarking flag is set. This can then be enabled on the aws full size tests and would also empower anyone to quickly run their own benchmark.

johanneskoester · 2022-10-20T09:07:25Z

As @c-mertes was mentioning, we have an ongoing benchmark project within the NGS-CN SIG4 which is meant to continuously benchmark any kind of variant calling pipelines. Sarek is already included there by @ggabernet.

The workflow is based on a combination of Zenodo (for callsets from pipelines), Snakemake, hap.py and custom code for the reporting. I would be happy to discuss this in detail. You could either join our SIG4 meeting next Monday, or we can make a separate call. Just send me an email in any case.

ggabernet · 2022-10-20T10:38:36Z

Thanks @johanneskoester! Yes exactly, in that benchmark I included Sarek 2.7 with variant callers Strelka2, HaplotypeCaller and Freebayes, as only germline variant calls are benchmarked. The purpose of that benchmark is to compare the VC results also to other variant calling pipelines being used at other Bioinformatics cores across Germany (the NGS competence Network: NGS-CN).

However, I see that this is not incompatible with adding a benchmarking step within Sarek, that would run hap.py to benchmark the variant callers (and all the other parameters of Sarek) at Sarek-level only, and that is run automatically on each pipeline release. That would allow us to automatically check as well that the benchmarking results stay consistent across releases.

Additionally, we could submit the results also automatically for every Sarek release to the NGS-CN pipeline benchmark. So far this process still requires some manual intervention: i.e. uploading the results to Zenodo and creating a PR with the details. Maybe we could also work on automating that? Would that be within the scope of that benchmark @johanneskoester ? (I'll be at the meeting next Monday so we can also discuss this there...)

nicorap · 2022-10-20T10:53:49Z

It looks like there’s lots of thing going on here. Maybe we should have a call all of us soon to align, for example, when we run validation of a pipeline we run no less than 400 wgs through at various coverages etc. This cost resources time etc. We also have a test each month with o e GIAB to track drift in the wet lab setup. That’s maybe an overkill :).

johanneskoester · 2022-10-20T11:09:25Z

Thanks @johanneskoester! Yes exactly, in that benchmark I included Sarek 2.7 with variant callers Strelka2, HaplotypeCaller and Freebayes, as only germline variant calls are benchmarked. The purpose of that benchmark is to compare the VC results also to other variant calling pipelines being used at other Bioinformatics cores across Germany (the NGS competence Network: NGS-CN).

However, I see that this is not incompatible with adding a benchmarking step within Sarek, that would run hap.py to benchmark the variant callers (and all the other parameters of Sarek) at Sarek-level only, and that is run automatically on each pipeline release. That would allow us to automatically check as well that the benchmarking results stay consistent across releases.

Additionally, we could submit the results also automatically for every Sarek release to the NGS-CN pipeline benchmark. So far this process still requires some manual intervention: i.e. uploading the results to Zenodo and creating a PR with the details. Maybe we could also work on automating that? Would that be within the scope of that benchmark @johanneskoester ? (I'll be at the meeting next Monday so we can also discuss this there...)

Absolutely, automating this would be awesome. Do you think you can run the entire sarek pipeline on one of the giab datasets within Github actions CI (i.e. within the 6 hour limit on a single thread), or do you have a different CI for that?

nicorap · 2022-10-20T11:11:37Z

Thanks @johanneskoester! Yes exactly, in that benchmark I included Sarek 2.7 with variant callers Strelka2, HaplotypeCaller and Freebayes, as only germline variant calls are benchmarked. The purpose of that benchmark is to compare the VC results also to other variant calling pipelines being used at other Bioinformatics cores across Germany (the NGS competence Network: NGS-CN).
However, I see that this is not incompatible with adding a benchmarking step within Sarek, that would run hap.py to benchmark the variant callers (and all the other parameters of Sarek) at Sarek-level only, and that is run automatically on each pipeline release. That would allow us to automatically check as well that the benchmarking results stay consistent across releases.
Additionally, we could submit the results also automatically for every Sarek release to the NGS-CN pipeline benchmark. So far this process still requires some manual intervention: i.e. uploading the results to Zenodo and creating a PR with the details. Maybe we could also work on automating that? Would that be within the scope of that benchmark @johanneskoester ? (I'll be at the meeting next Monday so we can also discuss this there...)

Absolutely, automating this would be awesome. Do you think you can run the entire sarek pipeline on one of the giab datasets within Github actions CI (i.e. within the 6 hour limit on a single thread), or do you have a different CI for that?

This is not gonna work. I like the idea though. :)

FriederikeHanssen · 2022-10-20T11:50:33Z

Hi all! We use aws for running full size tests on release. To keep costs somewhat reasonable we can run it there with one Giab sample and for the somatic track with HCC1395 on each release. We are already doing this but are not further processing the output. The idea for this particular benchmarking idea is to check on each release the generated variantcalling results against the ground truth giving us and users reassurance that everything is working as expected. In addition it would give users a quick way to run their own benchmarking datasets through without necessary the need of setting up additional benchmarking pipelines.
So it would very much be self-contained within the pipeline and not so much an overarching pipeline that runs all existing variantcalling pipelines.
My current impression is that we have two benchmarking approaches that could very well live next to each other. How are you making the benchmarking results available to the general public? Can they be easily added to the nf-core website to make them easily findable for users?

Github actions has a time limit of 4hours, 6GB memory and 4 cpus iirc, so while it is very suitable for tests with small data a complete datasets cannot be used here.

A meeting sounds like a good idea. I am on vacation (supposedly :D ) But around again the week after next

ggabernet · 2022-10-20T13:27:03Z

also to clarify, this is done also through a GitHub actions workflow (https://github.com/nf-core/sarek/blob/master/.github/workflows/awsfulltest.yml), but this action just submits the Sarek run to AWS batch via Nextflow Tower.

We've already added the NA12878 GIAB sample to AWS S3, but we don't use the full WGS, but rather a 30x subsample of WXS to reduce costs on AWS. The config for the germline full test can be found here: https://github.com/nf-core/sarek/blob/master/conf/test_full_germline.config

lassefolkersen · 2022-10-24T08:26:50Z

@ggabernet how do you handle limiting the cost of the GitHub actions part when you run full genomes that way? Doesn't it become un-manageable when a full genome (=30x I mean) is run on every commit? Or you just limit the number of commits?

adamrtalbot · 2022-10-24T08:28:40Z

We wire in our full sized test to our release pipeline. So we hit the 'release' button, it runs the full sized test and only if it passes will it push a draft release to Github. Added bonus, you could put the results in the release notes, I haven't done this but it's a good idea! We don't use Github Actions so not much help right now.

lassefolkersen · 2022-10-24T08:31:56Z

@adamrtalbot so it's configured to only run (that) GitHub action when you do actual releases? That sounds very smart 👍 Have you any code-links to look at for that?

adamrtalbot · 2022-10-24T08:41:18Z

It's wired the other way. In our CI/CD platform there is the concept of CI/CD tests and a release pipeline. You hit the release button, it runs the minimal tests to confirm the code works, if these pass it runs the bigger tests to check the pipeline is valid. If these pass it runs nf-core bump and creates a draft release.

The point is to only run the big tests at snapshots rather than every single commit.

ggabernet · 2022-10-24T11:04:32Z

It's part of the GitHub action yml, you can specify to trigger any GitHub action only on certain actions, e.g. releases. This one also has a trigger button, so that you can test it once before releasing (but it shouldn't be abused):

sarek/.github/workflows/awsfulltest.yml

Line 7 in bcd7bf9

release:

FriederikeHanssen · 2022-11-03T15:21:21Z

We wire in our full sized test to our release pipeline. So we hit the 'release' button, it runs the full sized test and only if it passes will it push a draft release to Github. Added bonus, you could put the results in the release notes, I haven't done this but it's a good idea! We don't use Github Actions so not much help right now.

uh that is cool.

FriederikeHanssen · 2022-11-03T15:22:30Z

ndle limiting the cost of the GitHub actions part when you run full genomes that way? Doesn't it become un-manageable when a full genome (=30x I mean) is run on every commit? Or you just limit the number of commits?

We currently only run on release and not on every commit (that would be way to expensive and I also don't think we would gain any insights from that)

FriederikeHanssen · 2022-11-03T15:23:20Z

Thank you all for the input. 🥳 I will sent around a doodle next week and we can meet up to discuss more details.

lassefolkersen · 2022-11-08T16:06:09Z

So we talked a little more about this today @FriederikeHanssen @asp8200 @ggabernet - and it seems that most feasible solution right now, with resources as they are, is to jump on the awsfulltest files, here. They are (smartly) from GiAB, and there already is a hap.py-module. So then the only other things needed is the GiAB gold standard file (HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz in example command below), which is on their FTP-server, and a bed file to specify only checking in high confidence regions. And that's it.

Then, we are thinking to have that implemented as a very short subworkflow, based on something like below and to be called only on release, and just saving the hap.py output txt-file. @ggabernet maybe you fill in the auto-trigger things I missed here? 😄 And of course - open discussion welcome, this is why I post it here; in case anyone can see an even smarter solution (I've no doubt people can see more extensive solutions, but small steps first 😃 )


docker run -it -v \
/home/ubuntu:/data pkrusche/hap.py /opt/hap.py/bin/hap.py \
/data/data2/giab_HG002_benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
/data/data2/giab_HG002_benchmark/HG002_deep_one_file_a.deepvariant.vcf.gz \
-o /data/data2/giab_HG002_benchmark/HHG002_deep_one_file_a.deepvariant \
-r /data/reference/Homo_sapiens_assembly38.fasta \
-f /data/data2/giab_HG002_benchmark/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed \
--pass-only

adamrtalbot · 2022-11-08T16:21:32Z

Sounds like a good plan!

Here are the GIAB stratifications: https://github.com/genome-in-a-bottle/genome-stratifications and how to use them: https://github.com/Illumina/hap.py/blob/master/doc/happy.md#stratification-via-bed-regions

and here's how to use RTG-eval as the engine: https://github.com/Illumina/hap.py/blob/master/doc/happy.md#using-rtg-tools-vcfeval-as-the-comparison-engine

This will give you more data than you know what to do with!

maxulysse added the enhancement New feature or request label Oct 13, 2022

jemten mentioned this issue Nov 9, 2022

Validation with rtgtools/hap.py nf-core/raredisease#250

Closed

maxulysse added this to the 3.3 milestone Feb 21, 2023

maxulysse changed the title ~~Have (pre) validation at a Sarek level~~ Community feedback needed: (pre)-validation at a Sarek level Mar 4, 2023

maxulysse changed the title ~~Community feedback needed: (pre)-validation at a Sarek level~~ Community feedback needed: (pre)-validation at the Sarek level Mar 4, 2023

maxulysse added the community-feedback label Mar 4, 2023

adamrtalbot mentioned this issue Mar 31, 2023

Add Illumina/Hap.py module for germline samples with truth set. #987

Closed

maxulysse modified the milestones: 3.3, 3.4, 3.5 Feb 8, 2024

maxulysse pinned this issue Feb 15, 2024

FriederikeHanssen removed this from the 3.5 milestone Aug 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Community feedback needed: (pre)-validation at the Sarek level #798

Community feedback needed: (pre)-validation at the Sarek level #798

maxulysse commented Oct 13, 2022

FriederikeHanssen commented Oct 13, 2022

c-mertes commented Oct 13, 2022

nicorap commented Oct 13, 2022

maxulysse commented Oct 13, 2022

apeltzer commented Oct 13, 2022

adamrtalbot commented Oct 14, 2022

jemten commented Oct 14, 2022

adamrtalbot commented Oct 14, 2022

FriederikeHanssen commented Oct 14, 2022

adamrtalbot commented Oct 14, 2022

c-mertes commented Oct 14, 2022

FriederikeHanssen commented Oct 14, 2022 •

edited

Loading

nicorap commented Oct 14, 2022 •

edited

Loading

adamrtalbot commented Oct 15, 2022

maxulysse commented Oct 17, 2022

FriederikeHanssen commented Oct 17, 2022 •

edited

Loading

johanneskoester commented Oct 20, 2022

ggabernet commented Oct 20, 2022

nicorap commented Oct 20, 2022

johanneskoester commented Oct 20, 2022 •

edited

Loading

nicorap commented Oct 20, 2022

FriederikeHanssen commented Oct 20, 2022

ggabernet commented Oct 20, 2022

lassefolkersen commented Oct 24, 2022

adamrtalbot commented Oct 24, 2022 •

edited

Loading

lassefolkersen commented Oct 24, 2022

adamrtalbot commented Oct 24, 2022

ggabernet commented Oct 24, 2022

FriederikeHanssen commented Nov 3, 2022

FriederikeHanssen commented Nov 3, 2022

FriederikeHanssen commented Nov 3, 2022

lassefolkersen commented Nov 8, 2022

adamrtalbot commented Nov 8, 2022

Community feedback needed: (pre)-validation at the Sarek level #798

Community feedback needed: (pre)-validation at the Sarek level #798

Comments

maxulysse commented Oct 13, 2022

Description of feature

FriederikeHanssen commented Oct 13, 2022

c-mertes commented Oct 13, 2022

nicorap commented Oct 13, 2022

maxulysse commented Oct 13, 2022

apeltzer commented Oct 13, 2022

adamrtalbot commented Oct 14, 2022

jemten commented Oct 14, 2022

adamrtalbot commented Oct 14, 2022

FriederikeHanssen commented Oct 14, 2022

adamrtalbot commented Oct 14, 2022

c-mertes commented Oct 14, 2022

FriederikeHanssen commented Oct 14, 2022 • edited Loading

nicorap commented Oct 14, 2022 • edited Loading

adamrtalbot commented Oct 15, 2022

maxulysse commented Oct 17, 2022

FriederikeHanssen commented Oct 17, 2022 • edited Loading

johanneskoester commented Oct 20, 2022

ggabernet commented Oct 20, 2022

nicorap commented Oct 20, 2022

johanneskoester commented Oct 20, 2022 • edited Loading

nicorap commented Oct 20, 2022

FriederikeHanssen commented Oct 20, 2022

ggabernet commented Oct 20, 2022

lassefolkersen commented Oct 24, 2022

adamrtalbot commented Oct 24, 2022 • edited Loading

lassefolkersen commented Oct 24, 2022

adamrtalbot commented Oct 24, 2022

ggabernet commented Oct 24, 2022

FriederikeHanssen commented Nov 3, 2022

FriederikeHanssen commented Nov 3, 2022

FriederikeHanssen commented Nov 3, 2022

lassefolkersen commented Nov 8, 2022

adamrtalbot commented Nov 8, 2022

FriederikeHanssen commented Oct 14, 2022 •

edited

Loading

nicorap commented Oct 14, 2022 •

edited

Loading

FriederikeHanssen commented Oct 17, 2022 •

edited

Loading

johanneskoester commented Oct 20, 2022 •

edited

Loading

adamrtalbot commented Oct 24, 2022 •

edited

Loading