-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Community feedback needed: (pre)-validation at the Sarek level #798
Comments
Tagging @adamrtalbot @c-mertes @ggabernet @asp8200 @jemten as we all are trying to do similar things and could join forces 🚀 |
Thank you @FriederikeHanssen for tagging me. Yes, indeed this is a goal of GHGA to automate benchmarking in a kind of CI/CD fashion. We currently do this together with the NGS-Competence-Network. We do have GiaB samples for WES and WGS available and are setting up right now a benchmarking pipeline. This pipeline could be used later on to test SAREK on releases into master also to track the performance over time. In the case of SAREK we would need to run SAREK with all combinations of alignment tools and variant calling tools to benchmark every combination. This is actually what we are doing right now to find the best configuration for GHGA. @nickhsmith is doing this. Happy to join forces here. |
So we have at DNGC quite a thorough analysis plan for Germline testing of pipelines. It’s even been implemented in dsl2. I think we can definitely share word and git repo here. For somatic benchmarking, we have little, but some directions at least. |
@apeltzer might you be interested as well? |
Yes, count me in 👍😬 |
Yep, count me in. Lots of people are trying to solve this problem so might as well share it. |
Definitely interested! We validate our old germline pipeline by running GIAB samples as well as running a selection of our clinical samples with known pathogenic variants. We can also think about whether we should/could present the validation results in an automated fashion in the readme or release page |
What sort of somatic mutations are you looking for? |
what tool are you all using for the evaluation of the vcfs? hap.py/som.py? |
Happy using the RTGeval engine. Use GIAB stratifications for easy or hard regions: https://github.com/genome-in-a-bottle/genome-stratifications |
We also use the hap.py + eval on a GIAB sample with some extra tweaks. Currently, the pipeline is written in snakemake. |
🙈 forgot @lassefolkersen . He is doing an amazing job benchmarking sarek 3.0 at the moment. (sorry, Lasse not on purpose, but my oversight after a long day yesterday! ) |
Hehe Good to see that @lassefolkersen is on it. This is work we started at DNGC. But just to confirm, we use Happy on a variety of regions. We also have some tboughs (and code) for benchmarking CNVs. On top of that, we also collect some informations on bams and vcfs. It would be good to have a final module that produces a report , where sarek (or other pipeline ) is compared to dragen, sentieon |
Definitely - run comparisons as part of CI/CD and you can track performance over time. |
not sure how easy it would be to compare multiple different workflows As an MVP I started working on a subworkflow now that compares the output that the specific sarek run generates with a truth data set when a benchmarking flag is set. This can then be enabled on the aws full size tests and would also empower anyone to quickly run their own benchmark. |
As @c-mertes was mentioning, we have an ongoing benchmark project within the NGS-CN SIG4 which is meant to continuously benchmark any kind of variant calling pipelines. Sarek is already included there by @ggabernet. The workflow is based on a combination of Zenodo (for callsets from pipelines), Snakemake, hap.py and custom code for the reporting. I would be happy to discuss this in detail. You could either join our SIG4 meeting next Monday, or we can make a separate call. Just send me an email in any case. |
Thanks @johanneskoester! Yes exactly, in that benchmark I included However, I see that this is not incompatible with adding a benchmarking step within Sarek, that would run hap.py to benchmark the variant callers (and all the other parameters of Sarek) at Sarek-level only, and that is run automatically on each pipeline release. That would allow us to automatically check as well that the benchmarking results stay consistent across releases. Additionally, we could submit the results also automatically for every Sarek release to the NGS-CN pipeline benchmark. So far this process still requires some manual intervention: i.e. uploading the results to Zenodo and creating a PR with the details. Maybe we could also work on automating that? Would that be within the scope of that benchmark @johanneskoester ? (I'll be at the meeting next Monday so we can also discuss this there...) |
It looks like there’s lots of thing going on here. Maybe we should have a call all of us soon to align, for example, when we run validation of a pipeline we run no less than 400 wgs through at various coverages etc. This cost resources time etc. We also have a test each month with o e GIAB to track drift in the wet lab setup. That’s maybe an overkill :). |
Absolutely, automating this would be awesome. Do you think you can run the entire sarek pipeline on one of the giab datasets within Github actions CI (i.e. within the 6 hour limit on a single thread), or do you have a different CI for that? |
This is not gonna work. I like the idea though. :) |
Hi all! We use aws for running full size tests on release. To keep costs somewhat reasonable we can run it there with one Giab sample and for the somatic track with HCC1395 on each release. We are already doing this but are not further processing the output. The idea for this particular benchmarking idea is to check on each release the generated variantcalling results against the ground truth giving us and users reassurance that everything is working as expected. In addition it would give users a quick way to run their own benchmarking datasets through without necessary the need of setting up additional benchmarking pipelines. Github actions has a time limit of 4hours, 6GB memory and 4 cpus iirc, so while it is very suitable for tests with small data a complete datasets cannot be used here. A meeting sounds like a good idea. I am on vacation (supposedly :D ) But around again the week after next |
also to clarify, this is done also through a GitHub actions workflow (https://github.com/nf-core/sarek/blob/master/.github/workflows/awsfulltest.yml), but this action just submits the Sarek run to AWS batch via Nextflow Tower. We've already added the NA12878 GIAB sample to AWS S3, but we don't use the full WGS, but rather a 30x subsample of WXS to reduce costs on AWS. The config for the germline full test can be found here: https://github.com/nf-core/sarek/blob/master/conf/test_full_germline.config |
@ggabernet how do you handle limiting the cost of the GitHub actions part when you run full genomes that way? Doesn't it become un-manageable when a full genome (=30x I mean) is run on every commit? Or you just limit the number of commits? |
We wire in our full sized test to our release pipeline. So we hit the 'release' button, it runs the full sized test and only if it passes will it push a draft release to Github. Added bonus, you could put the results in the release notes, I haven't done this but it's a good idea! We don't use Github Actions so not much help right now. |
@adamrtalbot so it's configured to only run (that) GitHub action when you do actual releases? That sounds very smart 👍 Have you any code-links to look at for that? |
It's wired the other way. In our CI/CD platform there is the concept of CI/CD tests and a release pipeline. You hit the release button, it runs the minimal tests to confirm the code works, if these pass it runs the bigger tests to check the pipeline is valid. If these pass it runs The point is to only run the big tests at snapshots rather than every single commit. |
It's part of the GitHub action yml, you can specify to trigger any GitHub action only on certain actions, e.g. releases. This one also has a trigger button, so that you can test it once before releasing (but it shouldn't be abused):
|
uh that is cool. |
We currently only run on release and not on every commit (that would be way to expensive and I also don't think we would gain any insights from that) |
Thank you all for the input. 🥳 I will sent around a doodle next week and we can meet up to discuss more details. |
So we talked a little more about this today @FriederikeHanssen @asp8200 @ggabernet - and it seems that most feasible solution right now, with resources as they are, is to jump on the awsfulltest files, here. They are (smartly) from GiAB, and there already is a hap.py-module. So then the only other things needed is the GiAB gold standard file ( Then, we are thinking to have that implemented as a very short subworkflow, based on something like below and to be called only on release, and just saving the hap.py output txt-file. @ggabernet maybe you fill in the auto-trigger things I missed here? 😄 And of course - open discussion welcome, this is why I post it here; in case anyone can see an even smarter solution (I've no doubt people can see more extensive solutions, but small steps first 😃 )
|
Sounds like a good plan! Here are the GIAB stratifications: https://github.com/genome-in-a-bottle/genome-stratifications and how to use them: https://github.com/Illumina/hap.py/blob/master/doc/happy.md#stratification-via-bed-regions and here's how to use RTG-eval as the engine: https://github.com/Illumina/hap.py/blob/master/doc/happy.md#using-rtg-tools-vcfeval-as-the-comparison-engine This will give you more data than you know what to do with! |
Description of feature
Could be good to do whatever can be done when we release to help prepare for validation.
I'm thinking benchmarking, GiAB, checking results...
Could be interesting to talk with multiple collaborators to figure out what can be done and how we can help the community.
The text was updated successfully, but these errors were encountered: