
Add workflow to merge VCFs #6

Merged
merged 34 commits into from
Nov 20, 2024


@amstilp (Contributor) commented Nov 20, 2024

This workflow calls "bcftools merge" to create a merged VCF from a set of input VCF files. It can run in parallel on multiple sets of VCFs to combine.

Right now it just merges text files with cat until we get the structure of the workflow set up.

Note that this doesn't let us specify an output file name, which is not ideal. It looks like you can't scatter over a map in WDL 1.0, which is the version we're required to use.
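
One possible workaround, not something this PR implements, is to pass two parallel arrays (one output name per VCF set) and combine them with WDL 1.0's zip() before scattering over the resulting pairs. The Python sketch below only mimics that pairing; the input names and file names are hypothetical.

```python
# Hypothetical inputs: one output prefix per set of VCFs to merge.
# In WDL 1.0 these could be two parallel arrays (e.g. Array[String] and
# Array[Array[File]]) combined with zip() before the scatter.
output_prefixes = ["cohort_a", "cohort_b"]
vcf_sets = [
    ["a1.vcf.gz", "a2.vcf.gz"],
    ["b1.vcf.gz", "b2.vcf.gz", "b3.vcf.gz"],
]


def pair_sets(prefixes, sets):
    """Pair each output prefix with its VCF set by position, like zip()."""
    # Python's zip() silently truncates, so check the lengths explicitly.
    if len(prefixes) != len(sets):
        raise ValueError("prefixes and VCF sets must have the same length")
    return list(zip(prefixes, sets))


for prefix, vcfs in pair_sets(output_prefixes, vcf_sets):
    # Each iteration stands in for one scattered call of the merge task.
    print(f"{prefix}.vcf.gz <- {' '.join(vcfs)}")
```

Each pair carries its own output name, which is the piece a plain array of VCF sets cannot express.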

The workflow may or may not be working: the test data I'm using has flipped alleles because they were converted from PLINK, which uses major/minor alleles instead of ref/alt. So we'll need to flip some alleles to match a reference FASTA file.

Getting an error that bcftools is failing to create the index.

I think the indexes are needed, but bcftools is failing to create them for some reason.

As expected, bcftools merge did fail with an error because no index was found.

By default, bcftools index creates the index file in the same directory as the VCF (I think). It looks like we don't have permission to write to the mounted directory, so we'll have to create the index in the current working directory and later specify that it is the index for a given VCF. (We'll get to that later; for now we're just trying to create the VCF.)
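
A minimal sketch of the "write the index somewhere writable" idea: keep the VCF's basename but place the index in the working directory instead of next to the read-only input. The helper name and paths are hypothetical; the ".csi" default reflects bcftools index writing CSI indexes unless asked for TBI.

```python
import os


def local_index_path(vcf_path, workdir=None, suffix=".csi"):
    """Return an index path in a writable directory (hypothetical helper).

    The index keeps the VCF's file name plus `suffix` (".csi" by default,
    ".tbi" for tabix-style indexes) but lives in `workdir`, which defaults
    to the current working directory, rather than beside the input VCF.
    """
    workdir = os.getcwd() if workdir is None else workdir
    return os.path.join(workdir, os.path.basename(vcf_path) + suffix)


# Hypothetical read-only input location.
print(local_index_path("/mnt/inputs/sample1.vcf.gz", workdir="/work"))
```

The task would still need to tell downstream steps which index belongs to which VCF, since the two no longer sit side by side.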

I had misread 1.9 as 1.19 and thought I was using a newer version of bcftools. Switch the Docker image to bcftools 1.15, and use the --no-index option when calling bcftools merge. (Switching the Docker image may also let us put the index file in a different directory.)

It's not being created right now.

File type should be determined automatically from the suffix.

I think ${output_prefix} evaluated to an empty string, so it was creating a hidden file named .vcf.gz. Fix the syntax so that WDL, rather than bash, expands the variable.
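
One way this can happen, assuming the task uses a heredoc-style command section (command <<< >>>): as we understand the WDL 1.0 spec, only ~{} placeholders are substituted there, and ${} is passed through to bash, where an unset variable expands to nothing. The Python sketch below simulates both passes with hypothetical helper names; it is not how any WDL engine is implemented.

```python
import re


def wdl_heredoc_interpolate(command, inputs):
    """Simulate WDL: replace ~{name} with the input value, leave ${...} alone."""
    return re.sub(r"~\{(\w+)\}", lambda m: str(inputs[m.group(1)]), command)


def bash_expand(command, env):
    """Simulate bash: ${NAME} becomes its value, or "" if NAME is unset."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), command)


inputs = {"output_prefix": "merged"}  # hypothetical workflow input

for command in ["-o ${output_prefix}.vcf.gz", "-o ~{output_prefix}.vcf.gz"]:
    # WDL renders the command first, then bash runs it with no such variable set.
    rendered = bash_expand(wdl_heredoc_interpolate(command, inputs), env={})
    print(rendered)
```

The first line prints "-o .vcf.gz" (the hidden file) and the second prints "-o merged.vcf.gz".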

I think DRS may be changing how "basename" works here, so try setting the index file name using an input instead of deriving it from the VCF file name. Not sure.

Just create them no matter what.

bcftools merge has other options, and I don't want to code them all as booleans. This way, users can specify any options they want without having to make them arguments to the workflow.
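
A minimal sketch of the free-form options approach: the task takes a single string of extra arguments and splits it with shell-style quoting rules, so users can pass any bcftools merge option without a dedicated boolean input. The function and parameter names are hypothetical. --force-samples and -m are existing bcftools merge options; --no-index, per the bcftools documentation, expects the inputs to share the same chromosome order.

```python
import shlex


def build_merge_command(vcfs, output_path, extra_args=""):
    """Build a bcftools merge argv list (hypothetical helper).

    `extra_args` is one free-form string of options; shlex.split honors
    shell quoting, so a quoted value stays a single argument.
    """
    cmd = ["bcftools", "merge", "--no-index", "-o", output_path]
    cmd += shlex.split(extra_args)  # user-supplied options go before the inputs
    cmd += list(vcfs)               # input VCFs go last
    return cmd


cmd = build_merge_command(
    ["a.vcf.gz", "b.vcf.gz"], "merged.vcf.gz", "--force-samples -m none"
)
print(shlex.join(cmd))
```

This prints "bcftools merge --no-index -o merged.vcf.gz --force-samples -m none a.vcf.gz b.vcf.gz". Leaving the string empty yields the plain merge.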

Otherwise, it shows up in the Terra interface even if it's not an input to the workflow itself, just the task.

@amstilp merged commit 95ef9e5 into main on Nov 20, 2024
9 checks passed
@amstilp deleted the merge-vcf-workflow branch on November 20, 2024 at 20:45