[Artic] Large overhaul for newer versions supporting clair3 #715
Draft: Michal-Babins wants to merge 19 commits into main from mb-artic-version-fix
…utput variable names
…nce genome, and add scheme length for remote schemes
This PR closes #697
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
Updates Artic from the previously used version 1.3.0 to the latest version (1.6.0 at the time of writing). As of Artic 1.5.1, medaka is no longer supported in favor of clair3. The `artic minion medaka` command is dropped in favor of `artic minion`, with other arguments changed as well. To support the new version and multiple clair3 model options, a new docker image has been made: https://github.com/theiagen/theiagen_docker_builds/blob/mb-clair3-models/artic-ncov2019/1.6.0_rerio/Dockerfile. This PR introduces a large overhaul to `task_artic_consensus` and only changes the docker image in `task_artic_guppyplex`. These changes impact `wf_theiacov_ont` and `wf_theiacov_clearlabs`.
⚡ Impacted Workflows/Tasks
Workflows:
wf_theiacov_ont
wf_theiacov_clearlabs
Tasks:
task_artic_consensus
task_artic_guppyplex
This PR may lead to different results in pre-existing outputs: Yes
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
Small changes to `task_artic_guppyplex`:
Docker image: `us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3` -> `us-docker.pkg.dev/general-theiagen/theiagen/artic:1.6.0_rerio`
Remove `echo "DIRNAME: $(dirname)"` since the variable is never set.
Large change to `task_artic_consensus`:
The ARTIC pipeline underwent a significant overhaul to support new versions, transitioning from Medaka's variant calling system (`r941_min_high_g360` as default model) to the more modern Clair3 (`r1041_e82_400bps_hac_v420` as default model). The new artic command replaces complex nested scheme directories (/Vuser) with direct reference handling through the `--bed` and `--ref` flags. The pipeline also supports remote scheme fetching from the primerschemes repo and expands organism coverage to include MPXV alongside SARS-CoV-2 for scenarios where no primer bed and no reference genome are provided. We also use a more stringent flag for extracting aligned reads (moving from `-F4` to `-F0x904`). A new Docker image (`us-docker.pkg.dev/general-theiagen/theiagen/artic:1.6.0_rerio`) is introduced.
Input additions and output changes to `wf_theiacov_ont`:
Inputs:
`String? clair3_model` added as an optional input and passed to `call artic_consensus.consensus`
Outputs:
`File? read1_trimmed` deprecated
`medaka_vcf` -> `clair3_vcf`
`medaka_reference` -> `artic_reference`
Input additions and output changes to `wf_theiacov_clearlabs`:
Inputs:
`String? clair3_model` added as an optional input and passed to `call artic_consensus.consensus`
Outputs:
`File variants_from_ref_vcf = consensus.medaka_pass_vcf` -> `File variants_from_ref_vcf = consensus.artic_clair3_pass_vcf`
`String medaka_reference = consensus.medaka_reference` -> `String artic_reference = consensus.artic_pipeline_reference`
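To make the move from `-F4` to `-F0x904` concrete, here is a minimal sketch of what each mask filters. It assumes the flags are passed to `samtools view -F` (the description above does not say so explicitly), which drops any read with at least one masked bit set. The bit meanings come from the SAM specification; the helper name `is_kept` and the example flag values are purely illustrative.

```shell
#!/bin/sh
# SAM FLAG bits relevant to the two masks (per the SAM specification).
UNMAPPED=$((0x4))         # read is unmapped
SECONDARY=$((0x100))      # secondary alignment
SUPPLEMENTARY=$((0x800))  # supplementary alignment

OLD_MASK=$((0x4))    # previous filter: drop unmapped reads only
NEW_MASK=$((0x904))  # new filter: also drop secondary and supplementary

# is_kept FLAG MASK (hypothetical helper): succeeds when a read with FLAG
# would survive an exclusion filter of MASK, i.e. no masked bit is set.
is_kept() {
  [ $(( $1 & $2 )) -eq 0 ]
}

# Compare both masks on a few example flag values:
# 0 = mapped forward, 16 = mapped reverse, 4 = unmapped,
# 256 = secondary, 2048 = supplementary, 2064 = supplementary + reverse.
for flag in 0 16 4 256 2048 2064; do
  old=dropped
  new=dropped
  is_kept "$flag" "$OLD_MASK" && old=kept
  is_kept "$flag" "$NEW_MASK" && new=kept
  echo "flag=$flag  -F4: $old  -F0x904: $new"
done
```

Under this reading, the stricter mask keeps only primary alignments of mapped reads, so a read contributes at most one alignment downstream.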
⚙️ Algorithm
New ARTIC/Clair3 Pipeline Flow:
`minimap2`: ONT reads → reference (`-x map-ont`)
`align_trim` passes
Clair3 variant calling per read group
`bcftools consensus` for final sequence generation
The newest version of Artic also changes how the base command is instantiated, so everything is routed through `artic minion`; when a bed file and reference are provided, they are passed directly via the `--bed` and `--ref` flags.
NOTE:
Newer versions of Artic have changed how bed files are parsed. In testing, this caused two existing primer beds to fail: the HIV primer bed and the Midnight primer bed. Both worked once updated. We can update and host these newer beds, but for now the current locations are here:
Location of updated Midnight primers: gs://fc-secure-50c9efc6-4ca8-4bf5-9752-5bd6a6da17dd/Midnight_Primers_SARS-CoV-2.scheme_updated.bed
Location of updated HIV primers: gs://fc-secure-50c9efc6-4ca8-4bf5-9752-5bd6a6da17dd/HIV-1_v2.0.primer.hyphen1200.1.bed
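The choice between a user-supplied scheme and a remote scheme, as described in this PR, could be sketched roughly as below. This is not the task's actual command block: the function name `choose_scheme_mode`, the file names, and the error handling for a partially specified scheme or an unsupported organism are assumptions made for illustration. Only the `--bed` and `--ref` flags and the `sars-cov-2` / `MPXV` organism values come from the description. The script is a dry run and does not invoke `artic`.

```shell
#!/bin/sh
# choose_scheme_mode BED REF ORGANISM (hypothetical helper)
# Prints "local" when both a primer bed and a reference are supplied,
# "remote" when neither is supplied and the organism is one of the two
# the description lists for remote scheme fetching.
# Failing in every other case is an assumption of this sketch.
choose_scheme_mode() {
  bed=$1
  ref=$2
  organism=$3
  if [ -n "$bed" ] && [ -n "$ref" ]; then
    echo "local"
  elif [ -z "$bed" ] && [ -z "$ref" ]; then
    case "$organism" in
      sars-cov-2|MPXV) echo "remote" ;;
      *) echo "no remote scheme for organism: $organism" >&2; return 1 ;;
    esac
  else
    echo "provide both a primer bed and a reference, or neither" >&2
    return 1
  fi
}

# Dry run with placeholder file names: show the flags that would be used.
mode=$(choose_scheme_mode "primers.bed" "reference.fasta" "sars-cov-2")
if [ "$mode" = "local" ]; then
  echo "artic minion --bed primers.bed --ref reference.fasta ..."
fi
```

Keeping the decision in one function makes the three cases (both files, neither file, only one file) explicit and easy to exercise without running the pipeline.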
➡️ Inputs
In `wf_theiacov_ont` and `wf_theiacov_clearlabs`:
`String? clair3_model` has been added
In `wf_theiacov_clearlabs`:
`medaka_docker` -> `artic_docker_image`
In `task_artic_consensus`:
`String medaka_model` -> `String clair3_model`
⬅️ Outputs
In `wf_theiacov_ont`:
`File? read1_trimmed` deprecated
`File medaka_vcf` -> `File clair3_vcf`
`String medaka_reference` -> `String artic_reference`
In `wf_theiacov_clearlabs`:
`File variants_from_ref_vcf = consensus.medaka_pass_vcf` -> `File variants_from_ref_vcf = consensus.artic_clair3_pass_vcf`
`String medaka_reference = consensus.medaka_reference` -> `String artic_reference = consensus.artic_pipeline_reference`
In `task_artic_consensus`:
`{samplename}.medaka.consensus.fasta` -> `{samplename}.consensus.fasta`
`medaka_reference` -> `artic_pipeline_reference`
`medaka_pass_vcf` -> `artic_clair3_pass_vcf`
`trim_fastq: {samplename}.primertrimmed.rg.fastq` has been removed
Note: I tried to keep some of the naming more neutral to artic, since clair3 technically only does the variant calling. I am completely open to any naming scheme we may want to adhere to.
🧪 Testing
Tests were performed against the non-HIV organisms in the theiacov_ont validation data.
The HIV test was performed separately so that it could use updated primers that work with the new version. The primer bed file is currently only uploaded to my sandbox: gs://fc-secure-50c9efc6-4ca8-4bf5-9752-5bd6a6da17dd/HIV-1_v2.0.primer.hyphen1200.1.bed
Similarly, for clearlabs the primers had to be updated for the new version of ARTIC to work. The primer bed is here: gs://fc-secure-50c9efc6-4ca8-4bf5-9752-5bd6a6da17dd/Midnight_Primers_SARS-CoV-2.scheme_updated.bed
Another test case, with Puerto Rico ONT data, passes when the updated primer bed files are used but fails with the current primer bed.
Puerto Rico uses the V3 Midnight Primers: gs://theiagen-public-files/terra/titan-files/SARS-CoV-2.Midnight-ONT.V3.scheme.bed. The updated ones are currently here: gs://fc-secure-50c9efc6-4ca8-4bf5-9752-5bd6a6da17dd/SARS-CoV-2.Midnight-ONT.V3.scheme_updated.bed
I also tested a scenario where `sars-cov-2` is set as default with no primer bed as input, so that empty.bed gets selected, and compared the new version with the current version to confirm they both fail.
On Terra, all testing was done with a provided bed file and a reference picked up by organism parameters. To hit the scheme autodetection, I tested locally, directly against the task. I am happy to provide the test data if the reviewer wishes to repeat these tests.
For sars-cov-2:
`miniwdl run tasks/quality_control/read_filtering/task_artic_guppyplex.wdl read1=barcode01.fastq.gz samplename=test`
For mpox:
`miniwdl run tasks/assembly/task_artic_consensus.wdl samplename=test read1=barcode01.fastq.gz organism=MPXV`
Suggested Scenarios for Reviewer to Test
Testing on theiacov_ont and theiacov_clearlabs validation sets will be a good confirmation. Please test with any other data that is relevant to this workflow. We will need to update the primer bed files before merging this PR.
🔬 Final Developer Checklist
workflows_overview tables.
🎯 Reviewer Checklist