The most time-consuming part of the workflow is variant calling with HaplotypeCaller, so we focused on speeding up that step.
Here are the approaches we tried:
1. HaplotypeCallerSpark:
HaplotypeCallerSpark is a GATK tool intended to replace the multithreading functionality of GATK3. However, it is still in beta, and many attempts to run it on the bee data failed. After a discussion with the GATK team, it was confirmed that the problem lies in HaplotypeCallerSpark itself.
2. Break the reference into smaller chunks for HaplotypeCaller:
Scatter the intervals at the N-masked regions of the reference genome, run HaplotypeCaller on each interval, and collect the per-interval calls at the end with the GatherVcfs tool.
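The splitting step can be illustrated with a minimal sketch. It cuts a contig's sequence at runs of N that are at least `min_n_run` bases long. It returns the remaining stretches as 1-based, inclusive intervals in the `contig:start-end` form that GATK's `-L` option accepts. The function names and the threshold are hypothetical. Reading the FASTA is left out. GATK also bundles a Picard tool, ScatterIntervalsByNs, for this purpose, and it is likely the better choice on a real genome.

```python
import re
from typing import List, Tuple


def split_on_n_runs(seq: str, min_n_run: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) intervals of `seq` that avoid
    runs of N at least `min_n_run` long. Shorter N runs stay inside intervals."""
    intervals = []
    start = 0  # 0-based start of the current non-gap segment
    for m in re.finditer(r"[Nn]+", seq):
        if m.end() - m.start() < min_n_run:
            continue  # gap too short to split on
        if m.start() > start:
            intervals.append((start + 1, m.start()))  # convert to 1-based inclusive
        start = m.end()
    if start < len(seq):
        intervals.append((start + 1, len(seq)))
    return intervals


def to_gatk_intervals(contig: str, intervals: List[Tuple[int, int]]) -> List[str]:
    """Format intervals as GATK-style 'contig:start-end' strings."""
    return [f"{contig}:{s}-{e}" for s, e in intervals]


if __name__ == "__main__":
    # Toy sequence (hypothetical): a 4-bp gap is split on, a 1-bp gap is not.
    seq = "ACGTNNNNACGTNAC"
    print(to_gatk_intervals("chr1", split_on_n_runs(seq, min_n_run=3)))
```

Only long N runs should be used as cut points. Cutting at every single N would produce many tiny intervals and more boundaries where calls could be affected.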
3. Optimize Java settings:
We tried adjusting the JVM garbage collection and memory parameters:
-XX:ParallelGCThreads (number of parallel garbage collection threads)
-Xmx (maximum heap size)
Neither made much difference in the results.
4. CPU utilization:
Setting the --native-pair-hmm-threads option of HaplotypeCaller did not make much difference in the results either.
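The JVM flags, the PairHMM thread option and the interval scatter described above can all be expressed in one set of command lines. The sketch below only builds argv lists and does not run anything. It assumes the GATK4 `gatk` wrapper, which takes JVM flags as one string via `--java-options` placed before the tool name. It also assumes `-R`, `-I`, `-O` and repeatable `-L` for HaplotypeCaller, and `-I`/`-O` for GatherVcfs. File names, shard layout and default values are hypothetical placeholders, not tuned settings.

```python
import shlex
from typing import List, Optional


def hc_command(reference: str, bam: str, out_vcf: str,
               intervals: Optional[List[str]] = None,
               heap_gb: int = 8, gc_threads: int = 2,
               pairhmm_threads: int = 4) -> List[str]:
    """Build one HaplotypeCaller command line as an argv list."""
    # JVM flags go to the gatk wrapper as a single string via --java-options,
    # which has to come before the tool name.
    java_opts = f"-Xmx{heap_gb}g -XX:ParallelGCThreads={gc_threads}"
    cmd = ["gatk", "--java-options", java_opts, "HaplotypeCaller",
           "-R", reference, "-I", bam, "-O", out_vcf,
           "--native-pair-hmm-threads", str(pairhmm_threads)]
    # -L may be given several times to restrict calling to those intervals.
    for interval in intervals or []:
        cmd += ["-L", interval]
    return cmd


def scatter_gather(reference: str, bam: str, shards: List[List[str]],
                   merged_vcf: str = "merged.vcf.gz", **hc_kwargs):
    """One HaplotypeCaller command per shard, plus one GatherVcfs command.

    Shards must be listed in reference order: GatherVcfs expects its inputs
    in genomic order and without overlapping events."""
    shard_vcfs = [f"shard_{i:04d}.vcf.gz" for i in range(len(shards))]
    calls = [hc_command(reference, bam, vcf, intervals=shard, **hc_kwargs)
             for vcf, shard in zip(shard_vcfs, shards)]
    gather = ["gatk", "GatherVcfs"]
    for vcf in shard_vcfs:
        gather += ["-I", vcf]
    gather += ["-O", merged_vcf]
    return calls, gather


if __name__ == "__main__":
    # Hypothetical reference, BAM and intervals, for illustration only.
    calls, gather = scatter_gather("ref.fa", "bee.bam",
                                   [["chr1:1-4"], ["chr1:9-15"]],
                                   heap_gb=16, gc_threads=2, pairhmm_threads=4)
    for argv in calls + [gather]:
        print(shlex.join(argv))
```

The per-shard commands are independent, so they can be submitted as separate cluster jobs. The GatherVcfs step runs once all of them have finished.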
For items 3 and 4 (Java settings and CPU utilization), see this website.
On the concern that splitting the reference genome into smaller chunks for HaplotypeCaller might compromise the integrity of the calls, the GATK team replied as follows:
Depending on how you scatter your intervals it should still hold true. Worst case scenario you may have to run HaplotypeCaller per contig/chromosome which is probably the safest way but if your reference is split by long repeats of N then you may want to split your intervals based on the positions of N repeats.
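For the per-contig option the GATK team calls the safest, a reference with many contigs can still be run as a few jobs. Whole contigs are grouped into consecutive shards of roughly equal total length. Below is a minimal sketch, assuming contig names and lengths are taken in reference order, for example from the first two columns of the reference `.fai` index. The function name and the example contigs are hypothetical. Keeping reference order means each shard's VCF is sorted, and the shard VCFs can be passed to GatherVcfs in shard order.

```python
from typing import List, Tuple


def contiguous_shards(contigs: List[Tuple[str, int]], n_shards: int) -> List[List[str]]:
    """Split (name, length) contigs, given in reference order, into at most
    `n_shards` consecutive groups of roughly equal total length.
    Contigs are never split, so no call can straddle a shard boundary."""
    total = sum(length for _, length in contigs)
    target = total / n_shards  # ideal number of bases per shard
    shards: List[List[str]] = [[]]
    cumulative = 0  # bases assigned so far, across all shards
    for name, length in contigs:
        # Open a new shard once the shards so far have reached their share.
        if shards[-1] and cumulative >= target * len(shards) and len(shards) < n_shards:
            shards.append([])
        shards[-1].append(name)
        cumulative += length
    return shards


if __name__ == "__main__":
    # Hypothetical contig lengths, in reference order.
    contigs = [("chr1", 100), ("chr2", 80), ("chr3", 60), ("chr4", 40)]
    for shard in contiguous_shards(contigs, 2):
        print(shard)  # each shard becomes one job with one -L per contig
```

This greedy pass does not give an optimal balance. Its advantage is that it preserves reference order, which an unordered bin-packing assignment would break.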
The discussion link
Try different variant calling tools: