complaining about dependencies / empty output files #29

alisamatisse · 2024-05-10T08:21:17Z

Hi,
it would be great if this package could be compatible with Guix, as this can help to make the work more reproducible.. are you planning this? because we use Guix, it would take too much effort to try to contain the package and its dependencies in the conda environment. Maybe I am missing something :(

wshuai294 · 2024-05-10T14:01:19Z

Hello, thanks for considering our method. We have provided the conda environment file which can be used to construct the env. We assume it is suitable for Linux systems. Is it not applicable to Guix? What error did you meet in running it?

alisamatisse · 2024-05-12T11:01:27Z

It is not compatible with the how our cluster paths are organized, but I guess there is not much that can be done from your side. Could it be executed independently from the conda environment? Really would like to try using your tool. Maybe will run it locally..

wshuai294 · 2024-05-13T02:28:32Z

Hello. I think it could be possible to use it without the conda environment. However, you should install the dependencies yourself. What functions do you want to use? If you only want to handle long-read data, there will be no need to install it. You can directly use the Python script. If you handle Illumina reads, it will be more complex. But I think it is also possible.

alisamatisse · 2024-05-14T06:43:08Z

Hello Shuai, that it is very promising, thank you. I am now trying to make it work...

alisamatisse · 2024-05-14T12:56:47Z

Terribly sorry for all the stupid questions, I tried to run the demo files

python3 script/long_read_typing.py -j 5 -n HG00733 -r HG00733_pacbio.subsample.fastq.gz -o output

but have this

[M::mm_idx_gen::0.568*0.84] collected minimizers
[M::mm_idx_gen::0.710*1.06] sorted minimizers
[M::main::0.713*1.06] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.714*1.06] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.715*1.06] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::462.895*2.00] mapped 851 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db/ref/hla_gen.format.filter.extend.DRB.no26789.fasta HG00733_pacbio.subsample.fastq.gz
[M::main] Real time: 462.901 sec; CPU: 923.485 sec; Peak RSS: 0.938 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.015*0.80] collected minimizers
[M::mm_idx_gen::0.017*0.88] sorted minimizers
[M::main::0.017*0.88] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*0.88] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.018*0.88] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::worker_pipeline::0.034*1.15] mapped 6 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa output/HG00733//A.long_read.fq.gz
[M::main] Real time: 0.036 sec; CPU: 0.040 sec; Peak RSS: 0.011 GB
  File "/fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/mask_low_depth_region.py", line 113
    print (gene, mask[0]-1, mask[1]-1, file = f)
                                            ^
SyntaxError: invalid syntax
cp: cannot stat ‘output/HG00733//low_depth.bed’: No such file or directory

Sorry again. I will understand if you decide not to reply, haha.

wshuai294 · 2024-05-14T14:16:05Z

Hello, cause it uses python instead of python3 in the 215-th line. I have edited this. Please update to the latest commit. I believe it will work now.

wshuai294 · 2024-06-14T14:49:19Z

Hi, It seems to work well except for the annotation step. Is the vcf files empty? Is there a sequence in hla.allele.*.HLA_*.fasta? If both not, I guess the reason is the database. Have you run the bash index.sh? You can comment the software installation in index.sh, and run it to construct the database only. Or you can run it on a local PC, and upload the database to the server. Please tell me if you have further issue. Best, Shuai

…

On Fri, Jun 14, 2024 at 9:24 PM Alisa ***@***.***> wrote: Hi Shuai, I keep trying to make the tool work! Sorry for the dumb questions again. I moved now to Ubuntu virtual machine where I can install all the dependencies myself. I thought maybe I am missing something.. but not so obvious what... I have empty output files in the end. The code output: `(base) ***@***.***:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n D NA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.466 *1.00] collected minimizers [M::mm_idx_gen::0.516*1.38] sorted minimizers [M::main::0.516 *1.38] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.519*1.38] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.520 *1.38] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2764.517*4.99] mapped 87302 sequences [M::worker_pipeline::5416.019 *5.00] mapped 90496 sequences [M::worker_pipeline::6441.330*4.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fa stx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6441.414 sec; CPU: 32064.015 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.002 *1.63] collected minimizers [M::mm_idx_gen::0.003*2.80] sorted minimizers [M::main::0.003 *2.79] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.70] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004*2.66] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0} 2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 *1.78] collected minimizers [M::mm_idx_gen::0.003*3.24] sorted minimizers [M::main::0.003 *3.22] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*3.11] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*3.05] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.011 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0} 2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 *1.79] collected minimizers [M::mm_idx_gen::0.003*3.12] sorted minimizers [M::main::0.003 *3.10] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*3.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.94] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0} 2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 *1.62] collected minimizers [M::mm_idx_gen::0.003*2.60] sorted minimizers [M::main::0.003 *2.58] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.49] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 *2.44] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.267*2.61] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.268 sec; CPU: 0.699 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0} 2024-06-14 13:19:49 Min read coverage set to 2. 2024-06-14 13:19:49 Max read coverage set to 100000. 2024-06-14 13:19:49 Estimating alignment parameters... 2024-06-14 13:19:49 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-06-14 13:19:49 Calling potential SNVs using pileup... 2024-06-14 13:19:49 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 *1.63] collected minimizers [M::mm_idx_gen::0.003*2.96] sorted minimizers [M::main::0.003 *2.94] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.83] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 *2.76] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.140*2.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.141 sec; CPU: 20.293 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182} 2024-06-14 13:20:00 Min read coverage set to 2. 2024-06-14 13:20:00 Max read coverage set to 100000. 2024-06-14 13:20:00 Estimating alignment parameters... 2024-06-14 13:20:00 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.990 match -> insertion: 0.006 match -> deletion: 0.004 deletion -> match: 0.672 deletion -> deletion: 0.328 insertion -> match: 0.763 insertion -> insertion: 0.237 Emission Probabilities: match (equal): 0.925 match (not equal): 0.025 insertion: 1.000 deletion: 1.000 2024-06-14 13:20:00 Calling potential SNVs using pileup... 2024-06-14 13:20:00 123 potential variants identified. 2024-06-14 13:20:00 Generating haplotype fragments from reads... 2024-06-14 13:20:05 10% of variants processed... 2024-06-14 13:20:05 20% of variants processed... 2024-06-14 13:20:05 30% of variants processed... 2024-06-14 13:20:06 40% of variants processed... 2024-06-14 13:20:06 100% of variants processed. 2024-06-14 13:20:06 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:06 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:06 Round 1 of haplotype assembly... 2024-06-14 13:20:06 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-06-14 13:20:08 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-06-14 13:20:08 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-06-14 13:20:08 Round 2 of haplotype assembly... 2024-06-14 13:20:08 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-06-14 13:20:09 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 Printing VCF file... [M::mm_idx_gen::0.002 *1.83] collected minimizers [M::mm_idx_gen::0.003*3.03] sorted minimizers [M::main::0.003 *3.02] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.88] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 *2.82] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.387*3.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.387 sec; CPU: 22.863 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88} 2024-06-14 13:20:18 Min read coverage set to 2. 2024-06-14 13:20:18 Max read coverage set to 100000. 2024-06-14 13:20:18 Estimating alignment parameters... 2024-06-14 13:20:18 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.991 match -> insertion: 0.005 match -> deletion: 0.004 deletion -> match: 0.514 deletion -> deletion: 0.486 insertion -> match: 0.556 insertion -> insertion: 0.444 Emission Probabilities: match (equal): 0.975 match (not equal): 0.008 insertion: 1.000 deletion: 1.000 2024-06-14 13:20:18 Calling potential SNVs using pileup... 2024-06-14 13:20:18 183 potential variants identified. 2024-06-14 13:20:18 Generating haplotype fragments from reads... 2024-06-14 13:20:18 10% of variants processed... 2024-06-14 13:20:18 20% of variants processed... 2024-06-14 13:20:18 30% of variants processed... 2024-06-14 13:20:18 40% of variants processed... 2024-06-14 13:20:18 50% of variants processed... 2024-06-14 13:20:18 60% of variants processed... 2024-06-14 13:20:18 70% of variants processed... 2024-06-14 13:20:18 80% of variants processed... 2024-06-14 13:20:18 90% of variants processed... 2024-06-14 13:20:18 100% of variants processed. 2024-06-14 13:20:18 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:18 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:18 Round 1 of haplotype assembly... 2024-06-14 13:20:18 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-06-14 13:20:19 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-06-14 13:20:19 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-06-14 13:20:19 Round 2 of haplotype assembly... 2024-06-14 13:20:19 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 Printing VCF file... [M::mm_idx_gen::0.002 *1.73] collected minimizers [M::mm_idx_gen::0.003*2.89] sorted minimizers [M::main::0.003 *2.87] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.74] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 *2.69] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.884*2.89] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.884 sec; CPU: 2.556 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95} 2024-06-14 13:20:21 Min read coverage set to 2. 2024-06-14 13:20:21 Max read coverage set to 100000. 2024-06-14 13:20:21 Estimating alignment parameters... 2024-06-14 13:20:21 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.993 match -> insertion: 0.003 match -> deletion: 0.004 deletion -> match: 0.406 deletion -> deletion: 0.594 insertion -> match: 0.297 insertion -> insertion: 0.703 Emission Probabilities: match (equal): 0.950 match (not equal): 0.017 insertion: 1.000 deletion: 1.000 2024-06-14 13:20:21 Calling potential SNVs using pileup... 2024-06-14 13:20:21 547 potential variants identified. 2024-06-14 13:20:21 Generating haplotype fragments from reads... 2024-06-14 13:20:21 10% of variants processed... 2024-06-14 13:20:22 20% of variants processed... 2024-06-14 13:20:22 30% of variants processed... 2024-06-14 13:20:22 40% of variants processed... 2024-06-14 13:20:22 50% of variants processed... 2024-06-14 13:20:22 60% of variants processed... 2024-06-14 13:20:22 70% of variants processed... 2024-06-14 13:20:22 80% of variants processed... 2024-06-14 13:20:22 90% of variants processed... 2024-06-14 13:20:22 100% of variants processed. 2024-06-14 13:20:22 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:22 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:22 Round 1 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Round 2 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Printing VCF file... [M::mm_idx_gen::0.002 *1.00] collected minimizers [M::mm_idx_gen::0.003*2.78] sorted minimizers [M::main::0.003 *2.76] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.58] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 *2.53] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.742*0.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.742 sec; CPU: 4.793 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45} 2024-06-14 13:20:33 Min read coverage set to 2. 2024-06-14 13:20:33 Max read coverage set to 100000. 2024-06-14 13:20:33 Estimating alignment parameters... 2024-06-14 13:20:33 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.995 match -> insertion: 0.003 match -> deletion: 0.002 deletion -> match: 0.625 deletion -> deletion: 0.375 insertion -> match: 0.577 insertion -> insertion: 0.423 Emission Probabilities: match (equal): 0.972 match (not equal): 0.009 insertion: 1.000 deletion: 1.000 2024-06-14 13:20:33 Calling potential SNVs using pileup... 2024-06-14 13:20:33 592 potential variants identified. 2024-06-14 13:20:33 Generating haplotype fragments from reads... 2024-06-14 13:20:34 10% of variants processed... 2024-06-14 13:20:34 20% of variants processed... 2024-06-14 13:20:34 30% of variants processed... 2024-06-14 13:20:34 40% of variants processed... 2024-06-14 13:20:34 50% of variants processed... 2024-06-14 13:20:34 60% of variants processed... 2024-06-14 13:20:34 70% of variants processed... 2024-06-14 13:20:34 80% of variants processed... 2024-06-14 13:20:34 90% of variants processed... 2024-06-14 13:20:34 100% of variants processed. 2024-06-14 13:20:34 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:34 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:34 Round 1 of haplotype assembly... 2024-06-14 13:20:34 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-06-14 13:20:35 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-06-14 13:20:35 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-06-14 13:20:35 Round 2 of haplotype assembly... 2024-06-14 13:20:35 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-06-14 13:20:38 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-06-14 13:20:38 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-06-14 13:20:38 Printing VCF file... *Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory* version: IPD-IMGT/HLA 3.38.0 Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DR B1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished. ` The part where it breaks (I think) is highlighted in bold — Reply to this email directly, view it on GitHub <#29 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ALS7DWSMLQH27II4EVB46FTZHLVHFAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRYGAZTOMRTGY> . You are receiving this because you commented.Message ID: ***@***.***>

alisamatisse · 2024-07-08T20:07:48Z

Hi Shuai,

both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work..
I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(


(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt
Locus   Chromosome      Allele
A       1
A       2
B       1
B       2
C       1
C       2
DPA1    1
DPA1    2
DPB1    1
DPB1    2
DQA1    1
DQA1    2
DQB1    1
DQB1    2
DRB1    1
DRB1    2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.
hla.result.details.txt  hla.result.txt
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.txt
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.details.txt
# version: IPD-IMGT/HLA 3.38.0
Gene    G_best  allele  details:allele;Score;Caucasian;Black;Asian

aiakupo@sl-sci-lee1:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output
[M::mm_idx_gen::0.461*1.00] collected minimizers
[M::mm_idx_gen::0.516*1.41] sorted minimizers
[M::main::0.517*1.41] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.519*1.41] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.520*1.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::2785.920*4.99] mapped 87302 sequences
[M::worker_pipeline::5446.614*5.00] mapped 90496 sequences
[M::worker_pipeline::6477.379*4.98] mapped 34247 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz
[M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.002*2.07] collected minimizers
[M::mm_idx_gen::0.003*3.10] sorted minimizers
[M::main::0.003*3.08] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.96] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for A
Mean depth {'HLA_A': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.89] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.11] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.00] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for B
Mean depth {'HLA_B': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.77] collected minimizers
[M::mm_idx_gen::0.003*2.86] sorted minimizers
[M::main::0.003*2.84] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.71] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for C
Mean depth {'HLA_C': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.71] collected minimizers
[M::mm_idx_gen::0.003*2.73] sorted minimizers
[M::main::0.003*2.71] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.61] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775
[M::worker_pipeline::0.270*2.62] mapped 231 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz
[M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB
downsample ratio is 1 for DPA1
Mean depth {'HLA_DPA1': 0}

2024-07-08 16:05:01 Min read coverage set to 2.
2024-07-08 16:05:01 Max read coverage set to 100000.
2024-07-08 16:05:01 Estimating alignment parameters...
2024-07-08 16:05:01 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:01 Calling potential SNVs using pileup...
2024-07-08 16:05:01 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.64] collected minimizers
[M::mm_idx_gen::0.003*2.93] sorted minimizers
[M::main::0.003*2.91] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.79] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468
[M::worker_pipeline::7.230*2.84] mapped 5835 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz
[M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB
downsample ratio is 1 for DPB1
Mean depth {'HLA_DPB1': 182}

2024-07-08 16:05:12 Min read coverage set to 2.
2024-07-08 16:05:12 Max read coverage set to 100000.
2024-07-08 16:05:12 Estimating alignment parameters...
2024-07-08 16:05:12 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.990
                    match -> insertion:      0.006
                    match -> deletion:       0.004
                    deletion -> match:       0.672
                    deletion -> deletion:    0.328
                    insertion -> match:      0.763
                    insertion -> insertion:  0.237

                    Emission Probabilities:
                    match (equal):           0.925
                    match (not equal):       0.025
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:12 Calling potential SNVs using pileup...
2024-07-08 16:05:12 123 potential variants identified.
2024-07-08 16:05:12 Generating haplotype fragments from reads...
2024-07-08 16:05:17    10% of variants processed...
2024-07-08 16:05:17    20% of variants processed...
2024-07-08 16:05:18    30% of variants processed...
2024-07-08 16:05:18    40% of variants processed...
2024-07-08 16:05:18    100% of variants processed.
2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:18    Round 1 of haplotype assembly...
2024-07-08 16:05:18    (Before HapCUT2) Total phased heterozygous SNVs: 61  Total likelihood (phred): 1012374.57
2024-07-08 16:05:20    (After HapCUT2)  Total phased heterozygous SNVs: 61  Total likelihood (phred): 322425.24
2024-07-08 16:05:20    (After Greedy)   Total phased heterozygous SNVs: 61  Total likelihood (phred): 291216.51
2024-07-08 16:05:20    Round 2 of haplotype assembly...
2024-07-08 16:05:20    (Before HapCUT2) Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21    (After HapCUT2)  Total phased heterozygous SNVs: 46  Total likelihood (phred): 291900.35
2024-07-08 16:05:21    (After Greedy)   Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21 Printing VCF file...
[M::mm_idx_gen::0.002*1.75] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.12] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.99] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492
[M::worker_pipeline::6.387*3.58] mapped 3343 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz
[M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB
downsample ratio is 1 for DQA1
Mean depth {'HLA_DQA1': 88}

2024-07-08 16:05:30 Min read coverage set to 2.
2024-07-08 16:05:30 Max read coverage set to 100000.
2024-07-08 16:05:30 Estimating alignment parameters...
2024-07-08 16:05:30 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.991
                    match -> insertion:      0.005
                    match -> deletion:       0.004
                    deletion -> match:       0.514
                    deletion -> deletion:    0.486
                    insertion -> match:      0.556
                    insertion -> insertion:  0.444

                    Emission Probabilities:
                    match (equal):           0.975
                    match (not equal):       0.008
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:30 Calling potential SNVs using pileup...
2024-07-08 16:05:30 183 potential variants identified.
2024-07-08 16:05:30 Generating haplotype fragments from reads...
2024-07-08 16:05:30    10% of variants processed...
2024-07-08 16:05:30    20% of variants processed...
2024-07-08 16:05:30    30% of variants processed...
2024-07-08 16:05:30    40% of variants processed...
2024-07-08 16:05:30    50% of variants processed...
2024-07-08 16:05:30    60% of variants processed...
2024-07-08 16:05:30    70% of variants processed...
2024-07-08 16:05:31    80% of variants processed...
2024-07-08 16:05:31    90% of variants processed...
2024-07-08 16:05:31    100% of variants processed.
2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:31    Round 1 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 123  Total likelihood (phred): 92420.53
2024-07-08 16:05:31    (After HapCUT2)  Total phased heterozygous SNVs: 123  Total likelihood (phred): 38571.91
2024-07-08 16:05:31    (After Greedy)   Total phased heterozygous SNVs: 123  Total likelihood (phred): 34372.78
2024-07-08 16:05:31    Round 2 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After HapCUT2)  Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After Greedy)   Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32 Printing VCF file...
[M::mm_idx_gen::0.002*1.79] collected minimizers
[M::mm_idx_gen::0.003*3.00] sorted minimizers
[M::main::0.003*2.99] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.84] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480
[M::worker_pipeline::0.888*2.90] mapped 668 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz
[M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB
downsample ratio is 1 for DQB1
Mean depth {'HLA_DQB1': 95}

2024-07-08 16:05:33 Min read coverage set to 2.
2024-07-08 16:05:33 Max read coverage set to 100000.
2024-07-08 16:05:33 Estimating alignment parameters...
2024-07-08 16:05:33 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.993
                    match -> insertion:      0.003
                    match -> deletion:       0.004
                    deletion -> match:       0.406
                    deletion -> deletion:    0.594
                    insertion -> match:      0.297
                    insertion -> insertion:  0.703

                    Emission Probabilities:
                    match (equal):           0.950
                    match (not equal):       0.017
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:33 Calling potential SNVs using pileup...
2024-07-08 16:05:33 547 potential variants identified.
2024-07-08 16:05:33 Generating haplotype fragments from reads...
2024-07-08 16:05:34    10% of variants processed...
2024-07-08 16:05:34    20% of variants processed...
2024-07-08 16:05:34    30% of variants processed...
2024-07-08 16:05:34    40% of variants processed...
2024-07-08 16:05:34    50% of variants processed...
2024-07-08 16:05:34    60% of variants processed...
2024-07-08 16:05:34    70% of variants processed...
2024-07-08 16:05:34    80% of variants processed...
2024-07-08 16:05:34    90% of variants processed...
2024-07-08 16:05:34    100% of variants processed.
2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:34    Round 1 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 23  Total likelihood (phred): 757167.63
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 23  Total likelihood (phred): 36533.11
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 23  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    Round 2 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34 Printing VCF file...
[M::mm_idx_gen::0.002*1.70] collected minimizers
[M::mm_idx_gen::0.003*2.97] sorted minimizers
[M::main::0.003*2.95] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.81] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229
[M::worker_pipeline::5.751*0.83] mapped 9477 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz
[M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB
downsample ratio is 1 for DRB1
Mean depth {'HLA_DRB1': 45}

2024-07-08 16:05:45 Min read coverage set to 2.
2024-07-08 16:05:45 Max read coverage set to 100000.
2024-07-08 16:05:45 Estimating alignment parameters...
2024-07-08 16:05:45 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.995
                    match -> insertion:      0.003
                    match -> deletion:       0.002
                    deletion -> match:       0.625
                    deletion -> deletion:    0.375
                    insertion -> match:      0.577
                    insertion -> insertion:  0.423

                    Emission Probabilities:
                    match (equal):           0.972
                    match (not equal):       0.009
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:45 Calling potential SNVs using pileup...
2024-07-08 16:05:45 592 potential variants identified.
2024-07-08 16:05:45 Generating haplotype fragments from reads...
2024-07-08 16:05:46    10% of variants processed...
2024-07-08 16:05:46    20% of variants processed...
2024-07-08 16:05:46    30% of variants processed...
2024-07-08 16:05:46    40% of variants processed...
2024-07-08 16:05:46    50% of variants processed...
2024-07-08 16:05:46    60% of variants processed...
2024-07-08 16:05:46    70% of variants processed...
2024-07-08 16:05:46    80% of variants processed...
2024-07-08 16:05:46    90% of variants processed...
2024-07-08 16:05:46    100% of variants processed.
2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:46    Round 1 of haplotype assembly...
2024-07-08 16:05:46    (Before HapCUT2) Total phased heterozygous SNVs: 216  Total likelihood (phred): 224750.42
2024-07-08 16:05:48    (After HapCUT2)  Total phased heterozygous SNVs: 216  Total likelihood (phred): 61012.72
2024-07-08 16:05:48    (After Greedy)   Total phased heterozygous SNVs: 216  Total likelihood (phred): 44938.21
2024-07-08 16:05:48    Round 2 of haplotype assembly...
2024-07-08 16:05:48    (Before HapCUT2) Total phased heterozygous SNVs: 327  Total likelihood (phred): 44938.21
2024-07-08 16:05:51    (After HapCUT2)  Total phased heterozygous SNVs: 327  Total likelihood (phred): 47655.22
2024-07-08 16:05:51    (After Greedy)   Total phased heterozygous SNVs: 327  Total likelihood (phred): 44208.28
2024-07-08 16:05:51 Printing VCF file...
Sequence is reconstructed, start annotation...
parameter:      sample:DNA01    dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown     wxs:tgs G_nom:0
No such file or directory
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
optimize typing results by balancing alignment length and identity
The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt
Finished.

wshuai294 · 2024-07-09T01:52:39Z

Hi Alisa, The log shows the HLA II genes have high depth. The database likely causes the issue. Does the folder '~/SpecHLA/db/' exist? If not, please download it. If it exists, updating to the latest version of SpecHLA might solve the problem. Please let me know if these work for you.

…

On Tue, Jul 9, 2024 at 4:08 AM Alisa ***@***.***> wrote: Hi Shuai, both vcf and hla.allele.*.HLA_*.fasta contain information, so these steps seemed to work.. I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :( (base) ***@***.***:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt Locus Chromosome Allele A 1 A 2 B 1 B 2 C 1 C 2 DPA1 1 DPA1 2 DPB1 1 DPB1 2 DQA1 1 DQA1 2 DQB1 1 DQB1 2 DRB1 1 DRB1 2 (base) ***@***.***:~/SpecHLA/output/DNA01$ cat hla.result. hla.result.details.txt hla.result.txt (base) ***@***.***:~/SpecHLA/output/DNA01$ cat hla.result.txt # version: IPD-IMGT/HLA 3.38.0 Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 (base) ***@***.***:~/SpecHLA/output/DNA01$ cat hla.result.details.txt # version: IPD-IMGT/HLA 3.38.0 Gene G_best allele details:allele;Score;Caucasian;Black;Asian ***@***.***:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.461*1.00] collected minimizers [M::mm_idx_gen::0.516*1.41] sorted minimizers [M::main::0.517*1.41] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.519*1.41] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.520*1.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2785.920*4.99] mapped 87302 sequences [M::worker_pipeline::5446.614*5.00] mapped 90496 sequences [M::worker_pipeline::6477.379*4.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.002*2.07] collected minimizers [M::mm_idx_gen::0.003*3.10] sorted minimizers [M::main::0.003*3.08] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.96] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0} 2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002*1.89] collected minimizers [M::mm_idx_gen::0.003*3.13] sorted minimizers [M::main::0.003*3.11] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*3.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0} 2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002*1.77] collected minimizers [M::mm_idx_gen::0.003*2.86] sorted minimizers [M::main::0.003*2.84] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.71] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0} 2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002*1.71] collected minimizers [M::mm_idx_gen::0.003*2.73] sorted minimizers [M::main::0.003*2.71] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.61] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004*2.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.270*2.62] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0} 2024-07-08 16:05:01 Min read coverage set to 2. 2024-07-08 16:05:01 Max read coverage set to 100000. 2024-07-08 16:05:01 Estimating alignment parameters... 2024-07-08 16:05:01 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:01 Calling potential SNVs using pileup... 2024-07-08 16:05:01 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002*1.64] collected minimizers [M::mm_idx_gen::0.003*2.93] sorted minimizers [M::main::0.003*2.91] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.79] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004*2.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.230*2.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182} 2024-07-08 16:05:12 Min read coverage set to 2. 2024-07-08 16:05:12 Max read coverage set to 100000. 2024-07-08 16:05:12 Estimating alignment parameters... 2024-07-08 16:05:12 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.990 match -> insertion: 0.006 match -> deletion: 0.004 deletion -> match: 0.672 deletion -> deletion: 0.328 insertion -> match: 0.763 insertion -> insertion: 0.237 Emission Probabilities: match (equal): 0.925 match (not equal): 0.025 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:12 Calling potential SNVs using pileup... 2024-07-08 16:05:12 123 potential variants identified. 2024-07-08 16:05:12 Generating haplotype fragments from reads... 2024-07-08 16:05:17 10% of variants processed... 2024-07-08 16:05:17 20% of variants processed... 2024-07-08 16:05:18 30% of variants processed... 2024-07-08 16:05:18 40% of variants processed... 2024-07-08 16:05:18 100% of variants processed. 2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:18 Round 1 of haplotype assembly... 2024-07-08 16:05:18 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-07-08 16:05:20 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-07-08 16:05:20 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-07-08 16:05:20 Round 2 of haplotype assembly... 2024-07-08 16:05:20 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-07-08 16:05:21 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 Printing VCF file... [M::mm_idx_gen::0.002*1.75] collected minimizers [M::mm_idx_gen::0.003*3.13] sorted minimizers [M::main::0.003*3.12] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.99] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004*2.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.387*3.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88} 2024-07-08 16:05:30 Min read coverage set to 2. 2024-07-08 16:05:30 Max read coverage set to 100000. 2024-07-08 16:05:30 Estimating alignment parameters... 2024-07-08 16:05:30 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.991 match -> insertion: 0.005 match -> deletion: 0.004 deletion -> match: 0.514 deletion -> deletion: 0.486 insertion -> match: 0.556 insertion -> insertion: 0.444 Emission Probabilities: match (equal): 0.975 match (not equal): 0.008 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:30 Calling potential SNVs using pileup... 2024-07-08 16:05:30 183 potential variants identified. 2024-07-08 16:05:30 Generating haplotype fragments from reads... 2024-07-08 16:05:30 10% of variants processed... 2024-07-08 16:05:30 20% of variants processed... 2024-07-08 16:05:30 30% of variants processed... 2024-07-08 16:05:30 40% of variants processed... 2024-07-08 16:05:30 50% of variants processed... 2024-07-08 16:05:30 60% of variants processed... 2024-07-08 16:05:30 70% of variants processed... 2024-07-08 16:05:31 80% of variants processed... 2024-07-08 16:05:31 90% of variants processed... 2024-07-08 16:05:31 100% of variants processed. 2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:31 Round 1 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-07-08 16:05:31 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-07-08 16:05:31 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-07-08 16:05:31 Round 2 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 Printing VCF file... [M::mm_idx_gen::0.002*1.79] collected minimizers [M::mm_idx_gen::0.003*3.00] sorted minimizers [M::main::0.003*2.99] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.003*2.84] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.888*2.90] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95} 2024-07-08 16:05:33 Min read coverage set to 2. 2024-07-08 16:05:33 Max read coverage set to 100000. 2024-07-08 16:05:33 Estimating alignment parameters... 2024-07-08 16:05:33 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.993 match -> insertion: 0.003 match -> deletion: 0.004 deletion -> match: 0.406 deletion -> deletion: 0.594 insertion -> match: 0.297 insertion -> insertion: 0.703 Emission Probabilities: match (equal): 0.950 match (not equal): 0.017 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:33 Calling potential SNVs using pileup... 2024-07-08 16:05:33 547 potential variants identified. 2024-07-08 16:05:33 Generating haplotype fragments from reads... 2024-07-08 16:05:34 10% of variants processed... 2024-07-08 16:05:34 20% of variants processed... 2024-07-08 16:05:34 30% of variants processed... 2024-07-08 16:05:34 40% of variants processed... 2024-07-08 16:05:34 50% of variants processed... 2024-07-08 16:05:34 60% of variants processed... 2024-07-08 16:05:34 70% of variants processed... 2024-07-08 16:05:34 80% of variants processed... 2024-07-08 16:05:34 90% of variants processed... 2024-07-08 16:05:34 100% of variants processed. 2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:34 Round 1 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Round 2 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Printing VCF file... [M::mm_idx_gen::0.002*1.70] collected minimizers [M::mm_idx_gen::0.003*2.97] sorted minimizers [M::main::0.003*2.95] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.004*2.81] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004*2.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.751*0.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45} 2024-07-08 16:05:45 Min read coverage set to 2. 2024-07-08 16:05:45 Max read coverage set to 100000. 2024-07-08 16:05:45 Estimating alignment parameters... 2024-07-08 16:05:45 Done estimating alignment parameters. Transition Probabilities: match -> match: 0.995 match -> insertion: 0.003 match -> deletion: 0.002 deletion -> match: 0.625 deletion -> deletion: 0.375 insertion -> match: 0.577 insertion -> insertion: 0.423 Emission Probabilities: match (equal): 0.972 match (not equal): 0.009 insertion: 1.000 deletion: 1.000 2024-07-08 16:05:45 Calling potential SNVs using pileup... 2024-07-08 16:05:45 592 potential variants identified. 2024-07-08 16:05:45 Generating haplotype fragments from reads... 2024-07-08 16:05:46 10% of variants processed... 2024-07-08 16:05:46 20% of variants processed... 2024-07-08 16:05:46 30% of variants processed... 2024-07-08 16:05:46 40% of variants processed... 2024-07-08 16:05:46 50% of variants processed... 2024-07-08 16:05:46 60% of variants processed... 2024-07-08 16:05:46 70% of variants processed... 2024-07-08 16:05:46 80% of variants processed... 2024-07-08 16:05:46 90% of variants processed... 2024-07-08 16:05:46 100% of variants processed. 2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:46 Round 1 of haplotype assembly... 2024-07-08 16:05:46 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-07-08 16:05:48 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-07-08 16:05:48 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-07-08 16:05:48 Round 2 of haplotype assembly... 2024-07-08 16:05:48 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-07-08 16:05:51 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-07-08 16:05:51 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-07-08 16:05:51 Printing VCF file... Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory # version: IPD-IMGT/HLA 3.38.0 Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished. — Reply to this email directly, view it on GitHub <#29 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ALS7DWX5TWCLKFWXTJFR57LZLLWSXAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJVGE2DOMRSGE> . You are receiving this because you commented.Message ID: ***@***.***>

alisamatisse · 2024-07-20T07:53:06Z

Hi Shuai,

Thanks for your patience, it worked! The problem was installing dependencies... maybe you could somehow specify what has to be installed exactly just for the "long_read_typing.py" script. At some point I had an error I did not have "bgzip" and "tabix"... it was not obvious to me as I specifically installed conda for this analysis for the first time. Sorry! Now if I have other questions, I will open a new issue :) grateful for your support!!

wshuai294 · 2024-07-22T05:40:55Z

So nice to hear it works. Also, it is a good idea to use a specific env for long_read_typing.py. I will add this. Thanks for the advice.

alisamatisse closed this as completed May 12, 2024

alisamatisse changed the title ~~installation on guix~~ complaining about dependencies / empty output files Jul 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

complaining about dependencies / empty output files #29

complaining about dependencies / empty output files #29

alisamatisse commented May 10, 2024

wshuai294 commented May 10, 2024

alisamatisse commented May 12, 2024

wshuai294 commented May 13, 2024

alisamatisse commented May 14, 2024

alisamatisse commented May 14, 2024

wshuai294 commented May 14, 2024

wshuai294 commented Jun 14, 2024 via email

alisamatisse commented Jul 8, 2024

wshuai294 commented Jul 9, 2024 via email

alisamatisse commented Jul 20, 2024

wshuai294 commented Jul 22, 2024

complaining about dependencies / empty output files #29

complaining about dependencies / empty output files #29

Comments

alisamatisse commented May 10, 2024

wshuai294 commented May 10, 2024

alisamatisse commented May 12, 2024

wshuai294 commented May 13, 2024

alisamatisse commented May 14, 2024

alisamatisse commented May 14, 2024

wshuai294 commented May 14, 2024

wshuai294 commented Jun 14, 2024 via email

alisamatisse commented Jul 8, 2024

wshuai294 commented Jul 9, 2024 via email

alisamatisse commented Jul 20, 2024

wshuai294 commented Jul 22, 2024