Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complaining about dependencies / empty output files #29

Closed
alisamatisse opened this issue May 10, 2024 · 11 comments
Closed

complaining about dependencies / empty output files #29

alisamatisse opened this issue May 10, 2024 · 11 comments

Comments

@alisamatisse
Copy link

Hi,
it would be great if this package could be compatible with Guix, as this can help to make the work more reproducible.. are you planning this? because we use Guix, it would take too much effort to try to contain the package and its dependencies in the conda environment. Maybe I am missing something :(

@wshuai294
Copy link
Collaborator

Hello, thanks for considering our method. We have provided the conda environment file which can be used to construct the env. We assume it is suitable for Linux systems. Is it not applicable to Guix? What error did you meet in running it?

@alisamatisse
Copy link
Author

It is not compatible with the how our cluster paths are organized, but I guess there is not much that can be done from your side. Could it be executed independently from the conda environment? Really would like to try using your tool. Maybe will run it locally..

@wshuai294
Copy link
Collaborator

Hello. I think it could be possible to use it without the conda environment. However, you should install the dependencies yourself. What functions do you want to use? If you only want to handle long-read data, there will be no need to install it. You can directly use the Python script. If you handle Illumina reads, it will be more complex. But I think it is also possible.

@alisamatisse
Copy link
Author

Hello Shuai, that it is very promising, thank you. I am now trying to make it work...

@alisamatisse
Copy link
Author

Terribly sorry for all the stupid questions, I tried to run the demo files

python3 script/long_read_typing.py -j 5 -n HG00733 -r HG00733_pacbio.subsample.fastq.gz -o output

but have this

[M::mm_idx_gen::0.568*0.84] collected minimizers
[M::mm_idx_gen::0.710*1.06] sorted minimizers
[M::main::0.713*1.06] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.714*1.06] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.715*1.06] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::462.895*2.00] mapped 851 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db/ref/hla_gen.format.filter.extend.DRB.no26789.fasta HG00733_pacbio.subsample.fastq.gz
[M::main] Real time: 462.901 sec; CPU: 923.485 sec; Peak RSS: 0.938 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.015*0.80] collected minimizers
[M::mm_idx_gen::0.017*0.88] sorted minimizers
[M::main::0.017*0.88] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*0.88] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.018*0.88] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::worker_pipeline::0.034*1.15] mapped 6 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa output/HG00733//A.long_read.fq.gz
[M::main] Real time: 0.036 sec; CPU: 0.040 sec; Peak RSS: 0.011 GB
  File "/fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/mask_low_depth_region.py", line 113
    print (gene, mask[0]-1, mask[1]-1, file = f)
                                            ^
SyntaxError: invalid syntax
cp: cannot stat ‘output/HG00733//low_depth.bed’: No such file or directory

Sorry again. I will understand if you decide not to reply, haha.

@wshuai294
Copy link
Collaborator

Hello, cause it uses python instead of python3 in the 215-th line. I have edited this. Please update to the latest commit. I believe it will work now.

@wshuai294
Copy link
Collaborator

wshuai294 commented Jun 14, 2024 via email

@alisamatisse
Copy link
Author

Hi Shuai,

both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work..
I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(


(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt
Locus   Chromosome      Allele
A       1
A       2
B       1
B       2
C       1
C       2
DPA1    1
DPA1    2
DPB1    1
DPB1    2
DQA1    1
DQA1    2
DQB1    1
DQB1    2
DRB1    1
DRB1    2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.
hla.result.details.txt  hla.result.txt
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.txt
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.details.txt
# version: IPD-IMGT/HLA 3.38.0
Gene    G_best  allele  details:allele;Score;Caucasian;Black;Asian

aiakupo@sl-sci-lee1:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output
[M::mm_idx_gen::0.461*1.00] collected minimizers
[M::mm_idx_gen::0.516*1.41] sorted minimizers
[M::main::0.517*1.41] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.519*1.41] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.520*1.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::2785.920*4.99] mapped 87302 sequences
[M::worker_pipeline::5446.614*5.00] mapped 90496 sequences
[M::worker_pipeline::6477.379*4.98] mapped 34247 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz
[M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.002*2.07] collected minimizers
[M::mm_idx_gen::0.003*3.10] sorted minimizers
[M::main::0.003*3.08] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.96] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for A
Mean depth {'HLA_A': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.89] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.11] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.00] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for B
Mean depth {'HLA_B': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.77] collected minimizers
[M::mm_idx_gen::0.003*2.86] sorted minimizers
[M::main::0.003*2.84] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.71] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for C
Mean depth {'HLA_C': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.71] collected minimizers
[M::mm_idx_gen::0.003*2.73] sorted minimizers
[M::main::0.003*2.71] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.61] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775
[M::worker_pipeline::0.270*2.62] mapped 231 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz
[M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB
downsample ratio is 1 for DPA1
Mean depth {'HLA_DPA1': 0}

2024-07-08 16:05:01 Min read coverage set to 2.
2024-07-08 16:05:01 Max read coverage set to 100000.
2024-07-08 16:05:01 Estimating alignment parameters...
2024-07-08 16:05:01 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:01 Calling potential SNVs using pileup...
2024-07-08 16:05:01 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.64] collected minimizers
[M::mm_idx_gen::0.003*2.93] sorted minimizers
[M::main::0.003*2.91] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.79] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468
[M::worker_pipeline::7.230*2.84] mapped 5835 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz
[M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB
downsample ratio is 1 for DPB1
Mean depth {'HLA_DPB1': 182}

2024-07-08 16:05:12 Min read coverage set to 2.
2024-07-08 16:05:12 Max read coverage set to 100000.
2024-07-08 16:05:12 Estimating alignment parameters...
2024-07-08 16:05:12 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.990
                    match -> insertion:      0.006
                    match -> deletion:       0.004
                    deletion -> match:       0.672
                    deletion -> deletion:    0.328
                    insertion -> match:      0.763
                    insertion -> insertion:  0.237

                    Emission Probabilities:
                    match (equal):           0.925
                    match (not equal):       0.025
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:12 Calling potential SNVs using pileup...
2024-07-08 16:05:12 123 potential variants identified.
2024-07-08 16:05:12 Generating haplotype fragments from reads...
2024-07-08 16:05:17    10% of variants processed...
2024-07-08 16:05:17    20% of variants processed...
2024-07-08 16:05:18    30% of variants processed...
2024-07-08 16:05:18    40% of variants processed...
2024-07-08 16:05:18    100% of variants processed.
2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:18    Round 1 of haplotype assembly...
2024-07-08 16:05:18    (Before HapCUT2) Total phased heterozygous SNVs: 61  Total likelihood (phred): 1012374.57
2024-07-08 16:05:20    (After HapCUT2)  Total phased heterozygous SNVs: 61  Total likelihood (phred): 322425.24
2024-07-08 16:05:20    (After Greedy)   Total phased heterozygous SNVs: 61  Total likelihood (phred): 291216.51
2024-07-08 16:05:20    Round 2 of haplotype assembly...
2024-07-08 16:05:20    (Before HapCUT2) Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21    (After HapCUT2)  Total phased heterozygous SNVs: 46  Total likelihood (phred): 291900.35
2024-07-08 16:05:21    (After Greedy)   Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21 Printing VCF file...
[M::mm_idx_gen::0.002*1.75] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.12] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.99] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492
[M::worker_pipeline::6.387*3.58] mapped 3343 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz
[M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB
downsample ratio is 1 for DQA1
Mean depth {'HLA_DQA1': 88}

2024-07-08 16:05:30 Min read coverage set to 2.
2024-07-08 16:05:30 Max read coverage set to 100000.
2024-07-08 16:05:30 Estimating alignment parameters...
2024-07-08 16:05:30 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.991
                    match -> insertion:      0.005
                    match -> deletion:       0.004
                    deletion -> match:       0.514
                    deletion -> deletion:    0.486
                    insertion -> match:      0.556
                    insertion -> insertion:  0.444

                    Emission Probabilities:
                    match (equal):           0.975
                    match (not equal):       0.008
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:30 Calling potential SNVs using pileup...
2024-07-08 16:05:30 183 potential variants identified.
2024-07-08 16:05:30 Generating haplotype fragments from reads...
2024-07-08 16:05:30    10% of variants processed...
2024-07-08 16:05:30    20% of variants processed...
2024-07-08 16:05:30    30% of variants processed...
2024-07-08 16:05:30    40% of variants processed...
2024-07-08 16:05:30    50% of variants processed...
2024-07-08 16:05:30    60% of variants processed...
2024-07-08 16:05:30    70% of variants processed...
2024-07-08 16:05:31    80% of variants processed...
2024-07-08 16:05:31    90% of variants processed...
2024-07-08 16:05:31    100% of variants processed.
2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:31    Round 1 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 123  Total likelihood (phred): 92420.53
2024-07-08 16:05:31    (After HapCUT2)  Total phased heterozygous SNVs: 123  Total likelihood (phred): 38571.91
2024-07-08 16:05:31    (After Greedy)   Total phased heterozygous SNVs: 123  Total likelihood (phred): 34372.78
2024-07-08 16:05:31    Round 2 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After HapCUT2)  Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After Greedy)   Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32 Printing VCF file...
[M::mm_idx_gen::0.002*1.79] collected minimizers
[M::mm_idx_gen::0.003*3.00] sorted minimizers
[M::main::0.003*2.99] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.84] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480
[M::worker_pipeline::0.888*2.90] mapped 668 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz
[M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB
downsample ratio is 1 for DQB1
Mean depth {'HLA_DQB1': 95}

2024-07-08 16:05:33 Min read coverage set to 2.
2024-07-08 16:05:33 Max read coverage set to 100000.
2024-07-08 16:05:33 Estimating alignment parameters...
2024-07-08 16:05:33 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.993
                    match -> insertion:      0.003
                    match -> deletion:       0.004
                    deletion -> match:       0.406
                    deletion -> deletion:    0.594
                    insertion -> match:      0.297
                    insertion -> insertion:  0.703

                    Emission Probabilities:
                    match (equal):           0.950
                    match (not equal):       0.017
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:33 Calling potential SNVs using pileup...
2024-07-08 16:05:33 547 potential variants identified.
2024-07-08 16:05:33 Generating haplotype fragments from reads...
2024-07-08 16:05:34    10% of variants processed...
2024-07-08 16:05:34    20% of variants processed...
2024-07-08 16:05:34    30% of variants processed...
2024-07-08 16:05:34    40% of variants processed...
2024-07-08 16:05:34    50% of variants processed...
2024-07-08 16:05:34    60% of variants processed...
2024-07-08 16:05:34    70% of variants processed...
2024-07-08 16:05:34    80% of variants processed...
2024-07-08 16:05:34    90% of variants processed...
2024-07-08 16:05:34    100% of variants processed.
2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:34    Round 1 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 23  Total likelihood (phred): 757167.63
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 23  Total likelihood (phred): 36533.11
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 23  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    Round 2 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34 Printing VCF file...
[M::mm_idx_gen::0.002*1.70] collected minimizers
[M::mm_idx_gen::0.003*2.97] sorted minimizers
[M::main::0.003*2.95] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.81] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229
[M::worker_pipeline::5.751*0.83] mapped 9477 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz
[M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB
downsample ratio is 1 for DRB1
Mean depth {'HLA_DRB1': 45}

2024-07-08 16:05:45 Min read coverage set to 2.
2024-07-08 16:05:45 Max read coverage set to 100000.
2024-07-08 16:05:45 Estimating alignment parameters...
2024-07-08 16:05:45 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.995
                    match -> insertion:      0.003
                    match -> deletion:       0.002
                    deletion -> match:       0.625
                    deletion -> deletion:    0.375
                    insertion -> match:      0.577
                    insertion -> insertion:  0.423

                    Emission Probabilities:
                    match (equal):           0.972
                    match (not equal):       0.009
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:45 Calling potential SNVs using pileup...
2024-07-08 16:05:45 592 potential variants identified.
2024-07-08 16:05:45 Generating haplotype fragments from reads...
2024-07-08 16:05:46    10% of variants processed...
2024-07-08 16:05:46    20% of variants processed...
2024-07-08 16:05:46    30% of variants processed...
2024-07-08 16:05:46    40% of variants processed...
2024-07-08 16:05:46    50% of variants processed...
2024-07-08 16:05:46    60% of variants processed...
2024-07-08 16:05:46    70% of variants processed...
2024-07-08 16:05:46    80% of variants processed...
2024-07-08 16:05:46    90% of variants processed...
2024-07-08 16:05:46    100% of variants processed.
2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:46    Round 1 of haplotype assembly...
2024-07-08 16:05:46    (Before HapCUT2) Total phased heterozygous SNVs: 216  Total likelihood (phred): 224750.42
2024-07-08 16:05:48    (After HapCUT2)  Total phased heterozygous SNVs: 216  Total likelihood (phred): 61012.72
2024-07-08 16:05:48    (After Greedy)   Total phased heterozygous SNVs: 216  Total likelihood (phred): 44938.21
2024-07-08 16:05:48    Round 2 of haplotype assembly...
2024-07-08 16:05:48    (Before HapCUT2) Total phased heterozygous SNVs: 327  Total likelihood (phred): 44938.21
2024-07-08 16:05:51    (After HapCUT2)  Total phased heterozygous SNVs: 327  Total likelihood (phred): 47655.22
2024-07-08 16:05:51    (After Greedy)   Total phased heterozygous SNVs: 327  Total likelihood (phred): 44208.28
2024-07-08 16:05:51 Printing VCF file...
Sequence is reconstructed, start annotation...
parameter:      sample:DNA01    dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown     wxs:tgs G_nom:0
No such file or directory
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
optimize typing results by balancing alignment length and identity
The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt
Finished.

@alisamatisse alisamatisse changed the title installation on guix complaining about dependencies / empty output files Jul 8, 2024
@wshuai294
Copy link
Collaborator

wshuai294 commented Jul 9, 2024 via email

@alisamatisse
Copy link
Author

Hi Shuai,

Thanks for your patience, it worked! The problem was installing dependencies... maybe you could somehow specify what has to be installed exactly just for the "long_read_typing.py" script. At some point I had an error I did not have "bgzip" and "tabix"... it was not obvious to me as I specifically installed conda for this analysis for the first time. Sorry! Now if I have other questions, I will open a new issue :) grateful for your support!!

@wshuai294
Copy link
Collaborator

So nice to hear it works. Also, it is a good idea to use a specific env for long_read_typing.py. I will add this. Thanks for the advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants