what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35

oranges7 · 2023-03-17T02:55:06Z

No description provided.

oranges7 · 2023-03-17T06:40:52Z

I want to call on all chromosomes. How should I write them?

umahsn · 2023-03-17T14:03:36Z

Hi,

What version of NanoCaller are you using? From v3 onwards the default it all chromosomes. For versions before that you needed a separate command NanoCaller_WGS for whole genome.

oranges7 · 2023-03-17T14:33:24Z

Hi,

I should be using the latest version, because I installed it with bioconda this morning, but it requires-- chrom

usage: NanoCaller [-h] [-mode MODE] [-seq SEQUENCING] [-cpu CPU] [-mincov MINCOV] [-maxcov MAXCOV] [-keep_bam] [-o OUTPUT] [-prefix PREFIX]
[-sample SAMPLE] [-include_bed INCLUDE_BED] [-exclude_bed EXCLUDE_BED] [-start START] [-end END] [-p PRESET] -bam BAM -ref
REF -chrom CHROM [-snp_model SNP_MODEL] [-min_allele_freq MIN_ALLELE_FREQ] [-min_nbr_sites MIN_NBR_SITES]
[-nbr_t NEIGHBOR_THRESHOLD] [-sup] [-indel_model INDEL_MODEL] [-ins_t INS_THRESHOLD] [-del_t DEL_THRESHOLD]
[-win_size WIN_SIZE] [-small_win_size SMALL_WIN_SIZE] [-impute_indel_phase] [-phase_bam] [-enable_whatshap]
NanoCaller: error: the following arguments are required: -bam/--bam, -ref/--ref, -chrom/--chrom

I want to calling on the whole genome, how should I do it?
Thank you for your reply.

oranges7 · 2023-03-20T09:44:31Z

Hi，

If possible, can you give me the code of the training process? I want to use more data to train the model.
Look forward to your reply

umahsn · 2023-03-22T15:27:19Z

Hi,

It is possible the conda is still installing the older version by default. You can specify a version for conda to install: conda install -c bioconda nanocaller=3.0.1. You can check the version of NanoCaller installed via NanoCaller --version or conda list.

If you are using NanoCaller v3 and above, please use NanoCaller --bam BAM --ref REF with the rest of the options for whole genome variant calling. You do not need to specify any chromosomes or regions for whole genome.

If you want to use the NanoCaller version below 3, which it seems like you have installed, please use NanoCaller_WGS --bam BAM --ref REF with the rest of the options. In version 2 of NanoCaller, we have a command NanoCaller_WGS for whole genome variant calling which calls NanoCaller command on each chromosome.

But in version 3 of NanoCaller, we have combined the functionality into just one entry point so now in v3+, NanoCaller command does whole genome variant calling by default, and will only do chromosome specific variant calling if --regions option is used.

We will release the training code in the next week or two. I will post an update here when I do so.

oranges7 · 2023-03-23T03:06:25Z

Hi,

Thank you very much for your reply. I have updated the version. Thank you for your patient answer. I look forward to your training code.

oranges7 · 2023-03-24T08:07:46Z

Hi，

I encountered another problem, snp calls will be completed directly, generating an empty file. Because I'm running other programs, is it because I don't have enough memory? How can I correct it? Looking forward to your reply.

2023-03-24 15:58:07.022608: Starting NanoCaller.

NanoCaller command and arguments are saved in the following file: /media/usb3/gyc/vl/nano_hg004_30x/args

SNP Calling Progress: 0%| | 0/6362 [00:38<?, ?it/s]

2023-03-24 15:59:01.451540: Combining SNP calls.

2023-03-24 15:59:01.453847: Compressing and indexing SNP calls.
Writing to /tmp/bcftools.fnzF04
Merging 0 temporary files
Cleaning
Done

oranges7 · 2023-03-31T01:38:46Z

Hi,

I changed the server and can run normally, but why is the result compared with the benchmark file with hap.py, the final result will be very bad?

NanoCaller --bam /home/gyc/ont/fastq/HG001_NBT2018_Guppy_4.2.2.fastq.gz.bam
--ref /home/gyc/ont/seq/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
--cpu 40
--output /home/gyc/ont/nano
--preset ont

Looking forward to your reply.

oranges7 · 2023-04-21T14:15:45Z

Hi,

I am still looking forward to your process of raising features and training models. Thank you very much for your hard work.

umahsn · 2023-05-03T15:09:34Z

Hi,

Sorry for the delay, we are working on releasing the training code, and will need a couple of weeks to get it done because we are trying to finish some other projects at the moment.

Best,
Umair

oranges7 · 2023-05-04T01:18:19Z

Hi，

Thank you for your reply in your busy schedule!
I have encountered some questions. I wonder if you can answer them.
“””
(nano) [gyc@six170sever hg004]$ NanoCaller --bam hg004_v4_30x.bam --ref /home/gyc/ont/seq/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --output /media/usb3/gyc/vl/nano_try --cpu 10 --mode snps --phase --sequencing ont

2023-05-04 09:05:21.702487: Starting NanoCaller.

NanoCaller command and arguments are saved in the following file: /media/usb3/gyc/vl/nano_try/args

SNP Calling Progress: 0%| | 0/6362 [00:30<?, ?it/s]

2023-05-04 09:05:54.798367: Combining SNP calls.

2023-05-04 09:05:54.801669: Compressing and indexing SNP calls.
Writing to /tmp/bcftools.CyNo4Y
Merging 0 temporary files
Cleaning
Done

2023-05-04 09:05:54.975464: SNP calling completed. Time taken= 30.9631
“””

The steps “snp calling” when I ran nanocalller did not seem to run and were completed directly. Do you know what the possible reason is?

oranges7 · 2023-05-04T03:53:15Z

Hi,

I changed the server and ran it with the following error.
“””
(nano) [ywenjing@localhost hg004]$ NanoCaller --bam hg004_v4_30x.bam --ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --output /mnt/ywenjing/gyc/hg004/out/ --cpu 20 --preset ont

2023-05-04 10:56:38.121256: Starting NanoCaller.

NanoCaller command and arguments are saved in the following file: /mnt/ywenjing/gyc/hg004/out/args

SNP Calling Progress: 39%|????????????????????????????????? | 2493/6362 [29:50<29:41, 2.17it/s]Process Process-8:
Traceback (most recent call last):
File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/snpCaller.py", line 80, in caller
pos, test_ref, x_test, dp, freq, coverage = get_snp_testing_candidates(params, chunk)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_SNP_pileups.py", line 136, in get_snp_testing_candidates
r=ref_dict[v_pos]
KeyError: 21950001
“””
Then it will quit snp calling and run indel calling, but will also report an error.

“””
Process Process-31:
Traceback (most recent call last):
File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py", line 200, in caller
indel_run(params, indel_dict, job_Q, counter_Q, indel_files_list)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py", line 59, in indel_run
pos, x0_test, x1_test, x2_test, alleles_seq, phase = get_indel_testing_candidates(params, chunk)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_indel_pileups.py", line 174, in get_indel_testing_candidates
ref_dict={j:s.upper() if s in 'AGTC' else 'N' for j,s in zip(range(max(1,start-200),end+400+1),fastafile.fetch(chrom,max(1,start-200)-1,end+400)) }
File "pysam/libcfaidx.pyx", line 301, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'chr14' not present"

“””
Is it because of my bam file problem? I use HG004 subsampled to 30x.

umahsn · 2023-05-04T04:54:04Z

It seems like the problem is that the BAM file has different names for chromosomes than the reference FASTA file. Can you check the header of the BAM file and see if the reference chromosome names agree with the ones in FASTA file or the FASTA index file?

…

On Wed, May 3, 2023, 10:53 PM oranges7 ***@***.***> wrote: Hi, I changed the server and ran it with the following error. “”” (nano) ***@***.*** hg004]$ NanoCaller --bam hg004_v4_30x.bam --ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --output /mnt/ywenjing/gyc/hg004/out/ --cpu 20 --preset ont 2023-05-04 10:56:38.121256: Starting NanoCaller. NanoCaller command and arguments are saved in the following file: /mnt/ywenjing/gyc/hg004/out/args SNP Calling Progress: 39%|????????????????????????????????? | 2493/6362 [29:50<29:41, 2.17it/s]Process Process-8: Traceback (most recent call last): File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/snpCaller.py", line 80, in caller pos, test_ref, x_test, dp, freq, coverage = get_snp_testing_candidates(params, chunk) File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_SNP_pileups.py", line 136, in get_snp_testing_candidates r=ref_dict[v_pos] KeyError: 21950001 “”” Then it will quit snp calling and run indel calling, but will also report an error. “”” Process Process-31: Traceback (most recent call last): File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py", line 200, in caller indel_run(params, indel_dict, job_Q, counter_Q, indel_files_list) File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py", line 59, in indel_run pos, x0_test, x1_test, x2_test, alleles_seq, phase = get_indel_testing_candidates(params, chunk) File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_indel_pileups.py", line 174, in get_indel_testing_candidates ref_dict={j:s.upper() if s in 'AGTC' else 'N' for j,s in zip(range(max(1,start-200),end+400+1),fastafile.fetch(chrom,max(1,start-200)-1,end+400)) } File "pysam/libcfaidx.pyx", line 301, in pysam.libcfaidx.FastaFile.fetch KeyError: "sequence 'chr14' not present" “”” Is it because of my bam file problem? I use HG004 subsampled to 30x. — Reply to this email directly, view it on GitHub <#35 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AIRI4S3TXJOYM5AAYEWFT5LXEMR3LANCNFSM6AAAAAAV57VKMI> . You are receiving this because you commented.Message ID: ***@***.***>

oranges7 · 2023-05-05T03:24:33Z

Yes, I re-checked the index file and found that the file was corrupted. I fixed the file and the program could run normally. But the file on the other server is intact, but it doesn't work properly. I'm looking for a solution.
Thank you for your reply！

oranges7 · 2023-05-10T13:02:58Z

Hi,
I'm sorry to bother you again, but I'd like to know what kind of label you use in your training model. Can you give me a brief description?

oranges7 · 2023-05-10T13:11:30Z

Another question is, is the v_pos in the generate_indel_pileups.py file the last detected site?

umahsn · 2023-05-10T16:57:31Z

Hi,

I have added the training code here: https://github.com/WGLab/NanoCaller/tree/master/misc/training
This code uses tensorflow 1.13 for training, but later we switched to tensorflow 2 and I only move the inference code to tf2 and not the training code. I will make another update with tf2 code for training. Use ground truth SNP VCF file for SNP model training and ground truth indel VCF file for indel training.

In the inference code, v_pos is the current position being scanned, prev is the last candidate site plus some buffer. Please note that the equivalent code for training is a little different.

oranges7 · 2023-05-19T02:33:04Z

Hi，

I used nanocaller to run out of results, using hap.py for analysis, found that the precision is very poor. I am using Guppy4.2.2-based hg004 data, down to 50x, using the following command
NanoCaller --bam hg004_50x.bam --ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --output nano --cpu 20 --preset ont
What adjustments or attempts should I make to improve precision? Am I going to add the "--exclude_bed" command?

umahsn · 2023-05-22T02:15:43Z

Hi,

Can you give some information about how you are evaluating the performance, such as: 1) what benchmark dataset are you using? 2) are you using a high confidence interval for evaluation? 3) are you separating SNP and indel performance? If yes, is the precision poor for SNP or indels?

Using --exclude_bed with hg19/hg38 parameter will skip telomere and centromere regions from variant calling so that could improve precision, but usually high confidence intervals in benchmarking dont include these regions for evaluation anyway. So it depends upon whether you are using a high confidence interval for evaluation or not.

You can also try to increase '--min_allele_freq' parameter to 0.2 or 0.25 (instead of default of 0.15), and --mincov parameter to 8 (instead of default of 4); both of these will improve precision.

oranges7 · 2023-05-22T02:40:21Z

Hi,

The benchmark dataset I use is HG004_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.
I didn't use a high confidence interval.
According to hap.py, the accuracy of SNP is about 55%. The accuracy of indel is about 0.06%.
I'll try to run the program again with the '--min_allele_freq' parameter.

Thank you for your reply.

umahsn · 2023-05-22T03:32:48Z

You should use a high confidence interval such as: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG004_NA24143_mother/NISTv4.2.1/GRCh38/HG004_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed. Without high-confidence interval, you can end up with regions that do not have reliable SNP calls, and SNP calls from short-read sequencing for such regions are typically removed from GIAB benchmark datasets (hence you'll get false positives but not a lot of false negatives from NanoCaller).

Whereas for indels, it is ideal to remove homopolymer regions from evaluation since Nanopore sequencing is not reliable for homopolymer region (especially R9.4.1 datasets).

Can you try using the evaluation strategy shown here (https://github.com/WGLab/NanoCaller/blob/master/docs/ONT%20Case%20Study.md) which uses RTG-tool's vcfeval function? You can just replace HG002 links with HG004 links from here: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG004_NA24143_mother/NISTv4.2.1/GRCh38/

oranges7 · 2023-05-23T01:25:14Z

Hi,

I tried to use a high confidence interval. The results of rtgtools and hap.py are similar and are consistent with yours. The key is to look at the data with a quality value above 120. Thank you very much for your guidance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35

what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35

oranges7 commented Mar 17, 2023

oranges7 commented Mar 17, 2023

umahsn commented Mar 17, 2023

oranges7 commented Mar 17, 2023

oranges7 commented Mar 20, 2023

umahsn commented Mar 22, 2023 •

edited

Loading

oranges7 commented Mar 23, 2023

oranges7 commented Mar 24, 2023

oranges7 commented Mar 31, 2023

oranges7 commented Apr 21, 2023

umahsn commented May 3, 2023

oranges7 commented May 4, 2023

oranges7 commented May 4, 2023

umahsn commented May 4, 2023 via email

oranges7 commented May 5, 2023

oranges7 commented May 10, 2023

oranges7 commented May 10, 2023

umahsn commented May 10, 2023

oranges7 commented May 19, 2023

umahsn commented May 22, 2023

oranges7 commented May 22, 2023

umahsn commented May 22, 2023

oranges7 commented May 23, 2023

what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35

what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35

Comments

oranges7 commented Mar 17, 2023

oranges7 commented Mar 17, 2023

umahsn commented Mar 17, 2023

oranges7 commented Mar 17, 2023

oranges7 commented Mar 20, 2023

umahsn commented Mar 22, 2023 • edited Loading

oranges7 commented Mar 23, 2023

oranges7 commented Mar 24, 2023

oranges7 commented Mar 31, 2023

oranges7 commented Apr 21, 2023

umahsn commented May 3, 2023

oranges7 commented May 4, 2023

oranges7 commented May 4, 2023

umahsn commented May 4, 2023 via email

oranges7 commented May 5, 2023

oranges7 commented May 10, 2023

oranges7 commented May 10, 2023

umahsn commented May 10, 2023

oranges7 commented May 19, 2023

umahsn commented May 22, 2023

oranges7 commented May 22, 2023

umahsn commented May 22, 2023

oranges7 commented May 23, 2023

umahsn commented Mar 22, 2023 •

edited

Loading