-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
what do you want to fill in the “--chrom” option? I read the instructions and there is no document. #35
Comments
I want to call on all chromosomes. How should I write them? |
Hi, What version of NanoCaller are you using? From v3 onwards the default it all chromosomes. For versions before that you needed a separate command NanoCaller_WGS for whole genome. |
Hi, I should be using the latest version, because I installed it with bioconda this morning, but it requires-- chrom usage: NanoCaller [-h] [-mode MODE] [-seq SEQUENCING] [-cpu CPU] [-mincov MINCOV] [-maxcov MAXCOV] [-keep_bam] [-o OUTPUT] [-prefix PREFIX] I want to calling on the whole genome, how should I do it? |
Hi, If possible, can you give me the code of the training process? I want to use more data to train the model. |
Hi, It is possible the conda is still installing the older version by default. You can specify a version for conda to install: If you are using NanoCaller v3 and above, please use If you want to use the NanoCaller version below 3, which it seems like you have installed, please use But in version 3 of NanoCaller, we have combined the functionality into just one entry point so now in v3+, We will release the training code in the next week or two. I will post an update here when I do so. |
Hi, Thank you very much for your reply. I have updated the version. Thank you for your patient answer. I look forward to your training code. |
Hi, I encountered another problem, snp calls will be completed directly, generating an empty file. Because I'm running other programs, is it because I don't have enough memory? How can I correct it? Looking forward to your reply. 2023-03-24 15:58:07.022608: Starting NanoCaller. NanoCaller command and arguments are saved in the following file: /media/usb3/gyc/vl/nano_hg004_30x/args SNP Calling Progress: 0%| | 0/6362 [00:38<?, ?it/s] 2023-03-24 15:59:01.451540: Combining SNP calls. 2023-03-24 15:59:01.453847: Compressing and indexing SNP calls. |
Hi, I changed the server and can run normally, but why is the result compared with the benchmark file with hap.py, the final result will be very bad? NanoCaller --bam /home/gyc/ont/fastq/HG001_NBT2018_Guppy_4.2.2.fastq.gz.bam Looking forward to your reply. |
Hi, I am still looking forward to your process of raising features and training models. Thank you very much for your hard work. |
Hi, Sorry for the delay, we are working on releasing the training code, and will need a couple of weeks to get it done because we are trying to finish some other projects at the moment. Best, |
Hi, Thank you for your reply in your busy schedule! 2023-05-04 09:05:21.702487: Starting NanoCaller. NanoCaller command and arguments are saved in the following file: /media/usb3/gyc/vl/nano_try/args SNP Calling Progress: 0%| | 0/6362 [00:30<?, ?it/s] 2023-05-04 09:05:54.798367: Combining SNP calls. 2023-05-04 09:05:54.801669: Compressing and indexing SNP calls. 2023-05-04 09:05:54.975464: SNP calling completed. Time taken= 30.9631 The steps “snp calling” when I ran nanocalller did not seem to run and were completed directly. Do you know what the possible reason is? |
Hi, I changed the server and ran it with the following error. 2023-05-04 10:56:38.121256: Starting NanoCaller. NanoCaller command and arguments are saved in the following file: /mnt/ywenjing/gyc/hg004/out/args SNP Calling Progress: 39%|????????????????????????????????? | 2493/6362 [29:50<29:41, 2.17it/s]Process Process-8: “”” “”” |
It seems like the problem is that the BAM file has different names for
chromosomes than the reference FASTA file. Can you check the header of the
BAM file and see if the reference chromosome names agree with the ones in
FASTA file or the FASTA index file?
…On Wed, May 3, 2023, 10:53 PM oranges7 ***@***.***> wrote:
Hi,
I changed the server and ran it with the following error.
“””
(nano) ***@***.*** hg004]$ NanoCaller --bam hg004_v4_30x.bam
--ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --output
/mnt/ywenjing/gyc/hg004/out/ --cpu 20 --preset ont
2023-05-04 10:56:38.121256: Starting NanoCaller.
NanoCaller command and arguments are saved in the following file:
/mnt/ywenjing/gyc/hg004/out/args
SNP Calling Progress: 39%|????????????????????????????????? | 2493/6362
[29:50<29:41, 2.17it/s]Process Process-8:
Traceback (most recent call last):
File
"/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py",
line 315, in _bootstrap
self.run()
File
"/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py",
line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/snpCaller.py",
line 80, in caller
pos, test_ref, x_test, dp, freq, coverage =
get_snp_testing_candidates(params, chunk)
File
"/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_SNP_pileups.py",
line 136, in get_snp_testing_candidates
r=ref_dict[v_pos]
KeyError: 21950001
“””
Then it will quit snp calling and run indel calling, but will also report
an error.
“””
Process Process-31:
Traceback (most recent call last):
File
"/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py",
line 315, in _bootstrap
self.run()
File
"/home/ywenjing/anaconda3/envs/nano/lib/python3.8/multiprocessing/process.py",
line 108, in run
self._target(*self._args, **self._kwargs)
File
"/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py",
line 200, in caller
indel_run(params, indel_dict, job_Q, counter_Q, indel_files_list)
File
"/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/indelCaller.py",
line 59, in indel_run
pos, x0_test, x1_test, x2_test, alleles_seq, phase =
get_indel_testing_candidates(params, chunk)
File
"/home/ywenjing/anaconda3/envs/nano/bin/nanocaller_src/generate_indel_pileups.py",
line 174, in get_indel_testing_candidates
ref_dict={j:s.upper() if s in 'AGTC' else 'N' for j,s in
zip(range(max(1,start-200),end+400+1),fastafile.fetch(chrom,max(1,start-200)-1,end+400))
}
File "pysam/libcfaidx.pyx", line 301, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'chr14' not present"
“””
Is it because of my bam file problem? I use HG004 subsampled to 30x.
—
Reply to this email directly, view it on GitHub
<#35 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AIRI4S3TXJOYM5AAYEWFT5LXEMR3LANCNFSM6AAAAAAV57VKMI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Yes, I re-checked the index file and found that the file was corrupted. I fixed the file and the program could run normally. But the file on the other server is intact, but it doesn't work properly. I'm looking for a solution. |
Hi, |
Another question is, is the v_pos in the generate_indel_pileups.py file the last detected site? |
Hi, I have added the training code here: https://github.com/WGLab/NanoCaller/tree/master/misc/training In the inference code, v_pos is the current position being scanned, |
Hi, I used nanocaller to run out of results, using hap.py for analysis, found that the precision is very poor. I am using Guppy4.2.2-based hg004 data, down to 50x, using the following command |
Hi, Can you give some information about how you are evaluating the performance, such as: 1) what benchmark dataset are you using? 2) are you using a high confidence interval for evaluation? 3) are you separating SNP and indel performance? If yes, is the precision poor for SNP or indels? Using --exclude_bed with hg19/hg38 parameter will skip telomere and centromere regions from variant calling so that could improve precision, but usually high confidence intervals in benchmarking dont include these regions for evaluation anyway. So it depends upon whether you are using a high confidence interval for evaluation or not. You can also try to increase '--min_allele_freq' parameter to 0.2 or 0.25 (instead of default of 0.15), and |
Hi,
Thank you for your reply. |
You should use a high confidence interval such as: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG004_NA24143_mother/NISTv4.2.1/GRCh38/HG004_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed. Without high-confidence interval, you can end up with regions that do not have reliable SNP calls, and SNP calls from short-read sequencing for such regions are typically removed from GIAB benchmark datasets (hence you'll get false positives but not a lot of false negatives from NanoCaller). Whereas for indels, it is ideal to remove homopolymer regions from evaluation since Nanopore sequencing is not reliable for homopolymer region (especially R9.4.1 datasets). Can you try using the evaluation strategy shown here (https://github.com/WGLab/NanoCaller/blob/master/docs/ONT%20Case%20Study.md) which uses RTG-tool's vcfeval function? You can just replace HG002 links with HG004 links from here: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG004_NA24143_mother/NISTv4.2.1/GRCh38/ |
Hi, I tried to use a high confidence interval. The results of rtgtools and hap.py are similar and are consistent with yours. The key is to look at the data with a quality value above 120. Thank you very much for your guidance. |
No description provided.
The text was updated successfully, but these errors were encountered: