Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR : Unable to open fasta file for chr22 (mm10 | WGS) #124

Open
Matheo-Lode opened this issue Jan 19, 2023 · 4 comments
Open

ERROR : Unable to open fasta file for chr22 (mm10 | WGS) #124

Matheo-Lode opened this issue Jan 19, 2023 · 4 comments

Comments

@Matheo-Lode
Copy link

Hello,

I am using your tool to estimate CNVs in my data, both Whole exome (WES) and Whole Genome (WGS) from the same sample of the same species.
My WES samples worked just fine, but I am having the same error over and over again for the WGS data.

tail of the log :

..File /MyPath/chrNameLength.txt was read
         total genome size:     2.73087e+09
..samtools should be installed to be able to read BAM files
         read number:   761375917
         coefficientOfVariation:        0.05
         evaluated window size: 1435
..[genomecopynumber] Starting reading /myPath/GL261_luc.chr.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view -@ 1 myPath/GL261_luc.chr.bam
PROFILING [tid=139915466331968]: myPath/GL261_luc.chr.bam read in 1028 seconds [fillMyHash]
761375917 lines read..
39537513 reads used to compute copy number profile
printing counts into myPath/GL261_luc.chr.bam_sample.cpn
..Window size:  1435
..using GC-content to normalize copy number profiles
Unable to open fasta file for chr 22 in folder myPath/chromosomes

Please note, myPath/chromosomes should be a folder, not a file! 

Working with mm10, there is indeed no chr22 (only 1:19, X,Y,M) and I don't understand what I am doing wrong in my config file.

Config file :

[general]

##parameters chrLenFile and ploidy are required.

chrLenFile = myPath/chrNameLength.txt
ploidy = 2

##Parameter "breakPointThreshold" specifies the maximal slope of the slope of residual sum of squares.
##This should be a positive value. The closer it is to Zero, the more breakpoints will be called. Its recommended value is between 0.01 and 0.08.

breakPointThreshold = 0.8

##Either coefficientOfVariation or window must be specified for whole genome sequencing data. Set window=0 for exome sequencing data.

coefficientOfVariation = 0.05

##Either chrFiles or GCcontentProfile must be specified too if no control dataset is available.
##If you provide a path to chromosome files, Control-FREEC will look for the following fasta files in your directory (in this order):
##1, 1.fa, 1.fasta, chr1.fa, chr1.fasta; 2, 2.fa, etc.
## Please ensure that you don't have other files but sequences having the listed names in this directory.
chrFiles = myPath/chromosomes

##if you are working with something non-human, we may need to modify these parameters:
minExpectedGC = 0.37
maxExpectedGC = 0.47

numberOfProcesses = 4
outputDir = myPath

##If the parameter gemMappabilityFile is not specified, then the fraction of non-N nucleotides per window is used as Mappability.

##set BedGraphOutput=TRUE if you want to create a BedGraph track for visualization in the UCSC genome browser:
BedGraphOutput=TRUE

[sample]

mateFile = myPath/GL261_luc.chr.bam
inputFormat = BAM
mateOrientation = 0

##use "mateOrientation=0" for sorted .SAM an .BAM

[BAF]

##use the following options to calculate B allele frequency profiles and genotype status.

[target]

##use a tab-delimited .BED file to specify capture regions (control dataset is needed to use this option):

Finally, I have already checked that :

  • The chrLenFile and the sample .bam have only the good chrNames (No chr22).
  • The folder myPath/chromosomes contains only the .fa matching with chrLenFile
  • my WGS config file is exactly the same that my WES one except for the sample .bam, the target bed (absent for WGS) and the window (window=0 for WES, coefficientOfVariation=0,05 for WGS).

Do you have any idea on how to resolve this error ?

@LeilyR
Copy link

LeilyR commented Mar 21, 2023

hey, did you solve this issue?

@Matheo-Lode
Copy link
Author

Hello, unfortunately not, I ran out of ideas on how to solve this.
I am willing to try out any solution !

@LeilyR
Copy link

LeilyR commented Mar 22, 2023

For me the issue was some missing contig in the list of fasta files, I used the first 2 columns of samtools faidx output to make the chromosome length file , before splitting the ref fasta by chromosome, that solved the issue. Also I removed any other file from the folder I had my .fa files

@DzmitryGB
Copy link

I am seeing a similar issue using v11.6 from bioconda and GRCh38 genome.
I suspect the chromosome names are changed somewhere. In my case, the error is "Unable to open fasta file for chr 1_KI270762v1_alt in folder". However, the contig is named "chr1_KI270762v1_alt" in the original fasta file (note the underscore between 'chr' and '1'. Moreover, the contig is renamed in the .cpn file to "1_KI270762v1_alt". So it could be a wrong assumption about chromosome naming in the freeec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants