
How to handle chromosome names? #41

Open
janaobsteter opened this issue Nov 9, 2023 · 2 comments

@janaobsteter
Collaborator

Currently, we get the number of chromosomes from the config file and then loop over range(1, nChromosomes + 1). But what if the data contain non-numeric chromosomes, such as other contigs or the mitochondrial genome?
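A minimal sketch of one way around this, assuming the config lists chromosome names under a hypothetical `chromosomes` key (the pipeline's actual config layout may differ). Looping over names instead of integer indices treats contigs like "MT" or unplaced scaffolds the same as "1"; an integer is still accepted for backward compatibility.

```python
# Hypothetical config layout; the real pipeline's keys may differ.
config = {"chromosomes": ["1", "2", "X", "MT", "scaffold_17"]}


def chromosome_names(cfg):
    """Return chromosome names from the config as strings, in config order.

    An integer is still accepted and expanded to "1".."n", mimicking the
    current range(1, nChromosomes + 1) loop.
    """
    chroms = cfg["chromosomes"]
    if isinstance(chroms, int):
        return [str(i) for i in range(1, chroms + 1)]
    return [str(c) for c in chroms]


for chrom in chromosome_names(config):
    # Downstream steps receive a name, not an index, so "MT" works like "1".
    print(f"processing chromosome {chrom}")
```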

@gregorgorjanc
Member

Maybe follow what stdpopsim does?

@hannesbecher
Contributor

I think it would be useful to have a text file with chromosome names and lengths, such as the genome file format used by bedtools. It has one chromosome per line: the chromosome name, a tab, and the chromosome's length:

```
$ cat my.genome
chr1	1000
chr2	500
```
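A rough sketch of how the pipeline could read such a file into an ordered name-to-length mapping. The function name `read_genome_file` is hypothetical; splitting on any whitespace and ignoring columns beyond the second are lenient choices made here (bedtools itself expects tab-separated columns).

```python
import os
import tempfile


def read_genome_file(path):
    """Parse a bedtools-style genome file into {chromosome name: length}.

    Lines are '<name><TAB><length>'; extra columns are ignored and blank
    lines are skipped (both are assumptions of this sketch).
    """
    lengths = {}  # dicts keep insertion order, so file order is preserved
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) < 2:
                raise ValueError(f"{path}:{lineno}: expected '<name> <length>', got {line!r}")
            name, length = fields[0], int(fields[1])
            if name in lengths:
                raise ValueError(f"{path}:{lineno}: duplicate chromosome {name!r}")
            lengths[name] = length
    return lengths


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my.genome")
    with open(path, "w") as fh:
        fh.write("chr1\t1000\nchr2\t500\n")
    genome = read_genome_file(path)

print(genome)  # {'chr1': 1000, 'chr2': 500}
```

The loop over chromosomes can then iterate over the keys of this mapping, so the set and order of chromosomes come from the data rather than from a count.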

Should a genome file be generated as part of this pipeline?

This would be easy if the entry point were a single multi-chromosome VCF file: the file could be parsed and each chromosome's highest variant position used as its length (strictly, a lower bound on the true length). It would also be easy if a genome FASTA file were available.
But it could be tricky if the entry point is multiple VCF files.
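A minimal sketch covering both the single-file and multi-file case, assuming plain or gzipped VCFs read with the standard library only (the function name `vcf_extents` is hypothetical). Taking the maximum per chromosome across all files means it does not matter whether chromosomes are split over files. Each record's end is taken as POS + len(REF) - 1 so that deletions are counted; the result is still only a lower bound on chromosome length. When a VCF header carries `##contig=<ID=...,length=...>` lines, those give the real lengths and would be preferable; this sketch ignores them.

```python
import gzip
import os
import tempfile


def vcf_extents(paths):
    """Return {CHROM: highest end position seen} across one or more VCF files."""
    extents = {}
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            for line in fh:
                if line.startswith("#"):
                    continue  # meta-information and column header lines
                chrom, pos, _id, ref = line.split("\t", 4)[:4]
                end = int(pos) + len(ref) - 1  # last reference base of the record
                if chrom not in extents or end > extents[chrom]:
                    extents[chrom] = end
    return extents


HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "chr1.vcf")
    b = os.path.join(tmp, "chrMT.vcf.gz")
    with open(a, "w") as fh:
        fh.write(HEADER + "1\t100\t.\tA\tG\t.\t.\t.\n1\t950\t.\tACT\tA\t.\t.\t.\n")
    with gzip.open(b, "wt") as fh:
        fh.write(HEADER + "MT\t16000\t.\tC\tT\t.\t.\t.\n")
    extents = vcf_extents([a, b])

print(extents)  # {'1': 952, 'MT': 16000}
```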

Alternatively, we might require the genome file as an additional input and supply a script that generates it from a VCF or genome FASTA file.
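A sketch of what such a helper script might do for the FASTA case, using only the standard library. The names `fasta_lengths` and `write_genome_file` are hypothetical, and taking the first whitespace-separated word of each header line as the sequence name is an assumption of this sketch.

```python
import gzip
import os
import tempfile


def fasta_lengths(path):
    """Return {sequence name: length} for a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # first word of the header line
                lengths[name] = 0
            elif line and name is not None:
                lengths[name] += len(line)  # sequences may wrap over many lines
    return lengths


def write_genome_file(lengths, out_path):
    """Write {name: length} as a two-column, tab-separated genome file."""
    with open(out_path, "w") as out:
        for name, length in lengths.items():
            out.write(f"{name}\t{length}\n")


with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "genome.fa")
    with open(fasta, "w") as fh:
        fh.write(">chr1 example description\nACGTACGTAC\nGT\n>chrM\nACG\n")
    lengths = fasta_lengths(fasta)
    out = os.path.join(tmp, "my.genome")
    write_genome_file(lengths, out)
    with open(out) as fh:
        genome_text = fh.read()

print(genome_text, end="")
```

If samtools is available, `samtools faidx genome.fa` followed by `cut -f1,2 genome.fa.fai` yields the same two columns, since the first two columns of a .fai index are the sequence name and length.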

Opinions? @gregorgorjanc @gmafrafortuna @janaobsteter

Generally, stdpopsim sounds good, but we may also want to run this pipeline on small test datasets and on organisms that are not in stdpopsim at the moment?
