Species

Arabidopsis
1. 1001 Genomes
2. Reference
3. 1001 Genomes has full genomes for 1,100 strains, so we might not need the reference (132 Gb)
C. elegans
1. CeNDR
2. reference
3. The VCF is about 2 Gb, but all alignment data can also be downloaded (not sure of the size)
humans
1. 1000 Genomes
2. reference
3. Lots of options - not sure which files to use
4. Over 3,000 humans in total across the three studies in 1000G
mouse - 17 sequences; paper for another dataset; dataset; reference
1. Mouse Genome Project
2. ftp download site: ftp://ftp-mouse.sanger.ac.uk/
3. reference
4. 21 Gb for the variants (REL-1505-SNPs_Indels)
yeast

Strain issues - Is it a good idea to treat all strains as a single population? If not, we might not get strong results for yeast and Arabidopsis.
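A toy sketch of why this matters, assuming "H" means expected heterozygosity (these notes do not define it). If strains fall into subgroups that are each fixed for a different allele, pooling them reports diversity that no single subgroup has. The groups and allele counts below are invented for illustration; H_S (mean within-group), H_T (pooled) and F_ST = (H_T - H_S) / H_T are the standard Nei quantities.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Gene diversity at one site: 1 - sum of squared allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


# Hypothetical site: two strain groups, each fixed for a different allele.
group_a = ["A"] * 10
group_b = ["G"] * 10

h_within = (expected_heterozygosity(group_a) + expected_heterozygosity(group_b)) / 2  # H_S
h_pooled = expected_heterozygosity(group_a + group_b)                                  # H_T
f_st = (h_pooled - h_within) / h_pooled

print(h_within, h_pooled, f_st)  # 0.0 0.5 1.0
```

When F_ST is high, a pooled H mostly measures differences between groups rather than diversity within a population, so yeast and Arabidopsis results may need to be computed per subgroup.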

How to use this data - All of these contain VCF files (variants crossed with the individuals in the population), so the actual sequences are not given. What we can do is:
1. Acquire the promoter regions for each organism
2. Find the variants within those regions
3. Get the monomorphic parts of the sequence from the reference genome
4. Calculate H values
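The four steps can be sketched in plain Python as below. This is a minimal sketch under assumptions these notes do not state: promoters are a fixed window upstream of a TSS taken from some annotation (the 1 kb default is hypothetical), only SNPs are applied (indels are skipped), all samples share one ploidy, and "H" is taken to mean per-site expected heterozygosity averaged over the promoter. All function names and the toy reference/VCF are hypothetical. For the multi-gigabyte VCFs listed above, an indexed reader such as pysam or cyvcf2 would likely be more practical than parsing text lines.

```python
from collections import Counter

BASES = set("ACGTN")  # single-base alleles accepted as SNPs in this sketch


def promoter_interval(tss0, strand, upstream=1000, downstream=0):
    """Step 1: 0-based, half-open window around a TSS (hypothetical 1 kb default).

    On the minus strand, upstream means higher coordinates.
    """
    if strand == "+":
        return max(0, tss0 - upstream), tss0 + downstream
    return tss0 - downstream + 1, tss0 + upstream + 1


def parse_vcf_snps(vcf_lines):
    """Yield (chrom, pos0, ref, alts, genotypes) for SNP records only.

    pos0 is 0-based. genotypes holds one list of allele indices per sample,
    read from GT (the VCF spec requires GT to be the first FORMAT key);
    missing calls ('.') become None. Indels and symbolic alleles are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        alts = alt.split(",")
        if ref not in BASES or any(a not in BASES for a in alts):
            continue
        genotypes = []
        for sample in fields[9:]:
            gt = sample.split(":")[0].replace("|", "/")
            genotypes.append([None if a == "." else int(a) for a in gt.split("/")])
        yield chrom, int(pos) - 1, ref, alts, genotypes


def variants_in_region(snps, chrom, start, end):
    """Step 2: keep SNPs whose 0-based position lies in [start, end)."""
    return [s for s in snps if s[0] == chrom and start <= s[1] < end]


def promoter_haplotypes(ref_seq, start, end, region_snps, n_haplotypes):
    """Step 3: copy the reference for monomorphic positions, then overwrite
    SNP positions with each haplotype's allele ('N' for a missing call).
    Assumes all samples share one ploidy so haplotypes line up."""
    haps = [list(ref_seq[start:end].upper()) for _ in range(n_haplotypes)]
    for _chrom, pos0, ref, alts, genotypes in region_snps:
        if ref_seq[pos0].upper() != ref:
            raise ValueError(f"REF mismatch at 0-based position {pos0}")
        calls = [a for gt in genotypes for a in gt]  # one entry per haplotype
        if len(calls) != n_haplotypes:
            raise ValueError("genotype count does not match n_haplotypes")
        alleles = [ref] + alts
        for h, a in enumerate(calls):
            haps[h][pos0 - start] = "N" if a is None else alleles[a]
    return ["".join(h) for h in haps]


def mean_expected_heterozygosity(haplotypes):
    """Step 4, assuming H = per-site expected heterozygosity
    (1 - sum of squared allele frequencies) averaged over the region.
    Monomorphic sites add 0 but still count in the denominator;
    columns that are entirely 'N' are left out."""
    total, sites = 0.0, 0
    for column in zip(*haplotypes):
        counts = Counter(b for b in column if b != "N")
        n = sum(counts.values())
        if n:
            total += 1.0 - sum((c / n) ** 2 for c in counts.values())
            sites += 1
    return total / sites if sites else 0.0


if __name__ == "__main__":
    ref_seq = "ACGTACGTAC"  # toy reference, not real data
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
        "chr1\t4\t.\tT\tC\t.\tPASS\t.\tGT\t0/0\t1/1",   # SNP inside the promoter
        "chr1\t6\t.\tCG\tC\t.\tPASS\t.\tGT\t0/0\t0/1",  # indel: skipped
        "chr1\t10\t.\tC\tA\t.\tPASS\t.\tGT\t0/1\t1/1",  # SNP outside the promoter
    ]
    start, end = promoter_interval(8, "+", upstream=6)  # (2, 8)
    region = variants_in_region(list(parse_vcf_snps(vcf)), "chr1", start, end)
    haps = promoter_haplotypes(ref_seq, start, end, region, n_haplotypes=4)
    print(haps)
    print(mean_expected_heterozygosity(haps))
```

Step 3 is what makes the average meaningful: monomorphic positions contribute zero heterozygosity but still count toward the promoter length, so averaging over the VCF records alone would inflate H.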
