# The International Genome Sample Resource Project Description

The International Genome Sample Resource (IGSR) has three main aims.

  1. [Ensure the future usability of the 1000 Genomes reference data;](#ensuring-the-future-usability-of-the-1000-genomes-reference-data)
  2. [Incorporate published genomic data on the 1000 Genomes samples;](#incorporate-published-genomic-data-on-the-1000-genomes-samples)
  3. [Expand the data collection to include new populations.](#expand-the-data-collection-to-include-new-populations)

Here we describe our plan to achieve these aims.

## Ensuring the future usability of the 1000 Genomes reference data

In 2014, the Genome Reference Consortium (GRC) released GRCh38, an update of the human reference assembly. The update significantly increases the number of alternative loci represented. It now contains 178 genomic regions with associated alt loci, covering 61.9 Mb (2%) of chromosome sequence. These regions are represented by 261 alt loci, which contain 3.6 Mb of sequence that is novel relative to the chromosomes. The GRC also resolved more than 1000 issues from the previous version of the assembly.

Taking advantage of this alternative loci sequence when identifying variation and calling genotypes is an important step in improving our ability to discover human variation. Currently, very few tools can use the alt loci data. IGSR plans to remap the phase 3 1000 Genomes data to GRCh38 in an alt-aware manner using the newest version of BWA-MEM. This will give the method development community a source of alignments to drive new methods forward, and will give the wider community up-to-date alignments so that everyone can benefit from the data in the context of the new assembly.
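
As a rough illustration of what alt-aware remapping can look like, the sketch below builds (but does not run) a command line for one sample. It is not IGSR's actual pipeline. It assumes the post-processing script `bwa-postalt.js` from bwakit, run with the `k8` interpreter, and it relies on BWA-MEM switching to alt-aware mode when a `.alt` file sits next to the index. The function name and the file names are hypothetical.

```python
import shlex


def alt_aware_pipeline(ref_fasta, fastq1, fastq2, threads=4):
    """Build (but do not run) a shell pipeline for alt-aware mapping.

    Assumption: the BWA index for ref_fasta is accompanied by a
    '<ref_fasta>.alt' file listing the alt contigs. BWA-MEM uses
    alt-aware mapping when that file is present.
    """
    # Paired-end alignment with BWA-MEM; -t sets the thread count.
    bwa = ["bwa", "mem", "-t", str(threads), ref_fasta, fastq1, fastq2]
    # bwa-postalt.js (bwakit) post-processes hits to alt contigs.
    # Here it reads the SAM stream from standard input.
    postalt = ["k8", "bwa-postalt.js", ref_fasta + ".alt"]
    return " | ".join(shlex.join(cmd) for cmd in (bwa, postalt))


# Hypothetical file names, for illustration only.
print(alt_aware_pipeline("GRCh38_with_alts.fa",
                         "sample_1.fastq.gz",
                         "sample_2.fastq.gz"))
```

In practice the output would normally be piped on to a tool such as samtools for sorting and compression; that step is left out to keep the sketch short.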

## Incorporate published genomic data on the 1000 Genomes samples

The 1000 Genomes samples have proved a popular resource for molecular phenotyping experiments and for investigating the associations between genetic variation and expression or measurements of epigenetic state. Large datasets have been generated on these samples by projects such as GEUVADIS, which generated RNA-Seq data on the 1000 Genomes European samples and the YRI population, and ENCODE, which has carried out extensive assays on the NA12878 cell line. Many other groups have also conducted assays on the 1000 Genomes samples. The IGSR would like to present all this information in a unified manner so that the community can benefit from all the data that exists on these samples.

## Expand the data collection to include new populations

The IGSR recognises that the current 1000 Genomes Project samples do not reflect all populations. An important aim for IGSR is to expand the populations represented in the collection and ensure the available public data represents the maximum possible population diversity. This will ensure the 1000 Genomes dataset remains a valuable open resource for the community over the next five years. The IGSR will work with the groups who were unable to contribute samples to the 1000 Genomes Project before it finished sample collection and investigate collaborations with other groups to ensure the population diversity gaps are filled.

If you have any questions about any of our plans please email [email protected].