Create 1000-genome_project.md
bclaremar authored Dec 21, 2023
1 parent ffb3c46 commit 3491734
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/databases/1000-genome_project.md
@@ -0,0 +1,9 @@
# 1000 Genomes Project

The 1000 Genomes Project is an international collaboration to sequence the genomes of a large number of people. The complete archive is available from NCBI and EBI, but downloading this massive quantity of next-generation sequencing data is time- and resource-consuming. UPPMAX now has a local copy of the sequencing and index files (BAM, BAI and BAS) as a shared resource.

The main archive is stored at /sw/data/KGP/central. Within this folder, "low" holds the primary dataset, with one folder per individual (e.g., "HG00096", "NA11831") containing the data files for each sequencing technology applied. Also in the main folder, "high" holds the high-coverage data for a subset of the individuals.

One level up in the file system, /sw/data/KGP/regional holds sequence data for some individual countries outside the 1000 Genomes Project. So far, very little data has been stored there, but this may be expanded.
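As an illustration of the layout described above, the following is a minimal Python sketch that lists the BAM, BAI and BAS files for one individual under the "low" folder. The function name `find_alignment_files` is hypothetical, and the exact nesting of subfolders below each individual is an assumption, so the search is recursive rather than tied to a fixed depth.

```python
from pathlib import Path

# Path taken from the description above; the layout below each
# individual's folder is assumed, so we search recursively.
KGP_LOW = Path("/sw/data/KGP/central/low")


def find_alignment_files(individual, root=KGP_LOW,
                         suffixes=(".bam", ".bai", ".bas")):
    """Return sorted paths of BAM/BAI/BAS files under root/individual.

    Returns an empty list if the individual's folder does not exist.
    """
    folder = Path(root) / individual
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # "HG00096" is one of the example individuals named in the text.
    for path in find_alignment_files("HG00096"):
        print(path)
```

Passing a different `root` (for example, the "high" folder) reuses the same search for the high-coverage subset.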

Users interested in any of this data should request membership in the "KGP" group (via [email protected]). This requirement is not intended to restrict access to the resource; it makes it easier to inform interested users of possible changes. Given the large storage space used, it is possible that the data will need to be reorganized or even reduced in the future, depending on how much the members of the KGP group need the resource.
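To check whether membership has been granted, a small Python sketch such as the one below may help. It assumes that the "KGP" group mentioned above corresponds to a Unix group of the same name on the system, which this page does not state; the helper name `in_group` is hypothetical.

```python
import grp
import os


def in_group(group_name="KGP"):
    """Return True if the current process belongs to the named Unix group.

    Assumes the "KGP" membership is exposed as a Unix group; returns
    False if no group with that name exists on this system.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False
    # Check both the supplementary groups and the effective group ID.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("Member of KGP:", in_group("KGP"))
```

Group changes typically take effect only in new login sessions, so a fresh login may be needed before the check reflects a newly granted membership.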
