Create 1000-genome_project.md

UPPMAX · Dec 21, 2023 · 3491734
1 parent ffb3c46
commit 3491734
Showing 1 changed file with 9 additions and 0 deletions.
diff --git a/docs/databases/1000-genome_project.md b/docs/databases/1000-genome_project.md
@@ -0,0 +1,9 @@
+# 1000 Genomes Project
+
+The 1000 Genomes Project is an international collaboration to sequence the genomes of a large number of people. The complete archive is available from NCBI and EBI, but downloading this massive quantity of next-generation sequencing data is time- and resource-consuming. UPPMAX now has a local copy of the sequencing and index files (BAM, BAI and BAS) as a shared resource.
+
+The main archive is stored at /sw/data/KGP/central. Within this folder, "low" holds the primary dataset, with one folder per individual (e.g., "HG00096", "NA11831") containing the data files for each sequencing technology applied. The main folder also contains "high", which holds the high-coverage data for a subset of the individuals.
+
+One level up in the file system, /sw/data/KGP/regional holds sequence data from some individual countries that are not part of the 1000 Genomes Project. So far, very little data has been stored there, but this may be expanded.
+
+Users interested in any of this data should request membership in the "KGP" group (via [email protected]). This requirement is not intended to restrict access to the resource in any way, but it makes it easier to inform interested users of possible changes. Considering the large storage space used, it is possible that the data will need to be reorganized or even reduced in the future, depending on the perceived need for the resource among the members of the KGP group.
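
As a rough illustration of the folder layout described in the page, the following Python sketch lists the individuals under /sw/data/KGP/central and collects the BAM, BAI and BAS files for one of them. The function names `list_individuals` and `find_alignment_files` are hypothetical helpers, not part of any UPPMAX tooling. The page does not specify how files are nested inside an individual's folder, so the sketch searches recursively, and it assumes that "high" is organized per individual in the same way as "low".

```python
from pathlib import Path

# Path taken from the page; everything below it is explored at run time.
KGP_CENTRAL = Path("/sw/data/KGP/central")

# File types the page says are stored locally.
ALIGNMENT_SUFFIXES = {".bam", ".bai", ".bas"}


def list_individuals(root=KGP_CENTRAL, coverage="low"):
    """Return the sorted names of the individual folders under root/coverage.

    Assumption: "high" uses one folder per individual, like "low".
    """
    base = Path(root) / coverage
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def find_alignment_files(individual, root=KGP_CENTRAL, coverage="low"):
    """Recursively collect the BAM, BAI and BAS files for one individual.

    The nesting below the individual's folder is not documented on the
    page, so every subfolder is searched.
    """
    folder = Path(root) / coverage / individual
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in ALIGNMENT_SUFFIXES
    )


if __name__ == "__main__":
    individuals = list_individuals()
    print(f"{len(individuals)} individuals found in the low-coverage set")
    for path in find_alignment_files("HG00096"):
        print(path)
```

Both helpers return an empty list when the folder is missing or not readable as a directory, which is what a user who is not yet a member of the "KGP" group might see if access depends on group membership.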