layout: single_section
title: Phase1 Analysis Results
permalink: /phase1-analysis-results-directory/

Phase1 Analysis Results

This page describes the Phase1 analysis results directory [EBI|NCBI].

This directory contains the files associated with the variant calling carried out for phase 1 of the 1000 Genomes Project, along with other ancillary files from the phase 1 analysis.

The phase1 analysis results directory contains several subdirectories, each with different content. These are listed below.

Ancestry Deconvolution [EBI|NCBI]

This directory contains information about the local ancestry inference carried out on the admixed populations among the 1000 Genomes phase 1 samples: African Americans (ASW), Colombians (CLM), Mexicans (MXL) and Puerto Ricans (PUR).

Consensus Call Sets [EBI|NCBI]

These directories contain the consensus call sets and genotype likelihoods that were used to produce the final integrated release. Note that the indel file in this directory still contains indels that were subsequently filtered out of our integrated data release as a result of validation efforts. These indels can be identified using the excluded_indel_sites list [EBI|NCBI].
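To drop the excluded indels from the consensus indel file, you can match each VCF record's CHROM and POS against the exclusion list. The sketch below is a minimal illustration, not the Project's own tooling; it assumes (unverified) that each line of the exclusion list begins with a chromosome name and a 1-based position separated by whitespace, and that both files use the same chromosome naming. The helper names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple


def open_text(path: str):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def load_excluded_sites(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) pairs from the exclusion list.

    ASSUMPTION: each non-comment line begins with a chromosome name and a
    1-based position separated by whitespace; check the real file first.
    """
    sites: Set[Tuple[str, int]] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split()[:2]
        sites.add((chrom, int(pos)))
    return sites


def drop_excluded(vcf_lines: Iterable[str],
                  excluded: Set[Tuple[str, int]]) -> Iterator[str]:
    """Pass VCF header lines through and keep only non-excluded records."""
    for line in vcf_lines:
        if line.startswith("#"):  # meta-information and column header lines
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]  # CHROM and POS are the first two columns
        if (chrom, int(pos)) not in excluded:
            yield line
```

For example, `drop_excluded(open_text("consensus_indels.vcf.gz"), load_excluded_sites(open_text("excluded_indel_sites.txt")))` would yield the filtered VCF lines; both file names here are placeholders, not the real file names in the directory. Matching on position alone assumes the exclusion list does not distinguish between different indels at the same position.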

Experimental Validation [EBI|NCBI]

This directory contains information about which sites were validated for each variant type, and the results of the validation processes.

Functional Annotation [EBI|NCBI]

This directory contains two subdirectories: annotation_sets, which contains BED and GTF files describing the gene and non-coding annotation that our variant sets were compared against, and annotation_vcfs, which contains the variant annotation itself in VCF format.
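As a rough illustration of how a variant position can be compared with the GTF annotation in annotation_sets, the sketch below parses standard nine-column GTF lines and lists the features covering a given position. It assumes the files follow the usual GTF layout (seqname, source, feature, start, end, score, strand, frame, attributes) and that chromosome names match those in the VCFs; the function names are hypothetical and this is not how the Project produced annotation_vcfs.

```python
import re
from typing import Dict, Iterable, List, Tuple

# A GTF attribute column looks like: gene_id "ENSG..."; gene_name "ABC";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

GtfRecord = Tuple[str, int, int, str, Dict[str, str]]


def parse_gtf(lines: Iterable[str]) -> List[GtfRecord]:
    """Parse GTF lines into (seqname, start, end, feature, attributes).

    GTF coordinates are 1-based and inclusive, like VCF POS, so no offset
    is needed when comparing them with variant positions.
    """
    records: List[GtfRecord] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        attrs = dict(ATTR_RE.findall(f[8]))
        records.append((f[0], int(f[3]), int(f[4]), f[2], attrs))
    return records


def features_at(records: List[GtfRecord], chrom: str,
                pos: int) -> List[Tuple[str, Dict[str, str]]]:
    """Return (feature type, attributes) for every record covering chrom:pos."""
    return [(feat, attrs) for c, s, e, feat, attrs in records
            if c == chrom and s <= pos <= e]
```

The linear scan in `features_at` keeps the example short; for genome-wide annotation you would want a sorted or interval-tree index instead.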

Input Call Sets [EBI|NCBI]

This directory contains all the union call sets for the SNPs (both low coverage and exome), indels and deletions that make up the integrated release. The directory contains several VCF files; in each file, any variant whose FILTER column reads PASS should be part of the integrated release.
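Selecting the PASS records from one of these VCF files only requires reading the FILTER column, which is the seventh fixed column of a VCF data line. A minimal sketch in plain Python (no VCF library assumed; the helper names are illustrative):

```python
import gzip
from typing import Iterable, Iterator, List


def open_text(path: str):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


# FILTER is the 7th fixed VCF column (0-based index 6).
FILTER_COL = 6


def pass_records(vcf_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-split fields of every data record whose FILTER is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[FILTER_COL] == "PASS":
            yield fields
```

For instance, `sum(1 for _ in pass_records(open_text(path)))` counts the PASS sites in a file, where `path` is whichever input call set you downloaded. Python's `gzip` module reads bgzip-compressed files because bgzip output is a series of standard gzip members.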

Integrated Call Sets [EBI|NCBI]

This directory contains our final variant calls for the phase 1 data sets. Most of the data in this directory is identical to what can be found in ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521, but it also includes chrY calls for SNPs and deletions and chrMT calls for SNPs.

Supporting [EBI|NCBI]

  • accessible_genome_masks, Mask files defining which regions of the genome are more or less accessible to the next-generation sequencing methods used by the 1000 Genomes Project [EBI|NCBI]
  • ancestral_alignments, Ancestral FASTA files based on a 32-way alignment from Ensembl 59, generated with the Enredo-Pecan-Ortheus pipeline [EBI|NCBI]
  • axiom_genotypes, Genotypes from the Affymetrix Axiom platform for 1000 Genomes samples [EBI|NCBI]
  • cosmic_hgmd_overlap, Overlaps between the phase 1 integrated results and the COSMIC and HGMD databases [EBI|NCBI]
  • cryptic_relation_analysis, The results of the cryptic relatedness analysis performed by Jim Nemesh at the Broad Institute [EBI|NCBI]
  • excluded_indel_sites, The list of indels that were excluded from the v3 integrated variant release [EBI|NCBI]
  • exome_pull_down, The target coordinates used for both variant calling and the downstream analysis of the exome data [EBI|NCBI]
  • omni_haplotypes, Genotypes from the Illumina Omni 2.5M chip for 1000 Genomes individuals [EBI|NCBI]
  • variant_gerp_scores, Conservation scores for all SNP and indel variant sites [EBI|NCBI]
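A common use of the accessible_genome_masks and exome_pull_down files is to ask whether a variant falls inside a listed region. The sketch below assumes the regions are available in BED format (0-based, half-open intervals); check the actual files, since some masks may be distributed in other formats. The class name is hypothetical. The main subtlety is converting the 1-based VCF POS to BED's 0-based coordinates before the lookup.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List


class BedIndex:
    """Per-chromosome sorted, merged BED intervals (0-based, half-open)."""

    def __init__(self, lines: Iterable[str]):
        raw: Dict[str, List[List[int]]] = defaultdict(list)
        for line in lines:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            raw[chrom].append([int(start), int(end)])
        self.starts: Dict[str, List[int]] = {}
        self.ends: Dict[str, List[int]] = {}
        for chrom, ivs in raw.items():
            ivs.sort()
            merged: List[List[int]] = []
            for s, e in ivs:
                # Merge overlapping or touching intervals so they stay disjoint.
                if merged and s <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], e)
                else:
                    merged.append([s, e])
            self.starts[chrom] = [s for s, _ in merged]
            self.ends[chrom] = [e for _, e in merged]

    def contains_vcf_pos(self, chrom: str, pos: int) -> bool:
        """True if the 1-based VCF position pos lies inside any interval."""
        starts = self.starts.get(chrom)
        if not starts:
            return False
        zero_based = pos - 1
        # Last interval whose start is <= the query position.
        i = bisect.bisect_right(starts, zero_based) - 1
        return i >= 0 and zero_based < self.ends[chrom][i]
```

With a BED line `1 99 200`, positions 100 to 200 on chromosome 1 (in VCF coordinates) are reported as inside the region, while 99 and 201 are not. Merging the intervals up front keeps each lookup to a single binary search.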