Often in bioinformatics we want a list of genes so that we can ask, "are genes in this list more X than other genes?" or "are genes in this list enriched in this other list?" and so on. There are many useful lists out there, but many of them are in an Excel file supplement to a paper, or an XML format with loads of other info you don't need, or use outdated gene symbols. For one reason or another, it often takes a lot of work to wrestle them into a format you can use. This repository is the MacArthur Lab's effort to collect all the lists we find useful into one place, with each formatted as just a single-column text file listing the current gene symbols.
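As an illustration of the "enriched in this other list?" question, here is a minimal Python sketch. It is not part of this repository: the function names `parse_gene_list` and `enrichment` and the toy gene symbols are hypothetical, and it assumes that a one-sided hypergeometric test against a background "universe" list suits your analysis.

```python
from math import comb


def parse_gene_list(lines):
    """Return the set of non-empty, whitespace-stripped symbols
    from the lines of a single-column gene list."""
    return {line.strip() for line in lines if line.strip()}


def enrichment(query, reference, universe):
    """One-sided hypergeometric test for the overlap of two gene lists.

    Both lists are first restricted to the universe.
    Returns (overlap, expected_overlap, p_value), where p_value is the
    probability of an overlap at least this large by chance.
    """
    query = query & universe
    reference = reference & universe
    n_universe, n_ref, n_query = len(universe), len(reference), len(query)
    overlap = len(query & reference)
    # Upper tail of the hypergeometric distribution, in exact integer arithmetic.
    tail = sum(
        comb(n_ref, i) * comb(n_universe - n_ref, n_query - i)
        for i in range(overlap, min(n_ref, n_query) + 1)
    )
    p_value = tail / comb(n_universe, n_query)
    expected = n_query * n_ref / n_universe
    return overlap, expected, p_value


# Toy data standing in for three single-column files (hypothetical symbols).
universe = parse_gene_list(["GENE1\n", "GENE2\n", "GENE3\n", "GENE4\n", "\n"])
query = parse_gene_list(["GENE1", "GENE2"])
reference = parse_gene_list(["GENE1", "GENE2"])
overlap, expected, p_value = enrichment(query, reference, universe)
```

With real files, an open file handle can be passed straight to `parse_gene_list`, since iterating over it yields one line at a time.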
Here is a guide to the lists we currently have in this repo:
List | Count | Description | Please cite |
---|---|---|---|
Universe | 19,194 | Approved symbols for 18,991 protein-coding genes according to HGNC as of Feb 9, 2015. For details see src/create_universe.bash. This list is the "universe" of which all subsequent lists are subsets. | See genenames.org/about/overview. Users are asked to web reference "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" (http://www.genenames.org/) if possible. |
FDA-approved drug targets | 385 | Genes whose protein products are known to be the mechanistic targets of FDA-approved drugs (updated 2018-09-13). For details on the exact criteria we used for inclusion in this list, see src/drug_targets.py | See drugbank.ca/about. Please cite [Law 2014, Knox 2011, Wishart 2008, Wishart 2006, and/or Wishart 2018]. |
Drug targets by Nelson et al 2012 | 201 | Drug targets according to Nelson et al 2012, with reference to Russ & Lampel 2005. | [Nelson 2012, Russ & Lampel 2005] |
Autosomal dominant genes by Blekhman et al 2008 | 307 | OMIM disease genes deemed to follow autosomal dominant inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
Autosomal dominant genes by Berg et al 2013 | 631 | OMIM disease genes (as of June 2011) deemed to follow autosomal dominant inheritance according to Berg et al 2013. | [Berg 2013] |
Autosomal recessive genes by Blekhman et al 2008 | 527 | OMIM disease genes deemed to follow autosomal recessive inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
Autosomal recessive genes by Berg et al 2013 | 1073 | OMIM disease genes (as of June 2011) deemed to follow autosomal recessive inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked genes by Blekhman et al 2008 | 66 | OMIM disease genes deemed to follow X-linked inheritance (dominant/recessive not specified) according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
X-linked recessive genes by Berg et al 2013 | 102 | OMIM disease genes (as of June 2011) deemed to follow X-linked recessive inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked dominant genes by Berg et al 2013 | 34 | OMIM disease genes (as of June 2011) deemed to follow X-linked dominant inheritance according to Berg et al 2013. | [Berg 2013] |
X-linked ClinVar genes | 61 | X chromosome genes in the August 6, 2015 ClinVar release that have at least 3 reportedly pathogenic, non-conflicted variants in ClinVar with at least one submitter other than OMIM or GeneReviews. Code here. | Cite the ClinVar paper [Landrum 2014] |
All dominant genes | 709 | Currently the union of the Berg and Blekhman dominant lists; more lists may be added later. | [Blekhman 2008, Berg 2013] |
All recessive genes | 1183 | Currently the union of the Berg and Blekhman recessive lists; more lists may be added later. | [Blekhman 2008, Berg 2013] |
Homozygous LoF tolerant | 330 | Genes with at least two different high-confidence LoF variants found in a homozygous state in at least one individual in ExAC. By Konrad Karczewski. | Just cite the ExAC paper [Lek 2016] |
Essential in culture | 283 | Genes deemed essential in multiple cultured cell lines based on shRNA screen data | [Hart 2014] |
Essential in culture (CRISPR screening) | 683 | Genes deemed essential in multiple cultured cell lines based on CRISPR/Cas screen data | [Hart 2017] |
Non-essential in culture (CRISPR screening) | 913 | Genes deemed non-essential in multiple cultured cell lines based on CRISPR/Cas screen data | [Hart 2017] |
Essential in mice | 2,454 | Genes where homozygous knockout in mice results in pre-, peri- or post-natal lethality. The mouse phenotypes were reported by Jackson Labs [Blake 2011], the essential gene list was then extracted via manual review of phenotypes by [Georgi 2013], and the essential/non-essential flag was put into dbNSFP [Liu 2013]. We extracted the genes from dbNSFP. | [Blake 2011, Georgi 2013, and Liu 2013] |
Genes nearest to GWAS peaks | 6,336 | Closest gene to GWAS hits with P < 5e-8 in the NHGRI GWAS catalog (MAPPED_GENE column) as of Sep 13, 2018 | [MacArthur 2017] |
DNA Repair Genes, WoodRD | 178 | An updated inventory of human DNA repair genes. (Last modified on Tuesday 15th April 2014). For details see src/DRG_WoodRD.R | Cite [Wood 2005] and include a web reference to this URL. |
DNA Repair Genes, KangJ | 151 | The 151 DNA repair genes listed in Supplementary Table 1 of Kang et al 2012, drawn from the following DNA repair pathways: ATM, BER, FA/HR, MMR, NHEJ, NER, TLS, XLR, RECQ, and other. | Cite [Kang 2012] |
ClinGen haploinsufficient genes | 294 | Genes with sufficient evidence for dosage pathogenicity (level 3) as determined by the ClinGen Dosage Sensitivity Map as of Sep 13, 2018 | Cite [Rehm 2015]. See also ClinGen's TOU |
Olfactory receptors | 371 | Olfactory receptors from the Mainland 2015 data release | [Mainland 2015] |
Genes with any disease association reported in ClinVar | 3078 | Using this simple script, we downloaded the ClinVar tab-delimited summary as of May 12, 2015, and took all gene symbols for which there is at least one variant with an assertion of pathogenic or likely pathogenic in ClinVar. | Cite the ClinVar paper [Landrum 2014] |
Kinases | 347 | From UniProt's pkinfam list | [UniProt Consortium 2018], and also according to UniProt this list is based on 3 publications: [Hunter 2000, Manning 2002, Miranda-Saavedra & Barton 2007] |
GPCRs from guidetopharmacology | 391 | GPCR list from guidetopharmacology.org | Citing instructions here — for GPCRs, cite [Alexander 2017 & Harding 2018]. |
GPCRs from UniProt | 756 | From this query of the UniProt database | [UniProt Consortium 2018] |
GPCRs all | 759 | Union of the above two lists | See previous two entries |
Natural product targets | 37 | List of hand-curated targets of natural products from supplement of [Dancik 2010] | [Dancik 2010] |
BROCA - Cancer Risk Panel | 66 | BROCA is useful for the evaluation of patients with a suspected hereditary cancer predisposition, with a focus on syndromes that include breast or ovarian cancer as one of the cancer types. Depending on the causative gene involved, these cancers may co-occur with other cancer types (such as colorectal, endometrial, pancreatic, endocrine, or melanoma). | University of Washington |
ACMG V2.0 | 59 | The minimum list of genes to be reported as incidental or secondary findings as published by the American College of Medical Genetics and Genomics (ACMG) | [Kalia 2017] |
GPI-anchored proteins | 135 | Gene symbols encoding proteins annotated by UniProt as being GPI-anchored. | Cite the latest UniProt paper: [UniProt Consortium 2017] |
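Several lists in the table are described as unions of other lists, and every list is meant to be a subset of the universe. A minimal Python sketch of both operations follows. The helper names `union_of_lists` and `outside_universe` and the toy symbols are hypothetical, and this is not the code used to build the lists in this repo.

```python
def union_of_lists(*gene_lists):
    """Return the set of symbols that appear in at least one input list."""
    return set().union(*gene_lists)


def outside_universe(gene_list, universe):
    """Return the sorted symbols in gene_list that are missing from universe.

    An empty result means gene_list is a subset of the universe.
    """
    return sorted(set(gene_list) - set(universe))


# Toy data standing in for real list files (hypothetical symbols).
list_a = {"GENE1", "GENE2"}
list_b = {"GENE2", "GENE3"}
universe = {"GENE1", "GENE2", "GENE3", "GENE4"}

combined = union_of_lists(list_a, list_b)
missing = outside_universe(combined, universe)
```

A check like `outside_universe` is also a quick way to spot outdated symbols in a new list before contributing it.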
We welcome pull requests that add further lists, provided they are licensed for redistribution. If possible, please provide the source code used to extract the list from its original source, and an appropriate description for this readme.