diazale/README.md

🧮🧬 Alex Diaz-Papkovich, PhD 🧬🧮

I'm a statistician and data scientist. I'm currently at Brown University working as a postdoctoral research associate at the Data Science Institute with Sohini Ramachandran. My PhD work was at McGill University in Quantitative Life Sciences with Simon Gravel, where I studied topological data analysis methods for genetic data. You can find my published research on Google Scholar.

I also enjoy collecting data on a variety of topics. Some of my side projects include tracking the length of the Rideau Canal skating season and collecting news stories about traffic violence.

Some of my academic research:

Non-linear dimensionality reduction for visualizing population genetic data

UMAP is an efficient method to visualize biobank data. You can find structure in your data (e.g. population structure) related to factors like demographic history or biobank sampling methodology. When you colour the visualizations by other data, like geography or phenotypic measures, patterns emerge that you can study further. You can also work in 3D and get creative, for example by converting UMAP's $(x,y,z)$ coordinates to RGB values to create colour maps.
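
As a rough illustration of the coordinates-to-colour idea, here is a minimal sketch that min-max scales each axis of a 3D embedding to the 0-255 range and reads the result as an RGB triple. The scaling rule and the names `coords_to_rgb` and `rgb_to_hex` are assumptions made for this example, not necessarily the mapping used in the paper, and the toy `embedding` stands in for real UMAP output.

```python
def coords_to_rgb(coords):
    """Min-max scale each of the three axes to [0, 255] and return integer RGB triples.

    `coords` is a sequence of (x, y, z) tuples, e.g. rows of a 3D embedding.
    """
    if not coords:
        return []
    columns = list(zip(*coords))          # one tuple per axis
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    colours = []
    for point in coords:
        rgb = []
        for value, lo, hi in zip(point, lows, highs):
            span = hi - lo
            # A constant axis carries no information; map it to mid-intensity.
            scaled = (value - lo) / span if span > 0 else 0.5
            rgb.append(round(scaled * 255))
        colours.append(tuple(rgb))
    return colours


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple as a hex colour string for plotting libraries."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Toy stand-in for a 3D embedding (hypothetical values).
embedding = [(-2.0, 0.0, 5.0), (0.0, 1.0, 7.5), (2.0, 2.0, 10.0)]
colours = coords_to_rgb(embedding)
hex_colours = [rgb_to_hex(c) for c in colours]
```

Points that sit close together in the embedding get similar colours, so the same colours can then be reused on a map or another plot of the same individuals.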

Paper: UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts, Diaz-Papkovich et al., PLoS Genetics, 2019.

Related GitHub repositories:

Stratification of biobank data

Though UMAP tends to generate clusters, it is not a clustering algorithm. To extract clusters from UMAP data, we use a density-based method called HDBSCAN. We can use this for stratification to get a better grasp of the population structure in our data, study how methods like polygenic scores transfer between populations, and do QC on biobank data.
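
Once cluster labels exist, the stratification step itself is simple bookkeeping. The sketch below assumes you already have one label per individual, with `-1` marking noise points (the convention used by common HDBSCAN implementations), and a per-individual value such as a polygenic score. The function name `summarise_by_cluster` and the toy data are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict
from statistics import mean

NOISE = -1  # label conventionally assigned to points HDBSCAN leaves unclustered


def summarise_by_cluster(labels, values):
    """Group per-individual values by cluster label and report size and mean."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    return {
        label: {"n": len(members), "mean": mean(members)}
        for label, members in sorted(groups.items())
    }


# Toy data: two clusters plus one noise point (hypothetical values).
labels = [0, 0, 1, 1, 1, NOISE]
scores = [1.0, 3.0, 10.0, 12.0, 14.0, 100.0]
summary = summarise_by_cluster(labels, scores)
```

Keeping the noise group as its own entry, rather than dropping it, makes it easy to see how many individuals were left unassigned, which can itself be useful for QC.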

Preprint: Topological stratification of continuous genetic variation in large biobanks, Diaz-Papkovich et al., bioRxiv, 2023.

Related GitHub repositories:

Pinned

  1. topstrat (Public)

    Genotype dimension reduction and clustering research. Code for manuscript "Topological stratification of continuous genetic variation in large biobanks"

    Roff 4

  2. 1KGP_dimred (Public)

    Interactive demonstration of how to use PCA, t-SNE, and UMAP on genotype data from the 1000 Genomes Project.

    HTML 19 4

  3. gt-dimred (Public)

    Genotype dimension reduction research. Code for manuscript "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts"

    Jupyter Notebook 18 5

  4. hgss_regression_workshop (Public)

    A workshop on introductory linear regression in R developed for graduate students in human genetics. Covers the basics of the concept, its statistical foundation, and some R code to illustrate it.

    HTML 2 1

  5. death_by_car (Public)

    Tracking collisions between vehicles and pedestrians/cyclists in Canada.

    R 1

  6. dimension_reduction_workshop (Public)

    A collection of code for teaching dimension reduction in the context of the biological sciences.

    HTML 3