To install the R packages needed for our analysis, run the following:
install.packages(c(
  "readr", "dplyr", "tidyr", "purrr", "lubridate",
  "uwot", "igraph", "mvtnorm", "rio", "stringr",
  "ggh4x", "patchwork", "ggraph", "mclust",
  "archetypes", "dbscan", "survival", "ggflowchart",
  "reshape2", "scales", "lme4", "lmerTest", "meta",
  "broom", "broom.mixed", "MGMM", "dcurves", "Rtsne",
  "kernlab", "ClustOfVar", "GGally", "ggdensity", "glmnet"
))
On a typical desktop computer, installation should take around 1 hour.
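Because installation is slow, it can help to install only the packages that are not already present. The following is a minimal sketch, not part of our scripts; the helper name `find_missing` is hypothetical, and `pkgs` holds only a subset of the list above for brevity.

```r
# Return the packages in `pkgs` that are not in `installed`.
# `find_missing` is a hypothetical helper, not a function from this repository.
find_missing <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# A subset of the package list above, for illustration.
pkgs <- c("readr", "dplyr", "mclust", "MGMM")

# Uncomment to install whatever is still missing:
# to_install <- find_missing(pkgs)
# if (length(to_install) > 0) install.packages(to_install)
```

Replacing `pkgs` with the full list above restricts the install step to packages that are still missing.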
A demo showing how to run our clustering analysis on a simulated dataset can be found here. The demo uses the same functions as our analysis. The simulated dataset is generated from the parameters of the profiles we identified and has the same format as the data in our original analysis. The profile parameters are stored in the R object validclusmod.RData. The demo also shows how to calculate, for a new set of individuals, the probability of belonging to each of the profiles we identified, using the simulated data as an example. This is useful for mapping new cohorts to these profiles.
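To illustrate the idea of mapping new individuals to existing profiles, here is a minimal sketch that assumes each profile is a component of a Gaussian mixture with known weight, mean, and covariance. The parameter values, variable names, and the function `profile_probabilities` are hypothetical; they are not the contents of validclusmod.RData or the functions used in the demo, which should be preferred for real use.

```r
# Hypothetical parameters for two profiles over two variables (illustration
# only; these are NOT the values stored in validclusmod.RData).
weights <- c(0.6, 0.4)
means <- list(c(0, 0), c(3, 3))
covs <- list(diag(2), diag(2))

# Log density of a multivariate normal for each row of x, using base R only.
log_dmvnorm <- function(x, mean, sigma) {
  d <- ncol(x)
  log_det <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * (d * log(2 * pi) + log_det + mahalanobis(x, mean, sigma))
}

# Posterior probability of each profile for each row of new_data (Bayes' rule).
profile_probabilities <- function(new_data, weights, means, covs) {
  x <- as.matrix(new_data)
  log_joint <- sapply(seq_along(weights), function(k) {
    log(weights[k]) + log_dmvnorm(x, means[[k]], covs[[k]])
  })
  log_joint <- matrix(log_joint, nrow = nrow(x))
  # Subtract the row maximum before exponentiating for numerical stability.
  log_joint <- log_joint - apply(log_joint, 1, max)
  probs <- exp(log_joint)
  probs / rowSums(probs)
}

# Three hypothetical new individuals.
new_data <- data.frame(x1 = c(0, 3, 1.5), x2 = c(0, 3, 1.5))
probs <- profile_probabilities(new_data, weights, means, covs)
```

Each row of `probs` sums to 1. An individual can then be assigned to the profile with the highest probability, or the probabilities can be carried forward as soft memberships.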
Due to data access restrictions, we cannot include the data necessary to reproduce all the quantitative results in the manuscript. However, the same analysis can be run on a new dataset, provided it has the same format as ours. The simulated dataset from the demo shows what this format should look like. Additionally, see 01_Clustering.Rmd and 06_ValClusOutcomes.Rmd for a detailed explanation of the expected formats and the functions used.
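Before running the analysis on a new dataset, a quick structural comparison against the simulated dataset can catch format problems early. The sketch below is an illustration only: `compare_format` is a hypothetical helper, the column names are made up, and the actual required format is the one described in the Rmd files above.

```r
# Compare column names and basic column classes of a new dataset against a
# reference dataset. `compare_format` is a hypothetical helper, not a function
# from this repository.
compare_format <- function(new_data, reference) {
  missing_cols <- setdiff(names(reference), names(new_data))
  shared <- intersect(names(reference), names(new_data))
  ref_class <- vapply(reference[shared], function(col) class(col)[1], character(1))
  new_class <- vapply(new_data[shared], function(col) class(col)[1], character(1))
  list(
    missing_cols = missing_cols,                   # columns absent from new_data
    type_mismatch = shared[ref_class != new_class] # shared columns whose class differs
  )
}

# Made-up example: `reference` stands in for the simulated dataset.
reference <- data.frame(id = 1:2, age = c(50.5, 60), sex = c("F", "M"))
new_data <- data.frame(id = 1:2, age = c("50", "60"))
result <- compare_format(new_data, reference)
```

Here `result$missing_cols` lists the columns that `new_data` lacks and `result$type_mismatch` lists shared columns stored with a different class.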