Bayesian nonparametrics

Christopher Paciorek edited this page Jun 28, 2018 · 64 revisions

Overview document describing various BNP models and samplers.

Overview document tex: the .tex source file for the document describing various BNP models and samplers.

Todo items for DP-related work:

  • finish nonconjugate dCRP sampler
    • need MCMC configuration to recognize this case and assign sampler (Chris)
    • finalize sampler (Claudia)
    • allow zero intermediate nodes (Claudia / Chris)
    • what BUGS code could a user write that would make the sampler illegitimate (Claudia)
    • allow upper bound on number of components
    • discuss inefficiency of sampling all tilde variables even though most have no observations
    • only compute curLogProb for unique thetas (Claudia, I think this is the same as the next bullet, right?)
    • (HIGH) add new (more efficient?) sampler for CRP that iterates over unique labels instead of n labels.
    • discuss 'links' version of CRP - see Johnson and Sinclair - Environmetrics. 2017;28:e2440 "Modeling joint abundance of multiple species using Dirichlet process mixtures"
    • (HIGH) identify conjugacy for variance in a model with non-deterministic nodes (Claudia, could you check if this is working now and if so, just remove this bullet? I forget exactly what this refers to)
  • initial conjugate dCRP sampler (i.e., marginalized with respect to new component)
    • determine conjugacy in setup code, including for zero intermediate nodes (Chris)
    • determine conjugacy for normal-inverse gamma prior in normal model (Claudia, with Chris possibly writing the bivariate distribution as a nimble distribution)
    • any other bivariate priors that are commonly used that we should determine conjugacy for?
    • use conjugacy determination to develop conjugate sampler run code for a few examples (Claudia). List of examples:
      • normal sampling with unknown normal mean and known variance: dCRP_conjugate_dnorm_dnorm
      • Poisson sampling with unknown gamma rate: dCRP_conjugate_dgamma_dpois
      • Bernoulli sampling with unknown Beta probability: dCRP_conjugate_dbeta_dbern
      • Multinomial sampling with unknown Dirichlet probability: dCRP_conjugate_ddirch_dmulti
      • exponential sampling with unknown gamma rate, known shape: dCRP_conjugate_dgamma_dexp
      • gamma sampling with unknown gamma rate, both shapes known: dCRP_conjugate_dgamma_dgamma
    • immediately sample new component parameters when creating a new component (Claudia)
    • try to reuse setup code and run code so we don't duplicate code (Chris)
    • ask Perry if using model[ something ][i] or model$values(something) makes a difference in efficiency
    • (HIGH) add non-identity-link conjugacy - will likely need variations on 'offset' and 'coeff' from our conjugate sampler setup to allow us to handle things like y~dnorm(c*thetatilde[xi[i]] + x[i]*beta, 1) and correctly account for 'c' and x[i]*beta (Chris to help get this started)
  • stickbreaking approach
    • determine syntax for stickbreaking and write stickbreaking function (Claudia)
      • determine possible NaN situations
    • detect conjugacy in this setting (Chris)
    • clean up conjugacy for this setting (Chris)
  • (HIGH) compare speed and mixing of NIMBLE BNP to one or two other popular BNP packages (e.g., DPpackage)
  • (HIGH) testing
    • compare results from stick-breaking and CRP on the same model (or ideally a couple of models) (Claudia, is this already basically done in the testing?)
    • we might also think of good test cases where we know the right answer (perhaps models Abel has fit previously using his own code)
  • Generalized Dirichlet (Claudia, is Pitman-Yor an example of this? Abel and I discussed that Pitman-Yor would be a good extension to add soon)
    • write GenDirichlet distributions (Claudia)
    • add GenDirichlet conjugacy (Chris)
  • CRP distribution
    • determine possible NaN situations (Claudia)
    • write help (Claudia)
  • standardized output for G when using dCRP (input posterior modelValues and augment with columns for weights and atoms). (Claudia / Nick / Chris)
    • figure out how a user will call this (current rough plan: the user calls an R function, and that R function uses a stand-alone nimbleFunction that sets the number of columns in a matrix, not a modelValues) (Chris / Claudia / Abel / other nimble-devs)
    • handle multivariate clusters - e.g. mixtures of dmnorm components
  • write help for BNP sampler with some examples.
  • write a quasi-conjugate sampler for the "conc" parameter when a gamma prior is assumed for it
    • write the sampler (Claudia)
    • write help with some examples (Claudia)
    • automatically assign this sampler (Claudia / Chris)
  • fully marginalized dCRP sampler for conjugate models
    • write sampler
    • write help
    • add sampling for tilde variables only every 'thin' iterations, and only if monitoring of them is requested
    • determine conjugacy for normal-inverse gamma prior in normal model
  • more complicated cases (partial list of possibly high priority - Claudia/Abel, please add to as you think about this)
    • HDP
    • (possibly) NRMs with slice sampler (Italian school); Abel is checking with researchers about whether they have C/C++ code we might use or at least learn from
    • ... Chris is not sure where DDP and other structures fit in this list ...
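
For the non-identity-link conjugacy item above, the arithmetic that 'coeff' and 'offset' need to support can be sketched for the normal-normal case, y[i] ~ dnorm(c*thetatilde + x[i]*beta, var = obs_var) with a normal prior on thetatilde. This is only an illustrative sketch in Python rather than NIMBLE code: the function name and arguments are hypothetical, a variance parameterization is assumed (not BUGS precision), and the coefficient 'c' is assumed to be the same for every observation in the cluster.

```python
def normal_posterior_linear_link(y, coeff, offset, prior_mean, prior_var, obs_var=1.0):
    """Conjugate update for theta (hypothetical helper, not part of NIMBLE).

    Model assumed here:
        y[i]  ~ Normal(coeff * theta + offset[i], obs_var)
        theta ~ Normal(prior_mean, prior_var)
    Returns the posterior mean and variance of theta.
    """
    if len(y) != len(offset):
        raise ValueError("y and offset must have the same length")

    prior_prec = 1.0 / prior_var
    # Each observation contributes coeff^2 / obs_var to the precision of theta.
    like_prec = len(y) * coeff ** 2 / obs_var
    post_prec = prior_prec + like_prec

    # Subtract the offset (e.g. x[i] * beta) before combining the data,
    # then scale the summed residuals by coeff.
    resid_sum = sum(yi - oi for yi, oi in zip(y, offset))
    post_mean = (prior_prec * prior_mean + coeff * resid_sum / obs_var) / post_prec

    return post_mean, 1.0 / post_prec


# Identity link (coeff = 1, offset = 0) reduces to the usual normal-normal update.
mean_id, var_id = normal_posterior_linear_link(
    y=[1.0, 3.0], coeff=1.0, offset=[0.0, 0.0], prior_mean=0.0, prior_var=1.0
)

# Non-identity link: coeff = 2, with a nonzero offset for each observation.
mean_lin, var_lin = normal_posterior_linear_link(
    y=[3.0, 5.0], coeff=2.0, offset=[1.0, 1.0], prior_mean=0.0, prior_var=1.0
)
```

With coeff = 1 and zero offsets this is the standard update behind dCRP_conjugate_dnorm_dnorm, so a check of this kind could serve as a regression test when the non-identity-link case is added.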