GSoC 2024 projects

Getting started

New contributors should first read the contributing guide and work through some of the examples in Bambi's documentation.

To be considered as a GSoC student, you should make a PR to Bambi. It can be something small, like a doc fix or a simple bug fix.

If you are a student interested in participating, please contact us on GitHub Discussions.

Projects

Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on GitHub Discussions.

When writing your proposal, choose some specific tasks and make sure the proposal fits the GSoC time commitment. We expect all projects to be 350h projects; if you'd like to be considered for a 175h project, contact us first. We will not accept 175h applications from applicants who have not discussed their time commitment with us before applying.

  1. Projection predictive variable selection
  2. Better default priors
  3. Support BART

Expected benefits of working on Bambi

Students who work on Bambi can expect their skills to grow in

  • Bayesian inference libraries such as PyMC
  • Bayesian modeling
  • InferenceData/xarray usage (depending on the project)
  • PyData stack (NumPy, SciPy, Matplotlib, Pandas, etc.)

Projection predictive variable selection

Projection predictive inference is a variable selection method that has demonstrated effective performance across various fields. The user provides a reference model, built and fitted with Bambi, that contains all relevant variables. Submodels representing different variable subsets are then created automatically, and the reference model's posterior distribution is projected onto each of them. The smallest submodel whose predictions are close to those of the reference model is selected, providing a balance between simplicity and accuracy. Currently, Bambi supports projection predictive inference for only a few models: not all families are supported, and hierarchical terms are not supported at all. This project aims to expand the range of models that can be used with projection predictive inference; a minimal sketch of the reference-model side of the workflow is shown below.
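
To make this concrete, here is a minimal sketch of fitting a reference model with Bambi over a set of candidate predictors. The synthetic data and variable names are invented for illustration, and the projection step itself (provided outside Bambi by a companion package such as kulprit) is only described in comments rather than shown, so no particular API is implied.

```python
import bambi as bmb
import numpy as np
import pandas as pd

# Synthetic data: x3 is an irrelevant predictor we hope selection drops
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
data["y"] = 2 * data["x1"] - 1.5 * data["x2"] + rng.normal(size=n)

# Reference model containing all candidate predictors
model = bmb.Model("y ~ x1 + x2 + x3", data)
idata = model.fit()

# Projection step (conceptual, not a real API call): submodels such as
# "y ~ x1" or "y ~ x1 + x2" are built automatically, the reference
# posterior is projected onto each one, and the smallest submodel whose
# predictions stay close to the reference model is selected.
```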

Potential mentors:

  • Osvaldo Martin
  • Tomás Capretto

Info

  • Hours: 350
  • Expected outcome: An expanded scope of models suitable for projection predictive inference, though not necessarily feature parity with all of Bambi's models. One or more notebook examples can be added to Bambi's docs demonstrating the new features and providing useful advice for practitioners.
  • Skills required: Python, statistics
  • Difficulty: Medium

Better default priors

Prior distributions for high-dimensional linear regression require specifying a joint distribution for the unobserved regression coefficients, which is inherently difficult. This contrasts with Bambi's current default, which assigns independent normal priors. To overcome these issues, priors such as R2D2 and R2D2M2 have been proposed in the literature. Implementing these prior distributions in Bambi requires the ability to share priors between terms (issue 687). This project aims to implement these priors in Bambi so that users can fit high-dimensional linear regression models with more useful default priors. Additionally, there is an open issue (643) about improving default priors for log-link functions with low frequencies. Finally, shared priors would also allow Bambi to implement sparsity-inducing priors such as the horseshoe and the regularized horseshoe. The sketch below shows how priors are specified today and why these priors need new machinery.
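
As a point of reference for the shared-priors work, this sketch shows how priors are specified in Bambi today: one independent prior per term. The data and term names are invented for illustration; the closing comment outlines, in words only, why an R2D2-style prior cannot be expressed this way.

```python
import bambi as bmb
import numpy as np
import pandas as pd

# Invented data with three candidate predictors
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["y", "x1", "x2", "x3"])

# Current behavior, made explicit: each slope gets its own
# independent Normal prior.
priors = {
    "x1": bmb.Prior("Normal", mu=0, sigma=1),
    "x2": bmb.Prior("Normal", mu=0, sigma=1),
    "x3": bmb.Prior("Normal", mu=0, sigma=1),
}
model = bmb.Model("y ~ x1 + x2 + x3", data, priors=priors)

# An R2D2-style prior instead places a single prior on the model's R²
# and decomposes it into per-coefficient variances, so the slopes are
# jointly, not independently, distributed. Expressing that requires the
# shared-prior machinery discussed in issue 687.
```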

Potential mentors:

  • Tomás Capretto
  • Gabriel Stechschulte
  • Osvaldo Martin
  • Juan Orduz

Info

  • Hours: 350
  • Expected outcome: Better default priors. One or more notebook examples can be added to Bambi's docs demonstrating the new features and providing useful advice for practitioners.
  • Skills required: Python, statistics
  • Difficulty: Medium

Support BART

Bayesian Additive Regression Trees (BART) is a flexible non-parametric approach to regression. In a nutshell, an unknown function is approximated by a sum of trees. Priors are used to keep the trees shallow and the leaf values small, so a single tree is unable to "learn" the unknown function on its own, but the sum of many trees provides a flexible fit. A probabilistic-programming-friendly version of BART is implemented in the package PyMC-BART. This project is about allowing Bambi to fit BART models; a sketch of the underlying PyMC-BART usage follows below.
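
For context, here is a minimal PyMC-BART model of the kind Bambi could build behind the scenes. The synthetic data and the number of trees are illustrative, and how Bambi would expose this to users (for example, through its formula interface) is exactly what this project would have to design.

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

# Illustrative synthetic data with a nonlinear mean function
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

with pm.Model():
    # The unknown mean function is approximated by a sum of m shallow trees
    mu = pmb.BART("mu", X, y, m=50)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    # PyMC automatically assigns the specialized PGBART step to the trees
    idata = pm.sample()
```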

Potential mentors:

  • Osvaldo Martin
  • Tomás Capretto
  • Gabriel Stechschulte

Info

  • Hours: 350
  • Expected outcome: Support for BART models within Bambi. One or more notebook examples can be added to Bambi's docs demonstrating the new features and providing useful advice for practitioners.
  • Skills required: Python, statistics
  • Difficulty: Medium