Merges in Draft code (#1)
* Begin new RandALO API

* Adds examples with API design

* Adds the modeling layer

* Work on reductions

* Environment setup

* Better imports

* Begin ALO impl

* First pass at ALO impl

* Adds precommit

* Fixes examples

* starts work on jacobian

* Bug fixes for RandALO

* Adds first part of the Jacobian expressions

* Adds Jacobian operator

* Added truncnorm tests

* Added utils.py tests

* Basic randalo.py test

* Fixes `diag` property -> method

* More randalo.py tests

* Fixed test rng

* Updated to Ruff formatter

* Fixes up the Jacobian

* Loss functions now reduce themselves

* + cvxpylayers dep

* Begins sklearn->model impl

* scikit-learn integration for regression

* scikit-learn example

* Adds demo to README

* Added logistic regression support

* Added logistic regression example

* Fix ABC

* Cut at cleaning up the code

* Refactor

* Added generic Jacobian example to README

* Adds workflow file

* Misc finishing work

---------

Co-authored-by: Daniel LeJeune <[email protected]>
PTNobel and dlej authored Sep 15, 2024
1 parent 07a76f6 commit 047264b
Showing 24 changed files with 2,531 additions and 182 deletions.
49 changes: 49 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,49 @@
name: Publish Python 🐍 distribution 📦 to PyPI

on: push

jobs:
  build:
    name: Build distribution 📦
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.x"

    - name: Install pypa/build
      run: >-
        python3 -m
        pip install
        build
        --user
    - name: Build a binary wheel and a source tarball
      run: python3 -m build
    - name: Store the distribution packages
      uses: actions/upload-artifact@v4
      with:
        name: python-package-distributions
        path: dist/
  publish-to-pypi:
    name: >-
      Publish Python 🐍 distribution 📦 to PyPI
    if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes
    needs:
    - build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/randalo
    permissions:
      id-token: write # IMPORTANT: mandatory for trusted publishing
    steps:
    - name: Download all the dists
      uses: actions/download-artifact@v4
      with:
        name: python-package-distributions
        path: dist/
    - name: Publish distribution 📦 to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
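
The workflow above builds on every push but publishes only when the pushed ref is a tag. The gate can be sketched in Python as below. This is an illustration only: `should_publish` and the sample refs are hypothetical names chosen for the example, and the lowercasing reflects our understanding that GitHub's `startsWith` expression ignores case.

```python
def should_publish(ref: str) -> bool:
    """Mirror the workflow's `startsWith(github.ref, 'refs/tags/')` condition."""
    # Assumption: GitHub's startsWith is case-insensitive, so compare lowercased.
    return ref.lower().startswith("refs/tags/")


# Hypothetical refs: a branch push, a tag push, and a pull-request merge ref.
for ref in ("refs/heads/main", "refs/tags/v0.1.0", "refs/pull/1/merge"):
    # The build job has no `if:` condition, so it runs for every pushed ref.
    print(f"{ref}: build=True publish={should_publish(ref)}")
```

Only the tag ref passes the gate, so ordinary branch pushes produce build artifacts without touching PyPI.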
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,7 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: 'v0.1.11'
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
75 changes: 64 additions & 11 deletions README.md
@@ -1,20 +1,73 @@
# ALO Library
# RandALO: fast randomized risk estimation for high-dimensional data

This repository contains a software package implementing RandALO, a fast randomized method for risk estimation of machine learning models, as described in the paper:

P. T. Nobel, D. LeJeune, E. J. Candès. RandALO: Out-of-sample risk estimation in no time flat. 2024.
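
For context, the "ALO" in the name refers to approximate leave-one-out risk estimation. The sketch below is not RandALO's algorithm. It only illustrates the leave-one-out risk that such methods target, using one-dimensional ridge regression, where a single fit plus a leverage correction reproduces the refit-n-times answer exactly. All function names here are hypothetical and chosen for illustration.

```python
import random


def ridge_1d(xs, ys, lam):
    """Minimize sum((y - x*w)**2) + lam*w**2 over a single coefficient w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def loo_risk_refit(xs, ys, lam):
    """Exact leave-one-out squared error, refitting once per held-out point."""
    total = 0.0
    for i in range(len(xs)):
        w = ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        total += (ys[i] - xs[i] * w) ** 2
    return total / len(xs)


def loo_risk_shortcut(xs, ys, lam):
    """Same quantity from one fit, using leverages h_i = x_i**2 / (sum(x**2) + lam)."""
    w = ridge_1d(xs, ys, lam)
    denom = sum(x * x for x in xs) + lam
    total = 0.0
    for x, y in zip(xs, ys):
        h = x * x / denom
        total += ((y - x * w) / (1.0 - h)) ** 2
    return total / len(xs)


# Synthetic data, made up for the example.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

print("refit   :", loo_risk_refit(xs, ys, 1.0))
print("shortcut:", loo_risk_shortcut(xs, ys, 1.0))
```

The two numbers agree up to floating-point error. For models such as the lasso or logistic regression no exact shortcut of this kind exists, which is where approximate methods come in.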

## Installation

In a folder run the following:

```bash
git clone git@github.com:cvxgrp/randalo.git
cd randalo

# create a new environment with Python >= 3.10 (could also use venv or similar)
conda create -n randalo python=3.12

# install requirements and randalo
pip install -r requirements.txt
```
git clone git@github.com:cvxgrp/alo.git
cd alo

# create a new environment with torch & friends (could also use conda or similar)
python -m venv venv
. venv/bin/activate
## Usage

pip install wheel
pip install torch numpy scipy matplotlib
### Scikit-learn

pip install git+ssh://git@github.com/cvxgrp/SURE-CR.git@xtrace
pip install git+ssh://git@github.com/cvxgrp/torch_linops.git
pip install -e .
The simplest way to use RandALO is with linear models from scikit-learn. For a longer demonstration, see the [scikit-learn example notebook](examples/scikit-learn.ipynb).

```python
from torch import nn
from sklearn.linear_model import Lasso
from randalo import RandALO

X, y = ... # load data as np.ndarrays as usual

model = Lasso(1.0).fit(X, y) # fit the model
alo = RandALO.from_sklearn(model, X, y) # set up the Jacobian
mse_estimate = alo.evaluate(nn.MSELoss()) # estimate risk
```

We currently support the following models:

- `LinearRegression`
- `Ridge`
- `Lasso`
- `LassoLars`
- `ElasticNet`
- `LogisticRegression`

### Linear models with any solver

If you prefer a solver other than scikit-learn for fitting your models, or if you want to go beyond the models listed above, you can still use RandALO by instantiating the Jacobian yourself. Just take care to scale the regularizer correctly for your problem formulation.

```python
from torch import nn
from sklearn.linear_model import Lasso
from randalo import RandALO, MSELoss, L1Regularizer, Jacobian

X, y = ... # load data as np.ndarrays as usual

model = Lasso(1.0).fit(X, y) # fit the model

# instantiate RandALO by creating a Jacobian object
loss = MSELoss()
reg = 2.0 * model.alpha * L1Regularizer() # scale the regularizer appropriately
y_hat = model.predict(X)
solution_func = lambda: model.coef_
jac = Jacobian(y, X, solution_func, loss, reg)
alo = RandALO(loss, jac, y, y_hat)

mse_estimate = alo.evaluate(nn.MSELoss()) # estimate risk
```
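
The factor of `2.0` on the regularizer in the example above can be motivated as follows. scikit-learn's `Lasso` minimizes `(1 / (2 * n)) * ||y - Xw||^2 + alpha * ||w||_1`. Assuming RandALO's `MSELoss` is the plain mean of squared residuals (an assumption on our part, not something the README states), doubling the scikit-learn objective gives `MSE + 2 * alpha * ||w||_1`, which has the same minimizer. The pure-Python sketch below checks that identity numerically; the helper names are hypothetical.

```python
import random


def residual_sum_of_squares(X, y, w):
    """Sum of squared residuals of the linear model X @ w against y."""
    return sum(
        (yi - sum(xij * wj for xij, wj in zip(xi, w))) ** 2
        for xi, yi in zip(X, y)
    )


def sklearn_lasso_objective(X, y, w, alpha):
    """Objective documented for sklearn.linear_model.Lasso."""
    l1 = sum(abs(wj) for wj in w)
    return residual_sum_of_squares(X, y, w) / (2 * len(y)) + alpha * l1


def mse_plus_l1(X, y, w, reg_weight):
    """Mean squared error plus a weighted L1 penalty (assumed RandALO-style form)."""
    l1 = sum(abs(wj) for wj in w)
    return residual_sum_of_squares(X, y, w) / len(y) + reg_weight * l1


# Synthetic data, made up for the example.
rng = random.Random(0)
n, p, alpha = 20, 3, 1.0
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

for _ in range(3):
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    a = sklearn_lasso_objective(X, y, w, alpha)
    b = mse_plus_l1(X, y, w, 2.0 * alpha)
    print(f"sklearn objective = {a:.6f}, MSE + 2*alpha*L1 = {b:.6f}, ratio = {b / a:.6f}")
```

Because the two objectives differ only by a constant positive factor at every `w`, they share a minimizer. A solver that uses a different normalization would need a different scale factor.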

Please refer to our [scikit-learn integration](randalo/sklearn_integration.py) source code for more examples.
72 changes: 0 additions & 72 deletions alogcv/alo.py

This file was deleted.

13 changes: 0 additions & 13 deletions alogcv/diagonal.py

This file was deleted.

72 changes: 0 additions & 72 deletions alogcv/gcv.py

This file was deleted.
