Merges in Draft code (#1)

* Begin new RandALO API
* Adds examples with API design
* Adds the modeling layer
* Work on reductions
* Environment setup
* Better imports
* Begin ALO impl
* First pass at ALO impl
* Adds precommit
* Fixes examples
* starts work on jacobian
* Bug fixes for RandALO
* Adds first part of the Jacobian expressions
* Adds Jacobian operator
* Added truncnorm tests
* Added utils.py tests
* Basic randalo.py test
* Fixes `diag` property -> method
* More randalo.py tests
* Fixed test rng
* Updated to Ruff formatter
* Fixes up the Jacobian
* Loss functions now reduce themselves
* + cvxpylayers dep
* Begins sklearn->model impl
* scikit-learn integration for regression
* scikit-learn example
* Adds demo to README
* Added logistic regression support
* Added logistic regression example
* Fix ABC
* Cut at cleaning up the code
* Refactor
* Added generic Jacobian example to README
* Adds workflow file
* Misc finishing work

---------

Co-authored-by: Daniel LeJeune <[email protected]>
cvxgrp · Sep 15, 2024 · 047264b
1 parent 07a76f6
Showing 24 changed files with 2,531 additions and 182 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -0,0 +1,49 @@
+name: Publish Python 🐍 distribution 📦 to PyPI
+
+on: push
+
+jobs:
+  build:
+    name: Build distribution 📦
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+
+      - name: Install pypa/build
+        run: >-
+          python3 -m
+          pip install
+          build
+          --user
+      - name: Build a binary wheel and a source tarball
+        run: python3 -m build
+      - name: Store the distribution packages
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+  publish-to-pypi:
+    name: >-
+      Publish Python 🐍 distribution 📦 to PyPI
+    if: startsWith(github.ref, 'refs/tags/')  # only publish to PyPI on tag pushes
+    needs:
+    - build
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/randalo
+    permissions:
+      id-token: write  # IMPORTANT: mandatory for trusted publishing
+    steps:
+    - name: Download all the dists
+      uses: actions/download-artifact@v4
+      with:
+        name: python-package-distributions
+        path: dist/
+    - name: Publish distribution 📦 to PyPI
+      uses: pypa/gh-action-pypi-publish@release/v1
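The workflow above runs the `build` job on every push but gates `publish-to-pypi` on the pushed ref being a tag. A minimal Python sketch of that gating logic follows; the function name `should_publish` and the example refs are hypothetical and chosen only for illustration.

```python
def should_publish(ref: str) -> bool:
    """Mirror the workflow condition startsWith(github.ref, 'refs/tags/')."""
    # GitHub's startsWith() expression ignores case, hence the lower() call.
    return ref.lower().startswith("refs/tags/")


# Hypothetical refs: a branch push, a tag push, and a pull-request merge ref.
for ref in ("refs/heads/main", "refs/tags/v0.1.0", "refs/pull/1/merge"):
    print(f"{ref}: build=True publish={should_publish(ref)}")
```

Under this condition, an ordinary branch push only builds and uploads the artifacts, and only a tag push proceeds to the PyPI upload.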
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,7 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: 'v0.1.11'
+    hooks:
+      - id: ruff
+        args: [--fix, --exit-non-zero-on-fix]
diff --git a/README.md b/README.md
@@ -1,20 +1,73 @@
-# ALO Library
+# RandALO: fast randomized risk estimation for high-dimensional data
+
+This repository contains a software package implementing RandALO, a fast randomized method for risk estimation of machine learning models, as described in the paper,
+
+P. T. Nobel, D. LeJeune, E. J. Candès. RandALO: Out-of-sample risk estimation in no time flat. 2024.
 
 ## Installation
 
 In a folder run the following:
+
+```bash
+git clone git@github.com:cvxgrp/randalo.git
+cd randalo
+
+# create a new environment with Python >= 3.10 (could also use venv or similar)
+conda create -n randalo python=3.12
+conda activate randalo
+
+# install requirements and randalo
+pip install -r requirements.txt
 ```
-git clone git@github.com:cvxgrp/alo.git
-cd alo
 
-# create a new environment with torch & friends (could also use conda or similar)
-python -m venv venv
-. venv/bin/activate
+## Usage
 
-pip install wheel
-pip install torch numpy scipy matplotlib
+### Scikit-learn
 
-pip install git+ssh://git@github.com/cvxgrp/SURE-CR.git@xtrace
-pip install git+ssh://git@github.com/cvxgrp/torch_linops.git
-pip install -e .
+The simplest way to use RandALO is with linear models from scikit-learn. See a longer demonstration in a notebook [here](examples/scikit-learn.ipynb).
+
+```python
+from torch import nn
+from sklearn.linear_model import Lasso
+from randalo import RandALO
+
+X, y = ... # load data as np.ndarrays as usual
+
+model = Lasso(1.0).fit(X, y) # fit the model
+alo = RandALO.from_sklearn(model, X, y) # set up the Jacobian
+mse_estimate = alo.evaluate(nn.MSELoss()) # estimate risk
 ```
+
+We currently support the following models:
+
+- `LinearRegression`
+- `Ridge`
+- `Lasso`
+- `LassoLars`
+- `ElasticNet`
+- `LogisticRegression`
+
+### Linear models with any solver
+
+If you prefer to fit your models with a solver other than scikit-learn, or if you wish to use models other than the ones listed above, you can still use RandALO by instantiating the Jacobian yourself. You need only take care to scale the regularizer correctly for your problem formulation.
+
+```python
+from torch import nn
+from sklearn.linear_model import Lasso
+from randalo import RandALO, MSELoss, L1Regularizer, Jacobian
+
+X, y = ... # load data as np.ndarrays as usual
+
+model = Lasso(1.0).fit(X, y) # fit the model
+
+# instantiate RandALO by creating a Jacobian object
+loss = MSELoss()
+reg = 2.0 * model.alpha * L1Regularizer() # scale the regularizer appropriately
+y_hat = model.predict(X)
+solution_func = lambda: model.coef_
+jac = Jacobian(y, X, solution_func, loss, reg)
+alo = RandALO(loss, jac, y, y_hat)
+
+mse_estimate = alo.evaluate(nn.MSELoss()) # estimate risk
+```
+
+Please refer to our [scikit-learn integration](randalo/sklearn_integration.py) source code for more examples.
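The factor `2.0 * model.alpha` in the README example above can be checked on a one-feature problem. scikit-learn documents the `Lasso` objective as `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`. Assuming that RandALO's `MSELoss` is the plain mean squared error without the factor of 1/2 (an assumption inferred from the README, not confirmed here), the L1 penalty has to be doubled to describe the same minimizer. The sketch below uses only closed-form scalar solutions; the helper names and the data are hypothetical.

```python
def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t (proximal operator of t * |w|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_half_mse(x, y, alpha):
    """Minimize (1 / (2n)) * sum((y - x*w)^2) + alpha * |w| over a scalar w."""
    n = len(x)
    xy = sum(a * b for a, b in zip(x, y)) / n
    xx = sum(a * a for a in x) / n
    return soft_threshold(xy, alpha) / xx


def lasso_mse(x, y, lam):
    """Minimize (1 / n) * sum((y - x*w)^2) + lam * |w| over a scalar w."""
    n = len(x)
    xy = sum(a * b for a, b in zip(x, y)) / n
    xx = sum(a * a for a in x) / n
    return soft_threshold(2.0 * xy, lam) / (2.0 * xx)


# Hypothetical one-feature data set.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 3.5, 6.5, 8.0]
alpha = 1.0

w_sklearn_form = lasso_half_mse(x, y, alpha)   # scikit-learn's formulation
w_scaled = lasso_mse(x, y, 2.0 * alpha)        # MSE loss with doubled penalty
w_unscaled = lasso_mse(x, y, alpha)            # MSE loss with penalty left as alpha

print(w_sklearn_form, w_scaled, w_unscaled)
```

The first two values agree, while the unscaled penalty yields a different coefficient. That mismatch is what the scaling step avoids: a Jacobian built with the unscaled regularizer would describe a different optimization problem from the one the solver actually solved.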
diff --git a/alogcv/alo.py b/alogcv/alo.py
diff --git a/alogcv/diagonal.py b/alogcv/diagonal.py
diff --git a/alogcv/gcv.py b/alogcv/gcv.py