
Commit

Doc fixes
thisjustinh committed Jul 29, 2021
1 parent e6424d6 commit db4085b
Showing 5 changed files with 195 additions and 22 deletions.
9 changes: 7 additions & 2 deletions Python/sparsepca/HISTORY.md
@@ -1,9 +1,14 @@
# History

## 0.2.2

- Doc fixes
- PyPI metadata fixes

## 0.2.1

- Doc fixes.
- Doc fixes

## 0.2.0

- Initial PyPI release.
- Initial PyPI release
4 changes: 2 additions & 2 deletions Python/sparsepca/README.md
@@ -51,8 +51,8 @@ Returns a dictionary with the following key-value pairs:
| `time` | float | Execution time in seconds |

## Authors
Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue

Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou

## References

15 changes: 12 additions & 3 deletions Python/sparsepca/setup.py
@@ -1,25 +1,34 @@
import setuptools

with open('README.md', 'r') as f:
    readme = f.read()
    readme = f.readlines()[4:]
    readme = ''.join(readme)

with open('HISTORY.md', 'r') as f:
    history = f.read()

setuptools.setup(
    name="sparsepca",
    version="0.2.1",
    author="Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue",
    version="0.2.2",
    author="Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou",
    author_email="[email protected]",
    description="Sparse Principal Component Analysis in Python",
    long_description=''.join([readme, '\n\n', history]),
    long_description_content_type="text/markdown",
    # packages=setuptools.find_packages(),
    license='MIT',
    classifiers=[
        "Development Status :: 4 - Beta",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    project_urls={
        'Documentation' : 'https://xinging-birds.github.io/AManPG/',
        'Source': 'https://github.com/xinging-birds/AManPG',
    },
    python_requires=">=3.8",
    py_modules=["sparsepca"],
    install_requires=["numpy"]
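The new `setup.py` builds the PyPI long description by dropping the first four lines of `README.md` and appending `HISTORY.md` after a blank line. The sketch below reproduces only that string assembly, using in-memory placeholder text instead of the real files. The function name `build_long_description` and the sample contents are hypothetical.

```python
def build_long_description(readme_text: str, history_text: str) -> str:
    """Mirror the setup.py logic: skip the first four README lines, then append HISTORY."""
    # For ordinary '\n'-terminated text, splitlines(keepends=True) matches f.readlines()
    readme_lines = readme_text.splitlines(keepends=True)[4:]
    readme = ''.join(readme_lines)
    return ''.join([readme, '\n\n', history_text])


# Placeholder file contents for illustration only (not the package's real README/HISTORY)
sample_readme = "line 1\nline 2\nline 3\nline 4\n# Description\nBody text\n"
sample_history = "# History\n\n## 0.2.2\n"

print(build_long_description(sample_readme, sample_history))
```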
32 changes: 18 additions & 14 deletions Python/sparsepca/sparsepca.egg-info/PKG-INFO
@@ -1,26 +1,25 @@
Metadata-Version: 2.1
Name: sparsepca
Version: 0.2.1
Version: 0.2.2
Summary: Sparse Principal Component Analysis in Python
Home-page: UNKNOWN
Author: Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue
Author: Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
Author-email: [email protected]
License: UNKNOWN
License: MIT
Project-URL: Documentation, https://xinging-birds.github.io/AManPG/
Project-URL: Source, https://github.com/xinging-birds/AManPG
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# SparsePCA

## Description

![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)

![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)
![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)

Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix.

@@ -69,8 +68,8 @@ Returns a dictionary with the following key-value pairs:
| `time` | float | Execution time in seconds |

## Authors
Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue

Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou

## References

@@ -114,12 +113,17 @@ inf_sprout['loadings']

# History

## 0.2.2

- Doc fixes
- PyPI metadata fixes

## 0.2.1

- Doc fixes.
- Doc fixes

## 0.2.0

- Initial PyPI release.
- Initial PyPI release


157 changes: 156 additions & 1 deletion README.md
@@ -1,4 +1,159 @@
# Alternating Manifold Proximal Gradient Method (A-ManPG)

Implementation of the paper "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis".
![pypi version](https://img.shields.io/pypi/v/sparsepca.svg) ![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)

- [Introduction](#introduction)
- [Installation](#installation)
- [Documentation](#documentation)
- [Usage](#usage)
- [Arguments](#arguments)
- [Values](#values)
- [Quick Start](#quick-start)
- [Python Quick Start](#python-quick-start)
- [R Quick Start](#r-quick-start)
- [References](#references)

## Introduction

`sparsepca` (Python) and `amanpg` (R) find sparse loadings in principal component analysis (PCA) via an alternating manifold proximal gradient method (A-ManPG). A sparse basis makes the leading principal components easier to interpret when modeling high-dimensional data. Sparse PCA is nonsmooth and nonconvex, which makes it numerically difficult; A-ManPG is used because it addresses these difficulties with a convergence guarantee.
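For reference, the sparse PCA criterion of Zou, Hastie, and Tibshirani (2006), listed under References, is the kind of problem A-ManPG is built to solve. Here $x_i$ are the rows of the centered n-by-p data matrix, $\beta_j$ is the j-th column of $B$, and the normalized columns of $B$ give the sparse loadings. Reading `lambda1` as the per-component penalties $\lambda_{1,j}$ and `lambda2` as $\lambda_2$ is our interpretation of the argument names, not something this README states.

$$
\min_{A,\,B \in \mathbb{R}^{p \times k}} \sum_{i=1}^{n} \left\lVert x_i - A B^{\top} x_i \right\rVert_2^2 + \lambda_2 \sum_{j=1}^{k} \lVert \beta_j \rVert_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j \rVert_1 \quad \text{subject to} \quad A^{\top} A = I_k
$$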

The package provides a function for performing sparse PCA and a function for normalizing data.

The authors of A-ManPG are Shixiang Chen, Shiqian Ma, Lingzhou Xue, and Hui Zou. The Python and R packages are maintained by Justin Huang and Benjamin Jochem. A MATLAB implementation is maintained by Shixiang Chen.

## Installation

To install the Python package, use `pip` to obtain `sparsepca` from PyPI:

```bash
pip3 install sparsepca
```

To install the R package, install `amanpg` directly from CRAN:

```r
install.packages("amanpg")
```

## Documentation

### Usage

#### Python

```python
spca(z, lambda1, lambda2,
     x0=None, y0=None, k=0, gamma=0.5, type=0,
     maxiter=1e4, tol=1e-5, f_palm=1e5,
     normalize=True, verbose=False)
```

#### R

```r
spca.amanpg(z, lambda1, lambda2,
            f_palm = 1e5, x0 = NULL, y0 = NULL, k = 0, type = 0,
            gamma = 0.5, maxiter = 1e4, tol = 1e-5,
            normalize = TRUE, verbose = FALSE)
```

### Arguments

| Name | Python Type | R Type | Description |
| --- | --- | --- | --- |
| `z` | numpy.ndarray | matrix | Either the data matrix or sample covariance matrix |
| `lambda1` | float list | numeric vector | L1-norm penalty parameters, one per principal component (length `k`) |
| `lambda2` | float or numpy.inf | numeric or Inf | L2-norm penalty term |
| `x0` | numpy.ndarray | matrix | Initial x-values for the gradient method, default is the first `k` right singular vectors |
| `y0` | numpy.ndarray | matrix | Initial y-values for the gradient method, default is the first `k` right singular vectors |
| `k` | int | int | Number of principal components desired, default is 0 (returns min(n-1, p) principal components for an n-by-p data matrix `z`) |
| `gamma` | float | numeric | Parameter to control how quickly the step size changes in each iteration, default is 0.5 |
| `type` | int | int | If 0, `z` is expected to be a data matrix; otherwise `z` is expected to be a covariance matrix. Default is 0 |
| `maxiter` | int | int | Maximum number of iterations allowed in the gradient method, default is 1e4 |
| `tol` | float | numeric | Convergence tolerance, compared with the difference between F-values of successive iterations, default is 1e-5 |
| `f_palm` | float | numeric | Upper bound for the F-value to reach convergence, default is 1e5 |
| `normalize` | bool | logical | Center and normalize rows to Euclidean length 1 if True, default is True |
| `verbose` | bool | logical | Function prints progress between iterations if True, default is False |
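A sketch of preparing the inputs described in the table using only NumPy: `lambda1` holds one L1 penalty per requested component, and for `type=1` a p-by-p sample covariance matrix is passed in place of the data. The helper `prepare_inputs` is hypothetical, and the commented-out `spca` calls assume the Python signature given under Usage.

```python
import numpy as np


def prepare_inputs(data: np.ndarray, k: int, l1: float = 0.1):
    """Hypothetical helper: build lambda1 and a covariance matrix for the arguments above."""
    # One L1 penalty per principal component, shaped (k, 1) as in the Quick Start
    lambda1 = l1 * np.ones((k, 1))
    # p-by-p sample covariance of the columns (variables), for use with type=1
    cov = np.cov(data, rowvar=False)
    return lambda1, cov


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 30))  # n = 200 samples, p = 30 variables
lambda1, cov = prepare_inputs(data, k=4)
print(lambda1.shape, cov.shape)  # (4, 1) (30, 30)

# With the package installed, the two input modes would be called as:
# spca(data, lambda1, np.inf, k=4, type=0)  # z is the data matrix
# spca(cov, lambda1, np.inf, k=4, type=1)   # z is a covariance matrix
```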

### Values

Python returns a dictionary with the following key-value pairs, while R returns a list with the following elements:

| Key | Python Value Type | R Value Type | Value |
| --- | --- | --- | --- |
| `loadings` | numpy.ndarray | matrix | Loadings of the sparse principal components |
| `f_manpg` | float | numeric | Final F-value |
| `x` | numpy.ndarray | matrix | Matrix from the subproblem corresponding to the loadings |
| `iter` | int | numeric | Total number of iterations executed |
| `sparsity` | float | numeric | Number of sparse loadings (loadings == 0) divided by number of all loadings |
| `time` | float | numeric | Execution time in seconds |
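The `sparsity` value is the fraction of loading entries that are exactly zero. A minimal NumPy sketch of that computation, applied to a small hand-made loadings matrix rather than actual package output (the function name `loading_sparsity` is hypothetical):

```python
import numpy as np


def loading_sparsity(loadings: np.ndarray) -> float:
    """Number of zero loadings divided by the total number of loadings."""
    return float(np.count_nonzero(loadings == 0) / loadings.size)


# Illustrative 4x2 loadings matrix with 5 zero entries out of 8
loadings = np.array([[0.8, 0.0],
                     [0.6, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])
print(loading_sparsity(loadings))  # 0.625
```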

## Quick Start

### Python Quick Start

Note that the Python package depends on numpy.

In the following example, the package function is imported first. Next, the parameters are defined: we want four sparse principal components (rank-`k` loadings) from a 1000x500 data matrix. The L1-penalty terms are set to 0.1 and the L2-penalty term is set to 1. Any positive value can be used for the L2-penalty term, including `np.inf`.

A random 1000x500 matrix is drawn from the standard normal distribution and passed to `spca()`. A summary of the results is printed, followed by the loadings.

The second example keeps the same parameters except that `lambda2` is set to infinity. Again, the results are printed, followed by the loadings.

```python
import numpy as np
from sparsepca import spca

k = 4     # number of sparse principal components
d = 500   # dimension (number of variables)
m = 1000  # sample size
lambda1 = 0.1 * np.ones((k, 1))  # one L1 penalty per component
lambda2 = 1

np.random.seed(10)
a = np.random.normal(0, 1, size=(m, d))  # random 1000x500 standard normal matrix

# Finite L2 penalty
fin_sprout = spca(a, lambda1, lambda2, k=k)
print(f"Finite: {fin_sprout['iter']} iterations with final value "
      f"{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, "
      f"timediff {fin_sprout['time']}.")
print(fin_sprout['loadings'])

# Infinite L2 penalty
inf_sprout = spca(a, lambda1, np.inf, k=k)
print(f"Infinite: {inf_sprout['iter']} iterations with final value "
      f"{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, "
      f"timediff {inf_sprout['time']}.")
print(inf_sprout['loadings'])
```

### R Quick Start

In the following example, we load the library using `library(amanpg)` and then define a 1000x500 randomly-generated matrix from the normal distribution. We set the L1-penalty term to 0.1 and L2-penalty term to infinity, and seek the first four principal components.

The default initial point is the first `k` right singular vectors from the SVD; here it is computed explicitly with `svd()` and passed as `x0` and `y0`. The function returns the list `sprout`.

The results are printed out, and then we view the loadings.

```r
library(amanpg)

d <- 500 # dimension
m <- 1000 # sample size
a <- normalize(matrix(rnorm(m * d), m, d))
lambda1 <- 0.1 * matrix(data=1, nrow=4, ncol=1)
x0 <- svd(a, nv=4)$v
sprout <- spca.amanpg(a, lambda1, lambda2=Inf, x0=x0, y0=x0, k=4)
print(paste(sprout$iter, "iterations,", sprout$sparsity, "sparsity,", sprout$time))

# extract loadings
View(sprout$loadings)
```
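For Python users, the same kind of initial point (the first `k` right singular vectors, the default stated in the Arguments table) can be computed with NumPy. This sketch mirrors `svd(a, nv=4)$v` from the R example but skips the `normalize()` step; passing the result as `x0`/`y0` to `spca` is an assumption based on the Python usage signature.

```python
import numpy as np

k = 4
rng = np.random.default_rng(10)
a = rng.normal(size=(1000, 500))  # random 1000x500 standard normal matrix

# Thin SVD: the rows of vt are the right singular vectors, ordered by singular value
_, _, vt = np.linalg.svd(a, full_matrices=False)
x0 = vt[:k].T  # 500 x 4; column signs may differ from R's svd()

print(x0.shape)
# With the package installed: spca(a, lambda1, np.inf, x0=x0, y0=x0, k=k)
```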

## References

Chen, S., Ma, S., Xue, L., & Zou, H. (2020). An alternating manifold proximal gradient method for sparse principal component analysis and sparse canonical correlation analysis. INFORMS Journal on Optimization, 2(3), 192-208 <[doi:10.1287/ijoo.2019.0032](https://doi.org/10.1287%2Fijoo.2019.0032)>.

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286 <[doi:10.1198/106186006X113430](https://doi.org/10.1198%2F106186006X113430)>.

Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320 <[doi:10.1109/JPROC.2018.2846588](https://doi.org/10.1109%2FJPROC.2018.2846588)>.

