-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
e6424d6
commit db4085b
Showing
5 changed files
with
195 additions
and
22 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,14 @@ | ||
# History | ||
|
||
## 0.2.2 | ||
|
||
- Doc fixes | ||
- PyPI metadata fixes | ||
|
||
## 0.2.1 | ||
|
||
- Doc fixes. | ||
- Doc fixes | ||
|
||
## 0.2.0 | ||
|
||
- Initial PyPI release. | ||
- Initial PyPI release |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,34 @@ | ||
import setuptools | ||
|
||
with open('README.md', 'r') as f: | ||
readme = f.read() | ||
readme = f.readlines()[4:] | ||
readme = ''.join(readme) | ||
|
||
with open('HISTORY.md', 'r') as f: | ||
history = f.read() | ||
|
||
setuptools.setup( | ||
name="sparsepca", | ||
version="0.2.1", | ||
author="Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue", | ||
version="0.2.2", | ||
author="Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou", | ||
author_email="[email protected]", | ||
description="Sparse Principal Component Analysis in Python", | ||
long_description=''.join([readme, '\n\n', history]), | ||
long_description_content_type="text/markdown", | ||
# packages=setuptools.find_packages(), | ||
license='MIT', | ||
classifiers=[ | ||
"Development Status :: 4 - Beta", | ||
"Programming Language :: Python :: 3", | ||
"Programming Language :: Python :: 3.8", | ||
"Programming Language :: Python :: 3.9", | ||
"License :: OSI Approved :: MIT License", | ||
"Operating System :: OS Independent", | ||
], | ||
project_urls={ | ||
'Documentation' : 'https://xinging-birds.github.io/AManPG/', | ||
'Source': 'https://github.com/xinging-birds/AManPG', | ||
}, | ||
python_requires=">=3.8", | ||
py_modules=["sparsepca"], | ||
install_requires=["numpy"] | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,25 @@ | ||
Metadata-Version: 2.1 | ||
Name: sparsepca | ||
Version: 0.2.1 | ||
Version: 0.2.2 | ||
Summary: Sparse Principal Component Analysis in Python | ||
Home-page: UNKNOWN | ||
Author: Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue | ||
Author: Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou | ||
Author-email: [email protected] | ||
License: UNKNOWN | ||
License: MIT | ||
Project-URL: Documentation, https://xinging-birds.github.io/AManPG/ | ||
Project-URL: Source, https://github.com/xinging-birds/AManPG | ||
Platform: UNKNOWN | ||
Classifier: Development Status :: 4 - Beta | ||
Classifier: Programming Language :: Python :: 3 | ||
Classifier: Programming Language :: Python :: 3.8 | ||
Classifier: Programming Language :: Python :: 3.9 | ||
Classifier: License :: OSI Approved :: MIT License | ||
Classifier: Operating System :: OS Independent | ||
Requires-Python: >=3.8 | ||
Description-Content-Type: text/markdown | ||
License-File: LICENSE | ||
|
||
# SparsePCA | ||
|
||
## Description | ||
|
||
![pypi version](https://img.shields.io/pypi/v/sparsepca.svg) | ||
|
||
![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg) | ||
![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg) | ||
|
||
Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix. | ||
|
||
|
@@ -69,8 +68,8 @@ Returns a dictionary with the following key-value pairs: | |
| `time` | float | Execution time in seconds | | ||
|
||
## Authors | ||
Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue | ||
|
||
Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou | ||
|
||
## References | ||
|
||
|
@@ -114,12 +113,17 @@ inf_sprout['loadings'] | |
|
||
# History | ||
|
||
## 0.2.2 | ||
|
||
- Doc fixes | ||
- PyPI metadata fixes | ||
|
||
## 0.2.1 | ||
|
||
- Doc fixes. | ||
- Doc fixes | ||
|
||
## 0.2.0 | ||
|
||
- Initial PyPI release. | ||
- Initial PyPI release | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,159 @@ | ||
# Alternating Manifold Proximal Gradient Method (A-ManPG) | ||
|
||
Implementation of the paper "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Coefficient Analysis". | ||
![pypi version](https://img.shields.io/pypi/v/sparsepca.svg) ![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg) | ||
|
||
- [Introduction](#introduction) | ||
- [Installation](#installation) | ||
- [Documentation](#documentation) | ||
- [Usage](#usage) | ||
- [Arguments](#arguments) | ||
- [Values](#values) | ||
- [Quick Start](#quick-start) | ||
- [Python Quick Start](#python-quick-start) | ||
- [R Quick Start](#r-quick-start) | ||
- [References](#references) | ||
|
||
## Introduction | ||
|
||
`sparsepca` and `amanpg` find sparse loadings in principal component analysis (PCA) via an alternating manifold proximal gradient method (A-ManPG). Seeking a sparse basis allows the leading principal components to be easier to interpret when modeling with high-dimensional data. Due to the nonsmoothness and nonconvexity numerical difficulties, A-ManPG is implemented to guarantee convergence. | ||
|
||
The package provides a function for performing sparse PCA and a function for normalizing data. | ||
|
||
The authors of A-ManPG are Shixiang Chen, Shiqian Ma, Lingzhou Xue, and Hui Zou. The Python and R packages are maintained by Justin Huang and Benjamin Jochem. A MATLAB implementation is maintained by Shixiang Chen. | ||
|
||
## Installation | ||
|
||
To install the Python package, use `pip` to obtain `sparsepca` from PyPI: | ||
|
||
```python | ||
pip3 install sparsepca | ||
``` | ||
|
||
To install the R package, install `amanpg` directly from CRAN: | ||
|
||
```r | ||
install.packages("amanpg") | ||
``` | ||
|
||
## Documentation | ||
|
||
### Usage | ||
|
||
#### Python | ||
|
||
```python | ||
spca(z, lambda1, lambda2, | ||
x0=None, y0=None, k=0, gamma=0.5, type=0, | ||
maxiter=1e4, tol=1e-5, f_palm=1e5, | ||
normalize=True, verbose=False): | ||
``` | ||
|
||
#### R | ||
|
||
```r | ||
spca.amanpg(z, lambda1, lambda2, | ||
f_palm = 1e5, x0 = NULL, y0 = NULL, k = 0, type = 0, | ||
gamma = 0.5, maxiter = 1e4, tol = 1e-5, | ||
normalize = TRUE, verbose = FALSE) | ||
``` | ||
|
||
### Arguments | ||
|
||
| Name | Python Type | R Type | Description | | ||
| --- | --- | --- | | ||
| `z` | numpy.ndarray | matrix | Either the data matrix or sample covariance matrix | | ||
| `lambda1` | float list | numeric vector | List of parameters of length n for L1-norm penalty | | ||
| `lambda2` | float or numpy.inf | numeric or Inf | L2-norm penalty term | | ||
| `x0` | numpy.ndarray | matrix | Initial x-values for the gradient method, default value is the first n right singular vectors | | ||
| `y0` | numpy.ndarray | matrix | Initial y-values for the gradient method, default value is the first n right singular vectors | | ||
| `k` | int | int | Number of principal components desired, default is 0 (returns min(n-1, p) principal components) | | ||
| `gamma` | float | numeric | Parameter to control how quickly the step size changes in each iteration, default is 0.5 | | ||
| `type` | int | int | If 0, b is expcted to be a data matrix, and otherwise b is expected to be a covariance matrix; default is 0 | | ||
| `maxiter` | int | int | Maximum number of iterations allowed in the gradient method, default is 1e4 | | ||
| `tol` | float | numeric | Tolerance value required to indicate convergence (calculated as difference between iteration f-values), default is 1e-5 | | ||
| `f_palm` | float | numeric | Upper bound for the F-value to reach convergence, default is 1e5 | | ||
| `normalize` | bool | logical | Center and normalize rows to Euclidean length 1 if True, default is True | | ||
| `verbose` | bool | logical | Function prints progress between iterations if True, default is False |e | ||
|
||
### Values | ||
|
||
Python returns a dictionary with the following key-value pairs, while R returns a list with the following elements: | ||
|
||
| Key | Python Value Type | R Value Type | Value | | ||
| --- | --- | --- | | ||
| `loadings` | numpy.ndarray | matrix | Loadings of the sparse principal components | | ||
| `f_manpg` | float | numeric | Final F-value | | ||
| `x` | numpy.ndarray | matirx | Corresponding ndarray in subproblem to the loadings | | ||
| `iter` | int | numeric | Total number of iterations executed | | ||
| `sparsity` | float | numeric | Number of sparse loadings (loadings == 0) divided by number of all loadings | | ||
| `time` | float | numeric | Execution time in seconds | | ||
|
||
## Quick Start | ||
|
||
### Python Quick Start | ||
|
||
Note that the Python package depends on numpy. | ||
|
||
In the following example, the package function is imported first. The appropriate parameters are defined—in this case, we want four sparse principal components (rank-`k` loadings)—from a 1000x500 data matrix. The L1-penalty terms are set to 0.1, and the L2-penalty term is set to 1. Note that any positive value can be used for the L2-penalty term, up to `np.inf`. | ||
|
||
A random 1000x500 matrix is generated from the normal distribution, and then the function is called through `spca()`. A printout of the results follows, along with observing the loadings. | ||
|
||
The second example keeps the same parameters except switching `lambda2` with infinity. Again, the results are printed out and the loadings are observed. | ||
|
||
```python | ||
import numpy as np | ||
from sparsepca import spca | ||
|
||
k = 4 # columns | ||
d = 500 # dimensions | ||
m = 1000 # sample size | ||
lambda1 = 0.1 * np.ones((n, 1)) | ||
lambda2 = 1 | ||
|
||
np.random.seed(10) | ||
a = np.random.normal(0, 1, size=(m, d)) # generate random normal 1000x500 matrix | ||
fin_sprout = spca(a, lambda1, lambda2, k=k) | ||
print(f"Finite: {fin_sprout['iter']} iterations with final value | ||
{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, | ||
timediff {fin_sprout['time']}.") | ||
|
||
fin_sprout['loadings'] | ||
|
||
inf_sprout = spca_amanpg(a, lambda1, np.inf, k=4) | ||
print(f"Infinite: {inf_sprout['iter']} iterations with final value | ||
{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, | ||
timediff {inf_sprout['time']}.") | ||
|
||
inf_sprout['loadings'] | ||
``` | ||
|
||
### R Quick Start | ||
|
||
In the following example, we load the library using `library(amanpg)` and then define a 1000x500 randomly-generated matrix from the normal distribution. We set the L1-penalty term to 0.1 and L2-penalty term to infinity, and seek the first four principal components. | ||
|
||
The default initial point are the `k` right singular vectors from SVD, which we can see manually broken down here. In the function call, we pass the parameters in and output our list sprout. | ||
|
||
The results are printed out, and then we view the loadings. | ||
|
||
```r | ||
d <- 500 # dimension | ||
m <- 1000 # sample size | ||
a <- normalize(matrix(rnorm(m * d), m, d)) | ||
lambda1 <- 0.1 * matrix(data=1, nrow=4, ncol=1) | ||
x0 <- svd(a, nv=4)$v | ||
sprout <- spca.amanpg(a, lambda1, lambda2=Inf, x0=x0, y0=x0, k=4) | ||
print(paste(sprout$iter, "iterations,", sprout$sparsity, "sparsity,", sprout$time)) | ||
|
||
# extract loadings | ||
View(sprout$loadings) | ||
``` | ||
|
||
## References | ||
|
||
Chen, S., Ma, S., Xue, L., and Zou, H. (2020) "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis" INFORMS Journal on Optimization 2:3, 192-208 <[doi:10.1287/ijoo.2019.0032](https://doi.org/10.1287%2Fijoo.2019.0032)>. | ||
|
||
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286 <[doi:10.1198/106186006X113430](https://doi.org/10.1198%2F106186006X113430)>. | ||
|
||
Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320 <[doi:10.1109/JPROC.2018.2846588](https://doi.org/10.1109%2FJPROC.2018.2846588)>. | ||
|
||
|