Doc fixes

thisjustinh · Jul 29, 2021 · db4085b · db4085b
1 parent e6424d6
commit db4085b
Show file tree

Hide file tree

Showing 5 changed files with 195 additions and 22 deletions.
diff --git a/Python/sparsepca/HISTORY.md b/Python/sparsepca/HISTORY.md
@@ -1,9 +1,14 @@
 # History
 
+## 0.2.2
+
+- Doc fixes
+- PyPI metadata fixes
+
 ## 0.2.1
 
-- Doc fixes.
+- Doc fixes
 
 ## 0.2.0
 
-- Initial PyPI release.
+- Initial PyPI release
diff --git a/Python/sparsepca/README.md b/Python/sparsepca/README.md
@@ -51,8 +51,8 @@ Returns a dictionary with the following key-value pairs:
 | `time` | float | Execution time in seconds |
 
 ## Authors
- 
-Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue
+
+Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
 
 ## References
 

diff --git a/Python/sparsepca/setup.py b/Python/sparsepca/setup.py
@@ -1,25 +1,34 @@
 import setuptools
 
 with open('README.md', 'r') as f:
-    readme = f.read()
+    readme = f.readlines()[4:]
+    readme = ''.join(readme)
 
 with open('HISTORY.md', 'r') as f:
     history = f.read()
 
 setuptools.setup(
     name="sparsepca",
-    version="0.2.1",
-    author="Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue",
+    version="0.2.2",
+    author="Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou",
     author_email="[email protected]",
     description="Sparse Principal Component Analysis in Python",
     long_description=''.join([readme, '\n\n', history]),
     long_description_content_type="text/markdown",
     # packages=setuptools.find_packages(),
+    license='MIT',
     classifiers=[
+        "Development Status :: 4 - Beta",
         "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
         "License :: OSI Approved :: MIT License",
         "Operating System :: OS Independent",
     ],
+    project_urls={
+        'Documentation' : 'https://xinging-birds.github.io/AManPG/',
+        'Source': 'https://github.com/xinging-birds/AManPG',
+    },
     python_requires=">=3.8",
     py_modules=["sparsepca"],
     install_requires=["numpy"]

diff --git a/Python/sparsepca/sparsepca.egg-info/PKG-INFO b/Python/sparsepca/sparsepca.egg-info/PKG-INFO
@@ -1,26 +1,25 @@
 Metadata-Version: 2.1
 Name: sparsepca
-Version: 0.2.1
+Version: 0.2.2
 Summary: Sparse Principal Component Analysis in Python
 Home-page: UNKNOWN
-Author: Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue
+Author: Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
 Author-email: [email protected]
-License: UNKNOWN
+License: MIT
+Project-URL: Documentation, https://xinging-birds.github.io/AManPG/
+Project-URL: Source, https://github.com/xinging-birds/AManPG
 Platform: UNKNOWN
+Classifier: Development Status :: 4 - Beta
 Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 
-# SparsePCA
-
-## Description
-
-![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)
-
-![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)
+![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)
 
 Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix. 
 
@@ -69,8 +68,8 @@ Returns a dictionary with the following key-value pairs:
 | `time` | float | Execution time in seconds |
 
 ## Authors
- 
-Justin Huang, Benjamin Jochem, Shiqian Ma, and Lingzhou Xue
+
+Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
 
 ## References
 
@@ -114,12 +113,17 @@ inf_sprout['loadings']
 
 # History
 
+## 0.2.2
+
+- Doc fixes
+- PyPI metadata fixes
+
 ## 0.2.1
 
-- Doc fixes.
+- Doc fixes
 
 ## 0.2.0
 
-- Initial PyPI release.
+- Initial PyPI release
 
 
diff --git a/README.md b/README.md
@@ -1,4 +1,159 @@
 # Alternating Manifold Proximal Gradient Method (A-ManPG)
 
-Implementation of the paper "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Coefficient Analysis".
+![pypi version](https://img.shields.io/pypi/v/sparsepca.svg) ![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg)
+
+- [Introduction](#introduction)
+- [Installation](#installation)
+- [Documentation](#documentation)
+	- [Usage](#usage)
+	- [Arguments](#arguments)
+	- [Values](#values)
+- [Quick Start](#quick-start)
+	- [Python Quick Start](#python-quick-start)
+	- [R Quick Start](#r-quick-start)
+- [References](#references)
+
+## Introduction
+
+`sparsepca` and `amanpg` find sparse loadings in principal component analysis (PCA) via an alternating manifold proximal gradient method (A-ManPG). Seeking a sparse basis allows the leading principal components to be easier to interpret when modeling with high-dimensional data. Due to the nonsmoothness and nonconvexity numerical difficulties, A-ManPG is implemented to guarantee convergence. 
+
+The package provides a function for performing sparse PCA and a function for normalizing data.
+
+The authors of A-ManPG are Shixiang Chen, Shiqian Ma, Lingzhou Xue, and Hui Zou. The Python and R packages are maintained by Justin Huang and Benjamin Jochem. A MATLAB implementation is maintained by Shixiang Chen.
+
+## Installation
+
+To install the Python package, use `pip` to obtain `sparsepca` from PyPI:
+
+```python
+pip3 install sparsepca
+```
+
+To install the R package, install `amanpg` directly from CRAN:
+
+```r
+install.packages("amanpg")
+```
+
+## Documentation
+
+### Usage
+
+#### Python
+
+```python
+spca(z, lambda1, lambda2, 
+     x0=None, y0=None, k=0, gamma=0.5, type=0, 
+     maxiter=1e4, tol=1e-5, f_palm=1e5,
+	 normalize=True, verbose=False):
+```
+
+#### R
+
+```r
+spca.amanpg(z, lambda1, lambda2, 
+			f_palm = 1e5, x0 = NULL, y0 = NULL, k = 0, type = 0, 
+			gamma = 0.5, maxiter = 1e4, tol = 1e-5, 
+			normalize = TRUE, verbose = FALSE)
+```
+
+### Arguments
+
+| Name | Python Type | R Type | Description |
+| --- | --- | --- |
+| `z` | numpy.ndarray | matrix | Either the data matrix or sample covariance matrix |
+| `lambda1` | float list | numeric vector | List of parameters of length n for L1-norm penalty |
+| `lambda2` | float or numpy.inf | numeric or Inf | L2-norm penalty term |
+| `x0` | numpy.ndarray | matrix | Initial x-values for the gradient method, default value is the first n right singular vectors |
+| `y0` | numpy.ndarray | matrix | Initial y-values for the gradient method, default value is the first n right singular vectors |
+| `k` | int | int | Number of principal components desired, default is 0 (returns min(n-1, p) principal components) |
+| `gamma` | float | numeric | Parameter to control how quickly the step size changes in each iteration, default is 0.5 |
+| `type` | int | int | If 0, b is expcted to be a data matrix, and otherwise b is expected to be a covariance matrix; default is 0 |
+| `maxiter` | int | int | Maximum number of iterations allowed in the gradient method, default is 1e4 |
+| `tol` | float | numeric | Tolerance value required to indicate convergence (calculated as difference between iteration f-values), default is 1e-5 |
+| `f_palm` | float | numeric | Upper bound for the F-value to reach convergence, default is 1e5 |
+| `normalize` | bool | logical | Center and normalize rows to Euclidean length 1 if True, default is True |
+| `verbose` | bool | logical | Function prints progress between iterations if True, default is False |e
+
+### Values
+
+Python returns a dictionary with the following key-value pairs, while R returns a list with the following elements:
+
+| Key | Python Value Type | R Value Type | Value |
+| --- | --- | --- |
+| `loadings` | numpy.ndarray | matrix | Loadings of the sparse principal components |
+| `f_manpg` | float | numeric | Final F-value |
+| `x` | numpy.ndarray | matirx | Corresponding ndarray in subproblem to the loadings |
+| `iter` | int | numeric | Total number of iterations executed |
+| `sparsity` | float | numeric | Number of sparse loadings (loadings == 0) divided by number of all loadings |
+| `time` | float | numeric | Execution time in seconds |
+
+## Quick Start
+
+### Python Quick Start
+
+Note that the Python package depends on numpy.
+
+In the following example, the package function is imported first. The appropriate parameters are defined&mdash;in this case, we want four sparse principal components (rank-`k` loadings)&mdash;from a 1000x500 data matrix. The L1-penalty terms are set to 0.1, and the L2-penalty term is set to 1. Note that any positive value can be used for the L2-penalty term, up to `np.inf`. 
+
+A random 1000x500 matrix is generated from the normal distribution, and then the function is called through `spca()`. A printout of the results follows, along with observing the loadings. 
+
+The second example keeps the same parameters except switching `lambda2` with infinity. Again, the results are printed out and the loadings are observed.
+
+```python
+import numpy as np
+from sparsepca import spca
+
+k = 4  # columns
+d = 500  # dimensions
+m = 1000  # sample size
+lambda1 = 0.1 * np.ones((n, 1))
+lambda2 = 1
+
+np.random.seed(10)
+a = np.random.normal(0, 1, size=(m, d))  # generate random normal 1000x500 matrix
+fin_sprout = spca(a, lambda1, lambda2, k=k)
+print(f"Finite: {fin_sprout['iter']} iterations with final value 
+		{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, 
+		timediff {fin_sprout['time']}.")
+
+fin_sprout['loadings']
+
+inf_sprout = spca_amanpg(a, lambda1, np.inf, k=4)
+print(f"Infinite: {inf_sprout['iter']} iterations with final value 
+		{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, 
+		timediff {inf_sprout['time']}.")
+
+inf_sprout['loadings']
+```
+
+### R Quick Start
+
+In the following example, we load the library using `library(amanpg)` and then define a 1000x500 randomly-generated matrix from the normal distribution. We set the L1-penalty term to 0.1 and L2-penalty term to infinity, and seek the first four principal components.
+
+The default initial point are the `k` right singular vectors from SVD, which we can see manually broken down here. In the function call, we pass the parameters in and output our list sprout. 
+
+The results are printed out, and then we view the loadings.
+
+```r
+d <- 500  # dimension
+m <- 1000 # sample size
+a <- normalize(matrix(rnorm(m * d), m, d))
+lambda1 <- 0.1 * matrix(data=1, nrow=4, ncol=1)
+x0 <- svd(a, nv=4)$v
+sprout <- spca.amanpg(a, lambda1, lambda2=Inf, x0=x0, y0=x0, k=4) 
+print(paste(sprout$iter, "iterations,", sprout$sparsity, "sparsity,", sprout$time))
+
+# extract loadings
+View(sprout$loadings)
+```
+
+## References
+
+Chen, S., Ma, S., Xue, L., and Zou, H. (2020) "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis" INFORMS Journal on Optimization 2:3, 192-208 <[doi:10.1287/ijoo.2019.0032](https://doi.org/10.1287%2Fijoo.2019.0032)>.
+
+Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286 <[doi:10.1198/106186006X113430](https://doi.org/10.1198%2F106186006X113430)>.
+
+Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320 <[doi:10.1109/JPROC.2018.2846588](https://doi.org/10.1109%2FJPROC.2018.2846588)>.
+