To Do List (by Method)


fschopp edited this page Jan 26, 2011 · 17 revisions

Madpack:

Print information about the target database and schema during installation, upgrades, etc.

K-means clustering:

DONE: Allow dense arrays as input (not SVEC only)
Allow the method to process a source table without a "PointID" (PID) column (currently it requires both the PID and POSITION columns).
Implement other distance measures for k-means.
Fix the "goodness of fit" test to be scale-, rotation- and translation-invariant.
Rewrite the following PL/pgSQL stored procedures in C:
- UDF: __kmeans_bestCentroid
- UDA: __kmeans_meanPosition
Overload the main kmeans_run PL/Python function with a second version that takes an additional dictionary argument to override the algorithm constants (e.g., sampling_size, max_iterations).
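One way to make the goodness-of-fit measure invariant to translation, rotation and uniform scaling is to report the within-cluster sum of squares as a fraction of the total sum of squares. Rigid motions leave every distance unchanged, and a uniform scale factor s multiplies both sums by s², so the ratio does not change. Per-axis (non-uniform) scaling does change it. The sketch below assumes points are plain Python sequences and that labels are cluster IDs. `within_to_total_ratio` is a hypothetical helper, not the current MADlib test.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def within_to_total_ratio(points: List[Point], labels: List[int]) -> float:
    """Within-cluster SS / total SS: 0 means tight clusters, 1 means no structure.

    Hypothetical helper. Cluster centres are recomputed as the mean of each
    cluster's members, so the result depends only on the assignment.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and the same length")
    overall = _mean(points)
    total = sum(_sq_dist(p, overall) for p in points)
    if total == 0:
        raise ValueError("all points are identical; the ratio is undefined")
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    within = 0.0
    for members in clusters.values():
        centre = _mean(members)
        within += sum(_sq_dist(p, centre) for p in members)
    return within / total
```

Scaling by 3, rotating by 90 degrees and shifting the points leaves the ratio unchanged. For example, `[(0, 0), (0, 1), (10, 0), (10, 1)]` with labels `[0, 0, 1, 1]` gives 1/101 both before and after the transform.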

SVD Matrix Factorisation:

Add multi-user / multi-session support (output tables based on RUN_ID parameter).
Modify the API to take a table/view name with predefined columns instead of table and column names as parameters.
Add validation for all parameters.
Overload the main svdmf_run PL/Python function with a second version that takes an additional dictionary argument to override the algorithm constants (e.g., original_step, num_iterations).
Convert all support tables to temporary tables.
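The dictionary-override and parameter-validation items could share a small helper. The sketch below merges caller overrides into a set of defaults and rejects unknown names and non-positive values. The constant names come from the item above. The default values in `DEFAULT_SVDMF_CONSTANTS` are placeholders, and `resolve_constants` is hypothetical. Neither reflects the real svdmf_run code.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults: the names come from the to-do item, the values are placeholders.
DEFAULT_SVDMF_CONSTANTS: Dict[str, Any] = {
    "original_step": 0.01,
    "num_iterations": 1000,
}


def resolve_constants(defaults: Dict[str, Any],
                      overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of `defaults` with validated values from `overrides` applied."""
    merged = dict(defaults)  # never mutate the shared defaults
    if not overrides:
        return merged
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError("unknown constant(s): " + ", ".join(sorted(unknown)))
    for name, value in overrides.items():
        default = defaults[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be numeric, got {value!r}")
        if isinstance(default, int) and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value!r}")
        merged[name] = value
    return merged
```

On the SQL side, the "second version" would be a separate function signature with the extra argument. In Python, a single function with an optional dictionary is enough.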

SVEC:

Support for other base types.
Support indexing on svecs. This requires indexing on arrays, which Greenplum (GP) does not currently support.
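Supporting other base types is easier if the compressed layout depends only on equality of elements. The sketch below assumes a run-length (count, value) encoding for svecs and shows that this encoding works for any element type that supports `==`, not just float8. `RleVector` is a hypothetical illustration, not the C implementation of svec.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RleVector(Generic[T]):
    """Hypothetical run-length-encoded vector: counts[i] copies of values[i]."""
    counts: Tuple[int, ...]
    values: Tuple[T, ...]

    @classmethod
    def from_dense(cls, dense: Sequence[T]) -> "RleVector[T]":
        counts: List[int] = []
        values: List[T] = []
        for item in dense:
            if values and values[-1] == item:
                counts[-1] += 1  # extend the current run
            else:
                counts.append(1)  # start a new run
                values.append(item)
        return cls(tuple(counts), tuple(values))

    def to_dense(self) -> List[T]:
        out: List[T] = []
        for count, value in zip(self.counts, self.values):
            out.extend([value] * count)
        return out

    def __len__(self) -> int:
        return sum(self.counts)
```

Only `from_dense` compares elements. The same code therefore compresses integers, floats or strings, e.g. `[0.0, 0.0, 0.0, 2.5, 2.5, 0.0]` becomes counts `(3, 2, 1)` and values `(0.0, 2.5, 0.0)`.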

Kernel-machines:

Adjust documentation to our standards.
Add support for sparse vectors (currently only float8 arrays are supported).
Prefix support functions with "__".
Review the API and add support for multi-user/multi-session environments.
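Kernels such as the Gaussian (RBF) kernel only need dot products, and a dot product can skip the zero entries of a sparse vector. The sketch below assumes a sparse vector is a dictionary from index to non-zero value. It computes the squared distance as ‖a‖² + ‖b‖² − 2a·b. `SparseVec` and the function names are hypothetical and not the kernel-machines API.

```python
import math
from typing import Dict

SparseVec = Dict[int, float]  # hypothetical layout: index -> non-zero value


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product touching only indices present in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


def sparse_sq_dist(a: SparseVec, b: SparseVec) -> float:
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2 a.b."""
    return sparse_dot(a, a) + sparse_dot(b, b) - 2.0 * sparse_dot(a, b)


def gaussian_kernel(a: SparseVec, b: SparseVec, gamma: float) -> float:
    """exp(-gamma * ||a - b||^2); clamp tiny negative rounding errors to 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return math.exp(-gamma * max(sparse_sq_dist(a, b), 0.0))
```

For `{0: 1.0, 3: 2.0}` and `{3: 2.0, 5: 1.0}`, the dot product is 4 and the squared distance is 2, which matches the dense vectors `[1, 0, 0, 2, 0, 0]` and `[0, 0, 0, 2, 0, 1]`.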

Quantile:

Convert to Python
Add parameter validation
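A Python quantile with parameter validation could look like the sketch below. It assumes linear interpolation between the two nearest order statistics. This rule is a choice made here and may differ from what the existing quantile function returns. `quantile` is a hypothetical name.

```python
import math
from typing import Iterable


def quantile(values: Iterable[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) using linear interpolation between order statistics."""
    # Validate q before touching the data.
    if isinstance(q, bool) or not isinstance(q, (int, float)):
        raise TypeError(f"q must be a number, got {q!r}")
    if not 0.0 <= q <= 1.0:  # also rejects NaN
        raise ValueError(f"q must be in [0, 1], got {q!r}")
    data = sorted(values)
    if not data:
        raise ValueError("cannot take a quantile of an empty input")
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)
```

Sorting in Python is only for illustration. An in-database PL/Python version would more likely let the database do the ordering.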

Logistic Regression:

Add measures for the goodness of fit
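Common goodness-of-fit measures for logistic regression can all be derived from the log-likelihood: deviance, AIC and McFadden's pseudo-R². The last compares the model against an intercept-only model that predicts the mean response. The sketch below assumes the fitted probabilities are already available. `logistic_fit_measures` and its arguments are hypothetical.

```python
import math
from typing import Dict, Sequence


def logistic_fit_measures(y: Sequence[int], p: Sequence[float],
                          num_params: int) -> Dict[str, float]:
    """Fit measures from 0/1 labels `y`, fitted probabilities `p` and the number of coefficients."""
    if not y or len(y) != len(p):
        raise ValueError("y and p must be non-empty and the same length")
    if any(v not in (0, 1) for v in y):
        raise ValueError("y must contain only 0 and 1")
    if any(not 0.0 < pi < 1.0 for pi in p):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p))
    ybar = sum(y) / len(y)
    if ybar in (0.0, 1.0):
        raise ValueError("all labels are equal; the null model is degenerate")
    # Log-likelihood of the intercept-only model, which predicts ybar for every row.
    ll0 = len(y) * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))
    return {
        "log_likelihood": ll,
        "deviance": -2.0 * ll,
        "aic": 2.0 * num_params - 2.0 * ll,
        "mcfadden_r2": 1.0 - ll / ll0,
    }
```

A model that predicts 0.5 everywhere on balanced labels gets a pseudo-R² of 0. Sharper correct predictions push it towards 1.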