To Do List (by Method)

fschopp edited this page Jan 26, 2011 · 17 revisions

Madpack:

  • Add informational output about the target database and schema during installation, updates, etc.

K-means clustering:

  • DONE: Allow dense arrays as input (not SVEC only)
  • Allow the method to process a source table without a "PointID" column (currently it requires both PID and POSITION columns)
  • Implement other distance measures for k-means.
  • Fix the "goodness of fit" test to be scale, rotation and translation invariant
  • Rewrite the following PL/pgSQL stored procedures in C:
    • udf: __kmeans_bestCentroid
    • uda: __kmeans_meanPosition
  • Overload the main kmeans_run PL/Python function with a second version that takes an additional dictionary argument to override the algorithm constants (e.g. sampling_size, max_iterations)
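
The dictionary-override item above could be approached roughly as follows. This is only a sketch under assumptions: `DEFAULT_CONSTANTS`, its values, and `resolve_constants` are hypothetical names for illustration, and only the keys `sampling_size` and `max_iterations` come from this page.

```python
# Hypothetical defaults; the real constants and their values live in the k-means module.
DEFAULT_CONSTANTS = {"sampling_size": 1000, "max_iterations": 20}


def resolve_constants(overrides=None, defaults=DEFAULT_CONSTANTS):
    """Return a copy of the defaults updated with the caller's overrides."""
    overrides = overrides or {}
    # Reject unknown keys so a misspelled constant is not silently ignored.
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError("unknown constants: " + ", ".join(sorted(unknown)))
    merged = dict(defaults)  # copy, so the module-level defaults stay untouched
    merged.update(overrides)
    return merged
```

The overloaded entry point would then call something like `resolve_constants({"max_iterations": 5})` and pass the merged dictionary on to the existing implementation, leaving the current signature as the no-override case.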

SVD Matrix Factorisation:

  • Add multi-user / multi-session support (output tables based on RUN_ID parameter).
  • Modify API to take a table/view name with predefined columns, instead of table and column names as parameters.
  • Add validation for all parameters.
  • Overload the main svdmf_run PL/Python function with a second version that takes an additional dictionary argument to override the algorithm constants (e.g. original_step, num_iterations)
  • Convert all support tables to temporary tables.
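
For the RUN_ID item above, one possible approach is to derive every output table name from the run identifier so that concurrent sessions never write to the same table. The sketch below is an illustration only: `output_table_name` and the naming scheme are hypothetical and not part of the existing API.

```python
import re


def output_table_name(base, run_id):
    """Build a per-run table name such as 'matrix_u_42' (hypothetical scheme)."""
    # Restrict the base to a plain lowercase SQL identifier so it is safe to interpolate.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", base):
        raise ValueError("invalid table base name: %r" % (base,))
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(run_id, bool) or not isinstance(run_id, int) or run_id < 0:
        raise ValueError("run_id must be a non-negative integer")
    return "%s_%d" % (base, run_id)
```

Validating both inputs before building the name also covers part of the parameter-validation item in the same list.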

SVEC:

  • Support for other base types.
  • Indexing on svecs. This requires indexing capability on arrays, which is currently unsupported in GP.

Kernel-machines:

  • Adjust documentation to our standards.
  • Add support for sparse vectors (currently only arrays of float8 are supported).
  • Prefix support functions with "__"
  • Review API, add support for multi-user/session environment.

Quantile:

  • Convert to Python
  • Add parameter validation

Logistic Regression:

  • Add measures for the goodness of fit