Proposed Methods
Daisy Wang has been doing information extraction (text labeling) work using Conditional Random Fields in Postgres that Joe would like to port to MADlib.
- Monte Carlo inference methods.
- Viterbi algorithm for HMMs and CRFs (a minimal decoding sketch follows this list).
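As a concrete reference point for the Viterbi item, here is a minimal Python sketch of Viterbi decoding for a discrete HMM. The function and variable names (viterbi, states, start_p, trans_p, emit_p) are illustrative only and not an existing MADlib or Postgres interface.

    # Minimal Viterbi decoding for a discrete HMM (illustrative only; not a MADlib API).
    def viterbi(obs, states, start_p, trans_p, emit_p):
        # V[t][s] = probability of the best path ending in state s at time t
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                prob, prev = max(
                    (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
                )
                V[t][s] = prob
                back[t][s] = prev
        # Trace back the most likely state sequence.
        last = max(V[-1], key=V[-1].get)
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # Example: two hidden states, three observation symbols.
    states = ["A", "B"]
    start_p = {"A": 0.6, "B": 0.4}
    trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
    emit_p = {"A": {"x": 0.5, "y": 0.4, "z": 0.1}, "B": {"x": 0.1, "y": 0.3, "z": 0.6}}
    print(viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p))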
Inference and learning methods for graphical models (Bayes Nets):
- Belief Propagation and Junction Tree. We have Overlog implementations of these that could be translated pretty directly to SQL (a sum-product sketch follows this list).
- Parameter Learning (e.g. EM) and Structure Learning
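To make the Belief Propagation item concrete, here is a minimal sum-product sketch for a chain of discrete variables, the simplest case that a junction tree generalizes. The names (chain_marginals, phi, psi) are illustrative and unrelated to the existing Overlog implementations.

    import numpy as np

    def chain_marginals(phi, psi):
        """Sum-product BP on a chain. phi: list of unary potentials (length-K
        vectors); psi: list of pairwise potentials (K x K matrices) between
        consecutive nodes. Returns the exact node marginals."""
        n = len(phi)
        fwd = [np.ones_like(phi[0]) for _ in range(n)]   # messages from the left
        bwd = [np.ones_like(phi[0]) for _ in range(n)]   # messages from the right
        for i in range(1, n):
            fwd[i] = (phi[i - 1] * fwd[i - 1]) @ psi[i - 1]
        for i in range(n - 2, -1, -1):
            bwd[i] = psi[i] @ (phi[i + 1] * bwd[i + 1])
        marg = [phi[i] * fwd[i] * bwd[i] for i in range(n)]
        return [m / m.sum() for m in marg]

    # Example: three binary nodes with a preference for agreeing neighbors.
    phi = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.5, 0.5])]
    psi = [np.array([[0.8, 0.2], [0.2, 0.8]])] * 2
    print(chain_marginals(phi, psi))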
One-pass approximate quantiles: We should either invent an extension to the Count-Min approach for discrete domains, or look into one of these algorithms (a much simpler sampling-based baseline is sketched after this list):
- Manku's algorithm, which is also used in Mahout
- Greenwald/Khanna
- Hsiao's FM-sketch trick. This is tempting given that we already have FM-sketch implemented.
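None of the three candidates above is settled yet; for comparison, here is a much simpler one-pass baseline that keeps a uniform reservoir sample and reads quantiles off it. This is plain reservoir sampling, not Manku's algorithm or Greenwald/Khanna, and all names are hypothetical.

    import random

    def reservoir_quantile(stream, q, k=1000, seed=0):
        """One-pass approximate quantile: keep a uniform sample of k items,
        then read the q-quantile off the sorted sample."""
        rng = random.Random(seed)
        sample = []
        for i, x in enumerate(stream):
            if i < k:
                sample.append(x)
            else:
                j = rng.randint(0, i)          # keep x with probability k/(i+1)
                if j < k:
                    sample[j] = x
        sample.sort()
        idx = min(len(sample) - 1, int(q * len(sample)))
        return sample[idx]

    # Example: the approximate median of a million random values.
    data = (random.random() for _ in range(1_000_000))
    print(reservoir_quantile(data, 0.5))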
Graph algorithms (e.g. for social network analysis)
- clustering coefficients (Joe has a naive SQL implementation, but one can do much better; a minimal Python sketch follows this list)
- PageRank (we have a Greenplum MapReduce implementation)
- centrality metrics
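For the clustering-coefficient item, here is a minimal pure-Python sketch over an undirected edge list, roughly what the naive SQL version computes; the function name and edge-list format are illustrative.

    from collections import defaultdict
    from itertools import combinations

    def clustering_coefficients(edges):
        """Local clustering coefficient per node from an undirected edge list:
        the fraction of pairs of a node's neighbors that are themselves connected."""
        adj = defaultdict(set)
        for u, v in edges:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
        coeff = {}
        for node, nbrs in adj.items():
            d = len(nbrs)
            if d < 2:
                coeff[node] = 0.0
                continue
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            coeff[node] = 2.0 * links / (d * (d - 1))
        return coeff

    # Example: a triangle plus one pendant node.
    print(clustering_coefficients([(1, 2), (2, 3), (1, 3), (3, 4)]))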
Sampling methods.
- vector operations: assume v and w are vectors, M is a matrix, and a is a scalar. We need hash(v); element-wise operations v*w, v+w, v/w, v-w, v*a, v-a, dot(v,v), v^a; versions of the same where NULL is treated as 0; sum(v); distance(v,w) with some options; covariance(v,w). (A numpy sketch follows at the end of this list.)
- expose the functionality of LAPACK to SQL, including matrix transforms, decompositions, inversions, and so on (see the numpy.linalg sketch below).
- Sets of statistical functions. For each distribution (Uniform, Gaussian, Poisson, Chi-squared, Exponential, Binomial, Multinomial, T) provide a random number generator, PDF, CDF, and inverse CDF (if applicable). (See the scipy.stats sketch below.)
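For the vector-operations item, here is a small numpy sketch of a few of the requested operations, including the null=0 variants (using NaN to stand in for SQL NULL on the Python side); nothing here is a proposed SQL signature.

    import numpy as np

    v = np.array([1.0, 2.0, np.nan, 4.0])   # NaN stands in for a SQL NULL
    w = np.array([2.0, 0.5, 3.0, 1.0])
    a = 3.0

    # Plain element-wise operations (NaN propagates, like NULL in SQL).
    elementwise = {"v*w": v * w, "v+w": v + w, "v/w": v / w,
                   "v-w": v - w, "v*a": v * a, "v-a": v - a, "v^a": v ** a}

    # "null=0" versions: treat missing entries as zero before operating.
    v0 = np.nan_to_num(v)
    dot_null0 = np.dot(v0, w)
    sum_null0 = v0.sum()

    # Euclidean distance, one of the distance options mentioned above.
    distance = np.linalg.norm(v0 - w)

    print(elementwise["v*w"], dot_null0, sum_null0, distance)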
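For the LAPACK item, numpy.linalg (which itself calls into LAPACK) already covers the core operations meant here; this sketch only shows the kinds of calls that would be exposed, assuming they would eventually be wrapped as SQL functions.

    import numpy as np

    M = np.array([[4.0, 2.0], [2.0, 3.0]])

    Mt = M.T                                       # transpose
    Minv = np.linalg.inv(M)                        # inverse
    L = np.linalg.cholesky(M)                      # Cholesky decomposition (M is SPD here)
    U, s, Vt = np.linalg.svd(M)                    # singular value decomposition
    x = np.linalg.solve(M, np.array([1.0, 2.0]))   # solve M x = b

    print(Minv, s, x)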
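For the statistical-functions item, scipy.stats already bundles generators, PDFs/PMFs, CDFs, and inverse CDFs for the listed distributions; this sketch uses scipy names, not proposed MADlib names.

    from scipy import stats

    # Gaussian: random draws, PDF, CDF, inverse CDF (quantile function).
    draws = stats.norm.rvs(loc=0.0, scale=1.0, size=5, random_state=0)
    pdf = stats.norm.pdf(1.96)
    cdf = stats.norm.cdf(1.96)
    quantile = stats.norm.ppf(0.975)

    # Poisson: a discrete case, so PMF instead of PDF.
    pmf = stats.poisson.pmf(3, mu=2.0)
    pois_cdf = stats.poisson.cdf(3, mu=2.0)

    # Multinomial: PMF and draws exist, but there is no one-dimensional
    # inverse CDF, hence the "inverse CDF (if applicable)" qualifier above.
    multi_pmf = stats.multinomial.pmf([2, 1, 1], n=4, p=[0.5, 0.3, 0.2])

    print(draws, pdf, cdf, quantile, pmf, pois_cdf, multi_pmf)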