Proposed Methods
agorajek edited this page Feb 8, 2011 · 2 revisions
Daisy Wang has been doing information extraction (text labeling) using Conditional Random Fields in Postgres, which Joe would like to port to MADlib.
- Monte Carlo inference methods.
- Viterbi algorithm for HMMs and CRFs.
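As a reference point for what a port would need to compute, here is a minimal Viterbi sketch for a discrete HMM in Python. It is not Daisy Wang's Postgres implementation; the function name `viterbi`, the dictionary-based parameter layout, and the Healthy/Fever toy model are assumptions made for illustration. It works in log space and assumes a non-empty observation sequence.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM.

    Hypothetical layout: start_p[s], trans_p[r][s], emit_p[s][obs] are plain
    probabilities stored in nested dicts.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Toy model (illustrative numbers only).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

For a linear-chain CRF the same recurrence applies, with the log-probabilities replaced by feature scores. In SQL, each time step would roughly correspond to one join-and-aggregate over the previous step's scores.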
Inference and learning methods for graphical models (Bayes Nets):
- Belief Propagation and Junction Tree. We have Overlog implementations of these that could be translated pretty directly to SQL.
- Parameter Learning (e.g. EM) and Structure Learning
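To make the belief propagation item concrete, the sketch below runs sum-product message passing on the simplest case, a chain-structured model, where it is exact. This is an illustration only and not a translation of the Overlog code; the function name `chain_marginals` and the potential layout (`unary[i][s]`, one shared `pairwise[s][t]` table) are assumptions.

```python
def chain_marginals(unary, pairwise):
    """Exact marginals on a chain via forward and backward messages.

    unary[i][s]    : potential of variable i taking state s
    pairwise[s][t] : potential shared by every neighbouring pair (s, t)
    """
    n, k = len(unary), len(unary[0])

    # fwd[i][t]: message arriving at variable i from its left neighbour
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = sum(fwd[i - 1][s] * unary[i - 1][s] * pairwise[s][t]
                            for s in range(k))

    # bwd[i][s]: message arriving at variable i from its right neighbour
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = sum(pairwise[s][t] * unary[i + 1][t] * bwd[i + 1][t]
                            for t in range(k))

    # Belief = local potential times both incoming messages, normalised.
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals


# Two binary variables; the pairwise table favours equal neighbours.
marginals = chain_marginals(unary=[[1.0, 3.0], [1.0, 1.0]],
                            pairwise=[[2.0, 1.0], [1.0, 2.0]])
```

On general graphs the same message update is iterated (loopy BP) or run over a junction tree. Each message update is a sum over a product of tables, which is the part that maps naturally onto a SQL join followed by `GROUP BY`.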
One-pass approximate quantiles: We should either invent an extension to the Count-Min sketch approach for discrete domains, or look into one of these algorithms:
- Manku's algorithm, which is also used in Mahout
- Greenwald/Khanna
- Hsiao's FM-sketch trick. This is tempting given that we already have FM-sketch implemented.
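One possible reading of "an extension to the Count-Min approach for discrete domains" is the dyadic-range construction: keep one Count-Min sketch per bit level of an integer domain and binary-search down the levels to find a quantile. The sketch below illustrates that idea only; it is not Manku's, Greenwald/Khanna's, or Hsiao's algorithm, and the class names, default `width`/`depth`, and hash scheme are assumptions.

```python
import random

class CountMin:
    """Small Count-Min sketch over non-negative integer keys."""
    def __init__(self, width, depth, rng):
        self.width = width
        self.prime = (1 << 61) - 1
        self.hashes = [(rng.randrange(1, self.prime), rng.randrange(self.prime))
                       for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        for row, (a, b) in zip(self.rows, self.hashes):
            yield row, ((a * key + b) % self.prime) % self.width

    def add(self, key, count=1):
        for row, i in self._cells(key):
            row[i] += count

    def estimate(self, key):
        # Never underestimates; may overestimate when keys collide.
        return min(row[i] for row, i in self._cells(key))


class DyadicQuantiles:
    """Approximate quantiles over integers in [0, 2**bits), in one pass."""
    def __init__(self, bits, width=2048, depth=4, seed=0):
        rng = random.Random(seed)
        self.bits = bits
        self.n = 0
        # sketches[level] counts the prefixes (value >> level)
        self.sketches = [CountMin(width, depth, rng) for _ in range(bits)]

    def add(self, value):
        self.n += 1
        for level, sketch in enumerate(self.sketches):
            sketch.add(value >> level)

    def quantile(self, q):
        target = q * self.n
        node, rank = 0, 0  # rank: estimated number of values left of `node`
        for level in range(self.bits - 1, -1, -1):
            left = node << 1
            left_count = self.sketches[level].estimate(left)
            if rank + left_count >= target:
                node = left               # quantile lies in the left half
            else:
                rank += left_count        # skip the left half entirely
                node = left + 1
        return node


dq = DyadicQuantiles(bits=10)
for v in range(1000):
    dq.add(v)
median = dq.quantile(0.5)   # close to 499, up to sketch error
```

Because Count-Min only overestimates, the returned value can sit slightly below the true quantile. The per-level sketches are mergeable by cell-wise addition, which is what would make this shape usable as a parallel aggregate.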
Graph algorithms (e.g. for social network analysis)
- clustering coefficients (Joe has a naive SQL implementation, but one can do much better)
- PageRank (we have a Greenplum MapReduce implementation)
- centrality metrics
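For reference, the two named graph measures can be sketched in a few lines of Python. Neither function reflects Joe's SQL or the Greenplum MapReduce code; the names `local_clustering` and `pagerank`, the adjacency layouts, and the defaults (damping 0.85, 50 iterations) are assumptions for illustration.

```python
from itertools import combinations

def local_clustering(adj):
    """Local clustering coefficient per node.

    adj maps each node to the set of its neighbours (undirected, symmetric).
    """
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0
            continue
        # Count the edges that exist between pairs of v's neighbours.
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


def pagerank(out_links, damping=0.85, iterations=50):
    """PageRank by power iteration; out_links maps node -> list of targets."""
    nodes = set(out_links) | {w for targets in out_links.values() for w in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without out-links spread their rank uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# A triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cc = local_clustering(adj)

# A directed 3-cycle, where every node should end up with equal rank.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```

The pairwise neighbour check is the naive approach, quadratic in node degree. Doing "much better" typically means counting each triangle once, for example by orienting edges from lower-degree to higher-degree endpoints before joining.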
Sampling methods.
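The page does not say which sampling methods are meant. One natural candidate for a one-pass setting is reservoir sampling (Algorithm R), sketched here as an assumption rather than a stated plan; the function name and the optional `rng` parameter are hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniform sample of k items from a stream of unknown length, in one pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=5, rng=random.Random(42))
short = reservoir_sample(["x", "y"], k=5)
```

Only the reservoir and a running row count are kept as state, so the method fits the shape of a user-defined aggregate. Merging reservoirs from parallel segments needs extra care to stay uniform, since each segment's sample has to be weighted by the number of rows it saw.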