Skip to content
agorajek edited this page Feb 8, 2011 · 2 revisions

Daisy Wang has been doing information extraction (text labeling) stuff using Conditional Random Fields in Postgres that Joe would like to port to MADlib.

Inference and learning methods for graphical models (Bayes Nets):

One-pass approximate quantiles: We should either invent an extension to the countmin approach for discrete domains, or look into one of these algorithms:

Graph algorithms (e.g. for social network analysis)

  • cluster coefficients (Joe has a naive SQL implementation, but one can do much better)
  • PageRank (we have a Greenplum MapReduce implementation)
  • centrality metrics

Sampling methods.