This is a PLSA (Probabilistic Latent Semantic Analysis) implementation for large corpora using the EM (Expectation-Maximization) algorithm and multiprocessing.
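For readers unfamiliar with the algorithm, the EM updates for PLSA can be sketched in a few lines of NumPy. This is a minimal, in-memory illustration and not this project's code: the function name `plsa_em` and its parameters are hypothetical, and it uses neither multiprocessing nor PyTables. It assumes a small dense document-by-word count matrix.

```python
import numpy as np


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA to a dense (n_docs, n_words) count matrix with plain EM.

    Hypothetical helper for illustration only; not part of this project.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape
    rng = np.random.RandomState(seed)

    # Random initialisation; each row is normalised into a distribution.
    p_z_d = rng.rand(n_docs, n_topics)           # P(z|d)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.rand(n_topics, n_words)          # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both distributions from the expected
        # counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)                  # (topics, words)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)                  # (docs, topics)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z
```

Calling `plsa_em(counts, n_topics=2)` on a small count matrix returns the document-topic and topic-word distributions. Note that the sketch materialises a full (documents x topics x words) posterior array on every iteration, which is only practical for toy inputs.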
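When modeling large corpora, memory consumption can become a severe bottleneck. This project addresses that problem by using PyTables, a library for storing large arrays on disk in HDF5 files rather than holding them entirely in memory.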
- Python 2.7
- PyTables
- NumPy
This software is available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (CC BY-NC-SA 3.0), which permits non-commercial reuse provided that the work is attributed and any derivatives are shared under the same license.
erick [dot] peirson [at] asu [dot] edu
This project is run by Erick Peirson and the Digital Innovation Group (DigInG) in the Center for Biology at Arizona State University. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 2011131209.
This project is based on a PLSA implementation by Liangjie Hong.