Add sample ML-based topic modeling support #170
base: master
Commits on Jun 29, 2017
- e24f3b7
Commits on Jul 3, 2017
- 9535b81
- 934da4b
- 2a8a0f2
  1. Two LDA models (using different packages; not sure which one is better yet)
  2. A path helper to assist imports
  3. Modified token_pool to make it compatible with the LDA model
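LDA implementations typically consume each story as bag-of-words counts rather than as a raw token list. The commit does not say how token_pool was changed, so the following is only a minimal pure-Python sketch of that kind of conversion, similar in spirit to gensim's Dictionary.doc2bow; build_vocabulary and to_bag_of_words are hypothetical names.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Hypothetical helper: assign a stable integer id to every distinct token."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_bag_of_words(tokens, vocab):
    """Return sorted (token_id, count) pairs, skipping tokens missing from vocab."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


stories = [["topic", "model", "topic"], ["model", "story"]]
vocab = build_vocabulary(stories)
corpus = [to_bag_of_words(s, vocab) for s in stories]
print(corpus)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

Tokens outside the vocabulary are dropped silently, which is one reasonable choice when scoring new stories against an already-trained model.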
- e888805: Merge branch 'topic_modelling' of github.com:berkmancenter/mediacloud into topic_modelling
- a23aa13
Commits on Jul 10, 2017
- bc462ba
  1. Made every variable and method private where possible
  2. Reformatted the code with a PyCharm shortcut
  3. Added tests for TokenPool (works well) and ModelGensim (does not work, due to a 'no module named XXX' problem when model_gensim calls its abstract parent)
  4. Decoupled token_pool and model_*
  5. Used if __name__ == '__main__' to give a simple demonstration of how to use each method
  Model_*:
  1. Renamed mode_lda.py and model_lda2.py to model_gensim.py (which uses the Gensim package) and model_lda.py (which uses the LDA package)
  2. Added an abstract parent class, TopicModel.py
  3. Moved some code from summarise() to add_stories() (a. better code structure; b. improved performance)
  4. Changed some constants to function arguments (e.g. total_topic_num, iteration_num)
  TokenPool:
  1. Added mc_root_path() when locating the stopwords file
  2. Modified the query in token pool:
     1. Added "INNER JOIN stories WHERE language='en'" to guarantee that all stories are in English
     2. Added "LIMIT" and the corresponding "SELECT DISTINCT ... ORDER BY ..." to fetch only the required number of stories (which improves performance)
     3. Added "OFFSET"
  3. Restructured token_pool.py so that the stories are traversed only once (which improves performance)
  4. Decoupled the DB from token_pool.py
  5. Replaced regex tokenization with nltk.tokenizer
  6. Added nltk.stem.WordNetLemmatizer to lemmatize tokens (which gives better results than stemming)
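The TokenPool query changes (an INNER JOIN on stories filtered to language='en', plus SELECT DISTINCT ... ORDER BY with LIMIT and OFFSET) amount to paging through English stories. Below is a hedged sketch using Python's built-in sqlite3 module. The two-table schema and its column names are simplified assumptions, not Media Cloud's real schema, and the actual query ran against the project's own database. In a full query the join also needs an ON (or USING) condition, and the ORDER BY is what keeps LIMIT/OFFSET pages stable between calls.

```python
import sqlite3

# Simplified, hypothetical schema standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (stories_id INTEGER PRIMARY KEY, language TEXT);
    CREATE TABLE story_sentences (stories_id INTEGER, sentence TEXT);
    INSERT INTO stories VALUES (1, 'en'), (2, 'de'), (3, 'en'), (4, 'en');
    INSERT INTO story_sentences VALUES
        (1, 'First English sentence.'), (1, 'Second one.'),
        (2, 'Ein deutscher Satz.'),
        (3, 'Another story.'), (4, 'Yet another story.');
""")


def fetch_english_story_ids(conn, limit, offset):
    """Return one page of distinct English story ids in a stable order."""
    rows = conn.execute(
        """
        SELECT DISTINCT ss.stories_id
        FROM story_sentences AS ss
            INNER JOIN stories AS s ON s.stories_id = ss.stories_id
        WHERE s.language = 'en'
        ORDER BY ss.stories_id
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    )
    return [row[0] for row in rows]


print(fetch_english_story_ids(conn, limit=2, offset=0))  # [1, 3]
print(fetch_english_story_ids(conn, limit=2, offset=2))  # [4]
```

Story 1 has two sentences but appears once because of DISTINCT, and the German story 2 is filtered out by the language condition.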
Commits on Jul 11, 2017
- 83a31a7
- ced8bb4
Commits on Jul 17, 2017
- 943c696
- 3db49ee
- 06d1d37
- e027dad
- 336c0d8
Commits on Jul 18, 2017
- 178226b
- ebc4715
Commits on Jul 20, 2017
- 39c5e8c
Commits on Jul 24, 2017
- 716fe91
- f66ead6
- 2d6c12d
- 6c50ed2: Added model_nmf.py to model topics with the NMF algorithm
  The results of this algorithm are similar to, but slightly different from, those of the LDA model, and it allows multiple topics for each story.
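NMF factorises the non-negative story-term matrix into a non-negative story-topic matrix (W) and topic-term matrix (H), so each story gets a weight for every topic rather than a single label. The commit does not say how model_nmf.py turns those weights into topics per story. One simple option, sketched here purely as an assumption, is to keep every topic whose share of the story's total weight passes a cut-off; the function name and the 0.2 threshold are hypothetical.

```python
def assign_topics(doc_topic_weights, min_share=0.2):
    """For each story, return the indices of topics whose normalised weight
    is at least min_share (a hypothetical cut-off), strongest first."""
    assignments = []
    for weights in doc_topic_weights:
        total = sum(weights)
        if total == 0:
            assignments.append([])  # story with no topic signal at all
            continue
        shares = [w / total for w in weights]
        chosen = [i for i, share in enumerate(shares) if share >= min_share]
        chosen.sort(key=lambda i: shares[i], reverse=True)
        assignments.append(chosen)
    return assignments


# Rows are stories, columns are topics (non-negative, like NMF's W matrix).
W = [
    [0.9, 0.1, 0.0],
    [0.4, 0.0, 0.6],
    [0.0, 0.0, 0.0],
]
print(assign_topics(W))  # [[0], [2, 0], []]
```

The second story ends up with two topics, which is the "multiple topics per story" behaviour the commit describes; raising min_share trades that breadth for fewer, stronger labels.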
- 679fef0
- 3ab2124
- 025dece: Merge branch 'topic_modelling' of github.com:berkmancenter/mediacloud into topic_modelling
- 61517d1
- 36817b9
- b5562ad
- e6b126c
- c93fe63
- 730a4e9
  1. Removed JSON serialization after fetching sentences from the database
  2. Renamed a few methods/variables to reflect their changed functionality
- 3b38dff
- 154f96d
- 5ea449a
- 34fdcbc
- baca56c
- fe78de8
- 91d725e
  1. Changed the SQL query to the one suggested in the previous PR review, leaving the alternative query and related code as comments
  2. Allowed TokenPool to take either a DBHandler or a TextIOWrapper
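Letting TokenPool accept either a database handle or an open text file means it has to dispatch on the type of its input. A minimal sketch of that idea follows. The StorySource protocol, its fetch_story_texts method, the one-story-per-line file format and the whitespace tokenizer are all hypothetical placeholders, not Media Cloud's DBHandler API. Checking against io.TextIOBase accepts the io.TextIOWrapper that open() returns in text mode, and also io.StringIO, which is convenient for tests.

```python
import io
from typing import Iterable, List, Protocol, Union


class StorySource(Protocol):
    """Hypothetical interface standing in for a database handle."""

    def fetch_story_texts(self) -> Iterable[str]: ...


class TokenPoolSketch:
    """Reads story texts from either a DB-like source or an open text file."""

    def __init__(self, source: Union[StorySource, io.TextIOBase]):
        self._source = source

    def _story_texts(self) -> List[str]:
        if isinstance(self._source, io.TextIOBase):
            # Assumed file format: one story per non-empty line.
            # Note: this consumes the stream, so it can be read only once.
            return [line.strip() for line in self._source if line.strip()]
        return list(self._source.fetch_story_texts())

    def tokens(self) -> List[List[str]]:
        # Lower-cased whitespace split as a placeholder for the real tokenizer.
        return [text.lower().split() for text in self._story_texts()]


class FakeDB:
    """Test double that satisfies StorySource."""

    def fetch_story_texts(self):
        return ["Topic models summarise stories"]


print(TokenPoolSketch(FakeDB()).tokens())
print(TokenPoolSketch(io.StringIO("First story\n\nSecond story\n")).tokens())
```

The same idea also matches the following commit's move to feeding the model tests from a sample file instead of a live database connection.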
- 0ca1eca: Separated the test cases for the three models from db_connection
  They now take the stories in the sample file as input.
- dc0b73b
- 96f566c
- c488c08
Commits on Jul 26, 2017
- 6d8555e
- 6182c4f
- 9c68669
- 0e04ff1
- d995cb8
Commits on Jul 27, 2017
- a361b01
- db1c584
- 2a88eab
- b62e71d
- 7922d3c: Install only WordNet data from NLTK data
  1) Faster (Travis doesn't have all day) 2) We only use WordNet at the moment
- 7ce27cc: Revert "added COMMAND_PREFIX to use sudo on linux"
  This reverts commit db1c584.
- 29d460c: Revert "turn on -n switch of unzip gh-pages.zip, preventing rewrite existing files"
  This reverts commit a361b01.
- 4008366: Revert "adding more echos and comments"
  This reverts commit d995cb8.
- c1da604
  This reverts commit 0e04ff1.
- 7b6beaf: Revert "Use wget instead of nltk.download() to avoid 405 error"
  This reverts commit 9c68669.
- bf2c962
- 482f01e: Install only WordNet data from NLTK data
  1) Faster (Travis doesn't have all day) 2) We only use WordNet at the moment
- 00633aa
Commits on Aug 1, 2017
- 6f09e31
Commits on Aug 7, 2017
- 179da05
- 1cf5601
  1. Use sample_handler.py to access the sample file
  2. Fix newly occurring PyCharm warnings (expected an iterator, got a list)
- 1d3ad5e
- 81d6892
Commits on Aug 8, 2017
- 8861d9e: Temporarily disable unit tests for Travis to cache dependencies
  Before running unit tests, Travis installs all Perl and Python dependency modules, which takes a lot of time and doesn't always leave enough of the available 50 minutes to complete all the unit tests. After a successful unit test run, Travis caches all the installed dependencies so that it doesn't have to install them again and can get to the unit tests themselves faster. So, we temporarily disable the unit tests (replacing them with a simple "echo" statement) so that Travis can install the dependencies and cache them. Subsequent Travis runs (with the actual unit tests re-enabled) will then be able to use the pre-cached dependencies.
- c732a50
  This reverts commit 36817b9. Caching fails because Travis is unable to find /usr/share/nltk_data for whatever reason: https://travis-ci.org/berkmancenter/mediacloud#L3361 ...and so nothing gets cached (including the Perl dependencies, which take a long time to install), and so builds time out.
- 65c505b: Revert "Temporarily disable unit tests for Travis to cache dependencies"
  This reverts commit 8861d9e.
Commits on Aug 9, 2017
- 73f7e2e
- ef35923: Unified the name of the model used in each class to self._model, as in the abstract class; added a method named evaluate, as in the abstract class
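The commit has every subclass store its model in self._model and provide an evaluate method, mirroring the abstract class. A minimal sketch of how such a contract can be enforced with Python's abc module follows. The method names add_stories, summarise and evaluate come from the commit messages, but their signatures, the MostCommonWordsModel toy class and the Incomplete class are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import Counter


class TopicModel(ABC):
    """Hypothetical sketch of the shared parent: every subclass keeps its
    fitted model in self._model and must implement all three methods."""

    def __init__(self):
        self._model = None

    @abstractmethod
    def add_stories(self, stories):
        """Take {story_id: [token, ...]} and build/fit self._model."""

    @abstractmethod
    def summarise(self, top_n):
        """Return a human-readable summary of the learned topics."""

    @abstractmethod
    def evaluate(self):
        """Return a score for the fitted model (e.g. a likelihood)."""


class MostCommonWordsModel(TopicModel):
    """Toy stand-in, NOT a topic model: it only exercises the interface."""

    def add_stories(self, stories):
        self._model = Counter(t for tokens in stories.values() for t in tokens)

    def summarise(self, top_n=3):
        return [word for word, _ in self._model.most_common(top_n)]

    def evaluate(self):
        # Placeholder score: vocabulary size. A real model would return a likelihood.
        return len(self._model)


class Incomplete(TopicModel):
    """Forgets to implement evaluate(), so it cannot be instantiated."""

    def add_stories(self, stories):
        pass

    def summarise(self, top_n=3):
        return []


model = MostCommonWordsModel()
model.add_stories({1: ["nmf", "lda", "lda"], 2: ["lda", "topic"]})
print(model.summarise(top_n=1), model.evaluate())  # ['lda'] 3

try:
    Incomplete()
except TypeError as err:
    print("TypeError:", err)
```

Because the check happens at instantiation time, a model class that forgets evaluate fails immediately rather than at the point where a test first calls it.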
- 89882cd
- 73e518c
- e2d6655
- 5289a85: Merge branch 'topic_modelling' of github.com:berkmancenter/mediacloud into topic_modelling
- 00831af
Commits on Aug 12, 2017
- 59bcb50
- 2c8e6eb
Commits on Aug 13, 2017
- d1129a6: Added a finder that identifies the maximum/minimum points of a polynomial computed from a few points
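The commit does not describe the finder's algorithm. As a hedged illustration of the idea, assuming a quadratic is enough (for example a likelihood score measured at a few candidate topic counts), one can least-squares-fit y = a*x^2 + b*x + c to the sample points and read the extremum off the vertex x = -b/(2a): a maximum when a < 0, a minimum when a > 0. The function names are hypothetical, and the real code may use a higher-degree polynomial or a library routine such as numpy.polyfit instead of this pure-Python solver.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y, where each row of X is [x^2, x, 1].
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("need at least 3 distinct x values")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def find_extremum(xs, ys):
    """Return (x, y, kind) for the vertex of the fitted parabola."""
    a, b, c = fit_quadratic(xs, ys)
    if a == 0:
        raise ValueError("fitted curve is a line; no extremum")
    x = -b / (2 * a)
    kind = "maximum" if a < 0 else "minimum"
    return x, a * x * x + b * x + c, kind


# Samples of y = -(x - 10)^2 + 5 taken at four points.
x, y, kind = find_extremum([4, 8, 12, 16], [-31, 1, 1, -31])
print(round(x, 6), round(y, 6), kind)  # 10.0 5.0 maximum
```

Fitting a curve to a handful of evaluations and jumping to its vertex needs far fewer model runs than scanning every candidate value, which is consistent with the later commits preferring tune_with_polynomial over tune_with_iteration.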
- 4d5b9e4
Commits on Aug 14, 2017
- 8e77ed4
- 809aad7: Added more test cases that check the accuracy of the model via likelihood comparisons
Commits on Aug 19, 2017
- f819366
- 9869ca8: No longer test tune_with_iteration, as the polynomial approach has significantly better efficiency and performance; I will combine these two later
- e185dd0
- 3545e0e
Commits on Aug 20, 2017
- 7816ec8
- 94ebc24
- c1c257e
- 6d09265: Removed the unnecessary tune_with_iteration, as its advantages/features have been combined into tune_with_polynomial
- 2479107
- 51dd0ec
- 620afb4
- 5ead4f2: Disable unit tests temporarily for Travis to have a chance to compile and cache dependencies
- 0fb4e4a
- 87efd01
- 6ea203b
Commits on Aug 21, 2017
- b675559
- 8753442
- e39415b
- a674d26
- 6267f72: This sample file has been replaced by 3 files of different sizes
  This allows more flexibility in Travis (i.e. we can use larger samples if we can run tests for longer in Travis).
- d4e9d48
- 0c3f7ee
  1. Broke large blocks of code up into more functions
  2. Improved performance based on empirical results
- 4c12748
- 720dd7a
Commits on Aug 22, 2017
- 97afc48
- 016d01c
Commits on Sep 1, 2017
- 9ff15ff