September 30, 2021
-
Postprocessor update
-
Any Compute Canada issues -- Colin SSH?
-
Action items from last day:
- kill the running crawler processes
- re-run the postprocessor with the full output of both NYT & Twitter
- what to do with the interest? Alejandro will communicate with Amy about the "output" & "interest-output" distinction
- Colin will attempt to stream (or whatever it's called) the interest.json
- if time allows, Colin will attempt a visualization
- Alejandro question: can we start a large crawl soon?
- John sent it to Colin, and
New Scope Document that we should use in future: https://docs.google.com/spreadsheets/d/1oYA1dkNvvsz_J5xlhl0_1NrayHbVo2E3W-FTZccL8nA/edit#gid=1838968997
-
John to feed the post-processor the same data again (the output from the Twitter and domain crawlers), this time with a different scope based on the tab "formatted_for_Mediacat" in this spreadsheet: https://docs.google.com/spreadsheets/d/1oYA1dkNvvsz_J5xlhl0_1NrayHbVo2E3W-FTZccL8nA/edit#gid=1838968997. Please let us know what data won't transfer to the post-processing scope. John to pass the output to Colin, or communicate any roadblocks.
-
John to review the Python notebook here: https://github.com/UTMediaCAT/mediacat-frontend/blob/master/utils/postprocessing_stacked_area_chart_single_domain_crawl.ipynb and try to run it against the output from the Twitter and domain crawlers. We assume that June has built an alternative to the post-processor, and we want to see its output. John to pass the output to Colin, or communicate any roadblocks.
-
Colin to try making graphs based on output.json (force vector diagram) and interest-output.json (??). The Twitter output alone might make an interesting diagram.
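A possible starting point for the output.json graph, as a rough sketch only. The field names `domain` and `cited_domains` are placeholders made up for illustration, not the post-processor's actual schema, so they would need to be swapped for whatever output.json really contains. The idea is to collapse the records into the nodes/links shape that force-directed layouts (D3's, for example) commonly consume, with repeated citations counted as link weight.

```python
import json
from collections import Counter


def build_graph(records, source_key="domain", targets_key="cited_domains"):
    """Collapse records into node and link lists for a force-directed layout.

    source_key and targets_key are hypothetical field names; replace them
    with the real keys found in output.json.
    """
    weights = Counter()
    for rec in records:
        src = rec.get(source_key)
        if not src:
            continue  # skip records with no source
        for dst in rec.get(targets_key) or []:
            if dst and dst != src:  # ignore empty targets and self-links
                weights[(src, dst)] += 1

    names = sorted({name for pair in weights for name in pair})
    links = [
        {"source": s, "target": t, "value": n}
        for (s, t), n in sorted(weights.items())
    ]
    return {"nodes": [{"id": name} for name in names], "links": links}


# Made-up sample records, standing in for json.load() on the real file.
sample = [
    {"domain": "a.example", "cited_domains": ["b.example", "c.example"]},
    {"domain": "a.example", "cited_domains": ["b.example"]},
    {"domain": "b.example", "cited_domains": ["a.example", "b.example"]},
]

graph = build_graph(sample)
print(json.dumps(graph, indent=2))
```

If the real file is too large to load in one go, the same counting loop should work over records read one at a time.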
-
John is also re-running a full crawl with the full scope and recording some data points on how fast (or slow) it is.