September 30, 2021

Agenda

Postprocessor update
Any Compute Canada issues? (Colin: SSH?)
Action items from last day:

kill the running crawler processes

re-run the postprocessor with the full output of both NYT & Twitter
what to do with the interest?
Alejandro will communicate with Amy about the "output" & "interest-output" distinction
Colin will attempt to stream (or whatever it's called) the interest.json
if time allows, Colin will attempt a visualization
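
One possible way to "stream" interest.json without loading it all into memory is sketched below. This is only a sketch under assumptions: it assumes the file is a single top-level JSON array (the notes do not say how interest.json is laid out), and the function name `iter_json_array` is hypothetical. It uses only the Python standard library; a dedicated streaming parser such as the third-party ijson package would likely be more robust.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_array(fp: TextIO, chunk_size: int = 65536) -> Iterator[object]:
    """Yield the elements of a top-level JSON array one at a time.

    Assumption: the file holds exactly one JSON array. The file is read
    in chunks, so only the current chunk and item are held in memory.
    """
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    pos = 1  # skip the opening bracket
    while True:
        # Skip whitespace and the commas that separate items (lenient).
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf) and buf[pos] == "]":
            return  # end of the array
        try:
            item, end = decoder.raw_decode(buf, pos)
            # Only trust the value if a delimiter follows it; otherwise a
            # number may have been cut off at the chunk boundary.
            complete = end < len(buf) and buf[end] in " \t\r\n,]"
        except json.JSONDecodeError:
            complete = False
        if not complete:
            chunk = fp.read(chunk_size)
            if not chunk:
                raise ValueError("JSON array is truncated or malformed")
            # Drop the part already consumed and append the new data.
            buf = buf[pos:] + chunk
            pos = 0
            continue
        yield item
        pos = end


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file such as interest.json.
    sample = io.StringIO('[{"id": 1}, {"id": 2}]')
    for record in iter_json_array(sample, chunk_size=8):
        print(record)
```

For a real file, the same generator could be driven with `open("interest.json", encoding="utf-8")` in place of the `io.StringIO` stand-in. If the file turns out to be newline-delimited JSON rather than one array, reading it line by line with `json.loads` would be simpler.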

Alejandro question: can we start a large crawl soon?

Postprocessor update

John sent it to Colin, and

New Scope Document that we should use in future: https://docs.google.com/spreadsheets/d/1oYA1dkNvvsz_J5xlhl0_1NrayHbVo2E3W-FTZccL8nA/edit#gid=1838968997

Action Items

John to feed the post-processor the same data again (the output from the Twitter and domain crawlers), this time with a different scope based on the tab "formatted_for_Mediacat" in this spreadsheet: https://docs.google.com/spreadsheets/d/1oYA1dkNvvsz_J5xlhl0_1NrayHbVo2E3W-FTZccL8nA/edit#gid=1838968997. Please let us know what data won't transfer to the post-processing scope. Pass the output to Colin, or communicate any roadblocks.
John to review the Python script here: https://github.com/UTMediaCAT/mediacat-frontend/blob/master/utils/postprocessing_stacked_area_chart_single_domain_crawl.ipynb and try to run it against the output from the Twitter and domain crawlers. We assume that June has built an alternative to the post-processor, and we want to see its output. Pass the output to Colin, or communicate any roadblocks.
Colin to try making graphs based on output.json (force vector diagram) and interest-output.json (??). A diagram of just the Twitter output might be interesting.
John is also re-running a full crawl with the full scope and taking down some data points about how fast (slow) it is.
