September 23, 2021
alejandropaz edited this page Sep 23, 2021
- update on Compute Canada: do both John and Colin understand it? Any follow-up questions for Raiyan?
- update on postprocessor: "John will run the postprocessor, and record the size of data from TWINNT & Domain Crawler, and how long it takes, and size of output file"
- single-site postprocessor?
- coop hiring
- finalizing paperwork for Colin?
- output of TWINNT wasn't what the postprocessor was expecting, so John wrote a bridging function
- memory issue: some crawled JSON files were too big to read, so the results are not complete
- 10,000 small JSONs (8 were skipped) and all the TWINNT
- total time: 3151 seconds (about 52.5 minutes)
- the output format does seem to match the expected spreadsheet output format
- problem: can't open the largest JSON (3 GB)
- question: output (regular output) & interest-output (outside of scope):
- John followed up with Raiyan, who said that the Chrome browser should be killed when the crawler terminates and was not sure why that is not happening; he suggested a reboot (John will test this)
- there are some new files created in the crawler, but we will focus on the output that
- kill the running processes of crawler
- re-run the postprocessor with the full output of both NYT & Twitter
- what to do with the interest?
- Alejandro will communicate with Amy about the "output" & "interest-output" distinction
- Colin will attempt to stream (or whatever it's called) the interest.json
- if time allows, Colin will attempt a visualization
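
For the memory issue and the plan to stream interest.json noted above, here is a minimal sketch of incremental parsing using only the Python standard library. It assumes the file is a single top-level JSON array of records, which these notes do not confirm; the function name `iter_json_array` and the `chunk_size` parameter are hypothetical. The idea is to read the file in small chunks and yield one element at a time, so that only the current element has to fit in memory rather than the whole 3 GB file.

```python
import json
from typing import Any, Iterator, TextIO

_WS = " \t\r\n"


def iter_json_array(fp: TextIO, chunk_size: int = 1 << 16) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Assumption: the file holds one JSON array, e.g. [{...}, {...}, ...].
    Separator handling is lenient (repeated commas are skipped).
    """
    decoder = json.JSONDecoder()
    buf = ""

    def refill() -> bool:
        # Append the next chunk to the buffer; False means end of input.
        nonlocal buf
        chunk = fp.read(chunk_size)
        buf += chunk
        return bool(chunk)

    # Locate the opening '[' of the array.
    while not buf.lstrip(_WS):
        if not refill():
            raise ValueError("empty input")
    buf = buf.lstrip(_WS)
    if buf[0] != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    while True:
        # Skip whitespace and the commas between elements.
        buf = buf.lstrip(_WS + ",")
        if not buf:
            if not refill():
                raise ValueError("input ended before the closing ']'")
            continue
        if buf[0] == "]":
            return
        try:
            value, end = decoder.raw_decode(buf)
            rest = buf[end:].lstrip(_WS)
            # Accept the value only once its delimiter is visible, so a
            # number split across two chunks is not cut short.
            complete = rest[:1] in (",", "]")
        except json.JSONDecodeError:
            complete = False
        if not complete:
            if not refill():
                raise ValueError("truncated or malformed JSON array")
            continue
        yield value
        buf = rest
```

It could be used as `for record in iter_json_array(open("interest.json", encoding="utf-8")): ...`, processing each record and discarding it before reading the next. If the file turns out to be one large object rather than an array, this sketch does not apply as written, and a dedicated streaming parser such as the third-party `ijson` package may be a better fit.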