January 20, 2022
alejandropaz edited this page Jan 20, 2022
- move to GitHub issues to organize tasks
- make the Compute Canada resource map private?
- finalize any loose ends on the Compute Canada updates
- benchmarking
  - look at the .txt file for scope
  - Colin's suggestions about making spreadsheets
  - Twitter crawler estimate
  - if time: small site crawl? Does it need post-processing?
- Graham Cloud is running the latest OS, as far as Shengsong can tell.
- prepare an email to Compute Canada cloud IT asking whether any action is needed to update the OS on our Graham and Arbutus instances (it seems we're up to date)
- find out whether changing my Compute Canada password will affect the SSH keys
- storage problem: Nat suggested a workaround using soft (symbolic) links
  - Colin will move some of the folders
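A minimal sketch of the soft-link workaround, assuming the idea is to move a folder to a location with more space and leave a symbolic link at the old path so existing scripts keep working. The function name `move_and_link` and the paths are hypothetical, not the team's actual setup.

```python
import shutil
from pathlib import Path


def move_and_link(src: Path, dest_dir: Path) -> Path:
    """Move the folder `src` into `dest_dir` and leave a soft link behind.

    Scripts that still use the old path keep working, because the old
    path now resolves to the new location.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    if src.is_symlink():
        raise ValueError(f"{src} is already a symlink")
    if not src.is_dir():
        raise NotADirectoryError(src)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    if target.exists():
        raise FileExistsError(target)
    # shutil.move copies across filesystems and then deletes the source
    shutil.move(str(src), str(target))
    # Recreate the old path as a soft link pointing at the new location
    src.symlink_to(target.resolve(), target_is_directory=True)
    return target
```

Usage (hypothetical paths): `move_and_link(Path("/home/user/project/data"), Path("/mnt/storage/project"))`. Running anything that reads the old path afterwards, such as the benchmarking, should then transparently use the moved folder.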
- comment on TWINT #1295 that the fix from early December isn't working
- research new Twitter crawlers:
  - question 1: how long would it take for you to integrate a new Twitter crawler?
    - is the current Twitter config modular enough to easily swap in a new (Python?) crawler?
  - question 2: would a JavaScript crawler that simulates a human user work better? (Is Apify a JavaScript crawler?)
  - others that Danhua considered: Twarc & GetOldTweets (see here, scroll down)
  - also look at Apify & the Twitter API
    - is there a cost associated with the Twitter API?
  - other new Twitter crawlers out there?
- currently, it seems that none of these non-API twitter crawlers are working
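One way to make swapping crawlers easy is to have the pipeline depend only on a small interface rather than on any particular tool. This is a hypothetical sketch, not the project's current config: the `TwitterCrawler` protocol, the `Tweet` fields, the `register` decorator and the `"dummy"` backend are all illustrative names. A real backend would wrap TWINT, Twarc, the Twitter API or another tool behind the same `search()` method.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, Iterator, Protocol


@dataclass
class Tweet:
    id: str
    created: date
    text: str


class TwitterCrawler(Protocol):
    """Anything with this search() method can be plugged into the pipeline."""

    def search(self, query: str, since: date, until: date) -> Iterable[Tweet]:
        ...


# Registry mapping a config name (e.g. "twint", "twarc") to a crawler factory
CRAWLERS: Dict[str, Callable[[], TwitterCrawler]] = {}


def register(name: str):
    def decorator(factory: Callable[[], TwitterCrawler]):
        CRAWLERS[name] = factory
        return factory
    return decorator


def run_crawl(crawler_name: str, query: str, since: date, until: date) -> Iterator[Tweet]:
    crawler = CRAWLERS[crawler_name]()
    # Downstream code depends only on the Tweet fields, not on the backend
    for tweet in crawler.search(query, since, until):
        yield tweet


@register("dummy")
class DummyCrawler:
    """Stand-in backend for testing; a real one would call an actual crawler."""

    def search(self, query: str, since: date, until: date) -> Iterable[Tweet]:
        return [Tweet(id="1", created=since, text=f"example tweet about {query}")]
```

With this shape, switching crawlers would be a one-word change to the config name passed to `run_crawl`, and postprocessing would not need to change as long as each backend returns `Tweet` objects.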
- academic research account:
  - the file is much smaller without the plain text
  - we're not sure whether the plain-text error is produced by the crawler or the postprocessor
- Shengsong: will move the Compute Canada resource map to another space
- Alejandro: send the email to Compute Canada
- Alejandro: look at the scope text file
- Alejandro: update the server notes in the new Google Doc and change the Graham password
- Colin: delete the SSH keys of former developers
- Shengsong: set up a firewall for our instance (link here)
  - make sure to open all the ports we use, e.g. for Jupyter and SSH
  - Nat recommends using USW
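After the firewall is enabled, a quick way to confirm that the needed ports are still reachable is to attempt a TCP connection to each one from outside the instance. This is a minimal sketch; the host name is a placeholder, and 22 (SSH) and 8888 (Jupyter) are only the default ports, so substitute whatever the instance actually uses.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts (socket.timeout is a
            # subclass of OSError) and unreachable hosts
            results[port] = False
    return results
```

Usage (placeholder host): `check_ports("our-instance.example.org", [22, 8888])`. A `False` means either the firewall is blocking the port or nothing is listening on it, so it is worth checking that the service is running before changing firewall rules.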
- Colin: move folders as agreed
- Shengsong: will try to restart the benchmarking once the reorganization and soft-linking are done
- Colin: will check whether the plain text in the postprocessed spreadsheet matches the plain text in the JSON, to determine whether the error is produced by the postprocessor
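The plain-text check could be scripted roughly as follows. This sketch assumes the crawler output is JSON Lines (one object per line), the postprocessed spreadsheet is exported as CSV, and both use the hypothetical column names `id` and `text`; adjust these to the real formats. If the JSON has the text but the CSV row is empty or different, that points at the postprocessor; if the text is already missing in the JSON, it points at the crawler.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def load_json_texts(path: Path, id_field: str = "id", text_field: str = "text") -> Dict[str, str]:
    """Read a JSON Lines file (one object per line) into {id: plain text}."""
    texts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            texts[str(record[id_field])] = record.get(text_field) or ""
    return texts


def load_csv_texts(path: Path, id_field: str = "id", text_field: str = "text") -> Dict[str, str]:
    """Read the postprocessed spreadsheet (exported as CSV) into {id: plain text}."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[id_field]: row.get(text_field) or "" for row in csv.DictReader(f)}


def compare_plain_text(json_texts: Dict[str, str], csv_texts: Dict[str, str]) -> Dict[str, List[str]]:
    """List ids whose plain text is missing or different after postprocessing."""
    shared = set(json_texts) & set(csv_texts)
    return {
        "missing_in_csv": sorted(set(json_texts) - set(csv_texts)),
        "missing_in_json": sorted(set(csv_texts) - set(json_texts)),
        # Ignore surrounding whitespace so trivial differences are not flagged
        "mismatched": sorted(i for i in shared if json_texts[i].strip() != csv_texts[i].strip()),
    }
```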