June 23, 2022
alejandropaz edited this page Jun 23, 2022 · 3 revisions
- the guardian crawl: filter out comments URLs
- NYT Mid E archive: test on new postprocessor
- postprocessor: adding Twitter counts to data structure
- update to metascraper to include db to deal with errors and with re-starting after being stopped
- visualizations: figure out Jupyter
- Alejandro: need more examples of the embedded tweet issue; send list of visualizations
- small domain: same, pause
- the guardian: same, pause
- NYT Politics Archive postprocessing? not yet started
- Twitter: embedded tweet issue:
- still working on it
- NYT Mid E archive: test on new postprocessor
- KPP postprocessor: missing a few thousand, maybe small code error, troubleshooting
- postprocessor: adding Twitter counts to data structure
- done
- update to metascraper to include db to deal with errors and with re-starting after being stopped
- Shengsong will work on this in Arbutus while we wait for Graham to return
- tried adding multithreading to the postprocessor (Dask supported), but it ended up slower; why?
- Nat will look at the video of the postprocessor changes
- the Dask method for looping through the data was not very fast; a plain for-loop is faster
- making the Jupyter notebook work
- some of the earlier work from Alice could be helpful here, by domain and url for vector diagrams
- Shengsong will send the stacked area charts and one way vector diagrams
- can use D3 for visualizations; it can run on a local server
- send the stacked area charts and one way vector diagrams from KPP data
- troubleshooting errors on postprocessor discovered with KPP testing
- update to metascraper to include db to deal with errors and with re-starting after being stopped: use Arbutus
- Dask multithreading for postprocessor: troubleshoot why it is slower
- consider D3 or similar for visualizing vector diagrams
- Twitter: embedded tweet issue:
- when Graham is back on: the guardian crawl: filter out comments URLs
- for next week: consider Borealis to store datasets
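
For the metascraper db item above, one possible shape is a small status table that records each URL's outcome so a stopped run can pick up where it left off. This is only a sketch in Python using SQLite: the table name, the columns, and the `fetch` callback are hypothetical and not the project's actual schema, and metascraper itself is a JavaScript library, so this illustrates the idea rather than the real integration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the checkpoint database. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        "url TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "error TEXT)"
    )
    return conn


def add_urls(conn, urls):
    """Queue URLs; URLs already recorded keep their existing status."""
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)", [(u,) for u in urls]
    )
    conn.commit()


def pending(conn):
    """Return every URL that is not marked done (pending or errored)."""
    rows = conn.execute(
        "SELECT url FROM urls WHERE status != 'done' ORDER BY url"
    )
    return [url for (url,) in rows]


def run(conn, fetch):
    """Process unfinished URLs, committing after each so a stop loses little."""
    for url in pending(conn):
        try:
            fetch(url)  # hypothetical callback that scrapes one URL
            conn.execute(
                "UPDATE urls SET status = 'done', error = NULL WHERE url = ?",
                (url,),
            )
        except Exception as exc:
            # Record the failure instead of aborting the whole run.
            conn.execute(
                "UPDATE urls SET status = 'error', error = ? WHERE url = ?",
                (str(exc), url),
            )
        conn.commit()
```

Calling `run` again after a crash or manual stop only revisits URLs that are still pending or errored, which covers both the error handling and the re-starting parts of the item.
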
- Apify pre-navigation: probably need a blacklist for each domain, but could look into it in the future
- using crawler proxies
- adding to regular postprocessor output:
- any non-scope domain hyperlink that ends in .co.il
- any link to a tweet or twitter handle
- This is a bit outside our normal functionality, so I will put it on the backburner for now.
- what to do with htz.li
- finding language function
- image_reference function
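
The two proposed postprocessor additions above (out-of-scope `.co.il` hyperlinks, and links to a tweet or Twitter handle) could be sketched roughly as below. This is an assumption-laden Python sketch: the function names, the Twitter host list, and the reserved-path list are all hypothetical and not taken from the postprocessor code.

```python
from urllib.parse import urlparse

# Assumed host names and non-handle paths; extend as needed.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}
RESERVED_PATHS = {"home", "search", "explore", "intent", "share", "i"}


def _host(url):
    """Lower-cased host name of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def is_out_of_scope_co_il(url, scope_domains):
    """True if the link ends in .co.il and is not under any in-scope domain."""
    host = _host(url)
    in_scope = any(host == d or host.endswith("." + d) for d in scope_domains)
    return host.endswith(".co.il") and not in_scope


def twitter_link_kind(url):
    """Return 'tweet', 'handle', or None for a hyperlink."""
    if _host(url) not in TWITTER_HOSTS:
        return None
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Tweet links look like /<handle>/status/<numeric id>
    if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
        return "tweet"
    # A single path segment that is not a known site page is treated as a handle
    if len(parts) == 1 and parts[0].lower() not in RESERVED_PATHS:
        return "handle"
    return None
```

For example, `twitter_link_kind("https://twitter.com/someuser/status/123")` returns `"tweet"`, and `is_out_of_scope_co_il(url, scope_domains)` expects `scope_domains` to be a set of bare domains such as `{"example.co.il"}`. Shortened links such as htz.li would need to be resolved before either check applies.
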