July 21, 2023
alejandropaz edited this page Jul 21, 2023
- postprocessor: 2 issues: (1) newline character needs to be deleted; (2) pandas errors - Gy/Fr
- meet Monday at 1pm
- revisit having a meeting with Nat to go over pandas error - everyone
- rebuild Graham instance - Gy
- check whether storage is being filled with tmp file or actual results - Gy
- look at current NYT archive "Israel" keyword results for date range, and if possible, re-start using only uncrawled date range - Gy
- potentially use a new IP and monitor for 400 errors, slowing down or pausing for a day if there are too many
- re-structure query for "Middle East" to ensure only relevant results are obtained - Gy
- look at small domain crawls to check for corruption - Gy
- visualization: troubleshoot error where stacked area graph isn't filled in - Fr
- visualization: add buttons to simplify using different types of charts and graphs and simplify JupyterLab files for users - Fr
- D3: looking at how to convert CSV to JSON - Fr
- look at Colin's and Shengsong's instructions about how Alejandro (user) connects to server and decide whether still best way - Fr
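
A minimal sketch of the "monitor for 400 errors" idea from the list above, in Python. The class name `ErrorThrottle`, the window size, the thresholds, and the delays are all hypothetical placeholders, and counting every 4xx status (not only 400) is an assumption. It is not tied to the actual crawler code.

```python
from collections import deque


class ErrorThrottle:
    """Track recent HTTP status codes and suggest a wait before the next request.

    All defaults below are illustrative assumptions, not tuned values.
    """

    def __init__(self, window=20, slow_ratio=0.2, pause_ratio=0.5,
                 base_delay=1.0, slow_delay=30.0, pause_delay=24 * 60 * 60):
        self.recent = deque(maxlen=window)  # True for each recent 4xx response
        self.slow_ratio = slow_ratio
        self.pause_ratio = pause_ratio
        self.base_delay = base_delay
        self.slow_delay = slow_delay
        self.pause_delay = pause_delay

    def record(self, status_code):
        # Assumption: any 4xx response counts as a "400 error".
        self.recent.append(400 <= status_code < 500)

    def next_delay(self):
        """Return the number of seconds to wait before the next request."""
        if not self.recent:
            return self.base_delay
        ratio = sum(self.recent) / len(self.recent)
        if ratio >= self.pause_ratio:
            return self.pause_delay  # too many errors: pause for a day
        if ratio >= self.slow_ratio:
            return self.slow_delay   # some errors: slow down
        return self.base_delay
```

The crawl loop would call `record()` with each response status and sleep for `next_delay()` seconds before the next request.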
- troubleshooting:
- deleted the newline character, but then got different errors about columns
- problem with Python file processor_twitter.py: the Twitter data it was written for was structured differently from the data we have now
- NYT archive crawl is still having errors; stealth mode may help, need to look at whether it is possible to integrate
- Israel domain crawl is still going
- small domain crawl is having trouble due to Apify errors; perhaps run crawls separately?
- tmp folder isn't taking up a lot of memory
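
One possible way to look into the newline and column errors noted above, sketched with Python's standard `csv` module. The function name `clean_rows` is hypothetical, and the sketch assumes the postprocessor input is CSV text with a header row, which these notes do not confirm. It replaces newlines embedded in fields with spaces and separates rows whose column count does not match the header.

```python
import csv
import io


def clean_rows(text):
    """Return (header, good_rows, bad_rows) for CSV text.

    Newlines inside fields are replaced with spaces. Rows whose length
    differs from the header are collected in bad_rows for inspection.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        cleaned = [
            field.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
            for field in row
        ]
        if len(cleaned) == len(header):
            good.append(cleaned)
        else:
            bad.append(cleaned)
    return header, good, bad
```

Inspecting `bad_rows` before handing the data to pandas may show whether the column errors come from malformed rows or from a different cause.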
- check crawl every 2 days - Gy
- update the MVP, especially with respect to the format of data going into and coming out of the postprocessor, and then as input to the visualization environment - Gy/Fr
- push corrected postprocessor code to master - Gy/Fr
- postprocessor: write instructions documenting the order of utilities and the steps for using the postprocessor - Gy
- backburner: figure out corruption in small domain crawl
- take a look at the brake in the domain crawler and read through - Gy
- look at date range for NYT - Israel & Palestine and send email to Alejandro about what is included - Gy
- look at which folder is taking up most memory on server - Gy
- attempt to combine stealthy crawl on NYT archive - Gy
- continue rebuild Graham instance - Gy
- visualization: troubleshoot error where stacked area graph isn't filled in - Fr
- visualization: add buttons to simplify using different types of charts and graphs and simplify JupyterLab files for users - Fr
- D3: looking at how to convert CSV to JSON - Fr
- look at Colin's and Shengsong's instructions about how Alejandro (user) connects to server and decide whether still best way - Fr
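
For the CSV to JSON item above, a minimal sketch using only the Python standard library. The function name `csv_to_json` is hypothetical, and it assumes the CSV has a header row; each data row becomes one JSON object keyed by column name, which is a shape D3 code commonly works with.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects.

    All values stay strings; convert numeric columns separately if needed.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
    return json.dumps(rows, indent=2)
```

For a file on disk, the text could be read with `open(path, newline="", encoding="utf-8").read()` and passed to the function.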