Nov 2, 2023
alejandropaz edited this page Nov 2, 2023
- no errors in processing of scraped results
- Wa/Po: getting errors while trying to clean the dataset
- IA results: the postprocessor expects CSV, not JSON
- finalizing Mondoweiss in IA: 36,147 successful (i.e., not landing pages, photos, etc.); about 500 not relevant
- develop unit tests for the foxnews postprocessed results, e.g., on the text alias - Ar
- Wa/Po Twitter dataset: look for lines that produce errors - Fr
- look for a CSV/JSON converter - Ar
- add debugging output to the IA crawler, e.g., total crawled - Ra
- add documentation on filtering out irrelevant URLs in the IA crawler - Ra
- start crawling electronicintifada and nytimes - Ra
- sending an email to Gy asking about running multiple crawlers at the same time - Ra
- sending an email to Nat about the difference between URLs and new URLs in the archive.org data - Ra
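For the CSV/JSON converter item (and the IA postprocessor that expects CSV rather than JSON), here is a minimal sketch using only the Python standard library. It assumes the scraped data is stored as JSON Lines (one JSON object per line); the actual format of the IA and Wa/Po files is not stated in these notes, and the function name `jsonl_to_csv` is hypothetical, not part of the project. As a side effect it reports the line numbers that fail to parse, which may also help with finding the error-producing lines in the Wa/Po Twitter dataset if that file is JSON Lines too.

```python
import csv
import json
from typing import Dict, List, Tuple


def jsonl_to_csv(jsonl_path: str, csv_path: str) -> Tuple[int, List[Tuple[int, str]]]:
    """Hypothetical helper: convert a JSON Lines file to CSV.

    Returns (records_written, bad_lines), where bad_lines is a list of
    (line_number, error_message) pairs for lines that could not be used.
    """
    records: List[Dict] = []
    bad_lines: List[Tuple[int, str]] = []

    with open(jsonl_path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                bad_lines.append((lineno, str(exc)))  # remember where parsing broke
                continue
            if not isinstance(obj, dict):
                bad_lines.append((lineno, "not a JSON object"))
                continue
            records.append(obj)

    # Union of keys across all records, in first-seen order, so no field is dropped.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # Keys missing from a record are written as empty strings (restval).
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for rec in records:
            # Nested lists/dicts are kept as JSON text instead of Python reprs.
            row = {
                k: json.dumps(v) if isinstance(v, (dict, list)) else v
                for k, v in rec.items()
            }
            writer.writerow(row)

    return len(records), bad_lines
```

Usage would be along the lines of `written, errors = jsonl_to_csv("results.jsonl", "results.csv")` (placeholder file names), then printing `errors` to see which lines need attention. Taking the union of keys keeps records with differing fields in one CSV, at the cost of sparse columns.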