[ZA] Restore plenary and committee appearances for MPs #2458
Comments
Committee appearances restored, plenary deprioritised.
We had another request for the plenary appearances from Megan yesterday:
@jacksonj04 I've assigned this ticket to myself for now, hope you don't mind! I think it's going to be closely related to the work I'm doing on #2554.
Parliament’s website has been redesigned and the proceedings are now published as PDFs, but our scraper only handles Microsoft Word documents, since that’s the format they used to be published in. Rewriting the scraper and testing it against the new Hansard PDFs will take more time than we have remaining in the current grant period. Perhaps a solution in the meantime would be to link to the PDFs directly from the Hansard pages on pa.org.za? That way at least people can still read the transcripts.
The new plan is to switch za-hansard to get the Word documents from the PMG website. These documents should be in the same format as the ones we were previously getting from parliament. I've asked PMG for an example of the Word docs, and for details on where we'll be able to get a list of these docs.
I've switched za-hansard over to PMG. We're not quite done though, because the files that PMG have been producing are in the newer ".docx" format, but the Hansard parser expects ".doc" files. So I've asked PMG to produce files in the older Word format and upload them to their site, at which point they'll appear in the API. This issue is on hold until we've tested that the changes work with the new documents that PMG produce.
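For anyone testing this, a file's extension isn't always a reliable guide to its real format, so here is a rough sketch of how the two Word formats could be told apart by their leading bytes. It is not code from za-hansard: the function names `sniff_word_format` and `is_legacy_doc` are made up for illustration. It relies only on the fact that old-style ".doc" files are OLE2 compound documents and ".docx" files are ZIP archives.

```python
# Magic bytes: legacy .doc files are OLE2 compound documents,
# while .docx files are ZIP archives (as are many other formats).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(data: bytes) -> str:
    """Classify a document by its leading bytes.

    Returns 'doc', 'docx-or-zip', or 'unknown'. This is a hypothetical
    helper for illustration, not part of the za-hansard parser.
    """
    if data.startswith(OLE2_MAGIC):
        return "doc"
    if data.startswith(ZIP_MAGIC):
        # A ZIP header alone can't prove the file is a .docx,
        # only that it is not an old-style .doc.
        return "docx-or-zip"
    return "unknown"


def is_legacy_doc(path: str) -> bool:
    """True if the file at `path` looks like an old-style .doc (hypothetical helper)."""
    with open(path, "rb") as handle:
        return sniff_word_format(handle.read(8)) == "doc"
```

A check like this could run before handing a downloaded file to the parser, so that a ".docx" file gets skipped or logged rather than failing partway through parsing.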
PMG are now uploading old-style Word documents to their API, which means the Hansard transcripts are appearing on the site once again.