The wikifarm I use only offers full gzipped XML dumps of all revisions. I do a lot of pywikibot scripting, so I prefer to test offline to avoid constantly slamming the API. Because every revision is included in the dump, it takes a while to retrieve the current version of a page. I was able to use the API with this library to generate a `--curonly` dump (thanks for making it so easy to set up and use!), but I wonder whether that procedure could be adapted to produce a current-versions dump from an existing full local dump.
I started trying to script it myself, and it wasn't hard to write a little etree-based function that iterates through a single `<page>` node and removes all `<revision>` elements except the latest one, determined by `revision_id`. However, XML is pretty persnickety, and I've struggled with incrementally writing to a new file while preserving all the other XML data outside the `<page>` elements so that the pywikibot xmlreader can still interpret it.
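For reference, here is a minimal sketch (not part of this codebase, and not this library's approach) of one way to handle the streaming part: read the full-history gzip dump with lxml's `iterparse`, keep only the newest `<revision>` in each `<page>`, and write elements out incrementally so the whole tree never sits in memory. The file names are placeholders, and the hardcoded `<mediawiki>` header assumes the 0.10 export schema, so the real opening tag should be copied from the actual dump if it differs.

```python
# Sketch: stream a full-history MediaWiki dump and keep only the newest
# revision per page. File names and the <mediawiki> header are assumptions.
import gzip
from lxml import etree

SRC = "full-history.xml.gz"    # hypothetical input dump
DST = "current-only.xml.gz"    # hypothetical output dump

# Root element written by hand so <siteinfo> and <page> elements can be
# streamed in between without keeping the whole document in memory.
HEADER = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
          b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" '
          b'version="0.10" xml:lang="en">\n')
FOOTER = b'</mediawiki>\n'


def keep_latest_revision(page):
    """Remove every <revision> child except the one with the highest id."""
    ns = page.tag.split('}', 1)[0] + '}'   # e.g. "{http://www.mediawiki.org/xml/export-0.10/}"
    revisions = page.findall(ns + 'revision')
    if len(revisions) > 1:
        newest = max(revisions, key=lambda r: int(r.findtext(ns + 'id')))
        for rev in revisions:
            if rev is not newest:
                page.remove(rev)


with gzip.open(SRC, 'rb') as fin, gzip.open(DST, 'wb') as fout:
    fout.write(HEADER)
    for _, elem in etree.iterparse(fin, events=('end',)):
        name = elem.tag.split('}', 1)[-1]
        if name not in ('siteinfo', 'page'):
            continue
        if name == 'page':
            keep_latest_revision(elem)
        fout.write(etree.tostring(elem))
        # Free memory: drop this element and any earlier, already-written siblings.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    fout.write(FOOTER)
```

As a quick sanity check that the result is still something pywikibot can read, the output can be fed straight to the xmlreader (again, `current-only.xml.gz` is just the placeholder name from above):

```python
# Confirm pywikibot's xmlreader still accepts the rewritten dump.
from pywikibot import xmlreader

dump = xmlreader.XmlDump("current-only.xml.gz")
for i, entry in enumerate(dump.parse()):
    print(entry.title)
    if i >= 2:      # peek at the first few pages only
        break
```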
I wouldn't mind working on a PR for a new arg like `--from-local` if folks could point me in the right direction in this codebase. Relatedly, the ability to make a test dump with e.g. `--max-pages=3` would help me work on my code against a validly constructed compressed XML file.