John Cummings edited this page Oct 28, 2015 · 7 revisions

Process:

  • PDFs are copied to GitHub from either Wikimedia Commons or UNESDOC
  • Images are extracted from the PDFs
  • Images are uploaded to Wikimedia Commons, linking back to the original document they were taken from
  • Project pages are created to ask people to add titles, descriptions and categories for files based on the text of the file
  • Files are used on Wikimedia projects, including Wikipedia
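
The extraction step could be sketched as below. This page does not name an extraction tool, so the use of the `pdfimages` command from Poppler is an assumption, as are the function names and the example paths. The sketch writes PNG files into one directory per document.

```python
import subprocess
from pathlib import Path


def build_extract_command(pdf_path: Path, out_root: Path) -> list[str]:
    """Build a pdfimages call for one PDF (assumes Poppler is installed)."""
    # Each document gets its own directory, named after the PDF's stem,
    # and the same stem is used as the prefix of every image file.
    image_root = out_root / pdf_path.stem / pdf_path.stem
    # -png asks pdfimages to write PNG files instead of its default PPM/PBM.
    return ["pdfimages", "-png", str(pdf_path), str(image_root)]


def extract_images(pdf_path: Path, out_root: Path) -> Path:
    """Run the extraction and return the directory holding the images."""
    target_dir = out_root / pdf_path.stem
    target_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(pdf_path, out_root), check=True)
    return target_dir
```

Calling `extract_images(Path("pdfs/report.pdf"), Path("images"))` would, under these assumptions, fill `images/report/` with files whose names start with `report`.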

Requirements:

  • A spreadsheet listing, for each extracted file: the file name, the name of the original document, the date of publication, a description of the original document and the URL of the original document. If the document description, date of publication and source URL are difficult to get from the metadata, they can be pieced together from a spreadsheet supplied by UNESCO.

  • Images produced need to be in a Wikimedia-compatible file format, e.g. .jpg

  • The name of the original document should be recorded either in the filename of each new image or by giving each document its own directory. This is needed to link each image back to its source document on Wikimedia Commons; the link can point to the document on either UNESDOC or Wikimedia Commons.
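
A minimal sketch of how the spreadsheet and the naming requirement might fit together. The column names, the `" - image NNN"` filename pattern, the helper names and the example URL are all hypothetical; none of them is prescribed by this page.

```python
import csv
from pathlib import Path

# Hypothetical column names covering the fields listed in the requirements.
FIELDS = ["image_file", "source_document", "publication_date",
          "description", "source_url"]


def image_filename(document_name: str, index: int, ext: str = "jpg") -> str:
    """Embed the source document name in the image filename."""
    return f"{document_name} - image {index:03d}.{ext}"


def source_document(image_file: str) -> str:
    """Recover the source document name from an image filename."""
    return Path(image_file).stem.rsplit(" - image ", 1)[0]


def metadata_row(document: dict, index: int) -> dict:
    """Build one spreadsheet row for the index-th image of a document."""
    return {
        "image_file": image_filename(document["name"], index),
        "source_document": document["name"],
        "publication_date": document["date"],
        "description": document["description"],
        "source_url": document["url"],
    }


def write_sheet(rows, handle) -> None:
    """Write the rows as CSV, one line per extracted image."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the document name in the filename means the link back to the source can be rebuilt from the image file alone, even if the spreadsheet is lost. The one-directory-per-document option would work equally well, with the directory name playing the same role.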

Questions:

  • How can we upload the PDF files to GitHub?
  • Is there an automated way to upload the files from GitHub to Commons?

Resources:

  • John Cummings has a spreadsheet containing all the metadata for the open access files on UNESDOC and can transfer them all to Wikimedia Commons using GLAMwiki Toolset.

  • John Cummings may be able to supply all PDFs offline if they cannot be scraped from UNESDOC or Wikimedia Commons.
