Home

PDFs are copied to GitHub, either from Wikimedia Commons or UNESDOC.
Images are extracted from the PDFs.
Images are uploaded to Wikimedia Commons, linking back to the original document they were taken from.
Project pages are created to ask people to add titles, descriptions and categories for files based on the text of the file.
Files are used on Wikimedia projects, including Wikipedia.

A spreadsheet is needed with: the names of the files extracted, the name of the original document, the date of publication, a description of the original document and the URL of the original document. Document description, date of publication and source URL can be pieced together from a spreadsheet supplied by UNESCO if they are difficult to get from the metadata.
Images produced need to be in a Wikimedia-compatible file format, e.g. .jpg.
The name of the original document should be included either in the filename of each new image or as the name of a directory that holds that document's images. This is needed as a link back to the source document in Wikimedia Commons. The link could point either to UNESDOC or to Wikimedia Commons.
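
The naming and spreadsheet requirements above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual tooling: the column names, the `_image_NNN` filename pattern, the helper functions and the example document (including its URL) are all assumptions made up for this sketch.

```python
import csv
import io
from pathlib import PurePosixPath

# Column names are assumptions chosen to mirror the fields listed above.
FIELDS = [
    "extracted_file",
    "original_document",
    "publication_date",
    "document_description",
    "source_url",
]


def image_name(source_pdf, index, ext=".jpg"):
    """Build an image filename that carries the source document's name.

    The "<document>_image_<NNN>" pattern is a hypothetical convention.
    """
    stem = PurePosixPath(source_pdf).stem
    return f"{stem}_image_{index:03d}{ext}"


def manifest_rows(document, image_count):
    """Yield one spreadsheet row per image extracted from a document."""
    for index in range(1, image_count + 1):
        yield {
            "extracted_file": image_name(document["pdf"], index),
            "original_document": document["title"],
            "publication_date": document["publication_date"],
            "document_description": document["description"],
            "source_url": document["source_url"],
        }


def write_manifest(rows, out):
    """Write the rows as CSV, header first, to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder document; every value here is invented for illustration.
example_document = {
    "pdf": "example_report.pdf",
    "title": "Example report",
    "publication_date": "2015",
    "description": "Placeholder description of the original document",
    "source_url": "https://example.org/example_report.pdf",
}

buffer = io.StringIO()
write_manifest(manifest_rows(example_document, 2), buffer)
manifest_text = buffer.getvalue()
```

Writing to a real file instead of `io.StringIO` only needs `open("manifest.csv", "w", newline="")` in place of the buffer. If the per-document directory layout is preferred, `image_name` could return `PurePosixPath(stem) / f"image_{index:03d}{ext}"` instead, so that the directory name carries the link back to the source document.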

John Cummings has a spreadsheet containing all the metadata for the open access files on UNESDOC and can transfer them all to Wikimedia Commons using GLAMwiki Toolset.
John Cummings may be able to supply all PDFs offline if they cannot be scraped from UNESDOC or Wikimedia Commons.
