Home
- PDFs are copied to GitHub, either from Wikimedia Commons or from UNESDOC
- Images are extracted from the PDFs
- Images are uploaded to Wikimedia Commons, each linking back to the original document it was taken from
- Project pages are created to ask people to add titles, descriptions and categories for files, based on the text of the source document
- Files are used on Wikimedia projects, including Wikipedia
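
The extraction step could be sketched as follows. This is only an illustration, assuming the `pdfimages` command-line tool from poppler-utils is installed; the page does not say which tool the project uses. The function names and the `images/<document name>/` layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path: Path, out_root: Path) -> list[str]:
    """Build a pdfimages command that writes images into a per-document directory.

    Assumption: poppler's `pdfimages` is used, with `-png` so that every
    extracted image is written as a PNG file.
    """
    doc_name = pdf_path.stem  # name of the original document, without ".pdf"
    # pdfimages appends "-000.png", "-001.png", ... to this prefix
    prefix = out_root / doc_name / doc_name
    return ["pdfimages", "-png", str(pdf_path), str(prefix)]


def extract_images(pdf_path: Path, out_root: Path) -> None:
    """Create the per-document directory and run pdfimages (hypothetical helper)."""
    cmd = build_pdfimages_command(pdf_path, out_root)
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `extract_images(Path("pdfs/report.pdf"), Path("images"))` would place the extracted images under `images/report/`, so the directory name carries the name of the source document.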
- A spreadsheet listing, for each extracted file: the name of the file, the name of the original document, the date of publication, a description of the original document and the URL of the original document. If the document description, date of publication and source URL are difficult to get from the PDF metadata, they can be filled in from a spreadsheet supplied by UNESCO.
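
As a rough sketch of what such a spreadsheet could look like as a CSV file, the snippet below writes one row per extracted image. The column names and the sample row are hypothetical placeholders, not the project's actual schema or data.

```python
import csv
import io

# Hypothetical column names covering the five fields listed above
FIELDNAMES = [
    "image_filename",
    "source_document",
    "publication_date",
    "source_description",
    "source_url",
]


def write_manifest(rows, stream):
    """Write a header plus one CSV row per extracted image to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a placeholder row (all values are invented for illustration)
example_rows = [
    {
        "image_filename": "Example report-000.png",
        "source_document": "Example report",
        "publication_date": "2017",
        "source_description": "Placeholder description of the source document",
        "source_url": "https://example.org/example-report.pdf",
    }
]
buffer = io.StringIO()
write_manifest(example_rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `stream`.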
- The images produced need to be in a Wikimedia-compatible file format, e.g. .jpg
- The name of the original document should be included either in the filename of each new image or as the name of a directory holding that document's images. This is needed to link each image back to its source document on Wikimedia Commons; the link can point to the copy of the document on either UNESDOC or Wikimedia Commons.
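
One way to satisfy the naming and file-format requirements is sketched below. It uses both a per-document directory and the document name in the filename, although the text above only asks for one of the two. The function name, the filename pattern and the extension list are assumptions made for illustration; the extension set is a small subset of common image formats, not an authoritative list of what Commons accepts.

```python
from pathlib import Path

# Assumption: a small, non-exhaustive subset of common image extensions
COMMONS_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".tif", ".tiff"}


def image_target_path(out_root: Path, document_name: str, index: int, extension: str) -> Path:
    """Return the path for one extracted image, keeping the source document name.

    Raises ValueError if the extension is not in the assumed compatible set.
    """
    ext = extension.lower()
    if ext not in COMMONS_IMAGE_EXTENSIONS:
        raise ValueError(f"{extension!r} is not in the assumed list of compatible formats")
    # Hypothetical pattern: "<document name> - image 001.jpg"
    filename = f"{document_name} - image {index:03d}{ext}"
    return out_root / document_name / filename
```

For example, `image_target_path(Path("images"), "Example report", 1, ".JPG")` yields a path whose directory and filename both contain `Example report`, so the source document can be recovered from either.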
- How can we upload the PDF files to GitHub?
- Is there an automated way to upload the files from GitHub to Commons?
- John Cummings has a spreadsheet containing all the metadata for the open access files on UNESDOC and can transfer the files to Wikimedia Commons using the GLAMwiki Toolset.
- John Cummings may be able to supply all the PDFs offline if they cannot be scraped from UNESDOC or Wikimedia Commons.