Processing attachments and corrections
This page describes the process of handling attachments, corrections, and errata. This is a time-consuming task that is ideally done at least quarterly.
- Create an ingestion folder in the shared Dropbox folder, e.g., Dropbox/Anthology/ingests/2019/2019-12-18-attachments.
- Download the Microsoft form data used to gather the attachments. Use Microsoft Excel or a compatible program to export it to a UTF-8-encoded CSV file. Save this file as attachments.csv.
- Process the attachments using the script:
    acl-anthology/bin/add_attachments.py attachments.csv
  This will go through each of the attachments, download them, do some minimal verification, and log everything to add_attachments.log. For successful attachments, it will edit the XML in the Anthology repo and put the file in a local mirror of the Anthology attachments, under ~/anthology-files/attachments/. For failed attachments, it will create a file named add_attachments.log.$ANTH_ID.txt. This file contains an email you can manually send to the person (the first line is the email address, the second is the subject, and the rest is the body).
- After manually checking the repo changes, commit them and create a PR.
- Sync the locally mirrored files to the Anthology:
    cd ~/anthology-files
    rsync -azve ssh attachments/ aclweb:anthology-files/attachments/
  where aclweb is an ssh alias to the Anthology host.
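The failure files described above have a simple layout: address, subject, then body. As a minimal sketch, assuming exactly that layout, the following splits such a file into its parts before you paste them into a mail client. The names FailureEmail and parse_failure_email are hypothetical and are not part of the Anthology scripts.

```python
from typing import NamedTuple


class FailureEmail(NamedTuple):
    """The three parts of a failure file (hypothetical helper type)."""
    recipient: str
    subject: str
    body: str


def parse_failure_email(text: str) -> FailureEmail:
    """Split a failure file into address, subject, and body.

    Assumes the layout described above: line 1 is the email address,
    line 2 is the subject, and everything after that is the body.
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected at least an address line and a subject line")
    body = "\n".join(lines[2:]).strip()
    return FailureEmail(lines[0].strip(), lines[1].strip(), body)
```

To use it, read one of the add_attachments.log.$ANTH_ID.txt files with UTF-8 encoding and pass its contents to parse_failure_email.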
- For corrections, start with the CSV-converted Excel spreadsheet, as above.
- Run the script:
    bin/extract_corrections_for_processing.py CSV_FILE
  This will create a corrections directory, with a file for each correction that was submitted. Ideally, each file is one line, with three arguments that can be passed to the revision script.
- Manually inspect each file. Correct the explanation to a short, neutral, third-person, scientific account of the changes. Ensure that the file downloads correctly via wget.
- Run the script:
    bin/add_revision.py ANTH_ID "DOWNLOAD_PATH" "EXPLANATION"
  Both DOWNLOAD_PATH and EXPLANATION may contain shell metacharacters, so quote them. The script attempts to validate that downloaded files are PDFs, but the check may not be perfect.
- Files are again written to the local mirror (~/anthology-files/pdf/...). The original is downloaded and copied to v1. The revision is saved as a revision and also overwrites the original, so that it is served by default.
- Double-check and then commit the XML changes that were made. Create a PR. Once it is cleared, rsync the files as above. Then clean out ~/anthology-files/pdf.
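Because the download path and the explanation can contain spaces, ampersands, or quotes, it may help to let a program do the quoting. The sketch below uses Python's standard shlex module to build a safely quoted command line for the revision script. The helper name build_revision_command is hypothetical, and the sample values are placeholders, not real Anthology data.

```python
import shlex
from typing import List


def build_revision_command(anth_id: str, download_path: str, explanation: str) -> str:
    """Return a shell-safe command line for bin/add_revision.py.

    shlex.join (Python 3.8+) quotes each argument so that the shell
    passes it to the script unchanged, whatever characters it contains.
    """
    args: List[str] = ["bin/add_revision.py", anth_id, download_path, explanation]
    return shlex.join(args)


# Placeholder values for illustration only.
command = build_revision_command(
    "P19-1001",
    "https://example.org/paper v2.pdf?a=1&b=2",
    "Fixed a typo in the second author's name.",
)
print(command)
```

Printing the command rather than running it leaves room for the manual inspection step before anything is changed.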
When you're finished, use the script bin/summarize_additions.py to produce a list of changes. This script takes the git diff with the corrections on STDIN, and writes a formatted list of changes to STDOUT.
Assuming you are on a branch and have committed, I suggest:
    git diff master | ./bin/summarize_additions.py | pbcopy
(pbcopy is available only on a Mac.)
These should then be announced in the newsgroup.
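For readers unfamiliar with the STDIN-to-STDOUT filter pattern used here, the following is a minimal illustration of it. This is not the actual summarize_additions.py, whose output format is not described on this page. It only shows how a script can read a unified diff line by line and keep the added lines. The function names are hypothetical.

```python
import sys
from typing import Iterable, List


def added_lines(diff_lines: Iterable[str]) -> List[str]:
    """Collect the lines that a unified diff marks as added.

    Added lines start with "+"; the "+++" file header is skipped.
    Simplification: an added line whose own text begins with "++"
    would also be skipped by this check.
    """
    added = []
    for line in diff_lines:
        if line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:].rstrip("\n"))
    return added


def main() -> None:
    """Read a diff on standard input, write the added lines to standard output."""
    for text in added_lines(sys.stdin):
        print(text)
```

Saved as a script that calls main() at the end, a filter like this could take the place of the middle stage in a git diff pipeline.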