Workload
Running the Anthology as the editor requires a significant amount of time, estimated at about 4 to 6 hours of work a week, not counting development. Many processes are currently manual and could be automated further. The job commits the editor to constant oversight of the server and its upkeep (such as system administration), and the workload is also peaky, as new venues and conferences are delivered for ingestion at particular times of the year.
We currently ingest about 4K articles per year, across approximately 75 different venues (15 conferences, 60 workshops). Ingestion time varies by venue but averages about 2-3 hours. This is why I estimate that maintaining the Anthology takes roughly 4 hours a week.
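As a rough check of that estimate, the figures above can be spread evenly over a year. This is only a back-of-the-envelope sketch: the function and constant names are illustrative, and the even spread is an assumption that ignores the peaks in the ingestion calendar.

```python
# Figures quoted in the text above; everything else is illustrative.
VENUES_PER_YEAR = 75          # approx. 15 conferences + 60 workshops
HOURS_PER_VENUE = (2.0, 3.0)  # average ingestion time range, in hours
WEEKS_PER_YEAR = 52


def weekly_hours(venues: int, hours_per_venue: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Spread the yearly ingestion effort evenly over the weeks of a year."""
    return venues * hours_per_venue / weeks


low = weekly_hours(VENUES_PER_YEAR, HOURS_PER_VENUE[0])
high = weekly_hours(VENUES_PER_YEAR, HOURS_PER_VENUE[1])
print(f"{low:.1f} to {high:.1f} hours per week")  # prints "2.9 to 4.3 hours per week"
```

The upper end of that range is consistent with the figure of roughly 4 hours a week.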
-
Ingestion of materials
-
Overview: Assigning IDs. Differences between START and non-START.
We first assign IDs to the Anthology materials. These are recorded on a public spreadsheet that is accessible from the Anthology’s footer and also has a shortlink. This step is done pre-production so that IDs are unique to a venue. (per ingestion: 10 minutes)
-
Material Types
- Conferences
- Workshops
- Journals
Actual ingestion consists of receiving the tarball, uploading it to the aclweb.org website, and ingesting the metadata into the frontend tech stack (Ruby on Rails, Nginx, Solr). Contributors often do not name their sources correctly, and their XML often has validation problems. Troubleshooting these can take several hours per venue. We also need to generate the reference files (BibTeX), update the single BibTeX file, and re-index the database to facilitate search. (per ingestion: 3 hours)
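Some of the XML troubleshooting could be caught before ingestion with a simple pre-check. The sketch below only tests whether a metadata file is well-formed XML, using Python's standard library. It does not validate against the Anthology's actual schema, which is not described here, and the element names in the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical metadata snippets; the element names are illustrative only.
good = "<volume><paper id='1'><title>A Title</title></paper></volume>"
bad = "<volume><paper id='1'><title>A Title</paper></volume>"  # <title> is never closed

print(well_formedness_error(good))             # prints None
print(well_formedness_error(bad) is not None)  # prints True
```

Running a check like this on each contributed file before upload would surface malformed XML early, though schema and naming problems would still need separate handling.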
-
Copyright Notices
We archive copyright notices to a local, backed-up drive as part of the ingestion log for the Anthology. (per ingestion: 10 minutes)
-
Supplemental Links
-
Naming Convention
-
Attachments: Datasets, Software, Notes and (general) Attachments
-
Videos
-
Post-Publication
- Posters
- Versions and Errata
- Retractions
Post-publication work is usually batched but can be handled as a one-off for urgent matters. I service about 50 of these requests per year.
(per edit/fix: 30 minutes; batch edits, inclusive of original ingestion of supplemental materials: 1 hour)
-
-
DOI assignment
We assign DOIs based on the ACL Anthology ID. The IDs need to be written into a CrossRef-uploadable XML file, which is then uploaded to CrossRef. The IDs then need to be imported back into the ACL Anthology so that the DOIs show up on the Anthology website. We do not provide this service for non-ACL venues. There are scripts that mostly work well for this. (per venue ingestion: 30 minutes)
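The ID-to-DOI step can be pictured as a simple mapping. The sketch below is not the existing scripts: the prefix is a placeholder, the "prefix/ID" scheme is an assumption, the sample IDs are made up, and it deliberately stops short of producing the CrossRef XML, whose schema is not covered here.

```python
from typing import Dict, List

# Placeholder prefix: the real DOI prefix and suffix scheme are not given in this document.
DOI_PREFIX = "10.XXXXX"


def doi_for(anthology_id: str, prefix: str = DOI_PREFIX) -> str:
    """Derive a DOI string from an Anthology ID (assumed scheme: prefix/ID)."""
    anthology_id = anthology_id.strip()
    if not anthology_id:
        raise ValueError("empty Anthology ID")
    return f"{prefix}/{anthology_id}"


def doi_table(anthology_ids: List[str]) -> Dict[str, str]:
    """Map each ID to its DOI, both for export and for importing back afterwards."""
    return {aid: doi_for(aid) for aid in anthology_ids}


# Illustrative IDs only.
table = doi_table(["X00-1001", "X00-1002"])
for aid, doi in table.items():
    print(aid, doi)
```

Keeping the mapping in one table means the same data can drive both the upload file and the later import back into the Anthology.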
-
Anthology Group
Broadcasts Anthology ingestions and occasional service announcements. (per venue ingestion: 15 minutes)
-
Volunteer Coordination
-
Software Development
- Presence and Organization on GitHub
- Docker Image
- GitHub Issues
- Volunteers’ Meetings
We opportunistically try to clear issues from our GitHub issues queue, but we depend on volunteers. Without the ACL Exec’s help to recruit volunteers and assign them to the Anthology, we are helpless. Each issue can take 1-5 hours of development time for someone already familiar with the codebase, and gaining that familiarity can itself take a few days of dedicated time.
-
ARC Development
This is work we’d love to do, but it depends on available development time. Creating a new ARC is distributed work that takes a few months, plus coordination time.
-
-
Reporting / Liaising
-
Requests for copyright clearance
We are licensed under CC BY 4.0, so we only need to reply that the use is OK and log the record. (per request: 15 minutes)
-
Indexing (Scopus, Google Scholar)
-
ACL Exec
We have to file reports to the ACL Exec in the AdminWiki (per report: 2 hours; twice a year if requested)
-
Interface with ACL Information Office
- Server Hosting
-