Introduction
The discovery and meaningful reuse of research software depends, in part, on accurate metadata. There are many different community standards for describing the contents of a software package, a codebase, or a library. These standards often differ substantially in which attributes they describe and in how much detail they describe them. That makes it difficult to support the discovery of potentially valuable software through APIs or search engines.
Research in a number of domains also shows that a lack of time, incentives, and training for creating metadata hampers progress in sharing and reusing research objects [e.g. Edwards et al., 2011].
This project proposes to explore an emerging "minimal metadata" standard for scientific software in JSON-LD. The standard offers a simple core set of attributes for uniformly describing scientific software in a lightweight semantic format. A minimal JSON-LD standard will ease the burden of creating metadata and will also improve the discoverability of software.
This proposal builds on the "code as a research object" project by Mozilla Science, GitHub, and Figshare, and directly contributes to the ongoing work of Matt Jones at NESCent.
This project will contribute to the development of this standard through:
- A crosswalking exercise that maps attribute-value pairs from many existing standards onto one another. As the project matures, these categories will converge on a minimal set.
- Use cases from the ESIP community.
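The crosswalking exercise can be pictured as a set of lookup tables, one per standard, that translate each standard's field names into a shared vocabulary. The sketch below is illustrative only. The left-hand keys are real DOAP properties and npm `package.json` keys, but the right-hand "core" names are assumptions. So are judgment calls such as mapping `doap:maintainer` to `author`. None of this is a proposed standard. The `shared_core` helper shows one simple way the categories could "converge on a minimal set": it keeps only the attributes that every crosswalked standard can supply.

```python
"""Hypothetical metadata crosswalk: map fields from existing standards
onto a shared set of core attribute names (illustrative sketch only)."""

# Crosswalk tables. Source keys are real DOAP properties / npm package.json
# keys; the core names they map to are placeholders chosen for this example.
CROSSWALK = {
    "doap": {
        "doap:name": "name",
        "doap:shortdesc": "description",
        "doap:license": "license",
        "doap:programming-language": "programmingLanguage",
        "doap:repository": "codeRepository",
        "doap:maintainer": "author",  # judgment call: maintainer treated as author
    },
    "npm": {
        "name": "name",
        "description": "description",
        "license": "license",
        "repository": "codeRepository",
        "author": "author",
        "version": "version",
    },
}


def to_core(record, standard):
    """Translate one record into core attribute names, dropping unmapped fields."""
    mapping = CROSSWALK[standard]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def shared_core(crosswalk):
    """Return the core attributes that every standard in the crosswalk can supply."""
    targets = [set(mapping.values()) for mapping in crosswalk.values()]
    return set.intersection(*targets)


if __name__ == "__main__":
    # Hypothetical npm record; "private" has no core equivalent and is dropped.
    npm_record = {"name": "example-model", "version": "0.1.0",
                  "license": "MIT", "private": True}
    print(to_core(npm_record, "npm"))
    # {'name': 'example-model', 'version': '0.1.0', 'license': 'MIT'}
    print(sorted(shared_core(CROSSWALK)))
    # ['author', 'codeRepository', 'description', 'license', 'name']
```

In a real crosswalk, the intersection is only a starting point. Attributes such as `version` or `programmingLanguage` might still belong in a minimal set even if one source standard lacks them.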
Developing a descriptive metadata standard is a first step toward a robust network of software archives. Tools such as Fidgit, which are designed to lower the barrier to archiving code and obtaining a persistent identifier for it, can then be leveraged by this network.
Some questions and (preliminary) answers:
Doesn't the DOAP project propose to do exactly this?
The Description Of A Project (DOAP) ontology is very relevant to this project, but in short, no.
DOAP proposes an "XML/RDF vocabulary to describe software projects, and in particular open source projects."
Here, we're interested in exploring a number of different standards, including DOAP, and in finding a minimal or "core" set of attribute-value pairs (in the Dublin Core / Darwin Core sense) for describing scientific software. Our particular focus is the Earth System Science software that is developed and used in the ESIP community.
There are two limitations to depending solely on DOAP for this:
- It is aimed at RDF, and, well, RDF gets complicated quickly. We want to do something lightweight and easily adoptable.
- Part of the ambition in using JSON-LD is that metadata creation can be automated in the future, and that future seems a lot more distant in an RDF / DOAP world.
Why JSON-LD?
JSON-LD is a lightweight format that offers semantic meaning at a substantially lower barrier to creation than traditional RDF serializations such as RDF/XML. Over the last 2-3 years it has emerged as a widely used format for serving data via APIs. More importantly for our work here, it can leverage existing vocabularies, such as the CreativeWork type in schema.org, for describing a codebase.
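As a rough illustration of the "lower barrier" claim, here is a minimal sketch of a JSON-LD record for a codebase built with Python's standard `json` module. It assumes schema.org's `SoftwareSourceCode` type, a subtype of `CreativeWork`, so it can use `CreativeWork` properties such as `name`, `author`, `license`, and `version` alongside code-specific ones like `codeRepository` and `programmingLanguage`. The `describe_software` helper is hypothetical. So are the example values and the choice of properties, which are not the proposed minimal standard.

```python
import json


def describe_software(name, description, repository, language,
                      license_url, authors, version=None):
    """Build a minimal JSON-LD record for a codebase using schema.org terms.

    SoftwareSourceCode is a schema.org subtype of CreativeWork, so it can use
    CreativeWork properties (name, description, author, license, version)
    alongside code-specific ones (codeRepository, programmingLanguage).
    """
    record = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository,
        "programmingLanguage": language,
        "license": license_url,
        # Each author is a nested schema.org Person node.
        "author": [{"@type": "Person", "name": person} for person in authors],
    }
    # Optional attributes are only emitted when a value is supplied.
    if version is not None:
        record["version"] = version
    return record


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = describe_software(
        name="example-ocean-model",
        description="Toy ocean circulation model used in a teaching exercise.",
        repository="https://github.com/example/example-ocean-model",
        language="Fortran",
        license_url="https://opensource.org/licenses/MIT",
        authors=["Jane Doe"],
        version="1.2.0",
    )
    print(json.dumps(example, indent=2))
```

The output is ordinary JSON, so any JSON tool or API client can read it. The `@context` and `@type` keys are what let a linked-data consumer interpret the same record semantically.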
Why not XML?
XML can play too.
Major work on the use cases will focus on the following:
- Testing and exploring the use of different subject categories (we believe we can do better than the original proposal to use PLoS's taxonomy of academic subjects).
- Developing guidelines and suggested practices for describing the function of the software.
Name | Role | Description |
---|---|---|
Nic Weber | Role | PhD candidate in Information Science at the University of Illinois. Experience developing standards and policy for the Data Conservancy, and with linked data applications. |
You | Your Role | Your qualifications |
- Our initial short-term work will complete a community scan and crosswalk of existing standards. We'll build on the existing work from Mozilla Science and contribute back to the community that is driving this work.
- A set of ESIP community use cases.
- A white paper recommendation [?]
- A formal publication in an open access journal that describes the use cases in detail, as well as the progress that we've made.
Edwards, P., Mayernik, M. S., Batcheller, A., Bowker, G., & Borgman, C. (2011). Science friction: Data, metadata, and collaboration. Social Studies of Science, DOI: 10.1177/0306312711413314