ORCID Overview

Nuno Macedo edited this page Jun 12, 2017 · 10 revisions

ORCID Service Overview

ORCID is a community-based service that aims to provide a registry of unique researcher identifiers (an ORCID iD) and a method of linking research outputs to these identifiers, based on data collected from external sources. Since an ORCID user profile is populated by these different external sources automatically, a user profile typically contains different works that actually describe the same research output (possibly containing different or even contradictory meta-data). The distinctive feature of the ORCID service is that, to ease the management of the profile, works that describe the same output are grouped together, showing only the preferred one in the web interface overview (more on this process below).

Data model

The ORCID data model is described by an XSD schema. This data model supports the registration of various relevant research activities, including research outputs, funding projects and professional career. As of PTCRISync v1.0, version 2.0 of the ORCID data model is supported. Moreover, PTCRISync v1.0 handles only the synchronization of research productions (i.e., ORCID works); the synchronization of the remaining activities is a work in progress. Thus, for the purpose of PTCRISync, an ORCID user profile is simply a set of works.

The ORCID schema supports both the definition of (full) works and work summaries. The latter contain information regarding external identifiers, titles, production type and publication dates. The former may contain additional meta-data like contributors, venue information and a short description. The synchronization algorithms will often rely solely on work summaries to avoid retrieving complete works, which would take a toll on performance.

ORCID uses putcodes to uniquely identify activities within a user profile. These are automatically generated for newly created works, and can be used to update existing works.

Each activity is also assigned a visibility setting that restricts who can access that information. Visibility may be public (visible to everyone), limited to trusted parties (visible to parties to which the user has granted read privileges) or private (visible only to the user).

External identifiers, production matching and groups

The main conceptual difference between ORCID and typical CRIS services is that ORCID groups together works that are considered to represent the same production. The grouping mechanism is quite simple, and just assumes that two works are similar if, and only if, they share an external identifier or there is another work that is similar to both. Essentially, this recursive definition considers two works to be similar if, and only if, they directly or indirectly (via transitivity) share some external identifier. External identifiers are standard identifiers (e.g., DOI codes) that are assumed to uniquely identify productions. As of version 2.0 of the ORCID schema, these groups are intrinsic to the ORCID data model and used in the API.

Each group comprises a set of work summaries and a set of external identifiers, the latter aggregating the external identifiers of all the works in the group. The collection of work summaries within a group is ordered: the first work of the collection is considered to be the one preferred by the user. To retrieve the complete meta-data of a work, the user must first find its putcode (the internal identifier) and then explicitly request it (more on the ORCID API below).
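The grouping rule described above (works are similar if they directly or transitively share an external identifier) amounts to computing connected components. The following is a minimal sketch of that idea using a union-find structure; it is not ORCID's actual implementation. The function name `group_works`, the input shape (a mapping from putcode to a set of `(type, value)` identifier pairs) and the sample identifiers are assumptions made for illustration, and the preferred-work ordering inside a group is not modelled.

```python
from collections import defaultdict


def group_works(works):
    """Group works that directly or transitively share an external identifier.

    `works` maps a putcode to a set of (id_type, id_value) pairs (assumed shape).
    Returns a list of (sorted putcodes, aggregated identifiers) pairs.
    """
    parent = {putcode: putcode for putcode in works}

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge every work with the first work seen carrying the same identifier.
    first_owner = {}
    for putcode, ext_ids in works.items():
        for ext_id in ext_ids:
            if ext_id in first_owner:
                union(first_owner[ext_id], putcode)
            else:
                first_owner[ext_id] = putcode

    members = defaultdict(list)
    for putcode in works:
        members[find(putcode)].append(putcode)

    groups = []
    for putcodes in members.values():
        # A group aggregates the identifiers of all of its works.
        ids = set().union(*(works[p] for p in putcodes))
        groups.append((sorted(putcodes), ids))
    return sorted(groups, key=lambda group: group[0])


works = {
    1: {("doi", "10.1/a")},
    2: {("doi", "10.1/a"), ("eid", "2-s2.0-1")},
    3: {("eid", "2-s2.0-1")},
    4: {("doi", "10.1/b")},
}
groups = group_works(works)
```

In this example works 1 and 3 share no identifier directly, yet they end up in the same group because work 2 is similar to both.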

The ORCID service forbids works from the same source to share external identifiers: if an external source tries to add a work with an external identifier that is shared with a previous work owned by that source, a conflict error is returned. The ORCID team also expects most of the works in its database to have at least one external identifier associated, so the API forces every work that is introduced to have some external identifier assigned (even if it is an identifier that only makes sense for the external source). As of ORCID 2.0, this is still not enforced in the user web interface.
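The two API-side rules above can be sketched as a small in-memory model. This is only an illustration of the constraints as described in the text: `ConflictError`, `add_work` and the dictionary layout of a work are hypothetical names, not part of the ORCID API, and the putcode numbering is a simplification.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for the conflict error returned by the service."""


def add_work(profile, source, ext_ids):
    """Add a work to `profile` (a list of dicts) under the two rules above."""
    ext_ids = set(ext_ids)
    # Rule 1: works added through the API need at least one external identifier.
    if not ext_ids:
        raise ValueError("a work needs at least one external identifier")
    # Rule 2: a source may not own two works that share an external identifier.
    for work in profile:
        if work["source"] == source and work["ext_ids"] & ext_ids:
            raise ConflictError(f"{source} already owns a work sharing an identifier")
    # Simplified putcode generation: next integer after the current maximum.
    putcode = max((w["putcode"] for w in profile), default=0) + 1
    profile.append({"putcode": putcode, "source": source, "ext_ids": ext_ids})
    return putcode


profile = []
first = add_work(profile, "cris-a", {("doi", "10.1/a")})
# A different source may add a work with the same identifier; the two are grouped.
second = add_work(profile, "cris-b", {("doi", "10.1/a")})
try:
    add_work(profile, "cris-a", {("doi", "10.1/a"), ("eid", "2-s2.0-1")})
    conflict = False
except ConflictError:
    conflict = True
```

Note that the restriction applies per source: the same identifier may appear in works from different sources, which is precisely what produces groups.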

PTCRISync builds on this notion of matching local CRIS productions with remote ORCID works based solely on these external identifiers. The set of supported identifiers is the same as the one supported by ORCID. The PTCRISync synchronization procedures rely on this matching both to identify local works that need to be updated and to identify productions that are yet to be represented locally.
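As a rough illustration of this matching step, the sketch below pairs local productions with remote ORCID groups whenever they share at least one external identifier. It is not the PTCRISync implementation; `match_productions`, the local keys and the data shapes are assumptions chosen to keep the example small.

```python
def match_productions(local_productions, remote_groups):
    """Match local productions to remote groups through shared identifiers.

    `local_productions` maps a local key to a set of identifiers and
    `remote_groups` is a list of identifier sets, one per ORCID group
    (both shapes are assumptions). Returns `(matched, missing_locally)`:
    `matched` maps a local key to the indexes of the groups it matches, and
    `missing_locally` lists the indexes of groups with no local counterpart.
    """
    matched = {}
    matched_remote = set()
    for key, local_ids in local_productions.items():
        hits = [i for i, group_ids in enumerate(remote_groups) if local_ids & group_ids]
        if hits:
            matched[key] = hits
            matched_remote.update(hits)
    missing_locally = [i for i in range(len(remote_groups)) if i not in matched_remote]
    return matched, missing_locally


local = {
    "p1": {("doi", "10.1/a")},
    "p2": {("isbn", "978-0")},
}
remote = [
    {("doi", "10.1/a"), ("eid", "2-s2.0-1")},
    {("doi", "10.1/b")},
]
matched, missing_locally = match_productions(local, remote)
```

Here `matched` identifies local productions that are candidates for an update, while `missing_locally` identifies remote productions that are yet to be represented in the local CRIS.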

ORCID Member API

ORCID provides two distinct APIs: one that is public and another that is reserved for members. The public API allows any user or service to read the public profile of a user, while the Member API allows clients to add and remove information from the user’s ORCID profile. If allowed by the ORCID user, the Member API can also read information from the user’s profile set as semi-private (Trusted Parties), unlike the public API, which reads only public items. As of version 2.0 of the ORCID API, the Member API allows services to add, update and delete works. Interested services can register for Member credentials here.

The Member API relies on a 3-legged OAuth authentication protocol between the ORCID service, the user and the Member client (i.e., the interested CRIS service). Once the CRIS service is registered as a Member, the following "dance" is performed to access the data from a user's ORCID profile: get an authorization code from the user, use the authorization code to request an access token from the ORCID service, and use the access token to perform calls to the Member API. Authorization codes are requested for specific scopes that restrict the permissions of the client. These scopes cover permissions to read activities, to update activities and to update the user's biographic information. As of version 1.0, PTCRISync does not require permissions to update biographic information.
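The three steps of the "dance" can be sketched without performing any network call. The endpoint URLs and the scope names `/read-limited` and `/activities/update` below are assumptions that do not come from this page and should be checked against the ORCID documentation; the client identifier, secret and redirect URI are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed production endpoints; verify against the ORCID documentation.
AUTHORIZE_URL = "https://orcid.org/oauth/authorize"
TOKEN_URL = "https://orcid.org/oauth/token"


def authorization_url(client_id, redirect_uri, scopes):
    """Step 1: URL the user visits to grant an authorization code."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "scope": " ".join(scopes),
        "redirect_uri": redirect_uri,
    })
    return f"{AUTHORIZE_URL}?{query}"


def token_request(client_id, client_secret, code, redirect_uri):
    """Step 2: URL and form fields used to exchange the code for an access token."""
    return TOKEN_URL, {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }


def auth_header(access_token):
    """Step 3: header attached to every subsequent Member API call."""
    return {"Authorization": f"Bearer {access_token}"}


url = authorization_url(
    "APP-PLACEHOLDER",
    "https://cris.example/callback",
    ["/read-limited", "/activities/update"],  # no biographic-update scope requested
)
endpoint, form = token_request(
    "APP-PLACEHOLDER", "secret-placeholder", "code-from-step-1",
    "https://cris.example/callback",
)
```

The scope list mirrors the statement that PTCRISync 1.0 needs to read and update activities but not to update biographic information.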

Communication with the ORCID service is performed through its RESTful API on the ORCID iD of a user (resource /[orcid_id] of the request URL). For the purpose of PTCRISync 1.0, we focus on API calls over works (i.e., resource /works of the request URL). GET requests can be used either to retrieve every work summary (if called on /[orcid_id]/works) or, once the putcode of a particular activity is known, to retrieve the complete work record (if called on /[orcid_id]/works/[putcode]). POST requests can be used to create new works (/[orcid_id]/works), while PUT and DELETE requests can be used to update and delete a particular work, respectively (/[orcid_id]/works/[putcode]). PTCRISync provides a client that encapsulates this communication layer. As of version 2.0 of the ORCID API, full productions can be read (GET) and added (POST) in bulk to reduce communication overheads, which is exploited by PTCRISync v1.0.
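The mapping between operations, HTTP methods and resources described above can be summarised in a small routing helper. This is only a sketch: the base URL is an assumption, the paths simply follow the wording of this page (the exact resource names should be verified against the ORCID API documentation), `works_request` is a hypothetical helper rather than part of the PTCRISync client, and the ORCID iD and putcode are sample values.

```python
API_BASE = "https://api.orcid.org/v2.0"  # assumed Member API base URL

# operation -> (HTTP method, whether a putcode is part of the path)
ROUTES = {
    "list_summaries": ("GET", False),
    "read": ("GET", True),
    "create": ("POST", False),
    "update": ("PUT", True),
    "delete": ("DELETE", True),
}


def works_request(operation, orcid_id, putcode=None):
    """Return the (method, URL) pair for an operation over a user's works."""
    method, needs_putcode = ROUTES[operation]
    if needs_putcode and putcode is None:
        raise ValueError(f"operation '{operation}' requires a putcode")
    path = f"/{orcid_id}/works"
    if needs_putcode:
        path += f"/{putcode}"
    return method, API_BASE + path


summaries = works_request("list_summaries", "0000-0002-1825-0097")
update = works_request("update", "0000-0002-1825-0097", putcode=12345)
```

Keeping the table of routes in one place makes it explicit that only creation and summary listing operate on the collection, while reading, updating and deleting require a known putcode.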

Works added by a member service will automatically have their source set to that member; a fundamental constraint is that a member service may only update and delete works whose source is itself. The service is also unable to directly modify the set of preferred works selected by the user, although that may happen indirectly if a preferred work is deleted from the ORCID profile, or if a new work unifies two groups of similar works, since the resulting group must have exactly one preferred work. Consequently, the impact of an update is not the same as that of a delete/create sequence of operations. In the latter case, it is not always clear how the new preferred work of the group is selected by the ORCID service (i.e., how the grouped works are ordered).
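The source constraint can be illustrated with a short filter over a group of works. The sketch assumes each work is a dictionary with `putcode` and `source` keys and that the group list is ordered with the preferred work first; `modifiable_putcodes`, `owns_preferred` and the source names are hypothetical.

```python
def modifiable_putcodes(group, client_source):
    """Putcodes of the works in `group` that `client_source` may update or delete."""
    return [work["putcode"] for work in group if work["source"] == client_source]


def owns_preferred(group, client_source):
    """True if the preferred work (first in the ordered group) belongs to the client."""
    return bool(group) and group[0]["source"] == client_source


# Ordered group: the first entry is the work preferred by the user.
group = [
    {"putcode": 7, "source": "user"},
    {"putcode": 8, "source": "cris-a"},
    {"putcode": 9, "source": "cris-b"},
]
editable = modifiable_putcodes(group, "cris-a")
preferred_is_ours = owns_preferred(group, "cris-a")
```

A check such as `owns_preferred` is one way a client could detect that deleting its own work would force the service to pick a new preferred work for the group, which is the situation where an in-place update is the safer choice.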