Creators: find CERN people #135

ntarocco · 2024-03-27T13:51:04Z

The names vocab currently contains the ORCID dump. We need to add CERN users too.
CERN users can add their ORCID to their computing account profile (via the users-portal); we are modifying the LDAP sync to get this info (see here).

Expected behaviour

When searching and selecting creators in the upload form, I should be able to find all CERN users and other researchers.
Let's take an example: I am searching for John Doe.

Possible scenario

1. John Doe has an ORCID, and he is in the CERN db. However, he did not add the ORCID to his CERN account.
   When searching for him in the upload form, I will find him twice, once with the ORCID logo and links and once with the CERN logo and links, as we have no way to deduplicate the user.
2. John Doe does not have an ORCID, and he is in the CERN db.
   When searching, I will find him once, showing the CERN logo and links.
3. John Doe does not have an ORCID, and he is in the CERN db. However, he now creates an ORCID and adds it to his CERN account.
   After all the syncing (ORCID and LDAP) has completed, I should find him once, showing both the ORCID and CERN logos and links.

Privacy considerations

We cannot expose the CERN db, nor the person id of the user. To achieve this, we should:

Change the vocabularies endpoints and make them protected (login required). This has been discussed internally, and everyone agrees. See: Names vocabulares endpoints require to be logged in to see them #191
The person id cannot be exposed, so we might want to generate a hash from the person id, keyed with a secret, and use it as the name id. See: Names vocab: Evaluate generate an hash from the person id, generated with a secret, and use it as name id. #192

This level of protection should match the one enforced by the CERN Phonebook. We might want to contact them and ask about this.
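To make the hashing idea more concrete, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library. The function name `hash_person_id` and the way the secret is passed in are assumptions for illustration only; nothing here is decided, and the actual algorithm and secret handling are still to be evaluated in #192.

```python
import hashlib
import hmac


def hash_person_id(person_id: str, secret: bytes) -> str:
    """Return a keyed digest of a CERN person id (hypothetical helper).

    The same person id and secret always give the same digest, so the value
    can serve as a stable id, while the person id itself is not exposed.
    """
    return hmac.new(secret, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Example values only; the real secret would come from the instance config.
    print(hash_person_id("123456", b"example-secret"))
```

Using a keyed hash rather than a plain one means that somebody who knows the format of person ids cannot simply enumerate them and compare digests without also knowing the secret.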

Technical design

Issue here: #193
EDIT: When creating the names from CERN, we will use the user id as the id for the names and hash the person id in the props.

There will be 2 types of objects in the names vocab: the ORCID objects, and the CERN objects.

The ORCID objects are inserted or updated via the automatic jobs/dump. Their IDs cannot really be changed, as all the code relies on having the ORCID as the id:
- affiliations:
  - name: Northwestern University
  family_name: Findler
  given_name: Robby
  id: 0000-0002-4245-2000
  identifiers:
  - identifier: 0000-0002-4245-2000
    scheme: orcid
The CERN objects could look like this (check props definition in other vocabs):
- affiliations:
  - name: <affiliations from CERN db>
  family_name: Findler
  given_name: Robby
  id: <encrypted-person-id>
  props:
    e-mail: <email>
    username: <username>
    is_cern: True
  identifiers:
  - identifier: <encrypted-person-id>
    scheme: CERN
In general, the vocabularies schemas and features should be enhanced to support:

extra fields for the names. It looks like this is not the case today (see here).

vocabularies should support deprecation.

First, import all ORCIDs. Then, import the CERN db. When importing the CERN db, we should check whether each user has an ORCID. If that's the case, there are 2 options:

John Doe's ORCID is not there yet: create it as the merged object, see below.

John Doe's ORCID is already there: merge it, see below.
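The import order and the two options above could be sketched roughly as follows. This is an in-memory illustration only, not the invenio-vocabularies API: the names vocab is modelled as a plain dict keyed by record id, and `import_cern_users` as well as the input keys (`id`, `email`, `username`, `orcid`) are hypothetical. The `id` of a CERN user stands for whatever non-sensitive id we end up using.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def import_cern_users(names: Dict[str, Record], cern_users: List[Record]) -> None:
    """Add CERN users to a names vocab that already contains the ORCID dump."""
    for user in cern_users:
        cern_props = {
            "e-mail": user["email"],
            "username": user["username"],
            "is_cern": True,
        }
        cern_identifier = {"identifier": user["id"], "scheme": "CERN"}
        orcid = user.get("orcid")

        if orcid is None:
            # No ORCID in the CERN account: store a standalone CERN object.
            names[user["id"]] = {
                "id": user["id"],
                "given_name": user["given_name"],
                "family_name": user["family_name"],
                "identifiers": [cern_identifier],
                "props": cern_props,
            }
        elif orcid in names:
            # ORCID object already there: merge the CERN info into it.
            record = names[orcid]
            record.setdefault("props", {}).update(cern_props)
            identifiers = record.setdefault("identifiers", [])
            if cern_identifier not in identifiers:
                identifiers.append(cern_identifier)
        else:
            # ORCID object not there yet: create it directly as the merged one.
            names[orcid] = {
                "id": orcid,
                "given_name": user["given_name"],
                "family_name": user["family_name"],
                "identifiers": [
                    {"identifier": orcid, "scheme": "orcid"},
                    cern_identifier,
                ],
                "props": cern_props,
            }
```

In all three branches a user ends up with exactly one entry, which is what the scenarios in "Possible scenario" expect once the ORCID is known to the CERN account.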

Merging

Edit: Instead of deprecating we delete the old value on merge.

Issue here: #194

We should mark the CERN object as deprecated (adding a field deprecated: True or similar), and update the ORCID object with the missing info: username, e-mail, encrypted person id and the identifier.
We need to make sure that, with the next ORCID dump, these extra props are kept when updating the object.
The deprecated object should not be findable when searching in OpenSearch.
We should also add an extra prop to mark which object is the new one. Example:
  props:
    deprecated: True
    superseeded_by: <orcid>
The deprecation mechanism (but not the merging) should work for any vocab. The superseeded_by might be a list in case an object becomes 2 objects. For example, a CERN department might be split into 2 others in the future. It could be interesting to keep track of this.
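The requirement that the extra props survive the next ORCID dump could look roughly like this. It is a sketch under the assumption that the update job has access to both the stored record and the incoming dump entry; `apply_orcid_update` is a hypothetical helper, not an existing function.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]


def apply_orcid_update(existing: Optional[Record], incoming: Record) -> Record:
    """Build the record to store for an ORCID dump entry.

    Fields from the dump replace the stored ones, but props that were merged
    in earlier (username, e-mail, ...) are carried over.
    """
    updated = dict(incoming)
    if existing and existing.get("props"):
        # Keep previously merged props; props present in the dump entry win.
        updated["props"] = {**existing["props"], **incoming.get("props", {})}
    return updated
```

Identifiers added during the merge (the CERN one) would need the same treatment; it is left out here to keep the example short.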

Deprecated items should not be findable by normal users, but they can still be found in the admin panel.
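As an illustration of the visibility rule, here is a small in-memory sketch. In practice the filtering would most likely be done in the OpenSearch query rather than in Python; `search_names` and its `include_deprecated` flag are assumptions used only to show the intended behaviour.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def search_names(
    names: List[Record], query: str, include_deprecated: bool = False
) -> List[Record]:
    """Return records whose full name contains the query.

    Deprecated records are skipped unless include_deprecated is set,
    which is what the admin panel would do.
    """
    needle = query.lower()
    results = []
    for record in names:
        deprecated = record.get("props", {}).get("deprecated", False)
        if deprecated and not include_deprecated:
            continue
        full_name = f"{record['given_name']} {record['family_name']}".lower()
        if needle in full_name:
            results.append(record)
    return results
```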

It might also be useful to add another field, for example called managed, to mark that this item is managed by a script and should not be changed by humans. This could block manual editing.

Retention period

We should take deletions into account. When a CERN user is still in the names vocab, but no longer in the latest LDAP dump, we should remove them from the names vocab. If merged, we should only remove the CERN object.
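A possible shape for the cleanup step, again as an in-memory sketch. It assumes that CERN entries can be matched to the LDAP dump by the username prop, and it reads "only remove the CERN object" for a merged record as: keep the ORCID record and drop the CERN-specific props and identifier. Both points are assumptions to be confirmed, and `prune_departed_cern_users` is a hypothetical name.

```python
from typing import Any, Dict, Set

Record = Dict[str, Any]

# Props that were added from the CERN db (assumed set, matching the example above).
CERN_PROP_KEYS = ("e-mail", "username", "is_cern")


def prune_departed_cern_users(
    names: Dict[str, Record], current_usernames: Set[str]
) -> None:
    """Remove CERN info for users missing from the latest LDAP dump."""
    for record_id in list(names):
        record = names[record_id]
        props = record.get("props", {})
        if not props.get("is_cern") or props.get("username") in current_usernames:
            continue

        identifiers = record.get("identifiers", [])
        if any(i.get("scheme") == "orcid" for i in identifiers):
            # Merged record: keep the ORCID object, strip the CERN part.
            for key in CERN_PROP_KEYS:
                props.pop(key, None)
            record["identifiers"] = [
                i for i in identifiers if i.get("scheme") != "CERN"
            ]
        else:
            # Standalone CERN object: remove it entirely.
            del names[record_id]
```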


ntarocco · 2024-08-30T16:31:03Z

@jrcastro2 I have discussed this with Jose and Lars and updated the issue, have a look ;) We will need to create subtasks out of this.


ntarocco added the bug Something isn't working label Mar 27, 2024

ntarocco added this to the Migrate Summer Student Project Notes milestone Mar 27, 2024

ntarocco added this to 2024/Q4 Mar 28, 2024

ntarocco moved this to Todo dev in 2024/Q4 Mar 28, 2024

kpsherva added this to Sprint Q4/2024 🍂 Jul 26, 2024

kpsherva moved this to Ready in Sprint Q4/2024 🍂 Jul 26, 2024

anikachurilova assigned anikachurilova and sakshamarora1 and unassigned anikachurilova Aug 15, 2024

ntarocco moved this from Todo dev to In Progress in 2024/Q4 Aug 20, 2024

ntarocco moved this from Ready to In progress in Sprint Q4/2024 🍂 Aug 20, 2024

ntarocco assigned sakshamarora1 and jrcastro2 and unassigned sakshamarora1 Aug 21, 2024

jrcastro2 moved this from Backlog to In progress in Sprint Q4/2024 🍂 Aug 29, 2024

ntarocco added epic and removed bug Something isn't working labels Aug 30, 2024

jrcastro2 added a commit to jrcastro2/invenio-vocabularies that referenced this issue Sep 2, 2024

resource: add permission check to names search

0650d8c

* closes CERNDocumentServer/cds-rdm#135

jrcastro2 mentioned this issue Sep 2, 2024

resource: add permission check to names search inveniosoftware/invenio-vocabularies#396

Merged

jrcastro2 added a commit to jrcastro2/invenio-vocabularies that referenced this issue Sep 2, 2024

resource: add permission check to names search

eb0e339

* closes CERNDocumentServer/cds-rdm#135

ntarocco mentioned this issue Oct 23, 2024

names vocab: allow names vocab to have 2 types of objects #193

Closed
