Creators: find CERN people #135

Open
ntarocco opened this issue Mar 27, 2024 · 1 comment
ntarocco commented Mar 27, 2024

The names vocab currently contains only the ORCID dump. We need to add CERN users too.
CERN users can add their ORCID to their computing account profile (via the users-portal). We are modifying the LDAP sync to retrieve this info (see here).

Expected behaviour

When searching and selecting creators in the upload form, I should be able to find all CERN users and other researchers.
Let's take an example: I am searching for John Doe.

Possible scenario

  1. John Doe has an ORCID, and he is in the CERN db. However, he did not add the ORCID to the CERN account.
    When searching for him in the upload form, I will find him 2 times, showing the ORCID and CERN logo and links, as we have no way to deduplicate the user.

  2. John Doe does not have an ORCID, and he is in the CERN db.
    When searching, I will find him 1 time, showing the CERN logo and links.

  3. John Doe does not have an ORCID, and he is in the CERN db. However, he now creates an ORCID and adds it to his CERN account.
    After all the syncing (ORCID and LDAP) has completed, I should find him 1 time, showing both the ORCID and CERN logo and links.

Privacy considerations

We cannot expose the CERN db, nor the person id of the user. To achieve this, we should:

  1. Change the vocabularies endpoints to make them protected (login required). This has been discussed internally, and everyone agrees. See: Names vocabulares endpoints require to be logged in to see them #191
  2. The person id cannot be exposed, so we might want to generate a hash from the person id, keyed with a secret, and use it as the name id. See: Names vocab: Evaluate generate an hash from the person id, generated with a secret, and use it as name id. #192

This level of protection should match the one enforced by the CERN Phonebook. We might want to contact them and ask this.
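
A minimal sketch of the keyed hash from point 2, assuming HMAC-SHA256 from the Python standard library. The function name `hash_person_id`, the sample person id and the way the secret is supplied are hypothetical and only for illustration; the actual scheme is still to be evaluated in #192.

```python
import hashlib
import hmac


def hash_person_id(person_id: str, secret: bytes) -> str:
    """Derive a stable identifier from a person id without exposing it.

    Assumption: HMAC-SHA256 keyed with a server-side secret. The same
    person id and secret always yield the same 64-character hex digest,
    so the value can be recomputed at every sync.
    """
    return hmac.new(secret, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical values, for illustration only.
secret = b"change-me"
name_id = hash_person_id("123456", secret)
```

Because the digest is deterministic, re-running the LDAP sync produces the same name id for the same person, as long as the secret does not change.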

Technical design

Issue here: #193
EDIT: When creating the names from CERN, we will use the user id as the id for the names and hash the person id in the props.

There will be 2 types of objects in the names vocab: the ORCID objects, and the CERN objects.

The ORCID objects are inserted or updated via the automatic jobs/dump. Their IDs cannot really be changed, as all the code relies on having the ORCID as the id:

- affiliations:
  - name: Northwestern University
  family_name: Findler
  given_name: Robby
  id: 0000-0002-4245-2000
  identifiers:
  - identifier: 0000-0002-4245-2000
    scheme: orcid

The CERN objects could look like this (check props definition in other vocabs):

- affiliations:
  - name: <affiliations from CERN db>
  family_name: Findler
  given_name: Robby
  id: <encrypted-person-id>
  props:
    e-mail: <email>
    username: <username>
    is_cern: True
  identifiers:
  - identifier: <encrypted-person-id>
    scheme: CERN
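
As a rough illustration, an entry with the shape shown above could be assembled like this. The input `record` and its keys (`given_name`, `family_name`, `email`, `username`, `affiliations`) are assumptions about what the LDAP sync provides, and `build_cern_name` is a hypothetical helper, not existing code.

```python
def build_cern_name(record: dict, hashed_person_id: str) -> dict:
    """Build a names-vocabulary entry shaped like the CERN example above.

    Assumption: `record` is a plain dict coming from the LDAP sync, with
    hypothetical keys; `hashed_person_id` was computed beforehand.
    """
    return {
        "id": hashed_person_id,
        "given_name": record["given_name"],
        "family_name": record["family_name"],
        "affiliations": [{"name": name} for name in record.get("affiliations", [])],
        "props": {
            "e-mail": record["email"],
            "username": record["username"],
            "is_cern": True,
        },
        "identifiers": [{"identifier": hashed_person_id, "scheme": "CERN"}],
    }


# Hypothetical record, for illustration only.
entry = build_cern_name(
    {
        "given_name": "John",
        "family_name": "Doe",
        "email": "john.doe@example.org",
        "username": "jdoe",
        "affiliations": ["CERN"],
    },
    hashed_person_id="abc123",
)
```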

In general, the vocabularies schemas and features should be enhanced to support:

  1. Extra fields for the names. It looks like this is not the case today (see here).
  2. Deprecation of vocabulary entries.

First, import all ORCIDs. Then, import the CERN db. When importing the CERN db, we should check whether each user has an ORCID. If that's the case, there are 2 options:

  1. John Doe's ORCID is not there yet: create it as the merged one, see below.
  2. John Doe's ORCID is already there: merge it, see below.
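
The two options can be sketched as follows, with the vocabulary modelled as a plain in-memory dict keyed by name id. This is only an illustration of the decision logic: `import_cern_entry` is a hypothetical helper, and the sketch removes any standalone CERN entry on merge rather than deprecating it.

```python
from typing import Optional


def import_cern_entry(names: dict, cern_entry: dict, orcid: Optional[str]) -> str:
    """Insert or merge one CERN entry and return the id it ends up under.

    Assumption: `names` maps name id -> entry dict, and `orcid` is the
    ORCID stored in the CERN account, or None if the user has not set one.
    """
    if orcid is None:
        # No ORCID known: keep a standalone CERN entry.
        names[cern_entry["id"]] = cern_entry
        return cern_entry["id"]

    target = names.get(orcid)
    if target is None:
        # Option 1: the ORCID entry does not exist yet, create it.
        target = {
            "id": orcid,
            "given_name": cern_entry["given_name"],
            "family_name": cern_entry["family_name"],
            "affiliations": list(cern_entry.get("affiliations", [])),
            "identifiers": [{"identifier": orcid, "scheme": "orcid"}],
        }
        names[orcid] = target

    # Option 2 (and the rest of option 1): copy the CERN info over.
    target.setdefault("props", {}).update(cern_entry.get("props", {}))
    identifiers = target.setdefault("identifiers", [])
    known = {(i["scheme"], i["identifier"]) for i in identifiers}
    for ident in cern_entry.get("identifiers", []):
        if (ident["scheme"], ident["identifier"]) not in known:
            identifiers.append(ident)

    # Drop the standalone CERN entry, if one was imported earlier.
    names.pop(cern_entry["id"], None)
    return orcid
```

After this step, a search for John Doe would match a single entry carrying both the ORCID and the CERN identifier.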

Merging

Edit: Instead of deprecating we delete the old value on merge.

Issue here: #194

We should mark the CERN object as deprecated (adding a field deprecated: True or similar), and update the ORCID object with the missing info: username, e-mail, encrypted person id and the identifier.
We need to make sure that, with the next ORCID dump, these extra props are kept when updating the object.
The deprecated object should not be findable when searching in OpenSearch.
We should also add an extra prop to mark which object is the new one. Example:

  props:
    deprecated: True
    superseeded_by: <orcid>

The deprecation mechanism (but not the merging) should work for any vocab. The superseeded_by might be a list, in case an item is replaced by 2 objects. For example, CERN departments might be split into 2 others in the future. It could be interesting to keep track of this.

Deprecated items should not be found by normal users; however, they can be found in the admin panel.
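
One possible way to express this visibility rule is at query-building time. The sketch below returns a plain OpenSearch query body as a dict; it assumes that `props.deprecated` is indexed as a boolean field and that names are matched on `given_name` and `family_name`. The function `names_search_query` is hypothetical.

```python
def names_search_query(text: str, include_deprecated: bool = False) -> dict:
    """Build a search body that hides deprecated entries by default.

    Assumption: `props.deprecated` is mapped as a boolean in the index.
    Admin views would pass include_deprecated=True.
    """
    bool_query = {
        "must": [
            {
                "multi_match": {
                    "query": text,
                    "fields": ["given_name", "family_name"],
                }
            }
        ]
    }
    if not include_deprecated:
        # Entries without the field are not excluded by a must_not term.
        bool_query["must_not"] = [{"term": {"props.deprecated": True}}]
    return {"query": {"bool": bool_query}}


user_query = names_search_query("John Doe")
admin_query = names_search_query("John Doe", include_deprecated=True)
```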

It might also be useful to add another field, for example called managed, to mark that this item is managed by a script and should not be changed by humans. This could block manual editing.

Retention period

We should take deletions into account. When a CERN user is still in the names vocab, but no longer in the latest LDAP dump, we should remove them from the names vocab. If merged, we should only remove the CERN object.
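
A rough sketch of this cleanup, again with the vocabulary modelled as an in-memory dict. It assumes that CERN entries are recognisable by `props.is_cern`, that the latest LDAP dump is available as a set of usernames, and that an entry carrying an `orcid` identifier is a merged one whose ORCID object must stay. `remove_departed_cern_users` is a hypothetical helper.

```python
def remove_departed_cern_users(names: dict, ldap_usernames: set) -> list:
    """Delete standalone CERN entries missing from the latest LDAP dump.

    Assumption: `names` maps name id -> entry dict. Returns the removed ids.
    """
    removed = []
    for name_id, entry in list(names.items()):
        props = entry.get("props", {})
        if not props.get("is_cern"):
            continue  # pure ORCID entry, not managed by the LDAP sync
        if props.get("username") in ldap_usernames:
            continue  # user is still in the latest dump
        has_orcid = any(
            ident.get("scheme") == "orcid"
            for ident in entry.get("identifiers", [])
        )
        if has_orcid:
            continue  # merged entry: the ORCID object is kept
        del names[name_id]
        removed.append(name_id)
    return removed
```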

ntarocco commented Aug 30, 2024

@jrcastro2 I have discussed this with Jose and Lars and updated the issue, have a look ;) We will need to create subtasks out of this.
