Leaf term not used for library_preparation_protocol.library_construction_method.ontology #334

Closed
NoopDog opened this issue May 19, 2021 · 21 comments
Labels: dataset, Epic, operations, Release 14

Comments

@NoopDog
Collaborator

NoopDog commented May 19, 2021

After the data was ingested, child terms that specify the end bias explicitly were added to these ontology terms.

It is now possible to use the more specific term for the ontology_ids below.

For context see:

HumanCellAtlas/dcp2#13 and
https://docs.google.com/spreadsheets/d/1Wk7SGxEz00AkNokYv3YlFHJVF9U6THKcrvgHARsnau8/edit#gid=0

| Project Detail Page | project_id | library_preparation_protocol_id | ontology_id | ontology_label | update made | exported |
| --- | --- | --- | --- | --- | --- | --- |
| https://data.humancellatlas.org/explore/projects/cc95ff89-2e68-4a08-a234-480eca21ce79 | cc95ff89-2e68-4a08-a234-480eca21ce79 | 0882881d-bc39-4b85-b557-e874b93124eb | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/116965f3-f094-4769-9d28-ae675c1b569c | 116965f3-f094-4769-9d28-ae675c1b569c | 2945bb1f-90de-42a3-afa1-f57a62c853f0 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/4e6f083b-5b9a-4393-9890-2a83da8188f1 | 4e6f083b-5b9a-4393-9890-2a83da8188f1 | 58df9607-ab66-48e0-a47b-1f897baae139 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/c4077b3c-5c98-4d26-a614-246d12c2e5d7 | c4077b3c-5c98-4d26-a614-246d12c2e5d7 | 6f399c41-797f-4f69-8719-cbd468478e68 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/abe1a013-af7a-45ed-8c26-f3793c24a1f4 | abe1a013-af7a-45ed-8c26-f3793c24a1f4 | 71efd7ce-0ec0-4423-9eb2-9bd42f40a33f | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/88ec040b-8705-4f77-8f41-f81e57632f7d | 88ec040b-8705-4f77-8f41-f81e57632f7d | 8b4f9b9b-a1c1-40ec-aab9-ebb61918c01c | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/005d611a-14d5-4fbf-846e-571a1f874f70 | 005d611a-14d5-4fbf-846e-571a1f874f70 | 910266c3-64b1-4a3d-a4fe-844be494ffd1 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/f83165c5-e2ea-4d15-a5cf-33f3550bffde | f83165c5-e2ea-4d15-a5cf-33f3550bffde | 952603f3-cf07-46cf-a439-299f0e71dbca | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/4d6f6c96-2a83-43d8-8fe1-0f53bffd4674 | 4d6f6c96-2a83-43d8-8fe1-0f53bffd4674 | ac399b75-3cc1-4bf0-8d19-8c29c2545402 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/74b6d569-3b11-42ef-b6b1-a0454522b4a0 | 74b6d569-3b11-42ef-b6b1-a0454522b4a0 | c2dacc52-da61-49a6-ac4f-6a684ae45d4f | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/091cf39b-01bc-42e5-9437-f419a66c8a45 | 091cf39b-01bc-42e5-9437-f419a66c8a45 | dc19bb22-ae7b-431b-9b8b-7b49799a8fcd | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/4a95101c-9ffc-4f30-a809-f04518a23803 | 4a95101c-9ffc-4f30-a809-f04518a23803 | e953a093-f5ab-46df-9223-a492f4775d44 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/7b947aa2-43a7-4082-afff-222a3e3a4635 | 7b947aa2-43a7-4082-afff-222a3e3a4635 | f151bb74-b149-4992-9728-923f1943968f | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/8185730f-4113-40d3-9cc3-929271784c2b | 8185730f-4113-40d3-9cc3-929271784c2b | fa99959f-faa2-4d69-a092-48333e59f5f3 | EFO:0009310 | 10X v2 sequencing | | |
| https://data.humancellatlas.org/explore/projects/88ec040b-8705-4f77-8f41-f81e57632f7d | 88ec040b-8705-4f77-8f41-f81e57632f7d | 88bb0331-4a61-4268-b17b-2310fb47bcb8 | EFO:0009898 | 10x v3 sequencing | | |
| https://data.humancellatlas.org/explore/projects/4bec484d-ca7a-47b4-8d48-8830e06ad6db | 4bec484d-ca7a-47b4-8d48-8830e06ad6db | ab819eae-9eb3-4f12-8b0e-cd4204702512 | EFO:0009898 | 10x v3 sequencing | | |

clairerye added the "operations" label May 19, 2021
rays22 self-assigned this May 24, 2021
@clairerye
Contributor

We agreed we should make these updates. I believe we should be able to update these through the UI, but we should time-box it: there is a risk, as these are likely to be DCP1 projects, many of which are on an old schema and have never had updates applied to them.

rays22 added the "dataset" label May 28, 2021
@rays22
Contributor

rays22 commented May 28, 2021

We need to break this issue up into 16 sub-tasks, one for each project in the table, each consisting of a manual update in the UI followed by an export.

aaclan-ebi self-assigned this Jun 23, 2021
@aaclan-ebi

aaclan-ebi commented Jun 23, 2021

These DCP1 datasets need to be re-exported manually since they are already in the Complete state. Once updates are made, whether via the UI or the API, they will not go back to the submittable (valid) state. For convenience, we could simply write a script to do the updates.

@aaclan-ebi

@clairerye the oldest library_preparation_protocol schema version we have in the prod data is https://schema.humancellatlas.org/type/protocol/sequencing/6.1.0/library_preparation_protocol, and it has library_construction_method.

If this is urgent, we could create a script that updates the schema version to https://schema.humancellatlas.org/type/protocol/sequencing/6.2.0/library_preparation_protocol and sets the new library_construction_method value, then triggers the re-export.

Or we could wait for bulk updates support: ebi-ait/dcp-ingest-central#147
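
As a rough illustration of the script option above, the sketch below patches a single protocol document in memory. It is not the script that was eventually used. The `patch_protocol` helper is hypothetical, and it assumes the schema URL is stored under `describedBy` and that `library_construction_method` carries `text`, `ontology` and `ontology_label` keys. The new ontology values in the example are the ones reported later in this thread.

```python
import copy

# Target schema URL quoted in the comment above.
TARGET_SCHEMA = (
    "https://schema.humancellatlas.org/type/protocol/sequencing/"
    "6.2.0/library_preparation_protocol"
)


def patch_protocol(content, ontology, ontology_label):
    """Return a patched copy of a library_preparation_protocol document.

    Assumption: the schema URL lives under "describedBy" and the method
    object uses "ontology" / "ontology_label" keys. The free-text "text"
    field is deliberately left as the submitter wrote it.
    """
    patched = copy.deepcopy(content)  # leave the caller's document untouched
    patched["describedBy"] = TARGET_SCHEMA
    method = patched.setdefault("library_construction_method", {})
    method["ontology"] = ontology
    method["ontology_label"] = ontology_label
    return patched


# Example document (illustrative content only).
old = {
    "describedBy": "https://schema.humancellatlas.org/type/protocol/"
                   "sequencing/6.1.0/library_preparation_protocol",
    "library_construction_method": {
        "text": "10X v2 sequencing",
        "ontology": "EFO:0009310",
        "ontology_label": "10X v2 sequencing",
    },
}
new = patch_protocol(old, "EFO:0009899", "10x 3' v2")
```

Sending the patched document back to ingest and triggering the re-export are left out, since the thread does not describe those calls.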

ofanobilbao added the "blocked: bulk updates" label Jul 7, 2021
Wkt8 removed the "blocked: bulk updates" label Jul 14, 2021
ofanobilbao added the "blocked: bulk updates" label Aug 31, 2021
ofanobilbao removed the "blocked: bulk updates" label Sep 8, 2021
Wkt8 added the "next" label Jan 26, 2022
idazucchi self-assigned this Jan 26, 2022
Wkt8 added the "Epic" label Jan 26, 2022
@Wkt8
Collaborator

Wkt8 commented Jan 26, 2022

Converted this to an epic to capture all DCP1 updates.
According to Alegria, we can update and export this in ingest, but this requires deleting several files from the staging area gs bucket:

  • descriptors.json
  • links.json
  • metadata/sequence_file.json

Best to work together with a dev to do this via a script!
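
Before deleting anything, such a script could first do a dry run that only lists what it would remove. The sketch below takes the three names from the list above literally and assumes they sit directly under a per-project prefix in the bucket. The `select_stale` helper and the `staging/<project uuid>` prefix are hypothetical, and if the names actually denote folders or patterns the matching rule would need to change.

```python
# Names listed in the comment above, relative to a project's staging prefix.
STALE_NAMES = ("descriptors.json", "links.json", "metadata/sequence_file.json")


def select_stale(object_names, project_prefix):
    """Return the object names under project_prefix that match STALE_NAMES.

    This only selects candidates; it never deletes anything.
    """
    prefix = project_prefix.rstrip("/") + "/"
    targets = {prefix + name for name in STALE_NAMES}
    return sorted(name for name in object_names if name in targets)


# Dry run over a made-up listing (the prefix layout is an assumption).
listing = [
    "staging/cc95ff89-2e68-4a08-a234-480eca21ce79/descriptors.json",
    "staging/cc95ff89-2e68-4a08-a234-480eca21ce79/links.json",
    "staging/cc95ff89-2e68-4a08-a234-480eca21ce79/metadata/sequence_file.json",
    "staging/cc95ff89-2e68-4a08-a234-480eca21ce79/metadata/project.json",
]
to_delete = select_stale(listing, "staging/cc95ff89-2e68-4a08-a234-480eca21ce79")
```

Reviewing `to_delete` with a wrangler before running the real deletion keeps the destructive step explicit.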

@ipediez
Contributor

ipediez commented Feb 1, 2022

@jacobwindsor to have a meeting with the wranglers today, including a discussion about this. @Wkt8 to send an invite.

@Wkt8
Collaborator

Wkt8 commented Feb 1, 2022

Had a chat with Jacob about using different ontology terms depending on the combination of 'method' and 'end bias'. Hopefully unblocked!

@jacobwindsor
Contributor

jacobwindsor commented Feb 1, 2022

This PR contains the scripts for updating projects. It is broken into two stages: update.py and fix_terra.py. The second stage should only be run once all submissions are exported.

I have run the first stage on the first submission and it is currently exporting: https://contribute.data.humancellatlas.org/submissions/detail?uuid=85e72912-9f91-4489-8169-3b43cc65a16a
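
The ordering constraint between the two stages can be expressed as a small guard. This is only a sketch and not taken from the PR: `ready_for_second_stage` is a hypothetical helper, the state names "EXPORTED" and "EXPORTING" are assumptions, and how the states are fetched from ingest is left out.

```python
def ready_for_second_stage(submission_states):
    """Return (ok, pending) for a mapping of submission uuid -> state.

    ok is True only when there is at least one submission and every
    submission is in the (assumed) "EXPORTED" state; pending lists the
    uuids that are not there yet.
    """
    pending = sorted(
        uuid for uuid, state in submission_states.items()
        if state.upper() != "EXPORTED"
    )
    return (len(submission_states) > 0 and not pending, pending)


# Example with two submission uuids from this thread and made-up states.
states = {
    "85e72912-9f91-4489-8169-3b43cc65a16a": "EXPORTED",
    "064d36ca-ea4d-428f-b30f-0cf5e5350a9d": "EXPORTING",
}
ok, pending = ready_for_second_stage(states)
```

Treating an empty mapping as "not ready" is a design choice here, so that a failed lookup cannot accidentally unlock the cleanup stage.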

@jacobwindsor
Contributor

jacobwindsor commented Feb 1, 2022

The above submission is now exported and I have removed the appropriate files from the staging area.

Waiting to confirm if this is correct and everything seems okay with this project before proceeding with others. @Wkt8 @aaclan-ebi

Project UUID: cc95ff89-2e68-4a08-a234-480eca21ce79

@jacobwindsor
Contributor

@Wkt8 confirmed that the new metadata is correct. @ESapenaVentura confirmed that the Terra bucket is correct.

@jacobwindsor
Contributor

Protocol f151bb74-b149-4992-9728-923f1943968f doesn't exist; replacing it with 349b3863-74ca-491f-bc1a-a0f1aeb334bb.

@jacobwindsor
Contributor

jacobwindsor commented Feb 2, 2022

Please find a CSV below of the applied patches to each protocol.

protocol uuid,text,ontology,label
0882881d-bc39-4b85-b557-e874b93124eb,10X v2 sequencing,EFO:0009899,10x 3' v2
2945bb1f-90de-42a3-afa1-f57a62c853f0,10X v2 sequencing,EFO:0009899,10x 3' v2
58df9607-ab66-48e0-a47b-1f897baae139,10X v2 sequencing,EFO:0009899,10x 3' v2
6f399c41-797f-4f69-8719-cbd468478e68,10X 3' v2 sequencing,EFO:0009899,10x 3' v2
71efd7ce-0ec0-4423-9eb2-9bd42f40a33f,10X 3' v2 sequencing,EFO:0009899,10x 3' v2
8b4f9b9b-a1c1-40ec-aab9-ebb61918c01c,10x Chromium (v2),EFO:0009899,10x 3' v2
910266c3-64b1-4a3d-a4fe-844be494ffd1,Chromium 3' Single Cell v2,EFO:0009899,10x 3' v2
952603f3-cf07-46cf-a439-299f0e71dbca,Chromium 3' Single Cell v2,EFO:0009899,10x 3' v2
ac399b75-3cc1-4bf0-8d19-8c29c2545402,10x v2 sequencing,EFO:0009899,10x 3' v2
c2dacc52-da61-49a6-ac4f-6a684ae45d4f,10x_v2,EFO:0009899,10x 3' v2
dc19bb22-ae7b-431b-9b8b-7b49799a8fcd,10X 3' v2 sequencing,EFO:0009899,10x 3' v2
e953a093-f5ab-46df-9223-a492f4775d44,10X v2 sequencing,EFO:0009899,10x 3' v2
349b3863-74ca-491f-bc1a-a0f1aeb334bb,10X v2 sequencing,EFO:0009899,10x 3' v2
fa99959f-faa2-4d69-a092-48333e59f5f3,10X v2 sequencing,EFO:0009899,10x 3' v2
88bb0331-4a61-4268-b17b-2310fb47bcb8,10x Chromium (v3),EFO:0009922,10x 3' v3
ab819eae-9eb3-4f12-8b0e-cd4204702512,10X v3 sequencing,EFO:0009922,10x 3' v3
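
A patch list in this shape is easy to sanity-check before or after applying it. The sketch below parses a three-row excerpt of the CSV above with the standard `csv` module and confirms that each ontology id is paired with a single label. The helper names `load_patches` and `labels_by_ontology` are hypothetical.

```python
import csv
import io

# Excerpt of the CSV above (three of the sixteen rows).
PATCHES_CSV = """protocol uuid,text,ontology,label
0882881d-bc39-4b85-b557-e874b93124eb,10X v2 sequencing,EFO:0009899,10x 3' v2
8b4f9b9b-a1c1-40ec-aab9-ebb61918c01c,10x Chromium (v2),EFO:0009899,10x 3' v2
88bb0331-4a61-4268-b17b-2310fb47bcb8,10x Chromium (v3),EFO:0009922,10x 3' v3
"""


def load_patches(csv_text):
    """Map each protocol uuid to its (ontology, label) patch."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["protocol uuid"]: (row["ontology"], row["label"]) for row in reader}


def labels_by_ontology(patches):
    """Collect the labels used for each ontology id (ideally one per id)."""
    seen = {}
    for ontology, label in patches.values():
        seen.setdefault(ontology, set()).add(label)
    return seen


patches = load_patches(PATCHES_CSV)
summary = labels_by_ontology(patches)
```

An ontology id that maps to more than one label in `summary` would point to a typo in the patch list.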

The following 15 submissions are now exporting:

fce97270-fce0-4744-8a4e-a93d95521852
85e72912-9f91-4489-8169-3b43cc65a16a
d5410c6e-612d-421a-a66f-2de5e04dd050
7dbcf5ae-f8d7-487c-a3d4-794d8639a1e2
0fb44736-c50f-49ab-bcdc-0985e596b955
064d36ca-ea4d-428f-b30f-0cf5e5350a9d
46fe8bd7-329b-4f09-b227-5ee48c109c16
cad1dcdd-f49a-4f9c-9417-359b0e5bdd12
003e1b7d-9829-4401-b0b9-d819754a6bf3
fd52efcc-6924-4c8a-b68c-a299aea1d80f
5c438071-abc3-4b34-8f55-d90bfe29c404
60ec310d-5f5c-4f09-996b-7555e1bb7d12
1e2601a5-8938-446c-bbb3-1f37a84b11da
668791ed-deec-4470-b23a-9b80fd133e1c
d1610c4a-76c6-4b69-af63-c74af869fa75

@jacobwindsor
Contributor

jacobwindsor commented Feb 2, 2022

I have cleaned up the terra staging area for all but 5 submissions.

Three submissions (064d36ca-ea4d-428f-b30f-0cf5e5350a9d, 46fe8bd7-329b-4f09-b227-5ee48c109c16, d1610c4a-76c6-4b69-af63-c74af869fa75) are still exporting.

The two projects below are not DCP1 projects, so no Terra cleanup is needed AFAIK.

4bec484d-ca7a-47b4-8d48-8830e06ad6db
7b947aa2-43a7-4082-afff-222a3e3a4635

@idazucchi
Collaborator

@jacobwindsor to look at the project that's still exporting; it might be stuck.

@Wkt8
Collaborator

Wkt8 commented Feb 4, 2022

Wrangler action: send product import forms for the 15 update tickets.

@jacobwindsor
Contributor

Only 2616 of the 2618 assays were exported by this job. I will force this job and the submission to EXPORTED and then retry later.

@jacobwindsor
Contributor

All submissions have now been exported and, where they are DCP1 projects, cleaned up.

Wkt8 removed the "next" label Feb 8, 2022
@Wkt8
Collaborator

Wkt8 commented Feb 9, 2022

Worth checking with Alegria and Jeff whether there is a faster way of doing the import forms for the updated projects.

Wkt8 added the "next" label Feb 15, 2022
@ESapenaVentura
Collaborator

ESapenaVentura commented Feb 16, 2022

@Wkt8
Collaborator

Wkt8 commented Feb 17, 2022

All of the export forms have now been sent!

Wkt8 removed the "next" label Feb 17, 2022
amnonkhen added the "Release 14" label Feb 18, 2022