dataset update - scRNAseqSystemicComparison #672

Closed
5 tasks
Wkt8 opened this issue Mar 4, 2022 · 6 comments
Labels
dataset All dataset tickets should have this label, only one ticket per dataset DCP1.0 Label for datasets in DCP1.0 Release 15 Release 18 DCP Data Release 18 @ 27/6 Release 19 DCP Data Release 19 @ 25/7 task A wrangler task

Comments

@Wkt8
Collaborator

Wkt8 commented Mar 4, 2022

Project uuid: 88ec040b-8705-4f77-8f41-f81e57632f7d
Links

Team

  • Wrangler: Wei / Lattice
  • Contributor ??
  • dev: Jacob

We ran into difficulties importing this project into Release 14: 88ec040b-8705-4f77-8f41-f81e57632f7d. Specifically, the sequence_file descriptor files are missing values for the content_type and size properties, which are both required. In order to continue with the Release 14 import process, this project has been removed from the release. Once the missing values are added, could you please resubmit it for Release 15? Sorry again for the inconvenience.

Background
This project was originally exported by Lattice in Release 12. It is not clear how it passed validation at that point. In Release 14 we updated it as part of the 'Updates Release' (see ticket #334), and Jeff has now reported that the sequence_file descriptors are missing values for content_type and size.
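
To make the reported problem concrete, here is a minimal sketch of a check for the two required properties. It assumes a descriptor is a flat JSON object; the function names are hypothetical and are not part of the ingest or import code.

```python
import json

# Assumption: only the two properties named in this ticket are checked.
REQUIRED_PROPERTIES = ("content_type", "size")


def missing_descriptor_properties(descriptor: dict) -> list:
    """Return the required properties that are absent or empty in a descriptor."""
    return [
        name
        for name in REQUIRED_PROPERTIES
        if descriptor.get(name) in (None, "")
    ]


def check_descriptor_text(text: str) -> list:
    """Parse a descriptor JSON document and report its missing properties."""
    return missing_descriptor_properties(json.loads(text))
```

Running a check like this over every descriptor before export would flag the project before it reaches the importer.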

Acceptance criteria for the task:

  • the project has been updated and exported and is shown in the data portal as part of release 15
  • manually update the values in the db
  • understand why the files had missing attributes (probably a separate ticket)
  • export project
  • fill import form
@Wkt8 Wkt8 added dataset All dataset tickets should have this label, only one ticket per dataset operations This issue is an operational task task A wrangler task and removed operations This issue is an operational task labels Mar 4, 2022
@ofanobilbao
Contributor

@Wkt8 to create a dev ticket to investigate this

@Wkt8
Collaborator Author

Wkt8 commented May 18, 2022

This needs to be pushed forward.

@Wkt8
Collaborator Author

Wkt8 commented Jun 9, 2022

See product dev ticket ebi-ait/dcp-ingest-central#793

@amnonkhen amnonkhen added the Release 18 DCP Data Release 18 @ 27/6 label Jun 24, 2022
@ESapenaVentura
Collaborator

Summary of this ticket:

  • A DCP1 dataset was updated as part of our effort to standardize 10x labels.
  • It was exported with the metadata-only flag on.
  • Since this is a DCP1 dataset, the size, type, etc. were never filled in the file_descriptors.

There are two possible solutions to choose from:

  1. We finally fix the DCP1 datasets to be up to date and exportable
    • This will require re-wiring the way we update datasets. When we are not making data file changes, we should not be generating file metadata (especially descriptors); the importer will start looking for them if those pieces of metadata are available.
  2. We delete the links, file_descriptors and sequence_file metadata from the staging area
    • This is a workaround; we will keep running into this issue.
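
The rule behind option 1 can be sketched as a filter over the staging-area paths that an export would write. This is only an illustration: the path prefixes and the function name are assumptions based on the three kinds of metadata named in option 2, not the exporter's real interface.

```python
# Assumption: file-related metadata lives under these relative prefixes.
FILE_RELATED_PREFIXES = ("descriptors/", "links/", "metadata/sequence_file/")


def paths_to_export(relative_paths, data_files_changed):
    """Return the staging-area paths that an update should write.

    If no data files changed, file-related metadata is skipped so the
    importer does not go looking for descriptors that were never filled in.
    """
    if data_files_changed:
        return list(relative_paths)
    return [
        path
        for path in relative_paths
        if not path.startswith(FILE_RELATED_PREFIXES)
    ]
```

With a rule like this, a metadata-only update would leave descriptors, links and sequence_file metadata untouched rather than requiring manual deletion afterwards.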

@idazucchi idazucchi added the DCP1.0 Label for datasets in DCP1.0 label Jun 29, 2022
@ESapenaVentura
Collaborator

gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/descriptors/ && gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/links/ && gsutil -m rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/metadata/sequence_file/
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/descriptors/sequence_file/0ec5de9f-ef36-40a6-89ff-2c4d6f55bc14_2019-10-09T15:31:16.771000Z.json#1645718832886154...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/descriptors/sequence_file/213d7c08-ee14-488b-9dcc-a3f7d5c942d4_2019-10-09T15:31:16.779000Z.json#1645718834283709...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/descriptors/sequence_file/bdeaabfd-d5ad-4f5b-9620-753115956c83_2019-10-09T15:31:16.763000Z.json#1645718915292756...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/descriptors/sequence_file/de0cfc78-6d03-4ad1-8e3a-6caaefdcb746_2019-10-09T15:31:16.755000Z.json#1645718914001830...
/ [4/4 objects] 100% Done
Operation completed over 4 objects.
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/links/4e2f3838-5282-442c-9a50-dd68fbfb2d1d_2019-10-09T15:32:19.580000Z_88ec040b-8705-4f77-8f41-f81e57632f7d.json#1645718838044689...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/links/71e1a800-327e-4749-a33e-1e088d9054d1_2019-10-09T15:32:19.574000Z_88ec040b-8705-4f77-8f41-f81e57632f7d.json#1645718919723340...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/metadata/sequence_file/0ec5de9f-ef36-40a6-89ff-2c4d6f55bc14_2019-10-09T15:31:16.771000Z.json#1645718831927915...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/metadata/sequence_file/213d7c08-ee14-488b-9dcc-a3f7d5c942d4_2019-10-09T15:31:16.779000Z.json#1645718833637895...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/metadata/sequence_file/bdeaabfd-d5ad-4f5b-9620-753115956c83_2019-10-09T15:31:16.763000Z.json#1645718914570348...
Removing gs://broad-dsp-monster-hca-prod-ebi-storage/prod/88ec040b-8705-4f77-8f41-f81e57632f7d/metadata/sequence_file/de0cfc78-6d03-4ad1-8e3a-6caaefdcb746_2019-10-09T15:31:16.755000Z.json#1645718912940592...
/ [4/4 objects] 100% Done
Operation completed over 4 objects.

@ESapenaVentura
Collaborator

Project is ready to be exported. I will fill out the import form
