Release data 26/03/2021 #8

ESapenaVentura · 2021-03-26T11:40:45Z

Versions updated:

system/links.json - v3.0.0
module/protocol/matrix.json - v1.0.0
type/protocol/analysis/analysis_protocol.json - v9.2.0
type/file/analysis_file.json - v6.3.0
core/file/file_core.json - v6.2.0
type/file/sequence_file.json - v9.3.0
type/file/supplementary_file.json - v2.3.0
type/file/reference_file.json - v3.3.0
type/file/image_file.json - v2.3.0
module/ontology/data_use_ontology.json - v1.0.0
type/project/project.json - v14.2.0

Issues addressed:

New subgraphs?:

tests/links/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json: Cell suspension --> Analysis files

hannes-ucsc

The path to the test data should not encode a date or time or any other version signifier. We want Git to do the version tracking for us. We can't release a new set of test data for every schema release. That would defeat the purpose of letting the Git history reflect how the metadata evolves. The schema evolution should be reflected in actual commit differences to existing files, not just in the accumulation of new metadata files, leaving in place the old files that comply with the old schema.

aaclan-ebi · 2021-03-31T12:57:44Z

> The path to the test data should not encode a date or time or any other version signifier. We want Git to do the version tracking for us. We can't release a new set of test data for every schema release. That would defeat the purpose of letting the Git history reflect how the metadata evolves. The schema evolution should be reflected in actual commit differences to existing files, not just in the accumulation of new metadata files, leaving in place the old files that comply with the old schema.

Hi @hannes-ucsc, do you mean the version timestamp in the filenames? All timestamp values are using 2021-01-01T00:00:00.000000Z. Instead of 0, we used a "zero" timestamp value which will never change. Is this a problem?

Also, there are two commits in the PR: the first contains the base test data, and the second contains the diff, which is what we want.

hannes-ucsc · 2021-04-06T00:18:31Z

Sorry, I made a mistake. Disregard my previous comment. I took the branch name for a directory.

hannes-ucsc

Regarding the two commits: It seems that each commit adds an h5ad file. Not sure we need to structure the history that way. We want to be able to correlate a diff in this repository to a diff in the schema repository. So if a schema PR renames a property in a schema, there should be a one- or two-line diff in the schema PR and as many one-line changes in the test data PR as there are instances of that schema in the test data repo.

Anyway, I think the two commits can be squashed simply because the test data project is made up exclusively of matrix files. There are no other sequence or analysis files. There is no initial commit that adds anything worth looking at.

The second commit adds a links.json but also renames the links.json from the first commit. Why does it do that? Shouldn't the second commit simply add a links.json and leave the other one as is? I worry that we have a non-determinism in the UUID generation for the links.json files.

ESapenaVentura · 2021-04-12T16:15:43Z

We have corrected the problem with the links.json (it was an issue with the process IDs being auto-generated each time) and re-deployed the test data to match the last changes we pushed to staging (rolling back the project version).
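
The difference between IDs that change on every run and IDs that stay fixed can be shown with a minimal sketch. It assumes the stable IDs are name-based (version 5) UUIDs derived from a fixed key. The namespace value and both helper names are hypothetical and are not taken from the actual post-processing script.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def random_process_uuid() -> str:
    """Random (version 4) UUID: a different value on every call and every run."""
    return str(uuid.uuid4())


def stable_process_uuid(key: str) -> str:
    """Name-based (version 5) UUID: the same key always yields the same UUID."""
    return str(uuid.uuid5(NAMESPACE, key))
```

With the second helper, regenerating the test data from the same inputs would reproduce the same file names, so a commit would only show genuine content changes rather than renames.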

About the files: I don't think I get the request. Comparing both commits shows the difference with the updated schemas (https://github.com/HumanCellAtlas/schema-test-data/compare/dd0553a98cd4d2eff216a8b9e567a04be85b3e75..7213226d250126a1ac327142663b7221d8506849). In the future, there will only be 1 commit that will compare against master and showcase the updates in the schema, but in this first iteration we also needed to add the baseline data.

Let me know if you have more concerns/queries!

hannes-ucsc · 2021-04-13T17:30:42Z

> About the files: I don't think I get the request. Comparing both commits shows the difference with the updated schemas

Generally speaking, you can't force push a PR branch and then use the NEW history to question/contradict a claim that I make about the OLD history.

I am writing this after I wrote

#8 (comment)

It looks like you already did exactly what I proposed?

hannes-ucsc · 2021-04-13T17:36:07Z

If the answer is yes, then Daniel and I can re-review. He's made good progress on the code that reads this staging area:

DataBiosphere/hca-metadata-api#42

ESapenaVentura · 2021-04-15T14:43:52Z

> Generally speaking, you can't force push a PR branch and then use the NEW history to question/contradict a claim that I make about the OLD history.
>
> I am writing this after I wrote
>
> #8 (comment)
>
> It looks like you already did exactly what I proposed?

Sorry about the misunderstanding, I meant the comment about the h5ad files!

About the other question, yes that's what we did!

We have fixed the newlines, but there was a small problem squashing the commits. Apologies, I take full responsibility for this one: I squashed Alegria's fix of the script into the commit with the regenerated test data. If you could please ignore the post_process.py file, everything else should be in place as requested so you can re-review the test data.

We have solved the problem with the squash so now only the updates are showing

Many thanks!

hannes-ucsc · 2021-04-15T17:14:03Z

> Sorry about the misunderstanding, I meant the comment about the h5ad files!

Which comment? Maybe post a link to it?

ESapenaVentura · 2021-04-19T10:08:45Z

> Which comment? Maybe post a link to it?

This comment #8 (review)

dsotirho-ucsc

The file tests/links/37e91a9f-b04e-5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json lists process_id for files that appear to be missing from schema-test-data/tests/metadata/process/
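
A check of this kind can be automated. The sketch below is only an illustration: it assumes that process metadata files are named `<uuid>_<version>.json`, that the staging area contains `metadata/process/` and `links/` directories, and that a links file stores process references under a `process_id` key somewhere in its JSON. The function names are hypothetical.

```python
import json
from pathlib import Path


def collect_process_ids(node) -> set:
    """Recursively gather every string stored under a 'process_id' key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "process_id" and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_process_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_process_ids(item)
    return found


def missing_processes(staging_root: Path) -> dict:
    """Map each links file name to the process IDs that have no metadata file."""
    process_dir = staging_root / "metadata" / "process"
    # Keep the part of each file name before the first underscore (the UUID).
    known = {p.name.split("_", 1)[0] for p in process_dir.glob("*.json")}
    report = {}
    for links_file in sorted((staging_root / "links").glob("*.json")):
        referenced = collect_process_ids(json.loads(links_file.read_text()))
        missing = referenced - known
        if missing:
            report[links_file.name] = missing
    return report
```

Calling `missing_processes(Path("tests"))` from the repository root would return an empty dictionary when every referenced process has a metadata file.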

hannes-ucsc · 2021-04-20T17:01:14Z

> This comment #8 (review)

That comment does not mention h5ad files.

ESapenaVentura · 2021-04-20T18:08:28Z

> That comment does not mention h5ad files.

It does, in the very first line

aaclan-ebi · 2021-04-26T16:33:07Z

@danielsotirhos I am not sure why there were missing metadata files and incorrect links.json filenames. But I've just re-run the post-processor against the original test data and I believe that corrected the issue.

ESapenaVentura · 2021-04-28T15:53:40Z

@danielsotirhos As Alegria pointed out, she has added the missing files and we have corrected the regex.

It should be fine now! Please let us know if you find any further issues.

hannes-ucsc

#8 (comment) wasn't addressed. The UUID of two processes is reused as that of links files:

$ find . | grep -E '4da04038-adab-59a9-b6c4-3a61242cc972|d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7'
./tests/metadata/process/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z.json
./tests/metadata/process/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z.json
./tests/links/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
./tests/links/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json

Each entity must have a universally unique ID (hence the name UUID), even if they are of different types.
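
The same overlap can be found programmatically. The following is a rough sketch rather than part of any existing tooling: it assumes that every JSON file name under `metadata/` and `links/` begins with the entity's UUID, and both function names are hypothetical. It only reports overlaps; whether a given overlap is a defect is for the reviewers to decide.

```python
import re
from pathlib import Path

# Matches a lowercase UUID at the start of a file name.
UUID_PREFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def leading_uuids(directory: Path) -> set:
    """Collect the UUID that begins each JSON file name below `directory`."""
    found = set()
    for path in directory.rglob("*.json"):
        match = UUID_PREFIX.match(path.name)
        if match:
            found.add(match.group(0))
    return found


def shared_uuids(staging_root: Path) -> list:
    """UUIDs used both by a file under metadata/ and by a file under links/."""
    metadata = leading_uuids(staging_root / "metadata")
    links = leading_uuids(staging_root / "links")
    return sorted(metadata & links)
```

Descriptor files are deliberately left out of the comparison, on the assumption that a descriptor shares the UUID of the file entity it describes.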

hannes-ucsc · 2021-04-29T14:37:48Z

> It does, in the very first line

Ahh, I stand corrected. That comment was more about the history than the actual files, hence my confusion with your reference to my comment as being about h5ad files. But, technically you are correct.

aaclan-ebi · 2021-04-29T15:12:14Z

> #8 (comment) wasn't addressed. The UUID of two processes is reused as that of links files:
>
> $ find . | grep -E '4da04038-adab-59a9-b6c4-3a61242cc972|d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7'
> ./tests/metadata/process/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z.json
> ./tests/metadata/process/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z.json
> ./tests/links/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
> ./tests/links/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
>
> Each entity must have a universally unique ID (hence the name UUID), even if they are of different types.

Replied here #8 (comment)

ruchim

Haven't done a deep dive, but approving for completeness.

hannes-ucsc

~~@danielsotirhos could you also please verify whether the reused~~

aherbst-broad

Looks good.

ESapenaVentura added the Test data update PR related to test data generation label Mar 26, 2021

ESapenaVentura self-assigned this Mar 26, 2021

ESapenaVentura requested review from aaclan-ebi and Wkt8 March 26, 2021 11:47

ESapenaVentura mentioned this pull request Mar 26, 2021

Release from staging to master - 2021-03-17 HumanCellAtlas/metadata-schema#1371

Merged

hannes-ucsc requested changes Mar 30, 2021

hannes-ucsc requested changes Apr 6, 2021

aaclan-ebi mentioned this pull request Apr 7, 2021

post process links.json to always have same process uuids ebi-ait/dcp-ingest-central#273

Closed

ESapenaVentura force-pushed the release-data-26/03/2021 branch from 491ab2f to dd0553a Compare April 12, 2021 15:48

aaclan-ebi mentioned this pull request Apr 12, 2021

Added newline when dumping json files to conform to POSIX line #10

Merged

ESapenaVentura force-pushed the release-data-26/03/2021 branch from 7213226 to 803d0c4 Compare April 15, 2021 14:07

Added baseline test data

a5d0eab

ESapenaVentura force-pushed the release-data-26/03/2021 branch 2 times, most recently from e3f0355 to bd3a9d2 Compare April 15, 2021 15:31

Updated test data with latest schemas

883e8c1

ESapenaVentura force-pushed the release-data-26/03/2021 branch from bd3a9d2 to 883e8c1 Compare April 15, 2021 16:07

dsotirho-ucsc reviewed Apr 15, 2021

dsotirho-ucsc mentioned this pull request Apr 16, 2021

Add ability to use canned staging areas for unit tests (#38, #39) DataBiosphere/hca-metadata-api#42

Merged

dsotirho-ucsc reviewed Apr 16, 2021

dsotirho-ucsc requested changes Apr 19, 2021

hannes-ucsc requested changes Apr 20, 2021

Reran preprocessor against original test data

2c7fbd0

ESapenaVentura mentioned this pull request Apr 27, 2021

Fixes regex issue in CGMs test data HumanCellAtlas/metadata-schema#1384

Merged

hannes-ucsc requested changes Apr 29, 2021

hannes-ucsc mentioned this pull request Apr 29, 2021

Document that process IDs are reused for subgraphs HumanCellAtlas/dcp2#20

Open

ruchim approved these changes Apr 29, 2021

hannes-ucsc requested a review from kbergin April 29, 2021 17:29

clairerye requested review from aherbst-broad and NoopDog April 29, 2021 19:47

hannes-ucsc self-requested a review April 29, 2021 20:08

hannes-ucsc approved these changes Apr 29, 2021

hannes-ucsc requested a review from dsotirho-ucsc April 29, 2021 20:08

dsotirho-ucsc approved these changes Apr 29, 2021

aherbst-broad approved these changes Apr 30, 2021

NoopDog approved these changes Apr 30, 2021

ESapenaVentura merged commit de355ca into master May 5, 2021

ESapenaVentura deleted the release-data-26/03/2021 branch May 5, 2021 13:46

hannes-ucsc mentioned this pull request May 21, 2021

Evaluate and respond to proposed metadata schema changes DataBiosphere/azul#2821

Closed
