-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release data 26/03/2021 #8
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path to the test data should not encode a date or time or any other version signifier. We want Git to do the version tracking for us. We can't release a new set of test data for every schema release. That would defeat the purpose of letting the Git history reflect how the metadata evolves. The schema evolution should be reflected in actual commit differences to existing files, not just in the accumulation of new metadata files, leaving in place the old files that comply with the old schema.
Hi @hannes-ucsc, do you mean the version timestamp in the filenames? All timestamp values are using Also, there are 2 commits in the PR, 1st commit contains the base test data, the 2nd commit contains the diff which is what we want. |
Sorry, I made a mistake. Disregard the my previous comment. I took the branch name for a directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the two commits: It seems that each commit adds a h5ad file. Not sure we need to structure the history that way. We want to be able to correlate a diff in this repository to a diff in the schema repository. So if a schema PR renames an property in a schema, there should be a one or two line diff in the schema PR and as many one line changes in the test data PR as there are instances of that schema in the test data repo.
Anyways, I think the two commits can be squashed simply because the test data project is made up exclusively of matrix files. There are no other sequence of analysis files. There is no initial commit that adds anything worth looking at.
The second commit adds a links.json
but also renames the links.json
from the first commit. Why does it do that? Shouldn't the second commit simply add a links.json
and leave the other one as is? I worry that we have a non-determinism in the UUID generation for the links.json
files.
...criptors/analysis_file/cf5d9300-3b81-52eb-a02f-25fb1364419e_2021-01-01T00:00:00.000000Z.json
Outdated
Show resolved
Hide resolved
491ab2f
to
dd0553a
Compare
We have corrected the problem with the links.json (It was an issue with the processes IDs being auto-generated each time) and re-deployed the test data to match the last changes we pushed to staging (rolling back the project version)\ About the files: I don't think I get the request. Comparing both commits shows the difference with the updated schemas (https://github.com/HumanCellAtlas/schema-test-data/compare/dd0553a98cd4d2eff216a8b9e567a04be85b3e75..7213226d250126a1ac327142663b7221d8506849). In the future, there will only be 1 commit that will compare against master and showcase the updates in the schema, but in this first iteration we also needed to add the baseline data. Let me know if you have more concerns/queries! |
Generally speaking, you can't force push a PR branch and then use the NEW history to question/contradict a claim that I make about the OLD history. I am writing this after I wrote It looks like you already did exactly what I proposed? |
If the answer is yes, then Daniel and I can re-review. He's made good progress on the code that reads this staging area: |
7213226
to
803d0c4
Compare
Sorry about the missunderstanding, I meant the comment about the h5ad files! About the other question, yes that's what we did!
We have solved the problem with the squash so now only the updates are showing Many thanks! |
e3f0355
to
bd3a9d2
Compare
bd3a9d2
to
883e8c1
Compare
Which comment? Maybe post a link to it? |
...criptors/analysis_file/f0c8ac7c-1c4a-5c9c-9286-c8b16208f9ae_2021-01-01T00:00:00.000000Z.json
Show resolved
Hide resolved
...5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Outdated
Show resolved
Hide resolved
This comment #8 (review) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file tests/links/37e91a9f-b04e-5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
lists process_id
for files that appear to be missing from schema-test-data/tests/metadata/process/
...5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Outdated
Show resolved
Hide resolved
...5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Outdated
Show resolved
Hide resolved
...5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Outdated
Show resolved
Hide resolved
...criptors/analysis_file/f0c8ac7c-1c4a-5c9c-9286-c8b16208f9ae_2021-01-01T00:00:00.000000Z.json
Show resolved
Hide resolved
...5313-bbc0-4c394406247e_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Outdated
Show resolved
Hide resolved
That comment does not mention h5ad files. |
It does, in the very first line |
@danielsotirhos I am not sure why there were missing metadata files and incorrect links.json filenames. But I've just re-run the post-processor against the original test data and I believe that corrected the issue. |
@danielsotirhos as alegria pointed out, she has added the missing files and we have corrected the regex. It should be fine now! Please let us know if you find any further issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't addressed. The UUID of two processes is reused as that of links files:
$ find . | grep -E '4da04038-adab-59a9-b6c4-3a61242cc972|d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7'
./tests/metadata/process/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z.json
./tests/metadata/process/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z.json
./tests/links/4da04038-adab-59a9-b6c4-3a61242cc972_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
./tests/links/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
Each entity must have a universally unique ID (hence the name UUID), even if they are of different types.
...criptors/analysis_file/f0c8ac7c-1c4a-5c9c-9286-c8b16208f9ae_2021-01-01T00:00:00.000000Z.json
Show resolved
Hide resolved
Ahh, I stand corrected. That comment was more about the history than the actual files, hence my confusion with your reference to my comment as being about h5ad files. But, technically you are correct. |
Replied here #8 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't done a deep dive, but approving for fullness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielsotirhos could you also please verify whether the reused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Versions updated:
system/links.json - v3.0.0
module/protocol/matrix.json - v1.0.0
type/protocol/analysis/analysis_protocol.json - v9.2.0
type/file/analysis_file.json - v6.3.0
core/file/file_core.json - v6.2.0
type/file/sequence_file.json - v9.3.0
type/file/supplementary_file.json - v2.3.0
type/file/reference_file.json - v3.3.0
type/file/image_file.json - v2.3.0
module/ontology/data_use_ontology.json - v1.0.0
type/project/project.json - v14.2.0
Issues addressed:
protocol_type
in links schema should be an enum metadata-schema#1333New subgraphs?:
tests/links/d7b8cbff-aee9-5a05-a4a1-d8f4e720aee7_2021-01-01T00:00:00.000000Z_90bf705c-d891-5ce2-aa54-094488b445c6.json
: Cell suspension --> Analysis files