Associate model with data version #11

Open
jcohenadad opened this issue Apr 23, 2021 · 1 comment

Comments

@jcohenadad
Member

context

Once #9 is merged, the created model should reference the git-annex data and state its version.

how to do it?

A git commit? A tag? Any other ideas on how we can do this, @kousu?

Once we have a plan, @alexfoias, can you please implement it? Thanks.

@kousu

kousu commented Apr 25, 2021

I think you can just paste https://github.com/spine-generic/data-multi-subject/tree/r20201130 in a comment or README somewhere.

Unfortunately I don't know a good universal URL scheme for git repos with versions pinned. git accepts git clone https://whatever.com/repo.git, git clone git+ssh://whatever.com/repo, git clone git@whatever.com:repo.git, git clone git+https://whatever.com/repo, but to specify a version you need to use -b, and you can only give branches or tags.

To pin a more specific version, you have to use submodules, which is what datalad recommends. But my 3am flippant summary is that submodules are like everything confusing about Git's UI multiplied by 7. And anyway they're kind of an awkward fit here.

Python extended the URL formats to git+https://whatever.com/repo.git@version, where version can be a branch, tag, or arbitrary commit ID (and it looks like Unity copied them too), but that only works under pip.
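To make the pip-style format concrete, here is a rough sketch that splits such a URL into a plain clone URL and the pinned version. The function name `split_pinned_url` is made up for illustration, and the parsing is deliberately simplified (it ignores pip extras such as `#egg=` fragments), so treat it as an assumption-laden example rather than how pip itself parses these.

```python
def split_pinned_url(pinned):
    """Split 'git+<scheme>://host/path@version' into (clone_url, version).

    Hypothetical helper: a simplified reading of the pip-style format,
    not pip's actual parser. Returns version=None when nothing is pinned.
    """
    if not pinned.startswith("git+"):
        raise ValueError("expected a URL starting with 'git+'")
    url = pinned[len("git+"):]

    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("expected '<scheme>://' in the URL")

    # Only look for '@version' in the path, so that a 'user@host'
    # prefix (as in ssh URLs) is not mistaken for a version.
    host, slash, path = rest.partition("/")
    path, at, version = path.rpartition("@")
    if not at:
        # No '@' in the path: rpartition put the whole path in 'version'.
        path, version = version, None

    return f"{scheme}://{host}{slash}{path}", version


clone_url, version = split_pinned_url("git+https://whatever.com/repo.git@r20201130")
print(clone_url, version)
```

The two pieces could then be handed to git separately, for example as the URL and the -b argument of git clone when the version is a branch or tag.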

What about this: write a training script whose first step is git clone -b $PINNED_VERSION git+ssh://data.neuro.polymtl.ca/datasets/model_seg_exvivo_gm-wm_t2_unet2d-multichannel-softseg, and include that script as part of the model. If/when you update the model, first update the training script and commit that change, then run it and commit its results. It won't be reproducible by anyone outside the lab, but at least it will be, you know, written down. I'd also suggest writing the training script to call the script utility or tee to keep a log of the most recent model training, and committing that file along with it. This is a pretty similar workflow to what I did over in https://github.com/neuropoly/spinalcordtoolbox/blob/5c117ac349eef90528ee7be0edf42c21e31645f2/dev/docs/testimonials2rst come to think of it, except that script isn't smart enough to get its own source material.
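A minimal sketch of what the first step of such a training script might look like, written in Python rather than shell. Everything named here is an assumption for illustration: `PINNED_VERSION` holds a placeholder tag, `clone_command` and `run_and_log` are hypothetical helpers, and the log file name is arbitrary. `run_and_log` stands in for tee by copying each output line to both the terminal and a log file.

```python
import shlex
import subprocess
import sys

# URL taken from the comment above; the version is a placeholder, not a real tag.
DATA_URL = "git+ssh://data.neuro.polymtl.ca/datasets/model_seg_exvivo_gm-wm_t2_unet2d-multichannel-softseg"
PINNED_VERSION = "rYYYYMMDD"  # hypothetical: replace with the branch or tag to pin


def clone_command(url, version, dest):
    """Build the 'git clone -b <version> <url> <dest>' argument list."""
    return ["git", "clone", "-b", version, url, dest]


def run_and_log(cmd, log_path):
    """Run cmd, echoing its output to stdout and appending it to log_path.

    A tee-like helper: returns the command's exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("$ " + shlex.join(cmd) + "\n")
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep errors in the same log
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()
```

The training script would then start with something like run_and_log(clone_command(DATA_URL, PINNED_VERSION, "data"), "training.log") and stop if the return value is nonzero. Committing both the script and training.log alongside the model is what writes the data version down.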
