use_data_repository #15
Great Idea!
Because this way we would not need any additional dependencies on the local computer, but we would then depend on Docker Hub to process the files for us?
Indeed! You can log in to Zenodo using GitHub; an access token is easily generated under "Applications". @MartinHinz, why use the file from Docker Hub? When the Docker container is successfully built on Travis, one can use that build directly. Or did I misunderstand some part of the workflow? This way one does not need to store further environment variables on Travis.
Right, damn, correct. So it is accessed from Travis, not from Docker Hub. My mistake.
Here is a nice blog post on Zenodo and GitHub [though without the API]: http://computationalproteomic.blogspot.de/2014/08/making-your-code-citable.html Basically, there is more to archiving than just uploading, since we want a DOI etc. Perhaps one should put detailed instructions on how to connect to Zenodo somewhere, and create a function that builds a container ready for upload to Zenodo? By the way, the rOpenSci package causes an error when trying to create a repo [my Zenodo token is chosen by default]:
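A function like the one proposed ("a container ready for upload to Zenodo") could start as small as a zip step. A minimal sketch, where the function name `zip_compendium` and the output file name are our own inventions, not anything that exists yet:

```r
# Hypothetical sketch: bundle a research compendium into a single zip
# archive ready for manual upload to Zenodo. The function and file names
# are assumptions; utils::zip() needs a `zip` binary on the PATH.
zip_compendium <- function(pkg = ".", out = "compendium.zip") {
  files <- list.files(pkg, recursive = TRUE, all.files = TRUE)
  files <- files[!grepl("^\\.git/", files)]  # leave out version-control internals
  utils::zip(zipfile = out, files = files)
  invisible(out)
}
```

The metadata (title, authors, license) that Zenodo needs for a DOI would still have to be entered in the web form or sent via the API; this only covers the packaging step.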
Ignore my last post; I'm still not fully familiar with the Docker concept, I suppose.
I was imagining this being an infrequent, deliberate action, not part of the continuous integration cycle. For example, when you submit your article for peer review, you … Then, after peer review and your paper is accepted 🎉, you …
From what I read, Zenodo archiving something from GitHub is tied to making a release. And once the connection exists, Zenodo then makes a snapshot of every released version. So is this the way for us to go?
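If we do lean on that release trigger, the action from R reduces to tagging and pushing. A rough sketch, shelling out to git (the tag name is only an example):

```r
# Once the Zenodo <-> GitHub hook is enabled for a repo, a tagged GitHub
# release is what triggers the archival snapshot. The tag name below is
# just an example.
system2("git", c("tag", "-a", "v1.0-submission",
                 "-m", "State at journal submission"))
system2("git", c("push", "origin", "v1.0-submission"))
# Note: a release still has to be created from the tag on GitHub
# (web UI or API) for Zenodo's webhook to fire; pushing the bare tag
# alone is not enough.
```

This keeps the deliberate, manual character discussed above: nothing happens on an ordinary push, only when someone explicitly cuts a release.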
I think we can do it directly from our console to Zenodo. But it looks like the zenodo package does not actually have any functions we can use yet (karthik/zenodo#14). So let's put this on hold until that pkg gets a bit more love.
Just a minor comment, independent of the actual implementation: in my opinion, functions like …
Yes, that is an excellent suggestion, I agree. I guess that @karthik has something like that already in mind for zenodo::zen_file_publish.
Wouldn't the most convenient way in our case be a two-step process, with …
Yes, that could work. It seems more natural to me to connect directly from the local repo on my computer to Zenodo, without relying on GitHub in the middle. That would be simpler and more flexible, to me at least. But let's see what direction the zenodo pkg takes as it develops further.
I'll give you that. The whole thing is centered around GitHub, so it seemed natural to me to use the existing Zenodo <-> GitHub link to make it happen. But thinking about it, at least you could use … So surely you are right, and we should not make this dependent on a GitHub repo being in existence.
And the plan is to take advantage of all of that. The most recent project I worked on with Kirill Muller is travis and tic (both of which are available as beta releases on ropenscilabs). In short, you can set up a recipe for Zenodo (leveraging the zenodo package) to create a release for software and data at whatever interval you like (with versioning support). Once you've set it up and authorized a token, it should just run for any project.
Thanks Karthik! Are there any of these recipes around for us to take a look at? We should also consider here: …
Hi @benmarwick! A few recipes to consider:

- A tic file for automatic pkgdown docs: https://github.com/krlmlr/tic.package
- Automatically deploying to drat: https://github.com/krlmlr/tic.drat
- An R Markdown site: https://github.com/krlmlr/tic.website
- An automatic bookdown book: …

Great idea to add figshare and dataverse. figshare might be a challenge because they have never prioritized their API and are focused solely on enterprise customers. But we can try.
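For orientation, the recipes linked above all share the same shape in a `tic.R` file. A sketch based on those examples, using the stage/step DSL from the tic package (the deploy steps shown are the ones those repos use; a Zenodo step would still have to be written):

```r
# Shape of a tic.R recipe, following the linked krlmlr examples.
# get_stage()/add_step() come from the tic package; step_build_pkgdown()
# and step_push_deploy() are the steps tic.package uses. A hypothetical
# step_zenodo_release() does not exist yet and would need to be written.
library(tic)

get_stage("deploy") %>%
  add_step(step_build_pkgdown()) %>%
  add_step(step_push_deploy())
```

The point of the DSL is that the CI config stays generic; all project-specific behaviour lives in this one R file.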
Thanks again, those are useful to see. Currently I think we want to be pushing our compendium to a data repo independently of our actions on GitHub and Travis. As @nevrome notes above, making a deposit to a data repo should be a deliberate, infrequent action in the life of a project, so we want it to be separate from the push-to-github-trigger-travis process. I'm imagining this just happens 1-3 times in the life of a project. My guess is that we could have something like a … What does everyone think?
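One way such a deliberate, infrequent deposit step could look is to go straight at Zenodo's REST API with httr, skipping GitHub entirely. The function name and wiring below are assumptions sketched for discussion; the endpoints are the ones documented in Zenodo's developer docs:

```r
library(httr)  # assumed available

# Hypothetical sketch of a one-shot deposit step against Zenodo's REST
# API. ZENODO_TOKEN is assumed to be set in the environment; the
# function name is our own invention.
deposit_to_zenodo <- function(zipfile) {
  token <- Sys.getenv("ZENODO_TOKEN")
  # 1. create an empty deposition
  dep <- POST("https://zenodo.org/api/deposit/depositions",
              query = list(access_token = token),
              body = "{}", content_type_json())
  id <- content(dep)$id
  # 2. attach the compendium archive
  POST(sprintf("https://zenodo.org/api/deposit/depositions/%s/files", id),
       query = list(access_token = token),
       body = list(file = upload_file(zipfile)))
  # 3. publishing would mint the DOI; deliberately left out here, so the
  # final, irreversible step stays a manual decision
  invisible(id)
}
```

Run 1-3 times in the life of a project, this matches the deliberate workflow described above while staying independent of GitHub and Travis.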
I was just reminded by @steko about https://frictionlessdata.io/ and the pkgs https://github.com/ropenscilabs/datapkg and https://github.com/christophergandrud/dpmr These look neat, though I've not seen them in use in the wild anywhere, and their download stats are modest. Has anyone else come across these in the research literature? Worth mentioning in the readme?
Never seen this. But it seems to be a thing. So why not?
Bookmarking this rOpenSci discussion that just appeared: ropensci-archive/doidata#1 Seems like they might be about to develop a pkg that will answer many of our needs here. Some discussion on Twitter at https://twitter.com/noamross/status/948340525492555776 Hopefully that pkg will contain a function to deposit data and obtain a DOI (for a variety of repositories), although I guess that task might be much more complex than getting data using a DOI.
@benmarwick Right now we're just thinking about downloading the data given a DOI.
Has there been progress on this front? What are your current recommendations for getting a DOI associated with the state of a compendium when the corresponding manuscript was submitted/revised/published? Thanks!
We haven't seen any recent developments that have made automating this step simpler or more obvious to implement. There is so much variation in current practice that it's hard to know what defaults make the most sense.

My current recommendation is to use a hook provided by the data repository service (e.g. Zenodo, OSF and Figshare have this) to connect to the GitHub repo with the compendium, and then snapshot a version of the GH repo on the data repo at key points (in OSF this is called 'registering', or freezing a version of the repo). I usually snapshot the repo at the point of submission to the journal, and get the DOI of the repo to include in the text of the manuscript. Then I snapshot it again after peer review, and again after final acceptance. The DOI stays the same throughout this process, on OSF at least, and any user can see that the data repository has multiple versions and can browse them easily. The repo versions can be tagged with keywords to indicate which part of the process they relate to.

This all happens outside of R. And for me, at least, it's something I do infrequently, just a few times per year, so it's not urgent for me to automate or highly streamline these steps at the moment. But I'm keen to know more about how others imagine these steps could be incorporated into a function!
@benmarwick Thank you so much for your detailed explanations. I agree that this process is probably something that can/should be done deliberately and "manually".
Yes, for now manual handling seems like the best option for this step, at least as far as I can see. I'm curious to see what might pop up in the future to change my mind!
@benmarwick Sorry to follow up so late, but I just tested out registration/freezing of an OSF project with an associated GitHub repository. From what I can see, the DOIs of different registrations are not the same. Instead, the project has a fixed DOI and each registration has a different one. I understood you to mean that you publish the DOI of the first registration that you create. Did you mean that you share the project's DOI (which stays the same), and people can then navigate to the "Registrations" tab and see the different registrations/snapshots that exist for the project? Thanks!
Let's include the approaches discussed here in an informational final step in the readme to suggest how the user can archive their compendium on a data repo of their choosing, cf. #56
There might be a place for a `use_zenodo` function: https://github.com/ropensci/zenodo/blob/master/README.md