example which should use only MRAN #8

Merged: 3 commits merged on Nov 19, 2016


cboettig (Contributor)

Don't feel compelled to merge; just an example of the fix I mentioned in #7.

Also, I switched from the git clone command to Docker's COPY, which builds the Docker image from the local sources instead of the GitHub repo (e.g. the CircleCI build should now pass before merging, unless I messed something else up).

Not sure if you have a preference one way or the other. Having git clone in a RUN command is normally nice if you like Docker's ability to cache build layers: a RUN git clone line is reused from the cache as long as the command text is unchanged, whereas COPY . invalidates the cache for its layer and every layer after it whenever any copied file changes (which, for a repo, means nearly every commit). If you aren't building locally it probably doesn't matter. Also, I deleted .dockerignore, since you actually want the .git directory available for those git2r commands at the end (this wasn't a problem with git clone, since that gave you .git anyhow).
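A minimal sketch of guarding against that .dockerignore issue before a COPY-based build, assuming a bash-compatible shell. The helper name dockerignore_excludes_git is hypothetical, and it only recognizes a literal .git or .git/ line, not the full .dockerignore pattern syntax.

```shell
# Hypothetical helper: succeed (exit 0) if the .dockerignore in the given
# build-context directory has a plain ".git" or ".git/" entry, which would
# keep the .git directory out of a "COPY . /mjbtramp" step.
# Only literal entries are checked; wildcard patterns are NOT interpreted.
dockerignore_excludes_git() {
  local context_dir="${1:-.}"
  local ignore_file="$context_dir/.dockerignore"
  # No .dockerignore at all means nothing is excluded.
  [ -f "$ignore_file" ] || return 1
  # -x: the whole line must match, so ".github" does not count as ".git".
  grep -qxE '[[:space:]]*/?\.git/?[[:space:]]*' "$ignore_file"
}
```

For example, a CI script could run if dockerignore_excludes_git .; then echo '.git is excluded from the build context' >&2; exit 1; fi before calling docker build.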

Also, I switched from tidyverse to verse, since it already includes bookdown and friends and the image will therefore build faster; feel free to switch back to tidyverse though. Either way, it should now pull all R packages from MRAN.

benmarwick merged commit 0263f62 into benmarwick:master on Nov 19, 2016
benmarwick (Owner)

Thanks again, I'm really grateful for your help with these experiments, and happy to try anything that makes this process simpler! Looks like the COPY method hit a complication: in https://circleci.com/gh/benmarwick/mjbtramp/154 I see Error: Could not find package root. That stumped me; have you seen that before?

I've gone back to git clone to explore further how to get all the dependencies from MRAN, and even with the repos option set to $MRAN, I still see some packages coming from CRAN, which is a bit puzzling. I've posted a question to the Microsoft forum, but I'm not sure anyone on the MRAN project is watching it, so I've also posted on their checkpoint GitHub repo.

cboettig (Contributor, Author)

The error Could not find package root is very probably the Dockerfile calling devtools::install('.') from outside the package directory. You'll note that in my COPY example I dropped the cd mjbtramp and changed the install path to devtools::install('/mjbtramp'); if only one of those two changes made it in and not the other, you would get that error.

(It's hard to be sure, because the error isn't on the most recent commit and circle.yml pulls the Dockerfile from the master branch on GitHub, so I don't know which Dockerfile that Circle build used.) It would be a bit easier to debug if we use COPY and edit circle.yml to build from the local Dockerfile rather than pulling from git (at least I think it would).
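Since devtools reports Could not find package root when it cannot locate a DESCRIPTION file at or above the path it is given, a quick shell sanity check inside the image can show whether the sources actually landed where devtools::install() will look. This is a rough approximation of that upward search, not devtools' own code, and find_pkg_root is a hypothetical name.

```shell
# Hypothetical helper: print the nearest directory at or above $1 that
# contains a DESCRIPTION file, roughly like the upward search devtools
# performs before giving up with "Could not find package root".
# Returns 1 (printing nothing) if the filesystem root is reached first.
find_pkg_root() {
  local dir
  dir="$(cd "${1:-.}" && pwd)" || return 1
  while [ ! -f "$dir/DESCRIPTION" ]; do
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
  printf '%s\n' "$dir"
}
```

With the COPY . /mjbtramp setup, running find_pkg_root /mjbtramp inside the image should print /mjbtramp; a failure there would point to the package sources never having been copied into the image.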

cboettig (Contributor, Author)

Ah, re: packages installing from CRAN, you need to do:

&& R -e "options(repos='$MRAN'); devtools::install('mjbtramp', dep=TRUE)" 

(i.e. everything in a single R -e call), or otherwise you need to echo the options(repos='$MRAN') into an .Rprofile. You have it split across two R -e "..." lines, and each one starts a separate R process, so options set in the first do not persist into the second.

Note that once you set repos with options you don't need to also set it in the install command.

Basically I think this is a devtools bug for not passing the repo arg down, but arguably setting this via options is better anyway.
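As a sketch of the .Rprofile route, assuming a bash-compatible shell: the helper below appends the repos option to a profile file so that later R -e calls, each a fresh R process, pick it up at startup. R reads an .Rprofile from the working directory, or failing that ~/.Rprofile, unless started with --vanilla. The helper name and the CRAN = label on the repo are choices made for this example.

```shell
# Hypothetical helper: append an options(repos = ...) line to an R profile
# file so every later R process that reads this profile installs from the
# given snapshot URL instead of CRAN. Defaults to ~/.Rprofile.
pin_repos_in_rprofile() {
  local repo_url="$1"
  local profile="${2:-$HOME/.Rprofile}"
  [ -n "$repo_url" ] || { echo "usage: pin_repos_in_rprofile URL [PROFILE]" >&2; return 2; }
  # The URL is passed as an argument, not as part of the printf format.
  printf "options(repos = c(CRAN = '%s'))\n" "$repo_url" >> "$profile"
}
```

After pin_repos_in_rprofile "$MRAN", a separate R -e "devtools::install('mjbtramp', dep=TRUE)" should resolve dependencies against the snapshot rather than CRAN.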

benmarwick (Owner)

Thanks, I've switched back to using COPY and options(repos='$MRAN'); I'm happy with those choices now.

It seems like the COPY . /mjbtramp command is only copying the Dockerfile, not the whole repo.

I added an ls command, and the output in https://circleci.com/gh/benmarwick/mjbtramp/173 shows just the Dockerfile.

Where is the rest of the github repo?

benmarwick (Owner), commented Nov 21, 2016

I've added you as a collaborator in case you want to try some quick edits to the Dockerfile.

cboettig (Contributor, Author)

Ah, I bet that's because you are giving the docker build command the address of the remote Dockerfile. Recall that docker build takes a directory (the build context) as its argument. Change it to docker build -t benmarwick/mjbtramp . since the . indicates that the Dockerfile and its context (i.e. the rest of the repo) are in the current directory.

cboettig (Contributor, Author)

I think the COPY method should work with the above fix, but I'm not sure it's truly any better than git clone in the Dockerfile. One last possible problem with COPY: I think CircleCI does only a shallow clone of the repo, so I'm not sure whether that will cause your git2r commands to fail...
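If the shallow clone does turn out to trip up git2r, one hedged option is to detect it on CI and fetch the full history before docker build. This sketch assumes a regular checkout where .git is a directory (not a worktree or submodule, where .git is a file), and is_shallow_clone is a hypothetical name. git fetch --unshallow is the standard Git command for converting a shallow clone into a full one.

```shell
# Hypothetical helper: succeed if the Git checkout in the given directory
# is a shallow clone. Git records the cut-off commits of a shallow clone
# in a "shallow" file inside .git, so its presence is used as the signal.
# (Newer Git can also answer this via: git rev-parse --is-shallow-repository)
is_shallow_clone() {
  local repo_dir="${1:-.}"
  [ -f "$repo_dir/.git/shallow" ]
}
```

For example: if is_shallow_clone .; then git fetch --unshallow; fi ahead of the build step.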

benmarwick (Owner)

Thanks, yes, that seems to have fixed it on CircleCI, using docker build -t benmarwick/mjbtramp . And all the packages are coming from MRAN now too.

This is making me think more seriously about taking a break from packrat and using MRAN/checkpoint as my precaution against breaking updates from packages.

Thanks again for taking a look here!

cboettig (Contributor, Author)

Thanks for being willing to experiment, as always!

I do kinda wonder whether the version-tagged images, e.g. rocker/verse:3.3.2 as opposed to rocker/verse (which implicitly means rocker/verse:latest), should pin a fixed MRAN snapshot as the default repo. @hadley talked me out of this earlier, pointing out that users should reasonably expect something like install.packages() to install the most recent version of a package; but that was in a more general discussion of the rocker/tidyverse image. The latest-tagged images already install from the latest snapshot at build time, rather than from a fixed date. A user who bothers to include a tag (particularly one that is no longer current, say rocker/verse:3.1.0) might more reasonably expect to get older R packages as well, since the latest CRAN versions may not even be compatible with older R versions.

Having MRAN preset in the Dockerfile would avoid the gotcha of running devtools::install(..., repo='$MRAN') and seeing some packages still come from CRAN, and would simplify a Dockerfile like yours a bit further.

The expected workflow might then be to do most regular computing on the latest images (when one is using Docker at all, that is), but when writing stuff for archiving, test the software on the most recent versioned image, adding newer stuff explicitly (perhaps also from MRAN at a later date, or from GitHub when necessary, ideally at a specific commit hash). Of course, there may be a better release interval for this process than the R version; e.g. in principle one could automatically tag the Dockerfile version by month or even by day.

Thoughts @benmarwick @hadley?

benmarwick (Owner)

Yes, I think that's a good idea. Having latest grab the current package versions makes sense. But if I specified the Docker container as rocker/verse:3.1.0, I'd expect packages installed into that container to come from a CRAN snapshot from the last day R v3.1.0 was current. I wouldn't want the default install to grab the most recent versions of packages, since they might not work on v3.1.0. Is it possible to set this for GitHub-only packages too?

Setting a specific date would be very handy too. For example, if I start a new @rstudio project, it would be great to be able to get the creation date of the .Rproj file and use it to set the R version and package versions for that project, with options to update that date along the way.

These options would certainly save me a bit of fussing around. But perhaps I'm out on the long tail of R users, with my RStudio projects having half-lives of well over a year (so typically spanning 3-4 minor releases of R, and sometimes major changes in packages)?

cboettig (Contributor, Author)

Thanks, this is very helpful! Come to think of it, I'm not sure I've ever published a paper with a non-trivial amount of code within the span of a single R release myself... That's an important added wrinkle.

For a faster writer whose paper spans only 1-2 minor releases of R, it might be feasible to just lock that project to a particular R version and date and stick with those versions for the duration of the project. For GitHub dependencies, one would have to do something like:

installGithub.r -u FALSE hadley/xml2@86f89395c0e4dc48e854b8aa8f21fec2a6746f4a

(The -u FALSE turns off install_github()'s default behavior of upgrading all dependencies, though if the repo were preset to MRAN in options, that wouldn't be a problem.)
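Pinning to a specific hash only helps if every GitHub dependency really is pinned. A minimal sketch of enforcing that, assuming littler's installGithub.r is on the PATH (as in the command above): both function names are hypothetical, and the check accepts only a full 40-character lowercase SHA, so short hashes, branches, and tags are rejected.

```shell
# Hypothetical helper: succeed only if a GitHub spec of the form
# user/repo@ref pins ref to a full 40-character commit SHA, e.g.
# hadley/xml2@86f89395c0e4dc48e854b8aa8f21fec2a6746f4a, rather than a
# branch or tag name that can move over time.
is_pinned_github_ref() {
  printf '%s\n' "$1" | grep -qE '^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+@[0-9a-f]{40}$'
}

# Hypothetical wrapper: refuse unpinned refs; otherwise install with
# littler's installGithub.r without upgrading dependencies (-u FALSE,
# as in the PR discussion).
install_pinned_github() {
  is_pinned_github_ref "$1" || { echo "not pinned to a commit SHA: $1" >&2; return 1; }
  installGithub.r -u FALSE "$1"
}
```

A Dockerfile RUN step would then fail loudly on something like hadley/xml2@master instead of silently installing whatever that branch points to on build day.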

The date could be the date the project 'started' (see below), though it might be simpler to just use the last tagged version of R (pinned to the last MRAN date on which that version was current). This would suggest that we drop the 3.3.2 tag (or make it identical to latest) and have a 3.3.1 tag for now, pinned to the day before 3.3.2 was released.
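The "day before the next release" rule is simple date arithmetic. A sketch, assuming GNU coreutils date (as in the Debian-based rocker images; BSD and macOS date have no -d option) and the https://mran.microsoft.com/snapshot/YYYY-MM-DD URL layout that MRAN snapshots used; mran_snapshot_before is a hypothetical name.

```shell
# Sketch: given the release date (YYYY-MM-DD) of the NEXT R version,
# print the MRAN snapshot URL for the day before it, i.e. the last day
# the previous R version was current. Requires GNU date for -d.
mran_snapshot_before() {
  local next_release="$1" snapshot_date
  # -u keeps the arithmetic in UTC so local DST changes cannot shift it.
  snapshot_date="$(date -u -d "$next_release 1 day ago" +%F)" || return 1
  printf 'https://mran.microsoft.com/snapshot/%s\n' "$snapshot_date"
}
```

With R 3.3.2 released on 2016-10-31, mran_snapshot_before 2016-10-31 gives the 2016-10-30 snapshot, the date a 3.3.1 tag would be pinned to under this scheme.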

For people with longer research half-lives, that's a great question, and one I haven't given enough thought to. My practice so far has been to keep the project current: I've been developing on hadleyverse:latest images, so I usually have a relatively recent library, and I deal with things breaking as they happen. Maybe that's not ideal, but I don't think I'd be happy to still be using only older versions of packages on a project that's been running for a year or two.

I would imagine switching the Dockerfile for such a project from latest to the most recent versioned tag when it's published (or at least when most of the coding is done), and just fixing anything broken by the move from latest to the stable version. With luck, things will be backwards compatible enough and everything will be fine. If the version change causes major issues, then maybe I'd just run packrat on whatever setup was working and call it a day; otherwise I could avoid the added complexity of packrat.

On custom dates: a user could pin MRAN to an arbitrary date by building the Docker stack manually, e.g. clone a copy of the tidyverse Dockerfile and build it locally with docker build --build-arg BUILD_DATE=<CUSTOM_DATE> -t user/tidyverse tidyverse_dir, and then build their image on top of that, but that is starting to get cumbersome.
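To make that two-step build a bit less cumbersome, it could be scripted. This sketch only prints the docker build commands (pipe them to sh to run them) and assumes, as described above, that the tidyverse Dockerfile reads its MRAN date from ARG BUILD_DATE and that the project Dockerfile starts FROM user/tidyverse. pinned_build_commands, user/tidyverse and user/myproject are placeholders, and paths containing spaces are not quoted.

```shell
# Hypothetical helper: print the two docker build commands for a stack
# pinned to a custom MRAN date, without running them.
#   $1 = snapshot date (YYYY-MM-DD)
#   $2 = directory holding a local copy of the tidyverse Dockerfile
#   $3 = project directory whose Dockerfile starts FROM user/tidyverse
pinned_build_commands() {
  local build_date="$1" tidyverse_dir="$2" project_dir="$3"
  # Basic YYYY-MM-DD shape check before anything is printed.
  printf '%s\n' "$build_date" | grep -qE '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' || return 1
  printf 'docker build --build-arg BUILD_DATE=%s -t user/tidyverse %s\n' "$build_date" "$tidyverse_dir"
  printf 'docker build -t user/myproject %s\n' "$project_dir"
}
```

For example, pinned_build_commands 2016-06-20 tidyverse_dir . | sh would build the pinned base image and then the project image on top of it.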
