There are several reasons you might want to deploy your own R archive repo: you don’t want to rely on GitHub for your dev packages, you want a more “confidential” way to share them, or maybe (and that’s a good enough reason) you’re a nerd and you like the idea of hosting your own repo. So, here’s how to do it.
What’s a repo?
An R archive network / repo is a URL (uniform resource locator) from which you can download packages. For example, when you do:
install.packages("attempt")
install.packages has an argument called “repos”, which defines the spot on the internet where R should go and get the package. You don’t have to specify it, because it defaults to getOption("repos"). For example, right now, on my laptop, I have:
getOption("repos")
## CRAN
## "https://cran.rstudio.com/"
## attr(,"RStudio")
## [1] TRUE
This indicates that when I try to install a package, R will go and look on the CRAN mirror hosted at RStudio. But I could specify any other endpoint:
install.packages(pkgs = "attempt", repos = "http://mirror.fcaglp.unlp.edu.ar/CRAN/", type = "source")
Here, I’m installing {attempt} from Argentina.
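If you always install from the same places, you can also set the option once instead of passing repos on every call. This is a minimal sketch: the CRAN entry is the RStudio mirror shown above, while the MYRAN name and its URL are hypothetical placeholders for a repo of your own.

```r
# Save the current setting so it can be restored afterwards
old_repos <- getOption("repos")

# Named character vector: install.packages() consults every repository listed here
options(repos = c(
  CRAN  = "https://cran.rstudio.com/",
  MYRAN = "http://my-ran.example.org/"  # hypothetical URL, replace with your own
))

# Check what R will use from now on
current <- getOption("repos")
current

# Restore the previous setting
options(repos = old_repos)
```

Putting the options() call in your .Rprofile makes the setting permanent for your sessions.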
What’s in a RAN?
About install.packages
So, how does this work? What does install.packages do when it is called?
We will not dive into the precise details, but let’s sum up:
- install.packages goes to the url, and looks for “url/src/contrib”
- in this folder, R looks for a file called PACKAGES
- R parses this file, isolates the pkgs elements, and adds what is needed for the download (version number and other things…)
- R downloads and installs the package
It’s “that” simple: if your endpoint has a “src/contrib” folder, if this folder contains a properly filled PACKAGES file, and if all the tar.gz files are there too, you can install.packages(pkgs = "mypkg", repos = "myrepo", type = "source").
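The steps above can be sketched with base R alone. This is only an illustration of the mechanism, not what install.packages does internally line by line: the repo address and the two package entries are made up, and the index is written to a temporary file instead of being downloaded.

```r
# Hypothetical repo address; contrib.url() appends "src/contrib" for source packages
repo_url <- "http://my-ran.example.org"
contrib  <- contrib.url(repo_url, type = "source")

# A tiny PACKAGES index with two made-up entries, separated by a blank line
index_file <- tempfile("PACKAGES")
writeLines(c(
  "Package: mypkg",
  "Version: 0.1.0",
  "",
  "Package: otherpkg",
  "Version: 1.2.3"
), index_file)

# PACKAGES uses the Debian Control File format, which read.dcf() parses
index <- read.dcf(index_file)

# Isolate the requested package and build the address of its tarball
wanted  <- index[index[, "Package"] == "mypkg", , drop = FALSE]
tarball <- sprintf("%s/%s_%s.tar.gz", contrib, wanted[, "Package"], wanted[, "Version"])
tarball
```

The tarball name is always the package name, an underscore, the version and ".tar.gz", which is why the version number has to be in the index.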
The PACKAGES file
In this file, you’ll need to have an entry for each package in your repo. Each one should be described as follows (the # annotations are explanations, not part of the file):
Package: craneur # The name of your package
Version: 0.0.0.9000 # The version
Imports: attempt, desc, glue, R6, tools # The Imports
Suggests: testthat # The suggests
License: MIT + file LICENSE # The licence
MD5sum: e3ef1ff3d829c040c9bafb960fb8630b # The MD5sum
NeedsCompilation: no # Whether or not your package needs compilation
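To make the format concrete, here is a minimal sketch of writing one such entry by hand from R. The package name mypkg and the fake tarball are hypothetical stand-ins, created only so that tools::md5sum() has a file to hash.

```r
# Stand-in for a real source tarball (hypothetical file, not an installable package)
tarball <- file.path(tempdir(), "mypkg_0.1.0.tar.gz")
writeBin(charToRaw("not a real package"), tarball)

# One row per package, one column per field of the entry
entry <- data.frame(
  Package          = "mypkg",
  Version          = "0.1.0",
  License          = "MIT + file LICENSE",
  MD5sum           = unname(tools::md5sum(tarball)),  # checksum of the tarball
  NeedsCompilation = "no",
  stringsAsFactors = FALSE
)

# write.dcf() emits one "Field: value" line per column
packages_file <- file.path(tempdir(), "PACKAGES")
write.dcf(entry, file = packages_file)

# Read the entry back to check it parses
written <- read.dcf(packages_file)
written
```

The MD5sum field is what lets R check that the downloaded tarball matches the one the index describes.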
With {craneur}
Doing this by hand can be cumbersome, so I’ve developed a little package called {craneur} to do it automatically. You can get it with:
remotes::install_github("ColinFay/craneur")
Here’s how to use it:
library(craneur)
colin <- Craneur$new("Colin")
colin$add_package("../craneur_0.0.0.9000.tar.gz")
colin$add_package("../jekyllthat_0.0.0.9000.tar.gz")
colin$add_package("../tidystringdist_0.1.2.tar.gz")
colin$add_package("../attempt_0.2.1.tar.gz")
colin$add_package("../rpinterest_0.4.0.tar.gz")
colin$add_package("../rgeoapi_1.2.0.tar.gz")
colin$add_package("../proustr_0.3.0.9000.tar.gz")
colin$add_package("../languagelayeR_1.2.3.tar.gz")
colin$add_package("../fryingpane_0.0.0.9000.tar.gz")
colin$add_package("../dockerfiler_0.1.1.tar.gz")
colin$add_package("../devaddins_0.0.0.9000.tar.gz")
colin
## package path
## 1 craneur ../craneur_0.0.0.9000.tar.gz
## 2 jekyllthat ../jekyllthat_0.0.0.9000.tar.gz
## 3 tidystringdist ../tidystringdist_0.1.2.tar.gz
## 4 attempt ../attempt_0.2.1.tar.gz
## 5 rpinterest ../rpinterest_0.4.0.tar.gz
## 6 rgeoapi ../rgeoapi_1.2.0.tar.gz
## 7 proustr ../proustr_0.3.0.9000.tar.gz
## 8 languagelayeR ../languagelayeR_1.2.3.tar.gz
## 9 fryingpane ../fryingpane_0.0.0.9000.tar.gz
## 10 dockerfiler ../dockerfiler_0.1.1.tar.gz
## 11 devaddins ../devaddins_0.0.0.9000.tar.gz
You can then save it with:
colin$write()
You now have a folder you can copy and paste on your server. This server can be your own ftp, a university server, a git repo… anywhere you can point to with a url!
Note: other packages can do this as well, notably {drat}, {cranlike} or {packrat}.
Creating a server
With Digital Ocean
For the sake of this article, I’ll use a server deployed on Digital Ocean. If you want to try DO, here’s a $10 coupon (full disclosure: it’s an affiliate link, and I’ll get a $10 credit if you ever spend $25 there).
As this is not a DO deployment tutorial, I’ll skip this part and assume you managed to set up a server (roughly, it’s just “create a droplet with Ubuntu”, then access it over ssh using the password you receive by mail). You can still refer to the doc if you need more info about how to deploy a droplet.
So, I’ve connected to my DO server through ssh (with the password received via email), and installed Docker, following this tutorial.
I now have a Digital Ocean machine with Docker on it.
The Dockerfile
Let’s write the Dockerfile for our RAN. Basically, we’ll need
- a webserver — which will be launched with the {servr} package (let’s keep the project R-only)
- the ran repo I created earlier
This simple Dockerfile would create a RAN:
library(dockerfiler)
dock <- Dockerfile$new()
dock$RUN("mkdir usr/ran/src/contrib/ -p")
dock$COPY("src/contrib", "usr/ran/src/contrib")
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$EXPOSE(8000)
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/src/contrib\", host = \"0.0.0.0\", port = 8000)'")
dock
FROM rocker/r-base
RUN mkdir usr/ran/src/contrib/ -p
COPY src/contrib usr/ran/src/contrib
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/src/contrib", host = "0.0.0.0", port = 8000)'
But one thing is missing: what if I want to regenerate the RAN every time I have a new package? Well, let’s write a different Dockerfile to do that.
An updatable Dockerfile
- First of all, I’ll copy all the packages sources in a pkg folder
pkg <- list.files("../", pattern = "tar.gz", full.names = TRUE)
file.copy(pkg, "pkg")
list.files("pkg")
## [1] "attempt_0.2.1.tar.gz" "craneur_0.0.0.9000.tar.gz"
## [3] "devaddins_0.0.0.9000.tar.gz" "dockerfiler_0.1.1.tar.gz"
## [5] "fryingpane_0.0.0.9000.tar.gz" "jekyllthat_0.0.0.9000.tar.gz"
## [7] "languagelayeR_1.2.3.tar.gz" "prenoms_0.1.0.tar.gz"
## [9] "proustr_0.3.0.9000.tar.gz" "rgeoapi_1.2.0.tar.gz"
## [11] "rpinterest_0.4.0.tar.gz" "tidystringdist_0.1.2.tar.gz"
- I’ll then create a craneur.R script (file.create("craneur.R")) that automatically builds and writes the repo with {craneur} from a folder. It will contain the following code:
library(craneur)
colin <- Craneur$new("Colin")
lapply(list.files("usr/pkg", pattern = "tar.gz", full.names = TRUE), function(x) colin$add_package(x))
colin$write(path = "usr/ran")
- As I want the user to be able to use http://url only, and as my RAN index is in src/contrib, I’ll create an html page that simply does the redirection:
file.create("index.html")
containing: <body onload="window.location = 'src/contrib/index.html'">
- And here is the new Dockerfile:
dock <- Dockerfile$new()
# Install the packages
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"remotes\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'remotes::install_github(\"ColinFay/craneur\")'")
# Create the dir
dock$RUN("mkdir usr/ran -p")
dock$RUN("mkdir usr/pkg -p")
# Move some stuff
dock$COPY("craneur.R", "usr/pkg/craneur.R")
dock$COPY("pkg", "usr/pkg")
# Copy the index.html
dock$COPY("index.html", "usr/ran/index.html")
# Create the folders
dock$RUN("Rscript usr/pkg/craneur.R")
# Open port
dock$EXPOSE(8000)
# Launch server
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/\", host = \"0.0.0.0\", port = 8000)'")
dock
FROM rocker/r-base
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("remotes", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'remotes::install_github("ColinFay/craneur")'
RUN mkdir usr/ran -p
RUN mkdir usr/pkg -p
COPY craneur.R usr/pkg/craneur.R
COPY pkg usr/pkg
COPY index.html usr/ran/index.html
RUN Rscript usr/pkg/craneur.R
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/", host = "0.0.0.0", port = 8000)'
dock$write()
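The preparation steps above (collecting the tarballs in a pkg folder and writing the redirect page) can also be scripted in one go. This is a sketch under assumptions: prepare_context() is a hypothetical helper, not part of {craneur} or {dockerfiler}. It adds two small safeguards to the manual steps: the pkg folder is created first, and the regex is anchored so that only names ending in ".tar.gz" match.

```r
# Hypothetical helper: gather tarballs and the redirect page into a build folder
prepare_context <- function(src_dir, build_dir) {
  pkg_dir <- file.path(build_dir, "pkg")
  dir.create(pkg_dir, recursive = TRUE, showWarnings = FALSE)

  # Anchored regex: the dots are literal and the name must end in ".tar.gz"
  tarballs <- list.files(src_dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
  file.copy(tarballs, pkg_dir, overwrite = TRUE)

  # Redirect page sending visitors from the root to the package index
  writeLines(
    "<body onload=\"window.location = 'src/contrib/index.html'\">",
    file.path(build_dir, "index.html")
  )

  # Return the names of the copied tarballs
  invisible(list.files(pkg_dir))
}

# Usage with throwaway folders and empty placeholder files
src <- tempfile("src")
dir.create(src)
file.create(file.path(src, c("attempt_0.2.1.tar.gz", "notes.txt")))
build  <- tempfile("build")
copied <- prepare_context(src, build)
copied
```

In the setup described here, src_dir would be "../" and build_dir the folder that holds the Dockerfile.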
So here, if I build it:
docker build -t ran .
And:
docker run -d -p 80:8000 ran
I can go to http://127.0.0.1/ in my browser, and I’ll get the index of all available packages.
I can now try:
install.packages("attempt", repos = "http://127.0.0.1/", type = "source")
And that works as expected 🙂
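Before installing, it can be handy to ask a repo what it advertises. available.packages() reads the same PACKAGES index that install.packages relies on. The sketch below builds a tiny local index (with a hypothetical package named mypkg) so that it runs without the container. Against the running container, you would pass contriburl = "http://127.0.0.1/src/contrib" instead.

```r
# A throwaway local repo so the example runs without the Docker container
contrib <- file.path(tempfile("ran"), "src", "contrib")
dir.create(contrib, recursive = TRUE)
writeLines(c("Package: mypkg", "Version: 0.1.0"), file.path(contrib, "PACKAGES"))

# file:// URLs are accepted in place of http:// ones
local_url <- paste0("file://", normalizePath(contrib))

# One row per package the repo advertises
avail <- available.packages(contriburl = local_url, type = "source")
avail[, c("Package", "Version"), drop = FALSE]
```

If a package you just added does not show up here, the PACKAGES file was probably not regenerated.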
To the server and beyond
Let’s copy everything to our server, in our ran folder:
scp torun.R [email protected]:/usr/ran/
scp craneur.R [email protected]:/usr/ran/
scp Dockerfile [email protected]:/usr/ran/
scp -r pkg/ [email protected]:/usr/ran/
scp index.html [email protected]:/usr/ran/
Let’s log into our virtual machine, build the image from the Dockerfile, and run it with the commands we’ve just seen.
docker run -d -p 80:8000 ran
And tadaaa: http://206.189.28.254.
So you can now install from your server:
install.packages("attempt", repos = "http://206.189.28.254", type = "source")
Update your server
So now, the good thing is that I can update my package server whenever I add or remove a tar.gz: I’ll just have to rebuild my Docker image.
Further work
Efficient update
Here, to be really efficient, I should split my Docker image in two: one with all the packages, and one with the {craneur} generation. That way, I wouldn’t have to rebuild my Docker image from scratch every time the package list changes.
DNS
http://206.189.28.254 is not that nice an address to share or remember, so we could buy a domain and point it to our server. But… that’s for another day 😉