Playing around with RStudio Package Manager

Managing packages in production is a lot of work: you have to juggle versions, internal packages, CRAN updates, Bioconductor, GitHub sources… Let’s have a look at RStudio Package Manager, one of the tools available to help you deal with this.

What is RSPM? (Baby don’t hurt me, no more 🎶)

RStudio Package Manager (or RSPM for short) is a solution designed to help you deal with the installation of packages in your organisation. Whether you need to give your team access to the whole of CRAN through a proxy, a selection of CRAN packages, GitHub packages, internal packages, etc., you can use RSPM as a central point for managing that.

It can also be useful if your network connection goes through a proxy, as RSPM can be configured to work with one. You’ll then have internal access to a specific package repository.

Install RSPM

RSPM is available as a 45-day trial, which you can get from the RStudio Package Manager page. It can be installed on Red Hat, CentOS, SUSE and Ubuntu (see the official requirements for the precise specs).

Once it is installed and running, it listens on port 4242 by default, so you can access it at http://<address-of-server>:4242/

Inside RSPM

RSPM is structured in repos and sources.

Repos

A repository is a collection of available packages, gathered from one or more sources (see below for what sources are). Roughly, this creates a RAN structure (see Dockerise and deploy your own R Archive Repo for more info about a repo structure).

For example, if I go to http://192.168.0.10:4242/thinkr/latest/src/contrib/PACKAGES on my machine, I get the PACKAGES index of the repo, which lists the packages it makes available.

I can also have alternative repos, which contain other packages:

The source of each package is available at http://<server>/<repo>/latest/src/contrib/<my-package>, as you can see on the description page of the package:

And archives are at http://<server>/<repo>/latest/src/contrib/archive/package/dockerfiler_0.0.0.9000.tar.gz.
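
Since this layout follows the usual CRAN-like structure, base R can build and query these URLs. Here is a minimal sketch, assuming the example server address and the thinkr repo used above; repo_index_url() and list_repo_packages() are hypothetical helper names, not something RSPM provides:

```r
# Root of the repo, taken from the example above (replace with your own server).
repo_url <- "http://192.168.0.10:4242/thinkr/latest"

# Hypothetical helper: URL of the PACKAGES index for source packages.
# utils::contrib.url() appends the "src/contrib" part for us.
repo_index_url <- function(repo_url) {
  paste0(utils::contrib.url(repo_url, type = "source"), "/PACKAGES")
}

# Hypothetical helper: names of the packages served by the repo.
# It needs network access to the server, so it is only defined here, not called.
list_repo_packages <- function(repo_url) {
  rownames(utils::available.packages(repos = repo_url, type = "source"))
}

repo_index_url(repo_url)
```

Calling list_repo_packages(repo_url) from a machine that can reach the server should then return the same package names as the PACKAGES file.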

Sources

As said before, a repo is built on top of one or more sources. In a few words, a source is “where to look for packages”: it is what a repo uses to define which packages are available for download. There are 3 types of sources:

  • The whole CRAN, pointing to RStudio’s CRAN service
  • Curated list of CRAN packages, i.e. a selection of packages (see below)
  • Local source: internal packages

About curated CRAN sources

In an enterprise context, it’s not always easy to give access to the whole of CRAN, for a variety of reasons (mainly security, as IT wants to control what a user installs on a machine).

RStudio Package Manager allows you to do exactly that. You can select a subset of CRAN packages that will be available.

And the good news is that you can delay the upload of the packages, allowing you to wait for a validation:

  • You can produce an output of the packages you want to make available. This output contains the info about the packages and all their dependencies, with information about compilation or licence.

  • Once you have this subset, you can put the addition “on hold”: you send the output of the previous command (also available as CSV, not only as console output) to your manager and/or IT department and ask for validation. Let’s assume you do this on the 1st of September. Bureaucracy being what it is, you might get the validation 15 days later. The bad news is that, open source being what it is, packages might have been updated in the meantime. The good news is that if you now upload the selection you put on hold, RSPM will add each package in the exact state it was in when you created the list. If, once the selection is approved and uploaded, you are asked to add a new package, it will also be added in the version it had on the 1st of September.
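
To get a feel for what “the packages and all their dependencies” means, here is a rough sketch in plain R. RSPM computes this list itself; this only illustrates the idea with tools::package_dependencies() on a toy index. The packages pkgA to pkgD and their dependencies are made up for the example.

```r
# Toy package index with made-up packages, in the same shape as the
# matrix returned by available.packages().
toy_db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = "pkgB", LinkingTo = NA),
  c(Package = "pkgB", Depends = NA, Imports = "pkgC", LinkingTo = NA),
  c(Package = "pkgC", Depends = NA, Imports = NA,     LinkingTo = NA),
  c(Package = "pkgD", Depends = NA, Imports = NA,     LinkingTo = NA)
)

# Recursive dependencies of the package we want to make available.
deps <- tools::package_dependencies(
  "pkgA",
  db = toy_db,
  which = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE
)

# Everything that would need validation: the package plus its dependencies.
to_validate <- sort(unique(c("pkgA", unlist(deps))))
to_validate
```

Here pkgD is left out because nothing in the selection depends on it, which is the point of a curated subset.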

Here is an example of a repo created with the previous command. As you can see, I now have a “curated” repo, which contains only the packages needed to install attempt, dplyr, fcuk and purrr:

And yes, of course, this repo can always be updated to the newest date.

Stats about usage

The RSPM also provides a convenient way to have a look into package downloads: each time a 📦 is installed on a machine, it is tracked inside the Stats Panel:

What now?

Ok, now what? How can I access packages on the RSPM in R?

Well, now you can use the repo URL to install packages. Here are several strategies:

Plain old way

install.packages(
  "pkgtest", 
  repos = "http://<server>/<repo>/latest", 
  type = "source"
)

Using {withr}

We can temporarily change the repo R uses for installation:

withr::with_options(
  list(repos = "http://<server>/<repo>/latest"), 
  install.packages("pkgtest")
)
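
To make “temporarily” concrete, here is a rough base R sketch of the same idea, which sets the option, runs the code, then restores the previous value. with_repo() is a hypothetical helper written for this post, not a {withr} function, and the URL is the same placeholder as above.

```r
# Hypothetical helper: evaluate `code` with `repos` pointing to `repo_url`,
# then restore whatever `repos` was before, even if `code` fails.
with_repo <- function(repo_url, code) {
  old <- options(repos = c(RSPM = repo_url))
  on.exit(options(old), add = TRUE)
  # `code` is evaluated lazily, i.e. only here, once the option is set.
  force(code)
}

before <- getOption("repos")
inside <- with_repo("http://<server>/<repo>/latest", getOption("repos"))
after  <- getOption("repos")

inside                   # the RSPM URL, as seen from inside the call
identical(before, after) # the global option is left untouched
```

In real use you would pass install.packages("pkgtest") as the code argument instead of getOption("repos").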

A function to install it

You can also provide a function (shipped in an internal package, for instance) that does the install:

install_packages <- function(package){
  install.packages(
    package, 
    repos = "http://<server>/<repo>/latest", 
    type = "source"
  )
}
install_packages("attempt")

Setting it globally

options(repos = "http://<server>/<repo>/latest")
install.packages("pkgtest")

(Of course, it can be set in ~/.Rprofile).
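
As a sketch, the ~/.Rprofile fragment could look like this. The URL is still the placeholder used above, and the RSPM name given to the entry is just a label chosen for this example:

```r
# Hypothetical ~/.Rprofile fragment: point every new R session to the RSPM repo.
# local() keeps the intermediate variable out of the global environment.
local({
  rspm_url <- "http://<server>/<repo>/latest"  # placeholder, use your own server
  options(repos = c(RSPM = rspm_url))
})

getOption("repos")
```

Naming the entry is optional, but it makes it easier to see which repo is in use when you print getOption("repos").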

Upcoming RStudio version

In the upcoming RStudio version (1.2+) that you can get from the dailies, you can change this in the Global Options panel:

I want to learn!

You want to learn how to deploy and orchestrate an RSPM? We can help 😉