Very fast R packages installation with r2u – Bioinformatics Services

Jacques-Henri Lartigue, Grand Prix de l'ACF, June 26th 1912

Introduction

In this post, I test r2u, a tool for very fast and efficient installation of R packages. It currently provides 19,066 (focal, Ubuntu 20.04) and 18,921 (jammy, Ubuntu 22.04) binary packages from CRAN. It also provides 207 (focal) and 215 (jammy) Bioconductor packages from the 3.15 release; Bioconductor coverage is limited to the packages used by CRAN packages. Everything is shipped as binary “.deb” packages from a proper apt repository with a signed Release file, so apt takes care of dependency resolution.

According to the r2u web page, its key features are:

  • Full integration with apt as every binary resolves all dependencies: No more installations (of pre-built archives) only to discover that a shared library is missing. No more surprises.
  • Full integration with apt so that an update of a system library cannot break an R package: if a (shared) library is used by CRAN, the package manager knows and will not remove it. No more (R package) breakage from (system) library updates.
  • Installations are fast, automated and reversible thanks to the package management layer.
  • Optional (but recommended) use with bspm automagically connects R functions like install.packages() to apt for access to binaries and dependencies.

Creating the Singularity containers

Below, we measure the time needed to install dplyr and DESeq2 from R and with r2u. For this, we use Singularity containers. If Singularity is not already installed, please follow the installation procedure here.

Let’s build two Singularity containers in sandbox (writable) mode from the same recipe. Save the following recipe in a file named Singularity:

BootStrap: docker
From: ubuntu:focal
%post
    # ~~~~~~ General setup ~~~~~~ #
    # See https://cloud.r-project.org/bin/linux/ubuntu/
    apt update -qq
    export DEBIAN_FRONTEND=noninteractive
    apt-get install --assume-yes --no-install-recommends software-properties-common dirmngr \
    wget build-essential libblas-dev liblapack-dev gcc-10 g++-10 gfortran-10 emacs \
    libcurl4-openssl-dev libxml2-dev libsodium-dev libssl-dev
    # ~~~~~~ R 4.2.0 ~~~~~~ #
    wget -q -O- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc \
    | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
    echo "deb [arch=amd64] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" \
        > /etc/apt/sources.list.d/cran-ubuntu.list
    apt update && apt upgrade --yes
    apt install --yes r-base r-base-core

Save the following code in a script named buildSingularity.sh:

#!/usr/bin/env bash
singularity build --sandbox sandbox1 Singularity
singularity build --sandbox sandbox2 Singularity

Build both sandboxes with sudo and measure the execution time with time (4:25.84elapsed, 68%CPU):

sudo time bash buildSingularity.sh
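Because `sudo time` runs GNU time (/usr/bin/time) rather than the bash keyword, the duration is printed in GNU time's format, here `4:25.84elapsed`, i.e. 4 min 25.84 s. To compare it with the other measurements in this post, a small helper can convert that field to seconds. This is a minimal sketch: the function name `elapsed_to_secs` is hypothetical, and it assumes the `[h:]m:ss.cc` layout of GNU time's default output.

```shell
# Hypothetical helper: convert GNU time's elapsed field (e.g. "4:25.84elapsed",
# meaning 4 min 25.84 s, or "1:02:03elapsed" for h:mm:ss) into seconds.
elapsed_to_secs() {
  t="${1%elapsed}"   # drop the trailing "elapsed" label, if present
  awk -v t="$t" 'BEGIN {
    n = split(t, part, ":")
    secs = 0
    # Fold the colon-separated fields left to right: ((h*60)+m)*60+s
    for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
    printf "%.2f\n", secs
  }'
}

elapsed_to_secs "4:25.84elapsed"   # build time of the two sandboxes
```

For the build above this gives 265.84 seconds.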

Installation from R

Run sandbox1:

sudo singularity shell --writable sandbox1

Open R and install dplyr, then DESeq2 via BiocManager:

> start_time1 <- Sys.time(); install.packages("dplyr"); end_time1 <- Sys.time()
> install.packages("BiocManager")
> library("BiocManager")
> start_time2 <- Sys.time(); install("DESeq2"); end_time2 <- Sys.time()
> end_time1 - start_time1
## 1.51152 mins
> end_time2 - start_time2
## 12.23746 mins
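R picks the unit of a printed time difference automatically, which is why dplyr is reported in minutes here and in seconds later on. In R, `difftime(end_time1, start_time1, units = "secs")` fixes the unit up front; to normalise numbers that were already recorded, a shell sketch like the one below converts a printed value such as `1.51152 mins` to seconds. The helper name `difftime_to_secs` is hypothetical, and only the `secs`, `mins` and `hours` units are handled.

```shell
# Hypothetical helper: convert a printed R time difference such as
# "1.51152 mins" or "12.99056 secs" into seconds (2 decimals).
difftime_to_secs() {
  value="$1"
  unit="$2"
  case "$unit" in
    secs)  factor=1 ;;
    mins)  factor=60 ;;
    hours) factor=3600 ;;
    *) echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  awk -v v="$value" -v f="$factor" 'BEGIN { printf "%.2f\n", v * f }'
}

difftime_to_secs 1.51152 mins    # dplyr, installed from R
difftime_to_secs 12.23746 mins   # DESeq2, installed from R
```

This yields 90.69 s for dplyr and 734.25 s for DESeq2.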

Installation with r2u

With Docker Hub

After installing Docker, pull the r2u image (0m18.790s):

time docker pull eddelbuettel/r2u:focal

Start a container from the image:

docker run -it eddelbuettel/r2u:focal

Install dplyr from R:

> start_time <- Sys.time(); install.packages("dplyr"); end_time <- Sys.time()
> end_time - start_time
## 12.99056 secs

Install DESeq2 with apt:

time apt install --yes r-bioc-deseq2
## 0m23.676s
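The apt package names follow the usual Debian convention for R packages: a prefix for the origin (`r-cran-` or `r-bioc-`) followed by the lower-cased R package name, so DESeq2 becomes `r-bioc-deseq2` and dplyr becomes `r-cran-dplyr`. A minimal sketch of that mapping, assuming you already know whether a package comes from CRAN or Bioconductor; the function `r_to_deb` is hypothetical and not part of r2u or bspm.

```shell
# Hypothetical helper: derive the apt package name for an R package.
# $1 = R package name (case as on CRAN/Bioconductor), $2 = cran | bioc
r_to_deb() {
  pkg="$1"
  origin="${2:-cran}"
  case "$origin" in
    cran|bioc) ;;
    *) echo "origin must be 'cran' or 'bioc'" >&2; return 1 ;;
  esac
  # Debian package names are lower case, so fold the R name to lower case
  printf 'r-%s-%s\n' "$origin" "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
}

r_to_deb dplyr          # r-cran-dplyr
r_to_deb DESeq2 bioc    # r-bioc-deseq2
```

For example, `apt install --yes "$(r_to_deb DESeq2 bioc)"` expands to the apt command used above.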

Manual setup

Run sandbox2 in another terminal:

sudo singularity shell --writable sandbox2

Save the following script as install-r2u.sh:

#!/usr/bin/env bash
## First: update apt and get gpg-agent and key
apt update -qq
apt install --yes --no-install-recommends gpg-agent     # to add the key
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A1489FE2AB99A21A
## Second: add the repo
echo "deb [arch=amd64] https://dirk.eddelbuettel.com/cranapt focal main" > /etc/apt/sources.list.d/cranapt.list
apt update
## Third: ensure R 4.2.0 is used
echo "deb [arch=amd64] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/" > /etc/apt/sources.list.d/edd-misc.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 67C2D66C4B1D4339
## Fourth: pin the r2u repository (priority 700) so that its packages are preferred
echo "Package: *" > /etc/apt/preferences.d/99cranapt
echo "Pin: origin \"dirk.eddelbuettel.com\"" >> /etc/apt/preferences.d/99cranapt
echo "Pin-Priority: 700"  >> /etc/apt/preferences.d/99cranapt
## Fifth: install bspm and enable it
Rscript -e 'install.packages("bspm")'
RHOME=$(R RHOME)
echo "suppressMessages(bspm::enable())" >> ${RHOME}/etc/Rprofile.site
echo "options(bspm.sudo=TRUE)" >> ${RHOME}/etc/Rprofile.site
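The two `echo ... >> ${RHOME}/etc/Rprofile.site` lines append unconditionally, so running install-r2u.sh a second time would add the bspm settings again. A minimal sketch of an idempotent alternative follows; the `append_once` helper is hypothetical, and the demo writes to a temporary file rather than to the real Rprofile.site.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present (grep -x matches whole lines, -F disables regex).
append_once() {
  line="$1"
  file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file instead of ${RHOME}/etc/Rprofile.site
profile=$(mktemp)
trap 'rm -f "$profile"' EXIT
for _ in 1 2; do   # simulate running the setup script twice
  append_once 'suppressMessages(bspm::enable())' "$profile"
  append_once 'options(bspm.sudo=TRUE)' "$profile"
done
cat "$profile"     # each setting appears only once
```

In the real script, the two `echo` lines would become `append_once '...' "${RHOME}/etc/Rprofile.site"`.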

Run the script and time it (0m38.782s):

time bash install-r2u.sh

Open R and install dplyr:

> start_time <- Sys.time(); install.packages("dplyr"); end_time <- Sys.time()
> end_time - start_time
## 33.09381 secs

Install DESeq2 from the command line:

time apt install --yes r-bioc-deseq2
## 4m51.869s

Conclusion

In this post, we compared the time needed to install two R packages (dplyr and DESeq2) with and without r2u. Here is a summary:

Method                   Package   Time
From R (source)          dplyr     1.51152 mins
From R (source)          DESeq2    12.23746 mins
r2u (Docker image)       dplyr     12.99056 secs
r2u (Docker image)       DESeq2    0m23.676s
r2u (manual setup)       dplyr     33.09381 secs
r2u (manual setup)       DESeq2    4m51.869s
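To put these numbers on a common scale: 1.51152 mins ≈ 90.69 s and 12.23746 mins ≈ 734.25 s from R, versus 12.99 s and 23.676 s with the r2u Docker image, and 33.09 s and 291.869 s (4m51.869s) with the manual r2u setup. The short sketch below computes the resulting speedup ratios. `speedup` is a hypothetical helper, and the inputs are the single runs reported above, not averages over repeated measurements.

```shell
# Hypothetical helper: ratio of a baseline time to a new time (both in seconds).
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    if (new <= 0) { print "new time must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1fx\n", base / new
  }'
}

# Times in seconds, taken from the summary above (single runs)
echo "dplyr,  r2u Docker: $(speedup 90.6912 12.99056)"   # 1.51152 mins vs 12.99056 secs
echo "DESeq2, r2u Docker: $(speedup 734.2476 23.676)"    # 12.23746 mins vs 0m23.676s
echo "dplyr,  r2u manual: $(speedup 90.6912 33.09381)"
echo "DESeq2, r2u manual: $(speedup 734.2476 291.869)"   # 4m51.869s = 291.869 s
```

On these runs, the r2u Docker image is about 7.0x faster for dplyr and 31.0x faster for DESeq2, and the manual r2u setup about 2.7x and 2.5x faster.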
