Development of a web scraping R package and turnkey RStudio project with {renv}

Context summary

To save time through automation, the NGO asked ThinkR to carry out web scraping operations on a website.
The objective was to extract information more efficiently, then to automate these tasks on a daily basis. Until then, the data had been extracted by hand, table by table, and pasted into Excel.

Our intervention

  • Development of an R package containing useful functions for web scraping
  • Creation of functions to extract all the resources to be scraped
  • Development of the package using {renv}
  • Package versioned and documented with {fusen}
  • Functions tested (91.01% test coverage)
  • Creation of a function to initialize a turnkey RStudio project:
    • Project embedding {renv} so that the package functions can be used

Result & added value

The package contains a set of tools and functions for automating scraping tasks.
Extracting all the resources now takes only a few seconds, compared with several minutes or even hours before.
Informational messages let the user follow the progress of each extraction task.

The functions are tested and documented, which makes the package easier to maintain.
Finally, the package can create a turnkey RStudio project that relies on {renv} to provide everything needed to run the functions.
By following the R Markdown guide provided, the user can run the functions with the same versions of the R packages used during development, which reduces the risk of breaking changes from future package updates.
