As part of the 2021 RStudio internship, I wrote a package called {shinymodels}, which lets you explore a {tidymodels} object in a Shiny application. The 12-week internship, under the supervision of Max Kuhn and Julia Silge, was a great learning experience for me. I have used R for several of my classes and projects, but the internship was a unique experience in that it really taught me the difference between running a script and writing a package in R. As someone who started learning R fairly recently, I was both excited and nervous to spend this summer writing my own package. My day-to-day schedule largely followed one particular workflow for building a package in R. This blog post is a step-by-step guide to that workflow for writing a package in R using devtools, usethis, roxygen2, and GitHub Actions.

Preparation

Create a Github repository

Create a version control repository for the package; I use GitHub for this. If you are new to version control or GitHub, please read this article by Jenny Bryan. In my experience, Happy Git and GitHub for the useR is must-do reading for anyone new to working on a project in R and GitHub. Long story short, a hosting service like GitHub, built on the Git version control system, provides a home for our projects on the internet, making it easy to share, collaborate on, and reproduce any code.

Clone the repository to your R server

Once you create a git repository, you can clone the project to your local server.

Initiate the package

First, install and load all the packages required for your package.

    ```{r}
    install.packages(c("devtools", "roxygen2", "usethis"))
    library(devtools)
    library(roxygen2)
    library(usethis)
    ```

Here is an excellent reading about initiating a package. Now it's time to put the usethis::use_*() functions to work. In this example, we will create a package that follows tidyverse conventions, so I will use usethis::create_tidy_package(), but feel free to use usethis::create_package() instead. Below is the code that uses usethis to set up a package.

    ```{r}
    library(usethis)

    # initiate the package; the last part of the path ('cu' here) becomes
    # the package name, so replace it with the name of your package
    create_tidy_package("~/cu")

    # add some files and package information
    # (a DESCRIPTION already exists at this point, so usethis may ask
    # before overwriting it)
    use_description(
      list(
        `Authors@R` =
          'person("Shisham", "Adhikari", email = "shisham.adhikari@rstudio.com", role = c("aut", "cre"))',
        Title = "cu",
        Description = "Practice to make a package."
      )
    )
    use_mit_license() # the copyright holder defaults to the package authors
    use_code_of_conduct(contact = "shisham.adhikari@rstudio.com")
    use_spell_check()
    # use_package() takes one package per call and adds it to Imports
    use_package("dplyr")
    use_package("ggplot2")
    ```

Work on the package

If you made it this far, congratulations: you have the foundation for your package, and now it is time to build on that foundation. At this stage, your package folder has three main components to know:

  • R: All your R code goes in the R/ directory. You can use a single .R file for all your functions, create a file for each function, or group functions with a similar purpose in one file.

  • DESCRIPTION: As the name suggests, this file holds the important metadata of your package. I used Hadley Wickham's chapter on package metadata to better understand what goes into DESCRIPTION.

  • NAMESPACE: This file lists what your package imports from other packages and what it exports to users. I have never edited this file by hand; devtools and roxygen2 take care of it. Just make sure to add roxygen2 comments to your .R files and run devtools::document() to regenerate NAMESPACE.

Now that you know what to work on, let's get started with the actual work: functions, documentation, and tests.

Functions

All your R code for the package functions goes in .R files in the R/ directory. I recommend using a separate .R file for each function, though you can also group functions by their purpose. For example, let's add a function called is_between() that checks whether a number lies between two other numbers, and save it to a file named R/is_between.R.

    ```{r}
    is_between <- function(a, b, c) {
      # function to check if a is between b and c (boundaries included)
      if (b <= a && a <= c) {
        print(paste(a, "is between", b, "and", c))
      } else {
        print(paste(a, "is not between", b, "and", c))
      }
    }
    ```

Once you have a function, you can add more functions in the same file or start a new file with new functions. A key step here is to add a roxygen2 tag called @export above the function and run devtools::document(). This adds the function to the NAMESPACE file as an export, which makes it available to users. Your function should look something like:

    ```{r}
    #' @export
    is_between <- function(a, b, c) {
      # function to check if a is between b and c (boundaries included)
      if (b <= a && a <= c) {
        print(paste(a, "is between", b, "and", c))
      } else {
        print(paste(a, "is not between", b, "and", c))
      }
    }
    ```

Note that if your function uses function(s) from some other package, you need to import that package. For instance, if you were using the dplyr function between() inside is_between(), the code would be:

    ```{r}
    #' @export
    is_between <- function(a, b, c) {
      # function to check if a is between b and c (boundaries included)
      if (dplyr::between(a, b, c)) {
        print(paste(a, "is between", b, "and", c))
      } else {
        print(paste(a, "is not between", b, "and", c))
      }
    }
    ```

Writing dplyr::between() instead of between() tells R that the function comes from the external package dplyr. You also have to add the external package, in this case dplyr, under Imports: in the DESCRIPTION file, either by hand or with usethis::use_package("dplyr").

    Imports:
        dplyr
        

You can add multiple external packages under Imports; just make sure to separate the package names with commas. Beyond Imports, you can also list packages under Depends and Suggests. Packages under Depends are attached together with your package when a user calls library() on it, so I use it sparingly: I usually just add the version of R needed for my package and some crucial packages like ggplot2. All the other external packages that my package needs go under Imports. Finally, packages used only for development, examples, vignettes, and tests go under Suggests. You can learn more about package dependencies here.
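
To make the three dependency fields concrete, here is a small sketch that writes a made-up DESCRIPTION to a temporary file and reads the fields back with base R's read.dcf(). The R version and the testthat entry under Suggests are hypothetical, chosen only for illustration.

```r
# A hypothetical DESCRIPTION fragment; names and versions are illustrative only
desc_lines <- c(
  "Package: cu",
  "Depends:",
  "    R (>= 3.5.0)",
  "Imports:",
  "    dplyr,",
  "    ggplot2",
  "Suggests:",
  "    testthat"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() parses the "Field: value" format that DESCRIPTION files use
fields <- read.dcf(desc_file, fields = c("Depends", "Imports", "Suggests"))

# Split a comma-separated field into individual, whitespace-trimmed entries
split_field <- function(x) trimws(strsplit(x, ",")[[1]])

depends <- split_field(fields[1, "Depends"])
imports <- split_field(fields[1, "Imports"])
suggests <- split_field(fields[1, "Suggests"])

print(imports)
```

Reading the file back this way is only a sanity check that the commas are in the right places; in day-to-day work you would edit DESCRIPTION directly or through usethis.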

Once you have your function(s), either run devtools::load_all() or click Build -> Install and restart to load your package. You should then be able to use your package and check your functions in the console.
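
A quick console check might look like the sketch below. The function is redefined inline only so that the snippet runs on its own; in your package it would be made available by devtools::load_all(). Treating the boundaries as inclusive is an assumption of this sketch.

```r
# Stand-in for the packaged function (in practice, loaded by devtools::load_all())
is_between <- function(a, b, c) {
  if (b <= a && a <= c) {
    print(paste(a, "is between", b, "and", c))
  } else {
    print(paste(a, "is not between", b, "and", c))
  }
}

# print() returns its argument invisibly, so the message can also be captured
inside <- is_between(3, 1, 5)   # message says 3 is between 1 and 5
outside <- is_between(7, 1, 5)  # message says 7 is not between 1 and 5
```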

Documentation

Once you write your function(s), it's time to document them. Again, this is another place where you really want to think about the difference between running a script and writing a package; you want to document your package so clearly and thoroughly that a completely new user can understand it. I always put myself in a user's shoes when I am documenting my code. Thanks to roxygen2, much of this is automated for us. You want to use various roxygen2 tags to document your function. For example, I would document our is_between() function as below:

    ```{r}
    #' Check if a number is in between two other numbers
    #'
    #' This function checks if the first number is between the second and
    #' the third number.
    #'
    #' @param a A scalar to check.
    #' @param b The left boundary value (must be a scalar).
    #' @param c The right boundary value (must be a scalar).
    #' @return A character string with the result, which is also printed.
    #' @export
    is_between <- function(a, b, c) {
      if (dplyr::between(a, b, c)) {
        print(paste(a, "is between", b, "and", c))
      } else {
        print(paste(a, "is not between", b, "and", c))
      }
    }
    ```

To learn more about various roxygen2 tags, please read here.

Once you are satisfied with the documentation, simply run devtools::document() to save the documentation in the man folder. This is what will enable the users to see the documentation using ?is_between. Do not edit these files by hand. Make sure to document well and run devtools::document() anytime you work on your package.

Tests

This is something I learned this summer: any time you write a function for your package, you want to write tests for it. This not only makes sure that a function is doing what it's supposed to do, it also makes your life much easier when you debug issues in the function. Understanding formal automated testing in R, known as unit testing, and using it generously is a key step to writing a robust R package. Start by creating a tests folder in your package directory using usethis::use_testthat(). Wherever possible, write unit tests for every step of your function. Tests are organised hierarchically: expectations are grouped into tests, which are organised in files. To learn more about unit tests, check this out. Once you have written your tests, you can either use devtools::test() or simply click on Run tests to run them. Make sure all your tests pass before you proceed.
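
As a rough illustration of that hierarchy, a test file such as tests/testthat/test-is_between.R (a hypothetical file name) could contain something like the following. The function is redefined here only to keep the sketch self-contained, and the inclusive boundaries are an assumption; inside a package, devtools::test() loads your functions for you, so the file would hold just the test_that() call.

```r
library(testthat)

# Stand-in for the packaged function so this sketch runs on its own
is_between <- function(a, b, c) {
  if (b <= a && a <= c) {
    print(paste(a, "is between", b, "and", c))
  } else {
    print(paste(a, "is not between", b, "and", c))
  }
}

# One test groups several related expectations
test_that("is_between() reports whether a value lies in the range", {
  expect_equal(is_between(3, 1, 5), "3 is between 1 and 5")
  expect_equal(is_between(7, 1, 5), "7 is not between 1 and 5")
  # boundary values count as inside the range in this sketch
  expect_equal(is_between(1, 1, 5), "1 is between 1 and 5")
})
```

Each expect_*() call is one expectation; if any of them fails, the whole test is reported as failed along with the mismatching values.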

Check

Once you have written your functions, documented them, and written tests, it's time to check your package. There are two main ways to check your package:

  • Build Check: On your local machine you can either run devtools::check() or click on Build -> Check to check your package. Make sure to run devtools::document() and Build -> Install and restart before you check. Your goal is to end up with no errors, warnings, or notes.

  • GitHub Actions: I think of GitHub Actions as little computers that do what we tell them to do. By setting up GitHub Actions, we are basically running R CMD check on our package on different kinds of computers with different operating systems and different versions of R. This reduces the chance of having something in our package that only works on our own device. Additionally, it allows for Continuous Integration (CI), meaning the checks run automatically every time the source code changes. GitHub Actions is a powerful tool for automating the checking process for your package; to learn more, read the README.md here. You can set up GitHub Actions by running code similar to the following in your console.

      pr_init("GHA")
      use_tidy_github_actions()
      usethis::use_github_action(url = "https://raw.githubusercontent.com/tidymodels/.github/master/pkgdown.yaml")
      pr_push()
    

Any time you change or add anything in your package, push the changes to GitHub and make sure all the GitHub Actions checks pass.

Miscellaneous

  • Data: If you need data for your package development, you can store it in one of three places in your package: data/, inst/extdata/, or R/sysdata.rda. To learn more about storing data in your package, read this. The easiest way is to call usethis::use_data() (formerly devtools::use_data()) on the data object you want to store and make available to users. For example:

      a <- 2:4
      usethis::use_data(a)

This will save the data in data/a.rda, and the object a can then be accessed while using the package.
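
An .rda file is an ordinary R data file of the kind written by save() and restored by load(). The following base R sketch mimics that round trip in a temporary directory rather than in a real package's data/ folder; it illustrates the file format only and is not a description of what usethis does internally.

```r
a <- 2:4

# Write the object to an .rda file (a temporary path stands in for data/a.rda)
rda_path <- file.path(tempdir(), "a.rda")
save(a, file = rda_path)

# Restore it into a fresh environment to confirm the round trip
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)

print(loaded_names)  # names of the objects stored in the file
print(restored$a)    # the restored object
```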

  • Vignettes: In addition to the documentation and README.md in the GitHub repository, you can use vignettes to give users more information and guidance about the package. This is a great place to write a step-by-step guide to using the package along with a good worked example. I treat vignettes like blogs. You can create a vignette simply by running devtools::use_vignette("users_guide") with devtools older than 2.1.0, or usethis::use_vignette("users_guide") with newer versions. This will create a vignettes folder containing vignettes/users_guide.Rmd in your package folder. You can now edit the vignette.

  • Pull Requests (PRs): Pull requests in GitHub are like edit comments and suggestions in a Google doc. By creating a pull request, you are asking others to review, change, and comment on your work. Creating pull requests at every required step is a convenient way both to tell others about changes you've pushed to a branch in your repository and to get feedback on your code. You can do this directly on GitHub or in your console using pr_init("first_pr"). Make sure you are on your main branch when you first create the PR, and run pr_push() when you are ready to push and create the PR on GitHub. You can add collaborators and reviewers to your PRs. When someone commits to or comments on your PR, you can either accept or decline their suggestions and then merge the PR into the main branch. A pro tip: create a draft PR instead of a regular PR until you are absolutely ready to have your work reviewed.

  • GitHub Issues: Issues are like to-dos while working on your project. They can be about anything: proposed extensions, bugs, things in progress, or reminders to yourself. Creating, organizing, and prioritizing issues while working on your package will really help you organize your development workflow. Take a look at the GitHub repositories of a few existing R packages to learn more about how to use issues effectively.

  • {styler}: {styler} formats all your code as per the tidyverse style guide. Running {styler} on your package frequently will format your code so that you can just focus on the content of your code. You can install styler using

      ```{r} 
      install.packages("styler")
      ```
    

    You can style your entire package using style_pkg() or an individual file using style_file(). You can even apply styler through its RStudio Addin.

Congratulations, if you made it this far, you have built your own R package. Now, anyone can install your package from GitHub and use it with the following code.

devtools::install_github("yourusername/packagename")

I highly recommend reading Hadley Wickham's books R Packages and Advanced R for anyone looking to write their own R package.