43 Introduction to the PEcAn R API

43.1 Introduction

The PEcAn API package (pecanapi) is designed to allow users to submit PEcAn workflows directly from an R session. The basic idea is that users build the PEcAn settings object via an R script (manually, or using the included helper functions) and then use the RabbitMQ API to send this object to a Dockerized PEcAn instance running on a local or remote machine.

pecanapi is specifically designed to only depend on CRAN packages, and not on any PEcAn internal packages. This makes it easy to install, and allows it to be used without needing to download and install PEcAn itself (which is large and has many complex R package and system dependencies). It can be installed directly from GitHub as follows:

# Install once; requires the devtools package
devtools::install_github("pecanproject/pecan", subdir = "api")
library(pecanapi)

This vignette covers the following major sections:

Initial setup goes over the configuration, both inside and outside R, required to make pecanapi work.
Registering a workflow goes over how to register a PEcAn workflow with the PEcAn database, including searching for the required site and model IDs.
Building a settings object covers how to configure a PEcAn workflow using the PEcAn settings list.
Submitting a run covers how to submit the complete settings object for execution.
Finally, processing output covers how to access workflow outputs remotely through the THREDDS data server.

43.2 Initial setup

This tutorial assumes you are running a Dockerized instance of PEcAn on your local machine (hostname localhost, port 8000). To check this, open a browser and try to access http://localhost:8000/pecan/. If you are trying to access a remote instance of PEcAn, you will need to substitute the hostname and port accordingly.

To perform database operations, you will also need to have read access to the PEcAn database. Note that the PEcAn database Docker container (postgres) does not provide this by default, so you will need to open port 5432 (the PostgreSQL default) to that container. You can do this by creating a docker-compose.override.yml file with the following contents in the root directory of the PEcAn source code:

version: "3"
services:
  postgres:
    ports:
      - 5432:5432

Here, the first port is the one used to access the database (can be any open port; most PostgreSQL applications assume 5432 by default), and the second is the port the database is actually running on (which will always be 5432). After making this change, reload the postgres container by running docker-compose up -d. To check that this works, open an R session and try to create a database connection object to the PEcAn database.

con <- DBI::dbConnect(
  RPostgres::Postgres(),
  user = "bety",
  password = "bety",
  host = "localhost",
  port = 5432
)
DBI::dbListTables(con)[1:5]

This code should print out five table names from the PEcAn database. If it throws an error, you have a problem with your database connection.
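
If the connection fails, it can help to first check whether anything is listening on the mapped port at all, before looking at credentials. The following is a minimal sketch using only base R; port_is_open is a hypothetical helper written for this illustration and is not part of pecanapi.

```r
# Hypothetical helper (not part of pecanapi): returns TRUE if a TCP
# connection to host:port can be opened, FALSE otherwise.
port_is_open <- function(host = "localhost", port = 5432, timeout = 2) {
  tryCatch(
    {
      sock <- suppressWarnings(
        socketConnection(host = host, port = port, open = "r+",
                         blocking = TRUE, timeout = timeout)
      )
      close(sock)
      TRUE
    },
    # A refused or timed-out connection raises an error; report it as FALSE.
    error = function(e) FALSE
  )
}

# FALSE here suggests the port mapping in docker-compose.override.yml
# is missing or the postgres container is not running.
db_reachable <- port_is_open("localhost", 5432)
db_reachable
```

A TRUE result only shows that the port accepts connections; a wrong user name or password would still make DBI::dbConnect fail.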

The rest of this tutorial assumes that you are using this same database connection object (con).

In addition, any API operations that modify the database will not work unless a user ID is set. To avoid having to manually specify the ID each time, we can set it via options:

options(pecanapi.user_id = 99000000002)

The pecanapi package has many other options that it uses for its default configuration, including the Docker server and RabbitMQ hostname and credentials. To learn more about them, see ?pecanapi_options.
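
As a small illustration of how such options behave in base R (this shows only options and getOption, and assumes nothing about how pecanapi reads them internally): the user ID above is larger than R's maximum integer, so it is stored as a double and prints in scientific notation unless formatted explicitly. The option name pecanapi.example_unset below is made up for the example.

```r
options(pecanapi.user_id = 99000000002)
user_id <- getOption("pecanapi.user_id")

# The ID exceeds the 32-bit integer range, so R keeps it as a double.
is.double(user_id)
user_id > .Machine$integer.max

# Format explicitly to see all digits rather than scientific notation.
user_id_text <- format(user_id, scientific = FALSE)
user_id_text

# getOption() takes a fallback for options that have not been set.
# "pecanapi.example_unset" is a made-up name used only for illustration.
fallback <- getOption("pecanapi.example_unset", default = "fallback value")
fallback
```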

43.3 Registering a workflow with the database

For a PEcAn workflow to run, it must first be registered with the PEcAn database. In pecanapi, this is done via the insert_new_workflow function.

Building a workflow requires two important pieces of information: the model and site IDs. If you know these for your site and model, you can pass them directly into insert_new_workflow. However, chances are you may have to look them up in the database first. pecanapi provides several search_* utilities to make this easier.

First, let’s pick a model. To list all models, we can run search_models with no arguments (other than the database connection object, con).

models <- search_models(con)

We can narrow down our search by model name, revision, or “type”.

search_models(con, "ED")
search_models(con, "sipnet")
search_models(con, "ED", revision = "git")

Note that the search is case-insensitive by default, and searches before and after the input string. See ?search_models to learn how to toggle this behavior. For the purposes of this tutorial, let’s use the SIPNET model because it has low input requirements and runs very quickly. Specifically, let’s use the 136 version. We could grab the model ID from the search results, but pecanapi also provides an additional helper function for retrieving model IDs if you know the exact name and revision.

model_id <- get_model_id(con, "SIPNET", "136")
model_id

We can repeat this process for sites with the search_sites function (though there is currently no get_site_id function). Note the use of % as a wildcard (matches zero or more of any character, equivalent to the regular expression .*). The two sites in the search below are largely identical, so we’ll use the one with more site information (i.e. where mat is not NA).

all_umbs <- search_sites(con, "umbs%disturbance")
all_umbs
site_id <- subset(all_umbs, !is.na(mat))[["id"]]
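
To make the matching rules concrete, here is a rough base R analogue of the search behavior described above: % becomes .*, matching is case-insensitive, and the pattern may match anywhere in the string. The helper like_to_regex and the site names are invented for this sketch; the real search runs inside the database, not in R.

```r
# Hypothetical helper: translate a SQL LIKE-style pattern into a regular
# expression. Only "%" is handled; regex metacharacters are not escaped.
like_to_regex <- function(pattern) {
  gsub("%", ".*", pattern, fixed = TRUE)
}

# Made-up site names, for illustration only.
site_names <- c(
  "UMBS Disturbance Plot",
  "umbs (example) disturbance",
  "Example Forest"
)

rx <- like_to_regex("umbs%disturbance")
rx

# Unanchored, case-insensitive match, mimicking the default search behavior.
matches <- grepl(rx, site_names, ignore.case = TRUE)
site_names[matches]
```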

With site and model IDs in hand, we are ready to create a workflow.

workflow <- insert_new_workflow(con, site_id, model_id, start_date = "2004-01-01", end_date = "2004-12-31")
workflow

The insert_new_workflow function inserts the workflow into the database and returns a data.frame containing the row that was inserted.

43.4 Building a settings object

Now that we have a workflow registered, we need to configure it via the PEcAn settings list. The PEcAn settings list is a nested list providing parameters for the various actions performed by the PEcAn workflow, including the trait meta-analysis, processing input files, and running models. It can be created manually with a bunch of list calls. However, this is tedious and error-prone, so pecanapi provides several utilities that facilitate this process.

We start with a blank list.

settings <- list()

Let’s start by adding the workflow we created in the previous section to this list. This is done via the add_workflow function, which takes as input a workflow data.frame and adds the relevant fields to the right places in the settings list.

settings <- add_workflow(settings, workflow)

All add_* functions work by incrementally adding to an input settings object and returning a new modified settings object. The first argument of these functions is always the settings list, which gives these functions a consistent syntax and makes it easy to string multiple settings modifications together using the magrittr pipe (%>%), similar to tidyverse tabular data manipulations.
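
The settings-first convention can be sketched in a few lines of base R. The two functions below are hypothetical stand-ins written for this illustration (they are not pecanapi functions, and the field names are only examples); they show why taking the settings list as the first argument and returning the modified list makes the steps composable.

```r
# Hypothetical stand-ins for add_*-style functions; not pecanapi's code.
add_example_dates <- function(settings, start, end) {
  modifyList(settings, list(run = list(start.date = start, end.date = end)))
}
add_example_model <- function(settings, model_id) {
  modifyList(settings, list(model = list(id = model_id)))
}

# Step by step, reassigning the result each time.
settings_a <- list()
settings_a <- add_example_dates(settings_a, "2004-01-01", "2004-12-31")
settings_a <- add_example_model(settings_a, 42)

# The same steps as nested calls, which is what a pipe chain expands to.
settings_b <- add_example_model(
  add_example_dates(list(), "2004-01-01", "2004-12-31"),
  42
)

identical(settings_a, settings_b)
str(settings_a)
```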

Let’s continue by adding a basic database configuration to this settings list.

settings <- add_database(settings)

settings

The add_database function adds a sensible default configuration for the PEcAn database in the right place with the right names in the settings list. These defaults can, of course, be modified in the function call (see ?add_database), or, better yet, by setting package options, which is where most add_* functions get their defaults (see ?pecanapi_options).

Similarly, add_rabbitmq automatically adds the RabbitMQ configuration to the settings object. Like add_database, it takes all of its defaults from options (see ?pecanapi_options).

settings <- add_rabbitmq(settings)

settings

PFTs are added to the settings object with the add_pft function. To search for PFTs, use the search_pfts function, which can take optional arguments for PFT name (name), description of its definition (definition), and model type (modeltype).

search_pfts(con, name = "deciduous", modeltype = "sipnet")
search_pfts(con, name = "tundra", modeltype = "ED")

As with search_models and search_sites, search_pfts is case-insensitive and does partial matching by default. The add_pft function adds individual PFTs by name.

settings <- add_pft(settings, "temperate.deciduous")

settings

This adds the temperate.deciduous PFT to the appropriate spot in the settings hierarchy. Whereas add_pft adds a single PFT to the settings, add_pft_list can add a vector of PFTs.

settings <- add_pft_list(settings, c("temperate.coniferous", "miscanthus"))

settings

Like add_database, add_pft and add_pft_list can also take arbitrary additional configuration arguments via their ... argument. For add_pft, such arguments are passed only to that PFT, while for add_pft_list, they are shared between all PFTs. For more details, see ?add_pft.
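
The difference between per-PFT and shared arguments can be illustrated with a toy version of the two functions. This is a sketch under assumptions, not pecanapi's implementation: sketch_add_pft, sketch_add_pft_list, the pfts/pft list layout, and the example_option argument are all hypothetical names chosen for this example.

```r
# Hypothetical sketches; not pecanapi's implementation.
# Extra arguments in ... are stored on the single PFT being added.
sketch_add_pft <- function(settings, name, ...) {
  pft <- list(name = name, ...)
  settings[["pfts"]] <- c(settings[["pfts"]], list(pft = pft))
  settings
}

# Extra arguments in ... are forwarded unchanged to every PFT in the vector.
sketch_add_pft_list <- function(settings, pfts, ...) {
  for (name in pfts) {
    settings <- sketch_add_pft(settings, name, ...)
  }
  settings
}

s <- list()
s <- sketch_add_pft(s, "temperate.deciduous", example_option = "only this PFT")
s <- sketch_add_pft_list(s, c("temperate.coniferous", "miscanthus"),
                         example_option = "shared")

# One row per PFT, showing which value each one received.
pft_options <- data.frame(
  name = unname(vapply(s$pfts, function(p) p$name, character(1))),
  example_option = unname(vapply(s$pfts, function(p) p$example_option, character(1)))
)
pft_options
```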

One final note is that, because the settings object is just a list, you can make arbitrary modifications to it via base R’s modifyList function (indeed, many of the pecanapi::add_* functions use modifyList under the hood).

customization <- list(
    meta.analysis = list(iter = 3000, random.effects = FALSE),
    run = list(
      inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))
    )
  )
settings <- modifyList(settings, customization)

Note that modifyList operates recursively on nested lists, which makes it easy to modify settings at different levels of the list hierarchy. For instance, below, we modify the previous settings object to make random.effects = TRUE, and change the download method of the inputs to OpenDAP, but keep all the other settings the same.

settings <- modifyList(settings, list(
  meta.analysis = list(random.effects = TRUE),
  run = list(inputs = list(met = list(method = "opendap")))
))
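
The recursive behavior is easy to verify in isolation. The sketch below repeats the two steps above on a standalone list (so it runs without pecanapi) and inspects the result; it also shows one caveat of modifyList worth knowing: a NULL value removes the corresponding element rather than storing NULL.

```r
base_settings <- list(
  meta.analysis = list(iter = 3000, random.effects = FALSE),
  run = list(
    inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))
  )
)

updated <- modifyList(base_settings, list(
  meta.analysis = list(random.effects = TRUE),
  run = list(inputs = list(met = list(method = "opendap")))
))

# Siblings that were not mentioned in the update are kept.
updated$meta.analysis$iter
updated$run$inputs$met$source
# Only the targeted values changed.
updated$meta.analysis$random.effects
updated$run$inputs$met$method

# A NULL value deletes the element (modifyList's default, keep.null = FALSE).
trimmed <- modifyList(updated, list(meta.analysis = list(iter = NULL)))
names(trimmed$meta.analysis)
```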

All of these steps can be chained together via magrittr pipes (%>%).

library(magrittr)
settings <- list() %>%
  add_workflow(workflow) %>%
  add_database() %>%
  add_rabbitmq() %>%
  add_pft("temperate.deciduous") %>%
  add_pft("temperate.coniferous") %>%
  modifyList(list(
    meta.analysis = list(iter = 3000, random.effects = FALSE),
    run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
    host = list(rabbitmq = list(
      uri = "amqp://guest:guest@rabbitmq:5672/%2F",
      queue = "SIPNET_136"
    ))
  ))
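
The RabbitMQ uri above packs several settings into one string: user, password, host, port, and virtual host, where %2F is the URL-encoded form of the default virtual host "/". The following base R sketch takes the string apart to make that layout visible; the regular expression is written for this one example and is not a general AMQP URI parser.

```r
uri <- "amqp://guest:guest@rabbitmq:5672/%2F"

# Capture user, password, host, port, and virtual host.
pattern <- "^amqp://([^:]+):([^@]+)@([^:/]+):([0-9]+)/(.*)$"
parts <- regmatches(uri, regexec(pattern, uri))[[1]]

rabbitmq <- list(
  user = parts[2],
  password = parts[3],
  host = parts[4],
  port = as.integer(parts[5]),
  vhost = utils::URLdecode(parts[6])  # "%2F" decodes to "/"
)
str(rabbitmq)
```

Note that the host here is rabbitmq, a name that is presumably resolved inside the Docker network by the PEcAn containers, rather than localhost.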

43.5 Submitting a run

Now that we have all the pieces, let’s put them together into a single settings object.

settings <- list() %>%
  add_workflow(workflow) %>%
  add_database() %>%
  add_pft("temperate.deciduous") %>%
  modifyList(list(
    meta.analysis = list(iter = 3000, random.effects = FALSE),
    run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
    host = list(rabbitmq = list(
      uri = "amqp://guest:guest@rabbitmq:5672/%2F",
      queue = "SIPNET_136"
    ))
  ))

We can then submit these settings as a run via the submit_workflow function. This function has only one required input – the settings list – but a number of optional arguments for specifying how to connect to the RabbitMQ API (see ?submit_workflow for details).

submit_workflow(settings)

If the workflow was submitted successfully, this will return the HTTP response routed = TRUE as a named list. Note that this only means that the RabbitMQ message was posted; the workflow can still crash for various reasons. To see the status of the workflow, look at docker-compose logs executor or use the Portainer interface.
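
In a script, it may be worth checking the response explicitly rather than reading it by eye. The sketch below assumes, based on the description above, that the response is a named list with a logical routed element; check_submission is a hypothetical helper, and the mock lists stand in for the return value of submit_workflow(settings).

```r
# Hypothetical helper; assumes `response` is a named list with a logical
# `routed` element, as described in the text.
check_submission <- function(response) {
  if (!isTRUE(response[["routed"]])) {
    stop("RabbitMQ did not route the message; the workflow was not queued.")
  }
  message("Message routed. Check the executor logs to follow the run.")
  invisible(TRUE)
}

# Mock responses standing in for submit_workflow(settings).
ok <- check_submission(list(routed = TRUE))

failed <- tryCatch(
  check_submission(list(routed = FALSE)),
  error = function(e) conditionMessage(e)
)
failed
```

As noted above, a routed message says nothing about whether the workflow itself succeeds, so this check complements the executor logs rather than replacing them.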

43.6 Processing output

All of PEcAn’s outputs as well as its database files (dbfiles) can be accessed remotely via the THREDDS data server. You can explore these files by opening http://localhost:8000/thredds/ in a browser (substituting the hostname and port accordingly).

All files, regardless of file type, can be downloaded directly (via HTTP) through the THREDDS fileServer protocol. In pecanapi, URLs for these files can be easily constructed via output_url for any workflow output and run_url for run-specific outputs. For instance, to read the workflow.Rout file from the workflow we created earlier, you can do the following:

workflow_id <- workflow[["id"]]
readLines(output_url(workflow_id, "workflow.Rout"))

Outputs in NetCDF format can also be accessed via the OpenDAP service, which allows remote variable selection and subsetting (meaning you can download only the outputs you need rather than the entire file). These URLs are created via thredds_dap_url (for a generic URL) or run_dap (to access outputs from a specific model run).

sipnet_out <- ncdf4::nc_open(run_dap(workflow_id, "2004.nc"))
gpp <- ncdf4::ncvar_get(sipnet_out, "GPP")
time <- ncdf4::ncvar_get(sipnet_out, "time")
ncdf4::nc_close(sipnet_out)
plot(time, gpp, type = "l")