43 Introduction to the PEcAn R API
43.1 Introduction
The PEcAn API package (pecanapi
) is designed to allow users to submit PEcAn workflows directly from an R session.
The basic idea is that users build the PEcAn settings object via an R script (manually, or using the included helper functions) and then use the RabbitMQ API to send this object to a Dockerized PEcAn instance running on a local or remote machine.
pecanapi
is specifically designed to only depend on CRAN packages, and not on any PEcAn internal packages.
This makes it easy to install, and allows it to be used without needing to download and install PEcAn itself (which is large and has many complex R package and system dependencies).
It can be installed directly from GitHub as follows:
## devtools::install_github("pecanproject/pecan", subdir = "api")
library(pecanapi)
## Error in library(pecanapi): there is no package called 'pecanapi'
This vignette covers the following major sections:
- Initial setup goes over the configuration, both inside and outside R, required to make
pecanapi
work. - Registering a workflow goes over how to register a PEcAn workflow with the PEcAn database, including searching for the required site and model IDs
- Building a settings object covers how to configure a PEcAn workflow using the PEcAn settings list.
- Finally, submitting a run covers how to submit the complete settings object for execution.
43.2 Initial setup
This tutorial assumes you are running a Dockerized instance of PEcAn on your local machine (hostname localhost
, port 8000).
To check this, open a browser and try to access http://localhost:8000/pecan/
.
If you are trying to access a remote instance of PEcAn, you will need to substitute the hostname and port accordingly.
To perform database operations, you will also need to have read access to the PEcAn database.
Note that the PEcAn database Docker container (postgres
) does not provide this by default, so you will need to open port 5432 (the PostgreSQL default) to that container.
You can do this by creating a docker-compose.override.yml
file with the following contents in the root directory of the PEcAn source code:
version: "3"
services:
postgres:
ports:
- 5432:5432
Here, the first port is the one used to access the database (can be any open port; most PostgreSQL applications assume 5432 by default), and the second is the port the database is actually running on (which will always be 5432).
After making this change, reload the postgres
container by running docker-compose up -d
.
To check that this works, open an R session and try to create a database connection object to the PEcAn database.
con <- DBI::dbConnect(
RPostgres::Postgres(),
user = "bety",
password = "bety",
host = "localhost",
port = 5432
)
DBI::dbListTables(con)[1:5]
This code should print out five table names from the PEcAn database. If it throws an error, you have a problem with your database connection.
The rest of this tutorial assumes that you are using this same database connection object (con
).
In addition, any API operations that modify the database will not work unless a user ID is set.
To avoid having to manually specify the ID each time, we can set it via options
:
options(pecanapi.user_id = 99000000002)
The pecanapi
package has many other options that it uses for its default configuration, including the Docker server and RabbitMQ hostname and credentials.
To learn more about them, see ?pecanapi_options
.
43.3 Registering a workflow with the database
For the PEcAn workflow to work, it needs to be registered with the PEcAn database.
In pecanapi
, this is done via the insert_new_workflow
function.
Building a workflow requires two important pieces of information: the model and site IDs.
If you know these for your site and model, you can pass them directly into insert_new_workflow
.
However, chances are you may have to look them up in the database first.
pecanapi
provides several search_*
utilities to make this easier.
First, let’s pick a model.
To list all models, we can run search_models
with no arguments (other than the database connection object, con
).
models <- search_models(con)
We can narrow down our search by model name, revision, or “type”.
search_models(con, "ED")
search_models(con, "sipnet")
search_models(con, "ED", revision = "git")
Note that the search is case-insensitive by default, and searches before and after the input string.
See ?search_models
to learn how to toggle this behavior.
For the purposes of this tutorial, let’s use the SIPNET model because it has low input requirements and runs very quickly.
Specifically, let’s use the 136
version.
We could grab the model ID from the search results, but pecanapi
also provides an additional helper function for retrieving model IDs if you know the exact name and revision.
model_id <- get_model_id(con, "SIPNET", "136")
model_id
We can repeat this process for sites with the search_sites
function (though there is currently no get_site_id
function).
Note the use of %
as a wildcard (matches zero or more of any character, equivalent to the regular expression .*
).
The two sites in the search below are largely identical, so we’ll use the one with more site information (i.e. where mat
is not NA
).
all_umbs <- search_sites(con, "umbs%disturbance")
all_umbs
site_id <- subset(all_umbs, !is.na(mat))[["id"]]
With site and model IDs in hand, we are ready to create a workflow.
workflow <- insert_new_workflow(con, site_id, model_id, start_date = "2004-01-01", end_date = "2004-12-31")
workflow
The insert_new_workflow
function inserts the workflow into the database and returns a data.frame
containing the row that was inserted.
43.4 Building a settings object
Now that we have a workflow registered, we need to configure it via the PEcAn settings list.
The PEcAn settings list is a nested list providing parameters for the various actions performed by the PEcAn workflow, including the trait meta-analysis, processing input files, and running models.
It can be created manually with a bunch of list
calls.
However, this is tedious and error-prone, so pecanapi
provides several utilities that facilitate this process.
We start with a blank list.
settings <- list()
Let’s start by adding the workflow we created in the previous section to this list.
This is done via the add_workflow
function, which takes as input a workflow data.frame
and adds the relevant fields to the right places in the settings list.
settings <- add_workflow(settings, workflow)
All add_*
functions work by incrementally adding to an input settings object and returning a new modified settings object.
The first argument of these functions is always the settings list, which gives these functions a consistent syntax and makes it easy to string multiple settings modifications together using the magrittr
pipe (%>%
), similar to tidyverse
tabular data manipulations.
Let’s continue by adding a basic database configuration to this settings list.
settings <- add_database(settings)
## Error in add_database(settings): could not find function "add_database"
settings
## list()
The add_database
function adds a sensible default configuration for the PEcAn database in the right place with the right names in the settings file.
These defaults can, of course, be modified in the function call (see ?add_database
), or, better yet, by setting package options, which is where most add_*
functions get their defaults (see ?pecanapi_options
).
Similarly, add_rabbitmq
automatically adds the RabbitMQ configuration to the settings object.
Like add_database
, it takes all of its defaults from options
(see ?pecanapi_options
).
settings <- add_rabbitmq(settings)
## Error in add_rabbitmq(settings): could not find function "add_rabbitmq"
settings
## list()
PFTs are added to the settings object with the add_pft
function.
To search for PFTs, use the search_pfts
function, which can take optional arguments for PFT name (name
), description of its definition (definition
), and model type (modeltype
).
search_pfts(con, name = "deciduous", modeltype = "sipnet")
search_pfts(con, name = "tundra", modeltype = "ED")
As with search_models
and search_sites
, these functions are case insensitive and do partial matching by default.
The add_pft
function adds individual PFTs by name.
settings <- add_pft(settings, "temperate.deciduous")
## Error in add_pft(settings, "temperate.deciduous"): could not find function "add_pft"
settings
## list()
This adds the temperate.deciduous
PFT to the appropriate spot in the settings hierarchy.
Whereas add_pft
adds a single PFT to the settings, add_pft_list
can add a vector of PFTs.
settings <- add_pft_list(settings, c("temperate.coniferous", "miscanthus"))
## Error in add_pft_list(settings, c("temperate.coniferous", "miscanthus")): could not find function "add_pft_list"
settings
## list()
Like add_database
, add_pft
and add_pft_list
can also take arbitrary additional configuration arguments via their ...
argument.
For add_pft
, such arguments are passed only to that PFT, while for add_pft_list
, they are shared between all PFTs.
For more details, see ?add_pft
.
One final note is that, because the settings object is just a list, you can make arbitrary modifications to it via base R’s modifyList
function (indeed, many of the pecanapi::add_*
functions use modifyList
under the hood).
customization <- list(
meta.analysis = list(iter = 3000, random.effects = FALSE),
run = list(
inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))
)
)
settings <- modifyList(settings, customization)
Note that modifyList
operates recursively on nested lists, which makes it easy to modify settings at different levels of the list hierarchy.
For instance, below, we modify the previous settings object to make random.effects = TRUE
, and change the download method of the inputs
to OpenDAP, but keep all the other settings the same.
settings <- modifyList(settings, list(
meta.analysis = list(random.effects = TRUE),
run = list(inputs = list(met = list(method = "opendap")))
))
All of these steps can be chained together via magrittr
pipes (%>%
).
library(magrittr)
settings <- list() %>%
add_workflow(workflow) %>%
add_database() %>%
add_rabbitmq() %>%
add_pft("temperate.deciduous") %>%
add_pft("temperate.coniferous") %>%
modifyList(list(
meta.analysis = list(iter = 3000, random.effects = FALSE),
run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
host = list(rabbitmq = list(
uri = "amqp://guest:guest@rabbitmq:5672/%2F",
queue = "SIPNET_136"
))
))
43.5 Submitting a run
Now that we have all the pieces, let’s put them together into a single settings object.
settings <- list() %>%
add_workflow(workflow) %>%
add_database() %>%
add_pft("temperate.deciduous") %>%
modifyList(list(
meta.analysis = list(iter = 3000, random.effects = FALSE),
run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
host = list(rabbitmq = list(
uri = "amqp://guest:guest@rabbitmq:5672/%2F",
queue = "SIPNET_136"
))
))
We can then submit these settings as a run via the submit_workflow
function.
This function has only one required input – the settings list – but a number of optional arguments for specifying how to connect to the RabbitMQ API (see ?submit_workflow
for details).
submit_workflow(settings)
If the workflow was submitted successfully, this will return the HTTP response routed = TRUE
as a named list.
Note that this only means that the RabbitMQ message was posted; the workflow can still crash for various reasons.
To see the status of the workflow, look at docker-compose logs executor
or use the Portainer interface.
43.6 Processing output
All of PEcAn’s outputs as well as its database files (dbfiles
) can be accessed remotely via the THREDDS data server.
You can explore these files by browsing to localhost:8000/thredds/
in a browser (substituting hostname and port, accordingly).
All files, regardless of file type, can be downloaded directly (via HTTP) through the THREDDS fileServer
protocol.
In pecanapi
, URLs for these files can be easily constructed via output_url
for any workflow output and run_url
for run-specific outputs.
For instance, to read the workflow.Rout
file from the workflow we created earlier, you can do the following:
workflow_id <- workflow[["id"]]
readLines(workflow_id, "workflow.Rout")
Outputs in NetCDF format can also be accessed via the OpenDAP service, which allows remote variable selection and subsetting (meaning you can only download the outputs you need without needing to download the entire file).
These URLs are created via the thredds_dap_url
(for a generic URL) or run_dap
(to access outputs from a specific model run).
sipnet_out <- ncdf4::nc_open(run_dap(workflow_id, "2004.nc"))
gpp <- ncdf4::ncvar_get(sipnet_out, "GPP")
time <- ncdf4::ncvar_get(sipnet_out, "time")
ncdf4::nc_close(sipnet_out)
plot(time, gpp, type = "l")