Basic HLI queries

This vignette focuses on the Coordinated Assessment Program (CAP) standardized high-level indicators (HLI) tables. See the CAP HLI webpage for details on each indicator. The tables returned in this vignette contain the same data returned via the CAP Fish HLIs Tabular Query GUI. Note that CAP includes other datasets (tables), which {rCAX} can also return; see the CAX Datasets vignette.

Getting started

See the Getting Started vignette for instructions on installing {rCAX}. Once you have installed {rCAX}, you can begin using the package by loading the library.

Load the library.

library(rCAX)

Read the terms of use:

rcax_termsofuse()

HLI queries

The main user function is rcax_hli(), which returns HLI tables from the Coordinated Assessments data eXchange (CAX) with metadata such as NMFS_PopID, as returned from the CAP Fish HLIs Tabular Query or CAX.

The basic rcax_hli() functionality is shown with the NOSA HLI but the syntax is the same for all the HLI tables:

  • NOSA: Natural Origin Spawner Abundance
  • SAR: Smolt to Adult Ratios
  • PNI: Proportionate Natural Influence of supplementation hatcheries
  • RperS: Recruits per Spawner
  • JuvOut: Juvenile Outmigrants
  • PreSmolt: Presmolt Abundance

Show the columns for the NOSA table

Get the table of column names with their definitions. Only the first 6 are shown.

head(rcax_hli("NOSA", type="colnames"))
#>                        name
#> 110               age10prop
#> 43      age10proplowerlimit
#> 24      age10propupperlimit
#> 111           age11plusprop
#> 112 age11plusproplowerlimit
#> 65  age11pluspropupperlimit
#>                                                                            definition
#> 110          The proportion of natural origin fish that were age 10 (brood year +10).
#> 43                              Lower limit of the confidence interval for Age10Prop.
#> 24                              Upper limit of the confidence interval for Age10Prop.
#> 111 The proportion of natural origin fish that were age 11 (brood year +11) or older.
#> 112                         Lower limit of the confidence interval for Age11PlusProp.
#> 65                          Upper limit of the confidence interval for Age11PlusProp.

Get records for the NOSA HLI table

Here the columns returned are restricted by cols. The table is filtered with flist to keep only the rows with nmfs_popid equal to 7. Note that the cols argument is case insensitive (NMFS_PopID and nmfs_popid are the same), but the column names in the returned tables will all be lower case.

tab <- rcax_hli("NOSA",
  flist = list(nmfs_popid = 7),
  cols=c("nmfs_popid", "spawningyear", "tsaej", "nosaej"))
head(tab)
#>   nmfs_popid spawningyear tsaej nosaej
#> 1          7         1964             
#> 2          7         1965             
#> 3          7         1966             
#> 4          7         1967             
#> 5          7         1968             
#> 6          7         1969

Return data for a single ESU. The ESU/DPS names must be exact and are case sensitive. Use rCAX:::caxesu to see the ESU/DPS names.

tab <- rcax_hli("NOSA",
  flist = list(esu_dps = "Salmon, chum (Columbia River ESU)")
)
#> Warning in rcax_table_query(tablename = tablename, flist = flist, qlist =
#> qlist, : Not all names in cols appear in the table. Removing cbfwapopname,
#> trtmethod, dataentrynotes

We can then plot:

library(ggplot2)
# Convert tsaej to a number
tab$tsaej <- as.numeric(tab$tsaej)
# plot
ggplot(
  subset(tab, spawningyear>2000), 
  aes(x=spawningyear, y=log(tsaej), color=waterbody)) +
  geom_line(na.rm = TRUE) +
  ggtitle("log(total spawners)")
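
The conversion step above can be extended to several columns at once. The following is a minimal sketch that runs offline on a mock table; `tab_demo` and its values are made up for illustration, and it assumes (as the blank entries in the earlier output suggest) that values arrive as character strings with empty strings for missing data. Setting zeros to missing before taking the log is one option, not necessarily the right choice for every analysis.

```r
# Mock table mimicking character columns with blanks (hypothetical values).
tab_demo <- data.frame(
  spawningyear = c("1999", "2000", "2001"),
  tsaej = c("", "120", "0"),
  nosaej = c("", "95", "0"),
  stringsAsFactors = FALSE
)

# Convert the chosen columns to numeric; as.numeric("") gives NA.
num_cols <- c("spawningyear", "tsaej", "nosaej")
tab_demo[num_cols] <- lapply(tab_demo[num_cols], as.numeric)

# log(0) is -Inf, so treat zeros as missing before logging;
# missing values then appear as gaps in a line plot.
tab_demo$log_tsaej <- log(ifelse(tab_demo$tsaej > 0, tab_demo$tsaej, NA))

str(tab_demo)
```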

Keep in mind that not all ESUs or DPSs are in the CAX database for each HLI, nor are all populations for each ESU or DPS. Go to https://www.streamnet.org/data/hli/ to do a search to quickly see what is available in different HLI tables.
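
Because ESU/DPS names must match exactly and are case sensitive, it can help to check a name before sending a query. The sketch below is one way to do that. `check_esu()` is a hypothetical helper, not part of {rCAX}, and `known_esu` is a small stand-in for rCAX:::caxesu so the example runs without the package.

```r
# Stand-in for rCAX:::caxesu; with the package installed you could
# pass known = rCAX:::caxesu instead.
known_esu <- c(
  "N/A",
  "Salmon, Chinook (Lower Columbia River ESU)",
  "Salmon, Chinook (Puget Sound ESU)",
  "Salmon, Chinook (Snake River fall-run ESU)",
  "Salmon, Chinook (Snake River spring/summer-run ESU)"
)

# Hypothetical helper: return the name if it matches exactly,
# otherwise stop and list names that match when case is ignored.
check_esu <- function(name, known = known_esu) {
  if (name %in% known) {
    return(name)
  }
  near <- known[grepl(tolower(name), tolower(known), fixed = TRUE)]
  hint <- if (length(near) > 0) paste(near, collapse = "; ") else "none"
  stop("Unknown ESU/DPS name: '", name, "'. Close matches: ", hint,
       call. = FALSE)
}

# Exact name passes through unchanged.
ok <- check_esu("Salmon, Chinook (Puget Sound ESU)")

# Wrong case is rejected, with the correctly cased name suggested.
msg <- tryCatch(
  check_esu("salmon, chinook (puget sound esu)"),
  error = function(e) conditionMessage(e)
)
print(msg)
```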

Filtering

Filtering means querying for rows whose columns have specific values. Filters are specified via the flist argument, a list with the columns and values you want to filter on. A query with the flist argument takes this form:

tab <- rcax_hli("NOSA",
  flist = list(...))

Single value

Filter based on one value. These examples show how you might specify the flist argument in an rcax_hli() call:

flist = list(esu_dps = "Steelhead (Middle Columbia River DPS)")
flist = list(popid = 7)

For example, you could use

tab <- rcax_hli("NOSA",
  flist = list(popid = 7))

to retrieve only data for popid 7.

Multiple values

You can also filter based on multiple values. In this case, data with popid 7, 8 or 9 are returned.

flist = list(popid = c(7,8,9))

Filter based on two columns. Here we are getting the summer run data for one ESU. The values in flist are not case sensitive, so “Summer” will return both “Summer” and “summer”.

flist = list(esu_dps = "Salmon, Chinook (Snake River spring/summer-run ESU)", run = c("Summer"))

Unfortunately, there seems to be a server-side problem with passing multiple values for one column while also filtering on another column. This works:

flist = list(run = c("Summer", "Spring"))

But this throws an error:

flist = list(esu_dps = "Salmon, Chinook (Snake River spring/summer-run ESU)", run = c("Summer", "Spring"))
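
One possible workaround for the error above is to send one request per value and stack the results. The sketch below is hedged: `query_each_value()` is a hypothetical helper, not part of {rCAX}, and it is demonstrated against a mock query function and made-up data so it runs offline. With the package loaded, the query function could instead be something like `function(fl) rcax_hli("NOSA", flist = fl)`.

```r
# Hypothetical helper: run one query per value of `column` and row-bind
# the pieces. `query_fun` takes an flist and returns a data frame.
query_each_value <- function(query_fun, flist, column, values) {
  pieces <- lapply(values, function(v) {
    one_flist <- flist
    one_flist[[column]] <- v  # a single value per request
    query_fun(one_flist)
  })
  do.call(rbind, pieces)
}

# Mock data and mock query function (illustration only).
mock <- data.frame(
  esu_dps = c("A", "A", "A", "B"),
  run = c("Summer", "Spring", "Fall", "Summer"),
  tsaej = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
mock_query <- function(flist) {
  keep <- rep(TRUE, nrow(mock))
  for (nm in names(flist)) {
    keep <- keep & mock[[nm]] %in% flist[[nm]]
  }
  mock[keep, , drop = FALSE]
}

# Two runs for one ESU, fetched as two single-value queries.
both_runs <- query_each_value(
  mock_query,
  flist = list(esu_dps = "A"),
  column = "run",
  values = c("Summer", "Spring")
)
print(both_runs)
```

This trades one request for several, so it is slower for long value lists, but each individual request stays in the form that the text above reports as working.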

Change the number of records returned

The default maximum number of records is 1000. You can increase (or decrease) this by passing in the limit query parameter using the qlist argument.

tab <- rcax_hli("NOSA",
  qlist = list(limit=1),
  cols=c("popid", "spawningyear", "tsaej"))
tab
#>   popid spawningyear tsaej
#> 1    58         2001

Increase the limit to 2000 to ensure all the data are returned. Not run.

tab <- rcax_hli("NOSA",
  flist = list(esu_dps="Salmon, Chinook (Snake River spring/summer-run ESU)"),
  qlist = list(limit=2000),
  cols=c("popid", "spawningyear", "tsaej"))
tab
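
Since results are capped at the limit, a table whose row count equals the limit may be incomplete. A small check like the one below can flag that case. `maybe_truncated()` is a hypothetical helper, not part of {rCAX}, and the data frames are mock stand-ins for tables returned by rcax_hli(); the default of 1000 is taken from the text above.

```r
# Hypothetical helper: TRUE when the number of rows reached the limit,
# which suggests more records may exist on the server.
maybe_truncated <- function(tab, limit = 1000) {
  nrow(tab) >= limit
}

# Mock tables standing in for query results.
small_tab <- data.frame(popid = 1:5)
full_tab <- data.frame(popid = seq_len(1000))

maybe_truncated(small_tab)             # fewer rows than the limit
maybe_truncated(full_tab)              # row count equals the limit
maybe_truncated(small_tab, limit = 5)  # same check with a custom limit
```

If the check returns TRUE, rerunning the query with a larger limit in qlist is the natural next step.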

Show the available tables

Only the name and id columns are shown.

tab <- rcax_datasets(cols=c("name", "id"))
head(tab)
#>                           name                                   id
#> 1             SuperPopulations 009A08FE-6479-44FC-9B6F-01C55E2C8BA3
#> 2                  XPortCA_PNI 0474CE96-789B-4E16-8FD5-05C431E5034A
#> 3                  EscData4GIS 0603946B-84FF-450D-8F3E-C2513517126D
#> 4             HatcheryReleases 080DDAA3-E315-4CF3-BD24-EFD6AD1DB6CB
#> 5 XPortCA_PresmoltAbundance_01 086448AE-4F1F-4FE1-B794-6CC4FB0C451F
#> 6              HatcheryProgram 0934C3BD-092D-4ED9-BDD7-3C5AF72C1E07

Show internal data

These are internal data sets. Access them with the rCAX::: prefix.

  • caxesu The ESU and DPS names, which appear in the esu_dps column in tables.
  • caxpops The Populations table with all the population metadata, like NMFS_PopID and MPG.
  • caxsuperpops The SuperPopulations table with the metadata describing which populations are included in each superpopulation. These are used when the data, e.g. Genetic Stock Identification, cannot separate data to the population level.

rCAX:::caxesu[1:5]
#> [1] "N/A"                                                
#> [2] "Salmon, Chinook (Lower Columbia River ESU)"         
#> [3] "Salmon, Chinook (Puget Sound ESU)"                  
#> [4] "Salmon, Chinook (Snake River fall-run ESU)"         
#> [5] "Salmon, Chinook (Snake River spring/summer-run ESU)"
colnames(rCAX:::caxpops)
#>  [1] "popid"               "recoverydomain"      "esu_dps"            
#>  [4] "majorpopgroup"       "populationname"      "esapopname"         
#>  [7] "nmfs_popid"          "nmfs_population"     "nmfs_popcode"       
#> [10] "nmfs_species"        "nmfs_common_species" "nmfs_run"           
#> [13] "popstatus"           "nmfs_type"           "listing_status"     
#> [16] "bpa_priority"        "species"             "run"                
#> [19] "fcrps_sectiontitle"  "recordnote"
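
The population metadata can be attached to an HLI table by joining on popid. The sketch below uses base R merge() on mock data frames so it runs without the package. `hli` and `pops` and their values are made up, standing in for a table from rcax_hli() and for rCAX:::caxpops; the column names popid, populationname and majorpopgroup are taken from the listing above.

```r
# Mock stand-in for an HLI table returned by rcax_hli().
hli <- data.frame(
  popid = c(7, 7, 8),
  spawningyear = c(2001, 2002, 2001),
  stringsAsFactors = FALSE
)

# Mock stand-in for rCAX:::caxpops (only a few of its columns).
pops <- data.frame(
  popid = c(7, 8, 9),
  populationname = c("Pop A", "Pop B", "Pop C"),
  majorpopgroup = c("MPG 1", "MPG 1", "MPG 2"),
  stringsAsFactors = FALSE
)

# Left join: keep every HLI row and add metadata where popid matches.
# If the key columns differ in type in real data, convert them to a
# common type before merging.
joined <- merge(
  hli,
  pops[, c("popid", "populationname", "majorpopgroup")],
  by = "popid",
  all.x = TRUE
)
print(joined)
```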