Package 'FishSET'

Title: Spatial Economics Toolbox for Fisheries
Description: The Spatial Economics Toolbox for Fisheries (FishSET) is a set of tools for organizing data; developing, improving and disseminating modeling best practices.
Authors: Lisa Pfeiffer [aut, cre], Paul G Carvalho [aut], Anna Abelman [aut], Min-Yang Lee [aut], Melanie Harsch [aut], Bryce McManus [aut], Alan Haynie [aut]
Maintainer: Lisa Pfeiffer <[email protected]>
License: MIT + file LICENSE
Version: 1.1.0
Built: 2024-11-25 17:33:08 UTC
Source: https://github.com/noaa-nwfsc/FishSET

Help Index


Add removed variables back into dataset - non-interactive version

Description

Add columns that have been removed from the primary dataset back into the primary dataset.

Usage

add_vars(working_dat, raw_dat, vars, project)

Arguments

working_dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

raw_dat

Unmodified raw version of the primary dataset. Should be a character string specifying a table from the FishSET database whose name contains the string ‘MainDataTable’ and the date the table was created.

vars

Character string, variables from raw_dat to add back into working_dat.

project

Character, name of project. Parameter is used to generate meaningful table names in FishSET database.

Details

Add variables back into the dataset that were removed. The removed variables are obtained from the raw_dat and merged into the working data based on a row identifier. The row identifier is created when a variable is removed using the select_vars function. The row identifier is used to match the raw data variables to working_dat.
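
The matching step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not FishSET's implementation: the two data frames, the row identifier column name `row_id`, and the variable names are all hypothetical.

```r
# Hypothetical raw table; `row_id` stands in for the row identifier.
raw_dat <- data.frame(
  row_id = 1:4,
  haul   = c(10, 12, 9, 15),
  port   = c("A", "B", "A", "C"),
  gear   = c("trawl", "trawl", "pot", "pot")
)

# Working table after `port` and `gear` were removed; one row was also filtered out.
working_dat <- raw_dat[raw_dat$row_id != 3, c("row_id", "haul")]

# Variables to add back from the raw table.
vars <- c("port")

# Look up each working row in the raw table by its identifier and copy `vars` over.
restored <- working_dat
idx <- match(restored$row_id, raw_dat$row_id)
restored[vars] <- raw_dat[idx, vars, drop = FALSE]
restored
```

Matching on the identifier rather than on row position keeps the restored values aligned even when rows have been filtered out of the working data.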

Examples

## Not run: 
add_vars(pcodMainDataTable, "pcodMainDataTable20200410", "pollock")

## End(Not run)

Add removed variables back into dataset

Description

Add columns that have been removed from the primary dataset back into the primary dataset.

Usage

add_vars_gui(working_dat, raw_dat, project)

Arguments

working_dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

raw_dat

Unmodified raw version of the primary dataset. Should be a character string specifying a table from the FishSET database whose name contains the string ‘MainDataTable’ and the date the table was created.

project

String, name of project.

Details

Opens an interactive table that allows users to select which variables to add back into the working dataset.
The removed variables are obtained from the raw_dat and merged into the working data based on a row identifier. The row identifier is created when the variable is removed using the select_vars function. The row identifier is used to match the raw data variables to working_dat.

Examples

## Not run: 
select_vars_gui(pcodMainDataTable)
add_vars_gui(pcodMainDataTable, 'pcodMainDataTable20100101', 'pcod')

## End(Not run)

Aggregating function

Description

Aggregating function

Usage

agg_helper(
  dataset,
  value,
  period = NULL,
  group = NULL,
  within_group = NULL,
  fun = "sum",
  count = FALSE,
  format_tab = "decimal"
)

Arguments

dataset

'MainDataTable' to aggregate.

value

String, name of variable to aggregate.

period

String, name of period variable to aggregate by. Primarily for internal use. Places temporal variables to the right-end of the summary table.

group

String, name of grouping variable(s) to aggregate by.

within_group

String, name of grouping variable(s) for calculating within group percentages. fun = "percent" and period or group are required.

fun

String, function name to aggregate by. Also accepts anonymous functions. To calculate percentage, set fun = "percent"; this will return the percent of total when within_group = NULL.

count

Logical, if TRUE then returns the number of observations by period and/or group.

format_tab

String. Options include "decimal" (default), "scientific", and "PrettyNum" (rounds to two decimal places and uses commas).
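
The difference between percent of total and within-group percentage (see fun and within_group) can be sketched in base R. This is only an illustration of the arithmetic; the data frame and column names are hypothetical, and agg_helper() may compute these values differently internally.

```r
# Hypothetical catch records.
catch <- data.frame(
  port = c("A", "A", "B", "B"),
  gear = c("trawl", "pot", "trawl", "pot"),
  mt   = c(30, 10, 45, 15)
)

# Sum catch by port and gear (analogous to fun = "sum" with two grouping variables).
totals <- aggregate(mt ~ port + gear, data = catch, FUN = sum)

# Percent of the grand total (analogous to fun = "percent", within_group = NULL).
totals$pct_total <- 100 * totals$mt / sum(totals$mt)

# Percent within each port (analogous to within_group = "port").
totals$pct_within_port <- 100 * totals$mt / ave(totals$mt, totals$port, FUN = sum)
totals
```

The percent-of-total column sums to 100 over the whole table, while the within-group column sums to 100 separately for each port.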

Examples

## Not run: 

# total catch by port
agg_helper(pollockMainDataTable, value = "OFFICIAL_TOTAL_CATCH_MT", 
           group = "PORT_CODE", fun = "sum")

# count permits
agg_helper(pollockMainDataTable, value = "PERMIT", count = TRUE, fun = NULL)

# count permits by gear type
agg_helper(pollockMainDataTable, value = "PERMIT", group = "GEAR_TYPE",
           count = TRUE, fun = NULL)

# percent of total by gear type
agg_helper(pollockMainDataTable, value = "PERMIT", group = "GEAR_TYPE",
           count = TRUE, fun = "percent")
 
# within group percentage          
agg_helper(pollockMainDataTable, value = "OFFICIAL_TOTAL_CATCH_MT", 
           fun = "percent", group = c("PORT_CODE", "GEAR_TYPE"), 
           within_group = "PORT_CODE")

## End(Not run)

Get Alternative Choice List

Description

Returns the Alternative Choice list from the FishSET database.

Usage

alt_choice_list(project, name = NULL)

Arguments

project

Name of project.

name

Name of Alternative Choice list in the FishSET database. The table name will contain the string "AltMatrix". If NULL, the default table is returned. Use tables_database to see a list of FishSET database tables by project.


Set x-axis labels to 45 degrees

Description

Set x-axis labels to 45 degrees

Usage

angled_theme()

Assign each observation in the primary dataset to a fishery management or regulatory zone

Description

Assign each observation in the primary dataset to a fishery management or regulatory zone. Function is primarily called by other functions that require zone assignment but can also be used on its own.

Usage

assignment_column(
  dat,
  project,
  spat,
  lon.dat,
  lat.dat,
  cat,
  name = "ZoneID",
  closest.pt = FALSE,
  bufferval = NULL,
  lon.spat = NULL,
  lat.spat = NULL,
  hull.polygon = FALSE,
  epsg = NULL,
  log.fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project.

spat

Spatial data containing information on fishery management or regulatory zones. sf objects are recommended, but sp objects can be used as well. If using a spatial table read from a csv file, then arguments lon.spat and lat.spat are required. To upload your spatial data to the FishSETFolder see load_spatial.

lon.dat

Longitude variable in dat.

lat.dat

Latitude variable in dat.

cat

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be the name of the list containing information on zones.

name

The name of the new assignment column. Defaults to "ZoneID".

closest.pt

Logical, if TRUE, observations that fall outside zones are classed as the closest zone polygon to the point.

bufferval

Maximum buffer distance, in meters, for assigning observations to the closest zone polygon. If the observation is not within the defined bufferval, then it will not be assigned to a zone polygon. Required if closest.pt = TRUE.

lon.spat

Variable or list from spat containing longitude data. Required for spatial tables read from csv files. Leave as NULL if spat is an sf or sp object.

lat.spat

Variable or list from spat containing latitude data. Required for spatial tables read from csv files. Leave as NULL if spat is an sf or sp object.

hull.polygon

Logical, if TRUE, creates a convex hull polygon. Use if the spatial data defining the polygons are sparse or irregular.

epsg

EPSG code. Manually set the epsg code, which will be applied to spat and dat. If epsg is not specified but is defined for spat, then the spat epsg will be applied to dat. In addition, if epsg is not specified and epsg is not defined for spat, then a default epsg value will be applied to spat and dat (epsg = 4326). See http://spatialreference.org/ to help identify optimal epsg number.

log.fun

Logical, whether to log function call (for internal use).

Details

Function uses the specified latitude and longitude from the primary dataset to assign each row of the primary dataset to a zone. Zone polygons are defined by the spatial dataset. Set hull.polygon to TRUE if spatial data is sparse or irregular. Function is called by other functions if a zone identifier does not exist in the primary dataset.
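
The precedence described for the epsg argument (user-supplied code first, then the code defined for spat, then 4326) can be written as a small helper. The function `resolve_epsg()` and its argument names are hypothetical and are not part of FishSET; the sketch only restates the documented rule.

```r
# Hypothetical helper: choose an EPSG code using the documented precedence.
resolve_epsg <- function(epsg = NULL, spat_epsg = NULL, default = 4326) {
  if (!is.null(epsg)) return(epsg)            # 1. a user-supplied code wins
  if (!is.null(spat_epsg)) return(spat_epsg)  # 2. otherwise reuse the code defined for spat
  default                                     # 3. otherwise fall back to the default
}

resolve_epsg(epsg = 3338, spat_epsg = 4269)  # user-supplied code is used
resolve_epsg(spat_epsg = 4269)               # code from spat is used
resolve_epsg()                               # default is used
```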

Value

Returns primary dataset with new assignment column.

Examples

## Not run: 
pollockMainDataTable <- 
     assignment_column(pollockMainDataTable, "pollock", spat = pollockNMFSSpatTable,
                       lon.dat = "LonLat_START_LON", lat.dat = "LonLat_START_LAT")

## End(Not run)

Creates numeric variables divided into equal sized groups

Description

Creates numeric variables divided into equal sized groups

Usage

bin_var(dat, project, var, br, name = "bin", labs = NULL, ...)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

var

Numeric variable in dat to bin into a factor.

br

Numeric vector. If a single number, the range of var is divided into br even groups. If two or more values are given, var is divided into intervals.

name

Variable name to return. Defaults to 'bin'.

labs

A character string of category labels.

...

Additional arguments passed to cut.

Details

Function adds a new factor variable, labeled by name, to the primary dataset. The numeric variable is divided into equal sized groups if the length of br is equal to one and into intervals if the length of br is greater than one.
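
Because additional arguments are passed to cut, the two behaviors of br can be illustrated with base R's cut() directly. The vector and labels below are hypothetical, and the exact call made inside bin_var() is an assumption.

```r
haul <- c(1, 4, 7, 12, 18, 20)

# A single number: the range of `haul` is split into that many equal-width intervals.
equal_groups <- cut(haul, breaks = 4)

# Two or more values: the numbers are used as interval boundaries.
# Values outside the outermost boundaries would become NA.
intervals <- cut(haul, breaks = c(0, 5, 10, 20), labels = c("low", "mid", "high"))

table(equal_groups)
table(intervals)
```

By default cut() builds intervals that are open on the left and closed on the right, so a value equal to a lower boundary falls into the interval below it.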

Value

Returns the primary dataset with binned variable added.

Examples

## Not run: 
 pollockMainDataTable <- bin_var(pollockMainDataTable, 'pollock', 'HAUL', 10, 'HAULCAT')
 pollockMainDataTable <- bin_var(pollockMainDataTable, 'pollock', 'HAUL', c(5,10), 'HAULCAT')

## End(Not run)

Compare bycatch CPUE and total catch/percent of total catch for one or more species

Description

Compare bycatch CPUE and total catch/percent of total catch for one or more species

Usage

bycatch(
  dat,
  project,
  cpue,
  catch,
  date,
  period = "year",
  names = NULL,
  group = NULL,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  value = "stc",
  combine = FALSE,
  scale = "fixed",
  output = "tab_plot",
  format_tab = "wide"
)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Name of project.

cpue

A string of CPUE variable names. The function outputs the mean CPUE by period. The variable names must match the order of variable names in catch and names.

catch

A character string of names of catch variables to aggregate. The function outputs the total catch or share of total catch by period depending on the value argument. The order of the catch variable string must match those of the cpue and names arguments.

date

A variable containing dates to aggregate by.

period

Period to aggregate by. Options include 'year', 'month', and 'weeks'.

names

An optional string of species names that will be used in the plot. If NULL, then species names from catch will be used.

group

A categorical variable in dat to group by.

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. Accepts up to two variables. Faceting by "year", "month", or "week" is available if a date variable is added to sub_date.

conv

Convert catch variable to "tons", "metric_tons", or by using a function entered as a string. Defaults to "none" for no conversion.

tran

A function to transform the y-axis. Options include log, log2, log10, sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

value

Whether to return raw catch ("raw") or share of total catch ('stc').

combine

Logical, whether to combine variables listed in group.

scale

Scale argument passed to facet_grid. Defaults to "fixed". Other options include "free_y", "free_x", and "free_xy".

output

Output type. Options include 'tab_plot' (the default, returns both), 'table', or 'plot'.

format_tab

How table output should be formatted. Options include 'wide' (the default) and 'long'.

Details

Returns a plot and/or table of the mean CPUE and share of total catch or raw count for each species entered. For optimal plot size in an R Notebook/Markdown document, we recommend including no more than four species. The order of variables in the cpue and catch arguments must be in the same order as in the names argument. The names argument is used to join the catch and cpue variables together.

Value

bycatch() compares the average CPUE and catch total/share of total catch for one or more species. The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply, either by date range or by period. date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value. Only one grouping variable will be displayed; however, any number of variables can be combined by using combine = TRUE, but no more than three is recommended. For faceting, any variable in the dataset can be used, but "year" and "month" are also available provided a date variable is added to sub_date. Generally, no more than four species should be compared, and even fewer when faceting due to limited plot space. A list containing a table and plot is printed to the console and viewer by default. For optimal plot size in an R Notebook/Markdown document, use the chunk option fig.asp = 1.
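
The three date_value formats documented for filter_date can be mimicked in base R to show which rows each one keeps. This is a sketch with a hypothetical data frame; it does not call bycatch() and is not how FishSET implements the filter.

```r
dat <- data.frame(
  date  = as.Date(c("2011-01-15", "2011-05-20", "2012-06-02",
                    "2013-08-11", "2014-05-01")),
  catch = c(5, 8, 3, 9, 4)
)

yr <- as.integer(format(dat$date, "%Y"))
mo <- as.integer(format(dat$date, "%m"))

# filter_date = "date_range", date_value = c("2011-01-01", "2011-03-15")
in_range <- dat[dat$date >= as.Date("2011-01-01") &
                dat$date <= as.Date("2011-03-15"), ]

# filter_date = "month", date_value = c(1, 3, 5)
in_months <- dat[mo %in% c(1, 3, 5), ]

# filter_date = "year-month", date_value = list(2011:2013, 5:7)
date_value <- list(2011:2013, 5:7)
in_year_month <- dat[yr %in% date_value[[1]] & mo %in% date_value[[2]], ]
```

Note that the "month" filter keeps May 2014 because it ignores the year, whereas the "year-month" filter drops it.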

Examples

## Not run: 
cpue(pollockMainDataTable, "myproject", xWeight = "f1Weight",
  xTime = "Hour", name = "f1_cpue"
)

bycatch(pollockMainDataTable, "myproject", 
        cpue = c("f1_cpue", "f2_cpue", "f3_cpue", "f4_cpue"),
        catch = c("f1", "f2", "f3", "f4"), date = "FISHING_START_DATE",
        names = c("fish_1", "fish_2", "fish_3", "fish_4"), period = "month",
        filter_date = "year", date_value = 2011, value = "stc", 
        output = "table")

## End(Not run)

Linear Model for Catch

Description

First stage regression model for catch.

Usage

catch_lm(
  dat,
  project,
  catch.formula,
  zoneID = NULL,
  exp.name = NULL,
  new.name = NULL,
  date,
  output = "matrix"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

catch.formula

A formula object specifying the linear model. See stats::lm().

zoneID

Variable in dat that identifies the individual zones or areas. Required if merging expected catch into dat using exp.name, or when creating a new expected catch matrix and exp.name is NULL (see output below).

exp.name

Name(s) of expected catch matrix to merge into dat.

new.name

Optional, string. When output = 'matrix', new.name will become the name of the new expected catch matrix saved to the FishSET DB expected catch list. When output = 'dataset', new.name will become the name of the new expected catch variable added to the primary dataset.

date

Date variable from dat used to create expected catch matrix.

output

Whether to output dat with the expected catch variable added ('dataset') or to save an expected catch matrix to the expected catch FishSET DB table ('matrix'). Defaults to output = 'matrix'.

Details

catch_lm() can merge an expected catch matrix into the primary dataset before running the linear model. This is done by passing exp.name and zoneID to merge_expected_catch() and is for convenience; users can do this separately using merge_expected_catch() if desired, but should then leave exp.name empty when running catch_lm(). Merging expected catch in a separate step is useful for creating tables and plots before running a first stage linear regression.

Value

catch_lm() has two output options: dataset and matrix. When output == 'dataset', the primary dataset will be returned with the fitted values from the model added as a new column. The new column is named using new.name.

When output == 'matrix' an expected catch matrix is created and saved to the FishSET DB expected catch list (it is not outputted to the console). There are two ways to create an expected catch matrix: by using an existing expected catch matrix in catch.formula, or by using a zone-identifier column (i.e. zoneID) in the catch.formula. For example, if you have created an expected catch matrix named 'user1' using create_expectations(), catch.formula could equal catch ~ vessel_length * user1. In this case exp.name would equal 'user1'. Alternatively, you could create an expected catch matrix by specifying catch.formula as catch ~ vessel_length * zone. In this case, exp.name = NULL and zoneID = 'zone'.
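
The output = 'dataset' behavior, where fitted values are added to the primary data under new.name, can be sketched with stats::lm(). The data below are simulated and the column names are hypothetical; this only illustrates the idea and does not reproduce catch_lm() or the expected catch matrix it can save.

```r
set.seed(1)

# Hypothetical primary data with a zone identifier.
dat <- data.frame(
  vessel_length = c(20, 25, 30, 35, 40, 45),
  zone = factor(c("A", "B", "A", "B", "A", "B"))
)
dat$catch <- 2 + 0.5 * dat$vessel_length +
  ifelse(dat$zone == "B", 3, 0) + rnorm(6, sd = 0.1)

# First-stage model; the formula plays the role of catch.formula.
fit <- lm(catch ~ vessel_length * zone, data = dat)

# Analogous to output = 'dataset': append fitted values under the name in new.name.
new.name <- "exp_catch"
dat[[new.name]] <- as.numeric(fitted(fit))
head(dat)
```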

See Also

merge_expected_catch()


Save Primary Table's Centroid Columns to FishSET Database

Description

Save the unique centroid values from the primary table to the FishSET Database. Use this function if zone ID and centroid longitude/latitude are included in the primary table.

Usage

centroid_to_fsdb(
  dat,
  spat.name = NULL,
  project,
  zoneID,
  cent.lon,
  cent.lat,
  type = "zone"
)

Arguments

dat

Required, main data frame containing data on hauls or trips. Table in FishSET database should contain the string MainDataTable.

spat.name

Optional, a name to associate with the centroid table.

project

Name of project.

zoneID

Variable in dat that identifies the individual zones or areas.

cent.lon

Required, variable in dat that identifies the centroid longitude of zones or areas.

cent.lat

Required, variable in dat that identifies the centroid latitude of zones or areas.

type

The type of centroid. Options include "zone" for zonal centroids and "fish" for fishing centroids.

Details

In certain cases, the user may have the necessary spatial variables to run a discrete choice model included in the primary table when uploaded to FishSET, and does not need a spatial table to assign observations to zones or find centroids (e.g. by using create_centroid()). However, a centroid table must be saved to the FishSET Database if a centroid option is used to define alternative choice (see create_alternative_choice()). centroid_to_fsdb() allows users to save a zonal or fishing centroid table provided they have the required variables: a zone ID (zoneID), a centroid longitude (cent.lon), and a centroid latitude (cent.lat) column.
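
Extracting unique centroid values from a primary table amounts to keeping one row per zone. A minimal base R sketch follows; the data frame and column names are hypothetical, and saving the result to the FishSET Database is not shown.

```r
# Hypothetical primary table that already carries zone IDs and centroid coordinates.
dat <- data.frame(
  haul_id  = 1:5,
  zone     = c("Z1", "Z1", "Z2", "Z3", "Z2"),
  cent_lon = c(-170.5, -170.5, -168.0, -165.2, -168.0),
  cent_lat = c(56.1, 56.1, 57.3, 58.0, 57.3)
)

# Keep one row per zone: the zone ID plus its centroid longitude and latitude.
centroid_tab <- unique(dat[, c("zone", "cent_lon", "cent_lat")])
rownames(centroid_tab) <- NULL
centroid_tab
```

If a zone appeared with more than one centroid, unique() would keep both rows, so checking for duplicated zone IDs afterwards is a reasonable safeguard.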


Change variable data class

Description

View data class for each variable and call appropriate functions to change data class as needed.

Usage

change_class(dat, project, x = NULL, new_class = NULL, save = FALSE)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Name of project.

x

A character string of variable(s) in dat that will be changed to new_class. One or more variables may be included. Default set to NULL.

new_class

A character string of data classes that x should be changed to. Length of new_class should match the length of x unless all variables in x should be the same new_class. Defaults to NULL. Options are "numeric", "factor", "date", "character". Must be in quotes.

save

Logical. Should the data table be saved in the FishSET database, replacing the working data table in the database? Defaults to FALSE.

Details

Returns a table with data class for each variable in dat and changes variable classes. To view variable classes run the function with default settings, specifying only dat and project. If variable class should be changed, run the function again, specifying the variable(s) (x) to be changed and the new_class(es) (new_class). Set save to TRUE to save modified data table.
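
The two-step workflow above (view classes, then convert) can be sketched in base R. The `converters` lookup is a hypothetical construct for this example; change_class() is described as calling "appropriate functions", and which ones it calls is not stated here.

```r
dat <- data.frame(
  HAUL = c("1", "2", "3"),
  PORT = c("Dutch Harbor", "Akutan", "Dutch Harbor"),
  stringsAsFactors = FALSE
)

# Step 1: view the class of each variable.
sapply(dat, class)

# Hypothetical lookup from a class name to a base R conversion function.
converters <- list(
  numeric   = as.numeric,
  factor    = as.factor,
  character = as.character,
  date      = as.Date
)

x <- c("HAUL", "PORT")
new_class <- c("numeric", "factor")

# Step 2: convert each variable in x to the matching entry of new_class.
for (i in seq_along(x)) {
  dat[[x[i]]] <- converters[[new_class[i]]](dat[[x[i]]])
}
sapply(dat, class)
```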

Value

Table with data class for each variable and the working data with modified data class as specified.

Examples

## Not run: 
#View table without changing class or saving
change_class(pollockMainDataTable, "myproject")

#Change class for a single variable and save data table to FishSET database
change_class(pollockMainDataTable, "myproject", x = "HAUL", new_class = 'numeric', save=TRUE)

#Change class for multiple variables and save data table to FishSET database
change_class(pollockMainDataTable, "myproject", x = c("HAUL","DISEMBARKED_PORT"),
 new_class = c('numeric', 'factor'), save=TRUE)

## End(Not run)

Check for common data quality issues affecting modeling functions

Description

Check the primary dataset for NAs, NaNs, Inf, and that each row is a unique choice occurrence

Usage

check_model_data(dat, project, uniqueID, latlon = NULL, save.file = TRUE)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

uniqueID

Variable in dat containing unique occurrence identifier.

latlon

Vector of names for variables with lat, lon coordinates to be checked if using 'lat-lon' as starting location.

save.file

Logical, if TRUE and no data issues are identified, the dataset is saved to the FishSET database. Defaults to TRUE.

Details

It is best to check the data for NAs, NaNs and Inf, and that each row is a unique choice occurrence after data creation functions have been run but before making the model design file (make_model_design). These steps should be taken even if the data passed earlier data verification checks, as data quality issues can arise in the creation or modification of data. Model functions may fail or return inaccurate results if data quality issues exist. The integrated data will not save if any of these issues are in the dataset. If data passes all tests, then data will be saved in the FishSET database with the prefix ‘final’. The data index table will also be updated and saved.
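
The checks named above (NAs, NaNs, Inf, and row uniqueness) can be approximated in base R as follows. This is a sketch with a hypothetical data frame, not the logic of check_model_data(), and it does not save anything to the FishSET database.

```r
dat <- data.frame(
  uniqueID = c(1, 2, 2, 4),
  catch    = c(10, NA, 5, Inf),
  cpue     = c(0.5, 0.7, NaN, 0.2)
)

# Restrict the value checks to numeric columns.
num <- dat[sapply(dat, is.numeric)]

# Columns containing NaN, NA (excluding NaN), or infinite values.
has_nan <- names(num)[sapply(num, function(v) any(is.nan(v)))]
has_na  <- names(num)[sapply(num, function(v) any(is.na(v) & !is.nan(v)))]
has_inf <- names(num)[sapply(num, function(v) any(is.infinite(v)))]

# Each row should be a unique choice occurrence.
ids_unique <- !anyDuplicated(dat$uniqueID)

# The data would only be considered ready for modeling if nothing is flagged.
issues_found <- length(c(has_nan, has_na, has_inf)) > 0 || !ids_unique
issues_found
```

is.na() is TRUE for NaN as well, which is why the NA check excludes NaN explicitly to report the two separately.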

Value

Returns statements of data quality issues in the data. Saves table to FishSET database.

Examples

## Not run: 
check_model_data(MainDataTable, uniqueID = "uniqueID_Code", save.file = TRUE)

## End(Not run)

Check and correct spatial data format

Description

Converts spatial data to an sf object.

Usage

check_spatdat(spatdat, lon = NULL, lat = NULL, id = NULL)

Arguments

spatdat

Spatial data containing information on fishery management or regulatory zones.

lon

Longitude variable in spatdat. This is required for csv files or if spatdat is a dataframe (i.e. is not an sf or sp object).

lat

Latitude variable in spatdat. This is required for csv files or if spatdat is a dataframe (i.e. is not an sf or sp object).

id

Polygon ID column. This is required for csv files or if spatdat is a dataframe (i.e. is not an sf or sp object).

Details

This function checks whether spatdat is an sf object and attempts to convert it if not. It also applies clean_spat, which fixes certain spatial issues: it repairs invalid or empty polygons, checks whether a projected CRS is used (converting to WGS84 if detected), and checks whether longitude should be shifted to Pacific view (0-360 format) to avoid splitting the Alaska region during plotting.
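
The shift to Pacific view (0-360 format) is simple arithmetic on longitude. The sketch below applies it to a plain numeric vector with hypothetical values; how clean_spat applies the shift to sf geometries is not shown.

```r
# Longitudes in the -180 to 180 convention on both sides of the antimeridian.
lon <- c(-179.5, -170.0, 179.0, 175.5)

# Shift negative longitudes into the 0-360 convention.
lon_360 <- ifelse(lon < 0, lon + 360, lon)
lon_360

range(lon)      # spans nearly the whole globe
range(lon_360)  # a contiguous block around 180
```

After the shift, points on either side of the antimeridian sit next to each other numerically, which is what keeps the region from being split across the two edges of a plot.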


Retrieve closure scenario names

Description

A helper function used to display the names of currently saved closure scenarios.

Usage

close_names(project)

Arguments

project

Name of project.

Details

To retrieve the complete closure scenario file, use get_closure_scenario.


Combine zone and closure area

Description

Creates a new spatial dataset that merges regulatory zones with closure areas.

Usage

combine_zone(spat, closure, grid.nm, closure.nm, recast = TRUE)

Arguments

spat

Spatial file containing regulatory zones.

closure

Closure file containing closure areas.

grid.nm

Character, column name containing grid ID.

closure.nm

Character, column name containing closure ID.

recast

Logical, if TRUE combined is passed to recast_multipoly.

Details

To combine zones with closure areas, this function performs the following steps:

  1. Create the union of the closure area

  2. Take the difference between the closure union and the zone file

  3. Take the intersection of zone and the closure union

  4. Combine the difference and intersection objects into one spatial dataframe

  5. Assign new zone IDs to intersecting polygons

The result is a single spatial dataset containing all polygons from both spat and closure with overlapping (intersecting) polygons receiving new IDs (see new_zone_id). This allows users to partially close regulatory zones during the model design stage.

See Also

recast_multipoly new_zone_id


Confidentiality cache exists

Description

Returns TRUE if confidentiality cache file is found in the project output folder.

Usage

confid_cache_exists(project)

Arguments

project

Name of project.

Examples

## Not run: 
confid_cache_exists("pollock")

## End(Not run)

View correlation coefficients between numeric variables

Description

Correlation coefficients can be displayed between all numeric variables or selected numeric variables. Defaults to the Pearson correlation coefficient. To change the method, specify 'method' as 'kendall' or 'spearman'. Both a plot and table output are generated and saved to the 'output' folder.

Usage

corr_out(
  dat,
  project,
  variables = "all",
  method = "pearson",
  show_coef = FALSE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, project name.

variables

A character string of variables to include. Defaults to "all" numeric variables.

method

A character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".

show_coef

Logical, whether to include the correlation coefficients on the correlation plot. Only coefficients with a p-value of less than or equal to 0.05 are shown.

Details

Returns correlation coefficients (Pearson's by default) between numeric variables in plot and table format. Output is saved to the output folder.
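
The table part of this output can be approximated with stats::cor(), and the p-value threshold mentioned for show_coef with stats::cor.test(). The data frame is hypothetical, and whether corr_out() uses these exact functions is an assumption.

```r
dat <- data.frame(
  catch    = c(10, 14, 9, 20, 25, 18),
  duration = c(2, 3, 2, 5, 6, 4),
  depth    = c(80, 60, 95, 55, 70, 50),
  port     = c("A", "B", "A", "C", "B", "C")
)

# Keep numeric variables only (analogous to variables = "all").
num <- dat[sapply(dat, is.numeric)]

# Correlation matrix; method may be "pearson", "kendall", or "spearman".
cor_mat <- cor(num, method = "pearson")
round(cor_mat, 2)

# p-value for one pair, as a basis for showing only coefficients with p <= 0.05.
p_cd <- cor.test(num$catch, num$duration, method = "pearson")$p.value
p_cd <= 0.05
```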

Examples

## Not run: 
corr_out(pollockMainDataTable, 'pollock', 'all')

## End(Not run)

Create catch or revenue per unit effort variable

Description

Add catch per unit effort (CPUE) or revenue per unit effort variable to the primary dataset. Catch should be a weight variable but can be a count. Effort should be in a duration of time, such as days, hours, or minutes.

Usage

cpue(dat, project, xWeight = NULL, xTime, price = NULL, name = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

xWeight

Catch variable in dat. Variable should be a measure of weight (pounds, metric tons, etc) but can also be count. If calculating revenue per unit effort (RPUE) and a revenue column exists in dat, then add the revenue column to price and set xWeight = NULL.

xTime

Duration of time variable in dat representing effort, such as weeks, days, hours, or minutes.

price

Optional, variable from dat containing price/value data. Price is multiplied against the catch variable, xWeight, to generate revenue. If revenue exists in dat and you wish to use this revenue instead of price, then xWeight must be NULL. Defaults to NULL.

name

String, name of created variable. Defaults to "cpue" or "rpue" if price is not NULL.

Details

Creates the catch or revenue per unit effort variable. Catch variable should be in weight (lbs, mts). Effort variable should be a measurement of duration in time. New variable is added to the primary dataset with the column name defined by the name argument. CPUE for individual species should be calculated separately.
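
The arithmetic behind both variables is a simple ratio. The sketch below uses a hypothetical data frame and column names; it illustrates the calculation described above rather than the cpue() function itself.

```r
dat <- data.frame(
  catch_mt     = c(12, 30, 8),
  duration_min = c(120, 240, 60),
  price_per_mt = c(500, 450, 520)
)

# Catch per unit effort: catch divided by effort duration.
dat$cpue <- dat$catch_mt / dat$duration_min

# Revenue per unit effort: price times catch gives revenue, then divide by effort.
dat$rpue <- (dat$price_per_mt * dat$catch_mt) / dat$duration_min
dat
```

The units of the result follow from the inputs (here metric tons per minute and currency per minute), so effort should be recorded in the same time unit for every row.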

Value

Returns primary dataset with CPUE variable added.

Examples

## Not run: 
pollockMainDataTable <- cpue(pollockMainDataTable, 'pollock', 
                             xWeight = 'OFFICIAL_TOTAL_CATCH_MT', 
                             xTime = 'DURATION_IN_MIN', name = 'cpue')

## End(Not run)

Define alternative fishing choice

Description

Required step. Creates a list identifying how alternative fishing choices should be defined. Output is saved to the FishSET database. Run this function before running models. dat must have a zone assignment column (see assignment_column()). In certain cases a centroid table must be saved to the FishSET Database, see occasion_var for details.

Usage

create_alternative_choice(
  dat,
  project,
  occasion = "zonal centroid",
  occasion_var = NULL,
  alt_var = "zonal centroid",
  dist.unit = "miles",
  min.haul = 0,
  zoneID,
  zone.cent.name = NULL,
  fish.cent.name = NULL,
  spatname = NULL,
  spatID = NULL,
  outsample = FALSE
)

Arguments

dat

Required, main data frame containing data on hauls or trips. Table in FishSET database should contain the string MainDataTable.

project

Required, name of project.

occasion

String, determines the starting point when calculating the distance matrix. Options are "zonal centroid", "fishing centroid", "port", or "lon-lat". See occasion_var for requirements.

occasion_var

Identifies an ID column or set of lon-lat variables needed to create the distance matrix. Possible options depend on the value of occasion:

Centroid

When occasion is "zonal centroid" or "fishing centroid", the possible options are NULL, the name of a zone ID variable, or a set of coordinate variables (in lon-lat order).

NULL

This will merge centroid lon-lat data to the primary table using the column entered in zoneID. A centroid table must be saved to the FishSET Database.

Zone ID

This option specifies the zone ID variable to merge the centroid table to. For example, a column containing the previous zonal area. A centroid table must be saved to the FishSET Database.

Lon-Lat

A string vector of length two containing the longitude and latitude of an existing set of centroid variables in dat.

Port

When occasion = port the possible options include the name of a port ID variable or a set of lon-lat variables describing the location of the port. A value of NULL will return an error.

Port ID

The name of a port ID variable in dat that will be used to join the port table to the primary table. A port table is required (see load_port()) which contains the port name and the longitude and latitude of each port.

Lon-Lat

A string vector of length two containing a port's longitude and latitude in dat.

Lon-Lat

When occasion = lon-lat, occasion_var must contain a string vector of length two containing the longitude and latitude of a vessel's location in dat. For example, the current or previous haul location.

alt_var

Determines the alternative choices used to calculate the distance matrix. alt_var may be the centroid of zonal assignment ("zonal centroid"), "fishing centroid", or the closest point in fishing zone ("nearest point"). The centroid options require that the appropriate centroid table has been saved to the project's FishSET Database. See create_centroid() to create and save centroids. List existing centroid tables by running list_tables("project", type = "centroid").

dist.unit

String, how distance measure should be returned. Choices are "meters" or "m", "kilometers" or "km", "miles", or "nmiles" (nautical miles). Defaults to "miles".

min.haul

Required, numeric, minimum number of hauls. Zones with fewer hauls than the min.haul value will not be included in model data.

zoneID

Variable in dat that identifies the individual zones or areas.

zone.cent.name

The name of the zonal centroid table to use when occasion or alt_var is set to "zonal centroid". Use list_tables("project", type = "centroid") to view existing centroid tables. See create_centroid() to create centroid tables or centroid_to_fsdb() to create a centroid table from columns found in dat.

fish.cent.name

The name of the fishing centroid table to use when occasion or alt_var is set to "fishing centroid". Use list_tables("project", type = "centroid") to view existing centroid tables. See create_centroid() to create centroid tables or centroid_to_fsdb() to create a centroid table from columns found in dat.

spatname

Required when alt_var = 'nearest point'. spat is a spatial data file containing information on fishery management or regulatory zone boundaries. sf objects are recommended, but sp objects can be used as well. See dat_to_sf() to convert a spatial table read from a csv file to an sf object. To upload your spatial data to the FishSETFolder see load_spatial(). If spat should come from the FishSET database, it should be the name of the original file, in quotes. For example, "pollockNMFSZonesSpatTable". Use tables_database() or list_tables("project", type = "spat") to view the names of spatial tables in the FishSET database.

spatID

Required when alt_var = 'nearest point'. Variable in spat that identifies the individual zones or areas.

outsample

Logical, indicating whether this is for main data or out-of-sample data.

Details

Defines the alternative fishing choices. These choices are used to develop the matrix of distances between observed and alternative fishing choices (where they could have fished but did not). The distance matrix is calculated by the make_model_design() function. occasion defines the observed fishing location and alt_var the alternative fishing location. occasion_var identifies an ID column or set of lon-lat variables needed to create the distance matrix.

Parts of the alternative choice list are pulled by the create_expectations(), make_model_design(), and model run (discretefish_subroutine()) functions. These outputs include choices of which variable to use for catch and which zones to include in analyses based on a minimum number of hauls per trip within a zone. Note that if the alternative choice list is modified, the create_expectations() and make_model_design() functions should also be updated before rerunning models.

Value

Saves the alternative choice list to the FishSET database as a list. Output includes:

dataZoneTrue: Vector of 0/1 indicating whether the data from that zone is to be included in the model
greaterNZ: Zones which pass the numOfNecessary test
numOfNecessary: Minimum number of hauls for zone to be included
altChoiceUnits: Set to miles
altChoiceType: Set to distance
occasion: Identifies how to find latitude and longitude for starting point
occasion_var: Identifies how to find latitude and longitude for starting point
alt_var: Identifies how to find latitude and longitude for alternative choice
zoneRow: Zones and choices array
zone_cent: Geographic centroid for each zone. Generated from find_centroid()
fish_cent: Fishing centroid for each zone. Generated from find_fishing_centroid()
zone_cent_name: Name of the zonal centroid table
fish_cent_name: Name of the fishing centroid table
spat: Spatial data object
spatID: Variable in spat that identifies individual zones
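
The zone-inclusion rule behind dataZoneTrue and numOfNecessary can be sketched in base R. This is a minimal illustration under stated assumptions, not FishSET's implementation: the data frame hauls, its zone column, and the object names below are hypothetical.

```r
# Hypothetical haul-level data: one row per haul with an assigned zone.
hauls <- data.frame(zone = c("A", "A", "A", "B", "C", "C"),
                    stringsAsFactors = FALSE)

min_haul <- 2  # plays the role of min.haul

# Count hauls per zone and keep zones that meet the minimum.
haul_counts <- table(hauls$zone)
kept_zones <- names(haul_counts)[as.vector(haul_counts) >= min_haul]

# 0/1 flag per row, analogous in spirit to dataZoneTrue.
in_model <- as.integer(hauls$zone %in% kept_zones)
```

Here zone "B" has a single haul, so its rows are flagged 0 and would be left out of the model data.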

Create Centroid Table

Description

Create a zonal or fishing centroid table. The centroid can be joined with the primary data if output = "dataset". The centroid table is automatically saved to the FishSET Database.

Usage

create_centroid(
  spat = NULL,
  dat = NULL,
  project,
  spatID = NULL,
  zoneID = NULL,
  lon.dat = NULL,
  lat.dat = NULL,
  weight.var = NULL,
  type = "zonal centroid",
  names = NULL,
  cent.name = NULL,
  output = "dataset"
)

Arguments

spat

Spatial data containing information on fishery management or regulatory zones. Required for type = "zonal centroid", not required for type = "fishing centroid". spat will be included in centroid table name.

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'. dat is not required if type = "zonal centroid" and output = "centroid table".

project

Name of project.

spatID

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, spatID should be name of list containing information on zones. Ignored if type = "fishing centroid".

zoneID

Variable in dat that identifies zonal assignments. zoneID is not required if type = "zonal centroid" and output = "centroid table".

lon.dat

Longitude variable in dat. Required for type = "fishing centroid".

lat.dat

Latitude variable in dat. Required for type = "fishing centroid".

weight.var

Variable from dat for weighted average (for type = "fishing centroid" only). If weight.var is defined, the centroid is defined by the latitude and longitude of fishing locations in each zone weighted by weight.var.

type

The type of centroid to create. Options include "zonal centroid" and "fishing centroid". See other arguments for type requirements.

names

Character vector of length two containing the names of the fishing centroid columns. The order should be c("lon_name", "lat_name"). The default names are c("weight_cent_lon", "weight_cent_lat") for weighted fishing centroid and c("fish_cent_lon", "fish_cent_lat") for unweighted fishing centroid.

cent.name

A string to include in the centroid table name. Table names take the form of "projectNameZoneCentroid" for zonal centroids and "projectNameFishCentroid" for fishing centroids.

output

Options are "centroid table", "dataset", or "both". "centroid table" returns a table containing the zone name and the longitude and latitude of the centroid. "dataset" returns the primary table joined with the centroid table. "both" returns a list containing the merged primary table and the centroid table.
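
The difference between an unweighted and a weighted fishing centroid can be sketched in base R. The sketch assumes a plain (weighted) average of coordinates within each zone, which may differ from what create_centroid() does internally; the function fishing_centroid and all column names are hypothetical.

```r
# Hypothetical haul data with zone assignment, coordinates, and catch.
hauls <- data.frame(
  zone  = c("A", "A", "B", "B"),
  lon   = c(-170, -168, -165, -163),
  lat   = c(55, 57, 58, 60),
  catch = c(1, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Mean lon and lat within each zone, optionally weighted.
# Assumption: simple averaging of coordinates, which ignores
# Earth curvature and the antimeridian.
fishing_centroid <- function(d, weight = NULL) {
  w <- if (is.null(weight)) rep(1, nrow(d)) else d[[weight]]
  zones <- sort(unique(d$zone))
  out <- lapply(zones, function(z) {
    i <- d$zone == z
    data.frame(
      zone     = z,
      cent_lon = weighted.mean(d$lon[i], w[i]),
      cent_lat = weighted.mean(d$lat[i], w[i]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)
}

unweighted <- fishing_centroid(hauls)
weighted   <- fishing_centroid(hauls, weight = "catch")
```

In zone "A" the weighted centroid moves toward the haul with the larger catch, while zone "B" is unchanged because its weights are equal.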


Interactive application to create distance between points variable

Description

Adds a variable for distance between two points to the primary dataset. There are two versions of this function, which differ in how additional arguments specific to the start and end locations are supplied. This version requires only five arguments to be specified before running; additional arguments needed to identify the lat/lon of the start or end points are supplied through prompts, so it is designed for an interactive session. The create_dist_between_for_gui function requires all necessary arguments to be specified before running and is best used in a non-interactive session. Both versions require that the start and end points be different vectors. If the start or end points are from a port, then portTable must be specified to obtain lat/lons. If the start or end points are the center of a fishing zone or area, then spat, lon.dat, lat.dat, cat, lon.spat, and lat.spat must be specified to obtain latitude and longitude.

Usage

create_dist_between(
  dat,
  project,
  start,
  end,
  units = c("miles", "meters", "km", "midpoint"),
  zoneid = NULL,
  name = "distBetween"
)

Arguments

dat

Primary data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

start, end

Starting and ending location. Should be a port, lat/lon location, or the fishery management zone/area centroid. If port is desired, start should be the column name in dat containing the port. Latitude and longitude for the port are extracted from the port table. If a lat/lon location is desired, then start should be a character string of column names from dat. The order must be lon, lat. If the fishery management centroid is used, then set start="centroid" or end="centroid". find_centroid and assignment_column will be called to identify the latitude and longitude if the centroid table does not exist in the FishSET database.

units

Unit of measurement for calculated distance between start and ending points. Can be in "miles", "meters", "kilometers", or "midpoint" location.

zoneid

Variable in dat that identifies the individual zones or areas. Define if it exists in dat and is not named 'ZoneID'.

name

String, output variable name. Defaults to 'distBetween'.

Details

Additional arguments.
Further arguments are required to identify the latitude and longitude of the starting or ending location if start or end is defined as zonal centroid or a column from primary dataset containing port information, such as departing or embarking port. Prompts will appear asking for required arguments.

Port arguments required:

portTable: Port table from FishSET database. Required if start or end is a port vector.



Centroids arguments required:

spat: Spatial data set containing information on fishery management or regulatory zones. Can be shape file, json, geojson, data frame, or list. Required if start or end is centroid.
lon.dat: Longitude variable from dat.
lat.dat: Latitude variable from dat.
lon.spat: Variable or list from spat containing longitude data. Required if start or end is centroid. Leave as NULL if spat is a shape or json file.
lat.spat: Variable or list from spat containing latitude data. Required if start or end is centroid. Leave as NULL if spat is a shape or json file.
cat: Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be the name of list containing information on zones.

Value

Returns primary data set with distance between variable.

Examples

## Not run: 
pollockMainDataTable <- create_dist_between(pollockMainDataTable, 'pollock', 'centroid',
  'EMBARKED_PORT', units = 'miles', name = 'DistCentPort')

pollockMainDataTable <- create_dist_between(pollockMainDataTable, 'pollock', c('LonLat_START_LON',
  'LonLat_START_LAT'), c('LonLat_END_LON','LonLat_END_LAT'), units = 'midpoint', name = 'DistLocLock')

pollockMainDataTable <- create_dist_between(pollockMainDataTable, 'pollock', 'DISEMBARKED_PORT',
  'EMBARKED_PORT', units = 'meters', name = 'DistPortPort')

## End(Not run)
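
For intuition about what a distance-between-points variable contains, the following base R sketch computes a great-circle distance with the haversine formula and converts it to the requested unit. It is an illustration only: the haversine formula on a sphere is swapped in here for simplicity and is not claimed to be the method create_dist_between uses, and haversine_dist is a hypothetical helper.

```r
# Great-circle distance between (lon1, lat1) and (lon2, lat2), given in
# decimal degrees, using the haversine formula on a sphere.
haversine_dist <- function(lon1, lat1, lon2, lat2,
                           units = c("meters", "km", "miles")) {
  units <- match.arg(units)
  r <- 6371008.8          # mean Earth radius in meters (spherical assumption)
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  meters <- 2 * r * asin(pmin(1, sqrt(h)))
  switch(units,
    meters = meters,
    km     = meters / 1000,
    miles  = meters / 1609.344
  )
}

# One degree of longitude along the equator.
one_degree_m  <- haversine_dist(0, 0, 1, 0, units = "meters")
one_degree_mi <- haversine_dist(0, 0, 1, 0, units = "miles")
```

The function is vectorized, so it can be applied to whole columns of start and end coordinates at once.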

Create distance between points variable - non-interactive version

Description

Adds distance between two points to the primary data set. There are two versions of this function, which differ in how additional arguments specific to the start and end locations are supplied. This version requires all necessary arguments to be specified before running and is best used in a non-interactive session. The create_dist_between version requires only five arguments to be specified before running; additional arguments needed to identify the lat/lon of the start or end points are supplied through prompts, so that version is designed for an interactive session. Both versions require that the start and end points be different vectors. If the start or end points are from a port, then portTable must be specified to obtain lat/lons. If the start or end points are the center of a fishing zone or area, then spat, lon.dat, lat.dat, cat, lon.spat, and lat.spat must be specified to obtain latitude and longitude.

Usage

create_dist_between_for_gui(
  dat,
  project,
  start = c("lat", "lon"),
  end = c("lat", "lon"),
  units,
  name = "DistBetwen",
  portTable = NULL,
  zoneid,
  spat = NULL,
  lon.dat = NULL,
  lat.dat = NULL,
  cat = NULL,
  lon.spat = NULL,
  lat.spat = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name

start, end

Starting and ending location. Each should be a port, lat/lon location, or centroid of regulatory zone/area.

port: start should be the column name in dat containing the port names. Latitude and longitude for the port are extracted from the port table.
lat/lon: start should be a character string of column names from dat. The order must be lat then lon, for example start = c('lat', 'lon').

units

Unit of distance. Choices are "miles", "kilometers", or "midpoint".

name

String, name of new variable. Defaults to 'DistBetween'.

portTable

Data table containing port data. Required if start or end are a vector from the dat containing port names.

zoneid

Variable in dat that identifies the individual zones or areas. Required if zone identifier variable exists and is not 'ZoneID'. Defaults to NULL.

spat

Spatial data containing information on fishery management or regulatory zones. Shape, json, geojson, and csv formats are supported. Required if start or end are "centroid" and a centroid table doesn't exist in the FishSET database.

lon.dat

Longitude variable from dat. Required if start or end are ‘centroid’.

lat.dat

Latitude variable from dat. Required if start or end are ‘centroid’.

cat

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be name of list containing information on zones. Required if start or end are "centroid".

lon.spat

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file. Required if start or end are "centroid".

lat.spat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file. Required if start or end are "centroid".

Value

Primary data set with distance between points variable added.


Create duration of time variable

Description

Create duration of time variable based on start and ending dates in desired temporal units.

Usage

create_duration(
  dat,
  project,
  start,
  end,
  units = c("week", "day", "hour", "minute"),
  name = "create_duration"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

start

Date variable from dat indicating start of time period.

end

Date variable from dat indicating end of time period.

units

String, unit of time for calculating duration. Must be "week", "day", "hour", or "minute".

name

String, name of created vector. Defaults to name of the function if not defined.

Details

Calculates the duration of time between two temporal variables based on defined time unit. The new variable is added to the dataset. A duration of time variable is required for other functions, such as cpue.

Value

Returns primary dataset with duration of time variable added.

Examples

## Not run: 
pollockMainDataTable <- create_duration(pollockMainDataTable, 'pollock', 'TRIP_START', 'TRIP_END',
  units = 'minute', name = 'TripDur')

## End(Not run)
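
The underlying calculation can be sketched with base R's difftime(). This is a minimal illustration, not the function's source: duration_in is a hypothetical helper, and the mapping from the units values above to difftime() units is an assumption.

```r
# Hypothetical trip start and end times.
trips <- data.frame(
  TRIP_START = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-02 06:00:00"),
                          tz = "UTC"),
  TRIP_END   = as.POSIXct(c("2020-01-01 01:30:00", "2020-01-04 06:00:00"),
                          tz = "UTC")
)

# Duration between two date-time vectors in the requested unit.
duration_in <- function(start, end,
                        units = c("week", "day", "hour", "minute")) {
  units <- match.arg(units)
  # Map the unit names used above to the names difftime() expects.
  dt_units <- c(week = "weeks", day = "days", hour = "hours", minute = "mins")
  as.numeric(difftime(end, start, units = dt_units[[units]]))
}

trips$TripDur     <- duration_in(trips$TRIP_START, trips$TRIP_END, "minute")
trips$TripDurDays <- duration_in(trips$TRIP_START, trips$TRIP_END, "day")
```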

Create expected catch/expected revenue matrix

Description

Create expected catch or expected revenue matrix. The matrix is required for the logit_c model. Multiple user-defined matrices can be saved by setting replace.output = FALSE and re-running the function.

Usage

create_expectations(
  dat,
  project,
  catch,
  price = NULL,
  defineGroup = NULL,
  temp.var = NULL,
  temporal = "daily",
  calc.method = "standardAverage",
  lag.method = "simple",
  empty.catch = NULL,
  empty.expectation = 1e-04,
  temp.window = 7,
  temp.lag = 0,
  year.lag = 0,
  dummy.exp = FALSE,
  default.exp = FALSE,
  replace.output = TRUE,
  weight_avg = FALSE,
  outsample = FALSE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

catch

Variable from dat containing catch data.

price

Optional, variable from dat containing price/value data. Price is multiplied against catch to generate revenue. If revenue exists in dat and you wish to use this revenue instead of price, then catch must be a vector of 1s of length equal to dat. Defaults to NULL.

defineGroup

Optional, variable from dat that defines how to split the fleet. Defaults to treating entire dataframe dat as a fleet.

temp.var

Optional, temporal variable from dat. Set to NULL if temporal patterns in catch should not be considered.

temporal

String, choices are "daily" or "sequential". Should time, if temp.var is defined, be included as a daily timeline or sequential order of recorded dates. For daily, catch on dates with no record are filled with NA. The choice affects how the rolling average is calculated. If temporal is daily then the window size for average and the temporal lag are in days. If sequential, then averaging will occur over the specified number of observations, regardless of how many days they represent.

calc.method

String, how catch values are averaged over window size. Select standard average ("standardAverage"), simple lag regression of means ("simpleLag"), or weights of regressed groups ("weights").

lag.method

String, use regression over entire group ("simple") or for grouped time periods ("grouped").

empty.catch

String, replace empty catch with NA, 0, mean of all catch ("allCatch"), or mean of grouped catch ("groupCatch").

empty.expectation

Numeric, how to treat empty expectation values. Choices are to not replace (NULL) or replace with 0.0001 or 0.

temp.window

Numeric, temporal window size. If temp.var is not NULL, set the window size to average catch over. Defaults to 7 (7 days if temporal is "daily").

temp.lag

Numeric, temporal lag time. If temp.var is not NULL, how far back to lag temp.window.

year.lag

If expected catch should be based on catch from previous year(s), set year.lag to the number of years to go back.

dummy.exp

Logical, should a dummy variable be created? If TRUE, output dummy variable for originally missing value. If FALSE, no dummy variable is outputted. Defaults to FALSE.

default.exp

Whether to run default expectations. Defaults to FALSE. Alternatively, a character string containing the names of default expectations to run can be entered. Options include "recent", "older", "oldest", and "logbook". The logbook expectation is only run if defineGroup is used. "recent" will not include defineGroup. Setting default.exp = TRUE will include all four options. See Details for how default expectations are defined.

replace.output

Logical, replace existing saved expected catch data frame with new expected catch data frame? If FALSE, new expected catch data frames are appended to previously saved expected catch data frames. Defaults to TRUE.

weight_avg

Logical, if TRUE then all observations for a given zone on a given date will be included when calculating the mean, thus giving more weight to days with more observations in a given zone. If FALSE, then the daily mean for a zone will be calculated prior to calculating the mean across the time window.

outsample

Logical, if TRUE then generate expected catch matrix for out-of-sample data. If FALSE, generate for main data table. Defaults to FALSE.
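
The two averaging schemes described for weight_avg can give different answers when days have unequal numbers of observations. The base R sketch below shows this for a single zone; the data and object names are hypothetical, and the sketch is not taken from the package source.

```r
# Hypothetical observations for one zone: three hauls on the first day,
# one haul on the second day.
obs <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02")),
  catch = c(10, 20, 30, 100)
)

# Analogous to weight_avg = TRUE: pool every observation in the window,
# so days with more observations carry more weight.
pooled_mean <- mean(obs$catch)

# Analogous to weight_avg = FALSE: average within each day first,
# then average the daily means across the window.
daily_means   <- tapply(obs$catch, obs$date, mean)
mean_of_daily <- mean(daily_means)
```

Pooling gives 40 while the mean of daily means gives 60, because the single large haul on the second day counts as a full day in the second scheme.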

Details

Function creates an expectation of catch or revenue for alternative fishing zones (zones where they could have fished but did not). The output is saved to the FishSET database and called by the make_model_design function. create_alternative_choice must be called first as observed catch and zone inclusion requirements are defined there.

The primary choices are whether to treat data as a fleet or to group the data (defineGroup) and the time frame of catch data for calculating expected catch. Catch is averaged along a daily or sequential timeline (temporal) using a rolling average. temp.window and temp.lag determine the window size and temporal lag of the window for averaging. Use temp_obs_table before using this function to assess the availability of data for the desired temporal moving window size. Sparse data is not suited for shorter moving window sizes. For very sparse data, consider setting temp.var to NULL and excluding temporal patterns in catch.

Empty catch values are considered to be times of no fishing activity. Values of 0 in the catch variable are considered times when fishing activity occurred but with no catch. These points are included in the averaging and dummy creation as points in time when fishing occurred.

Four default expected catch cases will be run:

  • recent: Moving window size of two days. In this case, there is no grouping, and catch for entire fleet is used.

  • older: Moving window size of seven days and lag of two days. In this case, vessels are grouped (or not) based on defineGroup argument.

  • oldest: Moving window of seven days and lag of eight days. In this case, vessels are grouped (or not) based on defineGroup argument.

  • logbook: Moving window size of 14 days and lag of one year, seven days. Only used if fleet is defined in defineGroup.
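
How a moving window and a lag interact can be sketched in base R for a single zone on a daily timeline. This is an illustration under assumptions, not the package's algorithm: the window is taken to end lag days before each date and to span window days, and rolling_expected_catch is a hypothetical helper.

```r
# Rolling mean of catch over a window of `window` days that ends
# `lag` days before each date (assumed window alignment).
rolling_expected_catch <- function(dates, catch, window = 7, lag = 0) {
  dates <- as.Date(dates)
  vapply(seq_along(dates), function(i) {
    win_end   <- dates[i] - lag
    win_start <- win_end - window + 1
    in_win <- dates >= win_start & dates <= win_end
    # No observations in the window: leave the expectation empty.
    if (!any(in_win)) return(NA_real_)
    mean(catch[in_win], na.rm = TRUE)
  }, numeric(1))
}

dates <- as.Date("2020-01-01") + 0:4
catch <- c(10, 20, 30, 40, 50)

# Two-day window with no lag, and the same window lagged by one day.
no_lag  <- rolling_expected_catch(dates, catch, window = 2, lag = 0)
one_lag <- rolling_expected_catch(dates, catch, window = 2, lag = 1)
```

With a lag of one day, the first date has no earlier observations in its window, so its expectation is NA; this is the kind of empty value that empty.expectation addresses.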

Value

Function saves a list of expected catch matrices to the FishSET database as projectExpectedCatch. The list includes the expected catch matrix from the user-defined choices, recent fine grained information, older fine grained information, oldest fine grained information, and logbook level information. Additional expected catch cases can be added to the list by specifying replace.output = FALSE. The list is automatically saved to the FishSET database and is called in make_model_design. The expected catch output does not need to be loaded when defining or running the model.

newGridVar, newDumV

Examples

## Not run: 
create_expectations(pollockMainDataTable, "pollock", "OFFICIAL_TOTAL_CATCH_MT",
  price = NULL, defineGroup = "fleet", temp.var = "DATE_FISHING_BEGAN",
  temporal = "daily", calc.method = "standardAverage", lag.method = "simple",
  empty.catch = "allCatch", empty.expectation = 0.0001, temp.window = 4,
  temp.lag = 2, year.lag = 0, dummy.exp = FALSE, replace.output = FALSE,
  weight_avg = FALSE, outsample = FALSE
)

## End(Not run)

Creates haul midpoint latitude and longitude variables

Description

Calculates latitude and longitude of the haul midpoint and adds two variables to the primary data set: the midpoint latitude and the midpoint longitude.

Usage

create_mid_haul(
  dat,
  project,
  start = c("lon", "lat"),
  end = c("lon", "lat"),
  name = "mid_haul"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

start

Character string, variables in dat defining the longitude and latitude of the starting location of haul. Must be in decimal degrees.

end

Character string, variables in dat defining the longitude and latitude of the ending location of haul. Must be in decimal degrees.

name

String, name of new variable. Defaults to 'mid_haul'.

Details

Each row of data must be a unique haul. Requires a start and end point for each observation.

Value

Returns primary dataset with two new variables added: latitude and longitude of haul midpoint.

Examples

## Not run: 
pollockMainDataTable <- create_mid_haul(pollockMainDataTable, 'pollock',
  start = c('LonLat_START_LON', 'LonLat_START_LAT'),
  end = c('LonLat_END_LON', 'LonLat_END_LAT'), name = 'mid_haul')

## End(Not run)
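
A haul midpoint on a sphere can be computed with the standard great-circle midpoint formula, sketched below in base R. The spherical formula is used here for illustration and is not claimed to be what create_mid_haul uses internally; great_circle_midpoint and its output column names are hypothetical.

```r
# Great-circle midpoint between start and end points given in decimal
# degrees. Longitudes are not wrapped to [-180, 180] in this sketch.
great_circle_midpoint <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlam <- (lon2 - lon1) * to_rad
  bx <- cos(phi2) * cos(dlam)
  by <- cos(phi2) * sin(dlam)
  phi_m <- atan2(sin(phi1) + sin(phi2),
                 sqrt((cos(phi1) + bx)^2 + by^2))
  lam_m <- lon1 * to_rad + atan2(by, cos(phi1) + bx)
  data.frame(mid_lon = lam_m / to_rad, mid_lat = phi_m / to_rad)
}

# Two hauls: one along the equator, one along a meridian.
mid <- great_circle_midpoint(
  lon1 = c(0, 0), lat1 = c(0, -10),
  lon2 = c(10, 0), lat2 = c(0, 10)
)
```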

Create fishery season identifier variable

Description

Create fishery season identifier variable

Usage

create_seasonal_ID(
  dat,
  project,
  seasonal.dat,
  use.location = c(TRUE, FALSE),
  use.geartype = c(TRUE, FALSE),
  sp.col,
  target = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

seasonal.dat

Table containing date of fishery season(s). Can be pulled from the FishSET database.

use.location

Logical, should fishery season dates depend on fishery location? Column names containing location in dat and seasonal.dat must match.

use.geartype

Logical, should fishery season dates depend on gear type? Column names containing gear type in dat and seasonal.dat must match.

sp.col

Variable in seasonal.dat containing species names.

target

Name of target species. If target is NULL, runs through fisheries in the order listed in seasonal.dat.

Details

Uses a table of fishery season dates to create fishery season identifier variables. Output is a SeasonID variable and/or multiple SeasonID*fishery variables. If fishery season dates vary by location or gear type, then use.location and use.geartype should be TRUE.

The function matches fishery season dates provided in seasonal.dat to the earliest date variable in dat. The 'SeasonID' variable is a vector of fishery seasons whereas the 'SeasonID*fishery' variables are 1/0 depending on whether the fishery was open on the observed date.

If target is not defined, then each row of SeasonID is defined as the earliest fishery listed in seasonal.dat for which the fishery season date encompasses the date variable in the primary dataset. If target fishery is defined, then 'SeasonID' is defined by whether the target fishery is open on the date in the primary dataset or a different fishery. The vector is filled with 'target' or 'other'.

'SeasonID*fishery' variables are a 1/0 SeasonID vector for each fishery (labeled by SeasonID and fishery) where 1 indicates the dates for a given row in the main data table fall within the fishery dates for that fishery.

Value

Returns the primary dataset with the variable SeasonID, or a series of variables identified by the individual fisheries included (SeasonID*fishery).

Examples

## Not run: 
pcodMainDataTable <- create_seasonal_ID("pcodMainDataTable", "pcod", seasonal_dat,
  use.location = TRUE, use.geartype = TRUE, sp.col = "SPECIES", target = "POLLOCK"
)

## End(Not run)
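
The matching logic described in Details can be sketched in base R. This is a simplified illustration that ignores location and gear type; the season table layout (columns start and end), the haul dates, and the object names are hypothetical, and "earliest fishery listed" is interpreted here as the first matching row of the season table.

```r
# Hypothetical season table and haul dates.
seasons <- data.frame(
  SPECIES = c("POLLOCK", "COD"),
  start   = as.Date(c("2020-01-20", "2020-03-01")),
  end     = as.Date(c("2020-04-30", "2020-06-30")),
  stringsAsFactors = FALSE
)
haul_dates <- as.Date(c("2020-01-05", "2020-02-15", "2020-03-15", "2020-05-10"))

# 1/0 indicator per fishery: does the haul date fall inside its season?
open_matrix <- sapply(seq_len(nrow(seasons)), function(j) {
  as.integer(haul_dates >= seasons$start[j] & haul_dates <= seasons$end[j])
})
colnames(open_matrix) <- paste0("SeasonID", seasons$SPECIES)

# No target: label each row with the first listed fishery that is open.
first_open <- apply(open_matrix, 1, function(r) {
  hit <- which(r == 1)
  if (length(hit) == 0) NA_character_ else seasons$SPECIES[hit[1]]
})

# With a target: "target" if the target fishery is open, otherwise "other".
target <- "POLLOCK"
target_id <- ifelse(open_matrix[, paste0("SeasonID", target)] == 1,
                    "target", "other")
```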

Create starting location variable

Description

Creates a variable containing the zone/area location of a vessel when choice of where to fish next was made. This variable is required for data with multiple sets or hauls in a single trip and for the full information model with Dahl's correction (logit_correction).

Usage

create_startingloc(
  dat,
  project = NULL,
  spat,
  port,
  port_name,
  port_lon,
  port_lat,
  trip_id,
  haul_order,
  starting_port,
  zoneID,
  spatID,
  name = "startingloc"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project

spat

Spatial data. Required if ZoneID does not exist in dat. Shape, json, geojson, and csv formats are supported.

port

Port data. Contains columns: Port_Name, Port_Long, Port_Lat. Table is generated using the load_port function and saved in the FishSET database as the project and port table, for example 'pollockPortTable'.

port_name

Character string indicating the column in port table that contains the port name

port_lon

Character string indicating the column in port table that contains port longitude

port_lat

Character string indicating the column in port table that contains port latitude

trip_id

Variable in dat that identifies unique trips.

haul_order

Variable in dat containing information on the order that hauls occur within a trip. Can be time, coded variable, etc.

starting_port

Variable in dat to identify port at start of trip.

zoneID

Variable in dat that identifies the individual zones or areas.

spatID

Variable in spat that identifies the individual zones or areas.

name

String, name of created variable. Defaults to name of the function if not defined.

Details

Function creates the startloc vector that is required for the full information model with Dahl's correction (logit_correction). The vector is the zone location of a vessel when the decision of where to fish next was made. Generally, the first zone of a trip is the departure port. The assignment_column function is called to assign starting port locations and haul locations to zones. If ZoneID exists in dat, assignment_column is not called and the following arguments are not required: spat, lon.dat, lat.dat, cat, lon.grid, lat.grid.

Value

Primary data set with starting location variable added.

Examples

## Not run: 
pcodMainDataTable <- create_startingloc(pcodMainDataTable, 'pcod',
    map2, "pcodPortTable", "TRIP_SEQ", "HAUL_SEQ", "DISEMBARKED_PORT", 
 "START_LON", "START_LAT", "NMFS_AREA", "STARTING_LOC"
)

## End(Not run)

Create trip centroid variable

Description

Create latitude and longitude variables containing the centroid of each trip

Usage

create_trip_centroid(dat, project, lon, lat, tripID, weight.var = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

lon

Variable in dat containing longitudinal data.

lat

Variable in dat containing latitudinal data.

tripID

Variable in dat containing trip identifier. If trip identifier should be defined by more than one variable then list as c('var1', 'var2').

weight.var

Variable in dat for computing the weighted average.

Details

Computes the average longitude and latitude for each trip. Specify weight.var to calculate the weighted centroid. Additional arguments can be added that define unique trips. If no additional arguments are added, each row will be treated as a unique trip.

Value

Returns the primary dataset with centroid latitude and centroid longitude variables added.

Examples

## Not run: 
pollockMainDataTable <- create_trip_centroid(pollockMainDataTable, 'pollock', 'LonLat_START_LON',
  'LonLat_START_LAT', tripID = c('DISEMBARKED_PORT', 'EMBARKED_PORT'), weight.var = NULL)

## End(Not run)

Create haul level trip distance variable

Description

Create haul level trip distance variable

Usage

create_trip_distance(
  dat,
  project,
  port,
  trip_id,
  starting_port,
  starting_haul = c("Lon", "Lat"),
  ending_haul = c("Lon", "Lat"),
  ending_port,
  haul_order,
  name = "TripDistance",
  a = 6378137,
  f = 1/298.257223563
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

port

Port data frame. Contains columns: Port_Name, Port_Long, Port_Lat. Table is generated using the load_port function and saved in the FishSET database as the project and port, for example 'pollockPortTable'.

trip_id

Unique trip identifier in dat.

starting_port

Variable in dat containing ports at the start of the trip.

starting_haul

Character string, variables in dat containing longitude and latitude at start of haul, in that order.

ending_haul

Character string, variables in dat containing longitude and latitude at end of haul, in that order.

ending_port

Variable in dat containing ports at the end of the trip.

haul_order

Variable in dat that identifies haul order within a trip. Can be time, coded variable, etc.

name

String, name of created variable. Defaults to 'TripDistance'.

a

Numeric, major (equatorial) radius of the ellipsoid. The default value is for WGS84 ellipsoid.

f

Numeric, ellipsoid flattening. The default value is for WGS84 ellipsoid.

Details

Summation of distance across a trip based on starting and ending ports and hauls in between. The function uses distGeo from the geosphere package to calculate distances between hauls. Inputs are the trips, ports, and hauls from the primary dataset, and the latitude and longitude of ports from the port table. The ellipsoid arguments, a and f, are numeric and can be changed if an ellipsoid other than WGS84 is appropriate. See the geosphere R package for more details (https://cran.r-project.org/web/packages/geosphere/geosphere.pdf).

Value

Returns the primary dataset with a trip distance variable added.

Examples

## Not run: 
pcodMainDataTable <- create_trip_distance(pcodMainDataTable, "pcod", "pcodPortTable", 
  "TRIP_SEQ", "DISEMBARKED_PORT", c("LonLat_START_LON", "LonLat_START_LAT"),
  c("LonLat_END_LON", "LonLat_END_LAT"), "EMBARKED_PORT", "HAUL_SEQ", "TripDistance"
)

## End(Not run)

Create numeric variable using arithmetic expression

Description

Creates a new variable based on the arithmetic operation between two variables. Function is useful for creating rate variables or the summation of two related variables.

Usage

create_var_num(dat, project, x, y, method, name = "create_var_num")

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Variable in dat. Variable will be the numerator if method is division.

y

Variable in dat or numeric value. Variable will be the denominator if method is division.

method

String, arithmetic expression. Options include: "sum", addition ("add"), subtraction ("sub"), multiplication ("mult"), and division ("div").

name

String, name of created vector. Defaults to name of the function if not defined.

Details

Creates a new numeric variable based on the defined arithmetic expression method. New variable is added to the primary dataset.
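The method options can be pictured with a small base R sketch. The helper combine_vars is hypothetical and is not part of FishSET; treating "sum" and "add" as equivalent is an assumption based on the option list above.

```r
# Hypothetical helper mirroring the documented method options
combine_vars <- function(x, y, method = c("sum", "add", "sub", "mult", "div")) {
  method <- match.arg(method)
  switch(method,
    sum  = ,          # falls through to "add" (assumed equivalent)
    add  = x + y,
    sub  = x - y,
    mult = x * y,
    div  = x / y      # x is the numerator, y the denominator
  )
}

hauls <- data.frame(HAUL_CHINOOK = c(2, 0, 5), HAUL_CHUM = c(1, 4, 0))
hauls$tot_salmon <- combine_vars(hauls$HAUL_CHINOOK, hauls$HAUL_CHUM, "sum")
```

Because y may also be a single numeric value, a call such as combine_vars(hauls$tot_salmon, 1000, "div") would rescale a column in the same way.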

Value

Returns primary dataset with new variable added.

Examples

## Not run: 
pollockMainDataTable <- create_var_num(pollockMainDataTable, 'pollock', x = 'HAUL_CHINOOK',
    y = 'HAUL_CHUM', method = 'sum', name = 'tot_salmon')

## End(Not run)

K-fold cross validation

Description

K-fold cross validation for estimating model performance

Usage

cross_validation(
  project,
  mod.name,
  zone.dat,
  groups,
  k = NULL,
  time_var = NULL,
  use.scalers = FALSE,
  scaler.func = NULL
)

Arguments

project

Name of project

mod.name

Name of saved model to use. Argument can be the name of the model or can pull the name of the saved "best" model. Leave mod.name empty to use the saved "best" model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

zone.dat

Variable in main data table that identifies the individual zones or areas.

groups

String, determines how to subset the dataset into groups for training and testing, for example 'Observations' (see k) or 'Years' (see time_var).

k

Integer, value required if groups = 'Observations' to determine the number of groups for splitting data into training and testing datasets. The value of k should be chosen to balance bias and variance, and values of k = 5 or 10 have been found to be efficient standard values in the literature. Note that higher k values will increase runtime and the computational cost of cross_validation. Leave-one-out cross validation is a type of k-fold cross validation in which k equals n, the number of observations, which can be useful for small datasets.

time_var

Name of column for time variable. Required if groups = 'Years'.

use.scalers

Input for create_model_input(). Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Input for create_model_input(). Function to calculate rescaling factors.

Details

K-fold cross validation is a resampling procedure for evaluating the predictive performance of a model. First the data are split into k groups, which can be split randomly across observations (e.g., 5-fold cross validation where each group is randomly assigned across observations) or split based on a particular variable (e.g., split groups based on gear type). Each group takes a turn being the 'hold-out' or 'test' dataset, while the remaining groups are the training dataset (parameters are estimated for the training dataset). Finally, the predictive performance of each iteration is calculated as the percent absolute prediction error.
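The procedure can be sketched in base R as follows. This is a generic illustration rather than the FishSET routine: a linear model stands in for the discrete choice model, the data are simulated, and the error formula (mean absolute error relative to the observed value, in percent) is an assumed reading of "percent absolute prediction error".

```r
set.seed(42)  # reproducible fold assignment

n <- 100
k <- 5
dat <- data.frame(x = runif(n, 1, 10))
dat$y <- 2 * dat$x + rnorm(n)

# Randomly assign each observation to one of k equally sized groups
dat$fold <- sample(rep(seq_len(k), length.out = n))

# Each group takes a turn as the test set; the rest form the training set
pct_abs_error <- vapply(seq_len(k), function(i) {
  train <- dat[dat$fold != i, ]
  test  <- dat[dat$fold == i, ]
  fit   <- lm(y ~ x, data = train)               # stand-in for the fitted model
  pred  <- predict(fit, newdata = test)
  100 * mean(abs(pred - test$y) / abs(test$y))   # assumed error definition
}, numeric(1))

mean(pct_abs_error)  # average error across the k iterations
```

Splitting on a variable instead of at random would replace the sample() line with a mapping from that variable's values to group numbers.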

Examples

## Not run: 

model_design_outsample("scallop", "scallopModName")


## End(Not run)

Convert dataframe to sf

Description

Used to convert spatial data with no spatial class to a sf object. This is useful if the spatial data was read from a non-spatial file type, e.g. a CSV file.

Usage

dat_to_sf(dat, lon, lat, id, cast = "POLYGON", multi = FALSE, crs = 4326)

Arguments

dat

Spatial data containing information on fishery management or regulatory zones.

lon

Longitude variable in dat.

lat

Latitude variable in dat.

id

Spatial feature ID column.

cast

Spatial feature type to create. Commonly used options are "POINT", "LINESTRING", and "POLYGON". See st_cast for details.

multi

Logical, use if needing to convert to a multi-featured (grouped) sf object, e.g. MULTIPOLYGON or MULTILINESTRING.

crs

Coordinate reference system to assign to dat. Defaults to WGS 84 (EPSG: 4326).


Check for common data quality issues

Description

Check primary data for common data quality issues, such as NaNs, NAs, outliers, unique rows, and empty variables.

Usage

data_check(dat, project, x)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

x

Variable in dat to check for outliers. Must be in quotes if called from the FishSET database.

Details

Prints summary stats for all variables in dat. Prints column names that contain NaNs or NAs. Checks for outliers in the specified variable x. Also checks that all column names are unique, whether any columns in dat are empty, whether each row is a unique choice occurrence at the haul or trip level, and that data for either lat/lon or fishing area are included. The function is also called by other functions.
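A few of these checks are simple to reproduce in base R. The sketch below is illustrative only; quality_flags is a hypothetical helper, not a FishSET function, and it covers NaNs, NAs, column-name uniqueness, and row uniqueness but not outliers or the lat/lon check.

```r
# Hypothetical helper: returns simple quality flags for a data frame
quality_flags <- function(dat) {
  has_nan <- vapply(dat, function(col) is.numeric(col) && any(is.nan(col)), logical(1))
  has_na  <- vapply(dat, function(col) {
    nan <- if (is.numeric(col)) is.nan(col) else FALSE
    any(is.na(col) & !nan)  # NA proper, excluding NaN
  }, logical(1))
  list(
    nan_cols     = names(dat)[has_nan],
    na_cols      = names(dat)[has_na],
    unique_names = !anyDuplicated(names(dat)),
    unique_rows  = !anyDuplicated(dat)
  )
}

dat <- data.frame(catch = c(1, NaN, 3), port = c("A", NA, "A"),
                  stringsAsFactors = FALSE)
dat <- rbind(dat, dat[1, ])  # add a duplicated row on purpose
flags <- quality_flags(dat)
```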

Examples

## Not run: 
data_check(pcodMainDataTable, "pcod", "OFFICIAL_TOTAL_CATCH_MT")

## End(Not run)

Upload data from file, FishSET DB, or working environment

Description

Helper function that can read data from file, from FishSET DB, or a dataframe in the working environment. Used for data upload functions: load_maindata, load_port, load_aux, load_grid, load_spatial.

Usage

data_upload_helper(dat, type, ...)

Arguments

dat

Reference to a dataframe. This can be a filepath, the name of an existing FishSET table, or a dataframe object in the working environment.

type

The type of data to upload. Options include "main", "port", "grid", "aux", and "spat".

...

Additional arguments passed to read_dat.

Examples

## Not run: 
dataset <- data_upload_helper(dat, type = "main")

## End(Not run)

Check and convert lat/lon to decimal degrees

Description

Check that latitude and longitude are in decimal degrees and the variable sign is correct. Correct lat/lon if required.

Usage

degree(
  dat,
  project,
  lat = NULL,
  lon = NULL,
  latsign = FALSE,
  lonsign = FALSE,
  replace = TRUE
)

Arguments

dat

Dataset containing latitude and longitude data.

project

Project name.

lat

Variable(s) containing latitude data. If NULL the function will attempt to search for all latitude variables by name (e.g. by matching "lat" or "LAT").

lon

Variable(s) containing longitude data. If NULL the function will attempt to search for all longitude variables by name (e.g. by matching "lon" or "LON").

latsign

How should the sign value of lat be changed? Choices are NULL for no change, "neg" to convert all positive values to negative, "pos" to convert all negative values to positive, and "all" to change all values.

lonsign

How should the sign value of lon be changed? Choices are NULL for no change, "neg" to convert all positive values to negative, "pos" to convert all negative values to positive, and "all" to change all values.

replace

Logical, should lat and lon in dat be converted to decimal degrees? Defaults to TRUE. Set to FALSE if checking for compliance.

Details

First checks whether any variables containing 'lat' or 'lon' in their names are numeric. Returns a message on results. To convert a variable to decimal degrees, identify the lat or lon variable(s) and set replace = TRUE. To change the sign, set latsign (for lat) or lonsign (for lon) to the desired option. FishSET requires that latitude and longitude be in decimal degrees.
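For reference, the standard conversion from degrees, minutes, and seconds to decimal degrees is shown below as a base R sketch. The helper dms_to_decimal is hypothetical; the input formats that degree() itself recognizes are not described here, so this only illustrates the arithmetic and the sign convention.

```r
# Hypothetical helper: degrees, minutes, seconds -> decimal degrees.
# Hemisphere "S" or "W" yields a negative value.
dms_to_decimal <- function(deg, minutes = 0, seconds = 0, hemisphere = "N") {
  dd <- abs(deg) + minutes / 60 + seconds / 3600
  sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
  sign * dd
}

lat_dd <- dms_to_decimal(57, 30, 0, "N")
lon_dd <- dms_to_decimal(165, 15, 36, "W")
```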

Value

Returns the primary dataset with the latitudes and longitudes converted to decimal degrees if replace = TRUE or if changing the sign. Otherwise, returns a message indicating whether the selected longitude and latitude variables are in the correct format.

Examples

## Not run: 
# check format
degree(pollockMainDataTable, 'pollock', lat = 'LatLon_START_LAT',
       lon = 'LatLon_START_LON')

# change signs and convert to decimal degrees
pollockMainDataTable <- degree(pollockMainDataTable, 'pollock', 
                               lat = 'LatLon_START_LAT', 
                               lon = 'LatLon_START_LON', latsign = FALSE, 
                               lonsign = FALSE, replace = TRUE)

## End(Not run)

Delete table meta data or project meta file

Description

Delete table meta data or project meta file

Usage

delete_meta(project, tab.name = NULL, delete_file = FALSE)

Arguments

project

Project name.

tab.name

String, table name.

delete_file

Logical, whether to delete project meta file.


Delete models from FishSET Database

Description

Delete models from the model design file (MDF) and the model output table (MOT).

Usage

delete_models(project, model.names, delete.nested = FALSE)

Arguments

project

String, name of project.

model.names

String, name of models to delete. Use model_names() to see model names from the model design file.

delete.nested

Logical, whether to delete a model containing nested models. Defaults to FALSE.

Details

Nested models are conditional logit models that include more than one expected catch/revenue model. For example, if a conditional logit model named 'logit_c_mod1' was saved to the MDF with the argument expectcatchmodels = list('exp1', 'recent', 'older'), then 'logit_c_mod1' will include three separate models, each using a different expected catch matrix. To delete all three models, enter model.names = 'logit_c_mod1' and set delete.nested = TRUE. To delete one or more specific nested models, use model.names = 'logit_c_mod1.exp1', i.e. the original model name, a period, and the name of the expected catch matrix used in the model.
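The naming rule for nested models can be reproduced with paste(). This short sketch only builds the names from the example above; it does not call delete_models().

```r
model_name   <- "logit_c_mod1"
exp_matrices <- c("exp1", "recent", "older")

# Nested model names: original name, a period, then the expected catch matrix name
nested_names <- paste(model_name, exp_matrices, sep = ".")
nested_names
```

Following the rule described above, a subset such as nested_names[2:3] is the form of value that model.names expects when only some nested models should be deleted.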

See Also

model_design_list(), model_out_view()


Create KDE, CDF, or empirical CDF plots

Description

Creates a kernel density estimate, empirical cumulative distribution function, or cumulative distribution function plot of selected variable. Grouping, filtering, and several plot options are available.

Usage

density_plot(
  dat,
  project,
  var,
  type = "kde",
  group = NULL,
  combine = TRUE,
  date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  scale = "fixed",
  bw = 1,
  position = "identity",
  pages = "single"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

var

String, name of variable to plot.

type

String, type of density plot. Options include "kde" (kernel density estimate), "ecdf" (empirical cdf), "cdf" (cumulative distribution function), or "all" (all plot types). Two or more plot types can be chosen.

group

Optional, string names of variables to group by. If two or more grouping variables are included, the default for "cdf" and "ecdf" plots is to not combine groups. This can be changed using combine = TRUE. "kde" plots always combine two or more groups. "cdf" and "ecdf" plots can use up to two grouping variables if combine = FALSE: the first variable is represented by color and second by line type.

combine

Logical, whether to combine the variables listed in group for plot.

date

Date variable from dat used to subset and/or facet the plot by.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.
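The year-period logic can be illustrated in base R. The sketch below mimics what filter_date = "year-month" with date_value = list(2011:2013, 5:7) selects; the data frame is invented and the code is not taken from density_plot().

```r
dat <- data.frame(
  FISHING_START_DATE = as.Date(c("2010-06-01", "2011-05-15", "2012-07-04",
                                 "2013-08-20", "2012-06-30")),
  catch = c(10, 12, 8, 15, 9)
)

date_value <- list(2011:2013, 5:7)  # list(year, period) for "year-month"

yr <- as.integer(format(dat$FISHING_START_DATE, "%Y"))
mo <- as.integer(format(dat$FISHING_START_DATE, "%m"))

# Keep rows whose year and month both fall in the requested sets
filtered <- dat[yr %in% date_value[[1]] & mo %in% date_value[[2]], ]
```

Here the 2010 row is dropped because of its year and the August 2013 row because of its month.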

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by.

facet_by

Variable name to facet by. This can be a variable that exists in dat or a variable created by density_plot() such as "year", "month", or "week". date is required if faceting by period.

conv

Convert catch variable to "tons", "metric_tons", or by using a function entered as a string. Defaults to "none" for no conversion.

tran

String; name of function to transform variable, for example "log" or "sqrt".

format_lab

Formatting option for x-axis labels. Options include "decimal" or "scientific".

scale

Scale argument passed to facet_grid. Defaults to "fixed". Other options include "free_y", "free_x", and "free".

bw

Adjusts KDE bandwidth. Defaults to 1.

position

The position of the grouped variable for KDE plot. Options include "identity", "stack", and "fill".

pages

Whether to output plots on a single page ("single", the default) or multiple pages ("multi").

Details

The data can be filtered by date or by variable (see filter_date and filter_by). If type contains "kde" or "all" then grouping variables are automatically combined. Any variable in dat can be used for faceting, but "year", "month", or "week" are also available if date is provided.
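For readers unfamiliar with the plot types, the quantities behind "kde" and "ecdf" can be computed with base R's density() and ecdf(). The sketch uses simulated catch values, and mapping the bw argument onto density()'s adjust multiplier is an assumption rather than documented behavior.

```r
set.seed(1)
catch <- rlnorm(200, meanlog = 2, sdlog = 0.5)  # simulated catch values

kde  <- density(catch, adjust = 1)  # kernel density estimate; adjust scales the bandwidth
Fhat <- ecdf(catch)                 # empirical CDF as a step function

Fhat(median(catch))  # proportion of observations at or below the median
```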

Value

density_plot() can return up to three plots in a single call. When pages = "single" all plots are combined and stacked vertically. pages = "multi" will return separate plots.

Examples

## Not run: 

density_plot(pollockMainDataTable, "pollock", var = "OFFICIAL_TOTAL_CATCH_MT",
             type = c("kde", "ecdf"))

# facet 
density_plot(pollockMainDataTable, "pollock", var = "OFFICIAL_TOTAL_CATCH_MT",
             type = c("kde", "ecdf"), facet_by = "GEAR_TYPE")

# filter by period
density_plot(pollockMainDataTable, "pollock", var = "OFFICIAL_TOTAL_CATCH_MT", 
             type = "kde", date = "FISHING_START_DATE", filter_date = "year-month", 
             date_value = list(2011, 9:11))

## End(Not run)

Run discrete choice model

Description

Subroutine to run chosen discrete choice model. Function pulls necessary data generated in make_model_design and loops through model design choices and expected catch cases. Output is saved to the FishSET database.

Usage

discretefish_subroutine(
  project,
  run = "new",
  select.model = FALSE,
  explorestarts = TRUE,
  breakearly = TRUE,
  space = NULL,
  dev = NULL,
  use.scalers = FALSE,
  scaler.func = NULL,
  CV = FALSE
)

Arguments

project

String, name of project.

run

String, how models should be run. 'new' will only run models that exist in the model design file but not in the model output table. 'all' will run all models in the model design file, replacing existing model output. The third option is to enter a vector of model names to run (use model_names() to see current model names). If the specified model already has output it will be replaced.

select.model

Return an interactive data table that allows users to select and save table of best models based on measures of fit.

explorestarts

Logical, should starting parameters value space be explored? Set to TRUE if unsure of the number of starting parameter values to include or of reasonable starting parameters values. Better starting parameter values can help with model convergence.

breakearly

Logical, if explorestarts = TRUE, should the first set of starting parameter values that returns a valid (numeric) loglikelihood value be returned (TRUE) or should the entire parameter space be considered and the set of starting parameter values that return the lowest loglikelihood value be returned (FALSE).

space

Specify if explorestarts = TRUE. List of length 1 or length equal to the number of models to be evaluated. space is the number of starting value permutations to test (the size of the space to explore). The greater the dev argument, the larger the space argument should be.

dev

Specify if explorestarts = TRUE. List of length 1 or length equal to the number of models to be evaluated. dev refers to how far to deviate from the average parameter values when exploring (random normal deviates). The less certain the average parameters are, the greater the dev argument should be.

use.scalers

Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Function to calculate rescaling factors. Can be a generic function, such as mean, or a user-defined function. User-defined functions must be specified as scaler.func = function(x, FUN = sd) 2*FUN(x). This example returns two times the standard deviation of x.
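The effect of a user-defined scaler can be checked on a plain numeric vector before passing it to the function. In the sketch below the variable name distance is invented, and dividing a variable by its rescaling factor is an assumption about how the factor is applied.

```r
# User-defined scaler from the description above: two times the standard deviation
scaler <- function(x, FUN = sd) 2 * FUN(x)

distance <- c(10, 20, 30, 40)
factor_default <- mean(distance)    # documented default rescaling factor (the mean)
factor_custom  <- scaler(distance)  # custom rescaling factor

# Assumption: rescaling divides the variable by its factor
scaled <- distance / factor_custom
```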

CV

Logical, CV = TRUE when running discretefish_subroutine for k-fold cross validation, and the default value is CV = FALSE.

Details

Runs through model design choices generated by make_model_design and stored as 'ModelInputData' in FishSET database. Data matrix is created in create_model_input. Required data, optional data, and details on likelihood functions are outlined in make_model_design.

Likelihood-specific initial parameter estimates:

  1. Conditional logit likelihood (logit_c)
    Starting parameter values take the order of: c([alternative-specific parameters], [travel-distance parameters]). The alternative-specific parameters and travel-distance parameters are of length (# of alternative-specific variables) and (# of travel-distance variables) respectively.

  2. Zonal logit with area specific constants (logit_zonal)
    Starting parameter values take the order of: c([average-catch parameters], [travel-distance parameters]). The average-catch and travel-distance parameters are of length (# of average-catch variables)*(k-1) and (# of travel-distance variables) respectively, where (k) equals the number of alternative fishing choices.

  3. Full information model with Dahl's correction function (logit_correction)
    Starting parameter values take the order of: c([marginal utility from catch], [catch-function parameters], [polynomial starting parameters], [travel-distance parameters], [catch sigma]). The number of polynomial interaction terms is currently set to 2, so given the chosen degree 'polyn' there should be "(((polyn+1)*2)+2)*(k)" polynomial starting parameters, where (k) equals the number of alternative fishing choices. The marginal utility from catch and catch sigma are each of length equal to unity. The catch-function and travel-distance parameters are of length (# of catch variables)*(k) and (# of cost variables) respectively.

  4. Expected profit model with normal catch function (epm_normal)
    Starting parameter values take the order of: c([catch-function parameters], [travel-distance parameters], [catch sigma(s)], [scale parameter]). The catch-function and travel-distance parameters are of length (# of catch-function variables)*(k) and (# of travel-distance variables) respectively, where (k) equals the number of alternative fishing choices. The catch sigma(s) are either of length equal to unity or length (k) if the analyst is estimating location-specific catch sigma parameters. The scale parameter is of length equal to unity.

  5. Expected profit model with Weibull catch function (epm_weibull)
    Starting parameter values take the order of: c([catch-function parameters], [travel-distance parameters], [catch sigma(s)], [scale parameter]). The catch-function and travel-distance parameters are of length (# of catch-function variables)*(k) and (# of travel-distance variables) respectively, where (k) equals the number of alternative fishing choices. The catch sigma(s) are either of length equal to unity or length (k) if the analyst is estimating location-specific catch sigma parameters. The scale parameter is of length equal to unity.

  6. Expected profit model with log-normal catch function (epm_lognormal)
    Starting parameter values take the order of: c([catch-function parameters], [travel-distance parameters], [catch sigma(s)], [scale parameter]). The catch-function and travel-distance parameters are of length (# of catch-function variables)*(k) and (# of travel-distance variables) respectively, where (k) equals the number of alternative fishing choices. The catch sigma(s) are either of length equal to unity or length (k) if the analyst is estimating location-specific catch sigma parameters. The scale parameter is of length equal to unity.
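The length rules in the list above can be turned into a quick arithmetic check before supplying starting values. The helper n_start_params is hypothetical and simply restates those formulas; its argument names are invented, and the count of cost variables for logit_correction is passed through n_dist.

```r
# k: number of alternative fishing choices
n_start_params <- function(likelihood, k, n_catch = 0, n_dist = 0,
                           n_alt_spec = 0, polyn = NULL, alt_sigma = FALSE) {
  n_sigma <- if (alt_sigma) k else 1  # one sigma, or one per alternative
  switch(likelihood,
    logit_c          = n_alt_spec + n_dist,
    logit_zonal      = n_catch * (k - 1) + n_dist,
    # marginal utility + catch-function + polynomial + cost + catch sigma
    logit_correction = 1 + n_catch * k + (((polyn + 1) * 2) + 2) * k + n_dist + 1,
    epm_normal       = ,
    epm_weibull      = ,
    epm_lognormal    = n_catch * k + n_dist + n_sigma + 1,
    stop("unknown likelihood: ", likelihood)
  )
}

n_zonal <- n_start_params("logit_zonal", k = 4, n_catch = 1, n_dist = 1)
n_epm   <- n_start_params("epm_normal", k = 4, n_catch = 2, n_dist = 1)
n_corr  <- n_start_params("logit_correction", k = 3, n_catch = 1, n_dist = 1, polyn = 3)
```

With four alternatives, one average-catch variable, and one travel-distance variable, the zonal logit needs 1*(4-1) + 1 = 4 starting values.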


Model output are saved to the FishSET database and can be loaded to the console with:

model_out_view: model output including optimization information, standard errors, coefficients, and t-statistics.
model_params: model estimates and standard error
model_fit: model comparison metrics
globalcheck_view: model error message

The catch, choice, distance, and otherdat data generated by the make_model_design function are pulled from the ModelInputData table in the FishSET database.

Value

OutLogit: [outmat1 se1 EPM2] (coefs, ses, tstats)
optoutput: optimization information
seoumat2: ses
MCM: Model Comparison metrics

Examples

## Not run: 
results <- discretefish_subroutine("pcod", run = 'all', select.model = TRUE)

## End(Not run)

Create dummy matrix from a coded ID variable

Description

Create dummy matrix from a coded ID variable

Usage

dummy_matrix(dat, project, x)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Variable in dat used to generate dummy matrix.

Details

Creates a dummy matrix of 1/0 with dimensions [(number of observations in dataset) x (number of factors in x)] where each column is a unique factor level. Values are 1 if the value in the column matches the column factor level and 0 otherwise.
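The structure of the resulting matrix can be seen in a small base R sketch. The port codes are invented and the code is an illustration of the idea, not the function's source.

```r
port_code <- c("DUT", "AKU", "DUT", "KOD")  # hypothetical coded ID variable

lv <- sort(unique(port_code))
# One column per unique level; 1 where the observation matches that level
dummy <- sapply(lv, function(l) as.integer(port_code == l))
dim(dummy)  # (number of observations) x (number of unique levels)
```

Each row contains exactly one 1, so the row sums are all equal to 1.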

Examples

## Not run: 
PortMatrix <- dummy_matrix(pollockMainDataTable, 'pollock', 'PORT_CODE')

## End(Not run)

Create a binary vector from numeric, date, and character or factor vectors.

Description

Create a binary vector from numeric, date, and character or factor vectors.

Usage

dummy_num(dat, project, var, value, opts = "more_less", name = "dummy_num")

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

var

Variable in dat to create dummy variable from.

value

String, value to set dummy variable by. If var is a date, value should be a year. If var is a factor, value should be a factor level. If var is numeric, value should be a single number or range of numbers [use c(1,5)].

opts

String, how dummy variable should be defined. Choices are "x_y" and "more_less". For "x_y", each element of var is set to 1 if the element matches value, otherwise 0. For "more_less", each element of var less than value is set to 0 and all elements greater than value set to 1. If var is a factor, then elements that match value will be set to 1 and all other elements set to 0. Default is set to "more_less".

name

String, name of created dummy variable. Defaults to name of the function if not defined.

Details

For date variables, the dummy variable is defined by a date (year) and may be either year x versus all other years ("x_y") or before vs after year x ("more_less"). Use this function to create a variable defining whether or not a policy action had been implemented.
Example: before vs. after a 2008 amendment:
dummy_num('pollockMainDataTable', 'pollock', 'Haul_date', 2008, 'more_less', 'amend08')

For factor variables, both choices in opts compare selected factor level(s) against all other factor levels.
Example: Fishers targeting pollock vs. another species:
dummy_num('pollockMainDataTable', 'pollock', 'GF_TARGET_FT', c('Pollock - bottom', 'Pollock - midwater'), 'x_y', 'pollock_target')

For numeric variables, value can be a single number or a range of numbers. The dummy variable is the selected value(s) against all others (x_y) or less than the selected value versus more than the selected value (more_less). For more_less, the mean is used as the critical value if a range of values is provided.
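The numeric case can be sketched in base R as follows. The helper numeric_dummy is hypothetical. Two points are assumptions rather than documented behavior: a length-two value is treated as an inclusive range for "x_y", and elements exactly equal to the critical value are coded 0 for "more_less".

```r
# Hypothetical helper for a numeric variable
numeric_dummy <- function(x, value, opts = c("more_less", "x_y")) {
  opts <- match.arg(opts)
  if (opts == "x_y") {
    if (length(value) == 2) {
      # Assumption: a pair of values is an inclusive range
      as.integer(x >= min(value) & x <= max(value))
    } else {
      as.integer(x %in% value)
    }
  } else {
    crit <- mean(value)   # the mean of a range serves as the critical value
    as.integer(x > crit)  # assumption: ties with the critical value are coded 0
  }
}

haul_year <- c(2005, 2007, 2008, 2010, 2012)
after_2008 <- numeric_dummy(haul_year, 2008, "more_less")
in_range   <- numeric_dummy(haul_year, c(2007, 2010), "x_y")
```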

Value

Returns primary dataset with dummy variable added.

Examples

## Not run: 
pollockMainDataTable <- dummy_num(pollockMainDataTable, 'pollock', 'Haul_date', 2008, 
  'more_less', 'amend08')

## End(Not run)

Create dummy variable

Description

Create dummy variable

Usage

dummy_var(dat, project, DumFill = 1, name = "dummy_var")

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

DumFill

Fill the dummy variable with 1 or 0

name

String, name of created dummy variable. Defaults to name of the function if not defined.

Details

Creates a dummy variable of either 0 or 1 with length of the number of rows of the data set.

Value

Primary dataset with dummy variable added.

Examples

## Not run: 
pollockMainDataTable <- dummy_var(pollockMainDataTable, 'pollock', DumFill=1, 'dummyvar')

## End(Not run)

Check variables are not empty

Description

Check for and remove empty variables from dataset. Empty variables are columns in the data that contain all NAs and/or empty strings.

Usage

empty_vars_filter(dat, project, remove = FALSE)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

remove

Logical, whether to remove empty variables. Defaults to FALSE.

Details

Function checks for empty variables and prints an outcome message to the console. If empty variables are present and remove = TRUE, then empty variables will be removed from the dataset. Empty variables are columns in the dataset that contain all NAs or empty strings.
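The definition of an empty variable translates directly into base R. The sketch below is an illustration with an invented data frame; is_empty_var is a hypothetical helper, not the FishSET implementation.

```r
# Hypothetical helper: TRUE for columns that are all NA and/or empty strings
is_empty_var <- function(col) {
  all(is.na(col) | as.character(col) == "")
}

dat <- data.frame(
  catch = c(5, NA, 7),
  notes = c("", NA, ""),
  gear  = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

empty <- vapply(dat, is_empty_var, logical(1))
names(dat)[empty]                        # columns flagged as empty
cleaned <- dat[, !empty, drop = FALSE]   # analogous to remove = TRUE
```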

Value

Returns the dataset with empty variables removed if remove = TRUE.

Examples

## Not run: 
# check for empty vars
empty_vars_filter(pollockMainDataTable, 'pollock')

# remove empty vars from data
mod.dat <- empty_vars_filter(pollockMainDataTable, 'pollock', remove = TRUE)

## End(Not run)

Expected profit model with log-normal catch function

Description

Calculate the negative log-likelihood of the expected profit model (EPM) with log-normal catch function. For more information on the EPM lognormal model see section 8.4.5 in the FishSET user manual. https://docs.google.com/document/d/1dzXsVt5iWcAQooDDXRJ3XyMoqnSmpZOqirU_f_PnQUM/edit#heading=h.ps7td88zo4ge

Usage

epm_lognormal(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting parameter values as a numeric vector. The order of parameters in the vector is:
c([catch-function params], [travel-dist params], [stdev], [common scale param]),
where the length of catch-function parameters is the # of alternatives * # of catch-function variables, length of travel-distance parameters is the # of travel-distance variables, length of standard deviation defaults to 1 but alternative-specific standard deviation values can be specified (length = # of alternatives), and the common scale parameter is a single value.

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

List that contains other data used in the model, see section 8.4.5 in the FishSET user manual for more details (link in the description above): (1) 'griddat': catch-function variables that interact with alternative-specific catch-function parameters and do not vary across alternatives (e.g., vessel gross tonnage). (2) 'intdat': travel-distance variables that interact with travel-distance parameters and the distance matrix and do not vary across alternatives. (3) 'prices': price in terms of $/landings units. This is typically a vector with prices for each observation, but can be a single value representing price for the entire dataset.

alts

Number of alternative choices in model

project

Name of project

expname

Expected catch table (optional)

mod.name

Name of model run for model result output table

Details

This function is called in discretefish_subroutine when running an EPM model with a log-normal catch function.

Value

ld: negative log likelihood


Expected profit model with normal catch function

Description

Calculate the negative log-likelihood of the expected profit model (EPM) with a normal catch function. For more information on the EPM normal model see section 8.4.3 in the FishSET user manual. https://docs.google.com/document/d/1p8mK65uG8yp-HbzCeBgtO0q6DSpKV1Zyk_ucNskt5ug/edit#heading=h.mrt9b1ee2yb8

Usage

epm_normal(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting values as a numeric vector. The order of parameters in the vector is:
c([catch-function params], [travel-dist params], [stdev], [common scale param]),
where the length of catch-function parameters is the # of alternatives * # of catch-function variables, length of travel-distance parameters is the # of travel-distance variables, length of standard deviation defaults to 1 but alternative-specific standard deviation values can be specified (length = # of alternatives), and the common scale parameter is a single value.

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

List that contains other data used in the model, see section 8.4.3 in the FishSET user manual for more details (link in the description above): (1) 'griddat': catch-function variables that interact with alternative-specific catch-function parameters and do not vary across alternatives (e.g., vessel gross tonnage). (2) 'intdat': travel-distance variables that interact with travel-distance parameters and the distance matrix and do not vary across alternatives. (3) 'prices': price in terms of $/landings units. This is typically a vector with prices for each observation, but can be a single value representing price for the entire dataset.

alts

Number of alternative choices in model

project

Name of project

expname

Expected catch table (optional)

mod.name

Name of model run for model result output table

Details

This function is called in discretefish_subroutine when running an EPM model with a normal catch function.

Value

ld: negative log likelihood


Expected profit model with Weibull catch function

Description

Calculate the negative log-likelihood of the expected profit model (EPM) with Weibull catch function. For more information on the EPM Weibull model see section 8.4.4 in the FishSET user manual. https://docs.google.com/document/d/1dzXsVt5iWcAQooDDXRJ3XyMoqnSmpZOqirU_f_PnQUM/edit#heading=h.gh3zw8f9nsdi

Usage

epm_weibull(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting parameter values as a numeric vector. The order of parameters in the vector is:
c([catch-function params], [travel-dist params], [shape params], [common scale param]),
where the length of catch-function parameters is the # of alternatives * # of catch-function variables, length of travel-distance parameters is the # of travel-distance variables, length of shape parameters defaults to 1 but alternative-specific shape parameters can be specified (length = # of alternatives), and the common scale parameter is a single value.
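As a rough illustration of the ordering described above, the following base R sketch assembles a starts3 vector. The counts (n_alts, n_catch_vars, n_dist_vars) and the starting value of 1 are hypothetical and should be replaced with values that match the model design.

```r
# Hypothetical model dimensions (assumptions for illustration only)
n_alts       <- 4  # number of alternatives
n_catch_vars <- 2  # number of catch-function variables
n_dist_vars  <- 1  # number of travel-distance variables

catch_params <- rep(1, n_alts * n_catch_vars)  # alternatives * catch-function variables
dist_params  <- rep(1, n_dist_vars)            # one per travel-distance variable
shape_params <- 1                              # single shape; rep(1, n_alts) for alternative-specific shapes
scale_param  <- 1                              # common scale parameter

# Order: catch-function, travel-distance, shape, common scale
starts3 <- c(catch_params, dist_params, shape_params, scale_param)
length(starts3)
```

With alternative-specific shape parameters the vector would be n_alts - 1 elements longer.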

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

List that contains other data used in the model, see section 8.4.4 in the FishSET user manual for more details (link in the description above): (1) 'griddat': catch-function variables that interact with alternative-specific catch-function parameters and do not vary across alternatives (e.g., vessel gross tonnage). (2) 'intdat': travel-distance variables that interact with travel-distance parameters and the distance matrix and do not vary across alternatives. (3) 'prices': price in terms of $/landings units. This is typically a vector with prices for each observation, but can be a single value representing price for the entire dataset.

alts

Number of alternative choices in the model

project

Name of project

expname

Expected catch table (optional)

mod.name

Name of model run for model result output table

Details

This function is called in discretefish_subroutine when running an EPM model with a Weibull catch function.

Value

ld: negative log likelihood


Return names of expected catch matrices

Description

Return the names of expected catch matrices saved to the FishSET database.

Usage

exp_catch_names(project)

Arguments

project

Name of project.


Get Expected Catch List

Description

Returns the Expected Catch list from the FishSET database.

Usage

expected_catch_list(project, name = NULL)

Arguments

project

Name of project.

name

Name of expected catch table from the FishSET database. The table name will contain the string "ExpectedCatch". If NULL, the default table is returned. Use tables_database to see a list of FishSET database tables by project.


Explore starting value parameter space

Description

Shotgun method to find better parameter starting values by exploring starting value parameter space.

Usage

explore_startparams(project, space, dev, startsr = NULL)

Arguments

project

String, name of project.

space

List of length 1 or length equal to the number of models to be evaluated. space is the number of starting value permutations to test (the size of the space to explore). The greater the dev argument, the larger the space argument should be.

dev

List of length 1 or length equal to the number of models to be evaluated. dev refers to how far to deviate from the average parameter values when exploring (random normal deviates). The less certain the average parameters are, the greater the dev argument should be.

startsr

Optional. List, average starting value parameters for revenue/location-specific covariates then cost/distance. The best guess at what the starting value parameters should be (e.g. all ones). Specify starting value parameters for each model if values should be different than ones. The number of starting value parameters should correspond to the likelihood and data that you want to test.

Details

Function is used to identify better starting parameters when convergence is an issue. For more details on the likelihood functions or data, see make_model_design. Function calls the model design file and should be used after the make_model_design function is called.
If more than one model is defined in the model design file, then starting parameters must be defined for each model.
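The idea behind the shotgun search can be sketched in base R without a FishSET project. The example below is an illustration only, not the function's internal code: toy_nll is a hypothetical stand-in for a model likelihood, and the object names simply mirror the returned data frames described under Value.

```r
set.seed(42)

# Toy negative log-likelihood standing in for a real model likelihood (hypothetical)
toy_nll <- function(par) sum((par - c(0.5, -1, 2))^2)

startsr <- c(1, 1, 1)  # average starting values (best guess)
space   <- 15          # number of starting-value permutations to test
dev     <- 3           # spread of the random normal deviates around startsr

# Each row is one candidate starting vector drawn around startsr
savestarts <- matrix(rnorm(space * length(startsr), mean = startsr, sd = dev),
                     nrow = space, byrow = TRUE)

# Likelihood value for each candidate
saveLLstarts <- apply(savestarts, 1, toy_nll)

# Candidate with the smallest likelihood value
newstart <- savestarts[which.min(saveLLstarts), ]
newstart
```

A larger dev spreads the candidates further from startsr, which is why a larger space is suggested when dev is increased.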

Value

Returns three data frames.

newstart: Chosen starting values with smallest likelihood
saveLLstarts: Likelihood values for each starting value permutation
savestarts: Starting value permutations (corresponding to each saved likelihood value)

Examples

## Not run: 
# Example with only one model specified
results <- explore_startparams('myproject', 15, 3, rep(1,17))

# Example with three models specified
results <- explore_startparams('myproject', space = list(15,10,100),
   dev=list(3,3,1), startsr=list(c(1,2,3), c(1,0, -1), c(0,0,.5)))

# View results
results$startsOut

## End(Not run)

Remove rows based on filter expressions defined in 'filterTable'

Description

Remove rows based on filter expressions defined in 'filterTable'

Usage

filter_dat(dat, project, exp, filterTable = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

exp

How to filter. May be a row in the filter table generated by filter_table that contains a filter expression or the filter expression to apply to the data. If the filter expression is supplied, it should take on the form of "x < 100" or "is.na(x) == FALSE".

filterTable

Name of filter table in FishSET database. Name should contain the phrase 'filterTable'.

Details

Filter data frame based on a predefined filter expression from filter_table or a filter expression. We recommend creating a filter table using filter_table so that filter expressions are stored and easily accessed in the future.
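For readers unfamiliar with filter expressions stored as strings, the base R sketch below shows how such an expression can be evaluated against a data frame. It is an illustration under assumptions: the data and column names are made up, and the sketch keeps rows where the expression is TRUE, which may not match how filter_dat treats matching rows.

```r
# Made-up data for illustration
dat <- data.frame(x = c(50, 150, NA, 75),
                  SEASON = c("A", "B", "A", "A"),
                  stringsAsFactors = FALSE)

# A filter expression in the same string form as the exp argument
filter_exp <- "x < 100 & SEASON == 'A'"

# Evaluate the expression using the columns of dat
keep <- eval(parse(text = filter_exp), envir = dat)

# which() drops rows where the expression evaluates to NA
newdat <- dat[which(keep), ]
newdat
```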

Value

Filtered data frame

Examples

## Not run: 
newdat <- filter_dat(pcodMainDataTable, 'pcod', exp = 3, 
                     filterTable = 'pcodfilterTable01012011')
                     
newdat <- filter_dat(pcodMainDataTable, 'pcod', 
                     exp = 'PERFORMANCE_Code == 1', filterTable = NULL)
                     
newdat <- filter_dat(pcodMainDataTable, "pcod", exp = "SEASON == 'A'",
                     filterTable = NULL)

## End(Not run)

Filter out-of-sample data for model predictions

Description

Filter the out-of-sample dataset and prepare for predictions of fishing probability.

Usage

filter_outsample(
  dat,
  project,
  mod.name,
  spatial_outsample = FALSE,
  zone.dat = NULL,
  spat = NULL,
  zone.spat = NULL,
  outsample_zones = NULL,
  lon.spat = NULL,
  lat.spat = NULL,
  use.scalers = FALSE,
  scaler.func = NULL
)

Arguments

dat

Out-of-sample data

project

Name of project

mod.name

Name of saved model to use. Argument can be the name of the model or can pull the name of the saved "best" model. Leave mod.name empty to use the saved "best" model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

spatial_outsample

Logical, indicate whether the data are out-of-sample spatially or not. Note that models with zone-specific coefficients (e.g., zonal logit) cannot be used to predict data that are out-of-sample spatially. spatial_outsample = FALSE can represent data out-of-sample temporally or out-of-sample based on another variable (e.g., vessel tonnage, gear type, etc.)

zone.dat

Variable in dat that identifies the individual areas or zones.

spat

Required, data file or character. spat is a spatial data file containing information on fishery management or regulatory zone boundaries. Shape, json, geojson, and csv formats are supported. geojson is the preferred format. json files must be converted into geojson. This is done automatically when the file is loaded with read_dat with is.map set to TRUE. spat cannot, at this time, be loaded from the FishSET database.

zone.spat

Variable in spat that identifies the individual areas or zones.

outsample_zones

Vector of out-of-sample zones to filter dat. Only provided as input when running this function in the main app.

lon.spat

Required for csv files. Variable or list from spat containing longitude data. Leave as NULL if spat is a shape or json file.

lat.spat

Required for csv files. Variable or list from spat containing latitude data. Leave as NULL if spat is a shape or json file.

use.scalers

Input for create_model_input(). Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Input for create_model_input(). Function to calculate rescaling factors.
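The rescaling described for use.scalers and scaler.func can be pictured with a short base R sketch. It assumes that normalizing means dividing a numeric vector by its rescaling factor; my_scaler is a hypothetical function of the kind that could be passed as scaler.func, and the details of how create_model_input() applies it are not shown here.

```r
x <- c(10, 20, 30, 40)  # made-up numeric variable

# Hypothetical scaler function; the documented default factor is the mean
my_scaler <- function(v) mean(v, na.rm = TRUE)

scaler   <- my_scaler(x)  # rescaling factor
x_scaled <- x / scaler    # assumption: data are divided by the factor
x_scaled
```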

Details

This function filters the out-of-sample data. If the data are out-of-sample spatially, then set spatial_outsample = TRUE and provide a spatial file (spat) and the zone ID variable in the spatial file (zone.spat). An interactive map is used for selecting out-of-sample zones. If the data are not spatially out-of-sample, then the data are simply filtered for the zones included in the selected model. Note that models with zone-specific coefficients (e.g., zonal logit) cannot predict spatial out-of-sample data. Upon successful execution of filter_outsample() the filtered dataset will be saved to an RDS file in the outputs folder. This function will overwrite the existing RDS file each time it is run.

Value

Returns probability of logit model by choice


Define and store filter expressions

Description

Define and store filter expressions

Usage

filter_table(dat, project, x, exp)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

x

Variable in dat over which filter will be applied.

exp

Filter expression. Should take on the form of "x < 100" or "is.na(x) == FALSE".

Details

This function allows users to define and store data filter expressions which can then be applied to the data. The filter table will be saved in the FishSET database under the project name and 'filterTable'. New filter expressions are added each time the function is run and the table is automatically updated in the FishSET database. The function call will be logged in the log file.

Value

Filter expressions saved as a table to the FishSET database.

Examples

## Not run: 
filter_table(pcodMainDataTable, 'pcod', x = 'PERFORMANCE_Code',
             exp = 'PERFORMANCE_Code == 1')

## End(Not run)

Identify geographic centroid of fishery management or regulatory zone

Description

Identify geographic centroid of fishery management or regulatory zone

Usage

find_centroid(
  spat,
  project,
  spatID,
  lon.spat = NULL,
  lat.spat = NULL,
  cent.name = NULL,
  log.fun = TRUE
)

Arguments

spat

Spatial data containing information on fishery management or regulatory zones. Can be shape file, json, geojson, data frame, or list.

project

Name of project

spatID

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, spatID should be name of list containing information on zones.

lon.spat

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

lat.spat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

cent.name

String, name to include in centroid table. Centroid names take the form of '"projectNameZoneCentroid"'. Defaults to 'NULL' (e.g. '"projectZoneCentroid"').

log.fun

Logical, whether to log function call (for internal use).

Details

Returns the geographic centroid of each area/zone in spat. The centroid table is saved to the FishSET database. Function is called by the create_alternative_choice and create_dist_between functions.

Value

Returns a data frame where each row is a unique zone and columns are the zone ID and the latitude and longitude defining the centroid of each zone.


Create fishing or weighted fishing centroid

Description

Create fishing or weighted fishing centroid

Usage

find_fishing_centroid(
  dat,
  project,
  zoneID,
  weight.var = NULL,
  lon.dat,
  lat.dat,
  names = NULL,
  cent.name = NULL,
  log.fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project

zoneID

Variable in dat that identifies zonal assignments. If spat is class sf, zoneID should be the name of the list containing information on zones.

weight.var

Variable from dat for weighted average. If weight.var is defined, the centroid is defined by the latitude and longitude of fishing locations in each zone weighted by weight.var.

lon.dat

Required. Longitude variable in dat.

lat.dat

Required. Latitude variable in dat.

names

The names of the fishing centroid columns to be added. A vector of length two in the order of c("lon", "lat"). The default is c("fish_cent_lon", "fish_cent_lat") and c("weight_cent_lon", "weight_cent_lat") if weight.var is used.

cent.name

A string to include in the centroid table name. Table names take the form of '"projectNameFishCentroid"' for fishing centroids.

log.fun

Logical, whether to log function call (for internal use).

Details

Fishing centroid defines the centroid by the mean latitude and longitude of fishing locations in each zone. Weighted centroid defines the centroid by the mean latitude and longitude of fishing locations in each zone weighted by weight.var. The fishing and weighted centroid variables can be used anywhere latitude/longitude variables appear. Each observation in dat must be assigned to a fishery or regulatory area/zone. If the zone identifier exists in dat and is not called 'ZoneID', then zoneID should be the variable name containing the zone identifier. If a zone identifier variable does not exist in dat, spat must be specified and zoneID must be the zone identifier in spat. The assignment_column function will be run and a zone identifier variable added to dat.
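The two centroid definitions can be sketched in base R as follows. This is an illustration with made-up data, not the function's own code; the column names lon, lat, and catch are assumptions, while the output names follow the defaults listed for the names argument.

```r
# Made-up haul locations with a zone identifier and a weighting variable
dat <- data.frame(
  ZoneID = c(1, 1, 2, 2),
  lon    = c(-170, -168, -160, -162),
  lat    = c(55, 57, 58, 60),
  catch  = c(1, 3, 2, 2)
)

# One row of centroids per zone
centroids <- do.call(rbind, lapply(split(dat, dat$ZoneID), function(z) {
  data.frame(
    ZoneID          = z$ZoneID[1],
    fish_cent_lon   = mean(z$lon),                   # unweighted fishing centroid
    fish_cent_lat   = mean(z$lat),
    weight_cent_lon = weighted.mean(z$lon, z$catch), # centroid weighted by catch
    weight_cent_lat = weighted.mean(z$lat, z$catch)
  )
}))

# Attach the centroid columns to every observation
dat_out <- merge(dat, centroids, by = "ZoneID")
centroids
```

In zone 1 the weighted centroid is pulled toward the haul with the larger catch, while in zone 2 the weights are equal and both centroids coincide.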

Value

Returns primary dataset with fishing centroid and, if weight.var is specified, the weighted fishing centroid.


Compare imported data table to the previously saved version of the data table

Description

Compare imported data table to the previously saved version of the data table

Usage

fishset_compare(x, y, compare = c(TRUE, FALSE), project)

Arguments

x

Updated data table to be saved.

y

Previously saved version of data table.

compare

Logical, if TRUE, compares x to y before saving x to FishSET database.

project

Name of project

Details

Function is optional. It is designed to check for consistency between versions of the same data frame so that the logged functions can be used to rerun the previous analysis on the updated data. The column names, including spelling and capitalization, must match the previous version to use the logged functions to rerun code after data has been updated (i.e., new year of data). The function is called by the data import functions (load_maindata, load_port, load_aux, load_grid). Set the compare argument to TRUE to compare column names of the new and previously saved data tables. The new data tables will be saved to the FishSET database if column names match. Set the compare argument to FALSE if no previous versions of the data table exist in the FishSET database. No comparison will be made and the new file will be saved to the database.
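The kind of column-name check described above can be sketched in base R. The example is illustrative only and uses made-up tables; it shows how a difference in capitalization alone is enough for names not to match.

```r
# Made-up previously saved table and updated table
old_dat <- data.frame(PERMIT = 1:2, HAUL = c(5, 6))
new_dat <- data.frame(PERMIT = 3:4, Haul = c(7, 8))  # note the capitalization

# Column names present in one table but not the other (case-sensitive)
missing_in_new <- setdiff(names(old_dat), names(new_dat))
extra_in_new   <- setdiff(names(new_dat), names(old_dat))

names_match <- length(missing_in_new) == 0 && length(extra_in_new) == 0
names_match
```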


Show all SQL Tables in FishSET Folder

Description

Returns a data frame containing all tables from each project by project name and table type.

Usage

fishset_tables(project = NULL)

Arguments

project

Project name. If NULL, tables from all available projects will be displayed.

Examples

## Not run: 
# return all tables for all projects
fishset_tables()

# return all tables for a specific project
fishset_tables("pollock")

## End(Not run)

Create fleet variable using fleet definition table

Description

Add a fleet ID column to the main data using a fleet table (see fleet_table for details).

Usage

fleet_assign(
  dat,
  project,
  fleet_tab,
  assign = NULL,
  overlap = FALSE,
  format_var = "string"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

fleet_tab

String, name of the fleet table stored in FishSET database. Should contain the string 'FleetTable'.

assign

Integer, a vector of row numbers from fleet_tab. Only fleet definitions in these rows will be used and added to 'MainDataTable'. If assign = NULL (the default), all fleet definitions in the table will be used.

overlap

Logical; whether overlapping fleet assignments are allowed. Defaults to FALSE.

format_var

String. If format_var = "string", a single column named "fleet" will be added to 'MainDataTable'. If overlap = TRUE, observations with multiple fleet assignments are duplicated. format_var = "dummy" outputs a binary column for each fleet in the fleet table. Defaults to "string".

Value

Returns the primary dataset with added fleet variable(s).

See Also

fleet_table

Examples

## Not run: 
fleet_assign(pollockMainDataTable, 'pollock', fleet_tab = 'pollockFleetTable', 
             overlap = TRUE)

## End(Not run)

Define and store fleet expressions

Description

fleet_table saves a table of fleet expressions to the FishSET database which can then be applied to a dataset with fleet_assign. The table must contain a 'condition' and 'fleet' column with each row corresponding to a set of expressions that will be used to assign observations to fleets. A table can be created with the cond and fleet_val arguments or by uploading an existing table that matches the format requirements. See 'Details' below for examples of how tables can be formatted.

Usage

fleet_table(
  dat,
  project,
  cond = NULL,
  fleet_val = NULL,
  table = NULL,
  save = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

cond

String; a vector containing valid R expressions saved as strings. Must be used with the fleet_val argument. Each expression should be in quotes (double or single) with nested quotes indicated with escaped quotes (\') or with the opposite quote used to contain the expression. For example, "species == 'cod'" and "species == \'cod\'" are both valid.

fleet_val

String; a vector of fleet names to be assigned. Must be used with the cond argument.

table

A data frame that has one condition column and one fleet column. See 'Details' for table formatting.

save

Logical; whether to save the current fleet_table to the FishSET database. Defaults to TRUE. Tables are saved in the format of 'projectFleetTable'. Each project can only have one fleet table. New fleet definitions are appended to the existing fleet table. See table_remove to delete a table.

Details

Below is a simple example of a fleet table. For a fleet table to be created, it must contain one "condition" column and one "fleet" column. Each fleet definition can be as long as necessary. For example, the first expression in the condition column example could also be "GEAR == 8 & species == 'pollock'". Use the '&' operator when combining expressions.

condition                 fleet
'GEAR == 8'               'A'
'species == "cod"'        'B'
'area %in% c(640, 620)'   'C'
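To show how a table like the one above can translate into fleet assignments, here is a base R sketch with made-up data. It is not the code used by fleet_assign; in particular, taking the first matching fleet when definitions overlap is an assumption of this sketch.

```r
# Made-up observations
dat <- data.frame(
  GEAR    = c(8, 2, 2, 8),
  species = c("pollock", "cod", "sole", "cod"),
  area    = c(610, 620, 640, 630),
  stringsAsFactors = FALSE
)

# Fleet table with one condition column and one fleet column
fleet_tab <- data.frame(
  condition = c("GEAR == 8", "species == 'cod'", "area %in% c(640, 620)"),
  fleet     = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# One logical column per fleet, similar in spirit to a "dummy" format
dummy <- sapply(fleet_tab$condition,
                function(cond) eval(parse(text = cond), envir = dat))
colnames(dummy) <- fleet_tab$fleet

# Single fleet column: first matching fleet per row (assumption), NA if none match
dat$fleet <- apply(dummy, 1, function(r) {
  if (any(r)) colnames(dummy)[which(r)[1]] else NA_character_
})
dat
```

Rows 2 and 4 satisfy two conditions each, which is the situation the overlap argument of fleet_assign is meant to control.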

Value

Returns a table of fleet conditions that is saved to the FishSET database with the name 'projectFleetTable'.

Examples

## Not run:  
fleet_table("MainDataTable", "myProject", 
            cond = c("GEAR == 8", "species == 'cod'", "area %in% c(640, 620)"),
            fleet_val = c("A", "B", "C"), save = TRUE
            ) 

## End(Not run)

Format Gridded Data

Description

Change the format of a gridded dataset from wide to long (or vice versa) and remove any unmatched area/zones from grid. This is a necessary step for including gridded variables in the conditional logit (logit_c()) model.

Usage

format_grid(
  grid,
  dat,
  project,
  dat.key,
  area.dat,
  area.grid = NULL,
  id.cols,
  from.format = "wide",
  to.format = "wide",
  val.name = NULL,
  save = FALSE
)

Arguments

grid

Gridded dataset to format.

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Name of project.

dat.key

String, name of column(s) in MainDataTable to join by. The number of columns must match id.cols.

area.dat

String, the name of the area or zone column in dat.

area.grid

String, the name of the area or zone column in grid if from.format = "long". Ignored if from.format = "wide".

id.cols

String, the names of columns from grid that are neither area (area.grid) nor value (val.name) columns, for example date or period column(s).

from.format

The original format of grid. Options include "long" or "wide". Use "long" if a single area column exists in grid. Use "wide" if grid contains a column for each area.

to.format

The desired format of grid. Options include "long" or "wide". Use "long" if you want a single area column with a corresponding value column. Use "wide" if you would like each area to have its own column.

val.name

Required if converting from wide to long or long to wide format. When from.format = "wide" and to.format = "long", val.name will be the name of the new value variable associated with the area column. When from.format = "long" and to.format = "wide", val.name will be the name of the existing value variable associated with the area column.

save

Logical, whether to save formatted grid. When TRUE, the table will be saved with the string "Wide" or "Long" appended depending on the value of to.format.
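The difference between the "wide" and "long" formats can be seen in a small base R sketch. The data, the area names, and the column names date, area, and value are made up; value plays the role of val.name and date the role of id.cols. The sketch only illustrates the reshaping idea and the removal of unmatched areas, not the function's implementation.

```r
# Wide format: one column per area (made-up areas "509" and "513")
grid_wide <- data.frame(
  date  = c("2020-01-01", "2020-01-02"),
  `509` = c(1.2, 1.5),
  `513` = c(0.8, 0.9),
  check.names = FALSE
)

area_cols <- setdiff(names(grid_wide), "date")

# Long format: a single area column with a corresponding value column
grid_long <- data.frame(
  date  = rep(grid_wide$date, times = length(area_cols)),
  area  = rep(area_cols, each = nrow(grid_wide)),
  value = unlist(grid_wide[area_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only areas that also occur in the primary data (hypothetical set)
dat_areas    <- c("509")
grid_matched <- grid_long[grid_long$area %in% dat_areas, ]
grid_long
```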

See Also

merge_dat()


Reformat out-of-sample model coefficients

Description

Reformat out-of-sample model coefficients by removing zones not included in the out-of-sample dataset

Usage

format_outsample_coefs(in_zones, out_zones, Eq, likelihood)

Arguments

in_zones

Vector of zoneIDs in the in-sample dataset

out_zones

Vector of zoneIDs in the out-of-sample dataset

Eq

Tibble containing estimated model coefficients (including standard errors and t-values)

likelihood

Character, name of the likelihood

Value

Return a list with (1) vector of coefficients (zones not in the out-of-sample dataset removed) and (2) flag indicating if the first alt (in-sample dataset) is not included in the out-of-sample dataset.


Display summary of function calls

Description

Display summary of function calls

Usage

function_summary(project, date = NULL, type = "dat_load", show = "all")

Arguments

project

Project name.

date

Character string; the date of the log file to retrieve. If NULL, the most recent log is pulled.

type

The type of function to display. Options include "dat_load", "dat_quality", "dat_create", "dat_exploration", "fleet", and "model".

show

Whether to display "all" calls, the "last" (most recent) call, or the "first" (oldest) function call from the log file.

Details

Displays a list of functions by type and their arguments from a log file. If no date is entered the most recent log file is pulled.

See Also

filter_summary

Examples

## Not run: 
function_summary("pollock")

## End(Not run)

Retrieve closure scenario by project

Description

Retrieve closure scenario by project

Usage

get_closure_scenario(project)

Arguments

project

Name of project.

Examples

## Not run: 
get_closure_scenario("pollock")

## End(Not run)

Return cached confidentiality tables

Description

This function lists the confidentiality "check" tables used to suppress values.

Usage

get_confid_cache(project, show = "all")

Arguments

project

Name of project

show

Output "all" tables, "last" table, or "first" table.

Value

A list of tables containing suppression conditions.

See Also

reset_confid_cache


Return the confidentiality settings

Description

This function returns the confidentiality settings from project settings file.

Usage

get_confid_check(project)

Arguments

project

Name of project

Value

A list containing the confidentiality parameters: check, v_id, rule, and value.

See Also

set_confid_check get_proj_settings


Retrieve grid log file

Description

Retrieves the grid log file for a project. The grid log shows which grid files are currently saved to the project data folder.

Usage

get_grid_log(project)

Arguments

project

Name of project.

Details

The grid log is a list containing information about the grid files currently saved to the project data folder. Each grid entry contains three fields: grid_name, closure_name, and combined_areas. grid_name is the name of the original grid object. If the other two fields are empty, this means that the grid file has not been altered and is the same as the original. closure_name is the name of a second grid file containing closure areas that were combined with grid_name. combined_areas are the names/IDs of the closures areas from the closure grid file that were combined with grid_name.

Examples

## Not run: 
get_grid_log("pollock")

## End(Not run)

Pull data from latest project file

Description

Pull data from latest project file

Usage

get_latest_projectfile(project, mod.name)

Arguments

project

Project name

mod.name

Model name

Examples

## Not run: 
get_latest_projectfile("pollock", "logit_mod1")

## End(Not run)

Retrieve project settings

Description

Retrieve project settings

Usage

get_proj_settings(project, format = FALSE)

Arguments

project

Name of project.

format

Logical, output project settings using pander. Useful for markdown documents.

Details

The project settings file includes confidentiality settings, the user output folder directory, and the default plot saving size.


Calculate and view Getis-Ord statistic

Description

Wrapper function to calculate global and local Getis-Ord by discrete area

Usage

getis_ord_stats(
  dat,
  project,
  varofint,
  zoneid,
  spat,
  cat,
  lon.dat = NULL,
  lat.dat = NULL,
  lon.spat = NULL,
  lat.spat = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

varofint

Numeric variable in dat to test for spatial high/low clustering.

zoneid

Variable in dat that identifies the individual zones or areas. Define if exists in dat and is not named 'ZoneID'. Defaults to NULL.

spat

Spatial data containing information on fishery management or regulatory zones. See load_spatial.

cat

Variable in spat defining the individual areas or zones.

lon.dat

Longitude variable in dat. Required if zoneid is not defined.

lat.dat

Latitude variable in dat. Required if zoneid is not defined.

lon.spat

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

lat.spat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

Details

Calculates the degree, within each zone, that high or low values of the varofint cluster in space. Function utilizes the localG and knearneigh functions from the spdep package. The spatial input is a row-standardized spatial weights matrix for computed nearest neighbor matrix, which is the null setting for the nb2listw function. Requires a data frame with area as a factor, the lon/lat centroid for each area, the lat/lon outlining each area, and the variable of interest (varofint) or a map file with lat/lon defining boundaries of area/zones and variable of interest for weighting. Also required is the lat/lon defining the center of a zone/area. If the centroid is not included in the map file, then find_centroid can be called to calculate the centroid of each zone. If the variable of interest is not associated with an area/zone then the assignment_column function can be used to assign each observation to a zone. Arguments to identify centroid and assign variable of interest to area/zone are optional and default to NULL.

Value

Returns a plot and table. Both are saved to the output folder.

Examples

## Not run: 
getis_ord_stats(pcodMainDataTable, project = 'pcod', varofint = 'OFFICIAL_MT_TONS',
  spat = spatdat, lon.dat = 'LonLat_START_LON', lat.dat = 'LonLat_START_LAT', cat = 'NMFS_AREA')

## End(Not run)

View error output from discrete choice model for the defined project

Description

Returns error output from running the discretefish_subroutine function. The table argument must be the full name of the table in the FishSET database. Use tables_database to view table names in the FishSET database.

Usage

globalcheck_view(table, project)

Arguments

table

Table name in FishSET database. Should contain the project, the phrase 'LDGlobalCheck', and a date in YMD format (20200101). Table name must be in quotes.

project

Name of project

Examples

## Not run: 
globalcheck_view('pcodLDGlobalCheck20190604', 'pcod')

## End(Not run)

Create a within-group running sum variable

Description

Create a within-group running sum variable

Usage

group_cumsum(
  dat,
  project,
  group,
  sort_by,
  value,
  name = "group_cumsum",
  create_group_ID = FALSE,
  drop_total_col = FALSE
)

Arguments

dat

Primary data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

String, project name.

group

String, the grouping variable(s) to sum value by. Used to create the "group_total" variable.

sort_by

String, a date variable to order 'MainDataTable' by.

value

String, the value variable used to calculate cumulative sum. Must be numeric.

name

String, the name for the new variable. Defaults to "group_cumsum".

create_group_ID

Logical, whether to create a group ID variable using ID_var. Defaults to FALSE.

drop_total_col

Logical, whether to remove the "group_total" variable created to calculate the cumulative sum. Defaults to FALSE.

Details

group_cumsum sums value by group, then cumulatively sums within groups. For example, a running sum by trip variable can be made by entering variables that identify unique vessels and trips into group and a numeric variable (such as catch or # of hauls) into value. Each vessel's trip total is calculated then cumulatively summed. The "group_total" variable gives the total value by group and can be dropped by setting drop_total_col = TRUE. A group ID column can be created using the variables in group by setting create_group_ID = TRUE.
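A within-group total and running sum can be sketched in base R with ave(). This is one possible reading of the description above, using made-up data, and the output of group_cumsum itself may be organized differently.

```r
# Made-up haul-level data
dat <- data.frame(
  PERMIT    = c(1, 1, 1, 2, 2),
  TRIP_ID   = c(10, 10, 11, 20, 21),
  HAUL_DATE = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05",
                        "2020-01-01", "2020-01-03")),
  catch     = c(5, 3, 4, 2, 6)
)

# Order by date so the running sum follows time (role of sort_by)
dat <- dat[order(dat$HAUL_DATE), ]

# Total of value within each vessel/trip group
dat$group_total <- ave(dat$catch, dat$PERMIT, dat$TRIP_ID, FUN = sum)

# Running sum of value within each vessel/trip group
dat$group_cumsum <- ave(dat$catch, dat$PERMIT, dat$TRIP_ID, FUN = cumsum)
dat
```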

Examples

## Not run: 
group_cumsum(pollockMainDataTable, "pollock", group = c("PERMIT", "TRIP_ID"),
             sort_by = "HAUL_DATE", value = "OFFICIAL_TOTAL_CATCH")

## End(Not run)

Create a within-group lagged difference variable

Description

Create a within-group lagged difference variable

Usage

group_diff(
  dat,
  project,
  group,
  sort_by,
  value,
  name = "group_diff",
  lag = 1,
  create_group_ID = FALSE,
  drop_total_col = FALSE
)

Arguments

dat

Primary data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

String, project name.

group

String, the grouping variable(s) to sum value by. Used to create the "group_total" variable.

sort_by

String, a date variable to order 'MainDataTable' by.

value

String, the value variable used to calculate lagged difference. Must be numeric.

name

String, the name for the new variable. Defaults to "group_diff".

lag

Integer, adjusts lag length. Defaults to 1.

create_group_ID

Logical, whether to create a group ID variable using ID_var. Defaults to FALSE.

drop_total_col

Logical, whether to remove the "group_total" variable created to calculate the lagged difference. Defaults to FALSE.

Details

group_diff creates a grouped lagged difference variable. value is first summed by the variable(s) in group, then the difference within-group is calculated. The "group_total" variable gives the total value by group and can be dropped by setting drop_total_col = TRUE. A group ID column can be created using the variables in group by setting create_group_ID = TRUE.

Examples

## Not run: 
group_diff(pollockMainDataTable, "pollock", group = c("PERMIT", "TRIP_ID"),
           sort_by = "HAUL_DATE", value = "HAUL")

## End(Not run)

Create a within-group percentage variable

Description

Create a within-group percentage variable

Usage

group_perc(
  dat,
  project,
  id_group,
  group = NULL,
  value,
  name = "group_perc",
  create_group_ID = FALSE,
  drop_total_col = FALSE
)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

String, project name.

id_group

String, primary grouping variable(s). Used to create the "total_value" variable which sums value by id_group. If group = NULL, then value is divided by "total_value".

group

String, secondary grouping variable(s). Used to create the "group_total" variable which sums value by id_group and group. Percentage is calculated by dividing "group_total" by "total_value". Defaults to NULL.

value

String, the value variable used to calculate percentage. Must be numeric.

name

String, the name for the new variable. Defaults to "group_perc".

create_group_ID

Logical, whether to create a group ID variable using ID_var. Defaults to FALSE.

drop_total_col

Logical, whether to remove the "total_value" and "group_total" variables created to calculate percentage. Defaults to FALSE.

Details

group_perc creates a within-group percentage variable using a primary group ID (id_group) and secondary group (group). The total value of id_group is stored in the "total_value" variable, and the within-group total is stored in "group_total". The group percentage is calculated using these two function-created variables. "total_value" and "group_total" can be dropped by setting drop_total_col = TRUE. A group ID column can be created using the variables in id_group and group by setting create_group_ID = TRUE.
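
The calculation described above can be sketched in base R. This is an illustration only, not the package's implementation: the toy data values are made up, and the 0 to 100 scale of the result is an assumption.

```r
# Toy haul-level data; column names mirror the package examples, values are invented.
hauls <- data.frame(
  PERMIT = c("A", "A", "A", "B", "B"),
  DISEMBARKED_PORT = c("P1", "P1", "P2", "P1", "P1"),
  HAUL = c(10, 30, 60, 5, 15),
  stringsAsFactors = FALSE
)

# "total_value": sum of value within the primary group (id_group).
hauls$total_value <- ave(hauls$HAUL, hauls$PERMIT, FUN = sum)

# "group_total": sum of value within id_group and the secondary group.
# A pasted key avoids empty id_group/group combinations.
key <- paste(hauls$PERMIT, hauls$DISEMBARKED_PORT, sep = "_")
hauls$group_total <- ave(hauls$HAUL, key, FUN = sum)

# Percentage of the primary-group total (the 0-100 scale is an assumption).
hauls$group_perc <- hauls$group_total / hauls$total_value * 100
```

Here permit A lands 40 percent of its total at port P1 and 60 percent at P2, while permit B lands everything at P1.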

Examples

## Not run: 
group_perc(pollockMainDataTable, "pollock", id_group = "PERMIT", group = NULL, 
           value = "OFFICIAL_TOTAL_CATCH_MT")
           
group_perc(pollockMainDataTable, "pollock", id_group = "PERMIT",
           group = "DISEMBARKED_PORT", value = "HAUL")

## End(Not run)

Collapse data frame from haul to trip

Description

Collapse data frame from haul to trip

Usage

haul_to_trip(
  dat,
  project,
  fun.numeric = mean,
  fun.time = mean,
  tripID,
  haul_count = TRUE,
  log_fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

fun.numeric

How to collapse numeric or temporal data. For example, min, mean, max, sum. Defaults to mean.

fun.time

How to collapse temporal data. For example, min, mean, max. Cannot be sum for temporal variables.

tripID

Column(s) that identify the individual trip.

haul_count

Logical, whether to return a column of the number of hauls per trip.

log_fun

Logical, whether to log function call (for internal use).

Details

Collapses primary dataset from haul to trip level. Unique trips are defined based on selected column(s), for example, landing permit number and disembarked port. This id column is used to collapse the data to trip level. fun.numeric and fun.time define how multiple observations for a trip are collapsed. For variables that are not numeric or dates, the first observation is used.
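
A minimal base R sketch of the collapsing logic, assuming a toy data frame with no date columns (so fun.time is not exercised). The helper collapse_trip is hypothetical and is not part of FishSET.

```r
# Toy haul-level data; values are invented for illustration.
hauls <- data.frame(
  PERMIT = c(1, 1, 2, 2, 2),
  DISEMBARKED_PORT = c("P1", "P1", "P2", "P2", "P2"),
  CATCH = c(10, 20, 5, 10, 15),
  GEAR = c("trawl", "trawl", "pot", "pot", "pot"),
  stringsAsFactors = FALSE
)

# Trip key built from the columns that identify a trip.
trip_key <- paste(hauls$PERMIT, hauls$DISEMBARKED_PORT, sep = "_")

# Hypothetical helper: numeric columns are collapsed with fun.numeric,
# all other columns keep their first observation.
collapse_trip <- function(df, fun.numeric = mean) {
  out <- lapply(df, function(col) {
    if (is.numeric(col)) fun.numeric(col) else col[1]
  })
  out$haul_count <- nrow(df)  # number of hauls in the trip
  as.data.frame(out, stringsAsFactors = FALSE)
}

trips <- do.call(rbind, lapply(split(hauls, trip_key), collapse_trip))
```

Each row of trips is one trip, with the mean CATCH per trip and a haul_count column.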

Value

Returns the primary dataset where each row is a trip.

Examples

## Not run: 
pollockMainDataTable <- haul_to_trip("pollockMainDataTable", "pollock",
    fun.numeric = min, fun.time = mean,
    tripID = c("PERMIT", "DISEMBARKED_PORT")
    )

## End(Not run)

Create ID variable

Description

Create ID variable from one or more variables

Usage

ID_var(
  dat,
  project,
  vars,
  name = NULL,
  type = "string",
  drop = FALSE,
  sep = "_",
  log_fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

vars

Character string, additional column(s) in dat that define unique observations.

name

String, name of new ID column.

type

String, the class type of the new ID column. Choices are 'string' or 'integer'. 'string' returns a character vector where each column in vars is combined and separated by sep. 'integer' returns an integer vector where each value corresponds to a unique group in vars.

drop

Logical, whether to drop columns in vars.

sep

Symbol used to combine variables.

log_fun

Logical, whether to log function call (for internal use).

Details

ID variable can be based on a single or multiple variables. Use drop = TRUE to drop the variables used to create the ID variable.

Value

Returns the 'MainDataTable' with the ID variable included.
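
The two ID types can be illustrated with base R. This is a sketch, not the package's code: the toy data is invented, and numbering integer IDs in order of first appearance is an assumption.

```r
# Toy data; column names follow the package example, values are invented.
dat <- data.frame(
  GEAR_TYPE = c("trawl", "trawl", "pot", "trawl"),
  TRIP_SEQ = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# type = "string": paste the columns together, separated by sep.
id_string <- paste(dat$GEAR_TYPE, dat$TRIP_SEQ, sep = "_")

# type = "integer": one integer per unique combination of vars
# (numbering by first appearance is an assumption).
id_integer <- match(id_string, unique(id_string))
```

Rows 1 and 4 share the same gear type and trip sequence, so they receive the same ID under both types.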

Examples

## Not run: 
pcodMainDataTable <- ID_var(pcodMainDataTable, "pcod", name = "PermitID", 
        vars = c("GEAR_TYPE", "TRIP_SEQ"), type = 'integer')
pcodMainDataTable <- ID_var(pcodMainDataTable, "pcod", name = "PermitID", 
        vars = c("GEAR_TYPE", "TRIP_SEQ"), type = 'string', sep="_")

## End(Not run)

Insert plot from user folder

Description

Insert plot from user folder

Usage

insert_plot(out, project)

Arguments

out

String, plot file name.

project

Name of project.

Examples

## Not run: 
insert_plot("pollock_plot.png", project = "pollock")

## End(Not run)

Insert table from user folder

Description

Insert table from user folder

Usage

insert_table(out, project)

Arguments

out

String, table file name.

project

Name of project.

Examples

## Not run: 
insert_table("pollock_table.csv", project = "pollock")

## End(Not run)

Jitter longitude and latitude variables

Description

Jitter longitude and latitude variables

Usage

jitter_lonlat(dat, project, lon, lat, factor = 1, amount = NULL)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

lon

String, variable name containing longitude.

lat

String, variable name containing latitude.

factor

Numeric, see jitter for details.

amount

Numeric, see jitter for details. Default (NULL): factor * d/5 where d is about the smallest difference between x values.

Details

This is one of the FishSET confidentiality functions. It "jitters" longitude and latitude using the base R function jitter.
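
A short base R sketch of what jittering does to coordinates, using jitter directly on invented values. With amount = NULL, each point moves by at most factor/5 times the smallest gap between values.

```r
set.seed(42)  # jitter() is random; the seed makes the sketch repeatable

# Toy coordinates; column names follow the package example, values are invented.
dat <- data.frame(
  LonLat_START_LON = c(-170.10, -169.85, -170.40),
  LonLat_START_LAT = c(56.20, 56.45, 55.90)
)

jittered <- dat
jittered$LonLat_START_LON <- jitter(dat$LonLat_START_LON, factor = 1, amount = NULL)
jittered$LonLat_START_LAT <- jitter(dat$LonLat_START_LAT, factor = 1, amount = NULL)

# Largest longitude displacement; the smallest gap in longitude is 0.25,
# so the bound is 1/5 * 0.25 = 0.05 degrees.
max_shift <- max(abs(jittered$LonLat_START_LON - dat$LonLat_START_LON))
```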

Examples

## Not run: 
jitter_lonlat(pollockMainDataTable, "pollock",
              lon = "LonLat_START_LON", lat = "LonLat_START_LAT")

## End(Not run)

View list of all log files

Description

View list of all log files

Usage

list_logs(project = NULL, chron = FALSE, modified = FALSE)

Arguments

project

Project name. Displays all logs if NULL.

chron

Logical, whether to display logs in chronological order (TRUE) or reverse chronological order (FALSE).

modified

Logical, whether to include date modified.


Display FishSET database tables by type

Description

Show project table names by table type. To see all tables for all projects in the FishSETFolder, use fishset_tables.

Usage

list_tables(project, type = "main")

Arguments

project

A project name to show main tables by.

type

The type of fishset_db table to search for. Options include "main" (MainDataTable), "port" (PortTable), "spat" (SpatTable), "grid" (GridTable), "aux" (AuxTable), "ec" (ExpectedCatch), "altc" (AltMatrix), "info" (MainDataTableInfo), "gc" (ldglobalcheck), "fleet" (FleetTable), "filter" (FilterTable), "centroid" (Centroid or FishCentroid), "model" (ModelOut), "model data" or "model design" (ModelInputData), "outsample" (OutSampleDataTable).

Examples

## Not run: 
list_tables("pollock", type = "main")
list_tables("pollock", "ec")

## End(Not run)

Import, parse, and save auxiliary data to FishSET database

Description

Auxiliary data is additional data that connects to the primary dataset. Function pulls the data, parses it, and then saves the data to the FishSET database. A project must exist before running load_aux(). See load_maindata to create a new project.

Usage

load_aux(dat, aux, name, over_write = TRUE, project = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

aux

File name, including path, of auxiliary data.

name

Name auxiliary data should be saved as in FishSET database.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database.

project

String, name of project.

Details

Auxiliary data is any additional data beyond the primary data and the port data. Auxiliary data can be any data that can be merged with the primary dataset (e.g., prices by date, vessel characteristics, or fishery season). The auxiliary data does not have to be at a haul or trip level but must contain a variable to connect the auxiliary data to the primary dataset. The function checks that at least one column name of the auxiliary data matches a column name in the primary dataset. The function checks that each row is unique, that no variables are empty, and that column names are case-insensitive unique. These data issues are resolved before the data is saved to the database. The data is saved in the FishSET database as the raw data and the working data. The naming convention for auxiliary tables is "projectNameAuxTable". Date is also added to the name for the raw data. See table_view to view/load auxiliary tables into the working environment.
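
The checks listed above can be approximated in base R. This is a sketch only: check_aux is a hypothetical helper, not a FishSET function, and the toy tables are deliberately flawed so two checks fail.

```r
# Toy tables; the auxiliary table has a duplicated row and two column
# names that differ only by case.
main <- data.frame(PERMIT = c(1, 2), HAUL_DATE = c("2020-01-01", "2020-01-02"))
aux <- data.frame(PERMIT = c(1, 2, 2), permit = c(1, 2, 2),
                  VESSEL_LENGTH = c(30, 45, 45))

# Hypothetical helper returning one logical per check.
check_aux <- function(main, aux) {
  list(
    # at least one column name is shared with the primary data
    shares_column = length(intersect(names(main), names(aux))) > 0,
    # every row is unique
    rows_unique = !any(duplicated(aux)),
    # no column is entirely missing
    no_empty_cols = !any(vapply(aux, function(col) all(is.na(col)), logical(1))),
    # column names are unique when case is ignored
    names_ci_unique = !any(duplicated(tolower(names(aux))))
  )
}

checks <- check_aux(main, aux)
```

For this toy table, rows_unique and names_ci_unique are FALSE, which are the kinds of issues the function resolves before saving.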

See Also

table_view, load_maindata, write_dat

Examples

## Not run: 
# "PATH/TO/AUX" is a placeholder for the auxiliary data file
load_aux(pcodMainDataTable, aux = "PATH/TO/AUX", name = 'FisherySeason',
         over_write = TRUE, project = 'pcod')

## End(Not run)

Load data from FishSET database into the R environment

Description

Load data from FishSET database into the R environment

Usage

load_data(project, name = NULL)

Arguments

project

String, name of project.

name

Optional, name of table in FishSET database. Use this argument if pulling raw or dated table (not the working table).

Details

Pulls the primary data table from the FishSET database and loads it into the working environment as the project and MainDataTable. For example, if the project was pollock, then data would be saved to the working environment as 'pollockMainDataTable'.

Value

Data loaded to working environment as the project and ‘MainDataTable’.

Examples

## Not run: 
load_data('pollock')

load_data('pollock', 'pollockMainDataTable20190101')

## End(Not run)

Import, parse, and save gridded data to FishSET database

Description

Gridded data is data that varies by two dimensions. Column names must be zone names. Load, parse, and save gridded data to FishSET database. A project must exist before running load_grid(). See load_maindata to create a new project.

Usage

load_grid(dat, grid, name, over_write = TRUE, project = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

grid

File name, including path, of gridded data.

name

Name gridded data should be saved as in FishSET database.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database.

project

String, name of project.

Details

Grid data is an optional data frame that contains a variable that varies by the map grid (e.g., sea surface temperature, wind speed). Data can also vary by a second dimension (e.g., date/time). Both dimensions in the gridded data file need to be variables included in the primary data set. The grid locations (zones) must define the columns and the optional second dimension defines the rows. The row variable must have the same name as the variable in the main data frame that it will be linked to. The function DOES NOT check that column and row variables match a variable in the primary data set. The function checks that each row is unique, that no variables are empty, and that column names are case-insensitive unique. These data issues are resolved before the data is saved to the database. The data is saved in the FishSET database as the raw data and the working data. In both cases, the table name combines the project and the name argument. The date is appended to the name of the raw data. The naming convention for gridded tables is "projectNameGridTable". See table_view to view/load gridded tables into the working environment.
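
A small base R sketch of the expected layout, assuming zones named "509" and "513" and a row variable called HAUL_DATE. Both the zone names and the values are hypothetical.

```r
# Gridded table: one row per date, one column per zone.
# check.names = FALSE keeps the numeric zone names unchanged.
sst <- data.frame(
  HAUL_DATE = as.Date(c("2020-01-01", "2020-01-02")),
  `509` = c(3.1, 3.4),
  `513` = c(2.8, 2.9),
  check.names = FALSE
)

# Columns that hold zone values (everything except the row variable).
zone_cols <- setdiff(names(sst), "HAUL_DATE")

# Look up the value for zone 513 on 2020-01-02.
sst_value <- sst[sst$HAUL_DATE == as.Date("2020-01-02"), "513"]
```

The row variable shares its name with a variable in the primary data, which is what links each haul to the correct row of the grid.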

See Also

table_view, load_maindata, write_dat

Examples

## Not run: 
# "PATH/TO/GRID" is a placeholder for the gridded data file
load_grid(dat = 'pcodMainDataTable', grid = "PATH/TO/GRID",
          name = 'SeaSurfaceTemp', over_write = TRUE, project = 'pcod')

## End(Not run)

Import, parse, and save data to the FishSET Database

Description

load_maindata() saves the main dataset to the FishSET Database (located in the FishSETFolder) and is a required step. The main data will also be loaded into the working environment as a dataframe named "projectMainDataTable". Running load_maindata() creates a new project directory in the FishSETFolder. To see a list of existing projects run projects() or open the FishSETFolder.

Usage

load_maindata(dat, project, over_write = FALSE, compare = FALSE, y = NULL)

Arguments

dat

Primary data containing information on hauls or trips. This can be the full path to the file, the name of a main table in the FishSET database, or a dataframe object in the working environment. Main tables in the FishSET database contain the string 'MainDataTable'. A complete list of FishSET tables can be displayed by running fishset_tables().

project

String, name of project. Cannot contain spaces.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database. Defaults to FALSE.

compare

Logical, whether to compare new dataframe to previously saved dataframe y. See fishset_compare.

y

Name of previously saved table in FishSET Database. y must be defined if compare = TRUE.

Details

The dataset is saved in the FishSET database as raw and working tables. The table name is the project and the table type, 'MainDataTable'. The raw table is the original, unedited table. The working table contains any changes made to the table after uploading. An eight digit date string is included in the name of the raw table (e.g. "pollockMainDataTable20220210"). The main data is loaded into the working environment as ‘projectMainDataTable’. When compare = TRUE, fishset_compare compares dat to the existing FishSET table named in y and returns a message noting basic differences between the two. The column names are checked for case-insensitive uniqueness.
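
The naming convention can be reproduced with base R string functions. The sketch below only builds the names; it does not touch the FishSET database, and the date is fixed to match the example above.

```r
project <- "pollock"

# Working table: project name followed by the table type.
working_name <- paste0(project, "MainDataTable")

# Raw table: working name plus an eight digit date string (YYYYMMDD).
raw_name <- paste0(working_name, format(as.Date("2022-02-10"), "%Y%m%d"))
```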

See Also

save_dat, write_dat, load_data, fishset_tables

Examples

## Not run: 
# upload data from filepath
load_maindata(dat = "PATH/TO/DATA", project = "pollock")

# upload from dataframe in working environment
load_maindata(dat = Mydata, project = 'pollock', over_write = TRUE, 
              compare = TRUE, y = 'MainDataTable01012011')
              
# upload from an existing FishSET main data table
load_maindata(dat = "pollockMainDataTable", project = "pollock2020")

## End(Not run)

Import, parse, and save out-of-sample data to FishSET database

Description

load_outsample() saves the out-of-sample dataset to the FishSET Database (located in the FishSETFolder). Its structure must match the main dataset. A project must exist before running load_outsample(). See load_maindata to create a new project. Note: if the data are out-of-sample temporally, upload a new data file; if the data are only out-of-sample spatially, upload the main data file in this function.

Usage

load_outsample(dat, project, over_write = FALSE, compare = FALSE, y = NULL)

Arguments

dat

Out-of-sample data containing information on hauls or trips with the same structure as the main data table. This can be the full path to the file, the name of an out-of-sample table in the FishSET database, or a dataframe object in the working environment. Out-of-sample tables in the FishSET database contain the string 'OutSampleDataTable'. A complete list of FishSET tables can be viewed by running fishset_tables().

project

String, name of project.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database. Defaults to FALSE.

compare

Logical, whether to compare new dataframe to previously saved dataframe y. See fishset_compare.

y

Name of previously saved table in FishSET Database. y must be defined if compare = TRUE.

Details

The out-of-sample dataset is saved in the FishSET database as raw and working tables. The table name is the project and the table type, 'OutSampleDataTable'. The raw table is the original, unedited table. The working table contains any changes made to the table after uploading. An eight digit date string is included in the name of the raw table (e.g. "pollockOutSampleDataTable20220210"). The out-of-sample data is loaded into the working environment as ‘projectOutSampleDataTable’. When compare = TRUE, fishset_compare compares dat to the existing FishSET table named in y and returns a message noting basic differences between the two. The column names are checked for case-insensitive uniqueness.

See Also

load_maindata, save_dat, write_dat, load_data, fishset_tables

Examples

## Not run: 
# upload data from filepath
load_outsample(dat = "PATH/TO/DATA", project = "pollock")

# upload from dataframe in working environment
load_outsample(dat = MyData, project = 'pollock', over_write = TRUE, 
              compare = TRUE, y = 'OutSampleDataTable01012011')
              
# upload from an existing FishSET out-of-sample data table
load_outsample(dat = "pollockOutSampleDataTable", project = "pollock")

## End(Not run)

Import, parse, and save port data to FishSET database

Description

A project must exist before running load_port(). See load_maindata to create a new project.

Usage

load_port(
  dat,
  port_name,
  project,
  over_write = TRUE,
  compare = FALSE,
  y = NULL
)

Arguments

dat

Dataset containing port data. At a minimum, must include three columns: the port names, and the latitude and longitude of ports. dat can be a filepath, an existing FishSET table, or a dataframe in the working environment.

port_name

Variable containing port names. Names should match port names in primary dataset.

project

String, name of project.

over_write

Logical, if TRUE, saves over data table previously saved in the FishSET database.

compare

Logical, should new data be compared to previously saved dataframe y.

y

Name of previously saved table in FishSET database. y must be defined if compare is TRUE.

Details

Runs a series of checks on the port data. The function checks that each row is unique, that no variables are empty, and that column names are case-insensitive unique. These data issues are resolved before the data is saved to the database. If checks pass, runs the fishset_compare function and saves the new data frame to the FishSET database. The data is saved in the FishSET database as the raw data and the working data. The naming convention for port tables is "projectPortTable". Date is also attached to the name for the raw data. See table_view to view/load port tables into the working environment.

See Also

table_view, load_maindata, write_dat

Examples

## Not run: 
# "Port_Name" is a placeholder for the column containing port names
load_port(PortTable, port_name = "Port_Name", over_write = TRUE,
          project = 'pollock', compare = TRUE,
          y = 'pollockPortTable01012011')

## End(Not run)

Import, parse, and save spatial data

Description

Saves a spatial table to the FishSETFolder as a geojson file. A project must exist before running load_spatial(). See load_maindata to create a new project.

Usage

load_spatial(
  spat,
  name = NULL,
  over_write = TRUE,
  project,
  data.type = NULL,
  lon = NULL,
  lat = NULL,
  id = NULL,
  ...
)

Arguments

spat

File name, including path, of spatial data.

name

Name spatial data should be saved as in FishSET project folder. Cannot be empty or contain spaces.

over_write

Logical, If TRUE, saves spat over previously saved data table in the FishSET project folder.

project

String, name of project.

data.type

Data type argument passed to read_dat. If reading from a shape folder use data.type = "shape".

lon

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

lat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

id

Polygon ID column. Required for csv files. Leave as NULL if spat is a shape or json file.

...

Additional argument passed to read_dat.

Details

Function to import, parse, and save spatial data to the project folder in the 'FishSETFolder' directory. To export as a shape file, use write_dat specifying type = 'shp'. load_spatial() performs a basic quality check before saving spatial tables to the project data folder as a geojson file. To be saved, the spatial data must pass the checks in check_spatdat. The spatial table is converted to an sf object and checked for unique rows and empty columns. The naming convention for spatial tables is "projectNameSpatTable". See table_view to view/load spatial tables into the working environment.

See Also

table_view, load_maindata, write_dat

Examples

## Not run: 
# upload from filepath
load_spatial(spat = "FILE/PATH/TO/SPAT", name = 'tenMinSqr', 
             over_write = TRUE, project = 'pcod')

# upload from object in working environment
load_spatial(spat = NMFSAreas, name = "NMFS", project = "pcod")

# upload from an existing FishSET spatial table
load_spatial(spat = "pcodNMFSSpatTable", name = "NMFS", project = "pcod2020")

## End(Not run)

Log user-created functions or models

Description

Log user-created functions or models

Usage

log_func_model(x, project)

Arguments

x

Name of function.

project

Project name.

Details

Logs function name, arguments, and call. Use this function to log user-defined likelihood functions.

Examples

## Not run: 
my_func <- function(a, b) {
  a + b
}
log_func_model(my_func, project = "pollock")

## End(Not run)

Console function for rerunning project log

Description

Console function for rerunning project log

Usage

log_rerun(
  log_file,
  dat = NULL,
  portTable = NULL,
  aux = NULL,
  gridfile = NULL,
  spat = NULL,
  ind = NULL,
  run = FALSE
)

Arguments

log_file

String, name of the log file starting with the date (YYYY-MM-DD) and ending in ".json".

dat

String, new main data table to rerun log.

portTable

String, name of port table. Defaults to NULL.

aux

String, name of auxiliary table. Defaults to NULL.

gridfile

String, name of gridded data table. Defaults to NULL.

spat

String, name of spatial data table. Defaults to NULL.

ind

Numeric, indices of function calls to rerun.

run

Logical, whether to run the logged function calls (TRUE) or simply list all function calls (FALSE).

See Also

log_rerun_gui

Examples

## Not run: 
log_rerun("pollock_2020-10-23.json", run = TRUE) # reruns entire log with original data table
# runs log with new data table
log_rerun("pollock_2020-10-23.json", dat = "pollockMainDataTable", run = TRUE) 

## End(Not run)

Interactive function for rerunning project log

Description

Interactive function for rerunning project log

Usage

log_rerun_gui()

See Also

log_rerun

Examples

## Not run: 
log_rerun_gui()

## End(Not run)

Reset log file

Description

Reset log file

Usage

log_reset(project, over_write = FALSE)

Arguments

project

Project name.

over_write

Logical, whether to overwrite an existing log file. This only applies if a log was created and reset in the same day for the same project. See "Details".

Details

Logs are saved by project name and date (date created, not date modified). For example, "pollock_2021-05-12.json". Calls to log functions are automatically appended to the existing project log file. Resetting the log file will create a new project log file with the current date. A log will not be reset if log_reset() is run the same day the log was created (or if the log is reset two or more times in a single day), unless over_write = TRUE. This will replace that day's log file.
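
The file naming and the same-day rule can be sketched in base R. Both helpers below are hypothetical and only mirror the behaviour described above; they do not read or write any log files.

```r
# Hypothetical helper: log file name from project and creation date.
log_name <- function(project, date = Sys.Date()) {
  paste0(project, "_", format(date, "%Y-%m-%d"), ".json")
}

# Hypothetical helper: a reset goes ahead when the existing log was created
# on an earlier day, or when over_write = TRUE.
should_reset <- function(log_created, today, over_write = FALSE) {
  over_write || log_created != today
}

example_name <- log_name("pollock", as.Date("2021-05-12"))
same_day <- should_reset(as.Date("2021-05-12"), as.Date("2021-05-12"))
forced <- should_reset(as.Date("2021-05-12"), as.Date("2021-05-12"),
                       over_write = TRUE)
next_day <- should_reset(as.Date("2021-05-12"), as.Date("2021-05-13"))
```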

See Also

list_logs, project_logs

Examples

## Not run: 
log_reset("pollock")

## End(Not run)

Conditional logit likelihood

Description

Conditional logit likelihood

Usage

logit_c(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting values as a vector (num). For this likelihood, the order takes: c([alternative-specific parameters], [travel-distance parameters]).

The alternative-specific parameters and travel-distance parameters are of length (# of alternative-specific variables) and (# of travel-distance variables) respectively.

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

Other data used in model (as a list containing objects 'intdat' and 'griddat').

For this likelihood, 'intdat' are 'travel-distance variables', which are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.

In 'griddat' are 'alternative-specific variables', that vary across alternatives, e.g. catch rates. Each variable name therefore corresponds to data with dimensions (number of observations) by (number of alternatives), and returns a single parameter for each variable (e.g. the marginal utility from catch).

For both objects any number of variables are allowed, as a list of matrices. Note the variables (each as a matrix) within 'griddat' and 'intdat' have no naming restrictions. 'Alternative-specific variables' may correspond to catches that vary by location, and 'travel-distance variables' may be vessel characteristics that affect how much disutility is suffered by traveling a greater distance. Note in this likelihood 'alternative-specific variables' vary across alternatives because each variable may have been estimated in a previous procedure (i.e. a construction of expected catch).

If there are no other data, the user can set 'griddat' as ones with dimension (number of observations) by (number of alternatives) and 'intdat' variables as ones with dimension (number of observations) by (unity).

alts

Number of alternative choices in model as length equal to unity (as a numeric vector).

project

Name of project

expname

Expected catch table

mod.name

Name of model run for model result output table

Value

ld: negative log likelihood
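
As a rough illustration of the quantity returned, the sketch below computes a textbook conditional logit negative log likelihood in base R. It is not FishSET's internal code: the function cond_logit_nll, its argument layout, and the simulated data are all assumptions made for this example.

```r
# Generic conditional logit negative log likelihood (illustrative only).
# params: c(alternative-specific parameters, travel-distance parameters)
# choice: index of the chosen alternative for each observation
# distance: (n x k) matrix; griddat: list of (n x k) matrices;
# intdat: list of (n x 1) matrices
cond_logit_nll <- function(params, choice, distance, griddat, intdat) {
  n_grid <- length(griddat)
  beta <- params[seq_len(n_grid)]
  gamma <- params[n_grid + seq_along(intdat)]

  # Utility from alternative-specific variables: one coefficient each.
  v <- matrix(0, nrow(distance), ncol(distance))
  for (g in seq_along(griddat)) v <- v + beta[g] * griddat[[g]]

  # Cost: each travel-distance variable scales its row of distances.
  for (h in seq_along(intdat)) {
    v <- v + gamma[h] * as.vector(intdat[[h]]) * distance
  }

  # Log of the softmax denominator, stabilised by subtracting row maxima.
  vmax <- apply(v, 1, max)
  log_denom <- vmax + log(rowSums(exp(v - vmax)))
  chosen <- v[cbind(seq_len(nrow(v)), choice)]
  -sum(chosen - log_denom)
}

# Simulated inputs: 6 observations, 3 alternatives.
set.seed(1)
n <- 6
k <- 3
distance <- matrix(runif(n * k, 1, 10), n, k)
griddat <- list(exp_catch = matrix(runif(n * k), n, k))
intdat <- list(ones = matrix(1, n, 1))
choice <- sample(seq_len(k), n, replace = TRUE)

# With all parameters at zero every alternative is equally likely,
# so the value equals n * log(k).
nll_zero <- cond_logit_nll(c(0, 0), choice, distance, griddat, intdat)
nll_other <- cond_logit_nll(c(1, -0.5), choice, distance, griddat, intdat)
```

The parameter order follows the starts3 description: alternative-specific parameters first, then travel-distance parameters.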

Graphical examples

Figure: logit_c_grid.png
Figure: logit_c_travel.png

Examples

## Not run: 
data(zi)
data(catch)
data(choice)
data(distance)
data(si)

optimOpt <- c(1000,1.00000000000000e-08,1,0)

methodname <- 'BFGS'

kk <- 4

si2 <- matrix(sample(1:5,dim(si)[1]*kk,replace=TRUE),dim(si)[1],kk)
zi2 <- sample(1:10,dim(zi)[1],replace=TRUE)

otherdat <- list(griddat=list(predicted_catch=as.matrix(predicted_catch),
    si2=as.matrix(si2)), intdat=list(zi=as.matrix(zi),
    zi2=as.matrix(zi2)))

initparams <- c(2.5, 2, -1, -2)

func <- logit_c

results <- discretefish_subroutine(catch,choice,distance,otherdat,
    initparams,optimOpt,func,methodname)

## End(Not run)

Full information model with Dahl's correction function

Description

Full information model with Dahl's correction function

Usage

logit_correction(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting values as a vector (num). For this likelihood, the order takes: c([marginal utility from catch], [catch-function parameters], [polynomial starting parameters], [travel-distance parameters], [catch sigma]).

The number of polynomial interaction terms is currently set to 2, so given the chosen degree 'polyn' there should be (((polyn+1)*2) + 2)*(k) polynomial starting parameters, where (k) equals the number of alternatives. The marginal utility from catch and catch sigma are of length equal to unity respectively. The catch-function and travel-distance parameters are of length (# of catch variables)*(k) and (# of cost variables) respectively.

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

Other data used in model (as a list containing objects 'griddat', 'intdat', 'startloc', 'polyn', and 'distance').

Catch-function variables ('griddat') are alternative-invariant variables that are interacted with zonal constants to form the catch portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k) parameters where (k) equals the number of alternatives. Travel-distance variables ('intdat') are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter. Any number of catch-function and travel-distance variables are allowed, as a list of matrices. Note the variables (each as a matrix) within 'griddat' and 'intdat' have no naming restrictions.

Catch-function variables may correspond to variables that affect catches across locations, or travel-distance variables may be vessel characteristics that affect how much disutility is suffered by traveling a greater distance. Note in this likelihood the catch-function variables vary across observations but not for each location: they are allowed to affect catches across locations due to the location-specific coefficients. If there are no other data, the user can set catch-function variables as ones with dimension (number of observations) by (number of alternatives) and travel-distance variables as ones with dimension (number of observations) by (unity).

The variable startloc is a matrix of dimension (number of observations) by (unity), that corresponds to the starting location when the agent decides between alternatives.

The variable polyn is a vector of length equal to unity corresponding to the chosen polynomial degree.

The variable distance is a matrix of dimension (number of observations) by (number of alternatives) corresponding to the distance to each alternative.

alts

Number of alternative choices in model as length equal to unity (as a numeric vector).

project

Name of project

expname

Expected catch table

mod.name

Name of model run for model result output table

Value

ld: negative log likelihood
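
The required length of starts3 can be checked with simple arithmetic. The sketch below applies the formula from the starts3 description to the values used in the example for this function (polyn = 3, four alternatives, two catch-function variables, two travel-distance variables); the variable names are chosen for this illustration.

```r
polyn <- 3         # polynomial degree
k <- 4             # number of alternatives
n_catch_vars <- 2  # variables in griddat (si, si2)
n_cost_vars <- 2   # variables in intdat (zi, zi2)

# Polynomial starting parameters: (((polyn + 1) * 2) + 2) * k
n_poly <- (((polyn + 1) * 2) + 2) * k

# marginal utility (1) + catch-function (vars * k) + polynomial
# + travel-distance (vars) + catch sigma (1)
n_params <- 1 + n_catch_vars * k + n_poly + n_cost_vars + 1

initparams <- c(3, 0.5, 0.4, 0.3, 0.2, 0.55, 0.45, 0.35, 0.25,
                rep(0, n_poly), -0.3, -0.4, 3)
length_ok <- length(initparams) == n_params
```

With these values there are 40 polynomial parameters and 52 starting values in total.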

Graphical examples

Figure: logit_correction_grid.png
Figure: logit_correction_travel.png
Figure: logit_correction_poly.png

Examples

## Not run: 
data(zi)
data(catch)
data(choice)
data(distance)
data(si)
data(startloc)

optimOpt <- c(1000,1.00000000000000e-08,1,0)

methodname <- 'BFGS'

polyn <- 3
kk <- 4

si2 <- sample(1:5,dim(si)[1],replace=TRUE)
zi2 <- sample(1:10,dim(zi)[1],replace=TRUE)

otherdat <- list(griddat=list(si=as.matrix(si),si2=as.matrix(si2)),
    intdat=list(zi=as.matrix(zi),zi2=as.matrix(zi2)),
    startloc=as.matrix(startloc),polyn=polyn,
    distance=as.matrix(distance))

initparams <- c(3, 0.5, 0.4, 0.3, 0.2, 0.55, 0.45, 0.35, 0.25,
    rep(0, (((polyn+1)*2) + 2)*kk), -0.3,-0.4, 3)

func <- logit_correction

results <- discretefish_subroutine(catch,choice,distance,otherdat,
    initparams,optimOpt,func,methodname)

## End(Not run)

Zonal logit with area-specific constants procedure

Description

Zonal logit with area-specific constants procedure

Usage

logit_zonal(starts3, dat, otherdat, alts, project, expname, mod.name)

Arguments

starts3

Starting values as a vector (num). For this likelihood, the order takes: c([area-specific parameters], [travel-distance parameters]).

The area-specific parameters and travel-distance parameters are of length (# of area-specific parameters)*(k-1) and (# of travel-distance variables) respectively, where (k) equals the number of alternatives.

dat

Data matrix, see output from shift_sort_x, alternatives with distance.

otherdat

Other data used in model (as a list containing objects 'intdat' and 'griddat').

For this likelihood, 'intdat' are 'travel-distance variables', which are alternative-invariant values that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.

In 'griddat' are 'area-specific parameters' that do not vary across alternatives, e.g. vessel gross tonnage. Each constant name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k-1) parameters where (k) equals the number of alternatives, as a normalization of parameters is needed as the probabilities sum to one. Interpretation is therefore relative to the first alternative.

For both objects any number of variables are allowed, as a list of matrices. Note the variables (each as a matrix) within 'griddat' and 'intdat' have no naming restrictions. 'Area-specific parameters' may correspond to variables that impact average catches by location, or 'travel-distance variables' may be vessel characteristics that affect how much disutility is suffered by traveling a greater distance. Note in this likelihood the 'area-specific parameters' vary across observations but not for each location: they are allowed to affect alternatives differently due to the location-specific coefficients.

If there are no other data, the user can set 'griddat' as ones with dimension (number of observations) by (unity) and 'intdat' variables as ones with dimension (number of observations) by (unity).

alts

Number of alternative choices in the model, as a numeric vector of length one.

project

Name of project

expname

Expected catch table

mod.name

Name of model run for model result output table

Value

ld: negative log likelihood

Graphical examples

Figure: logit_zonal_grid.png
Figure: logit_zonal_travel.png

Examples

## Not run: 
data(zi)
data(catch)
data(choice)
data(distance)
data(si)

optimOpt <- c(1000,1.00000000000000e-08,1,0)

methodname <- 'BFGS'

si2 <- sample(1:5,dim(si)[1],replace=TRUE)
zi2 <- sample(1:10,dim(zi)[1],replace=TRUE)

otherdat <- list(griddat=list(si=as.matrix(si),si2=as.matrix(si2)),
    intdat=list(zi=as.matrix(zi),zi2=as.matrix(zi2)))

initparams <- c(1.5, 1.25, 1.0, 0.9, 0.8, 0.75, -1, -0.5)

func <- logit_zonal

results <- discretefish_subroutine(catch,choice,distance,otherdat,
    initparams,optimOpt,func,methodname)

## End(Not run)
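
The length rule given for starts3 can be checked before building a starting vector. The following is a minimal sketch in base R; the number of alternatives k and the starting values themselves are assumptions chosen for illustration, not values taken from the package data.

```r
# Hypothetical sizes, for illustration only
k      <- 4   # number of alternatives (assumed)
n_grid <- 2   # variables in otherdat$griddat (e.g. si and si2)
n_int  <- 2   # variables in otherdat$intdat (e.g. zi and zi2)

# Area-specific parameters: one per griddat variable and per non-reference alternative
n_area <- n_grid * (k - 1)
# Travel-distance parameters: one per intdat variable
n_dist <- n_int

n_start <- n_area + n_dist

# Order follows c([area-specific parameters], [travel-distance parameters])
starts3 <- c(rep(1, n_area), rep(-0.5, n_dist))
length(starts3) == n_start
```

With a different number of alternatives, only the area-specific block changes length.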

Assign longitude and latitude points to zonal centroid

Description

Assign longitude and latitude points to zonal centroid

Usage

lonlat_to_centroid(dat, project, lon, lat, spat, zone)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

lon

String, variable name containing longitude.

lat

String, variable name containing latitude.

spat

Spatial data table containing regulatory zones. This can be a "spatial feature" or sf object.

zone

String, name of the column containing the assigned zone. Must be the same for both the spatial data table and MainDataTable.

Details

This is one of the FishSET confidentiality functions. It replaces the selected longitude and latitude columns with the zonal centroid derived from a spatial data table.
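
The replacement step can be illustrated with a minimal base R sketch. The data frame, the centroid table, and all column names below are hypothetical; lonlat_to_centroid() derives the centroids from the spatial data table itself, which this sketch does not attempt.

```r
# Hypothetical haul data with an assigned zone
dat <- data.frame(
  haul = 1:4,
  lon  = c(-170.2, -169.8, -165.1, -164.9),
  lat  = c(55.1, 55.3, 57.0, 57.2),
  zone = c("A", "A", "B", "B")
)

# Hypothetical centroid table, one row per zone
cent <- data.frame(
  zone     = c("A", "B"),
  cent_lon = c(-170, -165),
  cent_lat = c(55, 57)
)

# Look up each observation's zone and overwrite its coordinates
idx <- match(dat$zone, cent$zone)
masked <- dat
masked$lon <- cent$cent_lon[idx]
masked$lat <- cent$cent_lat[idx]
masked
```

After masking, every observation in a zone shares the same coordinates, so exact fishing locations are no longer recoverable from the table.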

Examples

## Not run: 
lonlat_to_centroid(pollockMainDataTable, "pollock", spat = spatdat,
                  lon = "LonLat_START_LON", lat = "LonLat_START_LAT",
                  zone = "NMFS_AREA")

## End(Not run)

View list of MainDataTables in FishSET database

Description

View list of MainDataTables in FishSET database

Usage

main_tables(project, show_all = TRUE)

Arguments

project

A project name to filter main tables by.

show_all

Logical, whether to show all main tables (including raw and final tables) or just editable tables.

Examples

## Not run: 
main_tables("pollock")

## End(Not run)

Make model design file

Description

Create a list containing the likelihood function, parameters, and data to be passed to the model call function.

Usage

make_model_design(
  project,
  catchID,
  likelihood = NULL,
  initparams = NULL,
  optimOpt = c(100, 1e-08, 1, 1),
  methodname = "BFGS",
  mod.name = NULL,
  vars1 = NULL,
  vars2 = NULL,
  priceCol = NULL,
  expectcatchmodels = list("all"),
  startloc = NULL,
  polyn = NULL,
  crs = NULL,
  outsample = FALSE,
  CV_dat = NULL
)

Arguments

project

String, name of project.

catchID

String, variable from dat that contains catch data.

likelihood

String, name of likelihood function. A description of explanatory variables for each likelihood is provided below in the Details section. Information on likelihood-specific initial parameter specification can be found in discretefish_subroutine() documentation.

logit_c: Conditional logit likelihood
logit_zonal: Zonal logit with area-specific constants procedure
logit_correction: Full information model with Dahl's correction function
epm_normal: Expected profit model with normal catch function
epm_weibull: Expected profit model with Weibull catch function
epm_lognormal: Expected profit model with lognormal catch function

initparams

Vector or list, initial parameter estimates for revenue/location-specific covariates then cost/distance. The number of parameter estimates varies by likelihood function. See Details section for more information. The initial parameters will be set to 1 if initparams = NULL. If initparams is a single numeric value, it will be used for each parameter. If using parameter estimates from a previous model, initparams should be the name of the model the parameter estimates should come from. Examples: initparams = 'epm_mod1', initparams = list('epm_mod1', 'epm_mod2').

optimOpt

Numeric vector, optimization options (max function evaluations, max iterations, (reltol) tolerance of x, trace).

methodname

String, optimization method (see stats::optim() options). Defaults to "BFGS".

mod.name

String, name of model run for model result output table.

vars1

Character string, additional ‘travel-distance’ variables to include in the model. These depend on the likelihood. See the Details section for how to specify for each likelihood function.

vars2

Character string, additional variables to include in the model. These depend on the likelihood. See the Details section for how to specify for each likelihood function. For likelihood = 'logit_c', vars2 should be the name of the gridded table saved to the FishSET Database, and should contain the string "GridTableWide". See format_grid() for details.

priceCol

Variable in dat containing price information. Required if specifying an expected profit model for the likelihood (epm_normal, epm_weibull, epm_lognormal).

expectcatchmodels

List, name of expected catch models to include in model run. Defaults to all models. Each list item should be a string of expected catch models to include in a model. For example, list(c('recent', 'older'), c('user1')) would run one model with the 'recent' and 'older' expected catch matrices, and one model with just the user-defined expected catch matrix. Choices are "recent", "older", "oldest", "logbook", "all", and "individual". See create_expectations() for details on the different models. Option "all" will run all expected catch matrices jointly. Option "individual" will run the model for each expected catch matrix separately. The final option is to select one or more expected catch matrices to run jointly.

startloc

Variable in dat identifying the location when choice of where to fish next was made. Required for logit_correction likelihood. Use the create_startingloc() function to create the starting location vector.

polyn

Numeric, correction polynomial degree. Required for logit_correction() likelihood.

crs

Coordinate reference system to be assigned when creating the distance matrix. Passed on to create_dist_matrix().

outsample

Logical, indicates whether the model design is for main data (FALSE) or out-of-sample data (TRUE). The default is outsample = FALSE.

CV_dat

Dataframe that contains training or testing data for k-fold cross validation. Defaults to CV_dat = NULL.

Details

Function creates the model matrix list that contains the data and modeling choices. The model design list is saved to the FishSET database and called by the discretefish_subroutine(). Alternative fishing options come from the Alternative Choice list, generated from the create_alternative_choice() function, and the expected catch matrices from the create_expectations() function. The distance from the starting point to alternative choices is calculated.

Variable names details:

For each likelihood below, the first entry describes vars1 and the second entry describes vars2.
logit_c:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the cost
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    a single parameter.
"alternative-specific variables"
    vary across alternatives, e.g. catch rates.
    Each variable name therefore corresponds to data
    with dimensions (number of observations) by
    (number of alternatives), and returns a single
    parameter for each variable (e.g. the marginal
    utility from catch).
logit_zonal:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the cost
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    a single parameter.
"average-catch variables" are
    alternative-invariant variables, e.g. vessel
    gross tonnage. Each variable name therefore
    corresponds to data with dimensions (number of
    observations) by (unity), and returns (k-1)
    parameters where (k) equals the number of
    alternatives, as a normalization of parameters
    is needed as the probabilities sum to one.
    Interpretation is therefore relative to the
    first alternative.
epm_normal:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the
    cost portion of the likelihood. Each variable
    name therefore corresponds to
    data with dimensions (number of observations)
    by (unity), and returns a single parameter.
"catch-function variables" are
    alternative-invariant variables that are
    interacted with zonal constants to form the
    catch portion of the likelihood. Each variable
    name therefore corresponds to data with
    dimensions (number of observations) by (unity),
    and returns (k) parameters where (k) equals
    the number of alternatives.
epm_lognormal:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the
    cost portion of the likelihood. Each variable
    name therefore corresponds to data with
    dimensions (number of observations) by (unity),
    and returns a single parameter.
"catch-function variables" are
    alternative-invariant variables that are
    interacted with zonal constants to form the
    catch portion of the likelihood. Each variable
    name therefore corresponds to data with
    dimensions (number of observations) by (unity),
    and returns (k) parameters where (k) equals
    the number of alternatives.
epm_weibull:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the cost
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    a single parameter.
"catch-function variables" are
    alternative-invariant variables that are
    interacted with zonal constants to form the catch
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    (k) parameters where (k) equals the number of
    alternatives.
logit_correction:
"travel-distance variables" are
    alternative-invariant variables that are
    interacted with travel distance to form the cost
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    a single parameter.
"catch-function variables" are
    alternative-invariant variables that are
    interacted with zonal constants to form the catch
    portion of the likelihood. Each variable name
    therefore corresponds to data with dimensions
    (number of observations) by (unity), and returns
    (k) parameters where (k) equals the number of
    alternatives.

Value

Function creates the model matrix list that contains the data and modeling choices. The model design list is saved to the FishSET database and called by the discretefish_subroutine(). Alternative fishing options come from the Alternative Choice list, generated from the create_alternative_choice() function, and the expected catch matrices from the create_expectations() function. The distance from the starting point to alternative choices is calculated.

Model design list:

likelihood: Name of likelihood function
catch: Data corresponding to actual zonal catch
catchID: Character for the name of the variable with catch data
choice: Data corresponding to actual zonal choice
initparams: Initial parameter values
optimOpt: Optimization options
methodname: Optimization method
mod.name: Model name for referencing
vars1: Character vector for variables with 'travel-distance' variables
vars2: Character vector for additional variables
priceCol: Variable in dat with price information
mod.date: Date the model was designed
startingloc: starting locations
scales: Scale vectors to put catch data, zonal data, and other data on same scale
distance: Data corresponding to distance
instances: Number of observations
alts: Number of alternative zones
epmDefaultPrice: Price data
dataZoneTrue: Vector of 0/1 indicating whether the data from that zone is to be included based on the minimum number of hauls.
typeOfNecessary: Whether data is at haul or trip level
altChoiceType: Function choice. Set to distance
altChoiceUnits: Units of distance
occasion: The choice occasion
occasion_var: Character for variable with choice occasion
alt_choice: Alternative choice matrix
bCHeader: Variables to include in the model that do not vary by zone. Includes independent variables and interactions
gridVaryingVariables: Variables to include in the model that do vary by zone such as expected catch (from create_expectations() function)
startloc: Variable in dat identifying location when choice of where to fish next was made
polyn: Numeric, correction polynomial degree
spat: A spatial data file
spatID: Variable in spat that identifies areas or zones
crs: coordinate reference system
gridVaryingVariables: Area-specific variables
expectcatchmodels: List of expected catch matrices

Examples

## Not run: 
make_model_design("pollock", catchID= "OFFICIAL_TOTAL_CATCH",  
  likelihood='logit_zonal', 
  vars1=NULL, vars2=NULL, initparams=c(-0.5,0.5),
  optimOpt=c(100000, 1.0e-08, 1, 1), methodname = "BFGS", mod.name = "logit4"
)

## End(Not run)
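
The Details section describes a travel-distance variable as data with dimensions (number of observations) by (unity) that is interacted with travel distance and returns a single parameter. The sketch below shows what that interaction looks like in base R. All values, the variable name tonnage, and the coefficient beta are hypothetical, and this is not the package's internal code.

```r
n <- 3   # number of observations (assumed)
k <- 4   # number of alternatives (assumed)

# Distance from each observation to each alternative: n x k
distance <- matrix(1:12, nrow = n, ncol = k)

# One travel-distance variable: n x 1, constant across alternatives
tonnage <- matrix(c(10, 20, 30), nrow = n, ncol = 1)

# A single parameter for this variable
beta <- -0.1

# Each row of distance is scaled by that observation's value, giving an n x k cost term
cost <- beta * (as.vector(tonnage) * distance)
dim(cost)
```

Because the variable does not vary across alternatives, only one coefficient is needed no matter how many alternatives there are.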

Add an area/polygon to spatial data

Description

Add an area/polygon to spatial data

Usage

make_spat_area(spat, project, coord, spat.id, new.id, combine)

Arguments

spat

Spatial dataset to add polygon to.

project

Name of project.

coord

Longitude and latitude coordinates forming a polygon. Can be a numeric vector of even length or a numeric matrix with two columns.

spat.id

The ID column in spat

new.id

The ID for new polygon.

combine

Whether to use combine_zone. This will turn the intersections between poly and spat into new polygons. Note that the new polygon IDs will be derived from spat and new.id will not be used.

Details

Adds an area/polygon to a spatial dataset.
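
The two accepted forms of coord can be related with a short base R sketch. It assumes the vector form lists interleaved longitude/latitude pairs, which the documentation does not state, and it closes the ring by repeating the first vertex, a common polygon convention that may or may not be required by make_spat_area().

```r
# Hypothetical coordinates as a vector of even length (assumed order: lon, lat, lon, lat, ...)
coord_vec <- c(-175, 55, -170, 55, -170, 58, -175, 58)
stopifnot(length(coord_vec) %% 2 == 0)

# Equivalent two-column matrix form
coord_mat <- matrix(coord_vec, ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("lon", "lat")))

# Close the ring if the last vertex differs from the first
if (!all(coord_mat[1, ] == coord_mat[nrow(coord_mat), ])) {
  coord_mat <- rbind(coord_mat, coord_mat[1, ])
}
coord_mat
```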

See Also

make_polygon, add_polygon


Kernel density (hotspot) plot

Description

Kernel density (hotspot) plot

Usage

map_kernel(
  dat,
  project,
  latlon,
  type = "contours",
  group = NULL,
  facet = FALSE,
  date = NULL,
  filter_date = NULL,
  filter_value = NULL,
  minmax = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

latlon

Character string, specified as latitude then longitude, in decimal degrees.

type

String, plot type. Choices are "point", "contours", or "gradient". Note if you have a group, you must facet when choosing "gradient" (cannot overlap polygons clearly).

group

Optional group argument. Should be a factor with length of (# of observations), where each observation corresponds to the latlon coordinate of the same index. Recall that the legend will output the names of factor levels as you have named them (see ?factor).

facet

Optional facet parameter. TRUE if mapping each group as a separate facet. Defaults to FALSE.

date

Optional date variable to filter data by.

filter_date

Whether to filter data table by "year", "month", or "year-month". date and filter_value must be provided. Defaults to NULL.

filter_value

Integer (4 digits if year, 1-2 if month). The year, month, or year-month to filter data table by. Use a list if using "year-month", with the format: list(year(s), month(s)). For example, list(2011:2013, 5:7) will filter the data table from May to July, 2011-2013.

minmax

Optional map extent argument, a vector (num) of length 4 corresponding to c(minlat, maxlat, minlon, maxlon).

Value

Returns ggplot2 object. Map plot saved to Output folder.

Examples

## Not run: 
map_kernel(pollockMainDataTable, project = 'pollock', type = 'contours',
latlon = c('LonLat_START_LAT', 'LonLat_START_LON'), group = 'PORT_CODE',
facet = TRUE, minmax = NULL, date = 'FISHING_START_DATE',
filter_date = 'year-month', filter_value = list(2011, 2:4))

## End(Not run)
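
The "year-month" filter in the example can be pictured as a year test combined with a month test. The following is a minimal base R sketch of that logic on hypothetical data; it is an illustration of the filter_value format, not the function's own implementation.

```r
# Hypothetical data with a date column
dat <- data.frame(
  FISHING_START_DATE = as.Date(c("2011-01-15", "2011-03-02", "2011-04-20",
                                 "2012-03-10", "2011-05-01")),
  catch = c(5, 8, 3, 7, 2)
)

# Same format as filter_value: list(year(s), month(s))
filter_value <- list(2011, 2:4)

yr <- as.integer(format(dat$FISHING_START_DATE, "%Y"))
mo <- as.integer(format(dat$FISHING_START_DATE, "%m"))

# Keep rows whose year and month both match
keep <- yr %in% filter_value[[1]] & mo %in% filter_value[[2]]
dat[keep, ]
```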

Map observed vessel locations

Description

Plot observed locations on a map. For large datasets, it is best to plot a subset of points. Use percshown to randomly subset the number of points. If the predefined map extent needs adjusting, use minmax.

Usage

map_plot(dat, project, lat, lon, minmax = NULL, percshown = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, project name.

lat

Variable in dat that defines latitude, in decimal degrees.

lon

Variable in dat that defines longitude, in decimal degrees.

minmax

Optional map extent argument, a vector (num) of length four corresponding to c(minlat, maxlat, minlon, maxlon).

percshown

Whole number, percent of points to show. Use this option if there are a lot of data points.

Value

mapout: ggplot2 object

Examples

## Not run: 
map_plot(pollockMainDataTable, 'pollock', 'LonLat_START_LAT', 'LonLat_START_LON', percshown=10)

## End(Not run)
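
The effect of percshown can be sketched in base R as a random draw of rows. The data are simulated and the rounding rule (ceiling) is an assumption; map_plot() may subset differently.

```r
set.seed(1)

# Hypothetical observed locations
dat <- data.frame(
  lat = runif(200, 50, 60),
  lon = runif(200, -180, -160)
)

percshown <- 10   # percent of points to show

# Number of rows to keep, then a random sample of that size
n_show <- ceiling(nrow(dat) * percshown / 100)
shown  <- dat[sample(nrow(dat), n_show), ]
nrow(shown)
```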

Interactive vessel locations and fishery zones map

Description

View vessel locations and fishery zones on interactive map.

Usage

map_viewer(
  dat,
  project,
  spat,
  avd,
  avm,
  num_vars,
  temp_vars,
  id_vars,
  lon_start,
  lat_start,
  lon_end = NULL,
  lat_end = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

spat

Spatial data containing information on fishery management or regulatory zones. Shape, json, geojson, and csv formats are supported.

avd

Variable name in the primary data file that gives the unique ID associated to the polygon.

avm

The name of the property in the GeoJson file that identifies the polygon to cross reference to dat. Variable name in the spatial file that represents the unique ID.

num_vars

List, name of numeric variable(s) in dat to include for plotting.

temp_vars

List, name of temporal variable(s) in dat to include for plotting.

id_vars

List, name of categorical variable(s) in dat to group by.

lon_start

String, variable in dat that identifies a single longitude point or starting longitude decimal degrees.

lat_start

String, variable in dat that identifies a single latitude point or starting latitude decimal degrees.

lon_end

String, variable in dat that identifies ending longitude decimal degrees.

lat_end

String, variable in dat that identifies ending latitude decimal degrees.

Details

The map_viewer function creates the files required to run the MapViewer program. Users can map points or trip paths. To plot points, leave lon_end and lat_end as NULL. After creating the inputs, a map with zones is opened in the default web browser. To close the server connection, run servr::daemon_stop() in the console. Lines on the map represent the starting and ending lat/long for each observation in the data set, color coded based on the selected variable. It can take up to a minute for the data to be loaded onto the map. At this time, the map can only be saved by taking a screenshot.

Examples

## Not run: 
# Plot trip path
map_viewer(scallopMainDataTable, 'scallop', "scallopTMSSpatTable", 
           avd = 'ZoneID', avm = 'TEN_ID', num_vars = 'LANDED_thousands', 
           temp_vars = 'DATE_TRIP', lon_start = 'previous_port_lon', 
           lat_start = 'previous_port_lat', lon_end = 'DDLON', 
           lat_end = 'DDLAT')
   
# Plot observed fishing locations        
map_viewer(scallopMainDataTable, 'scallop', "scallopTMSSpatTable", 
           avd = 'ZoneID', avm = 'TEN_ID', num_vars = 'LANDED_thousands', 
           temp_vars = 'DATE_TRIP', lon_start = 'DDLON', lat_start = 'DDLAT')

# Plot haul path
map_viewer(pollockMainDataTable, 'pollock', spat = spatdat, avd = 'NMFS_AREA',
           avm = 'NMFS_AREA', num_vars = c('HAUL', 'OFFICIAL_TOTAL_CATCH'),
           temp_vars = 'HAUL_DATE', id_vars = c('GEAR_TYPE', 'PORT'),
           lon_start = 'Lon_Start', lat_start = 'Lat_Start',
           lon_end = 'Lon_End', lat_end = 'Lat_End')

# Plot haul midpoint
map_viewer(pollockMainDataTable, 'pollock', spat = spatdat, avd = 'NMFS_AREA',
           avm = 'NMFS_AREA', num_vars = c('HAUL', 'OFFICIAL_TOTAL_CATCH'),
           temp_vars = 'HAUL_DATE', id_vars = c('GEAR_TYPE', 'PORT'),
           lon_start = 'Lon_Mid', lat_start = 'Lat_Mid')

## End(Not run)

Merge data tables using a left join

Description

Merge data tables using a left join

Usage

merge_dat(
  dat,
  other,
  project,
  main_key,
  other_key,
  other_type,
  merge_type = "left"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

other

A second data table to join to MainDataTable. Use string if referencing a table saved in the FishSET database.

project

Project name.

main_key

String, name of column(s) in MainDataTable to join by. The number of columns must match other_key.

other_key

String, name of column(s) in the other table to join by. The number of columns must match main_key.

other_type

String, the type of secondary data being merged. Options include "aux" (auxiliary), "grid" (gridded), "spat" (spatial), and "port".

merge_type

String, the type of merge to perform. "left" keeps all rows from dat and merges shared rows from other. "full" keeps all rows from each table.

Details

This function merges two datasets. By default a left join is used: all columns and rows from the MainDataTable are kept while only matching columns and rows from the secondary table are joined. Set merge_type = "full" to keep all rows from both tables.

Examples

## Not run: 
pollockMainDataTable <- 
    merge_dat("pollockMainDataTable", "pollockPortTable", "pollock", 
              main_key = "PORT_CODE", other_key = "PORT_CODE",
              other_type = "port")

## End(Not run)
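
The difference between the "left" and "full" merge types can be seen with base R's merge() on two small tables. This is a sketch with hypothetical data that mirrors the join semantics described above; it does not call merge_dat() or touch the FishSET database.

```r
# Hypothetical primary and port tables
main <- data.frame(PORT_CODE = c("A", "B", "C"), catch = c(10, 20, 30))
port <- data.frame(PORT_CODE = c("A", "B", "D"), port_lat = c(57.1, 58.3, 60.0))

# Left join: every row of main is kept, unmatched ports get NA
left <- merge(main, port, by.x = "PORT_CODE", by.y = "PORT_CODE", all.x = TRUE)

# Full join: every row of both tables is kept
full <- merge(main, port, by = "PORT_CODE", all = TRUE)

nrow(left)
nrow(full)
```

Port "C" has no match in the port table, so its port_lat is NA in both results; port "D" appears only in the full join.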

Merge expected catch

Description

Merge expected catch matrices to the primary dataset.

Usage

merge_expected_catch(
  dat,
  project,
  zoneID,
  date,
  exp.name,
  new.name = NULL,
  ec.table = NULL,
  log_fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

zoneID

Zone ID variable in dat that identifies the individual zones or areas.

date

Date variable used to create the expected catch matrix.

exp.name

Name(s) of expected catch matrix to merge into dat.

new.name

Optional, new name for exp.name. These should be in the same order as exp.name.

ec.table

Optional, the name of a specific expected catch table to use. Defaults to projectnameExpectedCatch.

log_fun

For internal use. Whether to log the function call.

Value

Merges an expected catch matrix created using create_expectations() to the primary dataset, dat.


Check if meta file exists for a project

Description

Check if meta file exists for a project

Usage

meta_file_exists(project)

Arguments

project

Project name.

Value

TRUE if project meta file exists, FALSE if not.


Print meta tables by project and/or type

Description

Print meta tables by project and/or type

Usage

meta_tables(project, tab.type = NULL)

Arguments

project

Name of project.

tab.type

String, table type. Optional, used to filter output. Options include "main", "spat" (spatial), "port", "grid" (gridded), and "aux" (auxiliary).


Import, create, and edit metadata

Description

metadata_gui allows users to import metadata from various file types, create and save new metadata, and edit metadata in a shiny application. Metadata is stored in the user's project folder.

Usage

metadata_gui()

Details

The app has two tabs: "Create" and "Edit". The Create tab allows users to create new metadata for a selected FishSET table. When a table is loaded, the app creates several text boxes that the user can fill. There are four metadata sections: About, Column Description, Contact Info, and Other.

About
  • Author The author of the data.

  • Date created The date data was created.

  • Date modified The last date the data was modified.

  • Version The current version of the data.

  • Confidentiality Whether the data contains confidential information.

Column Description

A text box for each column in the data. Include the data type, unit, and values (if categorical).

Contact Info
  • Person The primary contact.

  • Organization The primary contact's organization.

  • Address The primary contact's and/or organization's address.

  • Phone The primary contact's work phone number.

  • Email The primary contact's work email.

Other
  • License License for data.

  • Citation Citation for data.

  • Other Other relevant information.

Users can also import a metadata file from the Create tab, for example, an XML, CSV, or JSON file. This gets saved as "raw" metadata and is separate from the user-created metadata. To see a comprehensive list of accepted file types, see parse_meta and read_dat. To extract metadata from a data file (i.e. the data and metadata are both in the same file, but the metadata is not contained within the data itself), use the Reader parameters text box to selectively parse the file (see parse_meta for details).

The Edit tab allows users to view, edit, and/or delete metadata saved to FishSET.

Examples

## Not run: 
metadata_gui()

## End(Not run)

Get Model Design List

Description

Returns the Model Design list from the FishSET database.

Usage

model_design_list(project, name = NULL)

Arguments

project

Name of project.

name

Name of Model Design list in the FishSET database. The table name will contain the string "ModelInputData". If NULL, the default table is returned. Use tables_database to see a list of FishSET database tables by project.


Design hold-out model

Description

Use selected model design settings to create a model design for hold-out data. The hold-out data can be out-of-sample data or subsetted data for k-fold cross validation.

Usage

model_design_outsample(
  project,
  mod.name,
  outsample.mod.name = NULL,
  CV = FALSE,
  CV_dat = NULL,
  use.scalers = FALSE,
  scaler.func = NULL
)

Arguments

project

Name of project

mod.name

Name of saved model to use. Argument can be the name of the model or can pull the name of the saved "best" model. Leave mod.name empty to use the saved "best" model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

outsample.mod.name

Name assigned to out-of-sample model design. Must be unique and not already exist in model design list. If outsample.mod.name = NULL (the default), a default name will be chosen based on mod.name.

CV

Logical, indicates whether the model design is being created for cross validation (TRUE) or for a simple out-of-sample dataset (FALSE). Defaults to CV = FALSE.

CV_dat

Training or testing dataset for k-fold cross validation.

use.scalers

Input for create_model_input(). Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Input for create_model_input(). Function to calculate rescaling factors.

Details

This function automatically pulls model settings from the selected model and creates an alternative choice matrix, expected catch/revenue matrices, and model design for a hold-out dataset. The hold-out dataset can be an out-of-sample dataset or a subset of the main data for cross validation. If running out-of-sample data, this function requires that a filtered out-of-sample data file (.rds file) exists in the output folder. For cross validation, this function is called in the cross_validation() function. Note: the out-of-sample functions only work with a single selected model at a time. To run out-of-sample functions on a new out-of-sample dataset, start with load_outsample() (for an entirely new dataset) or filter_outsample().

Examples

## Not run: 

# For out-of-sample dataset
model_design_outsample("scallop", "scallopModName")


## End(Not run)
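
The use.scalers and scaler.func arguments describe a rescaling factor that defaults to the mean of each numeric vector. The sketch below illustrates one plausible reading in base R, in which the data are divided by the factor. The division step and the custom function scaler_example are assumptions for illustration; see create_model_input() for the actual behavior.

```r
# Hypothetical numeric variable
x <- c(2, 4, 6, 8)

# Default rescaling factor: the mean of the vector
scaler_mean <- mean(x)
x_scaled    <- x / scaler_mean   # assumed: data are divided by the factor

# A hypothetical alternative that could be supplied as scaler.func
scaler_example <- function(v) max(abs(v))
x_scaled_max   <- x / scaler_example(x)

mean(x_scaled)
max(x_scaled_max)
```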

Load model comparison metrics to console for the defined project

Description

Load model comparison metrics to console. Metrics are displayed for each model that was run. Metrics are produced by discretefish_subroutine.

Usage

model_fit(project, CV = FALSE)

Arguments

project

String, name of project.

CV

Logical, CV = TRUE to get model fit for training data in k-fold cross validation routine.

Examples

## Not run: 
model_fit('pollock')

## End(Not run)

Return model names

Description

Returns model names saved to the model design file.

Usage

model_names(project)

Arguments

project

Name of project.


Load discrete choice model output to console for the defined project

Description

Returns output from running the discretefish_subroutine function. The table parameter must be the full name of the table in the FishSET database.

Usage

model_out_view(project, CV = FALSE)

Arguments

project

Name of project

CV

Logical, CV = TRUE when viewing model output from training data in k-fold cross validation

Details

Returns output from running discretefish_subroutine. The table argument must be the full name of the table in the FishSET database. Output includes information on model convergence, standard errors, t-stats, etc.

Examples

## Not run: 
model_out_view('pcod')

## End(Not run)

Load model parameter estimates, standard errors, and t-statistic to console for the defined project

Description

Returns parameter estimates, standard errors, and t-statistics from running the discretefish_subroutine function. The table parameter must be the full name of the table in the FishSET database.

Usage

model_params(project, output = "list")

Arguments

project

Name of project

output

Options include list, table, or print.

Details

Returns parameter estimates from running discretefish_subroutine. The table argument must be the full name of the table in the FishSET database.

Examples

## Not run: 
model_params('pcod')

## End(Not run)

Calculate and view Moran's I statistic

Description

Wrapper function to calculate global and local Moran's I by discrete area.

Usage

moran_stats(
  dat,
  project,
  varofint,
  zoneid,
  spat,
  cat,
  lon.dat = NULL,
  lat.dat = NULL,
  lon.spat = NULL,
  lat.spat = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

varofint

Numeric variable from dat to test for spatial autocorrelation.

zoneid

Variable in dat that identifies the individual zones or areas. Define if exists in dat and is not named 'ZoneID'. Defaults to NULL.

spat

Spatial data containing information on fishery management or regulatory zones. Shape, json, geojson, and csv formats are supported.

cat

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be name of list containing information on zones.

lon.dat

Longitude variable from dat.

lat.dat

Latitude variable from dat.

lon.spat

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

lat.spat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file.

Details

Measures the degree of spatial autocorrelation. The function uses the localmoran and knearneigh functions from the spdep package. The spatial input is a row-standardized spatial weights matrix computed from the nearest-neighbor matrix, which is the default setting for the nb2listw function. The function requires a map file with lat/lon defining boundaries of areas/zones and varofint to test for spatial autocorrelation. If the zonal centroid is not included in the map file, then the find_centroid function is called to calculate the centroid of each zone. If the variable of interest is not associated with an area/zone, then assignment_column is called to assign each observation to a zone. Arguments to identify the centroid and assign the variable of interest to an area/zone are optional and default to NULL.

Value

Returns a plot and map of Moran’s I. Output is saved to the Output folder.

Examples

## Not run: 
moran_stats(pcodMainDataTable, project='pcod', varofint='OFFICIAL_MT_TONS',
spat=spatdat, lon.dat='LonLat_START_LON', lat.dat ='LonLat_START_LAT', cat='NMFS_AREA')

## End(Not run)
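
As a rough illustration of the statistic described in Details, the following base R sketch computes global and local Moran's I from a row-standardized k-nearest-neighbor weights matrix. It does not call spdep or FishSET; the centroids, the values, and the choice of k = 2 are made-up assumptions for illustration only.

```r
# Hypothetical zone centroids and one value per zone (illustrative data only)
centroids <- cbind(lon = c(0, 1, 2, 3, 4), lat = c(0, 0, 0, 0, 0))
catch <- c(1, 2, 3, 4, 5)

# Build a row-standardized weights matrix from the k nearest neighbors
k <- 2
d <- as.matrix(dist(centroids))
diag(d) <- Inf                      # a zone is not its own neighbor
W <- matrix(0, nrow(d), ncol(d))
for (i in seq_len(nrow(d))) {
  nn <- order(d[i, ])[seq_len(k)]   # indices of the k closest zones
  W[i, nn] <- 1 / k                 # each row sums to 1
}

# Global Moran's I: (n / sum(W)) * sum_ij w_ij z_i z_j / sum_i z_i^2
z <- catch - mean(catch)
n <- length(z)
morans_i <- (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)

# Local Moran's I for each zone (without the inference that spdep adds)
local_i <- z * as.vector(W %*% z) / (sum(z^2) / n)
```

With row-standardized weights the mean of the local values equals the global statistic, which is a quick sanity check on a hand-built weights matrix.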

Identify, remove, or replace NAs and NaNs

Description

Replaces NAs and NaNs in the primary data with the chosen value or removes rows containing NAs and NaNs.

Usage

na_filter(
  dat,
  project,
  x = NULL,
  replace = FALSE,
  remove = FALSE,
  rep.value = "mean",
  over_write = FALSE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Character string. Column(s) in dat in which to remove or replace NAs.

replace

Logical, if TRUE, replaces NAs in a vector with rep.value. Defaults to FALSE.

remove

Logical, if TRUE, removes the entire row of dat where an NA is present. Defaults to FALSE.

rep.value

Value to replace all NAs in a numeric column. Defaults to the mean value of the column. Other options include "median" or a numeric value, e.g. rep.value = 0.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database.

Details

To check for NAs across dat, run the function specifying only dat and project (na_filter(dataset, project)). The function will return a statement of which variables, if any, contain NAs. To remove NAs, use remove = TRUE. All rows containing NAs in x will be removed from dat. To replace NAs, use replace = TRUE. If replace = TRUE and rep.value is not defined, then NAs are replaced with the mean value. The modified dataset will be returned if replace = TRUE or remove = TRUE. If both replace and remove are TRUE then replace is used. Save the modified data table to the FishSET database by setting over_write = TRUE.

Value

If replace and remove are FALSE then a statement of whether NAs are found is returned. If either replace or remove is TRUE the modified primary dataset is returned.

Examples

## Not run: 
na_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT')

mod.dat <- na_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT', 
                     replace = TRUE)
                     
mod.dat <- na_filter(pcodMainDataTable,'pcod', 'OFFICIAL_TOTAL_CATCH_MT',
                     replace = TRUE, rep.value = 0)
                     
mod.dat <- na_filter(pcodMainDataTable, 'pcod',
                     c('OFFICIAL_TOTAL_CATCH_MT', 'CATCH_VALUE'), 
                     remove = TRUE)

## End(Not run)
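
The identify, replace, and remove steps described above can be sketched in base R as follows. This is only an approximation of the idea, not the internals of na_filter; the data frame and its column names are hypothetical.

```r
# Illustrative data only; column names are hypothetical
dat <- data.frame(catch = c(10, NA, 30, NaN), value = c(1, 2, NA, 4))

# Identify: which columns contain NA or NaN? (is.na() is TRUE for both)
cols_with_na <- names(dat)[vapply(dat, anyNA, logical(1))]

# Replace: fill missing values in one column with the column mean
filled <- dat
filled$catch[is.na(filled$catch)] <- mean(dat$catch, na.rm = TRUE)

# Remove: drop every row where the chosen column is missing
kept <- dat[!is.na(dat$catch), , drop = FALSE]
```

Replacing keeps the row count unchanged, while removing shrinks the table, so the choice affects any later step that matches rows back to the raw data.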

Check for unique and syntactic column names

Description

Used for creating new columns.

Usage

name_check(dat, names, repair = FALSE)

Arguments

dat

Dataset that will contain new columns.

names

New names to be added to dataset.

repair

Logical, whether to return repaired column names (repair = TRUE) or just check for unique column names (repair = FALSE).

Details

name_check() first checks whether the new column names are unique and returns an error if they are not. When repair = TRUE, name_check() checks for unique column names and returns new column names that are unique and syntactic (see vec_as_names for details).
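
For readers unfamiliar with name repair, here is a small base R analogue of the check-then-repair idea. It uses make.names() rather than vec_as_names, so the repaired names may differ from what name_check() returns; the column names are hypothetical.

```r
# Hypothetical existing and proposed column names
existing <- c("haul", "catch")
new_names <- c("catch", "catch value", "2020")

# Check: which proposed names clash with columns already in the data?
clashes <- intersect(new_names, existing)

# Repair: make the full set of names syntactic and unique, then keep
# only the entries that correspond to the new columns
all_names <- make.names(c(existing, new_names), unique = TRUE)
repaired <- tail(all_names, length(new_names))
```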


Identify, remove, or replace NaNs

Description

Replaces NaNs in the primary data with the chosen value or removes rows containing NaNs.

Usage

nan_filter(
  dat,
  project,
  x = NULL,
  replace = FALSE,
  remove = FALSE,
  rep.value = "mean",
  over_write = FALSE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Character string of variables to remove or replace NaNs.

replace

Logical, If TRUE, NaNs are replaced. Defaults to FALSE.

remove

Logical, if TRUE, removes the entire row of the dataset where NaN is present. Defaults to FALSE.

rep.value

Value to replace all NaNs in a numeric column. Defaults to the mean value of the column. Other options include "median" or a numeric value, e.g. rep.value = 0.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database. Defaults to FALSE.

Details

To check for NaNs across dat, run the function specifying only dat and project (nan_filter(dataset, project)). The function will return a statement of which variables, if any, contain NaNs. To remove NaNs, use remove = TRUE. All rows containing NaNs in x will be removed from dat. To replace NaNs, use replace = TRUE. If both replace and remove are TRUE then replace is used. If replace is TRUE and rep.value is not defined, then NaNs are replaced with the mean value. The modified dataset will be returned if replace = TRUE or remove = TRUE. Save the modified data table to the FishSET database by setting over_write = TRUE.

Value

If replace and remove are FALSE then a statement of whether NaNs are found is returned. If either replace or remove is TRUE the modified primary dataset is returned.

Examples

## Not run: 
nan_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT')

mod.dat <- nan_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT', 
                      replace = TRUE)
                      
mod.dat <- nan_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT',
                      replace = TRUE, rep.value = 0)
                      
mod.dat <- nan_filter(pcodMainDataTable, 'pcod', 'OFFICIAL_TOTAL_CATCH_MT', 
                      remove = TRUE)

## End(Not run)
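
The difference between NA and NaN matters when choosing between na_filter and nan_filter. The base R sketch below shows how the two tests differ; whether nan_filter leaves plain NAs untouched in exactly this way is an assumption, and the vector is made up.

```r
# Illustrative vector containing a plain NA and two NaNs
x <- c(1.5, NA, NaN, 0 / 0)

na_flags <- is.na(x)    # TRUE for NA and for NaN
nan_flags <- is.nan(x)  # TRUE for NaN only

# Replace only the NaNs with a chosen value, leaving the NA in place
x_fixed <- replace(x, is.nan(x), 0)
```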

Identify NaNs and NAs

Description

Check whether any columns in the primary dataset contain NAs or NaNs. Returns column names containing NAs or NaNs.

Usage

nan_identify(dat, project)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

Details

Check whether any columns in the primary dataset contain NAs or NaNs.

Value

Returns names of columns containing NAs or NaNs, if any.

See Also

na_filter and nan_filter

Examples

## Not run: 
nan_identify(pcodMainDataTable, "pcod")

## End(Not run)

Create one or more binned frequency tables

Description

Create one or more binned frequency, relative frequency, or density table.

Usage

nfreq_table(
  dataset,
  var,
  group = NULL,
  bins = 30,
  type = "dens",
  v_id = NULL,
  format_lab = "decimal",
  format_tab = "wide"
)

Arguments

dataset

Primary data containing information on hauls or trips. Table in FishSET database should contain the string 'MainDataTable'.

var

String, name of numeric variable to bin.

group

String, name of variable(s) to group var by.

bins

Integer, the number of bins to create.

type

String, the type of binned frequency table to create. "freq" creates a frequency table, "perc" creates a relative frequency table, and "dens" creates a density table.

v_id

String, name of vessel ID column (used to detect confidential information).

format_lab

Formatting option for bin labels. Options include "decimal" or "scientific".

format_tab

Format of the output table. Options include "wide" or "long".
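
The three table types can be illustrated with base R. This sketch assumes equal-width bins and one common definition of density (relative frequency divided by bin width); nfreq_table may bin or scale differently, and the data are made up.

```r
# Illustrative numeric variable
x <- c(1, 2, 2, 3, 7, 8, 9, 10)
bins <- 3

# Equal-width bin edges spanning the range of x
breaks <- seq(min(x), max(x), length.out = bins + 1)
binned <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- as.vector(table(binned))  # counts per bin        ("freq")
perc <- freq / sum(freq)          # relative frequency    ("perc")
dens <- perc / diff(breaks)       # density per unit of x ("dens")
```

Under this definition the density values integrate to one over the binned range, which is what distinguishes them from relative frequencies.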


Shape file for NMFS fishing zones

Description

Simple feature collection with 25 features and 2 fields

Format

shape file


Boxplot to assess outliers

Description

Boxplot to assess outliers

Usage

outlier_boxplot(dat, project, x = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Variables in dat to check for outliers. Leave as x = NULL to plot all numeric variables. To specify multiple variables use c('var1', 'var2')

Details

Creates a visual representation of five summary statistics: the median, two hinges (first and third quartiles), and two whiskers (extending to 1.5*IQR, where IQR is the distance between the first and third quartiles). "Outlying" points, those beyond the two whiskers (1.5*IQR), are shown individually.

Value

Box and whisker plot for all numeric variables. Saved to 'output' folder.
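
The five statistics and the outlying points described in Details can be inspected numerically with boxplot.stats() from grDevices. This is a sketch of the underlying rule with made-up data, not a call into outlier_boxplot.

```r
# Illustrative data with one extreme value
x <- c(1, 2, 3, 4, 5, 100)

# coef = 1.5 sets the whisker length to 1.5 times the hinge spread
s <- grDevices::boxplot.stats(x, coef = 1.5)

five_numbers <- s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
outlying <- s$out        # points beyond the whiskers
```

Note that the whiskers end at the most extreme data points inside the 1.5*IQR fence, so the upper whisker here stops at 5 rather than at the fence itself.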


Evaluate outliers in plot format

Description

Visualize spread of data and measures to identify outliers.

Usage

outlier_plot(
  dat,
  project,
  x,
  dat.remove = "none",
  sd_val = NULL,
  x.dist = "normal",
  date = NULL,
  group = NULL,
  pages = "single",
  output.screen = FALSE,
  log_fun = TRUE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

x

Variable in dat to check for outliers.

dat.remove

Outlier measure. Values outside the measure are removed. Users can use the predefined values (see below) or a user-defined distance from the mean. For user-defined values, dat.remove should be a numeric value. For example, dat.remove = 6 would result in values outside 6SD from the mean being classed as outliers. User-defined standard deviations from the mean can also be applied using sd_val. Predefined choices: "none", "5_95_quant", "25_75_quant", "mean_2SD", "median_2SD", "mean_3SD", "median_3SD". See the Details section for more information.

sd_val

Optional. Number of standard deviations from mean defining outliers. For example, sd_val = 6 would mean values outside +/- 6 SD from the mean would be outliers.

x.dist

Distribution of the data. Choices include: "normal", "lognormal", "exponential", "Weibull", "Poisson", "negative binomial".

date

(Optional) date variable to group the histogram by year.

group

(Optional) additional variable to group the histogram by.

pages

Whether to output plots on a single page ("single", the default) or multiple pages ("multi").

output.screen

Logical, if TRUE, returns plots to the screen. If FALSE, saves the plot to the 'output' folder as a png file.

log_fun

Logical, whether to log function call (for internal use).

Details

The function returns three plots: the data, a probability plot, and a Q-Q plot. The data plot shows x against row number. Red points are data points that would be removed based on dat.remove. Blue points are data points within the bounds of dat.remove. If dat.remove is "none", then only blue points will be shown. The probability plot is a histogram of the data, after applying dat.remove, with the fitted probability distribution based on x.dist. group groups the histogram by a variable from dat; date groups the histogram by year. The Q-Q plot shows sampled quantiles against theoretical quantiles, after applying dat.remove.

The dat.remove choices are:

  • numeric value: Remove data points outside +/- 'x'SD of the mean

  • none: No data points are removed

  • 5_95_quant: Removes data points outside the 5th and 95th quantiles

  • 25_75_quant: Removes data points outside the 25th and 75th quantiles

  • mean_2SD: Removes data points outside +/- 2SD of the mean

  • median_2SD: Removes data points outside +/- 2SD of the median

  • mean_3SD: Removes data points outside +/- 3SD of the mean

  • median_3SD: Removes data points outside +/- 3SD of the median

The distribution choices are:

  • normal

  • lognormal

  • exponential

  • Weibull

  • Poisson

  • negative binomial

Value

Plot of the data

Examples

## Not run: 

outlier_plot(pollockMainDataTable, 'pollock', x = 'Haul', dat.remove = 'mean_2SD', 
             x.dist = 'normal', output.screen = TRUE)
# user-defined outlier        
outlier_plot(pollockMainDataTable, 'pollock', x = 'Haul', dat.remove = 6, 
             x.dist = 'lognormal', output.screen = TRUE)

## End(Not run)

Remove outliers from data table

Description

Remove outliers based on outlier measure.

Usage

outlier_remove(
  dat,
  project,
  x,
  dat.remove = "none",
  sd_val = NULL,
  over_write = FALSE
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Variable in dat containing potential outliers.

dat.remove

Defines the measure used to subset the data. Users can use the predefined values (see below) or user-defined standard deviations from the mean. For user-defined values, dat.remove should be a numeric value. For example, dat.remove = 6 would result in values outside 6SD from the mean being classed as outliers. User-defined standard deviations from the mean can also be applied using sd_val. Predefined choices: "none", "5_95_quant", "25_75_quant", "mean_2SD", "median_2SD", "mean_3SD", "median_3SD".

sd_val

Optional. Number of standard deviations from mean defining outliers. For example, sd_val=6 would mean values outside +/- 6 SD from the mean would be outliers.

over_write

Logical, If TRUE, saves data over previously saved data table in the FishSET database.

Details

The dat.remove choices are:

  • numeric value: Remove data points outside +/- 'x'SD of the mean

  • none: No data points are removed

  • 5_95_quant: Removes data points outside the 5th and 95th quantiles

  • 25_75_quant: Removes data points outside the 25th and 75th quantiles

  • mean_2SD: Removes data points outside +/- 2SD of the mean

  • median_2SD: Removes data points outside +/- 2SD of the median

  • mean_3SD: Removes data points outside +/- 3SD of the mean

  • median_3SD: Removes data points outside +/- 3SD of the median

Value

Returns the modified primary dataset. Modified dataset will be saved to the FishSET database.

Examples

## Not run: 
pollockMainDataTable <- outlier_remove(pollockMainDataTable, 'pollock', 'dist', 
   dat.remove = 'mean_2SD', over_write = TRUE)

## End(Not run)
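
The dat.remove choices listed in Details can be read as lower and upper bounds on x. The following base R sketch spells that out; outlier_bounds is a hypothetical helper written for this illustration, not a FishSET function, and the package may compute quantiles or handle ties differently.

```r
# Hypothetical helper: return c(lower, upper) for a given outlier rule
outlier_bounds <- function(x, rule) {
  x <- x[!is.na(x)]
  if (is.numeric(rule)) {
    # user-defined number of standard deviations from the mean
    return(mean(x) + c(-1, 1) * rule * sd(x))
  }
  switch(rule,
    none = c(-Inf, Inf),
    "5_95_quant" = unname(quantile(x, c(0.05, 0.95))),
    "25_75_quant" = unname(quantile(x, c(0.25, 0.75))),
    mean_2SD = mean(x) + c(-2, 2) * sd(x),
    median_2SD = median(x) + c(-2, 2) * sd(x),
    mean_3SD = mean(x) + c(-3, 3) * sd(x),
    median_3SD = median(x) + c(-3, 3) * sd(x),
    stop("unknown rule: ", rule)
  )
}

# Illustrative data with one extreme value
dist_km <- c(10, 12, 11, 13, 9, 10, 12, 11, 10, 250)

b <- outlier_bounds(dist_km, "mean_2SD")
kept <- dist_km[dist_km >= b[1] & dist_km <= b[2]]
```

Because a single extreme value inflates both the mean and the standard deviation, it can be worth comparing the mean-based and median-based rules before removing anything.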

Evaluate outliers in Data

Description

outlier_table() returns a summary table which shows summary statistics of a variable after applying several outlier filters.

Usage

outlier_table(dat, project, x, sd_val = NULL, log_fun = TRUE)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

x

Variable or column number in dat to check for outliers.

sd_val

Optional. Number of standard deviations from mean defining outliers. For example, sd_val = 4 would mean values outside +/- 4 SD from the mean would be outliers.

log_fun

Logical, whether to log function call (for internal use).

Details

Returns a table of summary statistics (mean, median, standard deviation, minimum, maximum, number of NAs, and skew of the data) for x after values outside the outlier measure have been removed. Outlier measures include 5-95% quantiles, 25-75% quantiles, mean +/-2SD, mean +/-3SD, median +/-2SD, and median +/-3SD. Only one variable can be checked at a time. Table is saved to the Output folder.

Value

Table for evaluating whether outliers may exist in the selected data column.

Examples

## Not run: 
outlier_table(pollockMainDataTable, 'pollock', x = 'HAUL')

## End(Not run)

Parse metadata from a data file

Description

General purpose meta parsing function. parse_meta attempts to parse a file based on its file extension.

Usage

parse_meta(file, ..., simplify_meta = FALSE)

Arguments

file

String, file path.

...

Additional arguments passed to a parsing function based on file extension. See below.

simplify_meta

Logical, attempt to simplify the metadata output. This uses simplify_list. This can be useful if metadata is not tabular.

Details

Function supports xls, xlsx, csv, tsv, excel, json, and xml extensions. Extension-specific notes:

txt:
⁠ ⁠sep Field separator character. defaults to comment = "#".
⁠ ⁠comment The comment character used to separate (or "comment-out") the metadata from the data. Only text that has been commented-out will be read.
⁠ ⁠d_list Logical, is metadata stored as a description list (i.e. Field: value, value format). If a colon (":") is used after the field name set this to TRUE.

xls, xlsx:
⁠ ⁠range The cell range to read from (e.g. "A1:C5"). See read_excel for more details.

See Also

parse_meta_delim, parse_meta_excel, parse_meta_json, parse_meta_xml
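
To make the comment and d_list options for text files more concrete, here is a base R sketch of reading commented-out "Field: value" metadata from the top of a delimited file. It mimics the idea only and is not how parse_meta is implemented; the file contents are made up.

```r
# Illustrative file contents: metadata is commented out with "#"
lines <- c(
  "# Source: observer program",
  "# Units: metric tons",
  "haul,catch",
  "1,2.5"
)

# Keep only the commented lines and strip the comment character
meta_lines <- sub("^#\\s*", "", lines[startsWith(lines, "#")])

# Split each "Field: value" pair at the first colon
field <- sub(":.*$", "", meta_lines)
value <- sub("^[^:]*:\\s*", "", meta_lines)
meta <- setNames(value, field)
```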


Selectively display a note section

Description

Selectively display a note section

Usage

parse_notes(project, date = NULL, section, output = "print")

Arguments

project

The project name.

date

Date to pull notes from. If NULL then the most recent version of notes from the project are retrieved.

section

The note section to display. Options include "upload" for Upload data, "quality" for Data quality evaluation, "explore" for Data exploration, "fleet" for Fleet functions, "analysis" for Simple analysis, "new_variable" for Create new variable, "alt_choice" for Alternative choice, "models", and "bookmark".

output

Output type. "print" returns formatted notes. "string" returns a character vector of the notes. "print" is recommended for displaying notes in a report.

Examples

## Not run: 
parse_notes("pollock", section = "explore")

## End(Not run)

Plot spatial dataset

Description

Simple plotting function for viewing spatial data.

Usage

plot_spat(spat)

Arguments

spat

Spatial dataset to view. Must be an object of class sf or sfc.

Examples

## Not run: 
plot_spat(pollockNMFSSpatTable)

## End(Not run)

Policy change metrics

Description

Policy change metrics

Usage

policy_metrics(
  dat,
  project,
  tripID = "row",
  vesselID,
  catchID,
  datevar = NULL,
  price = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Name of project

tripID

Trip identifier. Can be 'row' or the name or names of variables that define trips. If tripID='row' then each row of the primary dataset is considered to be a unique trip

vesselID

Vessel identifier. Variable name in primary dataset that contains unique vessel identifier.

catchID

Name of variable in primary dataset that contains catch data.

datevar

Name of variable containing date data. Used to split data into years.

price

Name of variable containing revenue or price data.

Details

The policy change metrics reflect the impact of proposed policies in the absence of changes in fisher behavior. Policy scenarios are defined using zone_closure() function. Percent of vessels is calculated from the unique vessel identifiers grouped by year and zone. Trips are identified using the tripID argument, otherwise each row is assumed to be a trip. If price is not defined then percent of revenue loss will be reported as NA.

Value

Tables containing basic metrics on effects of proposed zone closures.
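
As a sketch of what such metrics look like, the base R code below computes the share of trips, vessels, and catch that fall in a closed zone, assuming each row is a trip. The data, the closed zone, and the exact definitions are assumptions for illustration; policy_metrics may group by year and define the percentages differently.

```r
# Illustrative trip-level data; each row is treated as one trip
trips <- data.frame(
  vessel = c("A", "A", "B", "C", "C", "D"),
  zone   = c(1, 2, 2, 3, 1, 2),
  catch  = c(10, 20, 30, 40, 50, 50)
)
closed_zones <- 2  # hypothetical closure scenario

in_closed <- trips$zone %in% closed_zones

pct_trips   <- 100 * mean(in_closed)
pct_catch   <- 100 * sum(trips$catch[in_closed]) / sum(trips$catch)
pct_vessels <- 100 * length(unique(trips$vessel[in_closed])) /
  length(unique(trips$vessel))
```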


Summarize predicted probabilities

Description

Create summary table and figures for the predicted probabilities of fishing per zone for each model and policy scenario. The table and figures include the base case scenario, which is the proportion of observations in each zone. The table also includes the squared error between the predicted probabilities and base case probabilities. The first figure option displays predicted probabilities for each model, and the second figure option shows predicted probabilities for each model and policy.

Usage

pred_prob_outputs(
  project,
  mod.name = NULL,
  zone.dat = NULL,
  policy.name = NULL,
  output_option = "table"
)

Arguments

project

Name of project

mod.name

Name of model

zone.dat

Variable in primary data table that contains unique zone ID.

policy.name

List of policy scenario names created in zone_closure function

output_option

"table" to return summary table (default); "model_fig" for predicted probabilities; "policy_fig" to return predicted probabilities for each model/policy scenario; or "diff_table" to return the difference in predicted probabilities between model and policy scenario for each zone.

Details

This function requires that model and prediction output tables exist in the FishSET database. If these tables are not present in the database, the function will terminate and return an error message.

Value

A model prediction summary table (default), model prediction figure, or policy prediction figure. See output_option argument.

Examples

## Not run: 

pred_prob_outputs(project = "scallop")


## End(Not run)

Map of predicted probabilities

Description

Create a map showing predicted probabilities by zone

Usage

predict_map(
  project,
  mod.name = NULL,
  policy.name = NULL,
  spat,
  zone.spat,
  outsample = FALSE,
  outsample_pred = NULL
)

Arguments

project

Name of project

mod.name

Name of model

policy.name

Name of policy scenario

spat

A spatial data file containing information on fishery management or regulatory zones boundaries. sf objects are recommended, but sp objects can be used as well. See dat_to_sf() to convert a spatial table read from a csv file to an sf object. To upload your spatial data to the FishSETFolder see load_spatial().

zone.spat

Name of zone ID column in 'spat'.

outsample

Logical, indicating whether predict_map() is being used to create a map of out-of-sample predicted fishing probabilities (outsample = TRUE) or of a policy scenario (outsample = FALSE).

outsample_pred

A dataframe with fishing location and predicted probabilities for out-of-sample data. outsample_pred = NULL by default and when plotting policy scenarios.

Details

This function requires that model and prediction output tables exist in the FishSET database when plotting policy scenario maps.

Value

A map showing predicted probabilities

Examples

## Not run: 

predict_map(project = "scallop", policy.name = "logit_c_mod1 closure_1", 
            spat = spat, zone.spat = "TEN_ID")


## End(Not run)

Predict out-of-sample data

Description

Calculate predicted probabilities for out-of-sample dataset

Usage

predict_outsample(
  project,
  mod.name,
  outsample.mod.name,
  use.scalers = FALSE,
  scaler.func = NULL
)

Arguments

project

Name of project

mod.name

Name of saved model to use. Argument can be the name of the model or can pull the name of the saved "best" model. Leave mod.name empty to use the saved "best" model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

outsample.mod.name

Name of the saved out-of-sample model design.

use.scalers

Input for create_model_input(). Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Input for create_model_input(). Function to calculate rescaling factors.

Details

This function predicts out-of-sample fishing probabilities and calculates model prediction performance (percent absolute prediction error).

Examples

## Not run: 

predict_outsample("scallop1", "logit_c_mod1", "logit_c_mod1_outsample")


## End(Not run)

Format numbers in table

Description

Format numeric columns.

Usage

pretty_lab(tab, cols = "all", type = "pretty", ignore = NULL)

Arguments

tab

Table to format.

cols

Character string of columns to format. Defaults to "all", which will include all numeric variables in tab. If ignore = TRUE then the columns listed in cols will not be formatted and all other columns in tab will be formatted.

type

The type of formatting to apply. "pretty" uses prettyNum which uses commas (",") to mark big intervals. "scientific" uses scientific notation. "decimal" simply rounds to two decimal places.

ignore

Logical, whether to exclude the columns listed in cols and apply formatting to all other columns in tab.
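
The three formatting types correspond roughly to the following base R calls. The exact arguments pretty_lab passes are not shown in this manual, so treat these as illustrative equivalents with made-up numbers.

```r
# "pretty": commas mark big intervals
pretty_val <- prettyNum(1234567, big.mark = ",")

# "scientific": scientific notation (two digits after the decimal point here)
sci_val <- formatC(1234567, format = "e", digits = 2)

# "decimal": round to two decimal places
dec_val <- round(3.14159, 2)
```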


Format table for R Markdown

Description

Format table for R Markdown

Usage

pretty_tab(tab, full_width = FALSE)

Arguments

tab

Table to format.

full_width

Logical, whether table should fill out entire width of the page.


Scroll box for R Markdown table

Description

Allows tables to become scrollable. Useful for large tables.

Usage

pretty_tab_sb(tab, width = "100%", height = "500px", full_width = FALSE)

Arguments

tab

Table to format.

width

A character string indicating the width of the box. Can be in pixels (e.g. "50px") or as a percentage (e.g. "50%").

height

A character string indicating the height of the box. Can be in pixels (e.g. "50px") or as a percentage (e.g. "50%").

full_width

Logical, whether table should fill out entire width of the page.


Create Previous Location/Area Variable

Description

Creates a variable of the previous port/zone (previous area) or the previous longitude/latitude for a vessel.

Usage

previous_loc(
  dat,
  spat,
  project,
  starting_port,
  v_id,
  tripID,
  haulID,
  zoneID = NULL,
  spatID = NULL,
  date = NULL,
  lon = NULL,
  lat = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

spat

A spatial data file containing information on fishery management or regulatory zones boundaries. sf objects are recommended, but sp objects can be used as well. See dat_to_sf() to convert a spatial table read from a csv file to an sf object. To upload your spatial data to the FishSETFolder see load_spatial().

project

String, name of project.

starting_port

The name of the starting (or disembarking) port in dat.

v_id

The name of the variable in dat that uniquely identifies vessels.

tripID

Variable name in dat that uniquely identifies trips.

haulID

Variable name in dat that uniquely identifies hauls.

zoneID

Name of zone ID column in dat. Used to identify the previous area. Required for previous area variable.

spatID

Name of zone ID column in spat. spat is used to assign ports to spatial areas. Required for previous area variable.

date

Optional, a date variable to order hauls by.

lon

Longitude variable from dat. Required for previous location variable.

lat

Latitude variable from dat. Required for previous location variable.

Details

previous_loc() can create a previous area or location variable. "Previous area" is defined as the port or zone the vessel last visited. The first area for each trip is the disembarking port (starting_port). If a port is within a zone, the zone is returned. If a port is not within a zone, the name of the port is returned. "Previous location" is defined as the previous longitude and latitude of the vessel. The first set of coordinates is the location of the port. Users must have a port table saved to the FishSET database to use this function (see load_port()). This variable can be used to define the distance matrix (see create_alternative_choice()).
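
The "previous area" logic can be sketched in base R as a within-trip lag of the zone column, with the first haul of each trip falling back to the starting port. This sketch skips the spatial step that maps a port to the zone containing it, and the data and column names are hypothetical.

```r
# Illustrative haul-level data
hauls <- data.frame(
  trip = c(1, 1, 1, 2, 2),
  haul = c(1, 2, 3, 1, 2),
  zone = c("Z1", "Z2", "Z2", "Z3", "Z1"),
  port = c("PortA", "PortA", "PortA", "PortB", "PortB")
)

# Order hauls within trips so that "previous" is well defined
hauls <- hauls[order(hauls$trip, hauls$haul), ]

# Shift the zone column down by one row
prev_area <- c(NA, head(hauls$zone, -1))

# The first haul of each trip starts from the port, not the prior trip
first_haul <- !duplicated(hauls$trip)
prev_area[first_haul] <- hauls$port[first_haul]

hauls$prev_area <- prev_area
```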


Check if option file exists for a project

Description

Check if option file exists for a project

Usage

proj_settings_exists(project)

Arguments

project

Project name.

Value

TRUE if project options file exists, FALSE if not.


List output files by project name

Description

List output files by project name

Usage

project_files(project)

Arguments

project

Project name

Examples

## Not run: 
project_files("pollock")

## End(Not run)

List logs by project

Description

List logs by project

Usage

project_logs(project, modified = FALSE)

Arguments

project

Name of project.

modified

Logical, whether to show modification date. Returns a data frame.


Display database table names by project

Description

Display database table names by project

Usage

project_tables(project, ...)

Arguments

project

Name of project.

...

String, additional characters to match by.

See Also

list_tables, fishset_tables

Examples

## Not run: 
project_tables("pollock")
project_tables("pollock", "main")

## End(Not run)

Display project names

Description

Display project names

Usage

projects()

Details

Lists the unique project names currently in the FishSET Database.

Examples

## Not run: 
projects()

## End(Not run)

Retrieve/display meta data by project

Description

Retrieve/display meta data by project

Usage

pull_meta(project, tab.name = NULL, tab.type = NULL, format = FALSE)

Arguments

project

Project name.

tab.name

String, table name. Optional, used to filter output to a specific table.

tab.type

String, table type. Optional, used to filter output. Options include "main", "spat" (spatial), "port", "grid" (gridded), and "aux" (auxiliary).

format

Logical, whether to format output using pander. Useful for displaying in reports.


Pull notes from output folder

Description

Pull notes from output folder

Usage

pull_notes(project, date = NULL, output = "print")

Arguments

project

String, the project name.

date

String, date to pull notes from. If NULL, most recent note file is retrieved.

output

Output type. "print" returns formatted notes. "string" returns a character vector of the notes. "print" is recommended for displaying notes in a report.

Details

Notes are saved to the output folder by project name and date. If date is not specified then the most recent notes file with the project name is pulled. Notes are also saved by FishSET app session; if more than one session occurred in the same day, each session's notes are pulled and listed in chronological order.


Retrieve output file name by project, function, and type

Description

Retrieve output file name by project, function, and type

Usage

pull_output(project, fun = NULL, date = NULL, type = "plot", conf = TRUE)

Arguments

project

Name of project

fun

Name of function.

date

Output file date. If NULL, the most recent output file is pulled.

type

Whether to return the "plot" (.png), "table" (.csv), "notes" (.txt) or "all" files matching the project name, function, and date.

conf

Logical, whether to return suppressed confidential data. Unsuppressed output will be pulled if suppressed output is not available.

Examples

## Not run: 
pull_output("pollock", "species_catch", type = "plot")

## End(Not run)

Import and format plots to notebook file

Description

Import and format plots to notebook file

Usage

pull_plot(project, fun, date = NULL, conf = TRUE)

Arguments

project

Project name.

fun

String, the name of the function that created the plot.

date

the date the plot was created. If NULL, then the most recent version is retrieved.

conf

Logical, whether to return suppressed confidential data. Unsuppressed output will be pulled if suppressed output is not available.

Examples

## Not run: 
pull_plot("pollock", "density_plot")

## End(Not run)

Import and format table to notebook file

Description

Import and format table to notebook file

Usage

pull_table(project, fun, date = NULL, conf = TRUE)

Arguments

project

Project name.

fun

String, the name of the function that created the table.

date

the date the table was created. If NULL, then the most recent version is retrieved.

conf

Logical, whether to return suppressed confidential data. Unsuppressed output will be pulled if suppressed output is not available.

Examples

## Not run: 
pull_table("pollock", "vessel_count")

## End(Not run)

Randomize latitude and longitude points by zone

Description

Randomize latitude and longitude points by zone

Usage

randomize_lonlat_zone(dat, project, spat, lon, lat, zone)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

spat

Spatial data table containing regulatory zones. This can be a "spatial feature" or sf object.

lon

String, variable name containing longitude.

lat

String, variable name containing latitude.

zone

String, column name containing the assigned zone. Must be the same for both the spatial data table and MainDataTable.

Details

This is one of the FishSET confidentiality functions. It replaces longitude and latitude values with randomly sampled coordinates from the regulatory zone the observation occurred in.
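
The idea of replacing a point with a random location from the same zone can be sketched in base R. This is a simplified illustration, not the FishSET implementation: the zones here are hypothetical rectangles defined by bounding coordinates, whereas real regulatory zones are polygons supplied through spat, and all table and column names are made up.

```r
# Hypothetical rectangular zones (real regulatory zones are polygons)
zones <- data.frame(zone = c("A", "B"),
                    lon_min = c(-170, -165), lon_max = c(-165, -160),
                    lat_min = c(54, 54),     lat_max = c(57, 57))

# Toy observations, each already assigned to a zone
obs <- data.frame(zone = c("A", "B", "A"),
                  lon = c(-168.2, -161.5, -166.0),
                  lat = c(55.1, 56.3, 54.4))

set.seed(7)
# Look up each observation's zone, then draw a new point inside that zone
z <- zones[match(obs$zone, zones$zone), ]
obs$lon <- runif(nrow(obs), z$lon_min, z$lon_max)
obs$lat <- runif(nrow(obs), z$lat_min, z$lat_max)
obs
```

Each observation keeps its zone assignment, so zone-level summaries are unchanged while exact positions are obscured.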

Examples

## Not run: 
randomize_lonlat_zone(pollockMainDataTable, "pollock", spatdat, 
                   lon = "LonLat_START_LON", lat = "LonLat_START_LAT",
                   zone = "NMFS_AREA")

## End(Not run)

Randomize variable value by percentage range

Description

Randomize variable value by percentage range

Usage

randomize_value_range(dat, project, value, perc = NULL)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

value

String, name of variable to jitter.

perc

Numeric, a vector of percentages to randomly adjust a column of values by. Defaults to a range of 0.05 - 0.15 (i.e. 5-15 percent of original value).

Details

This is one of the FishSET confidentiality functions. It adjusts a value by adding or subtracting (chosen at random for each value) a percentage of the value. The percentage is randomly sampled from a range of percentages provided in the "perc" argument.
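
The adjustment can be sketched in base R as follows. This is a minimal illustration, not the FishSET implementation: the helper name jitter_by_percent and the toy catch vector are hypothetical, and drawing the percentage uniformly from the supplied range is an assumption.

```r
# Hypothetical helper: adjust each value up or down by a random percentage
jitter_by_percent <- function(x, perc = c(0.05, 0.15)) {
  n <- length(x)
  # One percentage per value (assumption: uniform over the range of perc)
  p <- runif(n, min = min(perc), max = max(perc))
  # Choose at random whether to add (+1) or subtract (-1) for each value
  direction <- sample(c(-1, 1), n, replace = TRUE)
  x + direction * p * x
}

set.seed(1)
catch <- c(1200, 850, 4300, 60)
jittered <- jitter_by_percent(catch)
# Relative change of each value; always between 5 and 15 percent
round(abs(jittered - catch) / catch, 3)
```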

Examples

## Not run: 
randomize_value_range(pollockMainDataTable, "pollock", "LBS_270_POLLOCK_LBS")

## End(Not run)

Randomize value between rows

Description

Randomize value between rows

Usage

randomize_value_row(dat, project, value)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Project name.

value

String, variable name to be randomly distributed between rows.

Details

This is one of the FishSET confidentiality functions. It is useful for randomly assigning ID values between observations.
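
Redistributing a value between rows amounts to permuting one column while leaving the others in place. A minimal base R sketch of that idea, with hypothetical data and column names (this is not the FishSET implementation):

```r
# Toy data; column names are made up for illustration
trips <- data.frame(PERMIT = c(101, 102, 103, 104, 105),
                    CATCH  = c(10, 20, 30, 40, 50))

set.seed(42)
shuffled <- trips
# Permute the ID column; all other columns keep their original row order
shuffled$PERMIT <- sample(trips$PERMIT)
shuffled
```

The set of IDs is preserved, but an ID can no longer be linked to the row it originally belonged to.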

Examples

## Not run: 
randomize_value_row(pollockMainDataTable, "pollock", "PERMIT")

## End(Not run)

Import data from local file directory or webpage into the R environment

Description

Import data from local file directory or webpage into the R environment

Usage

read_dat(
  x,
  data.type = NULL,
  is.map = FALSE,
  drv = NULL,
  dbname = NULL,
  user = NULL,
  password = NULL,
  ...
)

Arguments

x

Name and path of dataset to be read in. To load data directly from a webpage, x should be the web address.

data.type

Optional. Data type can be defined by the user or inferred from the file extension. If undefined, data.type is the string after the last period or equal sign. data.type must be defined if x is the path to a shape folder (data.type = 'shape'), if the file is a Google spreadsheet (data.type = 'google'), or if the correct extension cannot be derived from x. R, comma-delimited, tab-delimited, excel, Matlab, json, geojson, sas, spss, stata, html, and XML data extensions do not have to be specified.

is.map

Logical. For files with a .json extension, set is.map = TRUE if the data is a spatial file. Spatial files ending in .json will not be read in properly unless is.map = TRUE.

drv

Use with sql files. Database driver.

dbname

Use with sql files. If required, database name.

user

Use with sql files. If required, user name for SQL database.

password

Use with sql files. If required, SQL database password.

...

Optional arguments

Details

Uses the appropriate function to read in data based on data type. Use write_dat to save data to the data folder in the project directory. Supported data types include shape, csv, json, matlab, R, spss, and stata files. Use data.type = 'shape' if x is the path to a shape folder. Use data.type = 'google' if the file is a Google spreadsheet.

For sql files, use data.type = 'sql'. The function will connect to the specified DBI and pull the table. Users must specify the DBI driver (drv), for example: RSQLite::SQLite(), RPostgreSQL::PostgreSQL(), odbc::odbc(). Further arguments may be required, including database name (dbname), user id (user), and password (password).

Additional arguments can be added, such as the number of lines to skip (skip = 2) and whether the file has a header row (header = FALSE). To specify the separator argument for a delimited file, including tab-delimited files, set data.type = 'delim'.

For more details, see load for loading R objects, read_csv for reading in comma separated value files, read_tsv for reading in tab separated value files, read_delim for reading in delimited files, read_excel for reading in excel files (xls, xlsx), st_read for reading in geojson, GeoPackage, and shape files, readMat for reading in matlab data files, read_dta for reading in stata data files, read_spss for reading in spss data files, read_sas for reading in sas data files, fromJSON for reading in json files, read_xml for reading in XML files (further processing may be required), read_html for reading in html tables, read_sheet in range_read for reading in google spreadsheets, and read_ods for reading in open document spreadsheets. Google spreadsheets require that data.type be specified: use data.type = 'google'.
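
The rule described under data.type (take the string after the last period or equal sign) can be sketched with a single regular expression. The helper name guess_data_type and the example URL are hypothetical; this only illustrates the rule and is not how read_dat is necessarily written.

```r
# Hypothetical helper: return the string after the last period or equal sign
guess_data_type <- function(x) sub(".*[.=]", "", x)

guess_data_type("C:/data/nmfs_manage_simple.json")        # "json"
guess_data_type("https://example.com/export?format=csv")  # "csv"
```

A shape folder path such as 'C:/data/nmfs_manage_simple' contains no period or equal sign, so nothing useful can be derived from it, which is consistent with the requirement to set data.type = 'shape' explicitly.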

Examples

## Not run: 
# Read in shape file
dat <- read_dat('C:/data/nmfs_manage_simple', data.type = 'shape')

# Read in spatial data file in json format
dat <- read_dat('C:/data/nmfs_manage_simple.json', is.map = TRUE)

# read in data directly from web page
dat <- read_dat("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.txt", 
                data.type = 'delim', sep = '', header = FALSE)

## End(Not run)

Remove a model design from list in ModelInputData table

Description

Remove a model design from list in ModelInputData table

Usage

remove_model_design(project, names)

Arguments

project

Name of project.

names

Names of model designs to be deleted from the table


Replace suppression code

Description

This function replaces the default suppression code in a table.

Usage

replace_sup_code(output, code = NA)

Arguments

output

Table containing suppressed values.

code

The replacement suppression code. code = NA by default; this is ideal for plotting as ggplot automatically removes NAs.

Details

Suppressed values are represented as '-999' by default. This isn't ideal for plotting. NAs, the default in replace_sup_code(), are a better alternative for plots as they can easily be removed.
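
The effect of swapping the suppression code for NA can be shown with a small base R example. The table and its column names are hypothetical, and storing the code as the number -999 is an assumption; the sketch does not call replace_sup_code itself.

```r
# Toy summary table using the default suppression code (-999)
summary_tab <- data.frame(zone = c("A", "B", "C"),
                          catch = c(1500, -999, 320))

# Swap the suppression code for NA so summaries and plots can skip it
summary_tab$catch[summary_tab$catch == -999] <- NA

mean(summary_tab$catch, na.rm = TRUE)  # 910
```

Leaving -999 in place would distort the mean and stretch the axis of any plot of the catch column.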

Examples

## Not run: 
summary_tab <- replace_sup_code(summary_tab, code = NA)

## End(Not run)

Reset confidentiality cache tables

Description

This function deletes all confidentiality check tables stored in the "confid_cache.json" file located in the project output folder. Resetting this cache is recommended after a long period of use as check tables can accumulate over time.

Usage

reset_confid_cache(project)

Arguments

project

Project name

See Also

get_confid_cache


Apply moving average function to catch

Description

Apply moving average function to catch

Usage

roll_catch(
  dat,
  project,
  catch,
  date,
  group = NULL,
  combine = FALSE,
  k = 10,
  fun = "mean",
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  scale = "fixed",
  align = "center",
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  output = "tab_plot",
  ...
)

Arguments

dat

Main data frame over which to apply function. Table in FishSET database should contain the string 'MainDataTable'.

project

Name of project.

catch

Variable name or names containing catch data. Multiple variables can be entered as a vector.

date

Date variable to aggregate by.

group

Variable name or names to group by. Plot will display up to two grouping variables.

combine

Whether to combine variables listed in group. This is passed to the "fill" or "color" aesthetic for plots.

k

The width of the window.

fun

The function to be applied to window. Defaults to mean.

filter_date

The type of filter to apply to table. The "date_range" option will subset the data by two date values entered in date_value. Other options include "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

String containing a start and end date if using filter_date = "date_range", e.g. c("2011-01-01", "2011-03-15"). If filter_date = "period" or "year-period", use integers (4 digits if year, 1-2 if day, month, or week). Use a list if using a two-part filter, e.g. "year-week", with the format list(year, period) or a vector if using a single period, c(period). For example, list(2011:2013, 5:7) will filter the data table from weeks 5 through 7 for years 2011-2013 if filter_date = "year-week". c(2:5) will filter the data February through May when filter_date = "month".

filter_by

String, variable name to filter by.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. This can be a variable that exists in the dataset, or a variable created by roll_catch() such as "year", "month", or "species" if more than one variable is entered in catch.

scale

Scale argument passed to facet_grid. Options include "free", "free_x", "free_y". Defaults to "fixed".

align

Indicates whether results of window should be left-aligned ("left"), right-aligned ("right"), or centered ("center"). Defaults to "center".

conv

Convert catch variable to "tons", "metric_tons", or by using a function. Defaults to "none" (no conversion).

tran

A function to transform the y-axis. Options include log, log2, log10, sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

output

Whether to display "plot", "table", or both. Defaults to both ("tab_plot").

...

Additional arguments passed to rollapply
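
The roles of k and align can be illustrated with a small moving-average sketch. To stay dependency-free it uses stats::filter from base R rather than rollapply, so it is a stand-in for the technique and not the code roll_catch runs; the catch values are made up.

```r
# Toy daily catch series (values are made up)
catch <- c(5, 7, 6, 10, 12, 9, 4, 8, 11, 13)
k <- 3  # window width

# Centered moving average: sides = 2 centers the window on each point.
# The first and last values are NA because the window is incomplete there.
roll_center <- stats::filter(catch, rep(1 / k, k), sides = 2)

# Right-aligned window: each value averages the current and previous k - 1 points
roll_right <- stats::filter(catch, rep(1 / k, k), sides = 1)

as.numeric(roll_center)
as.numeric(roll_right)
```

A larger k smooths the series more but leaves more incomplete windows at the ends of the series.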

Examples

## Not run: 
roll_catch(pollockMainDataTable, project = "pollock", catch = "LBS_270_POLLOCK_LBS",
  date = "FISHING_START_DATE", group = "GEAR_TYPE", k = 15
)

roll_catch(pollockMainDataTable, project = "pollock", catch = c("LBS_270_POLLOCK_LBS", 
 "LBS_110_PACIFIC_COD_LBS"), date = "FISHING_START_DATE", group = "GEAR_TYPE", k = 5, 
 filter_date = "month", date_value = 4:6, facet_by = "month", conv = "tons"
)

## End(Not run)

Guided user interface for FishSET functions

Description

Runs functions associated with loading data, exploring data, checking for data quality issues, generating new variables, and basic data analysis.

Usage

run_fishset_gui()

Details

Opens an interactive page that allows users to select which functions to run by clicking check boxes. Data can be modified and saved. Plot and table output are saved to the output folder. Function calls are logged in the log file.

Examples

## Not run: 
run_fishset_gui()

## End(Not run)

Runs policy scenarios

Description

Estimate redistributed fishing effort and welfare loss/gain from changes in policy or change in other factors that influence fisher location choice.

Usage

run_policy(
  project,
  mod.name = NULL,
  policy.name = NULL,
  betadraws = 1000,
  marg_util_income = NULL,
  income_cost = NULL,
  zone.dat = NULL,
  group_var = NULL,
  enteredPrice = NULL,
  expected.catch = NULL,
  use.scalers = FALSE,
  scaler.func = NULL
)

Arguments

project

Name of project

mod.name

Model name. Argument can be the name of the model or the name can be pulled from the 'modelChosen' table. Leave mod.name empty to use the name of the saved 'best' model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

policy.name

List of policy scenario names created in zone_closure function

betadraws

Integer indicating the number of times to run the welfare simulation. Default value is betadraws = 1000

marg_util_income

For conditional and zonal logit models. Name of the coefficient to use as marginal utility of income.

income_cost

For conditional and zonal logit models. Logical indicating whether the coefficient for the marginal utility of income relates to cost (TRUE) or revenue (FALSE).

zone.dat

Variable in primary data table that contains unique zone ID.

group_var

Categorical variable from primary data table to group welfare outputs.

enteredPrice

Price data. Leave as NULL if using price data from primary dataset.

expected.catch

Required for conditional logit (logit_c) model. Name of expected catch table to use. Can be the expected catch from the short-term scenario (short), the medium-term scenario (med), the long-term scenario (long), or the user-defined temporal parameters (user).

use.scalers

Input for create_model_input(). Logical, should data be normalized? Defaults to FALSE. Rescaling factors are the mean of the numeric vector unless specified with scaler.func.

scaler.func

Input for create_model_input(). Function to calculate rescaling factors.

Details

run_policy is a wrapper function for model_prediction and welfare_predict. model_prediction estimates redistributed fishing effort after policy changes, and welfare_predict simulates welfare loss/gain.


Save modified primary data table to FishSET database

Description

Save modified primary data table to FishSET database

Usage

save_dat(dat, project)

Arguments

dat

Name of data frame in working environment to save to FishSET database.

project

String, name of project.

Details

Use function to save modified data to the FishSET database. The primary data is only saved automatically in data upload and data check functions. It is therefore advisable to save the modified data to the database before moving on to modeling functions. Users should use primary data in the working environment for assessing data quality issues, modifying the data, and generating new variables. Pulling the primary data from the FishSET database on each function without manually saving will result in a loss of changes.

Examples

## Not run: 
save_dat(pollockMainDataTable, 'pollock')

## End(Not run)

Save a meta data file to project folder

Description

Raw (i.e. original or pre-existing) meta data can be saved to the project folder. To add additional meta data (e.g. column descriptions), see ...

Usage

save_raw_meta(
  file,
  project,
  dataset = NULL,
  tab.name = NULL,
  tab.type,
  parse = FALSE,
  overwrite = FALSE,
  ...
)

Arguments

file

String, file path.

project

Project name.

dataset

Optional, the data.frame associated with the meta data. Used to add column names to meta file.

tab.name

The table name as it appears in the FishSET Database (e.g. "projectMainDataTable" if the main table).

tab.type

The table type. Options include "main", "spat" (spatial), "port", "grid" (gridded), and "aux" (auxiliary).

parse

Logical, whether to parse meta data from a data file. See parse_meta.

overwrite

Logical, whether to overwrite existing meta table entry.

...

Additional arguments passed to parse_meta.

See Also

parse_meta


Northeast Scallop Data

Description

A subset of anonymized scallop data

Usage

scallop

Format

'scallop' A data.frame with 10,000 rows and 31 columns:

TRIPID

Randomly assigned trip ID number.

DATE_TRIP

Date of landing.

PERMIT.y

Randomly assigned six-digit vessel fishing permit number.

TRIP_LENGTH

Days calculated from the elapsed time between the date-time sailed and date-time landed; this is a measure of days absent.

GEARCODE

Fishing gear used on the trip.

port_lat

Latitude of the geoid.

port_lon

Longitude of the geoid.

previous_port_lat

Previous latitude of geoid.

previous_port_lon

Previous longitude of geoid.

Plan Code

Portion of the VMS declaration code that identifies the fishery being declared into for the trip.

Program Code

Portion of the VMS declaration code that identifies the program within the declared fishery. For scallops, the program code delineates LA and LAGC trips, as well as access area trips from other trips.

TRIP_COST_WINSOR_2020_DOL

The estimated or real composite trip cost for the VTR trip record generated using the methods described in the Commercial Trip Cost Estimation 2007-2019 PDF file. However, these values have been Winsorized by gear type as a method of avoiding unreasonably high or low trip costs, replacing any value within each gear-group that is less than the 1st percentile or greater than the 99th percentile with the 1st and 99th percentile value, respectively.

DDLAT

The latitude reported on a VTR (Vessel Trip Reports).

DDLON

The longitude reported on a VTR (Vessel Trip Reports).

NAME

Name of wind lease which is found within a given ten minute square.

ZoneID

FishSET's version of a ten minute square.

POUNDS

Live pounds.

LANDED

Landed pounds from the dealer report.

LANDED_OBSCURED

Landed pounds from the dealer report (jittered/obscured).

DOLLAR_OBSCURED

The value of catch paid by the dealer, from the dealer report (jittered/obscured).

DOLLAR_2020_OBSCURED

The value of catch paid by the dealer, from the dealer report (in 2020 dollars, jittered/obscured).

DOLLAR_ALL_SP_2020_OBSCURED

The value of catch for all species caught (in 2020 dollars, jittered/obscured).

Source

Add source here


Ports from the NE scallop fishery

Description

A dataset containing the names and lat/lon coordinates of ports used in the US northeast scallop fishery.

Usage

scallop_ports

Format

A data frame (tibble) with 40 observations and 3 variables.

[,1] Port names
[,2] Longitude
[,3] Latitude

Source

NEED TO ADD SOURCE DESCRIPTION


Create single binary fishery season identifier variable

Description

Create single binary fishery season identifier variable

Usage

seasonalID(
  dat,
  project,
  seasonal.dat = NULL,
  start,
  end,
  overlap = FALSE,
  name = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

Project name.

seasonal.dat

Data table containing date of fishery season(s). Data table can be pulled from the FishSET database. Leave seasonal.dat as NULL if supplying start and end dates with start and end arguments.

start

Date, supplied as a string (example: start='2011/04/22', start='04222011'), or variable in seasonal.dat which identifies start date of fishery season

end

Date, supplied as a string (example: end='2011/04/22', end='04222011'), or variable in seasonal.dat which identifies end date of fishery season

overlap

Logical. Should trips or hauls that start before or end after the fishery season, but start or end within it, be included? FALSE includes only hauls/trips that fall completely within the bounds of the fishery season. Defaults to FALSE.

name

String, seasonal identifier name.

Details

Uses supplied dates or a table of fishery season dates to create fishery season identifier variables. Output is a binary variable called name or 'SeasonID' if name is not supplied.

For each row in dat, the function matches fishery season dates provided in seasonal.dat to the earliest date variable in dat.

Value

Returns a binary variable of within (1) or outside (0) the fishery season.
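
The binary identifier can be sketched with plain date comparisons in base R. This is one reading of the overlap argument, using hypothetical haul start and end columns and a made-up season window; it is not the FishSET implementation, which matches on the earliest date variable in dat.

```r
# Toy haul dates and a hypothetical season window
hauls <- data.frame(
  start = as.Date(c("2011-04-10", "2011-04-20", "2011-06-28", "2011-07-05")),
  end   = as.Date(c("2011-04-16", "2011-04-25", "2011-07-02", "2011-07-09"))
)
season_start <- as.Date("2011-04-15")
season_end   <- as.Date("2011-06-30")

# Analogue of overlap = FALSE: the haul must fall completely within the season
inside <- hauls$start >= season_start & hauls$end <= season_end
# Analogue of overlap = TRUE: any part of the haul falls within the season
overlaps <- hauls$start <= season_end & hauls$end >= season_start

hauls$SeasonID         <- as.integer(inside)    # 0 1 0 0
hauls$SeasonID_overlap <- as.integer(overlaps)  # 1 1 1 0
hauls
```

The first and third hauls straddle a season boundary, so they are counted only under the overlap reading.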

Examples

## Not run: 
#Example using a table stored in the FishSET database
pcodMainDataTable <- seasonalID("pcodMainDataTable", 'pcod', seasonal.dat='seasonTable',
     start='SeasonStart', end='SeasonEnd', name='2001A')
#Example using manually entered dates
pcodMainDataTable <- seasonalID("pcodMainDataTable", 'pcod', seasonal.dat=NULL,
    start='04152011', end='06302011', name='2001A')

## End(Not run)

View model metrics and record best model interactively

Description

Model metrics are displayed as a table in an R Shiny application. Check boxes next to models allow users to record preferred or best model.

Usage

select_model(project, overwrite_table = FALSE)

Arguments

project

String, name of project.

overwrite_table

Logical, should best model table be written over? If table exists and value is FALSE, appends new results to existing table. Defaults to FALSE.

Details

Opens an interactive data table that displays model measures of fit for each model run saved in the model measures of fit table in the FishSET database. The name of this table should contain the string 'out.mod'. Users can delete models from the table and select the preferred model by checking the "selected" box. The table is then saved to the FishSET database with two new columns added, a TRUE/FALSE selected column and the date it was selected. The table is saved with the phrase 'modelChosen' in the FishSET database. The function can also be called indirectly in the discretefish_subroutine by specifying the select.model argument as TRUE. The 'modelChosen' table is not used in any functions. The purpose of this function and the 'modelChosen' table is to save a reference of the preferred model.

Examples

## Not run: 
select_model("pollock", overwrite_table = FALSE)

## End(Not run)

Interactive application to select variables to include/exclude in primary dataset

Description

Opens an R Shiny web application. Use the application to select the variables in the primary dataset that should be retained.

Usage

select_vars(dat, project)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

Details

Opens an interactive table that allows users to select which variables to include by clicking check boxes. Data should be loaded into the FishSET database before running this function. Select variables that will be used to generate further variables, such as rates or cpue, and variables to be included in models. Removed variables can be added back into the dataset at a later date using the add_vars function.

Examples

## Not run: 
select_vars(pcodMainDataTable, "pcod")

## End(Not run)

Set confidentiality parameters

Description

This function specifies whether to check for confidentiality and which rule should be applied.

Usage

set_confid_check(project, check = TRUE, v_id = NULL, rule = "n", value = NULL)

Arguments

project

Name of project.

check

Logical, whether to check for confidentiality.

v_id

String, the column name containing the vessel identifier.

rule

String, the confidentiality rule to apply. See "Details" below. rule = "n" suppresses values containing fewer than n vessels. rule = "k" suppresses values where a single vessel contains k percent or more of the total catch.

value

The threshold for confidentiality. For rule = "n", value must be an integer of at least 2. For rule = "k", value can be any numeric value from 0 to 100.

Details

rule = "n" counts the number of vessels in each stratum and suppresses values where fewer than n vessels are present. For rule = "k", or the "Majority allocation rule", each vessel's share of catch is calculated by stratum. If any vessel's total catch share is greater than or equal to k percent, the value is suppressed.
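
Both rules can be sketched in base R on a toy table. The data, column names, and thresholds below are hypothetical, and the code only illustrates the logic of the two rules; it is not the check that FishSET applies internally.

```r
# Toy catch records; column names are hypothetical
dat <- data.frame(
  zone   = c("A", "A", "A", "B", "B", "C", "C", "C"),
  vessel = c("v1", "v2", "v3", "v1", "v2", "v1", "v2", "v3"),
  catch  = c(40, 35, 25, 90, 10, 30, 30, 40)
)

n <- 3   # rule "n" threshold: minimum number of vessels
k <- 90  # rule "k" threshold: share (percent) at which a single vessel dominates

# Rule "n": count distinct vessels in each stratum
n_vessels <- tapply(dat$vessel, dat$zone, function(v) length(unique(v)))

# Rule "k": largest single-vessel share of catch in each stratum
max_share <- tapply(seq_len(nrow(dat)), dat$zone, function(i) {
  by_vessel <- tapply(dat$catch[i], dat$vessel[i], sum)
  100 * max(by_vessel) / sum(by_vessel)
})

checks <- data.frame(zone = names(n_vessels),
                     n_vessels = as.vector(n_vessels),
                     suppress_n = as.vector(n_vessels < n),
                     max_share = as.vector(max_share),
                     suppress_k = as.vector(max_share >= k))
checks
```

In this toy table only zone B is flagged: it has two vessels (fewer than n = 3) and one vessel holds 90 percent of the catch.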

Examples

## Not run: 
set_confid_check("pollock", check = TRUE, v_id = "PERMIT", rule = "n", value = 3L)

## End(Not run)

Create factor variable from quantiles

Description

Create a factor variable from numeric data. Numeric variable is split into categories based on quantile categories.

Usage

set_quants(
  dat,
  project,
  x,
  quant.cat = c(0.1, 0.2, 0.25, 0.33, 0.4),
  custom.quant = NULL,
  name = "set_quants"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Variable to transform into quantiles.

quant.cat

Quantile options: "0.1", "0.2", "0.25", "0.33", and "0.4"

  • 0.1: (0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%)

  • 0.2: (0%, 20%, 40%, 60%, 80%, 100%)

  • 0.25: (0%, 25%, 50%, 75%, 100%)

  • 0.33: (0%, 33%, 66%, 100%)

  • 0.4: (0%, 10%, 50%, 90%, 100%)

custom.quant

Vector, user defined quantiles.

name

String, name of created vector. Defaults to name of the function if not defined.

Value

Primary dataset with quantile variable added.
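
Splitting a numeric variable into quantile categories can be sketched with quantile() and cut() from base R. The example mirrors the 0.25 option (breakpoints at 0%, 25%, 50%, 75%, 100%); the data are made up and the code is an illustration rather than the FishSET implementation.

```r
# Toy numeric variable with 200 evenly spaced values
haul <- seq(5, 1000, by = 5)

# Breakpoints at 0%, 25%, 50%, 75%, and 100%
probs <- seq(0, 1, by = 0.25)
breaks <- quantile(haul, probs = probs)

# include.lowest = TRUE keeps the minimum value in the first category
haul_quant <- cut(haul, breaks = breaks, include.lowest = TRUE)
table(haul_quant)
```

With heavily tied data the breakpoints may not be unique, in which case cut() fails and the breaks need to be de-duplicated first.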

Examples

## Not run: 
pollockMainDataTable <- set_quants(pollockMainDataTable, 'pollock', 'HAUL',
   quant.cat = .2, name = 'haul.quant')

## End(Not run)

Set user folder directory

Description

Set user folder directory

Usage

set_user_locoutput(loc_dir, project)

Arguments

loc_dir

Local user directory

project

Name of project.

Details

This function saves the local user directory to the project settings file with a valid folder directory. This directory path is used for inserting plots and tables from a folder outside the FishSET package into the FishSET RMarkdown Template.

See Also

insert_plot insert_table get_proj_settings


Evaluate sparsity in data over time in table format

Description

Create table of data sparsity by predefined time periods.

Usage

sparsetable(dat, project, timevar, zonevar, var)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

timevar

Variable in dat containing temporal data

zonevar

Variable in dat containing the zone each observation is assigned to

var

Variable in dat containing catch data


Evaluate sparsity in data over time in plot format

Description

Evaluate sparsity in data over time in plot format

Usage

sparsplot(project, x = NULL)

Arguments

project

String, name of project.

x

Output from sparsetable. If x is null, the sparsity table will be pulled from the output folder if it exists.

Details

Returns a plot of sparsity values over time. Requires sparsity table generated by sparsetable.


GUI for spatial data checks

Description

Runs the spatial checks performed by spatial_qaqc in a shiny application.

Usage

spat_qaqc_gui(dataset, project, spatdat, checks = NULL)

Arguments

dataset

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project.

spatdat

Spatial data containing information on fishery management or regulatory zones. See read_dat for details on importing spatial data.

checks

(Optional) A list of spatial data quality checks outputted by spatial_qaqc.

See Also

spatial_qaqc


Histogram of latitude and longitude by grouping variable

Description

Histogram of latitude and longitude by grouping variable

Usage

spatial_hist(dat, project, group = NULL)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

group

Column in dat containing grouping categories.

Details

Returns a histogram of observed lat/lon split by grouping variable. Output printed to console and saved to Output folder. Function is used to assess spatial variance/clumping of selected grouping variable.

Value

Returns histogram of latitude and longitude by grouping variable. Output returned to console and saved to Output folder.

Examples

## Not run: 
spatial_hist(pollockMainDataTable, 'pollock', 'GEAR_TYPE')

## End(Not run)

Spatial data quality checks

Description

This function performs spatial quality checks and outputs summary tables and plots. Checks include percent of observations on land, outside regulatory zone (spat), and on a zone boundary. If any observation occurs outside the regulatory zones then summary information on distance from nearest zone is provided. spatial_qaqc can filter out observations that are not within the distance specified in filter_dist.

Usage

spatial_qaqc(
  dat,
  project,
  spat,
  lon.dat,
  lat.dat,
  lon.spat = NULL,
  lat.spat = NULL,
  id.spat = NULL,
  epsg = NULL,
  date = NULL,
  group = NULL,
  filter_dist = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project.

spat

Spatial data containing information on fishery management or regulatory zones. sf objects are recommended, but sp objects can be used as well. If using a spatial table read from a csv file, then arguments lon.spat and lat.spat are required. To upload your spatial data to the FishSETFolder see load_spatial.

lon.dat

Longitude variable in dat.

lat.dat

Latitude variable in dat.

lon.spat

Variable or list from spat containing longitude data. Required for spatial tables read from csv files. Leave as NULL if spat is an sf or sp object.

lat.spat

Variable or list from spat containing latitude data. Required for spatial tables read from csv files. Leave as NULL if spat is an sf or sp object.

id.spat

Polygon ID column. Required for spatial tables read from csv files. Leave as NULL if spat is an sf or sp object.

epsg

EPSG number. Manually set the epsg code, which will be applied to spat and dat. If epsg is not specified but is defined for spat, then the spat epsg will be applied to dat. In addition, if epsg is not specified and epsg is not defined for spat, then a default epsg value will be applied to spat and dat (epsg = 4326). See http://spatialreference.org/ to help identify optimal epsg number.

date

String, name of date variable. Used to summarize over year. If NULL the first date column will be used. Returns an error if no date columns can be found.

group

String, optional. Name of variable to group spatial summary by.

filter_dist

(Optional) Numeric, distance value to filter primary data by (in meters). Rows containing distance values greater than or equal to filter_dist will be removed from the data. This action will be saved to the filter table.

Value

A list of plots and/or dataframes depending on whether spatial data quality issues are detected. The list includes:

dataset

Primary data. Up to five columns will be added if spatial issues are found: the logical columns "ON_LAND" (if obs fall on land), "OUTSIDE_ZONE" (if obs occur at sea but outside zone), "ON_ZONE_BOUNDARY" (if obs occurs on zone boundary), and "EXPECTED_LOC" (whether obs occurs at sea, within a zone, and not on zone boundary), and "NEAREST_ZONE_DIST_M" (distance in meters from nearest zone; applies only to obs outside zone or on land).

spatial_summary

Dataframe containing the percentage of observations that occur at sea and within zones, on land, outside zones but at sea, or on zone boundary by year and/or group. The total number of observations by year/group are in the "N" column.

outside_plot

Plot of observations outside regulatory zones.

land_plot

Plot of observations that fall on land.

land_out_plot

Plot of observations that occur on land and are outside the regulatory zones (combines outside_plot and land_plot if both occur).

boundary_plot

Plot of observations that fall on zone boundary.

expected_plot

Plot of observations that occur at sea and within zones.

distance_plot

Histogram of distance from nearest zone (meters) by year for observations that are outside regulatory grid.

distance_freq

Binned frequency table of distance values.

distance_summary

Dataframe containing the minimum, 1st quartile, median, mean, 3rd quartile, and maximum distance values by year and/or group.
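
The statistics reported in distance_summary and the effect of filter_dist can be reproduced on a toy vector in base R. The distances are made up, and the sketch is only meant to illustrate the summary and the filtering rule (rows at or beyond the threshold are removed), not the function's internals.

```r
# Toy distances (meters) from the nearest zone; values are made up
dist_m <- c(0, 0, 12, 85, 140, 0, 310, 45)

# Minimum, 1st quartile, median, mean, 3rd quartile, and maximum
summary(dist_m)

# Keep rows closer than the threshold; rows at or beyond it are dropped
filter_dist <- 100
keep <- dist_m < filter_dist
dist_m[keep]
```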

Examples

## Not run: 
# run spatial checks
spatial_qaqc("pollockMainDataTable", "pollock", spat = NMFS_AREAS, 
             lon.dat = "LonLat_START_LON", lat.dat = "LonLat_START_LAT")
             
# filter obs by distance
spat_out <- 
     spatial_qaqc(pollockMainDataTable, "pollock", spat = NMFS_AREAS,
                  lon.dat = "LonLat_START_LON", lat.dat = "LonLat_START_LAT",
                  filter_dist = 100)
mod.dat <- spat_out$dataset

## End(Not run)

Summarize variable over data and time

Description

View summary and exploratory statistics of selected variable by date and zone.

Usage

spatial_summary(
  dat,
  project,
  stat.var = c("length", "no_unique_obs", "perc_total", "mean", "median", "min", "max",
    "sum"),
  variable,
  spat,
  lon.spat = NULL,
  lat.spat = NULL,
  lon.dat = NULL,
  lat.dat = NULL,
  cat
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

stat.var

Options are "length", "no_unique_obs", "perc_total", "mean", "median", "min", "max", and "sum".

variable

Variable in dat to summarize over date and zone.

spat

Spatial data containing information on fishery management or regulatory zones. Shape, json, geojson, and csv formats are supported. Leave as NULL if the variable ‘ZoneID’ assigning observations to zones exists in dat.

lon.spat

Variable or list from spat containing longitude data. Required for csv files. Leave as NULL if spat is a shape or json file or if the variable ‘ZoneID’ exists in dat.

lat.spat

Variable or list from spat containing latitude data. Required for csv files. Leave as NULL if spat is a shape or json file, or if the variable ‘ZoneID’ exists in dat.

lon.dat

Longitude variable in dat. Leave as NULL if the variable ‘ZoneID’ (zonal assignment) exists in dat.

lat.dat

Latitude variable in dat. Leave as NULL if the variable ‘ZoneID’ (zonal assignments) exists in dat.

cat

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be name of list containing information on zones. Leave as NULL if the variable ‘ZoneID’ exists in dat.

Details

stat.var details:

length: Number of observations
no_unique_obs: Number of unique observations
perc_total: Percent of total observations
mean: Mean
median: Median
min: Minimum
max: Maximum
sum: Sum
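As a rough illustration of what each stat.var option computes, the statistics can be reproduced by zone in base R. This is a sketch on a toy data frame (the values are hypothetical), not the code spatial_summary() runs internally.

```r
# Toy data (hypothetical): four distinct hauls across two zones
dat <- data.frame(
  ZoneID = c("A", "A", "A", "B", "B"),
  HAUL   = c(1, 1, 2, 3, 4)
)

# One function per stat.var option (perc_total is handled separately below)
stat_funs <- list(
  length        = length,
  no_unique_obs = function(v) length(unique(v)),
  mean          = mean,
  median        = median,
  min           = min,
  max           = max,
  sum           = sum
)

# Apply every statistic to HAUL within each zone: rows are zones, columns are stats
by_zone <- sapply(stat_funs, function(f) sapply(split(dat$HAUL, dat$ZoneID), f))

# perc_total: percent of all observations that fall in each zone
perc_total <- 100 * table(dat$ZoneID) / nrow(dat)

print(by_zone)
print(perc_total)
```

The same pattern applies when aggregating by date instead of zone: replace the grouping vector with a year or month extracted from the date variable.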

Value

Returns two plots, the variable aggregated by stat.var plotted against date and against zone.

Examples

## Not run: 
# Example where ZoneID exists in dataset
spatial_summary(pcodMainDataTable, project = 'pcod', 
   stat.var = "no_unique_obs", variable = 'HAUL')

# Example where obs. have not been assigned to zones
spatial_summary(pcodMainDataTable, project = 'pcod', stat.var = "no_unique_obs",
   variable = 'HAUL', spat = spatdat, lon.dat = 'MidLon', lat.dat = 'MidLat',
   cat = 'NMFS_AREA')

## End(Not run)

Summarize species catch

Description

species_catch summarizes catch (or other numeric variables) in the main table. It can summarize by period if date is provided, grouping variables, and filter by period or value. There are several options for customizing the table and plot output.

Usage

species_catch(
  dat,
  project,
  species,
  date = NULL,
  period = NULL,
  fun = "sum",
  group = NULL,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  type = "bar",
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  value = "count",
  position = "stack",
  combine = FALSE,
  scale = "fixed",
  output = "tab_plot",
  format_tab = "wide"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

species

Variable in dat containing the species catch or a vector of species variables (in pounds).

date

Variable in dat containing dates to aggregate by.

period

Time period to count by. Options include 'year', 'month', 'week' (week of the year), 'weekday', 'day' (day of the month), and 'day_of_year'. date is required.

fun

String, name of function to aggregate by. Defaults to sum.

group

Grouping variable name(s). Up to two grouping variables are available for line plots and one for bar plots. For bar plots, if only one species is entered the first group variable is passed to 'fill'. If multiple species are entered, species is passed to "fill" and the grouping variable is dropped. An exception occurs when facetting by species, then the grouping variable is passed to "fill". For line plots, the first grouping variable is passed to "fill" and the second to "linetype" if a single species column is entered or if facetting by species. Otherwise species is passed to "fill", the first group variable to "linetype", and second is dropped.

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.
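The three date_value forms can be pictured with plain base R subsetting. The sketch below uses a toy data frame and assumes the date range is inclusive at both ends; that endpoint behavior is an assumption, not something the documentation states.

```r
# Toy data (hypothetical haul dates)
dat <- data.frame(
  HAUL_DATE = as.Date(c("2010-06-01", "2011-02-10", "2011-05-20",
                        "2012-07-04", "2013-09-30"))
)
yr <- as.integer(format(dat$HAUL_DATE, "%Y"))
mo <- as.integer(format(dat$HAUL_DATE, "%m"))

# Like filter_date = "date_range", date_value = c("2011-01-01", "2011-03-15")
rng <- as.Date(c("2011-01-01", "2011-03-15"))
in_range <- dat[dat$HAUL_DATE >= rng[1] & dat$HAUL_DATE <= rng[2], , drop = FALSE]

# Like filter_date = "month", date_value = c(1, 3, 5)
in_months <- dat[mo %in% c(1, 3, 5), , drop = FALSE]

# Like filter_date = "year-month", date_value = list(2011:2013, 5:7)
date_value <- list(2011:2013, 5:7)
in_year_month <- dat[yr %in% date_value[[1]] & mo %in% date_value[[2]], , drop = FALSE]

print(in_range)
print(in_months)
print(in_year_month)
```

Note that the list form applies the period filter within each listed year, so May through July is kept for 2011, 2012, and 2013 but not for 2010.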

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. Accepts up to two variables. These can be variables that exist in dat, or a variable created by species_catch() such as "year", "month", or "week" if a date variable is added to sub_date. Facetting by "species" is available if multiple catch columns are included in "species". The first variable is facetted by row and the second by column.

type

Plot type, options include "bar" (the default) and "line".

conv

Convert catch variable to "tons", "metric_tons", or by using a function entered as a string. Defaults to "none" for no conversion.

tran

A function to transform the y-axis. Options include log, log2, log10, sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

value

Whether to calculate raw "count" or "percent" of total catch.

position

Positioning of bar plot. Options include 'stack', 'dodge', and 'fill'.

combine

Whether to combine variables listed in group. This is passed to the "fill" or "color" aesthetic for plots.

scale

Scale argument passed to facet_grid. Defaults to "fixed".

output

Output a "plot" or "table". Defaults to both ("tab_plot").

format_tab

How table output should be formatted. Options include 'wide' (the default) and 'long'.

Value

species_catch() aggregates catch using one or more columns of catch data. Users can aggregate by time period, by group, or by both. When multiple catch variables are entered, a new column "species" is created and used to group values in plots. The "species" column can also be used to split (or facet) the plot. For table output, the "species" column will be kept if format_tab = "long", i.e. a column of species names ("species") and a column containing catch ("catch"). When format_tab = "wide", each species is given its own column of catch.

The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply–by date-range or by period. date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value.

Plots can handle up to two grouping variables; there is no limit for tables. Grouping variables can be merged into one variable using combine; in this case any number of variables can be joined, but no more than three is recommended.

For faceting, any variable (including ones listed in group) can be used, but "year", "month", and "week" are also available provided a date variable is added to sub_date. Currently, combined variables cannot be faceted. By default, a list containing a table and a plot is printed to the console and viewer.
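The difference between the "wide" and "long" table formats can be shown with a small base R sketch. The data frame and column names below are hypothetical, and the reshaping is done by hand for clarity; it is not how species_catch() builds its output.

```r
# Toy data (hypothetical): two catch columns and one grouping variable
dat <- data.frame(
  GEAR_TYPE   = c(1, 1, 2, 2),
  POLLOCK_LBS = c(100, 200, 50, 25),
  COD_LBS     = c(10, 20, 30, 40)
)

# "wide" layout: one row per group, one catch column per species
wide <- aggregate(cbind(POLLOCK_LBS, COD_LBS) ~ GEAR_TYPE, data = dat, FUN = sum)

# "long" layout: a "species" column of names and a "catch" column of values
long <- data.frame(
  GEAR_TYPE = rep(wide$GEAR_TYPE, times = 2),
  species   = rep(c("POLLOCK_LBS", "COD_LBS"), each = nrow(wide)),
  catch     = c(wide$POLLOCK_LBS, wide$COD_LBS)
)

print(wide)
print(long)
```

The long layout is usually the more convenient one for plotting, since the "species" column can be mapped directly to a fill or facet variable.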

Examples

## Not run: 
# summarizing one catch column by one group variable
species_catch(pollockMainDataTable, project = "pollock",
              species = "OFFICIAL_TOTAL_CATCH_MT",
              group = "GEAR_TYPE", output = "tab_plot")

# summarizing three catch columns by month
species_catch('pollockMainDataTable', 
              species = c('HAUL_LBS_270_POLLOCK_LBS', 
                          'HAUL_LBS_110_PACIFIC_COD_LBS', 
                          'HAUL_LBS_OTHER_LBS'), 
              date = 'HAUL_DATE', period = 'month', output = 'plot', 
              conv = 'tons')

# filtering by variable
species_catch(pollockMainDataTable, project = "pollock",
              species = "OFFICIAL_TOTAL_CATCH_MT",
              group = "GEAR_TYPE", filter_by = "PORT_CODE", 
              filter_value = "Dutch Harbor")

# filtering by date
species_catch(pollockMainDataTable, project = "pollock",
              species = "OFFICIAL_TOTAL_CATCH_MT",
              sub_date = "HAUL_DATE", filter_date = "month", date_value = 7:10)

## End(Not run)

Separate secondary data table from MainDataTable

Description

Separate secondary data table from MainDataTable

Usage

split_dat(dat, aux = NULL, project, split_by = NULL, key, output = "main")

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

aux

Auxiliary data table in fishset_db or environment. Use string if referencing a table saved in the FishSET database. The column names from "aux" will be used to find and separate the auxiliary table from the MainDataTable.

project

String, name of project.

split_by

String, columns in MainDataTable to split by. These columns will be separated from MainDataTable. Must contain values from "key".

key

String, the column(s) that link the main and auxiliary data tables. If using the "aux" method, "key" must match a column in both MainDataTable and the "aux" data table. If using "split_by", "key" must match a column in MainDataTable and also be included in "split_by".

output

String, return either the "main" data table, "aux" data table, or "both" main and aux data tables in a list.

Details

This function separates auxiliary data (or gridded and port data) from the MainDataTable. Users can either supply the secondary data table (from the environment or fishset_db) to determine which columns to remove, or pass a string of column names to "split_by". Use either the "aux" or the "split_by" method. Defaults to the "aux" method if both arguments are used.
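The "split_by" method can be pictured as follows in base R. This is a sketch under assumptions: the toy table and its column names are hypothetical, and it assumes the key column stays in the main table so the two tables can be re-joined later.

```r
# Toy main table (hypothetical) with port information repeated on every haul
main <- data.frame(
  HAUL      = 1:4,
  PORT_CODE = c("DH", "DH", "KOD", "KOD"),
  PORT_LAT  = c(53.9, 53.9, 57.8, 57.8),
  PORT_LON  = c(-166.5, -166.5, -152.4, -152.4)
)
key      <- "PORT_CODE"
split_by <- c("PORT_CODE", "PORT_LAT", "PORT_LON")

# Auxiliary table: the split columns, one row per distinct key value
aux_tab <- unique(main[, split_by])

# Main table: drop the split columns except the key, which links the tables
main_tab <- main[, setdiff(names(main), setdiff(split_by, key))]

# Joining on the key recovers the original columns
restored <- merge(main_tab, aux_tab, by = key)

print(aux_tab)
print(main_tab)
```

This also shows why "key" must be included in "split_by": without it the auxiliary table would have no column to join back on.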

Examples

## Not run: 
split_dat("pollockMainDataTable", "pollock", aux = "pollockPortTable", key = "PORT_CODE")

## End(Not run)

View summary statistics

Description

View summary statistics in table format for entire dataset or for a specific variable.

Usage

summary_stats(dat, project, x = NULL, log_fun = TRUE)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Name of project

x

Optional. Variable in dat to view summary statistics for. If not defined, summary stats are displayed for all columns in the dataset.

log_fun

Logical, whether to log function call (for internal use).

Details

Prints summary statistics for each variable in the data set. If x is specified, summary stats will be returned only for that variable. Numeric variables are summarized by minimum, median, mean, maximum, and the number of NA's, unique values, and zeros. Non-numeric variables are summarized by first value and the number of NA's, unique values, and empty values. Function is called in the data_check function.
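The statistics listed above can be computed by hand for a single column. The sketch below uses toy vectors and makes two assumptions that the documentation does not confirm: NA values are excluded before computing the numeric statistics, and an empty string counts as a distinct value.

```r
# Numeric variable (hypothetical values)
x <- c(0, 2, 2, NA, 10)
num_summary <- c(
  min      = min(x, na.rm = TRUE),
  median   = median(x, na.rm = TRUE),
  mean     = mean(x, na.rm = TRUE),
  max      = max(x, na.rm = TRUE),
  n_na     = sum(is.na(x)),
  n_unique = length(unique(x[!is.na(x)])),
  n_zero   = sum(x == 0, na.rm = TRUE)
)

# Non-numeric variable (hypothetical values)
port <- c("Dutch Harbor", "", NA, "Kodiak", "Kodiak")
chr_summary <- list(
  first    = port[1],
  n_na     = sum(is.na(port)),
  n_unique = length(unique(port[!is.na(port)])),
  n_empty  = sum(port == "", na.rm = TRUE)
)

print(num_summary)
str(chr_summary)
```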

Examples

## Not run: 
summary_stats(pcodMainDataTable, project = "pcod")

summary_stats(pcodMainDataTable, project = "pcod", x = "HAUL")

## End(Not run)

Check if table exists in the FishSET database for the defined project

Description

Wrapper for dbExistsTable. Check if a table exists in the FishSET database.

Usage

table_exists(table, project)

Arguments

table

Name of table in FishSET database. Table name must be in quotes.

project

Name of project

Value

Returns a logical value indicating whether the table exists.

Examples

## Not run: 
table_exists('pollockMainDataTable', 'pollock')

## End(Not run)

Lists fields for FishSET database table

Description

Wrapper for dbListFields. View fields of selected table.

Usage

table_fields(table, project)

Arguments

table

String, name of table in FishSET database. Table name must be in quotes.

project

Project name

Examples

## Not run: 
table_fields('pollockMainDataTable', 'pollock')

## End(Not run)

Remove table from FishSET database

Description

Wrapper for dbRemoveTable. Remove a table from the FishSET database.

Usage

table_remove(table, project)

Arguments

table

String, name of table in FishSET database. Table name must be in quotes.

project

Name of project

Details

Function uses SQL functions to remove tables from the FishSET database.

Examples

## Not run: 
table_remove('pollockMainDataTable', 'pollock')

## End(Not run)

Save an existing FishSET DB table

Description

table_save() updates existing FishSET DB tables. If the table doesn't exist, the user is reminded to use the appropriate load_ function.

Usage

table_save(table, project, type, name = NULL)

Arguments

table

A dataframe to save to the FishSET Database.

project

Name of project.

type

The table type. Options include, "main" for main data tables, "port" for port tables, "grid" for gridded tables, "aux" for auxiliary tables.

name

String, table name. Applicable only for gridded, auxiliary, and spatial tables.


View FishSET Database table

Description

Wrapper for dbGetQuery. View or call the selected table from the FishSET database.

Usage

table_view(table, project)

Arguments

table

String, name of table in FishSET database. Table name must be in quotes.

project

Name of project.

Details

table_view() returns a table from a project's FishSET Database.

See Also

list_tables to show existing tables by project and type. fishset_tables to show all tables in the FishSETFolder.

Examples

## Not run: 
head(table_view('pollockMainDataTable', project = 'pollock'))

## End(Not run)

View names of project tables

Description

Wrapper for dbListTables. View names of tables in a project's FishSET database.

Usage

tables_database(project)

Arguments

project

Project name

Examples

## Not run: 
tables_database('pollock')

## End(Not run)

Number of observations by temporal unit

Description

View the number of observations by year, month, and zone in table format

Usage

temp_obs_table(
  dat,
  project,
  x,
  zoneid = NULL,
  spat = NULL,
  lon.dat = NULL,
  lat.dat = NULL,
  cat = NULL,
  lon.spat = NULL,
  lat.spat = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

x

Variable in dat containing date variable.

zoneid

Variable in dat that identifies the individual zones or areas. Defaults to NULL. Define if the name of the zone identifier variable is not 'ZoneID'.

spat

Spatial data containing information on fishery management or regulatory zones. Shape, json, geojson, and csv formats are supported. Required if zoneid does not exist in dat.

lon.dat

Longitude variable in dat. Required if zoneid does not exist in dat.

lat.dat

Latitude variable in dat. Required if zoneid does not exist in dat.

cat

Variable or list in spat that identifies the individual areas or zones. If spat is class sf, cat should be name of list containing information on zones. Required if zoneid does not exist in dat.

lon.spat

Variable or list from spat containing longitude data. Required if zoneid does not exist in dat and spat is a csv file. Leave as NULL if spat is a shape or json file.

lat.spat

Variable or list from spat containing latitude data. Required if zoneid does not exist in dat and spat is a csv file. Leave as NULL if spat is a shape or json file.

Details

Prints tables displaying the number of observations by year, month, and zone. assignment_column is called to assign observations to zones if zoneid does not exist in dat. Output is not saved.
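When a zone identifier already exists in dat, the counts described above amount to cross-tabulating year, month, and zone. The following base R sketch uses a toy data frame (hypothetical values) and is not the function's own code.

```r
# Toy data (hypothetical): dates and zone assignments
dat <- data.frame(
  DATE_FISHING_BEGAN = as.Date(c("2011-01-15", "2011-01-20",
                                 "2011-02-03", "2012-01-09")),
  ZoneID = c("517", "517", "519", "517")
)
dat$year  <- format(dat$DATE_FISHING_BEGAN, "%Y")
dat$month <- format(dat$DATE_FISHING_BEGAN, "%m")

# Number of observations by year
by_year <- table(dat$year)

# Number of observations by year, month, and zone, as a data frame
by_ymz <- as.data.frame(
  table(year = dat$year, month = dat$month, zone = dat$ZoneID),
  responseName = "n_obs"
)
# Drop year/month/zone combinations that never occur
by_ymz <- by_ymz[by_ymz$n_obs > 0, ]

print(by_year)
print(by_ymz)
```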

Examples

## Not run: 
temp_obs_table(pollockMainDataTable, project = "pollock", spat = map2,
  x = "DATE_FISHING_BEGAN",
  lon.dat = "LonLat_START_LON", lat.dat = "LonLat_START_LAT", cat = "NMFS_AREA",
  lon.spat = "", lat.spat = ""
  )

## End(Not run)

Plot variable by month/year

Description

Returns three plots showing the variable of interest against time (as month or month/year). Plots are raw points by date, number of observations by date, and measures of a representative observation by date.

Usage

temp_plot(
  dat,
  project,
  var.select,
  len.fun = "length",
  agg.fun = "mean",
  date.var = NULL,
  alpha = 0.5,
  pages = "single",
  text.size = 8
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

var.select

Variable in dat to plot against a date variable.

len.fun

Method, "length" returns the number of observations, "unique" returns the number of unique observations, "percent" returns the percentage of total observations.

agg.fun

Method to aggregate var.select by date. Choices are "mean", "median", "min", "max", or "sum".

date.var

Date variable in dat. Defaults to the first date variable in dat if not defined.

alpha

The opacity of each data point in the scatterplot. 0 is fully transparent and 1 is fully opaque. Defaults to 0.5.

pages

Whether to output plots on a single page ("single", the default) or multiple pages ("multi").

text.size

Text size of x-axes.

Details

Returns three plots showing the variable of interest against time (as month or month/year). Plots are raw points by time, number of observations by time, and aggregated variable of interest by time.

Value

Returns plot to R console and saves output to the Output folder.

Examples

## Not run: 
temp_plot(pollockMainDataTable, project='pollock', 
          var.select = 'OFFICIAL_TOTAL_CATCH_MT', len.fun = 'percent', 
          agg.fun = 'mean', date.var = 'HAUL_DATE')
          
temp_plot(pollockMainDataTable, project='pollock', 
          var.select = 'OFFICIAL_TOTAL_CATCH_MT', len.fun = 'length',
          agg.fun = 'max')

## End(Not run)

Transform units of date variables

Description

Creates a new temporal variable by extracting temporal unit, such as year, month, or day from a date variable.

Usage

temporal_mod(
  dat,
  project,
  x,
  define.format = NULL,
  timezone = NULL,
  name = NULL,
  log_fun = TRUE,
  ...
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

Project name.

x

Time variable to modify from dat.

define.format

Format of temporal data. define.format should be NULL if converting timezone for x but not changing format. Format can be user-defined or from pre-defined choices. Format follows as.Date format. See Details for more information.

timezone

String, defaults to NULL. Returns the date-time in the specified time zone. Must be a recognizable timezone, such as "UTC", "America/New_York", "Europe/Amsterdam".

name

String, name of created variables. Defaults to 'TempMod'.

log_fun

Logical, whether to log function call (for internal use).

...

Additional arguments. Use tz='' to specify time zone.

Details

Converts a date variable to the desired timezone or units using as.Date. date_parser is also called to ensure the date variable is in an acceptable format for as.Date. define.format defines the format that the variable should take on. Examples include "%Y%m%d" and "%Y-%m-%d %H:%M:%S". Users can define their own format or use one of the predefined ones. Hours are 0-23. To list the time zone names in the Olson/IANA database, run OlsonNames() in the console.

Predefined formats:

  • year: Takes on the format "%Y" and returns the year.

  • month: Takes on the format "%Y/%m" and returns the year and month.

  • day: Takes on the format "%Y/%m/%d" and returns the year, month, and day.

  • hour: Takes on the format "%Y/%m/%d %H" and returns the year, month, day and hour.

  • minute: Takes on the format "%Y/%m/%d %H:%M" and returns the year, month, day, hour, and minute.

For more information on formats, see https://www.stat.berkeley.edu/~s133/dates.html.
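The predefined formats listed above correspond to standard strptime-style format strings, which can be tried directly with base R's format(). This sketch uses a single hypothetical timestamp and does not call temporal_mod() itself.

```r
# A hypothetical timestamp in UTC
x <- as.POSIXct("2011-03-15 14:45:30", tz = "UTC")

# The predefined choices and the format string each one is documented to use
predefined <- c(
  year   = "%Y",
  month  = "%Y/%m",
  day    = "%Y/%m/%d",
  hour   = "%Y/%m/%d %H",
  minute = "%Y/%m/%d %H:%M"
)

# Apply each format to the timestamp
out <- vapply(predefined, function(f) format(x, f, tz = "UTC"), character(1))
print(out)

# The same instant displayed in another time zone (zone name from OlsonNames())
print(format(x, "%Y-%m-%d %H:%M", tz = "America/New_York"))
```

A user-defined format works the same way: any string accepted by format() for date-times, such as "%Y%m%d", can be passed in place of the predefined names.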

Value

Primary data set with new variable added.

Examples

## Not run: 
pcodMainDataTable <- temporal_mod(pcodMainDataTable, "pcod", 
   "DATE_LANDED", define.format = "%Y%m%d")
pcodMainDataTable <- temporal_mod(pcodMainDataTable, "pcod", 
   "DATE_LANDED", define.format = "year")

## End(Not run)


Northeast Ten Minute Squares

Description

Northeast Ten Minute Squares

Usage

tenMNSQR

Format

'tenMNSQR' A simple feature COLLECTION with 5267 features and 9 fields:

AREA
PERIMETER
TEN_
TEN_ID
LL
LAT
LON
TEMP
LOC

Trip duration table and plot

Description

Display trip duration and value per unit effort

Usage

trip_dur_out(
  dat,
  project,
  start,
  end,
  units = "days",
  vpue = NULL,
  group = NULL,
  combine = TRUE,
  haul_count = TRUE,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  type = "hist",
  bins = 30,
  density = TRUE,
  scale = "fixed",
  tran = "identity",
  format_lab = "decimal",
  pages = "single",
  remove_neg = FALSE,
  output = "tab_plot",
  tripID = NULL,
  fun.time = NULL,
  fun.numeric = NULL
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database should contain the string 'MainDataTable'.

project

String, name of project.

start

Date variable containing the start of vessel trip.

end

Date variable containing the end of vessel trip.

units

Time unit, defaults to "days". Options include "secs", "mins", "hours", "days", or "weeks".

vpue

Optional, numeric variable in dat for calculating value per unit effort (VPUE).

group

Optional, string names of variables to group by. By default, grouping variables are combined unless combine = FALSE and type = "freq_poly" (frequency polygon). combine = TRUE will not work when type = "hist" (histogram). Frequency polygon plots can use up to two grouping variables if combine = FALSE: the first variable is assigned to the "color" aesthetic and second to the "linetype" aesthetic.

combine

Logical, whether to combine the variables listed in group for plot.

haul_count

Logical, whether to include hauls per trip in table and/or plot (this can only be used if collapsing data to trip level using tripID. If data is already at trip level, add your haul frequency variable to vpue).

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by.

facet_by

Variable name to facet by. Facetting by "year", "month", or "week" is available provided a date variable is added to sub_date.

type

The type of plot. Options include histogram ("hist", the default) and frequency polygon ("freq_poly").

bins

The number of bins used in histogram/frequency polygon.

density

Logical, whether densities or frequencies are used for histogram. Defaults to TRUE.

scale

Scale argument passed to facet_grid. Defaults to "fixed". Other options include "free_y", "free_x", and "free_xy".

tran

Transformation to be applied to the x-axis. A few options include "log", "log10", and "sqrt". See scale_continuous for a complete list.

format_lab

Formatting option for x-axis labels. Options include "decimal" or "scientific".

pages

Whether to output plots on a single page ("single", the default) or multiple pages ("multi").

remove_neg

Logical, whether to remove negative trip durations from the plot and table.

output

Options include 'table', 'plot', or 'tab_plot' (both table and plot, the default).

tripID

Column(s) that identify the individual trip.

fun.time

How to collapse temporal data. For example, min, mean, max. Cannot be sum for temporal variables.

fun.numeric

How to collapse numeric or temporal data. For example, min, mean, max, sum. Defaults to mean.

Value

trip_dur_out() calculates vessel trip duration given a start and end date, converts trip duration to the desired unit of time (e.g. weeks, days, or hours), and returns a table and/or plot. There is an option for calculating vpue (value per unit of effort) as well.

The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply–by date-range or by period. date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value.

If multiple grouping variables are given then they are combined into one variable unless combine = FALSE and type = "freq_poly". No more than three grouping variables is recommended if pages = "single".

Any variable in the dataset can be used for faceting, but "year", "month", and "week" are also available. Distribution plots can be combined on a single page or printed individually with pages.
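The core calculation can be sketched with base R's difftime(). The toy data below are hypothetical, and the sketch assumes that VPUE is the value variable divided by trip duration; the documentation does not spell out the formula, so treat that as an assumption.

```r
# Toy trip-level data (hypothetical); the second trip ends before it starts
trips <- data.frame(
  start = as.POSIXct(c("2011-01-01 00:00", "2011-01-05 12:00"), tz = "UTC"),
  end   = as.POSIXct(c("2011-01-03 00:00", "2011-01-05 00:00"), tz = "UTC"),
  catch_value = c(1000, 300)
)

# Trip duration in the requested unit ("secs", "mins", "hours", "days", "weeks")
trips$dur_days <- as.numeric(difftime(trips$end, trips$start, units = "days"))

# Assumed definition: value per unit effort = value / duration
trips$vpue <- trips$catch_value / trips$dur_days

# Dropping negative durations, similar in spirit to remove_neg = TRUE
kept <- trips[trips$dur_days >= 0, ]

print(trips)
print(kept)
```

Negative durations usually point to swapped or mis-entered dates, so it can be worth inspecting them before removing them.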

See Also

haul_to_trip

Examples

## Not run: 
trip_dur_out(pollockMainDataTable, project = "pollock",
  start = "FISHING_START_DATE", end = "HAUL_DATE",
  units = "days", vpue = "OFFICIAL_TOTAL_CATCH", output = "plot",
  tripID = c("PERMIT", "TRIP_SEQ"), fun.numeric = sum, fun.time = min
)


## End(Not run)

Check rows are unique

Description

Check for and remove non-unique rows from primary dataset.

Usage

unique_filter(dat, project, remove = FALSE)

Arguments

dat

Primary data containing information on hauls or trips. Table in the FishSET database contains the string 'MainDataTable'.

project

String, name of project.

remove

Logical, if TRUE removes non-unique rows. Defaults to FALSE.

Details

Output is determined by remove. If remove = TRUE then non-unique rows are removed. If remove = FALSE then only a statement is returned regarding the number of rows that are not unique.
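The two behaviors can be mimicked with base R's duplicated(). This is a sketch on toy data; how unique_filter() itself counts or reports non-unique rows may differ.

```r
# Toy data (hypothetical): the third row repeats the second
dat <- data.frame(HAUL = c(1, 2, 2, 3), CATCH = c(10, 20, 20, 30))

# Flag rows that repeat an earlier row across all columns
dup <- duplicated(dat)

# Like remove = FALSE: only report how many rows are not unique
message(sum(dup), " row(s) are not unique")

# Like remove = TRUE: keep the first occurrence of each row
dedup <- dat[!dup, ]
print(dedup)
```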

Value

Returns the modified primary dataset with non-unique rows removed if remove = TRUE.

Examples

## Not run: 
# check for unique rows
unique_filter(pollockMainDataTable, project = "pollock")

# remove non-unique rows from dataset
mod.dat <- unique_filter(pollockMainDataTable, project = "pollock", remove = TRUE)

## End(Not run)

Summarize active vessels

Description

vessel_count counts the number of active vessels in the main table. It can summarize by period if date is provided, group by any number of grouping variables, and filter by period or value. There are several options for customizing table/plot output.

Usage

vessel_count(
  dat,
  project,
  v_id,
  date = NULL,
  period = NULL,
  group = NULL,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  combine = FALSE,
  position = "stack",
  tran = "identity",
  format_lab = "decimal",
  value = "count",
  type = "bar",
  scale = "fixed",
  output = "tab_plot"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

v_id

Variable in dat containing vessel identifier to count.

date

Date variable to aggregate by.

period

Time period to aggregate by. Options include "year", "month", "week" (weeks in the year), "weekday", "weekday_abv", "day_of_month", "day_of_year", and "cal_date" (calendar date).

group

Names of grouping variables. For line plots (type = "line") two grouping variables can be entered, the first is passed to "color" and second to "linetype". Only one grouping variable can be used for barplots (type = "bar"), which is passed to "fill". When combine = TRUE all variables in group will be joined. Grouping by "year", "month", and "week" are available if a date variable is added to sub_date.

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. Accepts up to two variables. These can be variables that exist in dat, or a variable created by vessel_count() such as "year", "month", or "week" if a date variable is added to sub_date. The first variable is facetted by row and the second by column.

combine

Whether to combine variables listed in group. This is passed to the "fill" or "color" aesthetic for plots.

position

Positioning of bar plot. Options include 'stack', 'dodge', and 'fill'.

tran

A function to transform the y-axis. Options include log, log2, log10, and sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

value

Whether to return "count" or "percent" of active vessels. Defaults to "count".

type

Plot type, options include "bar" (the default) and "line".

scale

Scale argument passed to facet_grid. Options include "free", "free_x", "free_y". Defaults to "fixed".

output

Whether to display "plot", "table", or both ("tab_plot"). Defaults to "tab_plot".

Details

vessel_count gives the number (or percent) of active vessels using a column of unique vessel IDs. The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply, either by date range or by period, and date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value. Console users may prefer to filter with a separate function, such as dplyr::filter, before running vessel_count. This is fine, but note that it will lead to different output if the call is re-run with log_rerun.
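As a rough illustration of what is being counted, the sketch below uses base R on a small made-up table. The column names and values are hypothetical, and this is not FishSET's implementation. It keeps hauls from May through July of 2011-2013, mirroring filter_date = "year-month" with date_value = list(2011:2013, 5:7), and then counts distinct vessel IDs per gear type.

```r
# Hypothetical haul-level data; column names are illustrative only.
hauls <- data.frame(
  VESSEL_ID = c("A", "A", "B", "C", "C", "D"),
  GEAR_TYPE = c(1, 1, 1, 2, 2, 2),
  HAUL_DATE = as.Date(c("2011-05-03", "2011-06-10", "2012-07-21",
                        "2011-02-14", "2013-05-30", "2014-06-01"))
)

# Mimic a "year-month" filter: years 2011-2013, months May-July.
yr <- as.integer(format(hauls$HAUL_DATE, "%Y"))
mo <- as.integer(format(hauls$HAUL_DATE, "%m"))
kept <- hauls[yr %in% 2011:2013 & mo %in% 5:7, ]

# Active vessels = number of distinct vessel IDs within each group.
active <- tapply(kept$VESSEL_ID, kept$GEAR_TYPE,
                 function(id) length(unique(id)))
print(active)
```

Vessel "A" appears in two retained hauls but is counted once, which is the point of counting unique IDs rather than rows.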

Up to two grouping variables can be entered. Grouping variables can be merged into one variable using combine = TRUE; in this case any number of variables can be joined, but no more than three is recommended.
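The idea of merging grouping variables can be pictured with a short base R sketch. The data, the column names, and the "_" separator are assumptions made for illustration; how FishSET labels the merged variable internally is not shown here.

```r
# Hypothetical grouping columns; names are illustrative only.
dat <- data.frame(
  GEAR_TYPE = c("trawl", "trawl", "pot", "pot"),
  IFQ       = c("Y", "N", "Y", "Y")
)

# One way to merge two grouping variables into a single variable,
# similar in spirit to combine = TRUE (the separator is an assumption).
dat$combined <- paste(dat$GEAR_TYPE, dat$IFQ, sep = "_")

# Each distinct combination becomes one level of the merged variable.
combo_counts <- table(dat$combined)
print(combo_counts)
```

The number of levels grows with every variable that is joined, which is one reason to keep the number of combined variables small.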

For faceting, any variable (including ones listed in group) can be used, but "year", "month", "week" are also available provided a date variable is added to sub_date. Currently, combined variables cannot be faceted.

Value

When output = "tab_plot", a list containing a table and a plot is returned. If output = "table", only the summary table is returned; if output = "plot", only the plot is returned.

Examples

## Not run: 
# grouping by two variables
vessel_count(pollockMainDataTable, v_id = "VESSEL_ID", 
             group = c("GEAR_TYPE", "IFQ"))
             
# filter by variable
vessel_count(pollockMainDataTable, v_id = "VESSEL_ID", group = "GEAR_TYPE",
             filter_by = "IFQ", filter_value = "Y")
             
# filter by month
vessel_count(pollockMainDataTable, v_id = "VESSEL_ID", group = "GEAR_TYPE",
             sub_date = "HAUL_DATE", filter_date = "month", date_value = 1:5)
             
# filter by date
vessel_count(pollockMainDataTable, v_id = "VESSEL_ID", group = "GEAR_TYPE",
             sub_date = "HAUL_DATE", filter_date = "date_range", 
             date_value = c("2011-01-01", "2011-02-05"))

# summarize by month
vessel_count(pollockMainDataTable, v_id = 'VESSEL_ID', date = 'DATE_FISHING_BEGAN', 
             period = 'month', group = 'DISEMBARKED_PORT', position = 'dodge', 
             output = 'plot')

## End(Not run)

View the most recent fleet table by project

Description

View the most recent fleet table by project

Usage

view_fleet_table(project)

Arguments

project

The name of project.

Examples

## Not run: 
view_fleet_table("pollock")

## End(Not run)

Visualize gridded data on a map

Description

Visualize gridded data on a map

Usage

view_grid_dat(
  grid,
  project,
  lon,
  lat,
  value,
  split_by = NULL,
  group = NULL,
  agg_fun = "mean"
)

Arguments

grid

Gridded data table to visualize. Use string if visualizing a gridded data table in the FishSET Database.

project

String, project name.

lon

String, variable name containing longitude.

lat

String, variable name containing latitude.

value

String, variable name containing gridded values, e.g. sea surface temperature, wind speed, etc.

split_by

String, variable in gridded data table to split by.

group

String, variable in gridded data table to group value by. In addition to the variable(s) in group, value is also aggregated by each longitude-latitude pair. The string "lonlat" is a shortcut for group = c("lon", "lat") which aggregates the value for each longitude-latitude pair across the entire dataset.

agg_fun

Aggregating function applied to group. Defaults to mean.
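To make the aggregation concrete, the following base R sketch averages a value for each longitude-latitude pair, which is roughly what group = "lonlat" with agg_fun = "mean" describes. The table and its column names are made up, and the sketch does not reproduce the function's internals.

```r
# Hypothetical gridded table; column names and values are illustrative only.
grid <- data.frame(
  lon = c(-170, -170, -169, -169),
  lat = c(55, 55, 55, 55),
  sst = c(4.0, 6.0, 5.0, 7.0)
)

# Aggregate the value column for each longitude-latitude pair,
# analogous to group = "lonlat" with agg_fun = "mean".
cell_means <- aggregate(sst ~ lon + lat, data = grid, FUN = mean)
print(cell_means)
```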


View

Description

View

Usage

view_lon_lat(dat, lon, lat, id = NULL, crs = 4326)

Arguments

dat

Data containing lon and lat columns.

lon

Name of longitude column.

lat

Name of latitude column.

id

Optional, name of an ID variable that is paired with the lon and lat columns.

crs

Optional, coordinate reference system to use. Defaults to EPSG code 4326 (WGS 84).


View model design file in database

Description

View model design file in database

Usage

view_model_design(project, date = NULL)

Arguments

project

Project name.

date

String, date model design file was created.


View interactive map of spatial data

Description

View interactive map of spatial data

Usage

view_spat(spat, id = NULL, type = "polygon")

Arguments

spat

Spatial dataset to view. Must be an object of class sf or sfc.

id

Optional, name of spatial ID column to view with spatial data.

type

Can be "polygon", "line", or "point".

Examples

## Not run: 
view_spat(pollockNMFSSpatTable, id = "NMFS_AREA")

## End(Not run)

Summarize weekly catch

Description

weekly_catch summarizes catch (or other numeric variables) in the main table by week. It can summarize by grouping variables and filter by period or value. There are several options for customizing the table and plot output.

Usage

weekly_catch(
  dat,
  project,
  species,
  date,
  fun = "sum",
  group = NULL,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  type = "bar",
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  value = "count",
  position = "stack",
  combine = FALSE,
  scale = "fixed",
  output = "tab_plot",
  format_tab = "wide"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

species

A variable in dat containing the species catch or a vector of species variables.

date

Variable in dat containing dates to aggregate by.

fun

Name of function to aggregate by. Defaults to sum.

group

Grouping variable name(s). Up to two grouping variables are available for line plots and one for bar plots. For bar plots, if only one species is entered the first group variable is passed to "fill". If multiple species are entered, species is passed to "fill" and the grouping variable is dropped. An exception occurs when faceting by species, then the grouping variable is passed to "fill". For line plots, the first grouping variable is passed to "fill" and the second to "linetype" if a single species column is entered or if faceting by species. Otherwise, species is passed to "fill", the first group variable to "linetype", and second is dropped.

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. Accepts up to two variables. These can be variables that exist in the dataset, or a variable created by weekly_catch() such as "year", "month", or "week" if a date variable is added to sub_date. Faceting by "species" is available if multiple catch columns are included in "species". The first variable is faceted by row and the second by column.

type

Plot type, options include "bar" (the default) and "line".

conv

Convert catch variable to "tons", "metric_tons", or by using a function entered as a string. Defaults to "none" for no conversion.

tran

A function to transform the y-axis. Options include log, log2, log10, sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

value

Whether to calculate raw "count" or "percent" of total catch.

position

Positioning of bar plot. Options include 'stack', 'dodge', and 'fill'.

combine

Whether to combine variables listed in group. This is passed to the "fill" or "color" aesthetic for plots.

scale

Scale argument passed to facet_grid. Defaults to "fixed".

output

Return output as "plot", "table", or both "tab_plot". Defaults to both ("tab_plot").

format_tab

How table output should be formatted. Options include 'wide' (the default) and 'long'.

Value

weekly_catch() aggregates catch by week using one or more columns of catch data. When multiple catch variables are entered, a new column "species" is created and used to group values in plots. The "species" column can also be used to split (or facet) the plot. For table output, the "species" column is kept if format_tab = "long", giving a column of species names ("species") and a column containing catch ("catch"). When format_tab = "wide", each species is given its own column of catch.

The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply, either by date range or by period, and date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value.

Up to two grouping variables can be entered. Grouping variables can be merged into one variable using combine; in this case any number of variables can be joined, but no more than three is recommended.

For faceting, any variable (including ones listed in group) can be used, but "year", "month", and "week" are also available provided a date variable is added to sub_date. Currently, combined variables cannot be faceted.

By default, a list containing a table and a plot is printed to the console and viewer.
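The difference between the wide and long table layouts can be sketched in base R as follows. The data and column names are hypothetical, and the week definition used here (format code "%U") is an assumption for illustration rather than the definition FishSET necessarily uses.

```r
# Hypothetical haul data with two catch columns; names are illustrative only.
hauls <- data.frame(
  date    = as.Date(c("2011-01-03", "2011-01-04", "2011-01-11")),
  pollock = c(10, 20, 5),
  cod     = c(1, 2, 3)
)

# Week of year; the week definition used here ("%U") is an assumption.
hauls$week <- as.integer(format(hauls$date, "%U"))

# Wide layout: one catch column per species, summed by week.
wide <- aggregate(cbind(pollock, cod) ~ week, data = hauls, FUN = sum)

# Long layout: a "species" column and a single "catch" column.
long <- data.frame(
  week    = rep(wide$week, times = 2),
  species = rep(c("pollock", "cod"), each = nrow(wide)),
  catch   = c(wide$pollock, wide$cod)
)
print(wide)
print(long)
```

Both layouts hold the same totals; the long layout simply stacks the species columns, which tends to be more convenient for grouped plotting.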

Examples

## Not run: 
weekly_catch(pollockMainDataTable, project = "pollock",
  species = c(
    "HAUL_LBS_270_POLLOCK_LBS",
    "HAUL_LBS_110_PACIFIC_COD_LBS",  "HAUL_LBS_OTHER_LBS"
  ), date = "DATE_FISHING_BEGAN",
  conv = "tons", filter_date = "year", date_value = 2011, output = "plot"
)

## End(Not run)

Summarize average CPUE by week

Description

weekly_effort summarizes CPUE (or other numeric variables) in the main table by week. It can summarize by grouping variables and filter by period or value. There are several options for customizing the table and plot output.

Usage

weekly_effort(
  dat,
  project,
  cpue,
  date,
  group = NULL,
  sub_date = NULL,
  filter_date = NULL,
  date_value = NULL,
  filter_by = NULL,
  filter_value = NULL,
  filter_expr = NULL,
  facet_by = NULL,
  conv = "none",
  tran = "identity",
  format_lab = "decimal",
  combine = FALSE,
  scale = "fixed",
  output = "tab_plot",
  format_tab = "wide"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

cpue

Variable(s) in dat containing catch per unit effort.

date

A variable in dat containing dates to aggregate by.

group

Grouping variable name(s). Up to two grouping variables are available. For plotting, if a single CPUE column is entered the first grouping variable is passed to the "color" aesthetic and the second to "linetype". If multiple CPUE columns are entered, a new variable named "species" is created and passed to "fill", the first group variable to "linetype", and second is dropped.

sub_date

Date variable used for subsetting, grouping, or splitting by date.

filter_date

The type of filter to apply to 'MainDataTable'. To filter by a range of dates, use filter_date = "date_range". To filter by a given period, use "year-day", "year-week", "year-month", "year", "month", "week", or "day". The argument date_value must be provided.

date_value

This argument is paired with filter_date. To filter by date range, set filter_date = "date_range" and enter a start- and end-date into date_value as a string: date_value = c("2011-01-01", "2011-03-15").

To filter by period (e.g. "year", "year-month"), use integers (4 digits if year, 1-2 digits if referencing a day, month, or week). Use a vector if filtering by a single period: filter_date = "month" and date_value = c(1, 3, 5). This would filter the data to January, March, and May.

Use a list if using a year-period type filter, e.g. "year-week", with the format: list(year, period). For example, filter_date = "year-month" and date_value = list(2011:2013, 5:7) will filter the data table from May through July for years 2011-2013.

filter_by

String, variable name to filter 'MainDataTable' by. The argument filter_value must be provided.

filter_value

A vector of values to filter 'MainDataTable' by using the variable in filter_by. For example, if filter_by = "GEAR_TYPE", filter_value = 1 will include only observations with a gear type of 1.

filter_expr

String, a valid R expression to filter 'MainDataTable' by using the variable in filter_by.

facet_by

Variable name to facet by. Accepts up to two variables. Faceting by "year" is available if a date variable is added to sub_date. Faceting by "species" is available if multiple cpue columns are included in "cpue". The first variable is faceted by row and the second by column.

conv

Convert catch variable to "tons", "metric_tons", or by using a function entered as a string. Defaults to "none" for no conversion.

tran

A function to transform the y-axis. Options include log, log2, log10, sqrt.

format_lab

Formatting option for y-axis labels. Options include "decimal" or "scientific".

combine

Whether to combine variables listed in group. This is passed to the "color" aesthetic for plots.

scale

Scale argument passed to facet_grid. Defaults to "fixed".

output

Whether to display "plot", "table", or both ("tab_plot"). Defaults to "tab_plot".

format_tab

How table output should be formatted. Options include 'wide' (the default) and 'long'.

Value

weekly_effort() calculates mean CPUE by week. This function doesn't calculate CPUE; the CPUE variable must be created in advance (see cpue). When multiple CPUE variables are entered, a new column named "species" is created and used to group values in plots. The "species" column can also be used to split (or facet) the plot. For table output, the "species" column is kept if format_tab = "long", giving a column of species names ("species") and a column containing the mean CPUE ("mean_cpue"). When format_tab = "wide", each CPUE variable is given its own value column.

The data can be filtered by date and/or by a variable. filter_date specifies the type of date filter to apply, either by date range or by period, and date_value should contain the values to filter the data by. To filter by a variable, enter its name as a string in filter_by and include the values to filter by in filter_value.

Up to two grouping variables can be entered. Grouping variables can be merged into one variable using combine; in this case any number of variables can be joined, but no more than three is recommended.

For faceting, any variable (including ones listed in group) can be used, but "year" and "species" are also available. Faceting by "year" requires that a date variable be added to sub_date. Currently, combined variables cannot be faceted.

By default, a list containing a table and a plot is printed to the console and viewer.
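Because the CPUE column has to exist before summarizing, a minimal base R sketch of the two steps may help: derive CPUE, then take its weekly mean. The data, the column names, the effort measure, and the week definition ("%U") are all assumptions for illustration, not FishSET's implementation.

```r
# Hypothetical haul data; column names and values are illustrative only.
hauls <- data.frame(
  date     = as.Date(c("2011-01-03", "2011-01-04", "2011-01-11")),
  catch    = c(100, 300, 50),
  duration = c(2, 3, 1)   # effort, e.g. hours fished
)

# CPUE must exist before summarizing; here it is catch divided by effort.
hauls$CPUE <- hauls$catch / hauls$duration

# Mean CPUE per week; the week definition ("%U") is an assumption.
hauls$week <- as.integer(format(hauls$date, "%U"))
mean_cpue <- aggregate(CPUE ~ week, data = hauls, FUN = mean)
print(mean_cpue)
```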

Examples

## Not run: 
weekly_effort(pollockMainDataTable, project = "pollock", cpue = "CPUE",
              date = "DATE_FISHING_BEGAN", filter_date = "year",
              date_value = 2011, output = "table")

## End(Not run)

Welfare plots and tables

Description

Generate plots and tables for welfare simulations

Usage

welfare_outputs(
  project,
  mod.name,
  closures,
  betadraws = 1000,
  zone.dat = NULL,
  group_var = NULL
)

Arguments

project

Name of project

mod.name

Model name. Argument can be the name of the model or the name can be pulled from the 'modelChosen' table. Leave mod.name empty to use the name of the saved 'best' model. If more than one model is saved, mod.name should be the numeric indicator of which model to use. Use table_view("modelChosen", project) to view a table of saved models.

closures

Closure scenarios

betadraws

Integer indicating the number of times to run the welfare simulation. Default value is betadraws = 1000.

zone.dat

Variable in primary data table that contains unique zone ID.

group_var

Categorical variable from primary data table to group welfare outputs.

Details

Returns a list with (1) plot showing welfare loss/gain for all scenarios in dollars, (2) plot showing welfare loss/gain as percentage, (3) dataframe with welfare summary stats in dollars, (4) dataframe with welfare summary stats as percentages, and (5) dataframe with welfare details such as number of trips, mean loss per trip, and mean of the total welfare loss across all trips.


Welfare analysis

Description

Simulate the welfare loss/gain from changes in policy or changes in other factors that influence fisher location choice.

Usage

welfare_predict(
  project,
  mod.name,
  closures,
  betadraws = 1000,
  marg_util_income = NULL,
  income_cost = NULL,
  expected.catch = NULL,
  enteredPrice = NULL
)

Arguments

project

Name of project

mod.name

Name of selected model (mchoice)

closures

Closure scenarios

betadraws

Integer indicating the number of times to run the welfare simulation. Default value is betadraws = 1000.

marg_util_income

For conditional and zonal logit models. Name of the coefficient to use as marginal utility of income

income_cost

For conditional and zonal logit models. Logical indicating whether the coefficient for the marginal utility of income relates to cost (TRUE) or revenue (FALSE)

expected.catch

Name of expected catch table to use.

enteredPrice

Price for welfare

Details

To simulate welfare loss/gain, the model coefficients are sampled betadraws times (1000 by default) using a multivariate random number generator (mvgrnd). The welfare loss/gain for each observation is then calculated for each set of sampled coefficients (see section 9.3 in the user manual), and all of the estimated welfare values are saved to a file in the project outputs folder.

Note that this function is called by run_policy.
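The coefficient-sampling step can be pictured with the sketch below. It swaps in MASS::mvrnorm (MASS is a recommended package bundled with most R installations) as a stand-in for the package's own sampler, and the coefficient names, estimates, and covariance matrix are entirely hypothetical. The welfare calculation itself is not reproduced.

```r
library(MASS)  # provides mvrnorm(); used here as a stand-in sampler

set.seed(1)

# Hypothetical coefficient estimates and covariance matrix (illustrative only).
beta_hat <- c(travel_cost = -0.5, expected_catch = 1.2)
vcov_hat <- matrix(c(0.010, 0.002,
                     0.002, 0.040), nrow = 2)

# Draw one coefficient vector per simulation run.
betadraws <- 1000
draws <- mvrnorm(n = betadraws, mu = beta_hat, Sigma = vcov_hat)

# Each row is one sampled coefficient vector; column means approximate beta_hat.
print(dim(draws))
print(colMeans(draws))
```

Each row of draws would then feed one evaluation of the welfare measure, so the spread of the resulting values reflects uncertainty in the estimated coefficients.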


Northeast wind closure areas

Description

Northeast wind closure areas

Usage

windLease

Format

'windLease' Simple features collection with 32 features and 1 field:

NAME

Name of wind lease.


Write a data table to local file directory

Description

Write a data table to local file directory

Usage

write_dat(dat, project, path = NULL, file_type = "csv", ...)

Arguments

dat

Name of data frame in working environment to save to file.

project

String, project name.

path

String, path or connection to write to. If left empty, the file will be written to the dat folder in the project directory.

file_type

String, the type of file to write to. Options include "csv", "txt" (tab-separated text file), "xlsx" (excel), "rdata", "json", "stata", "spss", "sas", and "matlab".

...

Additional arguments passed to writing function. See "details" for the list of functions.

Details

Leave path = NULL to save dat to the data folder in the project directory. See write.table for csv and tab-separated files, save for R data files, write.xlsx for Excel files, write_json for json files, st_write for geojson and shape files, write_dta for Stata files, write_sav for SPSS files, write_sas for SAS files, and writeMat for Matlab files.
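For the csv and tab-separated cases, the underlying base R call can be sketched as below. This is only an illustration of write.table with different separators, using a made-up data frame and temporary files. It does not show how write_dat builds its default path or which arguments it forwards.

```r
# Hypothetical data frame standing in for a primary data table.
dat <- data.frame(VESSEL_ID = c("A", "B"), CATCH = c(10.5, 3.2))

# Temporary paths so the sketch does not touch a real project directory.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

# Comma-separated and tab-separated output via write.table().
write.table(dat, file = csv_path, sep = ",", row.names = FALSE)
write.table(dat, file = txt_path, sep = "\t", row.names = FALSE)

# Read the csv back to confirm the round trip.
back <- read.csv(csv_path)
print(back)
```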

Examples

## Not run: 
# Save to the default data folder in project directory
write_dat(pollockMainDataTable, project = "pollock", file_type = "csv")

# Save to defined directory location
write_dat(pollockMainDataTable, project = "pollock",
          path = "C://data/pollock_dataset.csv", file_type = "csv")

# Save shape file
write_dat(ST6, path = "C://data//ST6.shp", file_type = "shp", project = 'Pollock')

## End(Not run)

Plot relationship of two variables

Description

Evaluate the relationship between two variables by plotting the first variable (var1) against the second (var2).

Usage

xy_plot(dat, project, var1, var2, regress = FALSE, alpha = 0.5)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

project

String, name of project.

var1

First variable in dat.

var2

Second variable in dat.

regress

Logical, if TRUE, returns plot with fitted linear regression line. Defaults to FALSE.

alpha

The opacity of each data point in the scatterplot. 0 is fully transparent and 1 is fully opaque. Defaults to 0.5.

Value

Returns plot output to R console and saves plot to Output folder.
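A comparable plot can be sketched in base R: a scatterplot with semi-transparent points and an optional fitted regression line. The simulated data and variable names are hypothetical, and the sketch uses base graphics rather than whatever plotting backend the function relies on.

```r
# Hypothetical data; variable names are illustrative only.
set.seed(42)
haul  <- 1:50
catch <- 2 + 0.5 * haul + rnorm(50)

# Draw to a null device so the sketch runs without opening a window.
pdf(NULL)

# Scatterplot with semi-transparent points (alpha = 0.5).
plot(haul, catch, pch = 16, col = adjustcolor("black", alpha.f = 0.5),
     xlab = "haul", ylab = "catch")

# Overlay a fitted linear regression line, as regress = TRUE would request.
fit <- lm(catch ~ haul)
abline(fit)
dev.off()

print(coef(fit))
```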

Examples

## Not run: 
xy_plot(pollockMainDataTable, project = "pollock",
        var1 = 'OFFICIAL_TOTAL_CATCH_MT', var2 = 'HAUL', regress = TRUE)

## End(Not run)

Define zone closure scenarios

Description

Define zone closure scenarios

Usage

zone_closure(
  project,
  spatdat,
  cat,
  lon.spat = NULL,
  lat.spat = NULL,
  epsg = NULL
)

Arguments

project

Required, name of project.

spatdat

Required, data file or character. spatdat is a spatial data file containing information on fishery management or regulatory zone boundaries. Shape, json, geojson, and csv formats are supported. geojson is the preferred format. json files must be converted into geojson. This is done automatically when the file is loaded with read_dat with is.map set to TRUE. spatdat cannot, at this time, be loaded from the FishSET database.

cat

Variable in spatdat that identifies the individual areas or zones.

lon.spat

Required for csv files. Variable or list from spatdat containing longitude data. Leave as NULL if spatdat is a shape or json file.

lat.spat

Required for csv files. Variable or list from spatdat containing latitude data. Leave as NULL if spatdat is a shape or json file.

epsg

EPSG number. Set the epsg to ensure that spatdat has the correct projection. If epsg is not specified but is defined for spatdat, the epsg defined for spatdat is used. See http://spatialreference.org/ to help identify the optimal epsg number.

Details

Define zone closure scenarios. Function opens an interactive map. Define zone closures by clicking on one or more zones and clicking the 'Close zones' button. To define another closure scenario, unclick zones and then click the desired zones. Press the 'Save closures' button to save choices. The saved choices are called in the policy scenario function.

Value

Returns a yaml file to the project output folder.


Summarize zones, closure areas

Description

'zone_summary' counts observations and aggregates values in 'dat' by regulatory zone or closure area.

Usage

zone_summary(
  dat,
  spat,
  project,
  zone.dat,
  zone.spat,
  count = TRUE,
  var = NULL,
  group = NULL,
  fun = NULL,
  breaks = NULL,
  n.breaks = 10,
  bin_colors = NULL,
  na.rm = TRUE,
  dat.center = TRUE,
  output = "plot"
)

Arguments

dat

Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.

spat

A spatial data file containing information on fishery management or regulatory zone boundaries. 'sf' objects are recommended, but 'sp' objects can be used as well. See dat_to_sf() to convert a spatial table read from a csv file to an 'sf' object. To upload your spatial data to the FishSETFolder see load_spatial().

project

Name of project.

zone.dat

Name of zone ID column in 'dat'.

zone.spat

Name of zone ID column in 'spat'.

count

Logical. If 'TRUE', the number of observations per zone will be returned. Can be paired with 'fun = "percent"' and 'group'. 'zone_summary' will return an error if 'var' is included and 'count = TRUE'.

var

Optional, name of numeric variable to aggregate by zone/closure area.

group

Name of grouping variable to aggregate by zone/closure area. Only one variable is allowed.

fun

Function name (string) to aggregate by. '"percent"' returns the percentage of observations in a given zone. Other options include "sum", "mean", "median", "min", and "max".

breaks

A numeric vector of breaks to bin zone frequencies by. Overrides 'n.breaks' if entered.

n.breaks

The number of break points to create if breaks are not given directly. Defaults to 10.

bin_colors

Optional, a vector of colors to use in plot. Must be same length as breaks. Defaults to 'fishset_viridis(10)'.

na.rm

Logical, whether to remove zones with zero counts.

dat.center

Logical, whether the plot should center on 'dat' ('TRUE') or 'spat' ('FALSE'). Recommend 'dat.center = TRUE' when aggregating by regulatory zone and 'dat.center = FALSE' when aggregating by closure area.

output

Output a '"plot"', '"table"', or both ('"tab_plot"'). Defaults to '"plot"'.

Details

Observations in 'dat' must be assigned to regulatory zones to use this function. See assignment_column() for details. 'zone_summary' can return:

the number of observations per zone ('count = TRUE', 'fun = NULL', 'group = NULL');

the percentage of observations by zone ('count = TRUE', 'fun = "percent"', 'group = NULL');

the percentage of observations by zone and group ('count = TRUE', 'fun = "percent"', 'group = "group"');

a summary of a numeric variable by zone ('count = FALSE', 'var = "var"', 'fun = "sum"', 'group = NULL');

a summary of a numeric variable by zone and group ('count = FALSE', 'var = "var"', 'fun = "sum"', 'group = "group"');

the share (percentage) of a numeric variable by zone ('count = FALSE', 'var = "var"', 'fun = "percent"', 'group = NULL');

the share (percentage) of a numeric variable by zone and group ('count = FALSE', 'var = "var"', 'fun = "percent"', 'group = "group"').
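The ungrouped summaries listed above correspond to familiar base R operations. The sketch below is an illustration on a made-up table with hypothetical zone IDs and catch values. It computes the tabular quantities only and does not reproduce the mapping or binning that 'zone_summary' performs.

```r
# Hypothetical observations already assigned to zones; names are illustrative.
dat <- data.frame(
  ZoneID = c(509, 509, 513, 517, 517, 517),
  catch  = c(10, 20, 5, 15, 25, 25)
)

# Number of observations per zone (count = TRUE, fun = NULL).
n_obs <- table(dat$ZoneID)

# Percentage of observations per zone (count = TRUE, fun = "percent").
pct_obs <- 100 * prop.table(n_obs)

# Sum of a numeric variable per zone (count = FALSE, fun = "sum").
sum_catch <- tapply(dat$catch, dat$ZoneID, sum)

# Share of the variable per zone (count = FALSE, fun = "percent").
pct_catch <- 100 * sum_catch / sum(sum_catch)

print(n_obs)
print(pct_obs)
print(sum_catch)
print(pct_catch)
```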

Examples

## Not run: 

# count # of obs
zone_summary(pollockMainDataTable, spat = nmfs_area, project = "pollock",
            zone.dat = "ZoneID", zone.spat = "NMFS_AREA")

# percent of obs
zone_summary(pollockMainDataTable, spat = nmfs_area, project = "pollock",
            zone.dat = "ZoneID", zone.spat = "NMFS_AREA", count = TRUE,
            fun = "percent")

# count by group
zone_summary(pollockMainDataTable, spat = nmfs_area, project = "pollock",
            zone.dat = "ZoneID", zone.spat = "NMFS_AREA", group = "GEAR_TYPE")

# total catch by zone
zone_summary(pollockMainDataTable, spat = nmfs_area, project = "pollock",
            zone.dat = "ZoneID", zone.spat = "NMFS_AREA",
            var = "OFFICIAL_TOTAL_CATCH_MT", count = FALSE, fun = "sum")

# percent of catch by zone
zone_summary(pollockMainDataTable, spat = nmfs_area, project = "pollock",
            zone.dat = "ZoneID", zone.spat = "NMFS_AREA",
            var = "OFFICIAL_TOTAL_CATCH_MT", count = FALSE, fun = "percent")
            

## End(Not run)