--- title: "Examples of using data and functions" author: "Eric J. Ward" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Examples of using data and functions} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE, comment = "#>" ) library(kwdemog) library(dplyr) data(orca) year.end = as.numeric(substr(Sys.Date(),1,4)) refit_models = TRUE report_start = year.end-5 # format whale names just for consistency format_names = function(x) { for(i in 1:2) { indx = which(substr(x,2,2) == "0") x[indx] = paste0(substr(x[indx],1,1), substr(x[indx],3,nchar(x[indx]))) } return(x) } orca$animal = format_names(orca$animal) orca$pod = format_names(orca$pod) orca$matriline = format_names(orca$matriline) orca$mom = format_names(orca$mom) orca$dad = format_names(orca$dad) report_dir = "projections/" ``` The SRKW orca data is included as a dataframe. Each row is an individual, with birth and death information. ```{r} data(orca) head(orca) ``` But for modeling, it's more useful to work with an expanded version, where every animal-year combination gets its own row. The expand() function does this step, ```{r} data(ages2stages) whaleData = kwdemog::expand(orca, ages2stages =ages2stages) ``` Then we can do all kinds of filtering for various analyses -- e.g. ```{r} whaleData = whaleData[whaleData$birth > 1970,] # A handful of unknown sexes need to be randomly filled in whaleData$sexF1M2[which(whaleData$sexF1M2==0)] = sample(c(1,2), size=length(which(whaleData$sexF1M2==0)), replace=T) whales_since76 = as.character(orca$animal[orca$birth > 1970]) sub = whaleData[which(whaleData$animal%in%whales_since76 & whaleData$population=="SRKW" & !is.na(whaleData$gave_birth) & whaleData$sexF1M2=="1" & whaleData$age>=10 & whaleData$age< 43), ] sub$animal = as.factor(sub$animal) # filter out females alive in current year currently_alive = dplyr::filter(sub, year == year.end, is.na(death)) knitr::kable(currently_alive) # find last birth for each of these last_birth = group_by(dplyr::filter(sub, gave_birth==1), animal) %>% dplyr::summarise(last_birth = max(year[which(gave_birth==1)])) last_birth$last_birth[!is.finite(last_birth$last_birth)] = NA currently_alive = dplyr::left_join(currently_alive, last_birth) %>% dplyr::mutate(unlikely_future_mom="") %>% dplyr::select(animal,age,last_birth,unlikely_future_mom) %>% data.frame() ``` ### Notes on demographic modeling As a cautionary note, there may be interest in using these data to test correlations with prey (salmon) indices or other environmental drivers. There are several very nuanced and important things to note about different subsets of these data. First, ages on animals in the beginning of the surveys were often 'guessed', and in the cases of females, assigned an age of 40 based on the last observed birth. This procedure will generate bias (many of the animals being < 40 during their last birth), so these animals have been excluded from fecundity-at-age analyses, or only used in stage-based survival estimates. The `includeSurv` and `includeFec` columns are used to designate most of these cases (1 = include, 0 = exclude). Second, a handful of deaths are either tied to human disturbance (direct or indirectly associated with ship strikes) or are not independent. Calves that died at the same time as their moms are not independent (J24, J54, K39, L97, L104). Other animals are associated with vessel strikes (L98), satellite tag infections (L95) or unknown trauma (L112). For the latter three animals, it is possible to include the years they survive as data, and replace the death data with an NA. For example: ```{r} whaleData[which(whaleData$animal %in% c("L095","L098","L112") & whaleData$alive==0),] = NA ```