Methods to analyze fish rescue data

Instructions on how to load, analyze, and interpret annual fish rescue data in order to write annual reports submitted to the U. S. Bureau of Reclamation.

Author

Thomas Archdeacon

Published

December 2, 2024


Purpose

The purpose of this document is to assist with writing annual reports submitted to the U. S. Bureau of Reclamation. This document will give instructions on how to locate and load the appropriate data and R packages needed for summarzing the data and producing the tables and figures for the annual report. The code will need to be updated slightly each year to incorporate new data. Code blocks are given for R that will produce the needed summaries.


Packages

First you will need to load a few packages to make sure you have all the tools you need.

options(repos = c(
  CRAN = "https://cloud.r-project.org"
  ))
install.packages(c("dataRetrieval", "tidyverse", "glmmTMB"))
package 'dataRetrieval' successfully unpacked and MD5 sums checked
package 'tidyverse' successfully unpacked and MD5 sums checked
package 'glmmTMB' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\tarchdeacon\AppData\Local\Temp\1\RtmporizQD\downloaded_packages
library(dataRetrieval)
library(tidyverse)
library(glmmTMB)

Data

Fish Rescue Data

Data from fish rescue will need to be entered into excel spreadsheets. Although there are alternative methods, the easiest way to load into R is to save as a CSV file.

There are multiple files you will need. You need the file of all the fish rescue data, with the current years’ data appended. From that, you can make the 3 tables. After you make the tables, you need to update the other datasheets. There is a datasheet that breaks down the number of each life-stage (or hatchery fish) and includes the amount of drying. You need to update the last line for the current year

Data from the current year as well as previous years are needed to make all the figures and tables in the annual report.

Probably the easiest way is to save this qmd file in the directory with the data and open the file from there. Then using the drop-down “Sessions” menu in RStudio, set to “source file location.” If you get all of the data loaded correctly at the front end, the remaining code should only need small changes.

Caution

Make sure to change the working directory to where your data are stored

setwd("C:/Users/tarchdeacon/OneDrive - DOI/Rio Grande/Salvage/Annual Reports & Data")

#Load in the whole data set
Fish <- read.csv("Fish Rescue 2009_2023.csv")

Streamflow Data

Streamflow data is from the USGS website. You will need to download the data from the appropriate gages. The important gauges are Albuquerque (08330000), Bosque Farms (08331160) and San Acacia (08354900). Make sure to update the ending dates as appropriate for the report.

#This retrieves daily data from Albuquerque
discharge.ABQ <- readNWISdv(siteNumbers = "08330000",
                        parameterCd = "00060",
                        startDate = "2009-01-01",
                        endDate = "2024-12-31" ) #Change this to the right date

"q"->names(discharge.ABQ)[4]

You will also need the monthly average for May for all three gages. If you run this too early, the current year May average might not be available.

#Albuquerque
May.ABQ <- readNWISdata(siteNumbers = "08330000",
                        parameterCd = "00060",
                    
                        service = "stat",
                        statReportType = "monthly") %>% filter(month_nu==5)
"q"->names(May.ABQ)[4]

#Bosque Farms in Isleta
May.IS <- readNWISdata(siteNumbers = "08331160",
                        parameterCd = "00060",
                    
                        service = "stat",
                        statReportType = "monthly") %>% filter(month_nu==5)
"q"->names(May.IS)[4]

#San Acacia
May.SA <- readNWISdata(siteNumbers = "08354900",
                        parameterCd = "00060",
                    
                        service = "stat",
                        statReportType = "monthly") %>% filter(month_nu==5)
"q"->names(May.SA)[4]

Tables

There are three tables to make for the annual report. Two are based on the current year’s data, and one requires all the data.

Table 1 sums the adults, YOY, hatchery, and dead fish for each reach from the current year.

#Filter to current year
Fish.current <-subset(Fish, year==2023)

#aggregate by reach
#YOY alive
yoy <- aggregate(yoy.alive ~ reach, data=Fish.current, sum)

#Adult alive
adult <- aggregate(adult.alive ~ reach, data=Fish.current, sum)

#Hatchery alive
hatchery <- aggregate(hatchery.alive ~ reach, data=Fish.current, sum)

#Dead
dead <- aggregate(total.dead ~ reach, data=Fish.current, sum)

#merge all to one frame
table_1 <- merge(yoy,
    merge(adult,
      merge(hatchery,dead)))

#change all to numbers
table_1[,2] <- as.numeric(table_1[,2])
table_1[,3] <- as.numeric(table_1[,3])
table_1[,4] <- as.numeric(table_1[,4])
table_1[,5] <- as.numeric(table_1[,5])
        
#row and column sums
table_1$total <- apply(table_1[,2:4],1,sum)
table_1 <- rbind(table_1, c("Total", colSums(table_1[,2:6])))

print(table_1)
       reach yoy.alive adult.alive hatchery.alive total.dead total
1     Isleta       493           0              0         23   493
2 San Acacia      3037           6             85        260  3128
3      Total      3530           6             85        283  3621

Table 2 is a bit more difficult as you have to count the days and pools.

Caution

You need to calculate the kilometers rescued from the data by hand to account for areas that got skipped over. The annual spreadsheets have this information, you can’t calculate it from here because of the breaks between areas. For the extent of drying, you need to see the River Eyes reports.

#create function to count unique values
len.unique <-function(X){
length(unique(X))
}

#count number of days of rescue by reach
days <- aggregate(date~reach, len.unique, data=Fish.current)
names(days)[2] <-"days"

#count number of pools by reach
pools <- aggregate(date~reach, data=Fish.current, length)
names(pools)[2] <-"pools"

table_2 <- merge(days, pools)

#row and column sums
table_2$total <- apply(table_2[,2:3],1,sum)


print(table_2)
       reach days pools total
1     Isleta    2    61    63
2 San Acacia   11   453   464

Table 3 is just the annual extent of drying and rescued (copy from table 2 for the new year), and sum of pools and minnows.

fish_year <- aggregate(total~year, data=Fish, sum)

names(fish_year)[2] <-"total.minnows"

pools_year <- aggregate(total~year, data=Fish, length)
names(pools_year)[2] <-"total.pools"

merge(fish_year, pools_year)
   year total.minnows total.pools
1  2007         14792        1048
2  2009         24372         522
3  2010         11312        1215
4  2011          8913        1974
5  2012          5014        2864
6  2013          1492        1059
7  2014           629         755
8  2015          1319         396
9  2016         29222         549
10 2017         64948         299
11 2018         93981        1435
12 2019          1127         126
13 2020          4048         772
14 2021           869         935
15 2022          2326         852
16 2023          3904         514
17 2024          2590         444

Figures

There are 3 figures. The first figure is a map and you need to make it in ArcGIS. The remaining figures come from the data you loaded.

The first figure requires loading an annual summary table that must be updated.

You will need to un-comment the jpeg() function and the dev.off() function to actually export the figure.

Figure 2

Caution

You need to update the Adult.marked.yoy.csv file with the current year data. You also need to uncomment (remove the #) the jpeg() and dev.off() to actually export the figure.

# Minnows per km barplot over discharge at ABQ gauge

#read in annual catch data from 2009-present. This table is updated each year using the summary tab on the Fish Rescue database

read.csv("Adult.marked.yoy.csv")-> annual_dat

#standardize each count by number of rkm recued

annual_dat[,2:4] <- annual_dat[,2:4]/annual_dat[,5]

annual_dat <- t(annual_dat)

#colors for the plot
colors=c("white","grey85","grey55")


#~~~create plot~~~#
###*to update annually, add in the new axis references on line 139 (row # from the discharge table for January 1 of that year) and add a new "Jan XXXX" label on line 141*###

#jpeg(filename="yoyadultgrey4.jpg", units="in", quality=100, width=6, height=8, res=600)


{
  par(mar=c(1,5,4,1), mfrow = c(2,1), cex.axis=1.25,cex.lab=1.5,cex.main=1.25)
  
  barplot(as.matrix(annual_dat[2:4,]+1),
          axes=TRUE,
          axisnames=(TRUE),
          col=rev(colors),
          border=T,
          xlab="", 
          ylab="", 
          names.arg=annual_dat[1,],
          cex.names=1, 
          ylim=c(1,5000), 
          log="y",
          las=1
          )
  
  abline(h = mean(annual_dat[2:4,]), col = "red")

  mtext(side=2, line=3, cex=1.5, "RGSM per km", las=0)
  #added average catch line
  
  legend(x=0, y=6500, cex=1.5, 
         fill=(colors), legend=c("YOY", "Hatchery", "Adult"), 
         border="black", ncol=1, bty = "n")
  box(bty="l", lwd=2)
  
  par(mar = c(4,5,1,1), font = 2, cex.axis=1.25,cex.lab=1.5,cex.main=1.25)
  
  
  plot(discharge.ABQ$Date, 
       discharge.ABQ$q/35.315, type = "h",
       xlab = "Year", ylab = "", axes = FALSE)

label_position <- discharge.ABQ$Date[round(seq(1,length(annual_dat[1,]))*365.25)]

label_names <-min(year(discharge.ABQ$Date)):max(year(discharge.ABQ$Date))

axis(1, at=label_position, labels=label_names)
  
  axis(2, las = 1)
  
  mtext(expression(paste("Discharge (", m^3, "/s)")), side = 2, line = 3, 
        cex = 1.5, adj = 0.5, las = 0)
  
  box(bty = "l", lwd = 2)
}  

#dev.off()

Figure 3

#Relation between flow and counts plot

#*calculate May Mean Discharge for YearlyComparison CSV for scatter plot. add in means, which are calculated below, to csv file before importing. Code below calculates it for each reach*###

#subset ABQ discharge table to May 2022 dates and calculate mean
#Represents Angostura Reach
#add month column

#sometimes the monthly statistics don't go far enough

###*change the year to the current year*###
discharge.ABQ <- subset(discharge.ABQ, year(Date) == "2023" & month(Date) == "5" )

mean(discharge.ABQ$q)
[1] 4464.839
#4464.839


#site 08331160
#Rio Grande near Bosque Farms
#represents Isleta Reach
#34.870556,  -106.72
###*update dates to reflect beginning and end of salvage*###
bf.discharge <- readNWISdv(siteNumbers = "08331160",
                           parameterCd = "00060",
                           startDate = "2023-05-01",
                           endDate = "2023-05-31")
"q"->names(bf.discharge)[4]

mean(bf.discharge$q)
[1] 4381.935
#site 08354900
#Rio Grande Floodway at San Acacia
#represents San Acacia Reach
#34.256389, -106.890833
###*update dates to reflect beginning and end of salvage*###
sa.discharge <- readNWISdv(siteNumbers = "08354900",
                           parameterCd = "00060",
                           startDate = "2023-05-01",
                           endDate = "2023-05-31")
"q"->names(sa.discharge)[4]

mean(sa.discharge$q)
[1] 4010.323
#*update YearlyComparison CSV with May Discharge Means, km sampled, yoy rescued, and yoy per km for each reach*
#yearly comparison file is appended annually with new info. it contains info for year, reach sampled, number of kilometers sampled, RGSM YOY found, and RGSM YOY/km. 
#The km reflects the number of total km sampled, so if a km was sampled multiple times, it is counted each time. i.e. if a km was sampled 3 times over the course of a season, that's 3km. 
#this total km sampled per reach was calculated in 2022 by Thomas in the YearlyComparisonKM.csv file in the 2022 report folder. Miles sampled were calculated and then converted to km. This total km for each reach was then copied into the yearly comparison file. 


#read in yearly comparison file. Update with reach information for each year. 
read.csv("YearlyComparison.csv", header = TRUE)->Yearly

#make year and reach a factor 
Yearly$m <- as.factor(Yearly$m)
Yearly$reach <- as.factor(Yearly$reach)


#~~~plot with no reach effect~~~#
#changed Yearly$year to Yearly$m
#uncomment jpeg() and dev.off()

{
  #jpeg("q-rgsm_plot.jpeg", unit="in",width=7, height=7, res=600)
  par(mar=c(6,6,1,1))
  plot((rgsm.km)~(may.mean.q), data=Yearly, pch=pch,  
       axes=F, ylab=NA, xlab=NA, cex=2.5, lwd=2, col=rgb(0,0,0,.25))
  box(bty="l", lwd=4)
  axis(1, cex.axis=1.5)
  mtext(side=1, line=3, "Average May Discharge (cfs)", cex=2)
  axis(2, cex.axis=1.5)
  
  mtext(side=2, line=3, "YOY per kilometer", cex=2)
  #this line adds a subset of that years points in black
  with(subset(Yearly, Yearly$m==2023), points(y=rgsm.km, x=may.mean.q,pch=c(16,17), cex=2.5))
  legend(legend=c("San Acacia", "Isleta", "Angostura"), pch=16:18, "topleft", cex=2, col="gray", bty="n")
  
  #add model line to plot
  glmmTMB(yoy.rgsm~offset(log(km))+(may.mean.q), data=Yearly, family="nbinom2")->mod1
  text(x=4000, y=1000, expression(italic(P)==0.002))
  text(x=4000, y=900, expression(italic(R)^2==0.76))
  #q is equal to the flow and goes from 1-4000
  q=(1:4000)
  #predict the model using the new data of q and where km = 1
  predict(mod1, newdata=list(may.mean.q=q, km=rep(1,4000)), type="response")->pred
  #add line to plot
  lines(x=(q),(pred))
}

#dev.off()