---
title: "Things that can go wrong"
author: "Carl James Schwarz"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    number_sections: yes
    toc: yes
    md_extensions: [ 
      "-autolink_bare_uris" 
    ]
vignette: >
  %\VignetteIndexEntry{Things that can go wrong} 
  %\VignetteEngine{knitr::rmarkdown} 
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(SPAS)  

```

# Things that can go wrong
Several kinds of problems can make a SPAS fit unreliable.

## Rows are proportional
If the rows in the movement matrix are proportional, then there are an infinite
number of solutions. Note that it is the singularity of the movement matrix 
that is problematic; the numbers of tags released and not recovered (last column of
the input data) can be proportional without affecting the fit.

The model may converge to different solutions depending on your setup.
This situation is not flagged automatically, so you need
to be vigilant. A diagnostic tool is the condition number of
$XX'$ where $X$ is the movement matrix. If the rows are exactly 
proportional (or, more generally, if the rows are collinear), $XX'$ is singular and the
condition number will be $+\infty$ (or, in floating-point arithmetic, extremely large).
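
As a minimal sketch of this diagnostic in base R, the chunk below checks the rank of a small matrix with exactly proportional rows and computes the condition number of $XX'$ as the ratio of its largest to smallest singular value. The helper name `cond.XXt` is hypothetical and is not part of SPAS; the matrix is the tag-recovery block of the example that follows.

```{r }
# Hypothetical helper (not part of SPAS): condition number of XX',
# computed as the ratio of the largest to the smallest singular value.
cond.XXt <- function(X) {
  X <- as.matrix(X)
  d <- svd(X %*% t(X))$d      # singular values, in decreasing order
  d[1] / d[length(d)]
}

# Row 2 is exactly one half of row 1
X.prop <- rbind(c(160, 120, 72, 82),
                c( 80,  60, 36, 41))

qr(X.prop)$rank    # numerical rank; less than the number of rows here
cond.XXt(X.prop)   # Inf, or an extremely large number
```

A rank smaller than the number of rows, or a condition number many orders of magnitude above 1, suggests that some rows should be pooled.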

The solution is to pool the row(s) that are proportional.

Here is an example:

```{r }

test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   3592
  80   ,    60   ,     36   ,     41   ,    532
7960   ,  9720   ,   6264   ,   7934   ,   0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data

mod..1 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:4)

SPAS.print.model(mod..1)
```

In this case, it appears that the model has converged, but this is ONE of many possible 
solutions and cannot be relied upon.

As expected, the condition number is very large (!). It is reported in the output above and can also be computed directly:
```{r }
# Compute the condition number of XX'

XX <- test.data[1:2, 1:4] %*% t(test.data[1:2, 1:4])
XX
cat("\n\nCondition number is\n")
kappa(XX)
```

The remedy is to pool the rows, either physically or logically. Here the two rows are pooled logically (`row.physical.pool=FALSE`):
```{r echo=TRUE}
mod..2 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=c(1,1), col.pool.in=1:4, 
                       row.physical.pool=FALSE)

SPAS.print.model(mod..2)
```

This has solved the collinearity problem.
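
For comparison, physical pooling can be sketched by hand in base R, on the assumption that it amounts to adding together the rows being pooled before the model is fit. The object names `dat` and `dat.pooled` are illustrative only, and the chunk does not call SPAS.

```{r }
# Same data as the example above, rebuilt so the chunk is self-contained
dat <- rbind(c( 160,  120,   72,   82, 3592),
             c(  80,   60,   36,   41,  532),
             c(7960, 9720, 6264, 7934,    0))

# Assumed meaning of physical pooling: sum rows 1 and 2 into a single row
# and leave the final row of the input unchanged.
dat.pooled <- rbind(colSums(dat[1:2, ]), dat[3, ])
dat.pooled
```

With only one release row remaining, there is no longer a second row that could be proportional to it.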

## Rows are approximately proportional
Of course with real data, it is highly unlikely that the rows in the
recovery matrix will be exactly proportional. 

The model may converge to different solutions depending on your set up.
There is no automatic "flagging" of this situation and you need
to be vigilant. The condition number of $XX'$ may be useful.

The solution is to pool the row(s) that are approximately proportional.

Here is an example where the entries in row 2 are modified
slightly to make the rows only approximately proportional:

```{r }

test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   3592
  75   ,    62   ,     38   ,     35   ,    532
7960   ,  9720   ,   6264   ,   7934   ,   0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data

mod..2 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:4)

SPAS.print.model(mod..2)

# Compute the condition number of XX'

XX <- test.data[1:2, 1:4] %*% t(test.data[1:2, 1:4])
XX
cat("\n\nCondition number is\n")
kappa(XX)
```

Now there is only one solution, but the estimate is very sensitive
to small changes in the data. Notice that the condition number
is still very large.
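
To give a sense of scale, the sketch below compares the condition number of $XX'$ for the nearly proportional rows with that for a pair of rows that are clearly not proportional (taken from the all-zero column example, with the zero column dropped). The helper `sv.ratio` is hypothetical and not part of SPAS; it uses the ratio of extreme singular values rather than the estimate returned by `kappa()`.

```{r }
# Hypothetical helper (not part of SPAS): largest / smallest singular value of XX'
sv.ratio <- function(X) {
  d <- svd(X %*% t(X))$d
  d[1] / d[length(d)]
}

# Rows that are only approximately proportional
X.near <- rbind(c(160, 120, 72, 82),
                c( 75,  62, 38, 35))

# Rows that are clearly not proportional
X.far  <- rbind(c(160, 120, 72, 82),
                c(100,  45, 39, 90))

c(near = sv.ratio(X.near), far = sv.ratio(X.far))
```

The nearly proportional rows give a much larger condition number, which is the warning sign to look for before trusting the estimates.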


## Columns that are all zero
In theory this should have no influence on the fit (see the 
section on pooling columns). 

```{r }

test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   0, 3592
 100   ,    45   ,     39   ,     90   ,   0,  532
7960   ,  9720   ,   6264   ,   7934   ,   0,    0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data

mod..3 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:5)

SPAS.print.model(mod..3)

XX <- test.data[1:2, 1:5] %*% t(test.data[1:2, 1:5])
XX
cat("\n\nCondition number is\n")
kappa(XX)
```

Notice that the column of 0's does not affect the fit
and has no impact on the condition number of $XX'$.
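
The reason the condition number is unchanged can be checked directly: a column of zeros adds nothing to any entry of $XX'$. A small base R sketch, with illustrative object names:

```{r }
# Tag-recovery block with and without the all-zero column
X.with    <- rbind(c(160, 120, 72, 82, 0),
                   c(100,  45, 39, 90, 0))
X.without <- X.with[, -5]

# The two cross-product matrices are the same
all.equal(X.with %*% t(X.with), X.without %*% t(X.without))
```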


# References
Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260.
https://www.jstor.org/stable/2332748

Plante, N., Rivest, L.-P., & Tremblay, G. (1998). Stratified capture-recapture estimation of the size of a closed population. Biometrics, 54, 47–60.
https://www.jstor.org/stable/2533994

Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296.
https://doi.org/10.1139/f97-238