## Model Selection for Selectivity in Fisheries Stock Assessments

**Model Selection for Selectivity in Fisheries Stock Assessments**
André Punt, Felipe Hurtado-Ferro, Athol Whitten
13 March 2013; CAPAM Selectivity workshop

**Overview**
• What is the problem we want to solve?
• Can selectivity be estimated anyway?
• Fleets and how we choose them
• Example assessments
• Alternative methods:
  • fit diagnostics
  • model selection and model weighting
• What do simulation studies tell us?
• Final thoughts

**Definitions of Selectivity**
• Selectivity:
  • Is the relative probability of being captured by a fleet (as a function of age / length)
  • Depends on how “fleet” is defined
• Selectivity is NOT:
  • Gear selectivity
  • Availability

**Some of the key questions-I**
• Should there be multiple fleets and, if so, how do we choose them?
• More fleets (may) make the assumption of time-invariant selectivity more valid.
• More fleets lead to more parameters (and potentially model instability).

**Some of the key questions-II**
• Given a fleet structure:
  • What functional form to assume?
  • Should selectivity change with time?
  • Parametric or non-parametric?

**Some of the key questions-III**
• Given time-varying selectivity:
  • Blocked or unblocked?
  • Which parameters of the selectivity function (or all) should change?
[Figure labels: Annual; Five-year blocks; Age-at-50% selex]

**Caveat – Can selectivity be estimated anyway-I?**
• Selectivity is confounded with:
  • Trends in recruitment (with time)
  • Trends in natural mortality (with age / time)

**Caveat – Can selectivity be estimated anyway-II?**
[Figure labels: Declining recruitment?; Declining selectivity?; High F; Low recruitment?; Low selectivity?; Age]

**Caveat – Can selectivity be estimated anyway-III?**
Fit of various selectivity-related models to a theoretical age composition.

**Caveat – Can selectivity be estimated anyway-III?**
• The Solution: MAKE ASSUMPTIONS:
  • Natural mortality is time- and age-invariant
  • Selectivity follows a functional form.
  • Selectivity is non-parametric, but there are penalties on changes in selectivity with age / length

**Example Stocks**
[Figure labels: Pink ling; Pacific sardine]

**Example Stocks (fleet structure)**
[Figure labels: 2010; 2011]

**Example Stocks (fleet structure)**
• Fleets:
  • Trawl vs non-trawl
  • Zones 10, 20, 30
  • Onboard vs port samples
[Figure labels: Pink Ling; One fleet or many]

**Sensitivity to Assumptions**
• Largest impacts:
  • Is selectivity time-varying or static?
  • Number of fleets / treatment of spatial structure
  • Is selectivity asymptotic or dome-shaped?

**Selection of Fleets**
• Definition:
  • Ideally – group of vessels fishing in the same spatio-temporal stratum using the same gear and with the same targeting practices
  • In practice – depends on data availability, computational resources, model stability, trends in monitored data.

**Fleets as areas-I**
• It is common to represent “space” by “fleets” (e.g. pink ling):
  • what does this assume?
  • does it work?
• Key Assumptions:
  • The population is fully mixed over its range
  • Differences in age / length compositions are due to differences in selectivity.

**Fleets as areas-II (does it work)**
In theory “no” – in practice “perhaps”!
Simulations suggest that treating fleets as areas can reduce bias (Hurtado-Ferro et al.) but that spatial models may perform better (if the data exist – and perhaps not).
Clearly, the differences in length and age structure among regions are due to differences in population structure, not selectivity! Self-evidently then the approach is wrong, but M probably isn’t age- and time-invariant either!
Cope and Punt (2011) Fish Res. 107: 22-38

**The State of the Art (as I see it)**
• Disaggregate data when including them in any assessment (it is easy to aggregate the data when fitting the model).
• Test for fleet structure early in the model development process.
• Apply clustering-type methods to combine areas / gear types (not statistical tests, which will lead to 100s of fleets).

**Residual Analysis**
• In principle this is easy:
  • Plot the data
  • Compute some statistics
  • Compare alternative assumptions…
[Figure label: EBS Tanner crab]

**We know how to do this for index data (well)**
• It gets trickier for compositional data (and hence selecting functional forms for selectivity)
[Figure: Fits to aggregated length data for pink ling when selectivity is assumed to be independent of zone]

**BUT!**
• Evaluating mis-specification for compositional data is usually not this easy:
  • The fit may be correct “on average” but there are clear problems.
  • It may not be clear whether the model is mis-specified

**And this?**
Is this acceptable?

**BUT!**
• Evaluating mis-specification for compositional data is usually not this easy:
  • The fit may be correct “on average” but there are clear problems.
  • It may not be clear whether the model is mis-specified
• Comparing time-varying and static selectivity can be even more challenging because it depends on how much selectivity can vary [Maunder and Harley identify an approach based on cross-validation to help with this]

**Using profiles to identify mis-specification**
[Figure panels: Spatially-disaggregated; Spatially-aggregated]
Plot the negative log-likelihood [compositional data only] for each fleet to identify fleets whose compositional data are “unduly” informative.
Fleets 2 and 13 (left) and 2 and 5 (right): fleet 13 (a) and 5 (b) are the same fleet and have only two length-frequencies… Should we learn this much?

**Automatic Residual Analysis**
Two-sample Kolmogorov-Smirnov test applied to artificial data sets
Punt & Kinzey: NPFMC crab modelling workshop

**The State of the Art (as I see it)-I**
• Always:
  • examine plots of residuals
  • compare expected effective sample sizes with input values
• But:
  • Viewing plots of residuals can be difficult
  • How to define / test for time-varying selectivity is tough
  • Residual patterns in fits to
    compositions need not be due to choices related to selectivity
  • There is no automatic approach for evaluating residual plots for compositional data.
  • No testing of methods based on residual plots has occurred (yet?)

**The State of the Art (as I see it)-II**
[Figure labels: Aggregated compositions; Observed vs expected compositions]

**Model Selection**
No-one would say that model selection (and model averaging) are not part of the tool box of analysts, BUT do we know how well they work for stock assessment models?
• Model selection methods used:
  • Maximum Likelihood
    • F-tests / likelihood ratio tests
    • AIC, BIC, AICc
  • Bayesian
    • DIC

**Examples of Model Selection**
• AIC:
  • Butterworth et al. [2003]: is selectivity for southern bluefin tuna time-varying?
  • Butterworth & Rademeyer [2008]: is selectivity for Gulf of Maine cod dome-shaped or asymptotic?
• DIC:
  • Bogaards et al. [2009]: is selectivity for North Sea spatially-varying or not?

**Examples of Model Selection (Issues)**
• AIC, BIC and DIC are too subtle:
  • Often fits for two models are negligibly different “by eye”, but highly “statistically significant” (AIC difference > 200).
  • All these metrics depend on getting the likelihood “right”, in particular the effective sample sizes for the compositional data.

**Model Selection and weights**
So which model fits the data best? And if we accidentally copied the data file twice?

**Effective Sample Sizes-I**
• Many assessments:
  • Pre-specify EffNs.
  • Use the “McAllister-Ianelli” approach.
• But:
  • Residuals are seldom independent
  • An alternative is Chris Francis’ approach, but that may fail when there is time-varying selectivity.

**Effective Sample Sizes-II**
• Maunder [2011] compared various likelihood formulations including:
  • Multinomial
  • Fournier et al.
    with observed rather than expected proportions
  • Punt-Kennedy (with observed proportions)*
  • Dirichlet
  • Iterative (essentially the “McAllister-Ianelli” method)
  • Multivariate normal
[Figure label: Estimated effective sample size]

**AIC, BIC and Random Effects**
Most (almost all) assessments use an “errors in variables” formulation of the likelihood function [equation not recovered] rather than the correct (marginal) likelihood [equation not recovered]. How this impacts the performance of model selection methods is unknown.

**The State of the Art (as I see it)**
• AIC, BIC, and DIC are commonly used.
• But:
  • Do we need an analogue to the “1% rule” as is the case for CPUE standardization?
  • We need to get the effective sample sizes right! Using a likelihood function for which the effective sample size can be estimated is a good start!
  • Performance also depends on treatment of random effects (recruitment, selectivity)
  • What is the value of looking at retrospective patterns? Can we identify when the cause of a retrospective pattern is definitely selectivity?

**Simulation Testing**
[Diagram labels: Operating Model; Method 1 … Method n; Model Selection; Performance measures]

**Simulation Testing**
• Caveats before we start:
  • Simulations are only as good as the operating model
  • Most simulation studies assume that the likelihood function is known (as is M)
  • Few simulation studies allow for over-dispersion.
  • No simulation studies simulate the “meta” aspects of stock assessments (such as how fleets are selected).
  • Avoid too many generalizations – most properties of estimators will be case-specific

**Overdispersion?**
How often do the data generated in simulation studies look like this? How much does it matter?

**Overview of Broad Results**
• Getting selectivity assumptions wrong matters! HOWEVER, other factors (data quality, contrast, M) may be MORE important.
• Estimating time-varying selectivity when selectivity is static is safer than ignoring it when selectivity is time-varying.
• Model selection methods can discriminate among selectivity functions very well (do I really believe this – why then does it seem so hard in reality?)

**The State of the Art (as I see it)**
• The structure of most (perhaps all) operating models is too simple and leads to simulated data sets looking “too good”
  • André’s suggestion: if you show someone 99 simulated data sets and the real data set, could they pick it out?
• Future simulation studies should:
  • Include model and fleet selection.
  • Focus on length-structured models.
  • Examine whether selectivity is length- or age-based.

**Final Thoughts**
• Methods development
  • Non-additive models?
  • State-space models?
• Residuals and model selection
  • Weighting philosophy
• Simulation studies
  • Standards for what constitutes a “decent” operating model?
  • Compare methods for implementing time-varying selectivity (blocked vs annual)
  • Consider length-structured models

**Final Thoughts**
• Ignore “space” at your peril!
• What about model mis-specification in general?

**Final Points to Ponder!**
• Should guidelines be developed for when to:
  • downweight compositional data rather than modelling time-varying selectivity
  • fix selectivity and not estimate it!
  • use retrospective patterns in model selection / bootstrapping
  • conduct model selection when the selectivity pattern is “non-parametric”
  • apply time-varying selectivity
• Model selection
  • Fixing / estimating sigma
  • trump AIC, BIC and DIC using “by eye” residual patterns.

**Support for this paper was provided by NOAA:**
• The West Coast Groundfish project
• Development of ADMB libraries
• Simulation testing of assessment models

Questions?
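
**Supplementary sketch: AIC and the effective sample size**
The question on the “Model Selection and weights” slide (what happens if the data file is accidentally copied twice?) can be illustrated with a small numerical sketch. This is not taken from the presentation: the observed proportions, the two sets of expected proportions, the parameter counts and the function names are all hypothetical, and a plain multinomial likelihood is assumed. The point it is meant to show is that the likelihood part of an AIC difference scales linearly with the effective sample size while the parameter penalty does not, so doubling the data can change which model AIC prefers.

```python
import math

# Hypothetical observed age composition (proportions sum to 1).
OBS = [0.10, 0.30, 0.35, 0.25]

# Hypothetical expected compositions from two candidate selectivity models.
MODEL_A = [0.15, 0.25, 0.32, 0.28]    # simpler model, poorer fit
MODEL_B = [0.105, 0.295, 0.35, 0.25]  # more flexible model, closer fit
K_A, K_B = 2, 3                       # assumed numbers of estimated parameters


def multinomial_nll(obs, exp, eff_n):
    """Multinomial negative log-likelihood kernel (constant terms dropped)."""
    return -eff_n * sum(o * math.log(e) for o, e in zip(obs, exp) if o > 0)


def aic(nll, n_params):
    """AIC = 2 * negative log-likelihood + 2 * number of parameters."""
    return 2.0 * nll + 2.0 * n_params


def delta_aic(eff_n):
    """AIC(model A) - AIC(model B); positive values favour model B."""
    aic_a = aic(multinomial_nll(OBS, MODEL_A, eff_n), K_A)
    aic_b = aic(multinomial_nll(OBS, MODEL_B, eff_n), K_B)
    return aic_a - aic_b


if __name__ == "__main__":
    for eff_n in (50, 100):  # 100 mimics copying a 50-sample data file twice
        print(f"effective N = {eff_n:3d}  delta AIC (A - B) = {delta_aic(eff_n):+.3f}")
```

With these made-up numbers the simpler model is (narrowly) preferred at an effective sample size of 50 and the more flexible model is preferred at 100, even though the data carry no new information. This is one way to see why the effective sample sizes for compositional data need to be right before AIC-type comparisons are interpreted.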