Model Selection for Selectivity in Fisheries Stock Assessments André Punt, Felipe Hurtado-Ferro, Athol Whitten 13 March 2013; CAPAM Selectivity workshop
Overview • What is the problem we want to solve? • Can selectivity be estimated anyway? • Fleets and how we choose them • Example assessments • Alternative methods: • fit diagnostics • model selection and model weighting • What do simulation studies tell us? • Final thoughts
Definitions of Selectivity • Selectivity: • Is the relative probability of being captured by a fleet (as a function of age / length) • Depends on how “fleet” is defined • Selectivity is NOT: • Gear selectivity • Availability
Some of the key questions-I • Should there be multiple fleets and, if so, how do we choose them? • More fleets (may) make the assumption of time-invariant selectivity more valid. • More fleets lead to more parameters (and potentially model instability).
Some of the key questions-II • Given a fleet structure: • What functional form to assume? • Should selectivity change with time? • Parametric or non-parametric?
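The choice of functional form can be made concrete with a minimal sketch of one asymptotic and one dome-shaped curve. The parameterisations here (a logistic curve defined by the ages at 50% and 95% selectivity, and a dome built from two half-normal limbs) and all parameter values are illustrative assumptions, not forms taken from the slides.

```python
import math

def logistic_selectivity(age, a50, a95):
    """Asymptotic selectivity: 0.5 at a50 and 0.95 at a95 (assumed parameterisation)."""
    return 1.0 / (1.0 + math.exp(-math.log(19.0) * (age - a50) / (a95 - a50)))

def dome_selectivity(age, peak, sd_left, sd_right):
    """Dome-shaped selectivity: two half-normal limbs joined at the peak age."""
    sd = sd_left if age < peak else sd_right
    return math.exp(-0.5 * ((age - peak) / sd) ** 2)

# Hypothetical parameter values, chosen only to show the two shapes.
ages = list(range(1, 11))
asymptotic = [logistic_selectivity(a, a50=3.0, a95=5.0) for a in ages]
dome = [dome_selectivity(a, peak=4.0, sd_left=1.5, sd_right=3.0) for a in ages]
```

The asymptotic curve keeps rising towards 1 at older ages, while the dome returns towards 0; which of the two is assumed is one of the sensitivities the talk highlights.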
Some of the key questions-III • Given time-varying selectivity: • Blocked or unblocked? • Which parameters of the selectivity function (or all) should change? [Figure: age-at-50% selectivity, annual vs five-year blocks]
Caveat – Can selectivity be estimated anyway-I? • Selectivity is confounded with: • Trends in recruitment (with time) • Trends in natural mortality (with age / time)
Caveat – Can selectivity be estimated anyway-II? [Figure annotations: Declining recruitment? Declining selectivity? High F. Low recruitment? Low selectivity? x-axis: Age]
Caveat – Can selectivity be estimated anyway-III? Fit of various selectivity-related models to a theoretical age-composition.
Caveat – Can selectivity be estimated anyway-IV? • The Solution: MAKE ASSUMPTIONS: • Natural mortality is time- and age-invariant • Selectivity follows a functional form. • Selectivity is non-parametric, but there are penalties on changes in selectivity with age / length
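One way to read "penalties on changes in selectivity with age / length" is as a smoothness penalty added to the objective function. The sketch below assumes a sum of squared differences of log-selectivity; the difference order, the log scale and the `weight` argument are all assumptions, since the slide does not specify the penalty.

```python
def smoothness_penalty(log_sel, weight=1.0, order=2):
    """Sum of squared order-th differences of log-selectivity at age.

    order=1 penalises any change between adjacent ages;
    order=2 penalises curvature, so a straight line costs nothing.
    """
    diffs = list(log_sel)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return weight * sum(d * d for d in diffs)

# Hypothetical log-selectivity vectors for three ages.
straight = smoothness_penalty([0.0, 1.0, 2.0])   # no curvature
kinked = smoothness_penalty([0.0, 1.0, 0.0])     # one sharp turn
```

A larger `weight` pushes the non-parametric curve towards a smooth shape, which is how the penalty stands in for a functional form.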
Example Stocks Pink ling Pacific sardine
Example Stocks (fleet structure) 2010 2011
Example Stocks (fleet structure) • Fleets: • Trawl vs Non-trawl • Zones 10, 20, 30 • Onboard vs port samples. Pink ling: one fleet or many?
Sensitivity to Assumptions • Largest impacts: • Is selectivity time-varying or static? • Number of fleets / treatment of spatial structure • Is selectivity asymptotic or dome-shaped?
Selection of Fleets • Definition: • Ideally – group of vessels fishing in the same spatio-temporal stratum using the same gear and with the same targeting practices • In practice – depends on data availability, computational resources, model stability, trends in monitored data.
Fleets as areas-I • It is common to represent “space” by “fleets” (e.g. pink ling): • what does this assume? • does it work? • Key Assumptions: • The population is fully mixed over its range • Differences in age / length compositions are due to differences in selectivity.
Fleets as areas-II (does it work?) In theory “no” – in practice “perhaps”! Simulations suggest that treating fleets as areas can reduce bias (Hurtado-Ferro et al.) but that spatial models may perform better (if the data exist – and perhaps not). Clearly, the differences in length and age structure among regions are due to differences in population structure, not selectivity! Self-evidently then the approach is wrong, but M probably isn’t age- and time-invariant either! Cope and Punt (2011) Fish Res. 107: 22-38
The State of the Art (as I see it) • Disaggregate data when including them in any assessment (it is easy to aggregate the data when fitting the model). • Test for fleet structure early in the model development process. • Apply clustering-type methods to combine areas / gear types (not statistical tests, which will lead to 100s of fleets).
Residual Analysis • In principle this is easy: • Plot the data • Compute some statistics • Compare alternative assumptions… EBS Tanner crab
We know how to do this for index data (well) • It gets trickier for compositional data (and hence selecting functional forms for selectivity) Fits to aggregated length data for pink ling when selectivity is assumed to be independent of zone
BUT! • Evaluating mis-specification for compositional data is usually not this easy: • The fit may be correct “on average” but there are clear problems. • It may not be clear whether the model is mis-specified
And this? Is this acceptable?
BUT! (continued) • Comparing time-varying and static selectivity can be even more challenging because it depends on how much selectivity can vary [Maunder and Harley identify an approach based on cross-validation to help with this]
Using profiles to identify mis-specification • Plot the negative log-likelihood [compositional data only] for each fleet to identify fleets whose compositional data are “unduly” informative. [Figure panels: Spatially-disaggregated (left), Spatially-aggregated (right)] Fleets 2 and 13 (left) and 2 and 5 (right): fleet 13 (a) and 5 (b) are the same fleet and have only two length-frequencies… Should we learn this much?
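The per-fleet profile can be sketched as follows. This is a toy illustration under strong assumptions: a single profiled parameter `theta` (here the age at 50% selectivity), a multinomial likelihood, fixed natural mortality of 0.2, and two hypothetical fleets. A real profile would re-estimate all other parameters at each fixed value of the profiled parameter, which this sketch does not do.

```python
import math

AGES = list(range(1, 11))

def expected_props(theta):
    """Expected age proportions; theta is the age at 50% selectivity (toy model)."""
    catch = [math.exp(-0.2 * a) / (1.0 + math.exp(-(a - theta))) for a in AGES]
    total = sum(catch)
    return [c / total for c in catch]

def multinomial_nll(obs_props, exp_props, n_eff):
    """Multinomial negative log-likelihood, up to a constant."""
    return -n_eff * sum(o * math.log(e) for o, e in zip(obs_props, exp_props) if o > 0)

def profile_by_fleet(fleets, grid):
    """For each fleet, the change in NLL across the grid relative to that fleet's minimum."""
    profiles = {}
    for name, (obs_props, n_eff) in fleets.items():
        nll = [multinomial_nll(obs_props, expected_props(t), n_eff) for t in grid]
        best = min(nll)
        profiles[name] = [v - best for v in nll]
    return profiles

grid = [2.0 + 0.5 * i for i in range(9)]          # theta from 2 to 6
fleets = {                                         # hypothetical data and sample sizes
    "fleet_A": (expected_props(3.0), 200.0),
    "fleet_B": (expected_props(5.0), 2.0),
}
profiles = profile_by_fleet(fleets, grid)
spans = {name: max(values) for name, values in profiles.items()}
```

A fleet whose profile spans many likelihood units while holding little data is a candidate for the "should we learn this much?" question raised on the slide.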
Automatic Residual Analysis Two-sample Kolmogorov-Smirnov test applied to artificial data sets Punt & Kinzey: NPFMC crab modelling workshop
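For reference, the two-sample Kolmogorov-Smirnov statistic itself is simple to compute. The sketch below shows only the statistic and the standard large-sample critical value; how Punt & Kinzey applied the test to assessment residuals is not described on the slide, so the example inputs are hypothetical.

```python
import bisect
import math

def ks_two_sample(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample statistic."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))

# Hypothetical samples, e.g. residuals from two candidate models.
sample_a = [0.1, -0.3, 0.2, 0.4, -0.1]
sample_b = [0.9, 1.2, 0.8, 1.1, 1.4]
statistic = ks_two_sample(sample_a, sample_b)
reject = statistic > ks_critical(len(sample_a), len(sample_b))
```

With samples this small the large-sample critical value is only a rough guide; exact tables or a permutation test would be more appropriate.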
The State of the Art (as I see it)-I • Always: • examine plots of residuals • compare expected effective sample sizes with input values • But: • Viewing plots of residuals can be difficult • How to define / test for time-varying selectivity is tough • Residual patterns in fits to compositions need not be due to choices related to selectivity • There is no automatic approach for evaluating residual plots for compositional data. • No testing of methods based on residual plots has occurred (yet?)
The State of the Art (as I see it)-II Aggregated compositions Observed vs expected compositions
Model Selection No-one would say that model selection (and model averaging) are not part of the tool box of analysts BUT do we know how well they work for stock assessment models? • Model selection methods used: • Maximum Likelihood • F-tests / likelihood ratio tests • AIC, BIC, AICc • Bayesian • DIC
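The information criteria listed above follow directly from the minimised negative log-likelihood. The sketch below uses the standard definitions; `nll`, `k` (number of estimated parameters) and `n` (number of observations) are argument names chosen here, and what counts as `n` for compositional data is itself an assumption tied to the effective sample size.

```python
import math

def aic(nll, k):
    """Akaike information criterion from a minimised negative log-likelihood."""
    return 2.0 * nll + 2.0 * k

def aicc(nll, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(nll, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(nll, k, n):
    """Bayesian (Schwarz) information criterion."""
    return 2.0 * nll + k * math.log(n)

# Hypothetical fit: NLL of 100 with 3 parameters and 10 observations.
example = {"AIC": aic(100.0, 3), "AICc": aicc(100.0, 3, 10), "BIC": bic(100.0, 3, 10)}
```

In each case the model with the lower value is preferred, and only differences between models fitted to the same data are meaningful.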
Examples of Model Selection • AIC: • Butterworth et al.: is selectivity for southern bluefin tuna time-varying? • Butterworth & Rademeyer: is selectivity for Gulf of Maine cod dome-shaped or asymptotic? • DIC: • Bogards et al.: is selectivity for North Sea spatially-varying or not?
Examples of Model Selection (Issues) • AIC, BIC and DIC are too subtle: • Often the fits of two models are negligibly different “by eye”, but the difference is highly “statistically significant” (AIC difference > 200). • All these metrics depend on getting the likelihood “right”, in particular the effective sample sizes for the compositional data.
Model Selection and weights So which model fits the data best? And if we accidentally copied the data file twice?
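The "copied the data file twice" question can be illustrated with a small sketch: duplicating the data doubles the log-likelihood difference between two models, so the AIC difference grows even though no information was added. The counts and the two candidate models below are hypothetical.

```python
import math

def multinomial_nll(counts, probs):
    """Multinomial negative log-likelihood, up to a constant."""
    return -sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)

counts = [30, 45, 25]                    # hypothetical composition sample
model_a = [0.30, 0.45, 0.25]             # hypothetical model with 2 free proportions
model_b = [1 / 3, 1 / 3, 1 / 3]          # hypothetical model with no free parameters

def delta_aic(data):
    """AIC of model B minus AIC of model A (positive values favour A)."""
    aic_a = 2.0 * multinomial_nll(data, model_a) + 2.0 * 2
    aic_b = 2.0 * multinomial_nll(data, model_b) + 2.0 * 0
    return aic_b - aic_a

once = delta_aic(counts)
twice = delta_aic([2 * c for c in counts])   # the same data entered twice
```

The same effect arises whenever the effective sample size is overstated, which is why the weighting of compositional data matters so much for model selection.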
Effective Sample Sizes-I • Many assessments: • Pre-specify EffNs. • Use the “McAllister-Ianelli” approach. • But: • Residuals are seldom independent • An alternative is Chris Francis’ approach, but that may fail when there is time-varying selectivity.
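A minimal sketch of the per-sample calculation usually associated with the “McAllister-Ianelli” approach: the effective sample size is the ratio of the multinomial variance implied by the expected proportions to the observed squared residuals. How the per-year values are then combined (arithmetic or harmonic mean) varies between applications and is not stated on the slide; the arithmetic mean below and the example proportions are assumptions.

```python
def mcallister_ianelli_neff(obs_props, exp_props):
    """Effective sample size for one composition sample."""
    numerator = sum(e * (1.0 - e) for e in exp_props)
    denominator = sum((o - e) ** 2 for o, e in zip(obs_props, exp_props))
    return numerator / denominator

# Hypothetical observed and model-predicted proportions for two years.
samples = [
    ([0.5, 0.5], [0.4, 0.6]),
    ([0.2, 0.8], [0.4, 0.6]),
]
per_year = [mcallister_ianelli_neff(o, e) for o, e in samples]
mean_neff = sum(per_year) / len(per_year)   # arithmetic mean: an assumed choice
```

Comparing these values with the input sample sizes is the check recommended elsewhere in the talk; the calculation treats residuals as independent, which is the weakness the slide points out.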
Effective Sample Sizes-II • Maunder compared various likelihood formulations including: • Multinomial • Fournier et al. with observed rather than expected proportions • Punt-Kennedy (with observed proportions)* • Dirichlet • Iterative (essentially the “McAllister-Ianelli” method) • Multivariate normal [Figure: estimated effective sample size]
AIC, BIC and Random Effects Most (almost all) assessments use an “errors in variables” formulation of the likelihood function rather than the correct (marginal) likelihood. How this impacts the performance of model selection methods is unknown.
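The equations from this slide were not recovered. As a hedged reconstruction of the general idea only, the two formulations are commonly written as below, where the symbols are assumptions introduced here: D is the data, θ the fixed parameters, ε the random effects (for example recruitment or selectivity deviations) and σ their standard deviation.

```latex
% "Errors in variables" (penalised) formulation: the deviations are
% estimated as if they were parameters, jointly with theta.
(\hat{\theta}, \hat{\varepsilon})
  = \arg\max_{\theta,\,\varepsilon}\;
    L(D \mid \theta, \varepsilon)\, p(\varepsilon \mid \sigma)

% Marginal likelihood: the deviations are integrated out.
L_{\mathrm{marg}}(\theta, \sigma)
  = \int L(D \mid \theta, \varepsilon)\, p(\varepsilon \mid \sigma)\, d\varepsilon
```

Under the first form the number of "parameters" entering AIC or BIC is ambiguous, which is one reason the behaviour of these criteria is uncertain.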
The State of the Art (as I see it) • AIC, BIC, and DIC are commonly used. • But: • Do we need an analogue to the “1% rule” as is the case for CPUE standardization? • We need to get the effective sample sizes right! Using a likelihood function for which the effective sample size can be estimated is a good start! • Performance also depends on treatment of random effects (recruitment, selectivity) • What is the value of looking at retrospective patterns? Can we identify when the cause of a retrospective pattern is definitely selectivity?
Simulation Testing [Flow diagram: Operating Model → Method 1 … Method n → Performance measures; and Operating Model → Method 1 … Method n → Model Selection → Performance measures]
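A deliberately small sketch of the simulation-testing loop: an operating model generates age compositions from a known selectivity, two estimation models are fitted, AIC picks between them, and the performance measure is how often the true form is chosen. Everything here is an assumption made for illustration: logistic selectivity with a slope of 1, M = 0.2, equilibrium numbers-at-age, multinomial sampling with 200 fish, and a grid search in place of a real optimiser.

```python
import math
import random

AGES = list(range(1, 11))

def expected_props(sel):
    """Expected catch-at-age proportions for a selectivity vector (M = 0.2 assumed)."""
    catch = [s * math.exp(-0.2 * a) for s, a in zip(sel, AGES)]
    total = sum(catch)
    return [c / total for c in catch]

def logistic(a50):
    return [1.0 / (1.0 + math.exp(-(a - a50))) for a in AGES]

def nll(counts, props):
    return -sum(c * math.log(p) for c, p in zip(counts, props) if c > 0)

def simulate(props, n, rng):
    """Operating model: one multinomial sample of n fish."""
    counts = [0] * len(props)
    for i in rng.choices(range(len(props)), weights=props, k=n):
        counts[i] += 1
    return counts

def select_model(counts):
    """Fit both estimation models and return the one with the lower AIC."""
    grid = [1.0 + 0.25 * i for i in range(29)]            # a50 from 1 to 8
    nll_logistic = min(nll(counts, expected_props(logistic(a))) for a in grid)
    nll_flat = nll(counts, expected_props([1.0] * len(AGES)))
    aic_logistic = 2.0 * nll_logistic + 2.0 * 1           # one estimated parameter
    aic_flat = 2.0 * nll_flat                             # no estimated parameters
    return "logistic" if aic_logistic < aic_flat else "flat"

rng = random.Random(1)
truth = expected_props(logistic(4.0))
picks = [select_model(simulate(truth, 200, rng)) for _ in range(100)]
share_correct = picks.count("logistic") / len(picks)
```

Because the estimation model here matches the operating model exactly and the data are cleanly multinomial, selection is easy; that is the "too good" situation the caveats about simulation studies warn against.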
Simulation Testing • Caveats before we start: • Simulations are only as good as the operating model • Most simulation studies assume that the likelihood function is known (as is M) • Few simulation studies allow for over-dispersion. • No simulation studies simulate the “meta” aspects of stock assessments (such as how fleets are selected). • Avoid too many generalizations – most properties of estimators will be case-specific
Overdispersion? How often do the data generated in simulation studies look like this? How much does it matter?
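One way an operating model could generate over-dispersed compositions is to draw the proportions from a Dirichlet distribution before multinomial sampling (a Dirichlet-multinomial). This is an assumed mechanism chosen for illustration, not one named on the slide; the proportions, sample size and `concentration` value are hypothetical.

```python
import random

def multinomial(props, n, rng):
    counts = [0] * len(props)
    for i in rng.choices(range(len(props)), weights=props, k=n):
        counts[i] += 1
    return counts

def dirichlet_multinomial(props, n, concentration, rng):
    """Draw proportions from a Dirichlet, then sample counts from them.

    Smaller concentration values give more over-dispersion.
    """
    draws = [rng.gammavariate(concentration * p, 1.0) for p in props]
    total = sum(draws)
    return multinomial([d / total for d in draws], n, rng)

def variance_of_first_bin(sampler, n, reps):
    """Sample variance of the first bin's observed proportion."""
    xs = [sampler()[0] / n for _ in range(reps)]
    mean = sum(xs) / reps
    return sum((x - mean) ** 2 for x in xs) / (reps - 1)

rng = random.Random(7)
props, n = [0.1, 0.3, 0.4, 0.2], 100
var_plain = variance_of_first_bin(lambda: multinomial(props, n, rng), n, 2000)
var_over = variance_of_first_bin(
    lambda: dirichlet_multinomial(props, n, 10.0, rng), n, 2000)
```

With these settings the over-dispersed samples vary far more between replicates than the nominal sample size of 100 would suggest, which is the sense in which real data often look "worse" than simulated data.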
Overview of Broad Results • Getting selectivity assumptions wrong matters! HOWEVER, other factors (data quality, contrast, M) may be MORE important. • Estimating time-varying selectivity when selectivity is static is safer than ignoring it when selectivity is time-varying. • Model selection methods can discriminate among selectivity functions very well (do I really believe this – why then does it seem so hard in reality?)
The State of the Art (as I see it) • The structure of most (perhaps all) operating models is too simple and leads to simulated data sets looking “too good” • André’s suggestion: if you show someone 99 simulated data sets and the real data set, could they pick it out? • Future simulation studies should: • Include model and fleet selection. • Focus on length-structured models. • Examine whether selectivity is length- or age-based.
Final Thoughts • Methods development • Non-additive models? • State-space models? • Residuals and model selection • Weighting philosophy • Simulation studies • Standards for what constitutes a “decent” operating model? • Compare methods for implementing time-varying selectivity (blocked vs annual) • Consider length-structured models
Final Thoughts • Ignore “space” at your peril! • What about model mis-specification in general?
Final Points to Ponder! • Should guidelines be developed for when to: • downweight compositional data rather than modelling time-varying selectivity • fix selectivity and not estimate it! • use retrospective patterns in model selection / bootstrapping • conduct model selection when the selectivity pattern is “non-parametric” • apply time-varying selectivity • Model selection • Fixing / estimating sigma • trump AIC, BIC and DIC using “by eye” residual patterns.
Support for this paper was provided by NOAA: • The West Coast Groundfish project • Development of ADMB libraries • Simulation testing of assessment models Questions?