Validation and Hypothesis Testing in Statistics

Week 4. Validation and hypothesis testing. MSc Methodology Seminar I. Felipe Orihuela-Espina

VALIDATION INAOE

Contents • Ground truth • Gold standard • Types of validity • Mechanisms for showing validity

Recommended readings • Wikipedia! • http://en.wikipedia.org/wiki/Validity_%28statistics%29 • For a gentle overview. • A bit more formal: • Cronbach, L. J.; Meehl, P. E. (1955). "Construct validity in psychological tests". Psychological Bulletin 52 (4): 281–302 • >6000 citations • Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pp. • http://www.socialresearchmethods.net/kb/introval.php

Validity and validation • Validity and validation: • Validity is the degree to which a certain measurement or model represents what it is supposed to represent; its equivalent in the real world or in nature. [Own definition] • Validity is the best possible approximation to the truth of a given proposition, inference or conclusion [http://www.socialresearchmethods.net] • Validation is the process or mechanism for assessing validity.

Validity and validation • Validity and validation: • Strictly: “Measures, samples and designs don't 'have' validity -- only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can 'have' validity.” • [http://www.socialresearchmethods.net/kb/introval.php] • Yet, for this unit I will set this precision aside and talk about validity as a property of measurements or models.

GROUND TRUTH

Ground truth • The ground truth refers to the real truth of the observable phenomenon, or the reality occurring in nature. [Own definition] • Some alternative definitions: • “the classification truth of each voxel” [Zou KH et al (2002), MICCAI, LNCS 2488, pp. 315–322], • …that is, the truth in each experimental unit for the purposes of classification (positive or negative). • This is particularly nice for machine learning. • “the most useful representation of the information that one wishes to convey”. [Kolaczyk ED (2009) Statistical Analysis of Network Data: Methods and Models, Springer, pg. 76] • I like this one, but it requires knowledge of the construct (i.e. that “information that one wishes to convey”)

Ground truth • Some more alternative definitions: • “information collected on location” [http://en.wikipedia.org/wiki/Ground_truth] • Personally I do not like this definition, as it assumes that what you measure in the field (“on location”) is measured without bias; if there is bias, then the ground truth would incorporate those measurement errors • …plus, it leaves no room for synthetic datasets. • For different reasons, other authors also share some concerns about this definition: • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: “made by hand” here refers to the information collected by the researcher on location, not to synthetic data • “The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one’s work to another” [Canini et al 2009, TMA]

Ground truth • Even more alternative definitions: • “ancillary data” [https://www.fas.org/irp/imint/docs/rst/Sect13/Sect13_1.html] • The best Spanish translations of the word “ancillary” are auxiliar, subordinado or secundario (auxiliary, subordinate or secondary) • …I could not disagree more with this definition… • “a representation of the agreed correct result of the ideal layout analysis method (i.e. the result of the method that, if existed, would put an end to the research problem).” [Antonacopoulos et al (2006) Document Analysis Systems, 11 pgs] • This definition has 2 clear weaknesses: • It assumes that truth depends on what the annotators think (in total opposition to objectivism). • For instance, if a few clinicians agreed that a woman is not pregnant, then she would not be pregnant, despite the obvious fact that this does not depend on their opinion. • It assumes that every method has its own ground truth, which for obvious reasons does not hold…

Ground truth • Yet a few more alternative definitions: • “the actual facts of a situation and is used to determine, with certainty, whether information is accurate” [Vrij 2000, in Toma et al, Pers Soc Psychol Bull 2008; 34; 1023] • Hot! The best to my taste and, if I may say so, close to my own definition

Ground truth • Regardless of the definition: the availability of a ground truth permits objective evaluation of criteria. • If it is available, validation involves measuring how far our metric or model is from the ground truth (ground truth approach). • With synthetic data, the ground truth is always available.
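
A minimal sketch of the ground truth approach on synthetic data. The error measure (mean absolute error), the noise level and the name `mean_absolute_error` are assumptions chosen for illustration; the slides do not prescribe a particular distance.

```python
import random


def mean_absolute_error(estimates, truth):
    """Average absolute distance between estimates and the ground truth."""
    if len(estimates) != len(truth):
        raise ValueError("estimates and truth must have the same length")
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


# Synthetic data: we choose the true values ourselves, so the ground truth is known.
rng = random.Random(0)
ground_truth = [float(i) for i in range(10)]

# Hypothetical metric under validation: the truth plus Gaussian measurement noise.
measured = [t + rng.gauss(0.0, 0.5) for t in ground_truth]

error = mean_absolute_error(measured, ground_truth)
print(f"Mean absolute error against the ground truth: {error:.3f}")
```

The smaller the error, the closer the metric is to the ground truth; what counts as "close enough" depends on the application.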

Ground truth • Problem: • With real data, the ground truth is not always available, and often its acquisition is far from trivial. • Several methods have been developed for estimating the ground truth when it is not available: • Beware! No matter how good your estimate is, it is no longer the ground truth. • A few methods for estimating the ground truth [LiX 2010]: • Voting by several annotators or judges. • Often, experts in the field • Maximum posterior probability • Minimization of variance
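
Voting by several annotators can be sketched as a simple majority vote per item. This is only an illustration: the function name `majority_vote`, the labels and the tie-breaking rule are assumptions, and the result is an estimate, not the ground truth itself.

```python
from collections import Counter


def majority_vote(labels_per_annotator):
    """Estimate one label per item by majority vote across annotators.

    labels_per_annotator: list of equally long label lists, one per annotator.
    Ties go to the label that appears first among the annotators for that item.
    """
    n_items = len(labels_per_annotator[0])
    if any(len(labels) != n_items for labels in labels_per_annotator):
        raise ValueError("every annotator must label every item")
    estimate = []
    for item_labels in zip(*labels_per_annotator):
        # most_common(1) returns the (label, count) pair with the highest count.
        estimate.append(Counter(item_labels).most_common(1)[0][0])
    return estimate


# Hypothetical annotations from three experts over four items.
annotator_a = ["pos", "neg", "pos", "neg"]
annotator_b = ["pos", "pos", "pos", "neg"]
annotator_c = ["neg", "neg", "pos", "neg"]

estimated_truth = majority_vote([annotator_a, annotator_b, annotator_c])
print(estimated_truth)  # ['pos', 'neg', 'pos', 'neg']
```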

Ground truth • Synthetic ground truth: • Although being able to generate synthetic ground truth means that you are truly in possession of the ground truth, this is not free of problems; • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: Here the authors are referring strictly to information collected on location by a researcher

GOLD STANDARD

Gold standard • The gold standard refers to a test, metric, or model that is widely accepted by the community as a currently valid representation of reality. [Own definition] • Other definitions: • “another measure that has been used and accepted in the field” [Young et al 1995, Arch Phys Med Rehabil 76:913-918] • Relatively close to mine.

Gold standard • More alternative definitions: • “a relatively irrefutable standard that constitutes recognized and accepted evidence that a certain disease exists.” [Brown et al 1996, NEJM 335(14):1049-1053] • Very good! ...were it not that they use the word “standard” to define “standard”. • …yet, it is NEJM! • “a benchmark that is regarded as definitive” [Noel et al 2009, ATM] • No gold standard is definitive; it holds only until we get a more accurate one…

Gold standard • A few more alternative definitions: • “the best available method, offering accuracy, reproducibility, feasibility and a justifiable cost-benefit interrelation” [Mariath et al (2007), JAOS 15(6):529-33] • I miss the need for community consensus • Also, I do not understand very well why the cost-efficiency relation is included. • “based on the judgments of expert PIs and represent the carrier’s definition of what constitutes acceptable/unacceptable aircrew performance” [Baker and Dismukes (2003), NASA, NASA/TM—2003–212809] • A bit limited to its context but, reading between the lines, a very good definition.

Gold standard • Yet a few more definitions: • “the most accurate method, procedure, or measurement that is known to represent the true value of what is being tested.” [Hudson, MSc thesis, Texas A&M University, pg 5] • Very close to mine. (NOTE: I gave mine well before knowing this one…) • “a diagnostic test or benchmark that is the best available under reasonable conditions. It does not have to be necessarily the best possible test for the condition in absolute terms.” [http://en.wikipedia.org/wiki/Gold_standard_(test)] • I miss the need for community consensus.

Gold standard • When the ground truth is not available, validation involves measuring how far our metric or model is from the gold standard (gold standard approach). • This often links to convergent validity and, to an extent, nomological validity • Note that you compare yourself not against the state of the art, but against what is accepted by the community. • Problem: Still, there is not always a gold standard accepted by the community.
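
A minimal sketch of the gold standard approach for a binary test. Sensitivity and specificity are one common way to express distance from a gold standard; they are an assumption here, not something the slides mandate, and the function name and data are hypothetical.

```python
def agreement_with_gold_standard(test, gold):
    """Return (sensitivity, specificity) of a binary test against a gold standard.

    test, gold: equally long sequences of 0/1 outcomes, one pair per subject.
    """
    if len(test) != len(gold):
        raise ValueError("test and gold must have the same length")
    tp = sum(1 for t, g in zip(test, gold) if t == 1 and g == 1)
    fn = sum(1 for t, g in zip(test, gold) if t == 0 and g == 1)
    tn = sum(1 for t, g in zip(test, gold) if t == 0 and g == 0)
    fp = sum(1 for t, g in zip(test, gold) if t == 1 and g == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard must contain both positives and negatives")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes for eight subjects.
gold_standard = [1, 1, 1, 0, 0, 0, 0, 1]
new_test = [1, 1, 0, 0, 0, 1, 0, 1]

sensitivity, specificity = agreement_with_gold_standard(new_test, gold_standard)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Disagreement with the gold standard does not by itself show which of the two is wrong, since the gold standard is only the currently accepted representation.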

TYPES OF VALIDITY

Types of validity • Types of validity: • Construct validity • Convergent validity • Criterion validity • Concurrent validity • Predictive or empirical validity • Discriminant validity • Content validity • Face validity • Representation validity • Intrinsic validity • Internal validity • External validity • Logical validity • Statistical conclusion validity • Ecological validity • Diagnostic validity • Nomological validity

Types of validity • [Figure labels: Fidelity to the phenomenon; Representative of the population; Fidelity to other metrics]

Types of validity • Figure from: [http://www.socialresearchmethods.net/kb/introval.php]

Types of validity • Figure from: [www.analytictech.com]

Types of validity • Construct validity: • A construct is “some postulated attribute of people assumed to be reflected in the test performance” [Cronbach and Meehl, 1955]. • Cronbach and Meehl's seminal paper was written for psychology, so when they speak of: • people, they refer to the phenomenon under study. • test performance, they refer to a model or metric capturing some aspect of the phenomenon • Developing the construct is not easy. Make it too narrow or too broad and it may nullify your experiment.

Types of validity • Construct validity: • Construct: • Example: Research process and validation according to the positivist research tradition • Figure from: [Johnston and Smith http://epress.anu.edu.au/apps/bookworm/view/Information+Systems+Foundations%3A+The+Role+of+Design+Science/5131/ch02.xhtml]

Types of validity • Construct validity: • “…construct validity […] perceived as the most fundamental and embracing of all types of validity” [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pp. (p. 26)] • The capacity of a metric or model to measure or represent faithfully the phenomenon under study and its legitimate inferences. • In other words, that you are measuring or modelling what you should be measuring or modelling, and that you are free of bias (see also internal validity) • It becomes especially critical when a concomitant criterion or universe of content is lacking. [Cronbach and Meehl, 1955]

Types of validity • Construct validity: • Construct validity should be equivalent to: “Tell the truth, the whole truth, and nothing but the truth” • [http://www.socialresearchmethods.net/kb/considea.php] • We want our metric or model to represent “the construct, the whole construct [content validity], and nothing but the construct” • [http://www.socialresearchmethods.net/kb/considea.php]

Types of validity • Construct validity: • Construct validity is a process, not a method [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pp. (p. 26)] • It requires many lines of evidence • It cannot be expressed by a single figure or coefficient • It requires both quantitative and qualitative evidence

Types of validity • Figure from: [http://www.socialresearchmethods.net/kb/considea.php]

Types of validity • Construct validity: • Construct validity is what allows us to build a universal truth. Consequently, it can be compromised by many factors: • Inadequate or ambiguous definition of the construct • Altered behaviour of the construct (e.g. appearance of new treatments) • Bias, including that of the researcher • Confounding or contamination by uncontrolled or latent variables • Confounding or contamination by factor interactions (non-main effects) • Confounding or contamination by other constructs • Breach of blinding through hypothesis guessing by patients or co-researchers • Mono-operation bias (that is, measuring a single dependent variable) • A single dimension will be insufficient to express the construct • Obfuscation/evaluation apprehension (changes in the responses of the phenomenon just because it is being measured/observed) • Remember: you can’t measure a phenomenon without distorting it!

Types of validity • Figure from: [http://cmapspublic.ihmc.us/rid=1148264198734_1533533750_4916/Validez%20de%20Constructo.cmap]

Types of validity • Convergent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Watch out! There may be different constructs for the same phenomenon, and thus the models may be explaining slightly different things • It is a subtype of construct validity • …but note how similar it is to criterion validity. • All metrics/models of the same phenomenon must converge to a unique ground truth.

Types of validity • Criterion validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Yep! Apparently the same as convergent validity • Depending on the temporal relation of the sampling between the metrics or models, it may be: • Concurrent: All observations are taken at once • Predictive: Observations of the different metrics or models are acquired at different times • All metrics/models of the same phenomenon must agree (to an extent) with the gold standard.

Types of validity • Criterion validity: • Example: Suppose that you acquire a number of tests (salivary cortisol, heart rate variability, skin conductance, pupil dilation) to estimate a person’s stress. Let’s assume you get strong correlations between each pair of these. • It is reasonable to think that these metrics share some common factor. • Observation 1: Do not take for granted that the common factor is the one you originally intended to measure (the construct) • Observation 2: The metrics do not measure the same feature of the construct (stress). Indeed, they measure different constructs, yet all the evidence points in the same direction. • Figure from [positivemed.com]
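
The pairwise check in the stress example can be sketched with Pearson correlations. All numbers below are made up for illustration, and the choice of Pearson's r (rather than a rank correlation, say) is an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical readings for six participants (invented values, arbitrary units).
measures = {
    "salivary cortisol": [8.1, 9.4, 12.0, 14.2, 15.9, 18.3],
    "heart rate variability": [62, 58, 49, 41, 37, 30],
    "skin conductance": [2.1, 2.6, 3.4, 4.0, 4.3, 5.2],
    "pupil dilation": [3.0, 3.2, 3.9, 4.1, 4.6, 5.0],
}

correlations = {
    (a, b): pearson(measures[a], measures[b])
    for a, b in combinations(measures, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A strong correlation in every pair (positive or negative) suggests a shared factor; as Observation 1 warns, it does not show that the shared factor is stress.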

Types of validity • Predictive or empirical validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon taken at different times. • A subtype of criterion validity. • Closely related to concurrent validity; • Predictive validity has to do with a priori predictions made by the model • On the other hand, concurrent validity has to do with the correlation between already existing observations (or those made at the same time) and the a posteriori estimation of the model

Types of validity • Predictive or empirical validity: • Although most of the time predictive and empirical validity are defined as one [e.g. http://www.britannica.com/EBchecked/topic/186144/empirical-validity], personally I think they may be subtly different: • Predictive validity is often regarded in terms of observations made with other metrics/models • Empirical validity is often regarded in terms of observations made in different contexts (e.g. different experimental conditions), and in this view it is related to external validity.

Types of validity • Concurrent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon acquired at the same time.

Types of validity • Discriminant validity: • Degree to which observations obtained from our metric or model differ from those obtained from other metrics or models built for a different construct. • It implies a low correlation or high statistical independence with those other metrics. • i.e. you should resemble those metrics built to measure your same construct but differ clearly from those built for different constructs. • This validity is critical to delimit the construct. • “discriminant validity is […] perhaps a stronger test […] than convergent validity, because it implies a challenge from a plausible rival hypothesis” [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pp. (p. 27)]
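
A minimal sketch contrasting convergent and discriminant evidence with correlations. The data and the thresholds (0.7 and 0.3) are arbitrary assumptions for illustration only; in line with the remarks on construct validity, no single coefficient settles the question.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations (invented values).
our_metric = [1, 2, 3, 4, 5, 6]
same_construct_metric = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]   # built for our construct
other_construct_metric = [3, 1, 4, 1, 5, 2]              # built for a different one

r_convergent = pearson(our_metric, same_construct_metric)
r_discriminant = pearson(our_metric, other_construct_metric)

# Assumed, arbitrary cut-offs: resemble the first, differ from the second.
HIGH, LOW = 0.7, 0.3
print(f"convergent r = {r_convergent:.2f}, high enough: {abs(r_convergent) >= HIGH}")
print(f"discriminant r = {r_discriminant:.2f}, low enough: {abs(r_discriminant) <= LOW}")
```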

Types of validity • Content validity: • Capacity of the metric or model to represent the whole universe (or population) of the phenomenon. • Note that you may have a construct intended to only partially describe the phenomenon. • Maybe your model is valid only for a small portion of the sample space or universe. It is a partial, non-universal truth, although still valid within its boundaries. • This is often obtained by non-statistical methods, and it is not necessarily solved by the experimental paradigm/design. • Several experts decide on whether the observations are representative of the target universe or population. • Watch out! Experts may still be wrong…

Types of validity • Face validity: • Degree to which a metric or model appears to be measuring the construct or phenomenon. • Face validity is only the entrance point to content validity. It does not guarantee that you are really measuring the phenomenon. • It often incorporates a subjective load from the expert(s) • NOTE: It has been suggested that face validity should be expressed or agreed specifically by non-experts rather than experts. • [Holden, Ronald B. (2010). "Face validity". In Weiner, Irving B.; Craighead, W. Edward. The Corsini Encyclopedia of Psychology (4th ed.). Hoboken, NJ: Wiley. pp. 637–638.]

Types of validity • Representation validity: • Limits of the conversion from a theoretical construct to a specific practical metric or model. • Representation validity is a measure of abstraction: how feasible is the model as a surrogate of the theoretical construct?

Types of validity • Intrinsic validity: • (Cor-)relation with a criterion (expert) that has been accepted as correct. [Gulliksen (1950) American Psychologist, 5(10):511-517] • Closely related to face validity, although here there seems to be a consensus that the relation must be with an expert. • …subjectivity has been reduced in the sense that it has been “accepted” by the community. • The expert may be a gold standard.

Types of validity • Internal validity: • Quality of a metric or model to allow sampling free of bias. • Unlike construct validity, it does not imply that what you are measuring is related to the modelled phenomenon; it is just concerned with measurement or modelling bias. • Internal validity is achieved fully when there are irrefutable arguments showing that the intervention has had (or hasn’t had) a certain effect. • More often than not, it requires a controlled experiment (with a control group) • Remember, there may be confounding, e.g. other alternative hypotheses, and thus there will be no construct validity, but you may still have internal validity. • It confirms that your experiment is correctly performed • It is concerned with causality (yet it does not represent causality!) • Example: • Every time you change A under conditions C, it leads to a change in B (internal validity). That is different from saying that B is caused by A (construct validity).

Types of validity • Internal validity: • Internal validity guarantees that evidence can be communicated directly. • Internal validity may be at risk when: • The analysis does not support causal relations adequately • Groups being compared are not sufficiently homogeneous • Results may not reach statistical significance • [http://ec.europa.eu/europeaid/evaluation/methodology/methods/mth_vld_es.htm#05]

Types of validity • External validity: • Quality of a metric or model to permit observations that can be generalized to other metrics, models, groups, areas, periods, etc. • External validity is achieved fully when it is demonstrated that a similar intervention will get similar effects in a different context but still under the same conditions. • Normally, it requires a large number of observations, multi-center studies, random effects models, different datasets, etc. • External validity permits the transfer of knowledge and scientific evidence

Types of validity • Internal and external validity seem to be in conflict; • Internal validity requires you to control as much as you can (e.g. all intervening variables) • …but that reduces the generalization capabilities, i.e. the external validity. • (and sometimes, collaterally, the ecological validity) • Figure from: [http://prpj.wordpress.com/2012/03/11/threats-to-validity-of-experimental-research/]

Types of validity • Logical or deductive validity: • A metric or model has logical validity if and only if it can be deduced/abduced/induced by a logical system.

Types of validity • Logical system: • A set of elements and objects allowing us to make decisions. • It is composed of: • An alphabet of symbols or primitives • A grammar with construction rules made from elements of the alphabet • A set of axioms • …which in turn are also well-formed rules • A set of inference rules • A formal interpretation • [Figure labels: Syntactic; Semantic]
