# Approximate Randomization tests

### Approximate Randomization tests

February 5th, 2013

Why AR testing?
• Classic tests often assume a given distribution (Student's t, normal, …) of the variable
• This is approximately OK for recall, but not for precision or F-score
• The range of hypotheses that can be tested with non-parametric tests is limited
Illustration
• 30,000 runs, 1000 instances, 500 of class A
• True positives (TP): 400 (stdev: 80)
• False positives (FP): 60 (stdev: 15)
• Assumption: true and false positives for class A are normally distributed. This is already an approximation, since TP and FP are bounded below by 0 and above by the number of instances.
Definitions
• Recall = truly predicted A / A in reference = TP / constant. If TP is normal, recall is normal (a normal variable divided by a constant).
• Precision = truly predicted A / A in system = TP / (TP + FP). The denominator varies with TP and FP, so precision is a non-linear function of them and is not normal.
• F-score: a non-linear combination of recall and precision. Not normal.
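The claim that recall stays normal while precision does not can be checked with a quick simulation using the numbers from the illustration: TP ~ N(400, 80), FP ~ N(60, 15), 500 instances of class A in the reference, 30,000 runs. This is a minimal sketch, assuming plain normal draws without clipping (so recall exceeds 1 in roughly 10% of the runs, which is exactly the approximation the slide warns about); `simulate` and `skewness` are illustrative helpers, not part of the slides. A skewness near 0 is consistent with a normal distribution.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over the standard deviation cubed."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def simulate(runs=30_000, n_class_a=500, seed=1):
    """Draw TP and FP for class A and derive recall, precision and F-score per run."""
    # Assumption: plain normal draws, no clipping to [0, number of instances].
    rng = random.Random(seed)
    recall, precision, fscore = [], [], []
    for _ in range(runs):
        tp = rng.gauss(400, 80)  # true positives, stdev 80
        fp = rng.gauss(60, 15)   # false positives, stdev 15
        r = tp / n_class_a       # constant denominator: linear in TP
        p = tp / (tp + fp)       # denominator depends on TP and FP: non-linear
        recall.append(r)
        precision.append(p)
        fscore.append(2 * p * r / (p + r))
    return recall, precision, fscore


if __name__ == "__main__":
    recall, precision, fscore = simulate()
    print(f"skewness recall:    {skewness(recall):+.3f}")
    print(f"skewness precision: {skewness(precision):+.3f}")
    print(f"skewness F-score:   {skewness(fscore):+.3f}")
```

Recall's skewness stays close to 0, while precision comes out clearly left-skewed: a low TP pulls precision down further than a high TP pushes it up.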
Approximate randomization test
• No assumption on the distribution
• Can handle complicated statistics
• Only assumption: independence between the shuffled elements
• References:
• Computer Intensive Methods for Testing Hypotheses, Noreen, 1989.
• More accurate tests for the statistical significance of result differences, Yeh, 2000.
Basic idea
• Exact randomization test
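Exact probability

H0: the expert's answers are independent of the contents

With four glasses there are 4! = 24 equally likely ways to match the expert's answers to the contents under H0. Seven of them give at least 2 correct answers (1 with all 4 correct, 6 with exactly 2, none with exactly 3), so P(n_correct ≥ 2) = 7/24 ≈ 0.29.

Thus, do not reject H0, because this probability is larger than alpha = 0.05.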
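The 7/24 can be reproduced by enumerating every reassignment of the expert's answers to the glasses. A minimal sketch, assuming four glasses with four distinct contents and an expert who got two of them right (50% accuracy); the labels `"a"` to `"d"` and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def exact_p_value(truth, answers):
    """Exact randomization test: share of all n! reassignments doing at least as well."""
    actual = sum(t == a for t, a in zip(truth, answers))  # observed number correct
    total = 0
    at_least_as_good = 0
    for shuffled in permutations(answers):  # every possible reassignment
        total += 1
        if sum(t == s for t, s in zip(truth, shuffled)) >= actual:
            at_least_as_good += 1
    return Fraction(at_least_as_good, total)


# Hypothetical data: glasses 1 and 2 identified correctly, 3 and 4 swapped.
truth = ["a", "b", "c", "d"]
answers = ["a", "b", "d", "c"]
p = exact_p_value(truth, answers)
print(p)                # 7/24
print(f"{float(p):.2f}")  # 0.29
```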

Approximate probability
• The number of permutations is n!, which grows very quickly with n
• If there are too many permutations to compute, approximate: P = (nge + 1) / (NS + 1)
• nge: number of times the pseudo-statistic ≥ the actual statistic
• NS: number of shuffles
• +1: correction for validity
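The same glass example can be approximated by sampling NS random shuffles instead of enumerating all of them, using P = (nge + 1) / (NS + 1). Because of the +1 terms the estimate can never be 0. A minimal sketch, assuming the same hypothetical labels as in the exact case; `approximate_p_value`, `ns` and `seed` are illustrative names.

```python
import random


def approximate_p_value(truth, answers, ns=10_000, seed=0):
    """Approximate randomization: P = (nge + 1) / (NS + 1) over NS random shuffles."""
    rng = random.Random(seed)
    actual = sum(t == a for t, a in zip(truth, answers))  # actual statistic
    nge = 0
    shuffled = list(answers)
    for _ in range(ns):
        rng.shuffle(shuffled)  # one random reassignment (a pseudo-sample)
        pseudo = sum(t == s for t, s in zip(truth, shuffled))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (ns + 1)  # +1 in both terms: correction for validity


# Hypothetical data, as in the exact example.
truth = ["a", "b", "c", "d"]
answers = ["a", "b", "d", "c"]
print(round(approximate_p_value(truth, answers), 3))  # close to 7/24
```

For four glasses this is pointless (only 24 permutations), but the code stays the same when n! is far too large to enumerate.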
Translation to instances
• Each glass is an instance
• Contents and expert are two labeling systems
• Contents has an accuracy of 100%, expert has an accuracy of 50%
• The statistic is precision, F-score, recall, … instead of accuracy
Stratified shuffling
• For labeled instances, it makes no sense to shuffle the class label of one instance to another
• Only shuffle labels within an instance, i.e., swap the labels that the two systems assigned to the same instance
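For two systems evaluated on the same labeled instances, per-instance shuffling amounts to swapping the two systems' predicted labels for each instance with probability 0.5 and recomputing the statistic. With n instances there are 2^n such swap patterns, so an exact test is only feasible for small n. A minimal sketch, assuming the statistic is the absolute difference in F-score for one class of interest; `f_score`, `paired_ar_test`, the `target` parameter and the toy data are hypothetical and not the interface of the script.

```python
import random


def f_score(gold, pred, target):
    """F1 for one class: harmonic mean of precision and recall for `target`."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    n_pred = sum(p == target for p in pred)  # "A in system"
    n_gold = sum(g == target for g in gold)  # "A in reference"
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def paired_ar_test(gold, pred1, pred2, target, ns=10_000, seed=0):
    """Approximate randomization with per-instance (stratified) swapping."""
    # Assumption: statistic = |F(system 1) - F(system 2)|.
    rng = random.Random(seed)
    actual = abs(f_score(gold, pred1, target) - f_score(gold, pred2, target))
    nge = 0
    for _ in range(ns):
        s1, s2 = [], []
        for a, b in zip(pred1, pred2):
            if rng.random() < 0.5:  # swap the two labels of the SAME instance
                a, b = b, a
            s1.append(a)
            s2.append(b)
        pseudo = abs(f_score(gold, s1, target) - f_score(gold, s2, target))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (ns + 1)


# Hypothetical toy data: 100 instances, 50 of class A.
gold = ["A", "B"] * 50
perfect = list(gold)            # system 1: always right
always_a = ["A"] * len(gold)    # system 2: predicts A everywhere
print(paired_ar_test(gold, perfect, always_a, target="A", ns=1000))
```

The gold labels never move; only the predictions are exchanged, and only between the two systems' outputs for the same instance. Here the printed p is close to its minimum, 1/(NS + 1).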
MBT
• Assumption of independence between instances: tokens within the same sentence are not independent
• Shuffle per sentence rather than per token
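For a tagger, the swap can be applied per sentence instead of per token, so that dependent tokens always move together. A minimal sketch, assuming token accuracy as the statistic and each system's output given as a list of sentences (each a list of tags); `token_accuracy`, `sentence_ar_test` and the toy tags are hypothetical.

```python
import random


def token_accuracy(gold_sents, pred_sents):
    """Share of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / total


def sentence_ar_test(gold_sents, sys1, sys2, ns=10_000, seed=0):
    """Approximate randomization where the shuffled unit is a whole sentence."""
    # Assumption: statistic = |accuracy(system 1) - accuracy(system 2)|.
    rng = random.Random(seed)
    actual = abs(token_accuracy(gold_sents, sys1) - token_accuracy(gold_sents, sys2))
    nge = 0
    for _ in range(ns):
        s1, s2 = [], []
        for tags1, tags2 in zip(sys1, sys2):
            if rng.random() < 0.5:  # swap the complete tag sequences of one sentence
                tags1, tags2 = tags2, tags1
            s1.append(tags1)
            s2.append(tags2)
        pseudo = abs(token_accuracy(gold_sents, s1) - token_accuracy(gold_sents, s2))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (ns + 1)


# Hypothetical toy data: 40 sentences.
gold = [["DT", "NN"], ["PRP", "VBZ", "JJ"]] * 20
tagger1 = [list(s) for s in gold]          # always right
tagger2 = [["NN"] * len(s) for s in gold]  # tags everything NN
print(sentence_ar_test(gold, tagger1, tagger2, ns=1000))
```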
Term extraction
• Shuffle extracted terms between the outputs of the two term extraction systems
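When each system outputs a set of terms rather than a label per instance, one way to shuffle is this: terms extracted by both systems stay where they are, and each term extracted by only one system moves to the other system's output with probability 0.5. This scheme is an assumption, not necessarily what the script does. The statistic below is the F-score against a gold term set; all names and data are hypothetical.

```python
import random


def set_f_score(gold, extracted):
    """F1 of an extracted term set against a gold term set."""
    tp = len(gold & extracted)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(extracted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def term_ar_test(gold, terms1, terms2, ns=10_000, seed=0):
    """Approximate randomization over the extracted terms of two systems."""
    # Assumption: statistic = |F(system 1) - F(system 2)|; shared terms never move.
    rng = random.Random(seed)
    actual = abs(set_f_score(gold, terms1) - set_f_score(gold, terms2))
    shared = terms1 & terms2        # found by both systems
    only = sorted(terms1 ^ terms2)  # found by exactly one system (sorted: reproducible)
    nge = 0
    for _ in range(ns):
        s1, s2 = set(shared), set(shared)
        for term in only:
            in_first = term in terms1
            if rng.random() < 0.5:  # move the term to the other system
                in_first = not in_first
            (s1 if in_first else s2).add(term)
        pseudo = abs(set_f_score(gold, s1) - set_f_score(gold, s2))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (ns + 1)


# Hypothetical toy data.
gold = {f"term{i}" for i in range(30)}
system1 = set(gold)                                             # finds every gold term
system2 = {f"term{i}" for i in range(10)} | {"noise1", "noise2"}
print(term_ar_test(gold, system1, system2, ns=1000))
```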
Script
• http://www.clips.ua.ac.be/~vincent/software.html#art
• http://www.clips.ua.ac.be/scripts/art
• Options:
• Exact and approximate randomization tests
• Instance-based, also for MBT
• Term extraction based
• Stratified shuffling
• Two-sided / one-sided (check the code!)
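Whether a test is one- or two-sided comes down to how each pseudo-statistic is compared with the actual one. A minimal sketch, assuming the statistic is the signed difference stat(system 1) − stat(system 2); the helper name and the made-up pseudo-differences are hypothetical and say nothing about how the script implements its option.

```python
def p_value(actual_diff, pseudo_diffs, two_sided=True):
    """(nge + 1) / (NS + 1) for a signed difference statistic.

    One-sided: count pseudo-differences at least as large as the actual one
    (is system 1 better?). Two-sided: compare absolute values
    (do the systems differ in either direction?).
    """
    if two_sided:
        nge = sum(abs(d) >= abs(actual_diff) for d in pseudo_diffs)
    else:
        nge = sum(d >= actual_diff for d in pseudo_diffs)
    return (nge + 1) / (len(pseudo_diffs) + 1)


# Made-up pseudo-differences from NS = 9 shuffles.
pseudo = [-0.04, -0.01, 0.0, 0.02, 0.03, -0.05, 0.01, -0.02, 0.04]
print(p_value(0.03, pseudo, two_sided=False))  # 0.3
print(p_value(0.03, pseudo, two_sided=True))   # 0.5
```

For the same data the two-sided p is at least as large as the one-sided one, so it matters which variant a reported p-value comes from.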
Remarks on usage
• It makes no sense to shuffle if the exact randomization can be computed
• The value of p depends on NS: the larger NS, the lower p can be, since P = (nge + 1) / (NS + 1) is never below 1 / (NS + 1)
• Validity check
• Sign-test
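Two of the remarks above reduce to simple arithmetic. With nge = 0 the approximate P bottoms out at 1 / (NS + 1), so NS bounds the smallest reportable p. Whether exact randomization is affordable depends on how many assignments there are: n! when n labels are permuted (the glass example), or 2^n when two systems' outputs are swapped per instance. A minimal sketch; the helper names and the rule "enumerate exactly when that costs no more than NS shuffles" are assumptions for illustration.

```python
from fractions import Fraction
from math import factorial


def smallest_p(ns):
    """Lowest P reachable with NS shuffles: nge = 0 gives 1 / (NS + 1)."""
    return Fraction(1, ns + 1)


def exact_is_feasible(n_units, ns, paired=True):
    """Hypothetical rule of thumb: enumerate exactly if it costs no more than NS shuffles.

    paired=True: two systems, swap per unit -> 2**n assignments.
    paired=False: permute n labels (glass example) -> n! assignments.
    """
    n_assignments = 2 ** n_units if paired else factorial(n_units)
    return n_assignments <= ns


for ns in (99, 999, 9999):
    print(ns, smallest_p(ns))  # 1/100, 1/1000, 1/10000
print(exact_is_feasible(4, 10_000, paired=False))  # 4! = 24 -> True
print(exact_is_feasible(20, 10_000))               # 2**20 = 1048576 -> False
```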