- By **ashby**

### Approximate Randomization tests

February 5th, 2013

Why AR testing?

- Classic tests often assume a given distribution (Student's t, normal, …) of the variable
- This is ≈ OK for recall, but not for precision or F-score
- The set of hypotheses that can be tested with non-parametric tests is limited

Illustration

- 30,000 runs, 1000 instances, 500 of class A
- True positives (TP): 400 (stdev: 80)
- False positives (FP): 60 (stdev: 15)
- Assumption: true and false positives for class A are normally distributed. This is already an approximation, since TP and FP are restricted by 0 and the number of instances.

Definitions

- Recall = correctly predicted A / A in reference = correctly predicted A / constant. If the number of correctly predicted A (TP) is normal, recall is normal.
- Precision = correctly predicted A / A in system = TP / (TP + FP). This is a non-linear combination of TP and FP, so precision is not normal.
- F-score: a non-linear combination of recall and precision, so it is not normal either.
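The claim can be checked with a small simulation. The sketch below uses the numbers from the illustration slide (30,000 runs, 500 instances of class A, TP ~ N(400, 80), FP ~ N(60, 15)) and compares the skewness of recall and precision. The function names and the choice of skewness as a normality indicator are assumptions made for this example, and TP and FP are deliberately left unbounded so that the normality assumption holds exactly.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: close to 0 for a symmetric (e.g. normal) distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(runs=30_000, n_class_a=500, seed=0):
    """Draw TP and FP from normal distributions and derive recall and precision."""
    rng = random.Random(seed)
    recalls, precisions = [], []
    for _ in range(runs):
        tp = rng.gauss(400, 80)
        fp = rng.gauss(60, 15)
        if tp + fp <= 0:  # guard against a degenerate draw
            continue
        recalls.append(tp / n_class_a)      # TP divided by a constant
        precisions.append(tp / (tp + fp))   # non-linear in TP and FP
    return skewness(recalls), skewness(precisions)


recall_skew, precision_skew = simulate()
print(f"skewness of recall:    {recall_skew:+.3f}")
print(f"skewness of precision: {precision_skew:+.3f}")
```

Recall stays symmetric because it is TP divided by a constant, while precision comes out clearly left-skewed.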

Approximate randomization test

- No assumption on distribution
- Can handle complicated statistics
- Only assumption: independence between shuffled elements
- References:
  - Noreen, 1989. Computer Intensive Methods for Testing Hypotheses.
  - Yeh, 2000. More accurate tests for the statistical significance of result differences.

Basic idea

- Exact randomization test

Exact probability

H0: expert is independent of contents

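With 4 glasses there are 4! = 24 possible assignments, and 7 of them have at least 2 correct:

P(ncorrect ≥ 2) = 7/24 ≈ 0.29

Thus, do not reject H0, because the probability is larger than alpha = 0.05.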
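The exact probability can be reproduced by enumerating every permutation. The sketch below assumes four glasses with distinct contents (which is what 24 = 4! permutations suggests) and an expert who identifies two of them correctly; the labels `a` to `d` and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def exact_p_value(contents, expert):
    """Fraction of all permutations of the expert's labels that score at least
    as many correct answers as the expert actually did."""
    observed = sum(c == e for c, e in zip(contents, expert))
    hits = total = 0
    for shuffled in permutations(expert):
        total += 1
        if sum(c == s for c, s in zip(contents, shuffled)) >= observed:
            hits += 1
    return Fraction(hits, total)


contents = ["a", "b", "c", "d"]   # hypothetical true contents of the glasses
expert = ["a", "b", "d", "c"]     # expert gets 2 of 4 right
p = exact_p_value(contents, expert)
print(p, round(float(p), 2))      # 7/24 0.29
```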

Approximate probability

- The number of permutations is n!, so it increases quickly with n
- If there are too many permutations to compute, use the approximation P = (nge + 1) / (NS + 1)
  - nge: number of times the pseudostatistic ≥ the actual statistic
  - NS: number of shuffles
  - +1: correction for validity
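A minimal sketch of the approximation, applied to a labeling problem where an expert's answers are compared with the true contents. The data and the function name are assumptions for illustration; the statistic is the number of correct answers, and `nge` and `n_shuffles` correspond to nge and NS above.

```python
import random


def approximate_p_value(contents, expert, n_shuffles=10_000, seed=0):
    """Estimate P = (nge + 1) / (NS + 1) from random shuffles of the expert's labels."""
    rng = random.Random(seed)
    actual = sum(c == e for c, e in zip(contents, expert))
    shuffled = list(expert)
    nge = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # one random permutation of the labels
        pseudo = sum(c == s for c, s in zip(contents, shuffled))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (n_shuffles + 1)


# Hypothetical data: four items, two of which are labeled correctly.
p = approximate_p_value(list("abcd"), list("abdc"))
print(round(p, 2))
```

With only four items the exact test would of course be preferable; the small input is used here so that the estimate can be compared with the exact value 7/24.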

Translation to instances

- Each glass is an instance
- Contents and expert are two labeling systems
- Contents has an accuracy of 100%, expert has an accuracy of 50%
- The statistic is precision, F-score, recall, … instead of accuracy

Stratified shuffling

- For labeled instances, it makes no sense to shuffle the class label of one instance to another instance
- Only shuffle labels per instance, i.e. exchange the labels that the two systems assigned to the same instance
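Stratified shuffling can be sketched as follows: for every instance, the labels predicted by the two systems are swapped with probability 0.5, and the statistic is recomputed on the two pseudo-outputs. The function names, the toy data and the choice of a two-sided test on the F-score of label A are assumptions for this example, not a description of the script mentioned later.

```python
import random


def f_score(gold, predicted, label="A"):
    """F1 for one label; 0.0 when there are no true positives."""
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_art(gold, sys1, sys2, statistic=f_score, n_shuffles=1000, seed=0):
    """Two-sided approximate randomization test with per-instance swapping."""
    rng = random.Random(seed)
    actual = abs(statistic(gold, sys1) - statistic(gold, sys2))
    nge = 0
    for _ in range(n_shuffles):
        pseudo1, pseudo2 = [], []
        for a, b in zip(sys1, sys2):
            if rng.random() < 0.5:  # swap the two labels of this instance
                a, b = b, a
            pseudo1.append(a)
            pseudo2.append(b)
        if abs(statistic(gold, pseudo1) - statistic(gold, pseudo2)) >= actual:
            nge += 1
    return (nge + 1) / (n_shuffles + 1)


gold = ["A", "B"] * 100
perfect = list(gold)        # hypothetical system that is always right
always_b = ["B"] * 200      # hypothetical system that never predicts A
p_different = stratified_art(gold, perfect, always_b)
p_identical = stratified_art(gold, perfect, perfect)
print(p_different, p_identical)
```

Any other statistic with the same signature (precision, recall, accuracy) can be passed as `statistic`.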

MBT

- Assumption of independence between instances
- Tokens within a sentence are not independent, so shuffle per sentence rather than per token
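For sequence labeling, the swap is then applied to whole sentences. A minimal sketch of one such shuffle, with hypothetical tag sequences and a hypothetical function name:

```python
import random


def shuffle_per_sentence(sys1_sentences, sys2_sentences, rng):
    """Swap the outputs of two taggers sentence by sentence, never token by token."""
    pseudo1, pseudo2 = [], []
    for sent1, sent2 in zip(sys1_sentences, sys2_sentences):
        if rng.random() < 0.5:  # exchange the complete tag sequences
            sent1, sent2 = sent2, sent1
        pseudo1.append(sent1)
        pseudo2.append(sent2)
    return pseudo1, pseudo2


# Hypothetical tagger outputs for two sentences.
sys1 = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
sys2 = [["DET", "VERB", "VERB"], ["PRON", "NOUN"]]
pseudo1, pseudo2 = shuffle_per_sentence(sys1, sys2, random.Random(0))
print(pseudo1)
print(pseudo2)
```

Each pseudo-output still contains only complete sentences as produced by one of the two systems, so the dependence between tokens inside a sentence is preserved.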

Term extraction

- Shuffling extracted terms between the outputs of two term extraction systems
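One possible reading of this, sketched below under stated assumptions: every candidate term is treated as an instance that each system either extracted or did not, so swapping only affects terms found by exactly one system. The function names, the gold term set and the use of precision as the statistic are all hypothetical; the slide does not specify these details.

```python
import random


def precision(extracted, gold_terms):
    """Share of extracted terms that are in the gold set (0.0 for empty output)."""
    return len(extracted & gold_terms) / len(extracted) if extracted else 0.0


def term_art(gold_terms, terms1, terms2, n_shuffles=1000, seed=0):
    """Two-sided approximate randomization test on the precision difference."""
    rng = random.Random(seed)
    actual = abs(precision(terms1, gold_terms) - precision(terms2, gold_terms))
    shared = terms1 & terms2               # unaffected by swapping
    disputed = sorted(terms1 ^ terms2)     # found by exactly one system
    nge = 0
    for _ in range(n_shuffles):
        pseudo1, pseudo2 = set(shared), set(shared)
        for term in disputed:
            in_first = term in terms1
            if rng.random() < 0.5:         # hand the term to the other system
                in_first = not in_first
            (pseudo1 if in_first else pseudo2).add(term)
        pseudo = abs(precision(pseudo1, gold_terms) - precision(pseudo2, gold_terms))
        if pseudo >= actual:
            nge += 1
    return (nge + 1) / (n_shuffles + 1)


gold_terms = {f"t{i}" for i in range(20)}
good = set(gold_terms)                     # hypothetical system: all terms correct
bad = {f"x{i}" for i in range(20)}         # hypothetical system: no term correct
p_terms = term_art(gold_terms, good, bad)
print(p_terms)
```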

Script

- http://www.clips.ua.ac.be/~vincent/software.html#art
- http://www.clips.ua.ac.be/scripts/art
- Options:
  - Exact and approximate randomization tests
  - Instance-based, also for MBT
  - Term-extraction-based
  - Stratified shuffling
  - Two-sided / one-sided (check code!)

Remarks on usage

- It makes no sense to shuffle if the exact randomization can be computed
- The value of p depends on NS: the larger NS, the lower p can be, since the smallest attainable value is 1 / (NS + 1)
- Validity check
  - Sign test
  - Re-test: to alleviate bad randomization

Sign test

- Can be compared with P for accuracy
- H0: correctness is independent of system, i.e. P(green) = 0.5
- Binomial test
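A minimal sketch of such a sign test, assuming the usual formulation: instances on which both systems are right or both are wrong are discarded, and under H0 each remaining instance favours either system with probability 0.5. The function name and the toy data are hypothetical.

```python
from math import comb


def sign_test_p_value(gold, sys1, sys2):
    """Two-sided binomial sign test on the instances where exactly one system is correct."""
    only1 = sum(a == g and b != g for g, a, b in zip(gold, sys1, sys2))
    only2 = sum(b == g and a != g for g, a, b in zip(gold, sys1, sys2))
    n = only1 + only2
    if n == 0:          # the systems never disagree in correctness
        return 1.0
    k = min(only1, only2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k) under p = 0.5
    return min(1.0, 2 * tail)


gold = ["A"] * 10
sys1 = ["A"] * 10   # hypothetical system: always correct
sys2 = ["B"] * 10   # hypothetical system: always wrong
p_sign = sign_test_p_value(gold, sys1, sys2)
print(p_sign)       # 2 / 2**10 = 0.001953125
```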

Interpretation (1)

- How much do these two systems differ based on precision for the A label?
- Maximally
- Intermediate
- Minimally

Conclusion

- Approximate randomization testing can be used for many applications.
- The basic idea is to determine how (im)probable the actual difference between two systems is when all possible permutations of the outputs are evaluated.
- The difference can be computed in many ways, as long as the shuffled elements are independent.
