
A Statistical Analysis of the Precision-Recall Graph

Ralf Herbrich, Hugo Zaragoza, Simon Hill. Microsoft Research, Cambridge University, UK.


Presentation Transcript


  1. A Statistical Analysis of the Precision-Recall Graph Ralf Herbrich, Hugo Zaragoza, Simon Hill. Microsoft Research, Cambridge University, UK.

  2. Overview • 2-class ranking • Average-Precision • From points to curves • Generalisation bound • Discussion

  3. “Search” cost-functions • Maximise the number of relevant documents found in the top 10. • Maximise the number of relevant documents at the top (e.g. weight inversely proportional to rank). • Minimise the number of documents seen by the user until they are satisfied.
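These three cost-functions can be sketched as follows; the exact weightings (e.g. the inverse-rank weights in `weighted_gain`) are illustrative assumptions, not the slide's definitions:

```python
# Three "search" cost-functions over a ranked list of 0/1 relevance labels.

def precision_at_10(ranked_labels):
    """Fraction of relevant documents among the top 10."""
    top = ranked_labels[:10]
    return sum(top) / len(top)

def weighted_gain(ranked_labels):
    """Relevance weighted inversely proportional to rank (an assumed weighting)."""
    return sum(y / rank for rank, y in enumerate(ranked_labels, start=1))

def search_length(ranked_labels):
    """Number of documents the user sees until the first relevant one."""
    for rank, y in enumerate(ranked_labels, start=1):
        if y == 1:
            return rank
    return len(ranked_labels)  # user is never satisfied

labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_10(labels))  # 0.4
print(search_length(labels))    # 1
```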

  4. Motivation • Why should … work for document categorisation? • Why should any algorithm obtain good generalisation average-precision? • How can we devise algorithms that optimise rank-dependent loss-functions?

  5. 2-class ranking problem • Spaces X, Y with Y = {0,1} • Mapping: f : X → ℝ • Relevancy: P(y=1|x) ≈ P(y=1|f(x))

  6. Collection samples • A collection is a sample: z = ((x1,y1), …, (xm,ym)) ∈ (X × {0,1})^m • where: • y = 1 if the document x is relevant to a particular topic, • z is drawn from the (unknown) distribution πXY • let k denote the number of positive examples

  7. Ranking the collection • We are given a scoring function f : X → ℝ • This function imposes an order on the collection: • (x(1), …, x(m)) such that: f(x(1)) > … > f(x(m)) • Hits (i1, …, ik) are the indices of the positive y(j). • Example: ranked labels y(1), …, y(10) = 1 1 0 1 0 0 1 0 0 0, giving hits ij = 1, 2, 4, 7
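Slide 7's construction, as a minimal sketch (the scoring function and toy collection are made up; only the hit indices reproduce the slide's example):

```python
# Order a collection by a scoring function f and read off the hit
# indices, i.e. the 1-based ranks of the relevant documents.

def rank_and_hits(xs, ys, f):
    order = sorted(range(len(xs)), key=lambda i: f(xs[i]), reverse=True)
    ranked_ys = [ys[i] for i in order]
    hits = [j + 1 for j, y in enumerate(ranked_ys) if y == 1]
    return ranked_ys, hits

xs = [0.9, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
ys = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
ranked, hits = rank_and_hits(xs, ys, f=lambda x: x)
print(hits)  # [1, 2, 4, 7] — matches the slide's example
```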

  8. Classification setting • If we threshold the function f at t, we obtain a classification. • Recall: |{i : f(x(i)) ≥ t, y(i) = 1}| / k • Precision: |{i : f(x(i)) ≥ t, y(i) = 1}| / |{i : f(x(i)) ≥ t}|
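Thresholding in code, assuming the standard precision and recall definitions (a sketch on toy data, not the paper's notation):

```python
# Threshold the scores at t: everything with f(x) >= t is "retrieved",
# then precision and recall follow from the retrieved labels.

def precision_recall_at(scores, labels, t):
    retrieved = [y for s, y in zip(scores, labels) if s >= t]
    k = sum(labels)                     # total number of relevant documents
    tp = sum(retrieved)                 # relevant documents retrieved
    precision = tp / len(retrieved) if retrieved else 1.0
    recall = tp / k if k else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_recall_at(scores, labels, t=0.5))  # (0.75, 0.75)
```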

  9. Precision vs. PGC [figure: PGC and precision curves compared]

  10. The Precision-Recall Graph • After reordering by f(x(i)): [figure: precision-recall graph over the ranked list]

  11. Graph Summarisations • Break-even point [figure: precision-recall graph, Precision (y-axis) vs. Recall (x-axis), both from 0 to 1]
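The graph and its break-even summarisation can be traced as follows (a sketch; reading the curve at the hit positions is one common convention, assumed here):

```python
# Trace the precision-recall graph at each hit position and locate the
# break-even point, where precision equals recall.

def pr_curve(ranked_labels):
    k = sum(ranked_labels)
    points, tp = [], 0
    for rank, y in enumerate(ranked_labels, start=1):
        tp += y
        if y == 1:
            points.append((tp / k, tp / rank))  # (recall, precision)
    return points

def break_even(points):
    """First point on the curve with precision == recall, if any."""
    for recall, precision in points:
        if abs(precision - recall) < 1e-12:
            return recall
    return None

labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(break_even(pr_curve(labels)))  # 0.75
```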

  12. Precision-Recall Example

  13. Overfitting? [figure: average precision on the test set vs. average precision on the train set]

  14. Overview • 2-class ranking • Average-Precision • From points to curves • Generalisation bound • Discussion

  15. From point to curve bounds • There exist SVM margin-bounds [Joachims 2000] for precision and recall. • They only apply to a single (a priori unknown) point of the curve! [figure: precision-recall curve]

  16. Max-Min precision-recall

  17. Max-Min precision-recall (2)

  18. Features of Ranking Learning • We cannot take differences of ranks. • We cannot ignore the order of ranks. • Point-wise loss functions do not capture the ranking performance! • ROC or precision-recall curves do capture the ranking performance. • We need generalisation error bounds for ROC and precision-recall curves.

  19. Generalisation and Avg. Prec. • How far can the observed Avg. Prec. A(f,z) be from the expected average A(f)? • How far apart can train and test Avg. Prec. be?
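A(f,z) on slide 19 is the observed average precision; assuming the standard definition over the hit positions i1 < … < ik (which we take to match the paper's), A(f,z) = (1/k) Σj j/ij. A minimal sketch:

```python
# Average precision from the hit indices: at the j-th hit, precision is
# j / i_j; average over the k hits.

def average_precision(hits, k):
    return sum(j / i for j, i in enumerate(hits, start=1)) / k

# The slide-7 example, hits at ranks 1, 2, 4, 7 with k = 4:
print(average_precision([1, 2, 4, 7], k=4))  # (1/1 + 2/2 + 3/4 + 4/7) / 4
```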

  20. Approach • McDiarmid’s inequality: for any function g : Z^n → ℝ with stability c, for all probability measures P, with probability at least 1 − δ over the IID draw of Z: g(Z) − E[g(Z)] ≤ c √((n/2) ln(1/δ))
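For a numeric feel, the standard one-sided McDiarmid deviation ε = c·√((n/2)·ln(1/δ)) (the slide's formula image is not in the transcript, so this rearrangement is an assumption) can be evaluated directly:

```python
# Deviation eps such that g(Z) - E[g(Z)] <= eps with probability 1 - delta,
# for a function with bounded differences (stability) c on n coordinates.

import math

def mcdiarmid_eps(c, n, delta):
    return c * math.sqrt(n / 2 * math.log(1 / delta))

# Stability c = 1/n gives the familiar O(1/sqrt(n)) rate:
for n in (100, 1000, 10000):
    print(n, mcdiarmid_eps(1 / n, n, delta=0.05))
```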

  21. Approach (cont.) • Set n = 2m and call the two m-halves Z1 and Z2. Define gi(Z) := A(f, Zi). Then, by the IID assumption, E[g1(Z)] = E[g2(Z)], so McDiarmid’s inequality can be applied to the difference g1(Z) − g2(Z).

  22. Bounding A(f,z) − A(f,zi) • How much does A(f,z) change if we alter one sample (xi, yi)? • We need to fix the number of positive examples in order to answer this question! • E.g. if k = 1, the change can be from 0 to 1.

  23. Stability Analysis • Case 1: yi=0 • Case 2: yi=1
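The stability question of slides 22-23 can also be probed empirically: flip one label and measure how far the average precision moves. A sketch on the toy example (the helper and data are our illustration, not the paper's analysis):

```python
# Flip each label in turn and record the change in average precision.

def avg_prec(labels):
    k = sum(labels)
    tp, total = 0, 0.0
    for rank, y in enumerate(labels, start=1):
        tp += y
        if y:
            total += tp / rank
    return total / k if k else 0.0

labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
base = avg_prec(labels)
deltas = []
for i in range(len(labels)):
    flipped = list(labels)
    flipped[i] ^= 1
    deltas.append(avg_prec(flipped) - base)
print(max(abs(d) for d in deltas))

# With k = 1 the change can span the whole range, as slide 22 notes:
print(avg_prec([1] + [0] * 9), avg_prec([0] * 9 + [1]))  # 1.0 0.1
```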

  24. Main Result Theorem: For all probability measures, for all f : X → ℝ, for all α ∈ (0,1), with probability at least 1 − δ over the IID draw of a training and a test sample both of size m, if both the training sample z and the test sample z̃ contain at least αm positive examples, then:

  25. Positive results • First bound showing that training and test set performance (in terms of average precision) asymptotically converge! • The effective sample size is only the number of positive examples. • The proof can be generalised to arbitrary test sample sizes. • The constants can be improved.

  26. Open questions • How can we let k change, so as to investigate: • What algorithms could be used to directly maximise A(f,z) ?

  27. Conclusions • Many problems require ranking objects to some degree. • Ranking learning requires considering non-point-wise loss functions. • In order to study the complexity of algorithms we need large deviation inequalities for ranking performance measures.
