
Evaluating Different Methods of Estimating Retrieval Quality for Resource Selection

Presentation Transcript


  1. Evaluating Different Methods of Estimating Retrieval Quality for Resource Selection. H. Nottelmann, N. Fuhr. Presented by Tao Tao, February 12, 2004

  2. Distributed IR [Diagram: a query is dispatched to Engine 1–Engine 4, whose result lists are merged by a fusion step; resource selection decides which engines receive the query.]

  3. Resource selection • Heuristic approaches (GlOSS, CORI, others): good performance, but poor theoretical foundation • Decision-theoretic framework (DTF): solid foundation, and good performance as well

  4. Costs for engine i • Every factor is modelled as a type of cost • Effectiveness: C+ for viewing a relevant doc, C- for a non-relevant one • Time: Ct • Money: Cm • others
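
A rough sketch of how these cost factors can be combined (my reading of the DTF cost model; the paper's exact weighting may differ): the expected cost of retrieving the top si documents from engine i adds the effectiveness cost of the expected relevant and non-relevant documents to the time and money terms,

    EC_i(s_i, q) = E[r_i(s_i,q)] \cdot C^{+} + \bigl(s_i - E[r_i(s_i,q)]\bigr) \cdot C^{-} + C^{t}_i(s_i) + C^{m}_i(s_i)

where ri(si,q) denotes the number of relevant documents among the top si results of engine i.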

  5. To minimize the total cost! HOW?
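
As a minimal illustration of one way the minimization could be carried out (my own sketch with hypothetical names, not the authors' algorithm): once every engine's expected-cost curve ECi(si,q) has been estimated, choose how many documents si to take from each engine so that the si sum to the requested total n and the summed expected cost is minimal, e.g. with a simple dynamic program.

    def min_cost_allocation(cost_curves, n):
        """Pick s_i documents per engine (sum = n) minimising total expected cost.

        cost_curves[i][s] is the estimated expected cost EC_i(s, q) of taking the
        top s documents from engine i, for s = 0..n.  Simple O(m*n^2) dynamic
        program; hypothetical helper, not the paper's algorithm.
        """
        m = len(cost_curves)
        INF = float("inf")
        # best[i][k]: minimal cost of taking k documents from the first i engines
        best = [[INF] * (n + 1) for _ in range(m + 1)]
        choice = [[0] * (n + 1) for _ in range(m + 1)]
        best[0][0] = 0.0
        for i in range(1, m + 1):
            for k in range(n + 1):
                for s in range(k + 1):          # s documents from engine i-1
                    prev = best[i - 1][k - s]
                    if prev == INF:
                        continue
                    c = prev + cost_curves[i - 1][s]
                    if c < best[i][k]:
                        best[i][k] = c
                        choice[i][k] = s
        alloc, k = [0] * m, n                   # backtrack the chosen s_i
        for i in range(m, 0, -1):
            alloc[i - 1] = choice[i][k]
            k -= alloc[i - 1]
        return alloc

For example, min_cost_allocation([curve1, curve2, curve3], n=10) returns one si per engine, which answers the "where to cut each ranked list" question on the next slide.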

  6. Problems to be addressed [Diagram: the ranked result list of server i (Rank 1, Rank 2, ..., Rank si, ..., Rank n) with its money, time and relevance costs Cim, Cit, Cirel; where should the list be cut?]

  7. Three methods to estimate ri(si,q) • DTF-rp • DTF-sample • DTF-normal

  8. DTF-rp • Assume a linearly decreasing precision-recall curve: Pi = Pi0 (1 - Ri) [Plot: precision P against recall R, starting at P0]
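
A hedged sketch of how this assumption turns into an estimate of retrieval quality (my own derivation from the slide; the paper may state it differently): at cut-off si, precision is P = ri/si and recall is R = ri/E(rel|q, DLi), so substituting into Pi = Pi0(1 - Ri) and solving for ri gives

    \frac{r_i}{s_i} = P_i^0 \left(1 - \frac{r_i}{E(\mathrm{rel}\mid q, DL_i)}\right)
    \quad\Rightarrow\quad
    r_i(s_i, q) = \frac{P_i^0 \, s_i \, E(\mathrm{rel}\mid q, DL_i)}{E(\mathrm{rel}\mid q, DL_i) + P_i^0 \, s_i}

which only requires the quantity E(rel|q, DLi) discussed on the next slide.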

  9. How can E(rel|q, DLi) be estimated?
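
One natural estimate (stated here as an assumption, since the slide only poses the question) sums the relevance probabilities of the individual documents in the library:

    E(\mathrm{rel}\mid q, DL_i) = \sum_{d \in DL_i} \Pr(\mathrm{rel}\mid q, d)

so the problem reduces to estimating Pr(rel|q,d) from whatever statistics the library makes available.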

  10. DTF-sample
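
The slide gives no detail, so the following is only my loose sketch of the sampling idea (hypothetical function, not the paper's estimator): run the query against a small sample of the library and scale the relevance probabilities of the ranked sample documents up to the full library size to approximate ri(si,q).

    def dtf_sample_estimate(sample_rel_probs, lib_size, s_i):
        """Loose sketch: estimate r_i(s_i, q) from a ranked sample of DL_i.

        sample_rel_probs: Pr(rel|q,d) for the sample documents, sorted by the
        query's ranking; each sample document stands for
        lib_size/len(sample_rel_probs) documents of the full library.
        """
        scale = lib_size / len(sample_rel_probs)
        expected_rel, covered = 0.0, 0.0
        for p in sample_rel_probs:              # walk down the sampled ranking
            take = min(scale, s_i - covered)    # documents this sample doc stands for
            if take <= 0:
                break
            expected_rel += p * take
            covered += take
        return expected_rel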

  11. DTF-normal • Four steps • Modeling the distribution of Pr(t←d) • Computing Pr(q←d) • Deriving Pr(rel|q,d) • Estimating r(si,q)
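
A hedged sketch of how the four steps fit together (assuming a linear retrieval function and independence across query terms, which are my simplifications): if the score Pr(q←d) is a weighted sum of the per-term weights Pr(t←d), and each Pr(t←d) is modelled as normally distributed over the library, then the score itself is approximately normal,

    \Pr(q\leftarrow d) = \sum_{t \in q} a_t \Pr(t\leftarrow d)
    \;\approx\; \mathcal{N}\!\left(\sum_{t} a_t \mu_t,\; \sum_{t} a_t^2 \sigma_t^2\right)

Pr(rel|q,d) is then obtained by mapping this score through a relevance function, and r(si,q) by summing the resulting probabilities over the top-ranked part of the score distribution.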

  12. Modeling the distribution of Pr(t←d)
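
As a loose code illustration of this step (my own sketch; the paper estimates these parameters from the library's term statistics rather than from raw weight lists): fit a per-term mean and variance of the indexing weights and combine them into the normal score distribution sketched above.

    import math

    def fit_term_normal(weights):
        """Fit mean and variance of the indexing weights Pr(t<-d) for one term."""
        n = len(weights)
        mu = sum(weights) / n
        var = sum((w - mu) ** 2 for w in weights) / n
        return mu, var

    def query_score_distribution(query_terms, term_weights, query_weights):
        """Approximate Pr(q<-d) by a normal; terms assumed independent."""
        mu_q, var_q = 0.0, 0.0
        for t in query_terms:
            mu, var = fit_term_normal(term_weights[t])
            a = query_weights[t]
            mu_q += a * mu
            var_q += a * a * var
        return mu_q, math.sqrt(var_q)           # mean and standard deviation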

  13. Experiments • DTF-rp • DTF-sample • DTF-normal • CORI-all: both resource selection and fusion done by CORI • CORI-rs: resource selection done by CORI, but fusion by DTF

  14. Optimum experiments

  15. Fixed number of selected DLs • DTF-rp and DTF-normal are very close • They perform similarly to CORI-all on mid-length queries, but worse on short and long queries

  16. Sensitivity to query length • Short queries: stable • Mid-length queries: good only when learning from mid-length queries • Long queries: learning from short queries performs worst, learning from mid-length queries performs best

  17. Additional cost • DTF-rp and DTF-normal perform similarly • Both are better than DTF-sample and the two CORI variants

  18. Conclusions • Good theoretical foundation • Dynamically selects the number of libraries and the number of documents per library • Performs competitively with the CORI variants • Can incorporate other types of cost

  19. Problems? • The authors claim that the redundancy problem can be solved in the same way • But I think it CANNOT. Why? • The estimation does not work unless documents are selected from contiguous rank positions
