
The study of information retrieval – a long view


Presentation Transcript


1. Stephen Robertson
Microsoft Research Cambridge and City University
ser@microsoft.com
The study of information retrieval – a long view
IIiX, London

2. A half-century of lab experiments
Cranfield began in 1958
• some precursor experiments, but we can treat that as the start of the experimental tradition in IR
A brief timeline:
• 1960s and 70s: various experiments, mostly with purpose-built test collections
• late 60s on: exchange of test collections among researchers
• mid-to-late 70s: the ‘ideal’ test collection project
• 1981: The Book (Information Retrieval Experiment, edited by Karen Spärck Jones)
• 1980s: relatively fallow period
• 1990s to date: TREC
• late 90s on: TREC spin-offs (CLEF, NTCIR, INEX etc.)
• (and of course, late 90s on: web search engines)

3. Some highlights (a personal selection)
• Cranfield 1 and 2
• SMART: the vector space model
• Medlars: indexing and searching
• KSJ: term weighting; test collections
• Keen: index languages
• Belkin and Oddy: ASK and user models
• Okapi: simple search and feedback
• UMass: various experimental systems
• TREC: ad hoc; feedback; the web; interaction
• CLEF, NTCIR, INEX, DUC etc.
[S. Robertson, On the history of evaluation in IR, Journal of Information Science 34(4), 439–456, 2008]

4. A half-century of lab experiments
Recapitulation of the outcome (a gross over-simplification!):
• don’t worry too much about the NLP
• ... or the semantics
• ... or the knowledge engineering
• ... or the interaction issues
• ... or the user’s cognitive processes
• but pay attention to the statistics
• ... and to the ranking algorithms
• bag-of-words rules OK (a sketch follows below)
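To make that last point concrete, here is a minimal sketch of bag-of-words ranking with a BM25-style term weight, one of the weighting schemes that grew out of this experimental tradition. The toy corpus, the query, and the parameter values are invented for illustration; they are not from the talk.

    import math
    from collections import Counter

    # Toy corpus (invented): each document becomes a "bag of words",
    # i.e. term counts with word order thrown away.
    docs = [
        "the cranfield tests measured index language performance",
        "term weighting improves ranked retrieval effectiveness",
        "users interact with retrieval systems in context",
    ]
    bags = [Counter(d.split()) for d in docs]
    N = len(bags)
    avgdl = sum(sum(bag.values()) for bag in bags) / N
    k1, b = 1.2, 0.75  # common BM25 defaults, not tuned

    def idf(term):
        # Robertson/Sparck Jones style inverse document frequency.
        n = sum(1 for bag in bags if term in bag)
        return math.log((N - n + 0.5) / (n + 0.5) + 1)

    def score(query, bag):
        # BM25-style score: pure term statistics, no NLP or semantics.
        dl = sum(bag.values())
        s = 0.0
        for t in query.split():
            tf = bag[t]  # Counter gives 0 for absent terms
            s += idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        return s

    query = "term weighting retrieval"
    for i in sorted(range(N), key=lambda i: score(query, bags[i]), reverse=True):
        print(f"{score(query, bags[i]):.3f}  {docs[i]}")

Note that nothing in the scoring function looks at word order, syntax, or meaning: the ranking rests entirely on term statistics, which is exactly the point of the slide.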

5. A half-century...
... that deserves considerable celebration
• but of course has a downside
So, let’s explore a little:
• why we do lab experiments in the first place
• what the alternatives are
• what they might or might not tell us
• what is good and bad about them
• which directions they lead us in
• more importantly, which they deflect us from
and maybe, finally,
• how they might be improved
Note: this is my personal take on these questions!

6. Abstraction
Lab experiments involve abstraction:
• choice of variables included/excluded
• control over variables
• restrictions on values/ranges of variables
[Note: models and theories also involve abstraction
• but usually different abstractions, for different reasons]
Why?
• First, to make the experiments possible

7. Abstraction
Why else?
• study simple cases
• clarify relationships
• reduce noise
• ensure repeatability
• validate abstract theories

8. Example: Newton’s laws

9. The scientific method (a simple-minded outline!)
Collect empirical data
• by observation and/or experiment
Formulate hypotheses/models/theories
Derive testable predictions
• about events which may be studied empirically
Conduct further observation/experiment
• designed to test the predictions
Refine/reject models/theories
• and reiterate

10. Observation versus experiment (a simple-minded outline again!)
The experimental approach is a very powerful one.
Given a simple choice, we would usually choose experiment over observation
• at least for hypothesis testing
... but the choice is rarely simple.

11. Traditional science
The traditional image of science involves experiments in laboratories
• but actually this is misleading
Some sciences thrive in the laboratory
• e.g. chemistry, small-scale physics
Others have made a transition
• e.g. the biochemical end of biology
Others still are almost completely resistant
• e.g. astrophysics, geology
(not to mention non-traditional sciences such as economics)

12. Limitations of abstraction
Abstractions involve assumptions:
• choosing one variable and eliminating another assumes that the two can be treated separately
• if an abstraction is built into an experiment, then its assumptions cannot be tested by the experiment
Even if we could do everything in a laboratory, we should not all do the same thing!
• that is, we should not all use the same abstractions based on the same assumptions

13. Limitations of abstraction
Some phenomena resist abstraction
• so that an abstract representation would be unrealistic or even illusory
This gives us the basic conflict
• between control and realism
Note: I have exaggerated the polarity between observation and experiment
• most investigations have elements of both
... but I have not exaggerated the conflict
• most investigations struggle seriously with it
• and have to make compromises

14. Research in IR
A conventional separation:
• laboratory experiments in the Cranfield/TREC tradition, usually on ranking algorithms
• observational experiments addressing user-oriented issues
Of course this is over-simplified:
• there are laboratory experiments addressing other issues
  • semantics, language, etc.
  • user interaction etc.
• as well as observational experiments on algorithms

15. Research in IR
The Cranfield/TREC tradition is richer than it is often given credit for
• TREC tracks and spin-offs have pushed the boundaries of lab experimentation, with some different outcomes
Some examples:
• QA: here NLP and some aspects of semantics/knowledge engineering are critical
• cross-lingual retrieval: here we need resources constructed from comparable corpora
• the web: here we are beginning to extract useful knowledge from usage data and resources such as Wikipedia
All of these are unconventional
• although all are dominated by statistical ideas

16. Research in IR
Communities involved in user-oriented issues have developed laboratory methods
• in interactive tasks within TREC-like projects
• in new forms of lab experiments
Some core IR algorithm work is moving into observational user experiments
• particularly in the web environment
• particularly using click (and other user behaviour) data

17. Observational IR research
Aspects that suggest an observational approach:
• interaction (human-system)
• collaboration (human-human)
• temporal scale
• user cognition
• context
  • task context
  • user knowledge

18. Observational IR research
Issues:
• scale
  • it is hard to expand the scale of an observational study
• reproducibility
  • it is hard to perform an observational study in such a way that it can be repeated by someone else
• control
  • it is hard to control the variables that might affect an experiment (either the independent variables of interest or the noise variables)

19. Observational IR research
Advantages:
• realism
  • we have more confidence that the results of an observational study represent some kind of reality
• context
  • those (perhaps unknown) aspects of context that have an effect can be assumed to be present
Maybe another significant difference...

20. Hypothesis testing
Back to the scientific method:
• we need to formulate predictions as testable hypotheses
Properly, any prediction of a model or theory is a candidate for this
• the objective is to test the model or theory
• not to achieve some practical result from it
• ideally, look for critical cases
  • where the predictions of the model in question differ from those of other models

21. IR models and theories
What are IR models designed to tell us?
Different kinds of models might be expected to explain/predict many observables
... but in the Cranfield/TREC tradition, we usually interpret them in a narrow way
• specifically, we look only for effects on effectiveness (see the sketch below)
This seems to be a limitation in our ways of thinking about them
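As an illustration of that narrow interpretation, here is a minimal sketch of one standard single-number effectiveness measure, average precision; the ranking and the relevance judgements below are invented for illustration.

    # Average precision over binary relevance judgements.
    def average_precision(ranking, relevant):
        # Mean of the precision values at the rank of each relevant doc.
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant) if relevant else 0.0

    ranking = ["d3", "d1", "d7", "d2", "d5"]  # system output, best first
    relevant = {"d1", "d2", "d9"}             # assessor judgements
    print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3 = 0.333...

Whatever else a model might predict is collapsed into a number like this one, which is exactly the limitation the slide points to.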

22. Hypothesis testing
At least some user-oriented studies in IR ask other questions
• and try to develop appropriate models/theories
• e.g. about user behaviour
Obviously we are interested in making systems better...
• but a model or theory may (should) tell us more than just how to achieve that aim
• and indeed other predictions may also be useful
Even statistical models could be interpreted more broadly

23. Other predictions (maybe accessible to statistical models)
• patterns of term occurrence
  • maybe simply not believable
• calibrated probabilities of relevance
  • hard to do but maybe useful
• clicks
  • probability of click
  • patterns of click behaviour
  • e.g. click trails
• other behaviours
  • abandonment
  • reformulation
  • dwell time

24. Probabilities of relevance
The usual assumption:
• we do not need actual probabilities, only rank order
  • the result of focussing on standard evaluation metrics
  • independence models are typically bad at giving calibrated probabilities
Cooper suggested that systems should give probabilities
• as a guide to the user
There are other practical reasons:
• filtering
• combination of evidence
A sketch of score calibration follows below.
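As one hedged illustration of what calibration could look like in practice (a technique chosen for the sketch, not a method from the talk): Platt-style scaling fits a sigmoid mapping from raw retrieval scores to probabilities of relevance on held-out judgements. The scores and labels below are invented.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def platt_fit(scores, labels, lr=0.1, epochs=5000):
        # Fit p = sigmoid(a * score + b) by gradient descent on log loss.
        a, b = 1.0, 0.0
        n = len(scores)
        for _ in range(epochs):
            grad_a = grad_b = 0.0
            for s, y in zip(scores, labels):
                err = sigmoid(a * s + b) - y
                grad_a += err * s / n
                grad_b += err / n
            a -= lr * grad_a
            b -= lr * grad_b
        return a, b

    scores = [2.1, 1.7, 1.3, 0.9, 0.4, 0.1]  # raw ranking scores (invented)
    labels = [1, 1, 0, 1, 0, 0]              # relevance judgements (invented)
    a, b = platt_fit(scores, labels)
    for s in scores:
        print(f"score {s:.1f} -> P(relevant) ~ {sigmoid(a * s + b):.2f}")

A calibrated output of this kind is what filtering thresholds and evidence combination need; rank order alone is not enough for those uses.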

25. Clicks
There is a new movement in statistical modelling for IR:
• we would like to integrate aspects of user behaviour into our models
• specifically clicks
Predicting patterns of click behaviour is a major component
• which gives us the impetus to investigate and test other kinds of hypothesis
We might use clicks to justify effectiveness metrics
• but such predictions may also be useful for other reasons
A sketch of a simple click model follows below.
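One simple member of the click-model family alluded to here is the position-based model (the examination hypothesis): P(click) = P(examined at rank) * P(click | examined). The sketch below simulates query impressions under invented parameters; real models estimate these parameters from logged click data.

    import random

    # Invented parameters: examination[r] is the chance a user looks at
    # rank r; attractiveness[d] is the chance document d draws a click
    # once examined. Real models estimate both from logged clicks.
    examination = [0.95, 0.75, 0.55, 0.40, 0.25]
    attractiveness = {"d1": 0.8, "d2": 0.5, "d3": 0.3, "d4": 0.6, "d5": 0.2}

    def simulate_clicks(ranking, rng):
        # One query impression under the position-based model:
        # P(click) = P(examined at rank) * P(click | examined).
        return [doc for rank, doc in enumerate(ranking)
                if rng.random() < examination[rank]
                and rng.random() < attractiveness[doc]]

    rng = random.Random(42)
    ranking = ["d1", "d2", "d3", "d4", "d5"]
    impressions = [simulate_clicks(ranking, rng) for _ in range(10000)]
    for rank, doc in enumerate(ranking):
        ctr = sum(doc in clicks for clicks in impressions) / len(impressions)
        expected = examination[rank] * attractiveness[doc]
        print(f"rank {rank + 1} {doc}: observed CTR {ctr:.3f}, model {expected:.3f}")

A model of this kind predicts observable behaviour (click-through rates by rank), so it can be tested against logs rather than only against relevance judgements.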

26. In general
It seems to me that we should be trying to move in this direction:
• constructing models or theories which are capable of making other kinds of predictions
• devising tests of these other predictions
  • laboratory tests
  • observational tests
… which would encourage rapprochement between the laboratory and observational traditions

27. Finally
I strongly believe in the science of search
• as a theoretical science
  • in which models and theories have a major role to play
• and as an empirical science
  • requiring the full range of empirical investigations
  • including, specifically, both laboratory experiments and observational studies
The lack of a strong unified theory of IR reinforces the need for good empirical work
