
An Examination of Different Delivery Modes for Interactive IR Studies


Presentation Transcript


  1. An Examination of Different Delivery Modes for Interactive IR Studies Diane Kelly School of Information and Library Science University of North Carolina Schloss Dagstuhl, IIR Seminar, March 03, 2009

  2. Different Types of IIR Studies • Standard Evaluation • Usability Study • Experiment • Requires manipulation of independent variable • Random assignment to condition • Lab and Field Experiments • Log-based Analysis • Information-Seeking (Online and Otherwise)

  3. Other types of IIR Studies • Infrastructure Development (NOT an IIR or “User” study) • “Users” made relevance assessments • “Users” label objects for training data

  4. Different Types of Online Studies • Web Experiment • Remote Usability Studies • Synchronous • Asynchronous • Surveys (Questionnaire Mode) • Correlation Designs • Often used to test psychometric properties of an instrument • Interviews and Focus Groups • Mechanical Turk and ESP

  5. Major Issues to Consider • Validity • Internal • External • Reliability • Sampling • Control • Sources of Variance

  6. “Some say that psychological science is based on research with rats, the mentally disturbed, and college students. We study rats because they can be controlled, the disturbed because they need help, and college students because they are available.” - Birnbaum, M. H. (1999). Testing critical properties of decision making on the Internet. Psychological Science, 10, 399-407, pg. 399.

  7. Some Good Things • Broader range of more diverse participants • Age • Education • Race • Culture • Geography • Sex • … • Targeted Recruitment • Large samples (increased statistical power) • Science becomes more accessible to more people
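The link between larger web samples and statistical power can be made concrete. A minimal sketch using only the Python standard library's `statistics.NormalDist` (the function name, effect size, and sample sizes are illustrative, not from the talk); it approximates the power of a two-sided, two-sample comparison with the normal approximation:

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for
    standardized effect size d (Cohen's d), via the normal
    approximation to the noncentral t distribution."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality parameter
    return 1 - z.cdf(z_crit - ncp)

# Same medium effect (d = 0.5), growing per-group n:
for n in (20, 80, 320):
    print(n, round(two_sample_power(0.5, n), 2))
```

Holding the effect size fixed, quadrupling the per-group sample takes the design from badly underpowered to near-certain detection, which is the usual argument for the large samples web delivery makes feasible.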

  8. More Good Things • Experimental situation is less artificial (although not completely) • Familiarity and comfort with physical situation • No travel time • No coordination • No navigation

  9. And Even More Good Things • Volunteer Bias (?) • Freedom to Quit • In general (condition-independent drop-out ) • As an indicator (condition-dependent drop-out) • Computation of refusal rates • Demand Effects • Experimenter Effects (includes biases introduced during execution of the experiment, data transformation, analysis and interpretation)
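Condition-dependent drop-out, as opposed to general attrition, can be checked with a 2x2 chi-square on started-versus-finished counts per condition. A hedged sketch (the function name and the counts are hypothetical, not from the talk; it assumes every expected cell count is nonzero):

```python
def dropout_chi_square(started, finished):
    """Chi-square statistic (df = 1 for two conditions) for
    drop-out by condition. `started` and `finished` map
    condition name -> participant counts."""
    conds = list(started)
    dropped = {c: started[c] - finished[c] for c in conds}
    total = sum(started.values())
    total_dropped = sum(dropped.values())
    chi2 = 0.0
    for c in conds:
        for observed, col_total in ((dropped[c], total_dropped),
                                    (finished[c], total - total_dropped)):
            expected = started[c] * col_total / total  # under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 100 start per condition, unequal completion.
stat = dropout_chi_square({"A": 100, "B": 100}, {"A": 90, "B": 70})
# With df = 1, a statistic above 3.84 suggests condition-dependent
# drop-out at alpha = .05 -- exactly the "indicator" use above.
```

Refusal rates fall out of the same bookkeeping: finished over started, per condition and overall.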

  10. And a Few More Good Things • Lower costs • Openness • Replication

  11. Some Bad Things • Control Issues (Cheating and Fraud) • Multiple submissions • Faking data • Collaborating with others • Imitation of treatments • Control Issues (Experimental Control) • Do subjects understand what they are supposed to be doing? • Multi-tasking • Interruptions • Consulting other sources • EWI
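One of the control issues above, multiple submissions, is commonly screened with a coarse fingerprint. A sketch of one such heuristic (the fingerprinting scheme, function names, and record fields are assumptions, not from the talk; shared IPs cause false positives and a determined participant can evade it):

```python
import hashlib

def submission_key(ip, user_agent):
    """Coarse fingerprint for repeat-submission screening."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()

def screen_duplicates(submissions):
    """Keep only the first submission per fingerprint.
    `submissions`: list of dicts with 'ip' and 'user_agent' keys
    (a hypothetical logging schema)."""
    seen, kept = set(), []
    for s in submissions:
        key = submission_key(s["ip"], s["user_agent"])
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```

Because the screen is heuristic, how many records it removed belongs in the study report alongside refusal and attrition rates.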

  12. More Bad Things • So, time is not such a great measure anymore (but maybe it isn’t really a good measure anyway) • Self-selection Bias (Topical interests) • No control over recruitment • Attrition (too high is not good) • Technical variance • Communication challenges with subjects • Difficult to explain deception

  13. And a Few More … • Experiment “Marketplace” • Encourages researcher laziness and carelessness? • Requires more knowledge of experimental design and measurement • Measurement checks • Decisions to eliminate data • Bad designs waste people’s time

  14. More Thoughts and Questions • Some of the “bad” things add random error to the model (which exists even in lab experiments) • But you gain more participants, so if this error stays constant in proportion to your sample size, is it really an issue? • Random assignment to condition is CRITICAL • Ultimately, do you get what you pay for?
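Random assignment to condition can be kept balanced online, where recruitment may stop early, by randomizing within shuffled blocks. A minimal sketch (the function name is hypothetical; the technique is standard block randomization, not a method described in the talk):

```python
import random

def block_randomize(n_subjects, conditions, rng=None):
    """Assign subjects to conditions in shuffled blocks so group
    sizes stay near-balanced even if the study stops early."""
    rng = rng or random.Random()
    order = []
    while len(order) < n_subjects:
        block = list(conditions)  # one full block per pass
        rng.shuffle(block)        # random order within the block
        order.extend(block)
    return order[:n_subjects]

# e.g. block_randomize(12, ["A", "B", "C"]) yields 4 of each condition.
```

Group sizes can never differ by more than one block, while the order within each block stays random.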

  15. And Finally … • Many studies have found that results obtained from lab and web experiments are similar • A Web mode does not excuse poor experimental design or instrumentation • What are the implications with respect to reporting practices?

  16. For Your Reference • Birnbaum, M. H. (2000). Psychological Experiments on the Internet. London, UK: Academic Press. • Alonso, O., Rose, D. E., & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42(2), 9-15. • http://psych.hanover.edu/research/exponnet.html
