Designing and Evaluating Search Interfaces Prof. Marti Hearst School of Information UC Berkeley
Outline • Why is Supporting Search Difficult? • What Works? • How to Evaluate?
Why is Supporting Search Difficult? • Everything is fair game • Abstractions are difficult to represent • The vocabulary disconnect • Users’ lack of understanding of the technology • Clutter vs. Information
Everything is Fair Game • The scope of what people search for is all of human knowledge and experience. • Other interfaces are more constrained (word processing, formulas, etc) • Interfaces must accommodate human differences in: • Knowledge / life experience • Cultural background and expectations • Reading / scanning ability and style • Methods of looking for things (pilers vs. filers)
Abstractions Are Hard to Represent • Text describes abstract concepts • Difficult to show the contents of text in a visual or compact manner • Exercise: • How would you show the preamble of the US Constitution visually? • How would you show the contents of Joyce’s Ulysses visually? How would you distinguish it from Homer’s The Odyssey or McCourt’s Angela’s Ashes? • The point: it is difficult to show text without using text
Vocabulary Disconnect • If you ask a set of people to describe a set of things there is little overlap in the results.
The Vocabulary Problem Data sets examined (and # of participants) • Main verbs used by typists to describe the kinds of edits that they do (48) • Commands for a hypothetical “message decoder” computer program (100) • First word used to describe 50 common objects (337) • Categories for 64 classified ads (30) • First keywords for each of a set of recipes (24) Furnas, Landauer, Gomez, Dumais: The Vocabulary Problem in Human-System Communication. Commun. ACM 30(11): 964-971 (1987)
The Vocabulary Problem These are really bad results • If one person assigns the name, the probability of it NOT matching with another person’s is about 80% • What if we pick the most commonly chosen words as the standard? Still not good: Furnas, Landauer, Gomez, Dumais: The Vocabulary Problem in Human-System Communication. Commun. ACM 30(11): 964-971 (1987)
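A minimal sketch of how the two figures on this slide could be estimated from naming data. The responses below are hypothetical and the function names are illustrative; none of this is taken from the Furnas et al. data sets.

```python
from collections import Counter

def pairwise_agreement(names):
    """Probability that two different people, drawn at random, chose the same name."""
    n = len(names)
    if n < 2:
        raise ValueError("need at least two responses")
    counts = Counter(names)
    # Ordered pairs of distinct people who gave the same name.
    matching_pairs = sum(c * (c - 1) for c in counts.values())
    return matching_pairs / (n * (n - 1))

def best_single_name_coverage(names):
    """Fraction of people whose choice matches the single most popular name."""
    counts = Counter(names)
    return counts.most_common(1)[0][1] / len(names)

# Hypothetical responses: ten people naming the same editing operation.
responses = ["delete", "remove", "erase", "delete", "cut",
             "omit", "kill", "delete", "remove", "drop"]

print(f"pairwise agreement: {pairwise_agreement(responses):.2f}")
print(f"coverage of most popular name: {best_single_name_coverage(responses):.2f}")
```

Even in this invented sample, standardizing on the most popular word still misses most people, which is the slide's point.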
Lack of Technical Understanding • Most people don’t understand the underlying methods by which search engines work.
People Don’t Understand Search Technology A study of 100 randomly-chosen people found: • 14% never type a URL directly into the address bar • Several tried to use the address bar, but did it wrong • Put spaces between words • Combinations of dots and spaces • “nursing spectrum.com” “consumer reports.com” • Several use search form with no spaces • “plumber’slocal9” “capitalhealthsystem” • People do not understand the use of quotes • Only 16% use quotes • Of these, some use them incorrectly • Around all of the words, making results too restrictive • “lactose intolerance –recipies” • Here the – excludes the recipes • People don’t make use of “advanced” features • Only 1 used “find in page” • Only 2 used Google cache Hargittai, Classifying and Coding Online Actions, Social Science Computer Review 22(2), 2004, 210-227.
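To make the quoting problem concrete, here is a small sketch of how a query parser might treat quotes and a leading minus sign. The grammar is an assumption for illustration (ASCII hyphen only, no nested quotes) and does not describe any particular search engine's parser.

```python
import re

# A quoted phrase or a bare word, each optionally prefixed by "-" for exclusion.
TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)(\S+)')

def parse_query(query):
    """Split a query into required phrases, required terms, and excluded items."""
    parsed = {"phrases": [], "terms": [], "excluded": []}
    for match in TOKEN.finditer(query.lower()):
        neg_phrase, phrase, neg_word, word = match.groups()
        if phrase is not None:
            if phrase.strip():  # ignore empty quotes
                parsed["excluded" if neg_phrase else "phrases"].append(phrase)
        else:
            parsed["excluded" if neg_word else "terms"].append(word)
    return parsed

# Quotes around the phrase only: the minus sign acts as an exclusion operator.
print(parse_query('"lactose intolerance" -recipes'))
# Quotes around everything: the whole string becomes one very restrictive phrase.
print(parse_query('"lactose intolerance -recipes"'))
```

Under these assumptions, the second query asks for the literal phrase including "-recipes", which is why wrapping all of the words in quotes makes the results too restrictive.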
People Don’t Understand Search Technology Without appropriate explanations, most of 14 people had strong misconceptions about: • ANDing vs ORing of search terms • Some assumed an ANDing search engine indexed a smaller collection; most had no explanation at all • For empty results for query “to be or not to be” • 9 of 14 could not explain in a method that remotely resembled stop word removal • For term order variation “boat fire” vs. “fire boat” • Only 5 out of 14 expected different results • Understanding was vague, e.g.: • “Lycos separates the two words and searches for the meaning, instead of what’re your looking for. Google understands the meaning of the phrase.” Muramatsu & Pratt, “Transparent Queries: Investigating Users’ Mental Models of Search Engines”, SIGIR 2001.
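The behaviors that confused participants can be reproduced with a toy engine. This is a sketch under stated assumptions: the stop word list, the four documents, and the `search` function are all hypothetical, and real engines differ in many details.

```python
# Hypothetical stop word list and document collection.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of", "and"}

DOCS = {
    1: "fire boat crews fight a harbor blaze",
    2: "boat fire destroys marina",
    3: "to be or not to be that is the question",
    4: "forest fire season begins",
}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def search(query, mode="AND"):
    """Return ids of documents containing all (AND) or any (OR) query terms."""
    terms = tokenize(query)
    if not terms:
        return []  # every query word was a stop word, so nothing is searched
    hits = []
    for doc_id, text in DOCS.items():
        doc_words = set(tokenize(text))
        present = [t in doc_words for t in terms]
        if (mode == "AND" and all(present)) or (mode == "OR" and any(present)):
            hits.append(doc_id)
    return hits

print(search("to be or not to be"))      # empty: stop word removal ate the query
print(search("boat fire", mode="AND"))   # only documents with both terms
print(search("boat fire", mode="OR"))    # a larger set from the same collection
print(search("fire boat", mode="AND"))   # same as "boat fire" in a bag-of-words engine
```

In this sketch ANDing returns fewer documents from the same collection, and term order changes nothing; an engine that uses phrase or proximity information would treat "boat fire" and "fire boat" differently.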
Cool Doesn’t Cut It • It’s very difficult to design a search interface that users prefer over the standard • Some ideas have a strong WOW factor • Examples: • Kartoo • Groxis • Hyperbolic tree • But they don’t pass the “will you use it” test • Even some simpler ideas fall by the wayside • Example: • Visual ranking indicators for results set listings
Metadata Matters • When used correctly, text to describe text, images, video, etc. works well • “Searchers” often turn into “browsers” with appropriate links • However, metadata has many perils • The Kosher Recipe Incident
Small Details Matter • UIs for search especially require great care in small details • In part due to the text-heavy nature of search • A tension between more information and introducing clutter • How and where to place things is important • People tend to scan or skim • Only a small percentage reads instructions
Small Details Matter • UIs for search especially require endless tiny adjustments • In part due to the text-heavy nature of search • Example: • In an earlier version of the Google Spellchecker, people didn’t always see the suggested correction • Used a long sentence at the top of the page: “If you didn’t find what you were looking for …” • People complained they got results, but not the right results. • In reality, the spellchecker had suggested an appropriate correction. • Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html
Small Details Matter • The fix: • Analyzed logs, saw people didn’t see the correction: • clicked on first search result, • didn’t find what they were looking for (came right back to the search page) • scrolled to the bottom of the page, did not find anything • and then complained directly to Google • Solution was to repeat the spelling suggestion at the bottom of the page. • More adjustments: • The message is shorter, and different on the top vs. the bottom • Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html
Small Details Matter • Layout, font, and whitespace for information-centric interfaces require very careful design • Example: • Photo thumbnails • Search results summaries
What Works for Search Interfaces? • Query term highlighting • in results listings • in retrieved documents • Term Suggestions (if done right) • Sorting of search results according to important criteria (date, author) • Grouping of results according to well-organized category labels (see Flamenco) • DWIM only if highly accurate: • Spelling correction/suggestions • Simple relevance feedback (more-like-this) • Certain types of term expansion • So far: not really visualization Hearst et al: Finding the Flow in Web Site Search, CACM 45(9), 2002.
Highlighting Query Terms • Boldface or color • Adjacency of terms with relevant context is a useful cue.
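A minimal sketch of query term highlighting in a result snippet. It assumes HTML-style bold tags and whole-word, case-insensitive matching; the function name and defaults are illustrative, and a real system would also need to escape the surrounding text.

```python
import re

def highlight(text, query, open_tag="<b>", close_tag="</b>"):
    """Wrap each whole-word occurrence of a query term in the given tags."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)

print(highlight("US blackout hits the PGA tour", "blackout pga"))
```

The original capitalization of the matched text is preserved, so the reader sees the term exactly as it appears in the document.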
[Figure: highlighted query term hits using the Google toolbar. Panels: US Blackout, PGA, Microsoft. Annotations: “found!”, “don’t know”.]
How to Introduce New Features? • Example: Yahoo “shortcuts” • Search engines now provide groups of enriched content • Automatically infer related information, such as sports statistics • Accessed via keywords • User can quickly specify very specific information • united 570 (flight arrival time) • map “san francisco” • We’re heading back to command languages!
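The "command language" flavor of shortcuts can be sketched as a small dispatcher that checks a query against keyword patterns before falling back to ordinary web search. The patterns, the airline list, and the handler names are hypothetical; they are not Yahoo's actual shortcut rules.

```python
import re

# Hypothetical list of airline names recognized as flight shortcuts.
AIRLINES = {"united", "delta", "american"}

def detect_shortcut(query):
    """Return a (handler, argument) pair for the query."""
    q = query.strip()
    # map "san francisco"  or  map san francisco
    m = re.fullmatch(r'map\s+"?([^"]+)"?', q, re.IGNORECASE)
    if m:
        return ("map", m.group(1).strip())
    # <airline> <flight number>, e.g. united 570
    m = re.fullmatch(r"([A-Za-z]+)\s+(\d{1,4})", q)
    if m and m.group(1).lower() in AIRLINES:
        return ("flight_status", f"{m.group(1).lower()} {m.group(2)}")
    return ("web_search", q)

print(detect_shortcut("united 570"))
print(detect_shortcut('map "san francisco"'))
print(detect_shortcut("jaguar 570"))
```

Anything that does not match a known pattern is treated as a normal query, so users who never learn the shortcuts lose nothing.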
Introducing New Features • A general technique: scaffolding • Scaffolding: • Facilitate a student’s ability to build on prior knowledge and internalize new information. • The activities provided in scaffolding instruction are just beyond the level of what the learner can do already. • Learning the new concept moves the learner up one “step” on the conceptual “ladder”
Scaffolding Example • The problem: how do people learn about these fantastic but unknown options? • Example: scaffolding the definition function • Where to put a suggestion for a definition? • Google used to simply hyperlink it next to the statistics for the word. • Now a hint appears to alert people to the feature.
Query Reformulation • Query reformulation: • After receiving unsuccessful results, users modify their initial queries and submit new ones intended to more accurately reflect their information needs. • Web search logs show that searchers often reformulate their queries • A study of 985 Web user search sessions found • 33% went beyond the first query • Of these, ~35% retained the same number of terms while 19% had 1 more term and 16% had 1 fewer Use of query reformulation and relevance feedback by Excite users, Spink, Jansen & Ozmutlu, Internet Research 10(4), 2001
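The term-count breakdown above can be computed from a session log along the following lines. This is a sketch only: the session is invented, terms are counted by whitespace splitting, and the labels are illustrative rather than the coding scheme used in the Excite study.

```python
from collections import Counter

def term_count_change(previous_query, next_query):
    """Difference in number of whitespace-separated terms between two queries."""
    return len(next_query.split()) - len(previous_query.split())

def summarize_session(queries):
    """Count reformulations by how many terms were added or removed."""
    changes = Counter()
    for prev, nxt in zip(queries, queries[1:]):
        delta = term_count_change(prev, nxt)
        if delta == 0:
            label = "same"
        elif delta > 0:
            label = f"+{delta}"
        else:
            label = str(delta)
        changes[label] += 1
    return changes

# Hypothetical session of five consecutive queries from one user.
session = ["jaguar", "jaguar car", "jaguar price",
           "used jaguar price", "jaguar price"]
print(summarize_session(session))
```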
Query Reformulation • Many studies show that if users engage in relevance feedback, the results are much better. • In one study, participants did 17-34% better with RF • They also did better if they could see the RF terms than if the system did it automatically (DWIM) • But the effort required for doing so is usually a roadblock. • Before the web and in most research, searchers had to select MANY relevant documents or MANY terms. Koenemann & Belkin, A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness, CHI’96
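One way to picture "visible" relevance feedback is a function that proposes candidate terms from the documents a user marked relevant and shows them for the user to accept or reject. The sketch below uses a simple document-frequency heuristic chosen for illustration; it is not the weighting used by Koenemann & Belkin, and the stop word list and documents are hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "for"}

def suggest_terms(query, relevant_docs, k=3):
    """Suggest up to k terms that appear in the most relevant documents."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in relevant_docs:
        # Count each term once per document so long documents do not dominate.
        for term in set(doc.lower().split()):
            if term not in query_terms and term not in STOP_WORDS:
                counts[term] += 1
    # Most frequent first; ties broken alphabetically for a stable display order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical documents the user marked as relevant to the query "jaguar".
relevant = ["jaguar car dealer prices",
            "used jaguar car prices",
            "jaguar car review"]
print(suggest_terms("jaguar", relevant))
```

Showing the suggested terms, instead of silently adding them to the query, corresponds to the condition in which participants did better.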
Query Reformulation • What happens when the web search engine suggests new terms? • Web log analysis study using the Prisma term suggestion system: Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.
Query Reformulation Study • Feedback terms were displayed to 15,133 user sessions. • Of these, 14% used at least one feedback term • For all sessions, 56% involved some degree of query refinement • Within this subset, use of the feedback terms was 25% • By user id, ~16% of users applied feedback terms at least once on any given day • Looking at a 2-week session of feedback users: • Of the 2,318 users who used it once, 47% used it again in the same 2-week window. • Comparison was also done to a baseline group that was not offered feedback terms. • Both groups ended up making a page-selection click at the same rate. Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.
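The percentages on this slide fit together arithmetically. The sketch below derives approximate counts from the reported percentages; the counts are estimates computed here, not figures reported by Anick, and the 25% check assumes every session that used a feedback term is also counted as a refining session.

```python
sessions_shown_terms = 15_133   # sessions in which feedback terms were displayed
share_used_feedback = 0.14      # used at least one feedback term
share_refined = 0.56            # involved some degree of query refinement

feedback_sessions = round(sessions_shown_terms * share_used_feedback)
refining_sessions = round(sessions_shown_terms * share_refined)

# If feedback-term sessions are a subset of refining sessions, their share
# within that subset is 14% / 56%.
feedback_share_among_refiners = share_used_feedback / share_refined

one_time_users = 2_318
repeat_users = round(one_time_users * 0.47)   # used it again within 2 weeks

print(f"sessions using a feedback term: ~{feedback_sessions}")
print(f"sessions with any refinement:   ~{refining_sessions}")
print(f"feedback share among refiners:  {feedback_share_among_refiners:.0%}")
print(f"repeat users within 2 weeks:    ~{repeat_users}")
```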
Query Reformulation Study • Other observations • Users prefer refinements that contain the initial query terms • Presentation order does have an influence on term uptake Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.
Prognosis: Query Reformulation • Researchers have always known it can be helpful, but the methods proposed for user interaction were too cumbersome • Had to select many documents and then do feedback • Had to select many terms • Was based on statistical ranking methods which are hard for people to understand • RF is promising for web-based searching • The dominance of AND-based searching makes it easier to understand the effects of RF • Automated systems built on the assumption that the user will only add one term now work reasonably well • This kind of interface is simple
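Why AND-based searching makes one-term feedback easy to understand can be shown in a few lines: adding a term to a conjunctive query can only keep or shrink the result set. The documents and function names below are hypothetical.

```python
def and_search(terms, docs):
    """Return ids of documents that contain every term (whole-word match)."""
    wanted = [t.lower() for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in wanted)}

def refine(terms, suggested_term, docs):
    """Return the result sets before and after adding one suggested term."""
    before = and_search(terms, docs)
    after = and_search(list(terms) + [suggested_term], docs)
    return before, after

# Hypothetical collection.
docs = {
    1: "jaguar car prices",
    2: "jaguar habitat in the rainforest",
    3: "used jaguar car review",
}
before, after = refine(["jaguar"], "car", docs)
print(sorted(before), "->", sorted(after))
```

Because the refined set is always a subset of the original one, the user can predict the effect of accepting a suggested term, which is much harder with statistical ranking.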
Supporting the Search Process • We should differentiate among searching: • The Web • Personal information • Large collections of like information • Different cues useful for each • Different interfaces needed • Examples • The “Stuff I’ve Seen” Project • The Flamenco Project
The “Stuff I’ve Seen” project • Did intense studies of how people work • Used the results to design an integrated search framework • Did extensive evaluations of alternative designs • The following slides are modifications of ones supplied by Sue Dumais, reproduced with permission. Dumais, Cutrell, Cadiz, Jancke, Sarin and Robbins, Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.
Searching Over Personal Information • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, notes) Slide adapted from Sue Dumais.
The “Stuff I’ve Seen” project • Unified index of items touched recently by user • All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc. • Full-text index of content plus metadata attributes (e.g., creation time, author, title, size) • Automatic and immediate update of index • Rich UI possibilities, since it’s your content • Search only over things already seen • Re-use vs. initial discovery Slide adapted from Sue Dumais.
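The idea of one index over every kind of item, combining full text with metadata, can be sketched as follows. This is an illustrative toy, not the SIS implementation: the `Item` fields, the `PersonalIndex` class, and the sample data are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    item_id: int
    kind: str        # e.g. "email", "file", "web page"
    title: str
    author: str
    created: date
    content: str

class PersonalIndex:
    """One inverted index over all item types, plus metadata filters."""

    def __init__(self):
        self.items = {}
        self.postings = defaultdict(set)

    def add(self, item):
        """Index an item immediately, so it is searchable as soon as it is seen."""
        self.items[item.item_id] = item
        for word in f"{item.title} {item.content}".lower().split():
            self.postings[word].add(item.item_id)

    def search(self, query, kind=None, author=None):
        """AND the query terms, filter on metadata, newest items first."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.items[i] for i in ids]
        if kind is not None:
            hits = [h for h in hits if h.kind == kind]
        if author is not None:
            hits = [h for h in hits if h.author == author]
        return sorted(hits, key=lambda h: h.created, reverse=True)

index = PersonalIndex()
index.add(Item(1, "email", "Budget review", "alice", date(2003, 5, 1), "draft budget attached"))
index.add(Item(2, "file", "Budget 2003", "bob", date(2003, 6, 1), "final budget spreadsheet"))
index.add(Item(3, "web page", "Search UIs", "carol", date(2003, 4, 1), "faceted search design"))

print([item.title for item in index.search("budget")])
print([item.title for item in index.search("budget", kind="email")])
```

Sorting by date and filtering by type or author are possible here only because the metadata is stored alongside the text, which is the kind of cue that makes re-finding personal content different from web search.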
SIS Interface Slide adapted from Sue Dumais
Search With SIS Slide adapted from Sue Dumais