WIRED Week 7

aviva

Presentation Transcript


  1. WIRED Week 7 • Quick review of Information Seeking • Readings Review • Questions & Comment • How does this affect IR system use? • How would this change evaluating IR systems? • Topic Discussions • Web search lab game!

  2. What Is Information Seeking? • “a process in which humans purposefully engage in order to change their state of knowledge.” p. 5 • “a process driven by human’s need for information so that they can interact with the environment.” p. 28 • “begins with recognition and acceptance of the problem and continues until the problem is resolved or abandoned” p. 49 (Marchionini) • more than just representation, storage and systematic retrieval

  3. Information Seeking in Context • Learning • Information Seeking • Information Retrieval • Browsing Strategy • Analytical Strategy

  4. How do we search? • Analytical • careful planning • recall of query terms • iterative query reformulations • examination of results • batched • Browsing • heuristic • opportunistic • recognizing relevant information • interactive (as can be)

  5. Iseek - WebTracker study • Corporate IT and knowledge workers • In work environment • Own browser and network connection • Long-term study (weeks) • Overall Web use analyzed • Bookmarks, printed pages • How sites/pages found • Frequency of page visits

  6. Web Study Methodology • Surveys • Interviews • Web Use Data* • History Files • WebTracker • Server Logs • Bookmarks* • Printouts

  7. Study Elements • Research Design • Field Work • Field Workers • Data Collection 1. Questionnaire survey 2. WebTracker application (and Proxy Server) 3. Personal interviews

  8. Collecting Web Client Data • Modified client • Pitkow and Catledge 1995 • Bookmarks • Chosen Web sites are personal information space • Most valuable data file on user’s system • Automatically organizing bookmarks • History logs • The history mechanism • Most promising source for usage data

  9. WebTracker Expanded Window

  10. WebTracker Log

  11. Data Analysis • Log files tabulated into spreadsheets • Examined for clusters or patterns of behavior • Selection of episodes of Information Seeking behavior • a highlighting of the episode by the participant during the personal interview; • evidence of the episode having consumed a relatively substantial amount of time and effort; • evidence that the episode was a recurrent activity. • Determined the modes of scanning & moves exercised by the participants

  12. Behavioral Model • Recurring Web behavioral patterns that relate people’s browser actions (Web moves) to their browsing/searching context (Web modes) • Modes of scanning: Aguilar (1967) & Weick & Daft (1983, 1984) • Moves in information seeking behavior: Ellis (1989) & Ellis et al. (1993, 1997)

  13. Modes of Scanning

  14. Modes of Scanning for Information

  15. ISeek Behaviors & Web Moves

  16. Modes & Moves Model

  17. Behavioral Model Verification • 61 identifiable episodes

  18. Behavioral Model Results • People who use the Web engage in 4 complementary modes of information seeking • Certain browser based actions & events indicate a particular mode of information seeking • Surprises • No Explicit Instances of Monitoring to Support Formal Searching • Very Few Instances of “Push” Monitoring • Extracting Involved Basic Search Strategies Only

  19. Interview Highlights • Most useful work-related sites: • Resource sites by associations & user groups • News sites • Company sites • Search engines • Most people do not avidly search for new Web sites • The criterion for bookmarking is largely whether a site provides relevant & up-to-date information • Learning about new Web sites: • Search engines • Magazines & newsletters • Other people/colleagues

  20. Survey Highlights • The Web was the 3rd most frequently used source • Participants spent about 20% of their work hours using the Web • Majority looked for technical information on the Web • Quality of Web information was perceived to be “very high” (reliable) • Web was perceived to be as accessible as other “internal” sources, but less accessible than mass media sources • Few participants deliberately set out to search for new sites

  21. Study 1 Summary • Behavioral model of information seeking on the Web • People who use the Web engage in complementary modes of information seeking • Certain browser based actions & events indicate particular moves in information seeking • The study suggests: • that a behavioral framework that relates user motivations and Web moves may be helpful in analyzing Web-based Information Seeking • that multiple, complementary methods of collecting qualitative and quantitative data may help compose a richer portrayal of how individuals use Web-based information in their natural work settings

  22. Study Recommendations

  23. Iseek Expanded Study (2) • Larger Dataset • One Organization • Longer Duration • Open-ended Interviews • IT Survey • More Quantitative Modeling • Glassman (1994); • Catledge & Pitkow (1995); • Tauscher & Greenberg (1997a, 1997b); • Huberman, Pirolli, Pitkow, & Lukose (1998)

  24. New Types of Data Collection • Sources • Modified Logs • Interviews (More Focused) • Survey (Broader Focus) • Field Observation (Cube Work) • Volume • Over 1400 Consistent Users • Over a Month of Web Use • 8+ GB of data

  25. Collecting Web Server Data • Web Server Log Accuracy • Hit - a single file is requested from the Web server • View - all of the information contained on a single Web page • Visit - one series of views at a particular Web site. • Proxy Server Logs • Day sampling - stop caching and analyzing data. • IP sampling - cancel caching of particular Web users and measuring these results only • Continuous sampling - use cookie files to track a particular user(s) • KDD
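The hit / view / visit distinction on slide 25 can be illustrated with a small sketch that groups page views into visits. This is only one plausible reading: the slides do not say how a "series of views" ends, so the 30-minute inactivity cutoff, the record layout `(user, site, timestamp)`, and the function name `count_visits` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed cutoff (not from the slides): a gap longer than this starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(records, timeout=VISIT_TIMEOUT):
    """Count visits per (user, site) from (user, site, timestamp) page-view records."""
    last_seen = {}
    visits = defaultdict(int)
    # Process views in time order so gaps are measured between consecutive views.
    for user, site, ts in sorted(records, key=lambda r: r[2]):
        key = (user, site)
        prev = last_seen.get(key)
        if prev is None or ts - prev > timeout:
            visits[key] += 1  # first view, or the gap exceeded the cutoff
        last_seen[key] = ts
    return dict(visits)


# Hypothetical sample data, not taken from the study.
records = [
    ("u1", "example.com", datetime(2000, 1, 1, 9, 0)),
    ("u1", "example.com", datetime(2000, 1, 1, 9, 5)),
    ("u1", "example.com", datetime(2000, 1, 1, 11, 0)),
    ("u2", "example.com", datetime(2000, 1, 1, 9, 2)),
]
print(count_visits(records))
```

Here u1's first two views fall into one visit and the 11:00 view starts a second one. A different cutoff would change the counts, which is part of why log-based visit numbers need careful interpretation.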

  26. Survey Highlights • Users not motivated to change/update browser versions or startup page • IT made no modifications of browser until recently, primarily for system access testing • Most of the most frequent users were from technical departments • All IT system work now Web-specific

  27. Interview Highlights • Corporate adoption of Internet access driven by Intranet development • Local portrayals of successful Web work drove rapid adoption • Use of Intranet viewed as both resource conservation and expanded work • Logging of Web use data not a high concern • Open to recommendations to improve Web use • “Webify”ing Everything seen as good

  28. KDD Highlights • Extremely High Data Collection Reliability • Tightly-focused Web Use (business sites) • Very Small (Determinable) Inappropriate Use ( >.001%) • Lower than Expected Search Engine Use • Influenced by Startup Page • Internal Search Results Pages Used • Higher than Expected (Average) Use of Intranet

  29. KDD Use Highlights • 40,000+ episodes • 11:15 average episode length • Search term mode of 1 • Not dominantly work-related terms • Use of intranet search results influential
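Summary figures like "average episode length" and "search term mode" on slide 29 are simple descriptive statistics. The sketch below shows how such figures could be computed, assuming episode durations in seconds and queries as plain strings split on whitespace. The sample values, `format_mmss`, and `summarize` are hypothetical and are not the study's data or tooling.

```python
from statistics import mean, mode


def format_mmss(seconds):
    """Render a duration in seconds as minutes:seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def summarize(episode_seconds, queries):
    """Return (average episode length as m:ss, most common number of terms per query)."""
    terms_per_query = [len(q.split()) for q in queries]
    return format_mmss(mean(episode_seconds)), mode(terms_per_query)


# Hypothetical sample data, not taken from the study.
episode_seconds = [300, 660, 900]
queries = ["printers", "java", "expense report form", "intranet"]
print(summarize(episode_seconds, queries))
```

A mode of 1 means single-word queries were the most common query length, which is a different claim from the average query length being 1.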

  30. Updated Behavioral Model • 32,512 identifiable episodes

  31. Behaviors Breakdown

  32. Other Studies • Tend to focus on server logs, a broad range of Web users, general Web seeking activity, quantitative methods • Glassman (1994): Proxy Study • Catledge & Pitkow (1995): Surveys and Client tool; • Tauscher & Greenberg (1997a, 1997b): The Back button; • Ingwersen (1995 & 1997): Informetrics • Huberman, Pirolli, Pitkow, & Lukose (1998): Information Foraging, “Law of Surfing” • Huberman “Laws of the Web” (2001)

  33. Study 2 Summary • Behavioral Model Scales Up • Server Logs Provide Significant Gains in Quantity • Server Logs Provide Challenges in Deriving Quality • Organizations Provide Focused View of Overall Web Use • Knowledge Workers Collaborate (But Not Enough)

  34. Summary • (New) Methodology • Provide new ideas for data collection & cleaning tools • Verify models of Information Seeking and Web Use • Discover models of Web usage • Find different types of Web users • Gain rich descriptions of perception of Web & Web use • Evoke new system & interface designs

  35. Other Tools for Web Studies • Pete Pirolli, Rob Reeder, Ed Chi, et al. (UIR Group Xerox PARC) Web Logger • Eytan Adar, Bernardo Huberman (Web Ecology Group @ PARC, now HP) • Andy Edmonds – Uzilla.net • Vividence • Web Evaluation Tool (WET) • Eye Tracking (*)

  36. Improving Web Use • Expert Systems - SNLP • Multimedia Databases & Metadata • Display Technology • Better GUIs • Better, More Available Search Engines/query Syntax • Desktop Search • Ranking • Relevance • Help expert users get more expert

  37. Web Activities Taxonomies • What types of activities on the Web have impact? • What we do vs. what seems significant • Purpose of people’s search • Find • Get a fact or document • Download information • Find out about a product • Compare/Choose: 51% • Methods used to find information • Explore, Monitor, Find, Collect: 71% • Content for which they are searching • Medical: 18%, People: 13%, …

  38. Berrypicking & IR Flexibility • IR systems are rational, users aren’t (always) • We don’t search in a linear model • Single query, one good result • We gradually build on what we know, how we find it • Footnote chasing (backward chaining) • Citation searching (forward chaining) • Journal run (favorite sites) • Area scanning (browsing) • Subject searches in bibliographies, abstracts & indices • Author searching • We combine all of these when searching • Interface support for each & combinations

  39. Berrypicking Paths

  40. Web Search Studies Framework • Web IR is still relatively new • Differences in users & information • Changes in IR systems are rapid • Who doesn’t search now? • “A Web searching study focuses on isolating searching characteristics of searchers using a Web IR system via analysis of data, typically gathered from transaction logs.” p 3 • Studying Search Engine use • AltaVista, Excite • Web Searching Studies • Single & Multiple Web sites

  41. Characterizing Browsing • Modified XMosaic to learn Web browser behavior • Path lengths key (but changed) • Types of users: • Serendipitous browsers – little repetition, short sequences • General purpose browsers – average, repeated actions • Searchers – long navigational sequences

  42. Cognitive Strategies in Web Search • Systems help with: • re-representation - different external representations that have the same abstract structure make problem-solving easier or more difficult. It also refers to how different strategies and representations vary in their efficiency for solving a problem. • graphical constraining - constrain the kinds of inferences that can be made about the underlying represented concept. • temporal and spatial constraining - different representations make relevant aspects of processes and events more salient when distributed over time and space.

  43. Cognitive Strategies • Searching Conditions • Dispersed or Category Structures • Fact finding • Exploratory searching • Novice & Experienced users • Top-down, bottom-up & mixed

  44. Reading Time, Scrolling & Interaction • Can implicit feedback improve relevancy? • 561 documents, 6 subjects • Read documents & score them • Better than reading, saving & printing? • Measure use now vs. later • Focused on document, not activity • How do you know the user is reading? • Is saving a relevance measure? • No differences noted in scrolling (4.28) • What about following links? • Finding, highlighting, copying?

  45. How do we really use the Web? • People don’t read, they scan Web pages • We move quickly, we know we can go back • Quick experimentation & short memory • Behaviors that work are reinforced & continued • Satisficing makes measures of quality difficult • Web pages as Billboards? • What’s billboard information for IR systems?

  46. Revisitation Patterns on WWW • Mostly Re-Visits (58%) • Continually Visit New Pages • Access Only A Few Pages Frequently • Clusters (Sets) & Short Paths of URLs • Frequency • Recency • “Distance” • Types of Navigation • Hub and Spoke • Depth Searching (lots of links before returning, if at all) • Guided Tour (Tasks)
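A revisit percentage like the one on slide 46 can be computed from a browsing history in more than one way. The sketch below uses one common definition, the share of page visits that go to a URL already seen earlier in the same history, and also lists the most frequently accessed pages. Whether this matches the exact measure behind the 58% figure is an assumption; the function names and the sample history are hypothetical.

```python
from collections import Counter


def recurrence_rate(urls):
    """Fraction of visits that return to a previously seen URL (assumed definition)."""
    if not urls:
        return 0.0
    return (len(urls) - len(set(urls))) / len(urls)


def top_pages(urls, n=2):
    """Return the n most frequently visited URLs with their visit counts."""
    return Counter(urls).most_common(n)


# Hypothetical sample history, not taken from the study.
history = ["a", "b", "a", "c", "a", "b", "d"]
print(f"{recurrence_rate(history):.0%} of visits are revisits")
print(top_pages(history))
```

In this sample, 3 of 7 visits are revisits and two pages account for most of the traffic, a small-scale version of the "access only a few pages frequently" pattern.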

  47. Revisitation Patterns 2 • Back Button Use Affects Everything (Even More Since Study) • Navigation Methods Differ • Reasons for Revisiting • Explore Further • Use Feature (Search or Home Page) • “On the Way” to another Page (IA Problem) • Users Don’t Understand Browser History Very Well or Do They Misunderstand Page/Site Navigation? • Provide Navigation Support • Work with the Back Button – Don’t Break its Functionality

  48. Web search lab game • Break into groups • Answer a set of questions • Different rules for each search • Search as you would • Talk & decide before each move • No typing this time! • Search as you would again • Fast as possible
