
WIRED Week 2


gudrun

Presentation Transcript


  1. WIRED Week 2 • Syllabus Update • Readings Overview

  2. Why IR? • IR originally mostly for systems, not people • IR in the last 25 years: • classification and categorization • systems and languages • user interfaces and visualization • A small world of concern • The Web changed everything • Huge amount of accessible information • Varied information sources • Relatively easy to look for information • Improving IR means improving learning • Digital technology changes everything (again)

  3. WIRED Focus • Information Retrieval: representation, storage, organization of, and access to information items • Focus is on the user information need • User information need: • Find all docs containing information on Austin which: • Are hosted by utexas.edu • Discuss restaurants • Emphasis is on the retrieval of information (not data, not just a keyword match)
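The Austin example on this slide can be sketched as a host constraint plus a topic constraint. This is only a rough illustration: the documents, URLs, and the helper names `hosted_by` and `matches_need` are made up, and plain term matching is a crude stand-in for deciding whether a page really "discusses restaurants", which is exactly the gap between a keyword match and an information need.

```python
from urllib.parse import urlparse

# Hypothetical toy collection; URLs and text are invented for illustration.
DOCS = [
    {"url": "http://www.utexas.edu/dining/guide.html",
     "text": "A guide to restaurants and cafes near campus in Austin."},
    {"url": "http://www.utexas.edu/maps/campus.html",
     "text": "Campus maps and parking in Austin."},
    {"url": "http://www.example.com/austin-food.html",
     "text": "Austin restaurants reviewed by locals."},
]

def hosted_by(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)

def matches_need(doc, domain, required_terms):
    """Check the host constraint and that every required term appears in the text."""
    words = set(doc["text"].lower().replace(".", " ").split())
    return hosted_by(doc["url"], domain) and all(t in words for t in required_terms)

results = [d["url"] for d in DOCS
           if matches_need(d, "utexas.edu", ["austin", "restaurants"])]
print(results)
```

Only the first document satisfies both constraints. A page that reviews places to eat without ever using the word "restaurants" would be missed, which is why the focus is on the information need rather than the query terms.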

  4. The Search • Who is John Battelle? • Magazine Editor: WIRED, The Industry Standard • Web 2.0 conference organizer • Business 2.0 magazine columnist • Federated Media Publishing • Boingboing.net “manager”

  5. Database of Intentions • What do you think the database of intentions is? • Is it more than Google’s Zeitgeist? • What we’re thinking about and interested in. • Everything we want to know and when we want to know it. • “the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result” (Battelle, p 6) • “a real time history of post-Web culture” (p 6) • What other databases like this are there? • How is this possible?

  6. Searchiness? • The “tasking” of search? • Everything could be a search task? • Every task has an ad associated with it? • Our expectations are met and made with search. • How would the Web work without search? • Yahoo and email links, LOTS of email links • You are your clickstream? • Products & services based on it • “marketing, media, technology, pop culture, international law, and civil liberties” (p 13)

  7. Elements of Search • Crawl • Index • Runtime system (query processor) • Segments the data • Analyzes the Crawl • Optimizes everything • Interface • Query • Results • Users
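The crawl, index, and query steps listed above can be sketched in a few lines. This is a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary rather than real HTTP fetches, the index is a plain inverted index, and the query processor does a simple boolean AND with no ranking.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": page id -> (text, outgoing links).
WEB = {
    "a": ("search engines crawl the web", ["b", "c"]),
    "b": ("an index maps words to pages", ["c"]),
    "c": ("a query processor searches the index", ["a"]),
    "d": ("this page is not linked from anywhere", []),
}

def crawl(seed):
    """Breadth-first crawl from a seed page, following outgoing links."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        page = queue.popleft()
        text, links = WEB[page]
        pages[page] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: word -> set of pages containing it."""
    index = defaultdict(set)
    for page, text in pages.items():
        for word in text.lower().split():
            index[word].add(page)
    return index

def search(index, query):
    """Return the pages that contain every query word (boolean AND)."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

pages = crawl("a")
index = build_index(pages)
print(search(index, "the index"))
```

Page "d" never enters the index because no crawled page links to it, so a query for "linked" returns nothing. What a crawler cannot reach, the search engine cannot find.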

  8. Search before Google • Traditional systems: SMART (Salton) • Strongly typed information (traditional databases) • Not always interactive or easy to use • Library Catalogs online • Controlled vocabulary & limited records • Internet: Archie & Veronica • Titles only (mostly), not full text • Web: WWW Wanderer, WebCrawler • Full text, HTML & links

  9. AltaVista gets serious • Web now large enough to be a challenge • Now enough content that you’d want to search it • Costs of hardware & bandwidth falling • Parallel crawlers • Significant CPU resources • 1995 = 16 million documents • Why didn’t people get it?

  10. The Web goes Pro • Lycos • Anchor text & content location context • Yahoo • Directory & clean interface for browsing links • Advertising & user (logs) analysis • AOL • Gateway to the internet for many • Excite • Consumer-driven, word relationships • Acquisitions of Magellan, WebCrawler ++ • MyExcite - the Portal • @Home (compete with AOL)

  11. Google is Born • Larry Page & Sergey Brin • Links are the key (Bibliometrics) • Impact factor (“link it if you like it”) • Patterns of citation (links) expand the text • Defending & setting the context of your work by associating it with others • Backrub • Crawl pages, store links, analyze them, publish • Large computing challenges • PageRank • Link counts with a recipe for deriving (relative) value • Value depends on who links to you and their rank too
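The PageRank idea on this slide, that a page's value depends on who links to it and how highly those pages rank, can be sketched as a simple iterative calculation. This is a minimal sketch of the commonly described formulation, not Google's production system: the four-page link graph is hypothetical, and the damping factor of 0.85 and the fixed iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict of page -> list of link targets."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share, independent of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link graph: "c" is linked from "a", "b", and "d".
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "a" has only one incoming link, but it comes from the top-ranked page "c", so "a" ends up well ahead of "b" and "d". That is the "their rank too" part of the recipe: a raw link count alone would not separate "a" from "b".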

  12. Google goes Pro • More resources for more data • Help with (significant) analysis design • Lack of commercial approach may have been a strength • Not ads, but just good search • Simple (non-existent) design of interface had an impact • More people getting online • Broadband adoption & stabilizing browsers • Growing content (to say the least)

  13. Assignments • Read weekly Primary Readings & Participate in class discussions 10% • Re-design Search Results interface 10% • Web (log) analytics 25% • “Google 2010” (5 page paper) 10% • Class Topic Presentation 15% • Main Project 30%

  14. Projects and/or Papers Overview • How can (Web) IR be better? • Better IR models • Better User Interfaces • More to find vs. easier to find • Scriptable applications • New interfaces for applications • New datasets for applications
