Inside Internet Search Engines: Products

Presentation Transcript


  1. Inside Internet Search Engines: Products
     William Chang and Jan Pedersen
     SIGIR ’99

  2. Web Oracle One, Two, Three...
     • Network of computers?
     • Network of hypertext?
     • Network of people?
     • Internet... is a place where you can always find someone to help answer any question, or get anything done.
     • Productize that!

  3. Who’s Who and What’s What?
     • Query logs
     • What do people look for, besides sex?
     • What are indexable terms? Are they unbounded?
     • Can you index all possible phrases?
     • Formatting cues help
     • Syntax helps
     • Stemming helps
     • Precision vs. recall
     • WordNet -> PhraseNet?

  4. Who Likes What?
     • Too many hits!
     • The problem of indistinguishable scores
     • Spamming
     • The relevant and irrelevant
     • The web to the rescue
     • Inside-out indexing

  5. Citation Index or Popularity Contest?
     • Counting hyperlinks
     • Avoiding double-counting
     • Site clustering; what’s a site?
     • Judging the source
     • Hyperlinks revisited
     • Anchor text context; Yanhong Li
     • Why is this result hard to duplicate?
     • Does adding more context help?
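
The "counting hyperlinks" and "avoiding double-counting" bullets above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a "site" is simply a host name with any leading `www.` removed, which is only one possible answer to the slide's "what's a site?" question. The helper names `site_of` and `count_inlinks` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Hypothetical site key: the host name without a leading 'www.'.

    Assumption: real site clustering would need more than this
    (mirrors, shared hosting, multiple hosts per organization).
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def count_inlinks(links):
    """Count, for each target URL, how many distinct sites link to it.

    `links` is an iterable of (source_url, target_url) pairs.
    Many links from one site count once, and links from a page to
    its own site are ignored.
    """
    linking_sites = defaultdict(set)
    for source, target in links:
        source_site = site_of(source)
        if source_site == site_of(target):
            continue  # skip within-site links
        linking_sites[target].add(source_site)
    return {target: len(sites) for target, sites in linking_sites.items()}


links = [
    ("http://a.com/1", "http://t.com/"),
    ("http://www.a.com/2", "http://t.com/"),  # same site as the line above
    ("http://b.org/", "http://t.com/"),
    ("http://t.com/about", "http://t.com/"),  # within-site link
]
print(count_inlinks(links))
```

Counting distinct linking sites rather than raw links is one simple way to keep a single heavily cross-linked site from dominating the count.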

  6. Who Asks What?
     • Query logs revisited
     • Query-based indexing – why index things people don’t ask for?
     • If they ask for A, give them B
     • From atomic concepts to query extensions
     • Structure of questions and answers
     • Shyam Kapur’s chunks

  7. FAQs and not so FAQs
     • Usenet FAQs – Robin Burke’s FAQFinder
     • FAQ discovery
     • Where are the answers?

  8. Indexing
     • Different ways of crawling the web
     • Frequency of change
     • Frequency of request
     • Managing Terabytes or GigaURLs?
     • Real-time indexing

  9. Searching
     • Multiway merge and scoring
     • Logical operations
     • Query parsing and phrase searching
     • Query refinement
     • Distributed searching and the perfect merge
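
As a rough illustration of the "multiway merge and scoring" bullet above, the sketch below merges per-term posting lists and sums the weights for each document. It assumes each posting list is already sorted by document ID and holds `(doc_id, weight)` pairs, and that a document's score is the plain sum of its term weights. The slides do not specify a scoring function, so both assumptions and the name `merge_and_score` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_and_score(posting_lists, top_k=10):
    """Merge sorted posting lists and return the top_k (doc_id, score) pairs.

    Assumption: each list is sorted by doc_id and contains
    (doc_id, weight) tuples; the score is the sum of weights.
    """
    # heapq.merge streams the lists in doc_id order without
    # concatenating and re-sorting them.
    merged = heapq.merge(*posting_lists, key=itemgetter(0))
    scored = [
        (doc_id, sum(weight for _, weight in group))
        for doc_id, group in groupby(merged, key=itemgetter(0))
    ]
    return heapq.nlargest(top_k, scored, key=itemgetter(1))


posting_lists = [
    [(1, 0.5), (3, 1.0)],   # postings for one query term
    [(1, 0.25), (2, 0.5)],  # postings for a second term
    [(3, 0.5)],             # postings for a third term
]
print(merge_and_score(posting_lists, top_k=2))  # [(3, 1.5), (1, 0.75)]
```

A logical AND could be layered on the same merge by keeping only documents whose group contains one posting per query term.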

  10. Design Issues
      • Managing complexity
      • Managing memory
      • Managing parallelism
      • Managing data turnover
      • Managing scalability

  11. Futures
      • Vertical markets – healthcare, real estate, jobs and resumes, etc.
      • Localized search
      • Search as embedded app
      • Shopping 'bots
      • Open problems
      • Has the bubble burst?
