
Searching the Web II


Presentation Transcript


  1. Searching the Web II

  2. The Web
     • Why is it important:
       • “Free” ubiquitous information resource
       • Broad coverage of topics and perspectives
       • Becoming dominant information collection
       • Growth and jobs
     • Web access methods
       • Search (e.g. Google)
       • Directories (e.g. Yahoo!)
       • Other …

  3. Web Characteristics
     • Distributed data
     • High volatility
     • Large volume
     • Unstructured data
     • Quality of data
     • Heterogeneous data

  4. Web Tasks
     • Precision is the key
       • Goal: first 10-100 results should satisfy user
       • Requires ranking that matches user’s need
     • Recall is not important
       • Completeness of index is not important
       • Comprehensive crawling is not important
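A minimal sketch of the two measures this slide contrasts, assuming the set of relevant pages for a query is known in advance. The function names and the sample data are illustrative and do not come from the slides.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall(ranked, relevant):
    """Fraction of all relevant documents that appear anywhere in the results."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


# Hypothetical result list and relevance judgments.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "z"}
top2_precision = precision_at_k(ranked, relevant, 2)
overall_recall = recall(ranked, relevant)
```

Precision at a small k only looks at what the user sees first, which is why an incomplete index (here, the missing page "z") lowers recall without necessarily hurting the user's experience.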

  5. Browsing
     • Web directories
       • Human-organized taxonomies of Web sites
       • Small portion (less than 1%) of Web pages
         • Remember that recall (completeness) is not important
       • Directories point to logical web sites rather than pages
       • Directory search returns both categories and sites
       • People generally browse rather than search once they identify categories of interest

  6. Metasearch
     • Search a number of search engines
     • Advantages
       • Do not build their own crawler and index
       • Cover more of the Web than any of their component search engines
     • Difficulties
       • Need to translate query to each engine query language
       • Need to merge results into a meaningful ranking

  7. Metasearch II
     • Merging Results
       • Voting scheme based on component search engines
         • No model of component ranking schemes needed
       • Model-based merging
         • Need understanding of relative ranking, potentially by query type
     • Why they are not used for the Web
       • Bias towards coverage (i.e. recall), which is not important for most Web queries
       • Merging results is largely ad-hoc, so search engines tend to do better
     • Big application: the Dark Web
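One possible voting scheme can be sketched as a Borda-style count. This is an assumption chosen for illustration: the slides do not name a specific scheme. Each engine awards points by rank position, so no model of how an engine ranks is needed. The function name, the `depth` parameter, and the sample rankings are hypothetical, and each input list is assumed to contain no duplicates.

```python
from collections import defaultdict


def borda_merge(rankings, depth=10):
    """Merge ranked result lists by summing rank-based points per URL."""
    scores = defaultdict(int)
    for ranking in rankings:
        # The top result earns `depth` points, the next depth - 1, and so on.
        for position, url in enumerate(ranking[:depth]):
            scores[url] += depth - position
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from three component engines.
engine_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
merged = borda_merge(engine_results, depth=3)
```

Pages returned by several engines accumulate points, while a page seen by only one engine ("d" here) sinks to the bottom of the merged list.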

  8. Using Structure in Search
     • Languages to search content and structure
       • Query languages over labeled graphs
         • PHIQL: Used in Microplis and PHIDIAS hypertext systems
         • Web-oriented: W3QL, WebSQL, WebLog, WQL

  9. Using Structure in Search
     • Other use of structure in search
       • Relevant pages have neighbors that also tend to be relevant
       • Search approaches that collect (and filter) neighbors to returned pages
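The collect-and-filter idea can be sketched as follows, assuming the link graph is available as a dictionary mapping each page to the pages it links to. The function name, the `keep` filter, and the toy graph are hypothetical; real systems would apply a relevance test in place of the default filter.

```python
def expand_with_neighbors(results, out_links, keep=lambda page: True):
    """Append pages that link to, or are linked from, the returned pages."""
    # Build the reverse index so that in-links can be looked up as well.
    in_links = {}
    for source, targets in out_links.items():
        for target in targets:
            in_links.setdefault(target, set()).add(source)

    seen = set(results)
    expanded = list(results)
    for page in results:
        neighbors = set(out_links.get(page, ())) | in_links.get(page, set())
        for neighbor in sorted(neighbors):
            # Skip pages already present and pages rejected by the filter.
            if neighbor not in seen and keep(neighbor):
                seen.add(neighbor)
                expanded.append(neighbor)
    return expanded


# Hypothetical link graph: "a" links to "b" and "c", and "d" links to "a".
graph = {"a": ["b", "c"], "d": ["a"], "b": ["e"]}
expanded = expand_with_neighbors(["a"], graph)
filtered = expand_with_neighbors(["a"], graph, keep=lambda page: page != "c")
```

Only direct neighbors are collected; "e" is two links away from "a" and is therefore left out.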

  10. Web Query Characteristics
      • Few terms and operators
        • Average 2.35 terms per query
        • 25% of queries have a single term
        • Average 0.41 operators per query
      • Queries get repeated
        • Average 3.97 instances of each query
        • This is very uneven (e.g. “Britney Spears” vs. “Frank Shipman”)
      • Query sessions are short
        • Average 2.02 queries per session
        • Average of 1.39 pages of results examined
      • Data from 1998 study
        • How different today?
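Statistics of this kind could be recomputed from a query log roughly as below. This is a sketch under assumptions: terms are split on whitespace, repeated queries are matched case-insensitively, and the four-query log is made up for illustration. It does not reproduce the 1998 study's method or data.

```python
from collections import Counter


def query_log_stats(queries):
    """Compute simple per-query statistics over a list of query strings."""
    if not queries:
        raise ValueError("query log is empty")
    term_counts = [len(query.split()) for query in queries]
    # Normalize case and whitespace so repeated queries are counted together.
    distinct = Counter(" ".join(query.lower().split()) for query in queries)
    total = len(queries)
    return {
        "avg_terms": sum(term_counts) / total,
        "single_term_share": sum(1 for n in term_counts if n == 1) / total,
        "avg_instances": total / len(distinct),
    }


# Made-up log, not real data.
log = ["britney spears", "Britney Spears", "weather", "frank shipman"]
stats = query_log_stats(log)
```

Running the same function over a current log would be one way to approach the slide's closing question.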
