Explore the significance of the web as a free, vast information resource with diverse content and evolving data characteristics. Understand the key tasks involved in web search, browsing, and metasearch techniques. Learn about leveraging structure in search and the characteristics of web queries. Dive into the complexities and opportunities of navigating the dynamic online landscape.
The Web
• Why is it important:
  • “Free” ubiquitous information resource
  • Broad coverage of topics and perspectives
  • Becoming the dominant information collection
  • Growth and jobs
• Web access methods:
  • Search (e.g. Google)
  • Directories (e.g. Yahoo!)
  • Other …
Web Characteristics
• Distributed data
• High volatility
• Large volume
• Unstructured data
• Variable quality of data
• Heterogeneous data
Web Tasks
• Precision is the key
  • Goal: the first 10-100 results should satisfy the user
  • Requires a ranking that matches the user’s need
• Recall is not important
  • Completeness of the index is not important
  • Comprehensive crawling is not important
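To make the precision/recall trade-off concrete, here is a minimal sketch of precision and recall measured over the top k results. The function names and the document IDs are hypothetical, chosen only for illustration, and the code assumes a ranked list with no duplicate entries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant.

    Divides by k even if fewer than k results were returned,
    so a short result list is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


# Hypothetical example: the first two results satisfy the user,
# even though most relevant pages were never retrieved.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d2", "d7", "d8", "d9"}
print(precision_at_k(ranked, relevant, 2))  # 1.0
print(recall_at_k(ranked, relevant, 5))     # 0.4
```

A Web search engine would be judged a success on this query (perfect precision in the first results) despite its low recall, which is the point of the slide.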
Browsing
• Web directories
  • Human-organized taxonomies of Web sites
  • Cover a small portion (less than 1%) of Web pages
  • Remember that recall (completeness) is not important
  • Directories point to logical Web sites rather than individual pages
  • Directory search returns both categories and sites
• People generally browse rather than search once they identify categories of interest
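A Web directory can be pictured as a tree of categories, each listing whole sites rather than pages. The sketch below assumes that structure and shows a directory search returning both matching category paths and matching sites. The `Category` class, `directory_search` function, and the example taxonomy are all hypothetical, and matching on names alone is a simplification; a real directory would also match site titles and descriptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical human-organized taxonomy."""
    name: str
    sites: list[str] = field(default_factory=list)  # logical sites, not pages
    children: list[Category] = field(default_factory=list)


def directory_search(root, term):
    """Return (matching category paths, matching sites) for a search term."""
    term = term.lower()
    categories, sites = [], []

    def walk(node, path):
        path = path + [node.name]
        if term in node.name.lower():
            categories.append("/".join(path))
        for site in node.sites:
            if term in site.lower():
                sites.append(site)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return categories, sites


# Hypothetical taxonomy with placeholder site names.
root = Category("Top", children=[
    Category("Sports", sites=["soccer.example.org"], children=[
        Category("Soccer", sites=["worldsoccer.example.com",
                                  "soccerstats.example.net"]),
    ]),
    Category("Science", sites=["physics.example.edu"]),
])

print(directory_search(root, "soccer"))
# (['Top/Sports/Soccer'], ['soccer.example.org', 'worldsoccer.example.com', 'soccerstats.example.net'])
```

Once the user finds the `Top/Sports/Soccer` category, they would typically browse it directly rather than search again.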
Metasearch
• Search a number of search engines
• Advantages
  • Do not need to build their own crawler and index
  • Cover more of the Web than any of their component search engines
• Difficulties
  • Need to translate the query into each engine’s query language
  • Need to merge results into a meaningful ranking
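The query-translation difficulty can be sketched as a table of per-engine rewriters applied to one conjunctive query. The two engine syntaxes below (`+term` for required terms, and an explicit `AND`) are assumptions standing in for real engines, and the names `translate_query` and `TRANSLATORS` are hypothetical.

```python
# Hypothetical query syntaxes for two component engines; real engines
# each define their own operators.
def to_plus_syntax(terms):
    """Engine A (hypothetical): mark every required term with '+'."""
    return " ".join("+" + term for term in terms)


def to_boolean_syntax(terms):
    """Engine B (hypothetical): join required terms with an explicit AND."""
    return " AND ".join(terms)


TRANSLATORS = {
    "engine_a": to_plus_syntax,
    "engine_b": to_boolean_syntax,
}


def translate_query(terms):
    """Rewrite one conjunctive query for every component engine."""
    return {engine: translate(terms) for engine, translate in TRANSLATORS.items()}


print(translate_query(["web", "search"]))
# {'engine_a': '+web +search', 'engine_b': 'web AND search'}
```

Keeping one translator per engine isolates each engine's quirks, but every operator the metasearcher supports must have an equivalent in every component language, which is part of why translation is listed as a difficulty.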
Metasearch II
• Merging results
  • Voting scheme based on the component search engines’ rankings
    • No model of the component ranking schemes needed
  • Model-based merging
    • Needs an understanding of relative ranking, potentially by query type
• Why metasearch engines are not widely used for the Web
  • Bias towards coverage (i.e., recall), which is not important for most Web queries
  • Merging results is largely ad hoc, so individual search engines tend to do better
• Big application: the Dark Web
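One common voting scheme for merging is the Borda count; the slide does not name a specific scheme, so treat this as one illustrative choice. In this sketch each engine awards a document n - position points (n being that engine's list length), documents an engine did not return get nothing from it, and ties are broken alphabetically so the output is deterministic. The result lists are hypothetical.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge several ranked lists with a Borda count.

    Uses only positions, so no model of how each engine ranks is needed.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            scores[doc] += n - position  # top result gets n points
    # Highest score first; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical results from three component engines.
engine_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(borda_merge(engine_results))  # ['b', 'a', 'c', 'd']
```

Here `b` scores 2 + 3 + 3 = 8 and `a` scores 3 + 2 + 1 = 6. Note that the scheme ignores how confident each engine is, which illustrates why such merging is described as largely ad hoc.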
Using Structure in Search
• Languages to search content and structure
  • Query languages over labeled graphs
  • PHIQL: used in the Microplis and PHIDIAS hypertext systems
  • Web-oriented: W3QL, WebSQL, WebLog, WQL
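The kind of query these languages express, following labeled links and then filtering pages by content, can be sketched over a small in-memory graph. This is not the syntax or semantics of PHIQL, W3QL, WebSQL, WebLog, or WQL; the graph, page texts, and function names are hypothetical.

```python
# Hypothetical labeled web graph: page -> list of (link label, target page).
GRAPH = {
    "home": [("people", "staff"), ("research", "projects")],
    "staff": [("homepage", "alice"), ("homepage", "bob")],
    "projects": [("homepage", "alice")],
    "alice": [],
    "bob": [],
}
# Hypothetical page contents.
CONTENT = {
    "home": "department home page",
    "staff": "list of staff",
    "projects": "research projects",
    "alice": "alice works on hypertext systems",
    "bob": "bob works on databases",
}


def path_query(graph, start, labels):
    """Pages reachable from start by following links with the given labels, in order."""
    frontier = {start}
    for label in labels:
        frontier = {
            target
            for page in frontier
            for edge_label, target in graph.get(page, [])
            if edge_label == label
        }
    return frontier


def structure_and_content_query(graph, content, start, labels, term):
    """Combine a structural path constraint with a content constraint."""
    return sorted(p for p in path_query(graph, start, labels)
                  if term in content.get(p, ""))


print(structure_and_content_query(GRAPH, CONTENT, "home",
                                  ["people", "homepage"], "hypertext"))
# ['alice']
```

The path `people`, then `homepage` selects `alice` and `bob` by structure alone; the content term then narrows the answer to `alice`.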
Using Structure in Search
• Other uses of structure in search
  • Relevant pages have neighbors that also tend to be relevant
  • Search approaches that collect (and filter) the neighbors of returned pages
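The neighbor-collection idea can be sketched as a one-hop expansion: for each returned page, gather the pages it links to and the pages linking to it, and keep only those whose relevance score passes a threshold. The link graph, the scores, the 0.5 threshold, and the function name are assumptions for illustration; a real system would compute scores from page content.

```python
from collections import defaultdict


def expand_with_neighbors(results, links, score, threshold):
    """Append filtered one-hop neighbors (in- and out-links) of result pages."""
    # Invert the link graph so incoming neighbors can be looked up.
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)

    expanded = list(results)
    seen = set(results)
    for page in results:
        neighbors = set(links.get(page, ())) | incoming[page]
        for neighbor in sorted(neighbors):  # sorted for a deterministic order
            if neighbor not in seen and score(neighbor) >= threshold:
                seen.add(neighbor)
                expanded.append(neighbor)
    return expanded


# Hypothetical link graph and relevance scores.
links = {"p1": ["p3", "p4"], "p2": ["p1"], "p5": ["p2"]}
scores = {"p3": 0.8, "p4": 0.2, "p5": 0.6}
print(expand_with_neighbors(["p1", "p2"], links,
                            lambda p: scores.get(p, 0.0), 0.5))
# ['p1', 'p2', 'p3', 'p5']
```

The filter step matters: without it, `p4` would be added simply for being linked from a result, even though its own score is low.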
Web Query Characteristics
• Few terms and operators
  • Average 2.35 terms per query
  • 25% of queries have a single term
  • Average 0.41 operators per query
• Queries get repeated
  • Average 3.97 instances of each query
  • The distribution is very uneven (e.g. “Britney Spears” vs. “Frank Shipman”)
• Query sessions are short
  • Average 2.02 queries per session
  • Average 1.39 pages of results examined
• Data from a 1998 study
  • How different are these figures today?
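Statistics like these come from analyzing a query log. The sketch below computes analogous measures over a tiny made-up log; its numbers are not the 1998 study's. It assumes that only the uppercase keywords `AND`, `OR`, and `NOT` count as operators, that every other whitespace-separated token is a term, and that repeated queries are identified after lowercasing and normalizing whitespace. The study's own definitions may differ.

```python
from collections import Counter

# Assumed operator set for this sketch.
OPERATORS = {"AND", "OR", "NOT"}


def query_log_stats(queries):
    """Summary statistics over a list of raw query strings."""
    if not queries:
        raise ValueError("empty query log")
    n = len(queries)
    term_counts, operator_counts = [], []
    for query in queries:
        tokens = query.split()
        ops = sum(1 for token in tokens if token in OPERATORS)
        operator_counts.append(ops)
        term_counts.append(len(tokens) - ops)
    # Normalize case and whitespace so repeats of the same query are grouped.
    repeats = Counter(" ".join(q.lower().split()) for q in queries)
    return {
        "avg_terms": sum(term_counts) / n,
        "single_term_fraction": sum(1 for c in term_counts if c == 1) / n,
        "avg_operators": sum(operator_counts) / n,
        "avg_instances_per_query": n / len(repeats),
        "most_common": repeats.most_common(1)[0],
    }


# Hypothetical five-query log.
log = ["britney spears", "Britney Spears", "weather",
       "cats AND dogs", "britney spears"]
print(query_log_stats(log))
```

The `most_common` entry shows the unevenness the slide mentions: one query accounts for three of the five log entries.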