Searching the Web II
PowerPoint slideshow about 'Searching the Web II' - patrick-barrett



The Web

  • Why is it important:

    • “Free” ubiquitous information resource

    • Broad coverage of topics and perspectives

    • Becoming dominant information collection

    • Growth and jobs

  • Web access methods

    • Search (e.g. Google)

    • Directories (e.g. Yahoo!)

    • Other …


Web Characteristics

  • Distributed data

  • High volatility

  • Large volume

  • Unstructured data

  • Quality of data

  • Heterogeneous data


Web Tasks

  • Precision is the key

    • Goal: first 10-100 results should satisfy user

    • Requires ranking that matches user’s need

    • Recall is not important

      • Completeness of index is not important

      • Comprehensive crawling is not important
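
The slide's claim that precision in the first results matters more than recall can be made concrete with the two standard measures. The sketch below computes precision at a cutoff k and plain recall; the document IDs and relevance judgments in the usage lines are made-up toy values.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked results that are relevant (precision@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall(ranked, relevant):
    """Fraction of all relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    retrieved = set(ranked)
    return sum(1 for doc in relevant if doc in retrieved) / len(relevant)


# Toy example: 2 of the top 4 results are relevant, but 3 relevant docs are missed.
ranked = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d4", "d5", "d6"}
p4 = precision_at_k(ranked, relevant, k=4)
r = recall(ranked, relevant)
```

Here p4 is 0.5 while r is only 0.4. An engine judged on its first page of results is effectively optimizing precision_at_k with a small k, and can tolerate the low recall.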


Browsing

  • Web directories

    • Human-organized taxonomies of Web sites

    • Small portion (< 1%) of Web pages

      • Remember that recall (completeness) is not important

      • Directories point to logical web sites rather than pages

    • Directory search returns both categories and sites

    • People generally browse rather than search once they identify categories of interest
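
A directory search that returns both categories and sites can be sketched over a small in-memory taxonomy. The Category class, the case-insensitive substring matching on names, and the sample taxonomy are hypothetical simplifications; a real directory would also match on the site titles and descriptions written by its editors.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a human-organized taxonomy (hypothetical structure)."""
    name: str
    sites: list = field(default_factory=list)     # titles of listed sites
    children: list = field(default_factory=list)  # sub-categories


def directory_search(root, term):
    """Return (matching category paths, matching site titles) for a term."""
    term = term.lower()
    categories, sites = [], []
    stack = [(root, root.name)]  # depth-first walk, carrying the category path
    while stack:
        node, path = stack.pop()
        if term in node.name.lower():
            categories.append(path)
        sites.extend(s for s in node.sites if term in s.lower())
        for child in node.children:
            stack.append((child, f"{path}/{child.name}"))
    return categories, sites


directory = Category("Top", children=[
    Category("Science", children=[
        Category("Computer Science", sites=["Computer History Museum"]),
    ]),
    Category("Arts", sites=["Computer Art Gallery"]),
])
cats, found = directory_search(directory, "computer")
```

The returned category path ("Top/Science/Computer Science") is the entry point for browsing: once users find it, they navigate the taxonomy instead of issuing more searches.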


Metasearch

  • Search a number of search engines

  • Advantages

    • Do not need to build their own crawler and index

    • Cover more of the Web than any of their component search engines

  • Difficulties

    • Need to translate the query into each engine's query language

    • Need to merge results into a meaningful ranking
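
The translation difficulty can be illustrated with two hypothetical engines that express the same conjunctive query differently: one marks required terms with a leading +, the other joins terms with an explicit AND. The engine names and both syntaxes are assumptions for illustration, not the query languages of any particular search engine.

```python
def _quote(term):
    """Wrap multi-word phrases in double quotes."""
    return f'"{term}"' if " " in term else term


def to_plus_syntax(terms):
    # Hypothetical engine A: every required term is prefixed with '+'.
    return " ".join("+" + _quote(t) for t in terms)


def to_and_syntax(terms):
    # Hypothetical engine B: required terms are joined with an explicit AND.
    return " AND ".join(_quote(t) for t in terms)


TRANSLATORS = {"engine_a": to_plus_syntax, "engine_b": to_and_syntax}


def translate(terms, engines):
    """Rewrite one conjunctive query (a list of terms) for each target engine."""
    return {name: TRANSLATORS[name](terms) for name in engines}


queries = translate(["dark web", "crawler"], ["engine_a", "engine_b"])
```

Each new component engine needs its own translator, and an operator that some engine lacks has no direct translation for it.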


Metasearch II

  • Merging Results

    • Voting scheme based on component search engines

      • No model of component ranking schemes needed

    • Model-based merging

      • Need understanding of relative ranking, potentially by query type

  • Why they are not used for the Web

    • Bias towards coverage (i.e. recall), which is not important for most Web queries

    • Merging results is largely ad hoc, so individual search engines tend to do better

  • Big application: the Dark Web
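
A voting scheme needs only the order in which each component engine returns results, not a model of how it scores pages. One well-known scheme is the Borda count, sketched below; the slide does not say which voting scheme it has in mind, so treat this as one illustrative choice. In a list of n results the top item earns n points, the next n - 1, and so on; breaking ties alphabetically is an arbitrary assumption.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge several ranked result lists with a Borda count.

    In a list of n results, the item at 0-based position i earns n - i points;
    an item's total is the sum over all lists. Ties are broken by item name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, doc in enumerate(ranking):
            scores[doc] += n - i
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Three hypothetical engines returning overlapping results.
merged = borda_merge([
    ["x", "y", "z"],
    ["y", "x", "w"],
    ["y", "z", "x"],
])
```

Pages returned by several engines accumulate points from each list, so the merged order rewards agreement between engines: here y (8 points) beats x (6), z (3), and w (1).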


Using Structure in Search

  • Languages to search content and structure

    • Query languages over labeled graphs

      • PHIQL: used in the Microplis and PHIDIAS hypertext systems

      • Web-oriented: W3QL, WebSQL, WebLog, WQL
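
Query languages over labeled graphs combine a condition on page content with a condition on the link paths used to reach a page. The sketch below is a toy evaluator in that spirit, not the syntax or semantics of PHIQL, W3QL, WebSQL, WebLog, or WQL: it follows only links whose label is in an allowed set, up to a hop limit, and returns reached pages whose text contains a term. The "local"/"global" link labels and the sample site are assumptions for illustration.

```python
from collections import deque


def path_query(links, text, start, allowed_labels, max_hops, term):
    """Breadth-first search over labeled links.

    links: page -> list of (label, target) pairs
    text:  page -> page content
    Returns pages within max_hops of start, reached only via links whose label
    is in allowed_labels, whose text contains term (case-insensitive).
    """
    term = term.lower()
    seen = {start}
    frontier = deque([(start, 0)])
    matches = []
    while frontier:
        page, hops = frontier.popleft()
        if term in text.get(page, "").lower():
            matches.append(page)
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for label, target in links.get(page, []):
            if label in allowed_labels and target not in seen:
                seen.add(target)
                frontier.append((target, hops + 1))
    return matches


links = {
    "home": [("local", "courses"), ("global", "news")],
    "courses": [("local", "ir"), ("global", "engine")],
}
text = {
    "home": "Department home page",
    "courses": "Course list",
    "ir": "Information retrieval and web search",
    "news": "Web news",
    "engine": "A web search engine",
}
local_only = path_query(links, text, "home", {"local"}, 2, "web")
any_link = path_query(links, text, "home", {"local", "global"}, 2, "web")
```

Restricting the path to local links confines the answer to the starting site (only "ir"), while allowing global links also reaches "news" and "engine".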


Using Structure in Search

    • Other uses of structure in search

    • Relevant pages have neighbors that also tend to be relevant

    • Search approaches that collect (and filter) neighbors of returned pages
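
The observation that relevant pages tend to have relevant neighbors suggests expanding a result set along links and then filtering it. A minimal sketch, assuming link maps in both directions and some page-scoring function (here a hypothetical precomputed score table); the expansion step loosely resembles how Kleinberg's HITS algorithm grows its root set into a base set before running link analysis.

```python
def expand_with_neighbors(results, out_links, in_links, score, threshold):
    """Add pages that link to, or are linked from, any result page,
    keeping only neighbors whose score is at least threshold."""
    expanded = list(results)
    seen = set(results)
    for page in results:
        neighbors = out_links.get(page, []) + in_links.get(page, [])
        for nb in neighbors:
            if nb not in seen and score(nb) >= threshold:
                seen.add(nb)
                expanded.append(nb)
    return expanded


# Hypothetical relevance scores for candidate neighbor pages.
scores = {"b": 0.8, "c": 0.1, "d": 0.5}
expanded = expand_with_neighbors(
    results=["a"],
    out_links={"a": ["b", "c"]},
    in_links={"a": ["d"]},
    score=lambda page: scores.get(page, 0.0),
    threshold=0.5,
)
```

The filter matters: without it every neighbor, including the low-scoring "c", would be added, and a single off-topic link could pull in unrelated pages.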


Web Query Characteristics

  • Few terms and operators

    • Average 2.35 terms per query

      • 25% of queries have a single term

    • Average 0.41 operators per query

  • Queries get repeated

    • Average 3.97 instances of each query

    • This is very uneven (e.g. “Britney Spears” vs. “Frank Shipman”)

  • Query sessions are short

    • Average 2.02 queries per session

    • Average 1.39 pages of results examined

  • Data from a 1998 study

    • How different today?
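
Statistics like these are simple aggregates over a search engine's query log. The sketch below computes the same kinds of averages from a list of (session ID, query string) pairs. The log format, the treatment of AND/OR/NOT as the only operators, case-insensitive matching of repeated queries, and the six-entry sample log are all assumptions, and the output says nothing about the 1998 figures.

```python
from collections import Counter
from statistics import mean

OPERATORS = {"AND", "OR", "NOT"}  # assumption: which tokens count as operators


def log_stats(log):
    """Summarize a query log given as (session_id, query_string) pairs."""
    term_counts, op_counts = [], []
    for _, query in log:
        tokens = query.split()
        ops = sum(token in OPERATORS for token in tokens)
        op_counts.append(ops)
        term_counts.append(len(tokens) - ops)
    instances = Counter(query.lower() for _, query in log)  # repeats of each distinct query
    per_session = Counter(session for session, _ in log)    # queries in each session
    return {
        "avg_terms": mean(term_counts),
        "single_term_share": sum(c == 1 for c in term_counts) / len(term_counts),
        "avg_operators": mean(op_counts),
        "avg_instances_per_query": mean(list(instances.values())),
        "avg_queries_per_session": mean(list(per_session.values())),
    }


toy_log = [
    (1, "britney spears"),
    (1, "britney spears pictures"),
    (2, "britney spears"),
    (3, "frank shipman"),
    (3, "hypertext AND query"),
    (4, "weather"),
]
stats = log_stats(toy_log)
```

On a real log, the per-query counts behind avg_instances_per_query would be heavily skewed, which is the unevenness the slide points out.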

