
Chapter 2 : The Web and the Problem of Search

The size of the web, and how it is measured. Search engine usage statistics. The bow-tie structure of the web. The small-world web. Web information seeking strategies. A taxonomy of web searches. Web search versus Information Retrieval.

roseannec

Presentation Transcript


  1. Chapter 2 : The Web and the Problem of Search
     • The size of the web, and how it is measured.
     • Search engine usage statistics.
     • The bow-tie structure of the web.
     • The small-world web.
     • Web information seeking strategies.
     • A taxonomy of web searches.
     • Web search versus Information Retrieval.
     • Differences between global and local search.
     • Differences between search and navigation.

  2. Web size statistics
     • Number of accessible web pages: 11.5 billion (latest estimate, May 2005).
     • The deep (or hidden or invisible) web contains 400-550 times more information.
     • Coverage (i.e. the proportion of the web indexed) is crucial for search engines.

  3. Measuring the size of the web
     • Capture-recapture method
       • SE1 is the number of pages indexed by the first search engine.
       • QSE2 is the number of pages returned by the second search engine for typical queries.
       • OVR is the number of pages returned by both search engines for typical queries.
       • Estimate = (SE1 x QSE2)/OVR
     • Estimate of 64.81 million web sites as of June 2005.
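
The capture-recapture formula can be read as follows: OVR/QSE2 approximates the fraction of the web that the first engine covers, so dividing SE1 by that fraction gives an estimate of the whole. The sketch below is only an illustration of the arithmetic; the function name and the sample numbers are hypothetical and are not taken from the slides.

```python
def estimate_web_size(se1: int, qse2: int, ovr: int) -> float:
    """Capture-recapture estimate of web size: (SE1 x QSE2) / OVR.

    se1  -- pages indexed by the first search engine
    qse2 -- pages returned by the second engine for the sample queries
    ovr  -- pages returned by both engines for the same queries
    """
    if ovr <= 0:
        # With no overlap the ratio is undefined, so no estimate is possible.
        raise ValueError("OVR must be positive")
    return se1 * qse2 / ovr


# Hypothetical inputs, chosen only to show how the formula is applied.
estimate = estimate_web_size(se1=8_000_000_000, qse2=1_000, ovr=700)
print(f"Estimated size: {estimate:.3e} pages")
```

A smaller overlap between the two engines yields a larger estimate, because it suggests that each engine sees only a small part of the web.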

  4. Web usage statistics
     • Over 10% of the world’s population was online as of late 2004.
     • The number of broadband users is growing (over 50% of connected Americans use broadband).
     • Search engine usage as of June 2004:
       • Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%)
     • 200 million hits per day to Google (mid 2004).

  5. Tabular Data versus Web Data
     Figure 2.1: A database table versus a web site

  6. Structure of the web
     Figure 2.2: Map of the Internet (1998)

  7. Structure of the web
     Figure 2.3: Web pages related to dcs.bbk.ac.uk (see www.touchgraph.com)

  8. Structure of the web
     Figure 2.4: Bow-tie shape of the web
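
The slide shows only the figure, so the following is a hedged sketch of how a bow-tie decomposition might be computed on a small link graph. It assumes the usual description of the model: a strongly connected core, an IN set of pages that can reach the core, an OUT set reachable from the core, and everything else. The toy graph, the function names, and the choice of seed page are all hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return a copy of the graph with every link direction flipped."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph, seed):
    """Split nodes into core, in, out and other, relative to the seed's core."""
    forward = reachable(graph, seed)             # pages the seed can reach
    backward = reachable(reverse(graph), seed)   # pages that can reach the seed
    core = forward & backward                    # mutually reachable with the seed
    all_nodes = set(graph) | {d for ts in graph.values() for d in ts}
    return {
        "core": core,
        "in": backward - core,
        "out": forward - core,
        "other": all_nodes - forward - backward,
    }


# Hypothetical five-page web: a -> b <-> c -> d, with e isolated.
toy_web = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": [], "e": []}
parts = bow_tie(toy_web, "b")
for name in ("core", "in", "out", "other"):
    print(name, sorted(parts[name]))
```

The sketch finds only the strongly connected component that contains the seed page. A study of the real web would first have to identify the largest such component.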

  9. The small-world web
     • Over 75% of the time there is no directed path from one random web page to another.
     • When a directed path exists its average length is 16 clicks.
     • When an undirected path exists its average length is 7 clicks.
     • Short average path between pairs of nodes is characteristic of a small-world network.
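
The two kinds of statistic on this slide (the share of page pairs with no path, and the average path length where one exists) can be illustrated with breadth-first search on a toy graph. This is a minimal sketch under stated assumptions: the four-page graph and all function names are hypothetical, and the resulting numbers have nothing to do with the figures measured on the real web.

```python
from collections import deque
from itertools import permutations


def bfs_distances(graph, start):
    """Shortest click distance from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def path_stats(graph):
    """Return (fraction of ordered pairs with no path, average path length)."""
    nodes = sorted(set(graph) | {d for ts in graph.values() for d in ts})
    total_pairs = len(nodes) * (len(nodes) - 1)
    if total_pairs == 0:
        return 0.0, float("nan")
    dist = {n: bfs_distances(graph, n) for n in nodes}
    lengths = [dist[s][t] for s, t in permutations(nodes, 2) if t in dist[s]]
    no_path = 1 - len(lengths) / total_pairs
    average = sum(lengths) / len(lengths) if lengths else float("nan")
    return no_path, average


def undirected(graph):
    """Treat every link as traversable in both directions."""
    und = {}
    for src, targets in graph.items():
        und.setdefault(src, set())
        for dst in targets:
            und[src].add(dst)
            und.setdefault(dst, set()).add(src)
    return und


# Hypothetical link graph: a -> b -> c <- d
toy = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
directed_no_path, directed_avg = path_stats(toy)
undirected_no_path, undirected_avg = path_stats(undirected(toy))
print(directed_no_path, directed_avg)
print(undirected_no_path, undirected_avg)
```

Ignoring link direction can only add paths, so the share of unconnected pairs never rises. On this toy graph the undirected average is the larger of the two, because it also counts the longer routes that become available.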

  10. Web information seeking strategies
      • Direct navigation
        • Enter the URL directly into the browser.
      • Navigation within a directory
        • Use a web portal as an entry point to the web.
      • Information seeking on the web is problematic and more users are turning to search engines.

  11. Navigation using a search engine
      Figure 2.5: Information seeking

  12. A taxonomy of web searches
      • Informational – acquire some information about a topic from web pages.
      • Navigational – find a site to start navigation from.
      • Transactional – perform some activity mediated by a web site.

  13. Web search versus Information Retrieval
      • The scale of web search is far beyond that of traditional information retrieval.
      • The web is very dynamic.
      • The web contains an enormous amount of duplication.
      • The quality of web pages is not uniform.
      • The range of topics on the web is open.
      • The web is globally distributed.
      • Users' typical habits are different (short queries, inspect only top-10 pages).
      • The web is hypertextual.

  14. Information retrieval evaluation
      Figure 2.6: Recall versus precision
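
The figure itself is not reproduced in the transcript. As a reminder of the two measures it compares, here is a small sketch using the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The document identifiers and the function name are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved -- documents the search engine returned
    relevant  -- documents that are actually relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant overall, 2 in common.
returned = ["d1", "d2", "d3", "d4"]
truly_relevant = ["d1", "d3", "d7", "d8", "d9"]
precision, recall = precision_recall(returned, truly_relevant)
print(precision, recall)  # 0.5 0.4
```

Returning more documents tends to raise recall at the cost of precision, which is the trade-off a recall-versus-precision plot is meant to show.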

  15. Differences between global and local search
      • Local search engines on web sites have a bad reputation.
      • Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the local web site search engine.
      • Many companies do not invest in local search.
      • Content management is a problem.
      • Language may be a problem.
      • Information needs on web sites may be different.

  16. Differences between search and navigation
      • Search – employing a search engine to find information.
      • Navigation (or surfing) – employing a link-following strategy to find information.
      • The web encourages a combination of search, navigation and browsing.
