Getting to Know the Web

  • How big is the web and how do you measure it?

  • How many people use the web?

  • How many use search engines?

  • What is the shape of the web?

  • How hard is it to go from one page to another?

  • How do people search for information?

  • Can we categorize web searchers?

  • Differences between web search and information retrieval.

  • Differences between global and local search.

  • Differences between search and navigation.


How big is the web?

  • Number of accessible web pages: the May 2005 estimate was 11.5 billion pages. Most recent estimates? ________

  • The deep (or hidden or invisible) web “contains 400-550 times more information”. (Are they serious?)

  • Coverage (i.e. the proportion of the web indexed) is crucial for search engines. Today, ____________ pages are indexed.


How do you measure the size of the web?

  • Capture-recapture method

    • SE1 = # of pages indexed by search engine 1.

    • QSE2 = # of pages returned by search engine 2 for typical queries.

    • OVR = # of pages returned by both search engines for typical queries.

  • Estimate: SE1 / WWW = OVR / QSE2  =>  WWW = (SE1 × QSE2) / OVR

[Figure: diagram of the sets WWW, SE1 and QSE2, with OVR marking the overlap]

Lawrence & Giles: Searching the WWW
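
The estimate above can be sketched in a few lines of Python. This is only an illustration of the formula WWW = (SE1 × QSE2) / OVR. The function name `estimate_web_size` and the numbers passed to it are hypothetical and are not taken from Lawrence & Giles.

```python
def estimate_web_size(se1: int, qse2: int, ovr: int) -> float:
    """Capture-recapture estimate of the web size: WWW = (SE1 * QSE2) / OVR.

    se1  -- number of pages indexed by search engine 1
    qse2 -- number of pages returned by search engine 2 for the test queries
    ovr  -- number of those pages that search engine 1 also returns
    """
    if ovr <= 0:
        raise ValueError("the overlap must be positive to form an estimate")
    if ovr > qse2:
        raise ValueError("the overlap cannot exceed the pages returned by engine 2")
    return se1 * qse2 / ovr


# Hypothetical numbers, chosen only to make the arithmetic easy to follow:
# engine 1 indexes 1000 pages, engine 2 returns 200, and 50 of those overlap.
estimate = estimate_web_size(se1=1000, qse2=200, ovr=50)
print(estimate)  # 4000.0
```

The smaller the overlap relative to QSE2, the larger the estimate, which is why a reliable overlap count matters so much for this method.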


Relative Size from Overlap

[Figure: two overlapping sets A and B, with the intersection labelled A ∩ B]

Sample URLs randomly from A

Check if contained in B

and vice versa

A ∩ B = (1/2) * Size A

A ∩ B = (1/6) * Size B

(1/2) * Size A = (1/6) * Size B

∴ Size A / Size B = (1/6) / (1/2) = 1/3

Each test involves: (i) Sampling (ii) Checking (Assume for now that we can do them reliably)
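
A minimal Python sketch of the sample-and-check procedure follows, assuming each index can be treated as a plain set of URLs that we can sample from uniformly (exactly the assumption the slide asks us to accept for now). The function names and the synthetic sets `a` and `b` are hypothetical; they are sized so that the true overlap matches the 1/2 and 1/6 fractions used above.

```python
import random


def overlap_fraction(source, other, sample_size, rng):
    """Sample URLs from `source` and return the fraction also found in `other`."""
    population = sorted(source)            # fixed order so a seeded run is repeatable
    k = min(sample_size, len(population))  # never ask for more URLs than exist
    sample = rng.sample(population, k)
    return sum(url in other for url in sample) / k


def relative_size(a, b, sample_size, rng):
    """Estimate Size A / Size B from the two overlap fractions."""
    frac_a_in_b = overlap_fraction(a, b, sample_size, rng)  # approx. |A ∩ B| / Size A
    frac_b_in_a = overlap_fraction(b, a, sample_size, rng)  # approx. |A ∩ B| / Size B
    if frac_a_in_b == 0:
        raise ValueError("no overlap observed; the ratio cannot be estimated")
    return frac_b_in_a / frac_a_in_b


# Synthetic indexes: Size A = 2000, Size B = 6000, overlap = 1000,
# so the overlap is 1/2 of A and 1/6 of B, and Size A / Size B = 1/3.
a = set(range(0, 2000))
b = set(range(1000, 7000))
ratio = relative_size(a, b, sample_size=500, rng=random.Random(0))
print(ratio)  # a sampled estimate; it should land near 1/3
```

With a sample as large as the sets themselves the estimate is exact; with smaller samples it fluctuates around 1/3, which is why both the sampling and the checking step need to be reliable.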


How many people use the web? Search engines?

  • Over 10% of the world’s population were online as of 2004. Today? ________

  • Number of broadband users is growing (over 50% of connected Americans use broadband).

  • Search engine share as of June 2004:

    • Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%). Today? _______

  • 200 million hits per day to Google (mid 2004). Today? ___


What is the shape of the web?

“Map of the Internet” (1998)


Consider Web sites

Look at paths and strongly connected components


What is the shape of the web?

Bow-tie shape of the web

Broder et al.: Graph structure in the web (2000)
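
To make the bow-tie picture concrete, here is a small Python sketch that splits a toy link graph into a core (the largest strongly connected component), an IN part (pages that can reach the core), an OUT part (pages reachable from the core) and everything else. The graph, the page names and the function names are hypothetical, and the quadratic search for the core is chosen for readability on tiny graphs; it is not the procedure used by Broder et al.

```python
from collections import deque


def reachable(graph, start):
    """Set of pages reachable from `start` by following directed links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the same graph with every link turned around."""
    rev = {}
    for node, targets in graph.items():
        rev.setdefault(node, [])
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev


def bow_tie(graph):
    """Split pages into CORE, IN, OUT and OTHER (tendrils, disconnected pages)."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    rev = reverse(graph)
    core, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        # A page's strongly connected component is everything it can reach
        # that can also reach it back.
        scc = reachable(graph, node) & reachable(rev, node)
        assigned |= scc
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    out_part = reachable(graph, seed) - core
    in_part = reachable(rev, seed) - core
    other = nodes - core - in_part - out_part
    return {"CORE": core, "IN": in_part, "OUT": out_part, "OTHER": other}


# Hypothetical toy web: a, b, c link in a cycle and form the core.
web = {
    "a": ["b"], "b": ["c"], "c": ["a", "o1"],
    "i1": ["a", "t1"], "i2": ["i1"],
    "o1": ["o2"],
    "d1": ["d2"],
}
parts = bow_tie(web)
for name in ("CORE", "IN", "OUT", "OTHER"):
    print(name, sorted(parts[name]))
```

In this toy graph `t1` hangs off the IN part without reaching the core, and `d1`, `d2` are disconnected from it, so all three land in OTHER.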


But Why is it a Bowtie?

  • Maybe it is a teapot? A daisy? A cauliflower?

  • It is a collection of Bowties, because it could not be anything else

  • Proof by construction


Bowtie Web: Proof by Construction

  • Start by considering one link per page

  • Pseudo-trees appear
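
The first step of the construction can be checked on a toy example. The sketch below assumes that "pseudo-tree" means a group of pages whose single links all lead into one shared cycle, which is what necessarily happens when every page has exactly one outgoing link. The page names and function names are hypothetical.

```python
def find_cycle(link, start):
    """Follow the single link out of each page until a page repeats.

    Returns the pages on the cycle that `start` eventually enters.
    """
    position = {}
    node = start
    while node not in position:
        position[node] = len(position)
        node = link[node]
    first = position[node]  # index at which the cycle was entered
    return frozenset(page for page, index in position.items() if index >= first)


def pseudo_trees(link):
    """Group pages by the cycle they end up in: one group per pseudo-tree."""
    groups = {}
    for page in link:
        groups.setdefault(find_cycle(link, page), set()).add(page)
    return groups


# Hypothetical web in which every page has exactly one outgoing link.
link = {
    "a": "b", "b": "c", "c": "a",   # a cycle of three pages
    "d": "a", "e": "d",             # a chain feeding into that cycle
    "x": "y", "y": "x", "z": "y",   # a second, separate pseudo-tree
}
for cycle, pages in pseudo_trees(link).items():
    print(sorted(cycle), "<-", sorted(pages))
```

Each group contains exactly one cycle with chains of pages feeding into it, which is the shape the construction starts from before a second link per page is added.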


The second link creates a Bowtie


How hard is it to go from one page to another?

  • Over 75% of the time there is no directed path from one random web page to another.

  • When a directed path exists its average length is 16 clicks.

  • When an undirected path exists its average length is 7 clicks.

  • Short average path between pairs of nodes is characteristic of a small-world network.

Kleinberg: The small-world phenomenon (we will revisit later)
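
The difference between directed and undirected paths can be seen with a breadth-first search on a toy graph. This is a sketch only: the three-link graph `web` and the function names are hypothetical, and the numbers it prints say nothing about the real averages quoted above.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Clicks on a shortest path from `source` to `target`, or None if no path exists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def undirected(graph):
    """Treat every link as usable in both directions."""
    both = {}
    for node, targets in graph.items():
        both.setdefault(node, set())
        for target in targets:
            both[node].add(target)
            both.setdefault(target, set()).add(node)
    return both


# Hypothetical toy web: a -> b -> c, and d -> c.
web = {"a": ["b"], "b": ["c"], "d": ["c"]}

print(shortest_path_length(web, "a", "c"))              # 2
print(shortest_path_length(web, "a", "d"))              # None: no directed path
print(shortest_path_length(undirected(web), "a", "d"))  # 3
```

Page `d` links to `c` but nothing links to `d`, so it can only be reached from `a` once links may be followed backwards, which mirrors why undirected paths exist more often than directed ones.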


How do people search for information?

  • Direct navigation

    • Enter the URL directly into the browser.

  • Navigation within a directory

    • Use a web portal as an entry point to the web.

  • Information seeking on the web is problematic and more users are turning to search engines.

Broder: A taxonomy of web search


Can we categorize web searchers?

Broder: A taxonomy of web search

  • Informational ____ %

    • acquire some information about a topic from web pages.

  • Navigational ____ %

    • find a site to start navigation from.

  • Transactional ____ %

    • perform some activity mediated by a web site.

Think of your own searches. Do you agree?

How did Broder find these categories?

How did he measure the percentages?


Web search vs. information retrieval

  • The scale of web search is way beyond traditional information retrieval.

  • The web is very dynamic.

  • The web contains an enormous amount of duplication.

  • The quality of web pages is not uniform.

  • The range of topics on the web is open.

  • The web is globally distributed.

  • Users' typical habits are different (short queries, inspecting only the top-10 pages).

  • The web is hypertextual.


Differences between global and local search

  • Local search engines on web sites have a bad reputation.

  • Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the local web site search engine.

  • Many companies do not invest in local search.

  • Content management is a problem.

  • Language may be a problem.

  • Information needs on web sites may be different.


Differences between search and navigation

  • Search –

    • employing a search engine to find information.

  • Navigation (or surfing) –

    • employing a link-following strategy to find information.

  • The web encourages a combination of search, navigation and browsing.
