CS155b: E-Commerce. Lecture 15: March 6, 2003 Web Searching and Google. Finding Information on the Internet. The Internet is so successful partly because it is so easy to publish information on the World Wide Web.
There is still ongoing research to find better
ways to solve these problems!
(Not Like Call Graphs)
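$x_p \ge 0$, $p \in T$: non-negative “Authority Weights”
$y_p \ge 0$, $p \in T$: non-negative “Hub Weights”
I operation: Update Authority Weights
O operation: Update Hub Weights
Normalize: $\|X\|_2 = \|Y\|_2 = 1$
$X \leftarrow Y \leftarrow Z$
Repeat until Convergence
Apply I /* Update Authority weights */
Apply O /* Update Hub Weights */
Return Limit $(X^*, Y^*)$
$A$ = $n \times n$ “Adjacency Matrix”
Rewrite I and O:
$X \leftarrow A^T Y$ ; $Y \leftarrow A X$
$X_i = (A^T A)^{i-1} A^T Z$ ; $Y_i = (A A^T)^i Z$
$A A^T$ Symm., Non-negative and $Z = (1, 1, \ldots, 1)$
$X^* = \lim_{i \to \infty} X_i = \omega_1(A^T A)$ (principal eigenvector of $A^T A$)
$Y^* = \lim_{i \to \infty} Y_i = \omega_1(A A^T)$ (principal eigenvector of $A A^T$)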
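The hub/authority iteration on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `hits` and `normalize`, the edge-list input, and the `tol` / `max_iter` stopping rule are not from the lecture, and an edge `(i, j)` is assumed to mean "page i links to page j".

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def hits(n, links, tol=1e-10, max_iter=1000):
    """Return (authority, hub) weight lists for pages 0..n-1.

    links: list of (i, j) pairs, assumed to mean page i links to page j.
    tol, max_iter: hypothetical stopping rule standing in for "until convergence".
    """
    x = [1.0] * n  # authority weights, started from Z = (1, ..., 1)
    y = [1.0] * n  # hub weights, started from Z = (1, ..., 1)
    for _ in range(max_iter):
        # I operation: authority of j = sum of hub weights of pages linking to j
        new_x = [0.0] * n
        for i, j in links:
            new_x[j] += y[i]
        # O operation: hub weight of i = sum of authority weights of pages i links to
        new_y = [0.0] * n
        for i, j in links:
            new_y[i] += new_x[j]
        new_x, new_y = normalize(new_x), normalize(new_y)
        change = max((abs(a - b) for a, b in zip(x + y, new_x + new_y)), default=0.0)
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y
```

For example, with pages 0 and 1 both linking to page 2 and page 0 also linking to page 3, `hits(4, [(0, 2), (1, 2), (0, 3)])` gives page 2 the largest authority weight and page 0 the largest hub weight.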
$q \rightarrow$ Search Engine $\rightarrow$ Root Set $S$, $|S| < k$
Base Set $T$:
(In $S$, $S \rightarrow$, $\rightarrow S$) and $< d$ links/page
Remove “Internal Links”
Run Core Algorithm on T
From Result (X,Y), Select
C pages with max X* values
C pages with max Y* values
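The set-up and selection steps around the core algorithm might look roughly like the sketch below. Everything named here is hypothetical (`build_base_set`, `drop_internal_links`, `top_c`, and the dictionary inputs). Two interpretations are assumptions rather than statements from the slide: the bound d is read as "keep at most d in-linking pages per root page", and an "internal link" is read as a link between two URLs on the same host.

```python
from urllib.parse import urlparse

def build_base_set(root, out_links, in_links, d):
    """Grow the root set S into a base set T.

    root: pages returned by the search engine (the set S).
    out_links / in_links: dicts mapping a page to the pages it links to / is linked from.
    d: assumed cap on how many in-linking pages are kept per root page.
    """
    base = set(root)
    for page in root:
        base.update(out_links.get(page, []))      # pages that S points to
        base.update(in_links.get(page, [])[:d])   # at most d pages pointing to S
    return base

def drop_internal_links(links):
    """Keep only links whose endpoints are on different hosts (assumed meaning of 'internal')."""
    return [(u, v) for u, v in links if urlparse(u).netloc != urlparse(v).netloc]

def top_c(weights, c):
    """Return the c pages with the largest weight (authority or hub)."""
    return sorted(weights, key=weights.get, reverse=True)[:c]
```

Calling `top_c` once on the authority weights and once on the hub weights would give the two result lists the slide describes.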
q = censorship + net
q = Gates
[Compares well with Yahoo!, Galaxy, etc.]
Objective Performance Criteria
Dependence on Search Engine
Nondeterministic Choice of S and T
Scalable Search Services:
* Unlike with other search engines, businesses cannot pay to modify PageRank results. (Note that Google employees sometimes can, but only in special cases such as hiding sensitive data by special request.)
1. The user enters a query on a web form sent to the Google web server.
2. The web server sends the query to the Index Server cluster, which matches the query to documents.
3. The match is sent to the Doc Server cluster, which retrieves the documents to generate abstracts and cached copies.
4. The list, with abstracts, is displayed by the web server to the user, sorted (using a secret formula involving PageRank).
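The division of labor among the web server, the index servers, and the doc servers can be mimicked with a toy, single-process sketch. Nothing here reflects Google's actual code: `DOCS`, `RANK`, and the three function names are invented for illustration, the index lookup is a naive word match, and the real ranking formula is secret, so a fixed placeholder score per document stands in for it.

```python
# Hypothetical in-memory corpus and placeholder ranking scores.
DOCS = {
    1: "google ranks pages with pagerank",
    2: "web search engines index pages",
    3: "cooking pasta at home",
}
RANK = {1: 0.6, 2: 0.3, 3: 0.1}  # stand-in for the secret ranking formula

def index_server(query):
    """Match the query to documents: ids of docs containing every query word."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in DOCS.items()
            if all(term in text.split() for term in terms)]

def doc_server(doc_ids, width=40):
    """Retrieve the matched documents and cut a short abstract from each."""
    return {doc_id: DOCS[doc_id][:width] for doc_id in doc_ids}

def web_server(query):
    """Receive the query, consult both clusters, return the ranked result list."""
    matches = index_server(query)
    abstracts = doc_server(matches)
    ordered = sorted(matches, key=lambda doc_id: RANK[doc_id], reverse=True)
    return [(doc_id, abstracts[doc_id]) for doc_id in ordered]
```

Here `web_server("pages")` returns documents 1 and 2, in that order, because document 1 has the higher placeholder score.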
Google’s Zeitgeist has interesting statistics about
people’s searches by logging the search queries!
Origin of Google searches
by country (October 2001)
Languages used to search Google (March 2001 – January 2003)
Top Ten Brand