
Authors: Brin S. and Page L. Presented By: Shiliang Xue

The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: 7th International WWW Conference (1998).


Presentation Transcript


  1. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: 7th International WWW Conference (1998). Authors: Brin S. and Page L. Presented By: Shiliang Xue

  2. Introduction Google is designed to scale well to extremely large data sets. • Fast crawling technology • Storage space must be used efficiently • Efficient indexing system • Queries must be handled quickly

  3. System Features
     1. Google makes use of the link structure of the Web to calculate a quality ranking (PageRank) for each web page: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 ... Tn are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor between 0 and 1.
     2. Google utilizes link (anchor) text to improve search results.
        • Associate the text of a link with the page that it is on
        • Also associate the text of a link with the page that it points to
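The PageRank formula on this slide can be evaluated by simple repeated substitution. The following is a minimal sketch, not the authors' implementation: the function name `pagerank`, the dictionary-based graph representation, the default `d=0.85`, and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    """
    # Every page that appears as a source or as a target gets a rank.
    pages = set(links) | {t for targets in links.values() for t in targets}

    # C(T): number of distinct outgoing links of page T.
    out_count = {p: len(set(links.get(p, ()))) for p in pages}

    # For each page A, the pages T1..Tn that link to it.
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)

    # Start from a uniform rank of 1.0 and apply the formula repeatedly.
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[a])
            for a in pages
        }
    return pr


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 3))
```

In this toy graph page C is linked from both A and B, so it ends up with the highest rank; page B, which receives only half of A's vote, ends up with the lowest.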

  4. System Features Aside from PageRank and the use of anchor text, Google has several other features. • Location information for hits • Visual presentation details • Full raw HTML of pages is available in a repository

  5. System Anatomy Architecture Overview • Crawler • URLserver • Storeserver • Indexer • URLresolver • Sorter • Searcher

  6. System Anatomy Major Data Structures • BigFiles • Repository • Document Index • Lexicon • Hit Lists • Forward Index • Inverted Index
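To make the relationship between the lexicon, hit lists, forward index, and inverted index concrete, here is a small sketch. It is an illustration under simplifying assumptions, not the layout used by Google: the function names `build_forward_index` and `invert` are hypothetical, words are split on whitespace, a "hit" is reduced to a word position, and plain dictionaries stand in for barrels on disk.

```python
from collections import defaultdict


def build_forward_index(docs, lexicon):
    """Map each docID to {wordID: [positions]} and fill the lexicon on the way.

    docs:    dict docID -> document text (hypothetical input format)
    lexicon: dict word -> wordID, extended in place with unseen words
    """
    forward = {}
    for doc_id, text in docs.items():
        hits = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            # Assign the next free wordID to a word seen for the first time.
            word_id = lexicon.setdefault(word, len(lexicon))
            hits[word_id].append(position)  # simplified hit: position only
        forward[doc_id] = dict(hits)
    return forward


def invert(forward):
    """Regroup the forward index by wordID: wordID -> [(docID, positions), ...]."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):  # keep each doclist ordered by docID
        for word_id, positions in forward[doc_id].items():
            inverted[word_id].append((doc_id, positions))
    return dict(inverted)


if __name__ == "__main__":
    lexicon = {}
    docs = {1: "web search engine", 2: "search the web"}
    inverted = invert(build_forward_index(docs, lexicon))
    print(inverted[lexicon["web"]])
```

The forward index answers "which words occur in this document", while the inverted index answers "which documents contain this word", which is the question a query needs answered.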

  7. System Anatomy Working Procedure • Crawling the Web • Indexing the Web (Parsing, Indexing Documents into Barrels, Sorting) • Searching (The Ranking System, Feedback)

  8. System Anatomy • Searching
     1. Parse the query.
     2. Convert words into wordIDs.
     3. Seek to the start of the doclist in the short barrel for every word.
     4. Scan through the doclists until there is a document that matches all the search terms.
     5. Compute the rank of that document for the query.
     6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
     7. If we are not at the end of any doclist, go to step 4.
     8. Sort the documents that have matched by rank and return the top k.
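The query evaluation steps on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the actual implementation: each barrel is modelled as a dictionary from wordID to a sorted list of docIDs, the names `intersect` and `search` are hypothetical, and `rank` is a caller-supplied placeholder because the slides do not specify the ranking function.

```python
import heapq


def intersect(doclists):
    """Yield docIDs that occur in every sorted doclist (steps 4 and 7)."""
    if not doclists or any(not doclist for doclist in doclists):
        return
    cursors = [0] * len(doclists)
    # Stop as soon as any doclist is exhausted.
    while all(c < len(dl) for c, dl in zip(cursors, doclists)):
        current = [dl[c] for c, dl in zip(cursors, doclists)]
        target = max(current)
        if all(doc_id == target for doc_id in current):
            yield target  # this document matches all the search terms
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the cursors that are behind the largest docID.
            cursors = [
                c + 1 if dl[c] < target else c
                for c, dl in zip(cursors, doclists)
            ]


def search(query, lexicon, short_barrel, full_barrel, rank, k=10):
    """Return the top-k docIDs for a query, scanning short barrels first."""
    words = query.lower().split()                       # step 1: parse the query
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]              # step 2: words -> wordIDs

    scores = {}
    for barrel in (short_barrel, full_barrel):          # steps 3 and 6
        doclists = [barrel.get(w, []) for w in word_ids]
        for doc_id in intersect(doclists):              # steps 4 and 7
            if doc_id not in scores:
                scores[doc_id] = rank(doc_id, word_ids)  # step 5
    return heapq.nlargest(k, scores, key=scores.get)    # step 8


if __name__ == "__main__":
    lexicon = {"web": 0, "search": 1}
    short = {0: [1, 3], 1: [3]}
    full = {0: [1, 2, 3, 5], 1: [2, 3, 5]}
    toy_rank = lambda doc_id, word_ids: {2: 0.5, 3: 0.9, 5: 0.1}[doc_id]
    print(search("web search", lexicon, short, full, toy_rank, k=2))
```

Because every doclist is kept sorted by docID, the intersection needs only one forward pass over each list, which is why the slide describes the scan as a sequence of "seek" and "go to step 4" moves.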

  9. Performance

  10. Conclusion • Google is designed to be a scalable search engine. • The primary goal of Google is to provide high quality search results. • Google is a complete architecture for gathering web pages, indexing them, and performing search queries over them.
