
The Anatomy of a Large-Scale Hypertextual Web Search Engine


tybalt

Presentation Transcript


  1. The Anatomy of a Large-Scale Hypertextual Web Search Engine

  2. What we want from a search engine
     • Speed
     • Quantity of results
     • Efficient storage space
     • Quality of results
     Google attempts to deliver all of these aspects of search.

  3. Precision of results: second-generation search engine
     • PageRank
     • Anchor text

  4. PageRank. The more links point to a page from other pages, the higher its PageRank will be. PageRank can be read as the probability that a random internet surfer reaches the page by randomly clicking links. It also depends on the pages that link to you: the more links point to page A, the more a link from page A to page B is worth, and that value is shared among all of A's outgoing links. PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)), where T1 … Tn are the pages that link to A, C(T) is the number of links going out of page T, and d is a damping factor between 0 and 1.
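The PR(A) formula on this slide can be evaluated by repeated substitution. The following is a minimal sketch, not Google's implementation: the function name `pagerank`, the three-page `toy_web` graph, the value d = 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to (hypothetical input format).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start every page at rank 1.0 (an arbitrary starting point for the iteration).
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page T contributes PR(T) divided by its outgoing link count C(T).
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Toy graph: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
```

In this toy graph page C ends up ranked above page B, because C receives B's whole rank plus half of A's, while B receives only half of A's.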

  5. Anchor Text. Almost every link on the internet carries some text alongside it. This anchor text is written by the page creator to explain what the link does, where it leads, or what it attempts to explain. By collecting the anchor text of links from hundreds of different sites, Google can provide more relevant search results.
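One way to picture how anchor text helps retrieval is an index from anchor words to the pages the links point at. This is only an illustrative sketch: the function `build_anchor_index`, the `(source, anchor text, target)` tuple format, and the example URLs are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each lowercase anchor word to the set of target URLs it describes."""
    index = defaultdict(set)
    for source_url, anchor_text, target_url in links:
        for word in anchor_text.lower().split():
            # The word is credited to the page the link points to, not the page it sits on.
            index[word].add(target_url)
    return index


# Hypothetical links: (page containing the link, anchor text, page linked to).
links = [
    ("http://a.example/", "Stanford campus map", "http://maps.example/stanford"),
    ("http://b.example/", "campus map", "http://maps.example/stanford"),
    ("http://c.example/", "search engine paper", "http://papers.example/google"),
]
anchor_index = build_anchor_index(links)
```

Looking up `anchor_index["map"]` then returns the map page even if that page never contains the word "map" itself.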

  6. Proximity Search and Others. Google keeps track of how close the related words are to each other and also keeps track of their visual presentation (font size, color, boldness, etc.).
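Word proximity can be illustrated with a small distance computation over token positions. This is a simplified sketch under assumed conventions (whitespace tokenization, distance measured in words); the helper `min_word_distance` is hypothetical and does not describe how Google stores or scores positions.

```python
def min_word_distance(text, word_a, word_b):
    """Return the smallest gap, in words, between any occurrence of word_a and word_b."""
    tokens = text.lower().split()
    positions_a = [i for i, token in enumerate(tokens) if token == word_a]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b]
    if not positions_a or not positions_b:
        return None  # One of the words does not occur in the text.
    return min(abs(i - j) for i in positions_a for j in positions_b)


distance = min_word_distance("web search engine for the web", "web", "engine")
```

A ranking function could then favor documents where this distance is small.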

  7. Crawling and Indexing
     • Google typically ran about 3 crawlers.
     • Each crawler keeps roughly 300 connections open at once.
     • At peak performance, with 4 crawlers, Google can crawl 100 web pages per second.
     • This amounts to roughly 600K of data per second.
     • Parsing
     • Indexing documents into barrels
     • Sorting
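The crawl figures on this slide imply a few derived numbers, worked out below. The constant names are made up for this example, and reading "600K" as 600 kilobytes is an assumption.

```python
# Figures quoted on the slide.
CRAWLERS = 4
CONNECTIONS_PER_CRAWLER = 300
PAGES_PER_SECOND = 100
KB_PER_SECOND = 600  # Assumes "600K" means 600 kilobytes.

# Connections held open across all crawlers at peak.
total_connections = CRAWLERS * CONNECTIONS_PER_CRAWLER

# Average amount of data fetched per page.
avg_kb_per_page = KB_PER_SECOND / PAGES_PER_SECOND

# Pages fetched in a day if the peak rate were sustained (an idealized upper bound).
pages_per_day = PAGES_PER_SECOND * 60 * 60 * 24
```

Under these assumptions the crawlers hold about 1,200 connections open, fetch about 6 KB per page on average, and would cover about 8.6 million pages in a day at the peak rate.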
