
HOW SEARCH ENGINE WORKS.

HOW SEARCH ENGINE WORKS. Aasim Bashir. What is a Search Engine? A search engine is a website dedicated to searching other websites and their contents. It is a program that searches documents for specified keywords and returns a list of the documents where the keywords were found.


Presentation Transcript


  1. HOW SEARCH ENGINE WORKS. Aasim Bashir.

  2. What is a Search Engine?
     • A search engine is a website dedicated to searching other websites and their contents.
     • It is a program that searches documents for specified keywords.
     • It returns a list of the documents where the keywords were found.

  3. Examples of search engines.
     There are many search engines; some of the most popular include:
     • Google
     • Yahoo
     • Ask.com
     • AltaVista
     • Dogpile
     • Bing

  4. Google Introduction.
     • Google began in 1996 as a research project by Larry Page and Sergey Brin, who were both PhD students at Stanford University.
     • They thought that a search engine that could analyze the relationships between websites would produce better results than other search engines.
     • They called their new creation "BackRub", because it checked backlinks to estimate a site's importance.
     • The logo they had then was much different from today's logo. The name was later changed to Google: Larry Page and Sergey Brin registered the domain google.com in September 1997 and incorporated the company in September 1998.

  5. Today, Google is a publicly traded company that runs one of the most used search engines in the world.
     • The company currently employs 8,000 people and is based in Mountain View, California.
     • It also has offices in other places, such as Seattle, Washington.
     • Google offers many innovative services, such as Blogger, Orkut, and Gmail. Since its start in 1996, it has grown to offer a wide variety of services, not just search.

  6. What happens when we do a web search?

  7. When we do a Google search, we are not actually searching the web; we are searching Google's index of the web.
     • Google builds this index with software programs called spiders.
     • Spiders start by fetching a few web pages, then follow the links on those pages and fetch the pages they point to.

  8. Spiders or Crawlers.
     • A spider, also known as a robot or a crawler, is a program that follows, or "crawls", links throughout the Internet, grabbing content from sites and adding it to search engine indexes.
     • Spiders can only follow links from one page to another and from one site to another. That is the primary reason why links to your site are so important.
     • Links to your website from other websites give the search engine spiders more "food" to chew on. The more often they find links to your site, the more often they will stop by and visit. Google especially relies on its spiders to create its vast index of listings.
     • Spiders find web pages by following links from other web pages, but you can also submit your web pages directly to a search engine or directory and request a visit by its spider.
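
Since a spider can only move from page to page through links, its first job on any page is to pull the links out. A minimal sketch in Python, using only the standard library, could look like this. The class name `LinkExtractor`, the helper `extract_links`, and the sample page are made up for illustration; they are not taken from any real crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> tag on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return all link targets found in the given HTML text."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = (
    '<p><a href="/about">About</a> '
    '<a href="http://other.example/">Other</a></p>'
)
links = extract_links("http://site.example/index.html", SAMPLE_HTML)
print(links)
```

A page that nobody links to never shows up in any such list, which is why inbound links matter so much for being found.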

  9. Googlebot
     • Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.
     • It functions much like a web browser: it sends a request to a web server for a web page, downloads the entire page, then hands it off to Google’s indexer.
     • Googlebot consists of many computers requesting and fetching pages much more quickly than you can with your web browser.
     • Googlebot can request thousands of different pages simultaneously. To avoid overwhelming web servers, or crowding out requests from human users, Googlebot deliberately makes requests of each individual web server more slowly than it is capable of doing.
     • When Googlebot fetches a page, it culls all the links appearing on the page and adds them to a queue for subsequent crawling.
     • Googlebot can quickly build a list of links that covers broad reaches of the web. This technique, known as deep crawling, also allows Googlebot to probe deep within individual sites.
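
The queue of links and the deliberate slowness toward each server can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Googlebot is actually built: the "web" is a small in-memory dictionary (`TOY_WEB`, hypothetical), time is a simple tick counter instead of a real clock, the queue is processed breadth-first, and `min_gap` is an assumed number of ticks that must pass between two requests to the same host.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> list of URLs that page links to.
TOY_WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/2", "http://a.example/"],
    "http://a.example/2": [],
    "http://b.example/": ["http://a.example/2", "http://c.example/"],
    "http://c.example/": [],
}


def crawl(seeds, fetch_links, max_pages=100, min_gap=2):
    """Crawl from the seed URLs, spacing out requests to each host.

    Returns a list of (tick, url) pairs in the order pages were fetched.
    """
    frontier = deque(seeds)   # queue of URLs waiting to be fetched
    seen = set(seeds)         # every URL ever queued, to avoid repeats
    next_allowed = {}         # host -> earliest tick for its next request
    fetched = []
    tick = 0
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        host = urlparse(url).netloc
        if next_allowed.get(host, 0) > tick:
            # Too soon for this host: requeue the URL and let time pass.
            frontier.append(url)
            tick += 1
            continue
        fetched.append((tick, url))
        next_allowed[host] = tick + min_gap
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
        tick += 1
    return fetched


fetched = crawl(["http://a.example/"], lambda url: TOY_WEB.get(url, []))
for tick, url in fetched:
    print(tick, url)
```

Starting from a single seed page, the crawl reaches all five pages, and two requests to the same host are never closer than `min_gap` ticks apart.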

  10. Google’s Query Processor
     • The query processor has several parts, including the user interface (search box), the “engine” that evaluates queries and matches them to relevant documents, and the results formatter.
     • PageRank is Google’s system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.
     • Google considers over a hundred factors when ranking pages and determining which documents are most relevant to a query, including the popularity of the page, the position and size of the search terms within the page, and the proximity of the search terms to one another on the page.
     • Google applies machine-learning techniques to improve its performance automatically by learning relationships and associations within the stored data, for example in its spelling-correction system.
     • Google gives more priority to pages that have search terms near each other and in the same order as the query. Google can also match multi-word phrases and sentences.
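
To give a feel for the link-based part of ranking, here is a small Python sketch of the simplified, textbook form of PageRank, computed by repeated passes over a link graph. It looks at links only and is not Google's production ranking, which combines many more factors. The damping value 0.85, the number of iterations, and the toy graph `TOY_LINKS` are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank on a dict of page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site, used only for illustration.
TOY_LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(TOY_LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:7s} {score:.3f}")
```

In this toy graph "home" comes out on top because every other page links to it, while "orphan" comes last because nothing links to it.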

  11. Let’s see how Google processes a query.

  12. Google’s Indexer.
     • Googlebot gives the indexer the full text of the pages it finds.
     • These pages are stored in Google’s index database.
     • This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs.
     • To improve search performance, Google ignores (doesn’t index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters).
     • Stop words are so common that they do little to narrow a search, and therefore they can safely be discarded.
     • The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase, to improve Google’s performance.
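
The kind of index described on this slide is usually called an inverted index. A minimal Python sketch, assuming a tiny set of made-up documents (`DOCS`) and using only the stop words listed above, might look like this. The function names are hypothetical, and a real indexer would be far more elaborate; for instance, this sketch does not handle the single digits and single letters mentioned on the slide.

```python
import re
from collections import defaultdict

# Stop words taken from the slide's examples; a real list would be longer.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}


def tokenize(text):
    """Lowercase the text and drop punctuation and extra spaces."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to {doc_id: [word positions]}, skipping stop words."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            if term in STOP_WORDS:
                continue
            index[term].setdefault(doc_id, []).append(position)
    return dict(sorted(index.items()))  # sorted alphabetically by term


def search(index, query):
    """Return ids of documents that contain every non-stop-word query term."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical documents, used only for illustration.
DOCS = {
    "d1": "How the Spider crawls the Web.",
    "d2": "The index of the web is sorted.",
    "d3": "Spider, spider: why crawl?",
}
index = build_index(DOCS)
for term, postings in index.items():
    print(term, postings)
print(search(index, "Spider web"))
```

Because the positions are stored next to each document id, the same structure could later be used to check whether query terms appear near each other or in the same order.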

  13. Thank you. Any questions? Please ask.
