How to build a better Google?

How to build a better Google? Adam Bak, IST 497E, November 21, 2002. Google Timeline. 1995 March-December – Ph.D. candidates Sergey Brin and Larry Page meet at Stanford University and discuss ideas about new search technology. 1996-1997 January 1996-December – Brin and Page create BackRub.


Presentation Transcript


  1. How to build a better Google? Adam Bak IST 497E November 21, 2002

  2. Google Timeline • 1995 • March-December – Ph.D. candidates Sergey Brin and Larry Page meet at Stanford University and discuss ideas about new search technology • 1996-1997 • January 1996-December 1997 – Brin and Page create BackRub

  3. Google Timeline • 1998 • August-December – Sergey and Larry raise one million dollars in funding and found Google Inc. • 10,000 search queries per day • 1999 • February-June – 500,000 search queries per day

  4. Google Timeline • 1999 • August-December – 3 million searches per day • 2000 • May-June – 18 million search queries per day • November-December – 60 million searches per day

  5. Google Timeline • 2002 • May – 150 million searches per day

  6. Google’s Current Technology • PageRank • Does not simply count direct links; each link is weighted by the rank of the page it comes from • Page A would have a lower rank if the pages linking to it, B and C, did not have a high weighting
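The weighting idea on this slide can be sketched with a small power-iteration version of PageRank. This is a simplified illustration, not Google's production code: the damping factor of 0.85, the iteration count, and the toy link graph (pages P1-P3, Y, Z, X) are all assumptions made for the example. Pages A and X each receive exactly two inbound links, but A's links come from the well-linked pages B and C.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank for every page; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: B and C are linked from P1-P3; Y and Z have no inbound links.
links = {
    "P1": ["B", "C"], "P2": ["B", "C"], "P3": ["B", "C"],
    "B": ["A"], "C": ["A"],
    "Y": ["X"], "Z": ["X"],
}
ranks = pagerank(links)
print(ranks["A"] > ranks["X"])  # A outranks X despite the same number of links
```

Counting links alone would tie A and X; weighting each link by the rank of its source is what separates them.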

  7. Google’s Current Technology • Hypertext-Matching Analysis • Font size – The larger and bolder the fonts, the higher the weights • Capitalization – Higher weights • Relative Distance – Query words that appear close together on a page count for more (example: “peanut butter”)
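A rough sketch of how such signals could be combined into one score is shown below. The `Hit` record, the numeric weights, and the proximity bonus are all invented for illustration; the slide does not give Google's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One occurrence of a query word on a page (hypothetical record)."""
    word: str
    position: int          # word offset from the start of the page
    font_size: int = 3     # relative size on an assumed 1-7 scale
    bold: bool = False
    capitalized: bool = False


def hit_weight(hit):
    """Larger, bolder, or capitalized occurrences get a higher weight."""
    weight = 1.0 + 0.25 * (hit.font_size - 3)
    if hit.bold:
        weight += 0.5
    if hit.capitalized:
        weight += 0.25
    return weight


def score(hits, first, second):
    """Score a page for a two-word query such as 'peanut butter'."""
    a = [h for h in hits if h.word == first]
    b = [h for h in hits if h.word == second]
    if not a or not b:
        return 0.0
    base = sum(hit_weight(h) for h in a + b)
    # Relative distance: the closer the two words, the larger the bonus.
    gap = min(abs(x.position - y.position) for x in a for y in b)
    return base + 2.0 / max(gap, 1)


near = [Hit("peanut", 10), Hit("butter", 11)]
far = [Hit("peanut", 10), Hit("butter", 200)]
print(score(near, "peanut", "butter") > score(far, "peanut", "butter"))
```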

  8. Google’s Search Capabilities • Images • Usenet • Search by language • File Types (key word filetype:) • News (new feature)

  9. Google’s Key Words • cache: Will retrieve the page that Google has stored in its cache • link: Will display pages that link to the given page • related: Will display pages that are similar to the specified page • info: Will show information about a particular page

  10. Google’s Key Words • stocks: Will treat the query terms as stock ticker symbols • site: Will restrict the search to the given domain • allintitle: Will return only pages that have all of the query words in the title • intitle: Will display results with the first word appearing in the title • allinurl: Will return only pages that have all of the query words in the URL • inurl: Will display results with the first word appearing in the URL
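These keywords are typed directly into the query, so they are easy to combine programmatically. The helpers below are hypothetical names for this example; they only assemble a query string and a search URL of the familiar `http://www.google.com/search?q=...` form, and they do not contact Google.

```python
from urllib.parse import urlencode


def build_query(terms, **operators):
    """Prefix plain search terms with keyword operators such as site: or filetype:."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    parts.append(terms)
    return " ".join(part for part in parts if part)


def search_url(query):
    """Encode the query into a search URL (nothing is requested here)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


query = build_query("search engine", site="stanford.edu", filetype="pdf")
print(query)
print(search_url(query))
```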

  11. The big question • Can any improvements be made to make Google better than it already is?

  12. Google’s Programming Contest • Started this year • Winner - Daniel Egnor • His Idea – A geographic search • “Converted street addresses found within a large corpus of documents to latitude-longitude-based coordinates” • Would allow the user to specify a query such as “What are the closest movie theaters to my house?”
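Once documents carry latitude-longitude coordinates, a "closest to my house" query reduces to sorting by distance. The sketch below uses the standard haversine great-circle formula; it is not Egnor's implementation, the geocoding step is assumed to have already happened, and the theater names and coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude-longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest(documents, home, limit=3):
    """Return the documents closest to the home coordinates, nearest first."""
    def distance(doc):
        return haversine_km(home[0], home[1], doc["lat"], doc["lon"])
    return sorted(documents, key=distance)[:limit]


# Made-up documents that have already been geocoded from street addresses.
theaters = [
    {"name": "Theater C", "lat": 39.95, "lon": -75.17},
    {"name": "Theater A", "lat": 40.80, "lon": -77.86},
    {"name": "Theater B", "lat": 40.91, "lon": -77.78},
]
home = (40.79, -77.86)
print([doc["name"] for doc in nearest(theaters, home, limit=2)])
```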

  13. Personalized results based on location • The server knows your IP address • Find the server closest to you by running a traceroute • http://www.calweb.com/cgi-bin/traceroute • The approximate geographic location of your computer can be found by doing a whois query on that server • http://dns411.com/cgi-bin/whois.pl • Once your location is found, your results can be customized based on where you live
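The last step, customizing results once a location is known, might look like the sketch below. To keep it self-contained, the traceroute and whois lookups are replaced by a hard-coded table of address ranges; the ranges are reserved documentation networks and the region assignments are invented.

```python
import ipaddress

# Stand-in for the traceroute/whois lookup: a hypothetical table of address ranges.
REGIONS = {
    ipaddress.ip_network("192.0.2.0/24"): "State College, PA",
    ipaddress.ip_network("198.51.100.0/24"): "Philadelphia, PA",
}


def region_for(ip):
    """Return the region whose address range contains the IP, or None."""
    address = ipaddress.ip_address(ip)
    for network, region in REGIONS.items():
        if address in network:
            return region
    return None


def personalize(results, region):
    """Move results from the user's region to the top, keeping the original order otherwise."""
    return sorted(results, key=lambda result: result.get("region") != region)


results = [
    {"title": "Paper supplier (Pittsburgh)", "region": "Pittsburgh, PA"},
    {"title": "Paper supplier (Philadelphia)", "region": "Philadelphia, PA"},
]
user_region = region_for("198.51.100.7")
print(personalize(results, user_region)[0]["title"])
```

Because Python's sort is stable, results outside the user's region keep their original relative ranking.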

  14. Personalized results from Cookies • Google could ask the user to answer a one-time survey and store the results as a cookie • For example: • Age • Sex • Education • A query for “rock” by a 60-year-old man might return different results than the same search done by a teenager
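Storing the survey answers as cookies could be sketched with Python's standard `http.cookies` module. The cookie names (`age`, `sex`, `education`) and both helper functions are assumptions for this example; how the profile would then influence ranking is left open, as on the slide.

```python
from http.cookies import SimpleCookie


def survey_cookies(age, sex, education):
    """Return one Set-Cookie value per survey answer."""
    cookie = SimpleCookie()
    cookie["age"] = str(age)
    cookie["sex"] = sex
    cookie["education"] = education
    return [morsel.OutputString() for morsel in cookie.values()]


def read_profile(cookie_header):
    """Parse the Cookie header sent back by the browser into a profile dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


headers = survey_cookies(60, "M", "college")
profile = read_profile("age=60; sex=M; education=college")
print(headers)
print(profile)
```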

  15. Linguistic Approach • Google could tailor results based on the language used • For example, the English word “Java” has many meanings • The programming language • The coffee • The Indonesian island

  16. File type restriction • Google can already search for file types with its keyword filetype: • What if the user does not want a file of a certain type, but instead needs pages that either embed that file type or link to it? • For example: Find me only pages that have audio files and Java applets
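One way to support such a query is to record, at indexing time, which file types each page embeds or links to. The sketch below does this with Python's standard `html.parser`; the class name, the list of audio extensions, and the sample pages are assumptions for illustration.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {"mp3", "wav", "mid"}  # assumed list, not exhaustive


class FileTypeCollector(HTMLParser):
    """Collect file extensions that a page links to or embeds."""

    def __init__(self):
        super().__init__()
        self.extensions = set()
        self.has_applet = False

    def handle_starttag(self, tag, attrs):
        if tag == "applet":
            self.has_applet = True
        for name, value in attrs:
            # href covers links; src, data, and code cover embedded content.
            if name in ("href", "src", "data", "code") and value:
                extension = posixpath.splitext(urlparse(value).path)[1].lower()
                if extension:
                    self.extensions.add(extension.lstrip("."))


def has_audio_and_applet(html):
    """True if the page references an audio file and contains a Java applet."""
    collector = FileTypeCollector()
    collector.feed(html)
    return bool(collector.extensions & AUDIO_EXTENSIONS) and collector.has_applet


page_one = '<a href="/music/song.mp3">Listen</a><applet code="Game.class"></applet>'
page_two = '<a href="notes.pdf">Notes</a>'
print(has_audio_and_applet(page_one), has_audio_and_applet(page_two))
```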

  17. Authorities and Hubs • Authorities – Highly cited pages • Hubs – Pages that link to many authorities • Difference between search on www.Google.com and www.inquirus.com when searching for “Pasta”
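Hub and authority scores are usually computed together, each reinforcing the other, as in Kleinberg's HITS algorithm. The sketch below is a minimal version under that assumption; the three-page link graph is invented, and the slide does not say which variant the search engines it compares actually use.

```python
import math


def hits(links, iterations=20):
    """Return (hub, authority) scores; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {page: 1.0 for page in pages}
    authority = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        authority = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in authority.values())) or 1.0
        authority = {page: v / norm for page, v in authority.items()}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {page: sum(authority[t] for t in links.get(page, [])) for page in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {page: v / norm for page, v in hub.items()}
    return hub, authority


# Hypothetical pasta-related pages.
links = {"guide": ["recipes", "history"], "blog": ["recipes"]}
hub, authority = hits(links)
print(max(authority, key=authority.get), max(hub, key=hub.get))
```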

  18. Business Improvements • Develop Google software for the PC market • A single search query using the search tool on a Windows machine is relatively slow compared to a Google search done online

  19. P2P • If Google created software for the PC market, the number of searchable documents might increase drastically. • Perhaps with this P2P technology one would be able to find a computer science document about search engine technology that sits on a professor’s computer at Stanford

  20. B2B • Business to Business • Google could act as an intermediary between corporations that are looking for the business of other corporations • Coupled with the geographic technology, a business could perform a sample query: Find me all the businesses that sell paper around the Philadelphia region

  21. Other ideas • Include commercial databases • Library catalogs • ProQuest • Cluster documents by topic • After a search for the keyword “Law,” Google should cluster the documents by type of law (property law, banking law, criminal law)
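Clustering results by topic can be illustrated with a very simple greedy scheme: compare the words of each result against one representative per cluster and join the first cluster that is similar enough. The Jaccard similarity measure, the 0.3 threshold, the stopword list, and the sample snippets are all assumptions; a real system would use a more robust method.

```python
STOPWORDS = {"a", "and", "in", "of", "the"}


def keywords(text, query):
    """Words of a snippet, minus stopwords and the query term itself."""
    return set(text.lower().split()) - STOPWORDS - {query.lower()}


def jaccard(a, b):
    """Share of words two snippets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(documents, query, threshold=0.3):
    """Group document indices; the first member of each cluster is its representative."""
    keys = [keywords(doc, query) for doc in documents]
    clusters = []
    for index, key in enumerate(keys):
        for members in clusters:
            if jaccard(key, keys[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


snippets = [
    "property law land ownership deeds",
    "land deeds and property law disputes",
    "criminal law trials and sentencing",
    "sentencing in criminal law trials",
]
print(cluster(snippets, "law"))
```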

  22. Resources • http://www.google.com/corporate/tech.html • http://www.google.com/corporate/timeline.html • http://www.google.com/programming-contest/winner.html • http://citeseer.nj.nec.com/borodin01finding.html • http://www.calweb.com/cgi-bin/traceroute • http://dns411.com/cgi-bin/whois.pl • Aaron Steward – Finance Major

  23. Any Questions?
