
HotBot: A Search Engine Case Study


henson

Presentation Transcript


  1. HotBot: A Search Engine Case Study

  2. Introduction • Owned by Terra/Lycos. • One of the largest web search engines. • Uses the Inktomi database combined with Direct Hit and the DMOZ Open Directory. • Basic search screen is simple, but the advanced search allows for a full range of search features.

  3. Databases • Open Directory • Direct Hit • Inktomi • Direct Hit results are displayed when the option for 10 results at a time is selected and 10 results are available from Direct Hit. • If an option for more than 10 results at a time is selected, the Direct Hit results are available via a link. • Other content comes from various advertisers, the Lycos Network, and GoTo. • GoTo and other advertiser results may appear above and/or below the other results, but under a separate heading such as "feature listings."

  4. Strengths • Advanced searching capabilities • Page depth limit • Advanced search help • Truncation

  5. Weaknesses • Link searches must be exact • Database size shrank for a while • Advanced features have not always worked correctly

  6. Features • Default Operation: Processed as an AND • Full Boolean Searching: AND, OR, and NOT • Proximity Searching • Truncation with the * symbol • Case sensitive • Extensive, dynamic stop word list • Word Stemming - Search for grammatical word variants including plural, singular, and tense.
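
The default AND behavior and `*` truncation listed above can be illustrated with a minimal sketch. It is not HotBot's or Inktomi's implementation: the function names, the sample documents, and the whole-word matching rule are assumptions made for illustration. Matching is left case sensitive, in line with the feature list.

```python
import re

def term_to_regex(term):
    """Build a whole-word pattern; a trailing * also matches any word suffix."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*")
    return re.compile(r"\b" + re.escape(term) + r"\b")

def matches_all(query, text):
    """Default AND: every whitespace-separated term must occur in the text."""
    return all(term_to_regex(t).search(text) for t in query.split())

# Hypothetical mini-collection, used only to exercise the functions.
docs = {
    "a": "Elizabethan madrigals by Francis Pilkington",
    "b": "A madrigal for lute",
    "c": "Castles of Germany",
}

hits = [key for key, text in docs.items()
        if matches_all("madrigal* Pilkington", text)]
print(hits)
```

Without the `*`, the term `madrigal` would not match the plural "madrigals" in this sketch, which is the gap that truncation (or the word stemming HotBot applies) is meant to close.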

  7. Field Searches • Field Searching: Searching title words and links to a specific URL • acrobat/applet/activex/audio/embed/flash/form/frame/image/script/shockwave/table/video/vrml

  8. Limits • linkdomain: Limits results to pages containing links to the specified domain • outgoingurlext: Limits results to pages containing embedded files with the specified extension • scriptlanguage: Limits results to pages containing the specified scripting language (javascript or vbscript) • after: [day]/[month]/[year] • before: [day]/[month]/[year] • within: [number]/[unit] • Language Limit
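
As a rough illustration of the `after:` and `before:` date limits, the sketch below parses a value in the [day]/[month]/[year] order given above and checks whether a page date falls inside the range. The function names are hypothetical, and treating both bounds as exclusive is an assumption; the slides do not say how HotBot handles the boundary dates.

```python
from datetime import date

def parse_date_limit(value):
    """Parse a 'day/month/year' string, the order used by after: and before:."""
    day, month, year = (int(part) for part in value.split("/"))
    return date(year, month, day)

def within_limits(page_date, after=None, before=None):
    """Return True if page_date is strictly after `after` and before `before`."""
    if after is not None and page_date <= parse_date_limit(after):
        return False
    if before is not None and page_date >= parse_date_limit(before):
        return False
    return True

print(within_limits(date(2001, 6, 1), after="01/01/2001", before="31/12/2001"))
```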

  9. Unique to HotBot • Page Type: default is Any (any page), or Top Page (the root page of a URL, e.g., www.unca.edu) • Page Depth: limits how far down a subdirectory hierarchy HotBot searches • These limits are useful for finding the primary sites of organizations or the main source of a piece of information
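
One way to picture the page-depth limit is to count the path segments in a URL, with a top page having depth 0. This is only a sketch: `page_depth` and `within_depth` are hypothetical names, and how HotBot actually counts depth (for example, whether the file name itself counts as a level) is not stated in the slides.

```python
from urllib.parse import urlsplit

def page_depth(url):
    """Count non-empty path segments; a root page such as www.unca.edu has depth 0."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])

def within_depth(url, max_depth):
    """Keep only pages at or above the chosen level of the hierarchy."""
    return page_depth(url) <= max_depth

print(page_depth("www.unca.edu"))
print(page_depth("www.gorp.com/gorp/location/ny/kyk_hudv.htm"))
```

Under this counting rule, the Top Page option corresponds to a depth of 0.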

  10. Sorting • Results are sorted by relevance with groupings by site available at the end of each brief record. • The display includes the relevance score, title, URL, a brief extract, and date. HotBot displays 10 records at a time, by default.
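
The sorting behavior described above, relevance order with a per-site grouping attached to each record, might be sketched as follows. The record fields (`url`, `score`), the `more_from_site` counter, and the function names are assumptions for illustration; only the default page size of 10 comes from the slide.

```python
from urllib.parse import urlsplit

def site_of(url):
    """Return the host part of a URL, adding a scheme if one is missing."""
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).netloc

def rank_and_group(results, page_size=10):
    """Sort by relevance, show the best hit per site, and count the rest."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    per_site = {}
    for record in ranked:
        per_site[site_of(record["url"])] = per_site.get(site_of(record["url"]), 0) + 1
    page, seen = [], set()
    for record in ranked:
        site = site_of(record["url"])
        if site in seen:
            continue  # lower-ranked pages are reachable via the site grouping
        seen.add(site)
        page.append({**record, "more_from_site": per_site[site] - 1})
    return page[:page_size]

# Hypothetical results used only to exercise the function.
results = [
    {"url": "www.a.com/x", "score": 0.7},
    {"url": "www.b.com/", "score": 0.9},
    {"url": "www.a.com/y", "score": 0.8},
]
page = rank_and_group(results)
print([(r["url"], r["more_from_site"]) for r in page])
```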

  11. Architecture • Direct Hit: • Provides the breadth of a conventional search engine, with the relevancy of an index which is edited by humans • References the searching activity of millions of users • Adjusts rankings based on the popularity of the retrieved documents
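
The idea of adjusting rankings by popularity can be sketched as a blend of a text-relevance score and a normalized click count. This is not Direct Hit's actual algorithm, which the slides do not describe; the blending formula, the `weight` parameter, and the data layout are all assumptions.

```python
def rerank_by_popularity(results, clicks, weight=0.5):
    """Re-order results by a weighted mix of relevance and user click counts."""
    max_clicks = max(clicks.values(), default=0)

    def blended(record):
        # Normalize clicks to the 0..1 range so they are comparable to scores.
        popularity = clicks.get(record["url"], 0) / max_clicks if max_clicks else 0.0
        return (1 - weight) * record["score"] + weight * popularity

    return sorted(results, key=blended, reverse=True)

# Hypothetical relevance scores and click counts.
results = [{"url": "a", "score": 0.9}, {"url": "b", "score": 0.8}]
clicks = {"a": 10, "b": 100}
print([r["url"] for r in rerank_by_popularity(results, clicks)])
```

With `weight=0`, the order falls back to plain relevance, which makes the effect of the popularity signal easy to compare.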

  12. Architecture • Inktomi • Hosts Web searches for its clients on coupled clusters of workstations that compute in parallel • When a search query arrives from a user, the client Web server's interface translates the query from HTTP into Inktomi Data Protocol (IDP) and sends it to the Inktomi Master Cluster • The cluster sends the results in IDP back to the client Web server, which translates the information into HTTP and sends it to the user

  13. Results • Query 1: Information on Home of the Rockefellers Kykuit - To test the engines on a very specific bit of Americana - Kykuit, the baronial home of the Rockefellers on the Hudson River in New York. • Query 2: Information on Neuschwanstein Castle - To test the engines on a fairly well-known tourist attraction in Germany - Neuschwanstein Castle • Query 3: Information on Francis Pilkington Madrigals - To test the engines on retrieval of an obscure musical reference - the Elizabethan madrigals of Francis Pilkington.

  14. Query 1: Information on Home of the Rockefellers Kykuit • Hotbot - 72 Matches • FPL: www.gorp.com/gorp/location/ny/kyk_hudv.htm • Relevance rating: Page 14: County Historys • Google - 91 Matches • FPL: www.abbeville.com/booktemplate.asp?stockno=2220 • Relevance: Page 30: A Book Where Kykuit is mentioned • UNCA Library - 5 Matches • FPL: wncln.appstate.edu/search/...information+on+how+to+use+the+dietary+guidelines&1,1 • Relevance: Page 1: Information on how to use dietary guidelines

  15. Query 2: Information on Neuschwanstein Castle • Hotbot - 2,700 Matches • FPL: www.castlesoftheworld.com/Brochure/ • Relevance: Page 10: Castles of the US • Google – 4,060 Matches • FPL: www.neuschwanstein-castle.com/ • Relevance: Page 33: A Page on King Ludwig II - No Mention of Neuschwanstein Castle • UNCA Library - 5 Matches • FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+self+employment+tax&1,1 • Relevance: Page 1: Information On Self Employment Tax

  16. Query 3: Information on Francis Pilkington Madrigals • Hotbot - 53 Matches • FPL: www.medieval.org/emfaq/cds/van624.htm • Relevance: Page 5 - A Page about the Lute - no mention of Madrigals • Google - 33 Matches • FPL: www.netstrider.com/search/methods.html • Relevance: Page 3: No mention of Pilkington Madrigals • UNCA Library - 5 Matches • FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+the+red+notice+system&1,1 • Relevance: Page 1: Information On The Red Notice System

  17. Conclusion • HotBot is an interface for advanced web searches that sits on top of a dynamically changing back end. Both the Inktomi and Direct Hit technologies serve, in different ways, to provide a relevant list of results through advanced queries, and both seek to minimize commercial influence over search results. All of these technologies are subject to changes in technology and in the business environment. • Its weaknesses are that it still does not seem to offer the depth and breadth of some other engines, and that its advanced features have not always worked correctly. As the engine's index and search features continue to grow, these weaknesses should be overcome.
