
CSC 96 Building and Managing Web Sites with Microsoft Technologies




  1. CSC 96 Building and Managing Web Sites with Microsoft Technologies
  Week 9: Search Engines and Microsoft Index Server

  2. Search Engines
  • Search engines are important for web sites larger than 100 pages.
  • However, a search engine should not be a replacement for a good site structure and navigation scheme.
  • It provides an alternative content-discovery mechanism for advanced users.
  • Search engines create entries automatically.

  3. How Search Engines Work
  • A search engine is an information gathering and filtering subsystem.
  • Robots/spiders gather data from remote and local web repositories into a local indexing database.
  • Gathered documents go through information conversion and extraction before they are indexed.
  • Search engines periodically review their records to update the database.

  4. Formatting Pages for Search Engines
  • Use a short, descriptive TITLE element.
  • <TITLE> should be the first element of the <HEAD> section.
  • Use the META description element to provide an abstract of the page.
  • Break content up into smaller pages for more precise searching.
  • Use the META keywords element; most search engines limit it to the first 25 words.
  • Ranking algorithms are typically based on keyword frequency and location on the page.
  A sample <HEAD> section is sketched below.
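  As an illustration (not part of the original slides), here is a minimal <HEAD> section that follows the guidelines above; the title text, description, and keyword list are invented placeholders:

    <HTML>
    <HEAD>
      <!-- Short, descriptive title placed first in the HEAD section -->
      <TITLE>CSC 96 Week 9: Search Engines and Index Server</TITLE>
      <!-- Abstract of the page, shown by many engines in result listings -->
      <META name="description"
            content="Lecture notes on search engines, the robots exclusion standard, and Microsoft Index Server.">
      <!-- Keyword list; many engines read only the first 25 or so words -->
      <META name="keywords"
            content="search engines, robots.txt, Index Server, IIS, META tags">
    </HEAD>
    <BODY>
    ...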

  5. Robots Exclusion Standard
  • Most search engines look for a text file called robots.txt in your site's root directory.
  • robots.txt tells robots/spiders what they can and can't index.
  • Most, but not all, robots abide by this standard.
  • Only one robots.txt file per web site -- any others are ignored.
  • Wildcards are not supported. Truncate the path instead (e.g., /help disallows both /help.html and /help/index.html).
  • Comments are indicated with #

  6. Sample robots.txt File

    # Test robots.txt file
    # this section restricts /temp and /current for all agents
    User-agent: *              # applies to all robots
    Disallow: /temp/           # restrict /temp
    Disallow: /current/        # restrict /current
    Allow: /current/allow.htm

    # this section restricts BadSpider from all content
    User-agent: BadSpider
    Disallow: /                # BadSpider restricted

  7. How to Identify Visiting Spiders
  • Check server logs for sites that retrieve many documents, especially in a short time.
  • If your server supports User-agent logging, check for retrievals with unusual User-agent header values.
  • Look for sites that repeatedly request the file /robots.txt.

  8. Robots META Element
  • Robots can be directed at the page level using the Robots META tag.
  • No server administrator action is required.
  • Only some robots implement this.

    <HTML>
    <HEAD>
      <TITLE>Robots Test Page</TITLE>
      <META name="robots" content="noindex,nofollow">
    </HEAD>
    <BODY>
    ...

  9. Robots Information
  For more information:
  • General resource: http://wdvl.internet.com/Location/Search/Robots.html
  • Directory of robots: http://info.webcrawler.com/mak/projects/robots/active.html

  10. Microsoft Index Server
  • An excellent indexing server packaged free with IIS on Windows NT 4.
  • Use only version 2.0, which ships with the NT 4 Option Pack.
  • Once installed, it runs automatically with virtually no attention required.
  • It spins through content, indexing all words in each document.
  • Can create multiple indexes for different webs and/or portions of webs.
  • Use IIS to turn off indexing of specific directories.
  • Indexing occurs during less busy times.
  • Occasionally you will need to rebuild the index(es).

  11. Using Index Server
  Three different methods to use Index Server:
  • Forms using .htx and .idq files.
  • ASP pages that access index contents using the supplied Index Server objects.
  • ASP pages and ADO that access index contents with SQL statements.
  Basic forms are easiest, but ASP pages provide the most power. A sketch of the object-based ASP approach follows below.
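  The slides do not include code for the ASP approaches, so the following is only a sketch of the object-based method, assuming the Index Server 2.0 Query object (ProgID IXSSO.Query); the column list, record limit, and form field name are illustrative choices, not part of the course material:

    <%  ' search.asp -- hypothetical page name
        Dim Q, RS
        Set Q = Server.CreateObject("IXSSO.Query")     ' Index Server query object
        Q.Query      = Request.Form("SearchString")    ' free-text restriction from the form
        Q.Columns    = "DocTitle, vpath, size, write"  ' columns returned for each hit
        Q.MaxRecords = 50
        Set RS = Q.CreateRecordSet("nonsequential")    ' results as an ADO-style recordset
        Do While Not RS.EOF
            Response.Write "<A HREF=""" & RS("vpath") & """>" & RS("DocTitle") & "</A><BR>"
            RS.MoveNext
        Loop
    %>

  The third method would instead open an ADODB.Connection against the MSIDXS provider and issue SQL statements such as SELECT DocTitle, vpath FROM SCOPE(), trading the object model for familiar query syntax.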

  12. Accessing Index Server with Forms
  • Create a search form that references an .IDQ parameters file.
  • Create an .IDQ file that passes information to Index Server, including the output template file (.HTX).
  • Create an .HTX file that formats the output.
  A minimal form/.IDQ/.HTX trio is sketched below.
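  As a rough illustration (the file names, virtual paths, and parameter set are assumptions kept to the basics), the three pieces might look like this:

    search.htm -- the search form; CiRestriction carries the user's query text
    <FORM ACTION="/scripts/query.idq" METHOD="GET">
      <INPUT TYPE="text" NAME="CiRestriction">
      <INPUT TYPE="submit" VALUE="Search">
    </FORM>

    query.idq -- parameters handed to Index Server
    [Query]
    CiColumns=DocTitle,vpath,size,write
    CiRestriction=%CiRestriction%
    CiMaxRecordsInResultSet=50
    CiScope=/
    CiFlags=DEEP
    CiTemplate=/scripts/results.htx

    results.htx -- output template; the detail section repeats once per hit
    <HTML><BODY>
    <%begindetail%>
      <A HREF="<%vpath%>"><%DocTitle%></A><BR>
    <%enddetail%>
    </BODY></HTML>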
