
Google Session


bianca

Presentation Transcript


  1. Google Session • About MIT’s Google Search Appliance (GSA) • Adding Google search to your web site • Customizing search results • Tips on improving a site’s rankings • Q&A – actually, ask questions anytime!

  2. MIT's Google Configuration • MIT license is for 3M documents • Two collections of 1.5M documents each • MIT has over 1M web pages on 1,000 web servers • Google follows links from the MIT Home Page • web.mit.edu – crawled three times a week • Other MIT web servers – crawled twice a week

  3. MIT Google does • Performs twice as well as Inktomi in a “blind test” • Indexes 220 different file formats • Provides control over our own crawling schedule • Allows user customization of search results format • Indexes certificate-restricted content (not implemented yet)

  4. MIT Google does NOT • Cache old pages • Index image files (our decision) • Index image ALT tags (Google’s decision) • Allow us to fiddle with the relevancy algorithm • Tell you “who’s linking to my page” because the GSA does not share that information across collections. When your pages move, we recommend using a 301 redirect.

  5. MIT Google does NOT index • Java, Perl, Python documentation • Debian, GNU/Linux mirrors • URLs containing these strings: sipb.mit.edu, dev.mit.edu, net.mit.edu, lees.mit.edu, ops.mit.edu, classics.mit.edu, hypermail, pipermail • Certificate protected pages • No robots sites, no index pages • Dynamically generated pages containing ‘?’ except by request • URLs containing cgi-bin • URLs containing /afs/

  6. Telling Google not to index • No robots in server • No robots in locker/directory • No robots in html file • No index, follow
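
The two page-level options on this slide are conventionally written as robots meta tags inside the page's head element. The sketch below assumes the standard robots meta tag values; the slide itself does not show the markup, and the server and locker/directory options are not covered here.

```html
<!-- Variant 1 ("no robots in html file"): keep this page out of the index
     and do not follow its links. -->
<meta name="robots" content="noindex, nofollow">

<!-- Variant 2 ("no index, follow"): keep this page out of the index,
     but still let the crawler follow its links to other pages. -->
<meta name="robots" content="noindex, follow">
```

Use only one of the two tags on a given page. The second is the usual choice for a navigation or listing page that should not appear in results itself but links to pages that should.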

  7. Avg. daily views - January 2005 Total queries Jan 1 - 26: 340,656

  8. Google search forms

  9. Simple search form

  10. Sample search code
      <form method='get' action='http://gb-server.mit.edu/search'>
        <input type='text' name='q' size='32' maxlength='255' value=''/>
        <input type='submit' name='btnG' value='Search'/>
        <!-- Collection and front end to search -->
        <input type='hidden' name='site' value='mit'/>
        <input type='hidden' name='client' value='mit'/>
        <!-- XSLT stylesheet that formats the XML results -->
        <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
        <input type='hidden' name='output' value='xml_no_dtd'/>
        <!-- Restrict results to one directory tree -->
        <input type='hidden' name='as_dt' value='i'/>
        <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
      </form>

  11. Restrict to one directory tree • name='as_sitesearch' value='<yoururl>': use web.mit.edu/newsoffice, not web/newsoffice • The slash / matters: web.mit.edu/newsoffice to include sub-directories; web.mit.edu/newsoffice/ to exclude sub-directories • as_sitesearch allows you to specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories using this option • If you want the search feature on your site to search the entire MIT web site, delete this parameter.
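
To make the trailing-slash rule concrete, here are the two variants of the hidden field side by side. This is a sketch that reuses the newsoffice directory and the as_dt field from the sample form on slide 10; substitute your own URL.

```html
<!-- Variant A: no trailing slash, so sub-directories of /newsoffice are included. -->
<input type='hidden' name='as_dt' value='i'/>
<input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>

<!-- Variant B: trailing slash, so sub-directories of /newsoffice are excluded. -->
<input type='hidden' name='as_dt' value='i'/>
<input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice/'/>
```

Place only one variant inside the search form. Removing both fields makes the form search the entire MIT web site.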

  12. Restrict to multiple directories or servers • Contact google@mit.edu and we will create a subcollection for you. • A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
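
Once a subcollection exists, a search form could select it by name. The sketch below assumes the subcollection name goes in the site field in place of 'mit'; the slides do not state this, so confirm the exact field and name with google@mit.edu. "Library" is the example name from the slide, not necessarily a real subcollection.

```html
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <!-- Assumption: the subcollection name replaces 'mit' in the site field. -->
  <input type='hidden' name='site' value='Library'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>
```

No as_sitesearch field is needed here, because the subcollection's URL patterns already define what is searched.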

  13. Advanced search example

  14. Google Custom Results • You can customize the look and feel of Google’s search results by providing a stylesheet.

  15. Site-wide MIT template

  16. IS&T custom results

  17. IS&T Search

  18. IS&T Custom Results

  19. Customizing results [Diagram: your HTML header and footer wrapped around the Google results data] • You provide the header and footer (HTML) wrapper, and any desired content formatting • Google provides the raw data (XML)

  20. Results content “title” only

  21. How customization works [Diagram: a search query goes to the MIT-Google index, which returns XML; an XSLT stylesheet turns the XML into HTML results] • The form points to an XSLT stylesheet • Google returns results to query in XML • An XSLT document translates the XML into your custom HTML

  22. Notes • It is not necessary to customize the results. • You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet. • Updates to the Google service may require you to make changes in your stylesheet. • Subscribe to google-partners@mit.edu • WCS will provide fee-based production services for custom search results.

  23. How to customize the results • Plan how you want the results to look • Copy the MIT Google XSLT stylesheet http://web.mit.edu/xsl/google-mit.xsl • Save it to web readable space, naming it google-mysite.xsl

  24. Point to your XSL • Update your search form to point the MIT-Google server to your custom XSLT style sheet.
      <form method='get' action='http://gb-server.mit.edu/search'>
        <input type='text' name='q' size='32' maxlength='255' value=''/>
        <input type='submit' name='btnG' value='Search'/>
        <input type='hidden' name='site' value='mit'/>
        <input type='hidden' name='client' value='mit'/>
        <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
        <input type='hidden' name='output' value='xml_no_dtd'/>
      </form>

  25. Step-by-step customization See http://web.mit.edu/ist/google/stylesheets.html

  26. Documentation • http://web.mit.edu/ist/google/ (Includes the “official” Google documentation, including their XML specification; also XSLT tips.) • Search Engine Submission Tips http://searchenginewatch.com/webmasters/ • Using XHTML/CSS for an Effective SEO Campaign http://www.alistapart.com/articles/seo/

  27. Support • The MIT Google team will help you create a Google search form and will answer queries sent to google@mit.edu • WCS offers fee-based production services for custom search results

  28. Q&A
