Basic Internet Search Techniques Or How to really find information on the internet Shayna Keces Reference Librarian 236-0301 ext. 441 August 2004
Agenda • Size of Internet • Types of search engines • Search strategies • Some hints on selecting search strategies • Interpretation of search results • Tutorials on searching and search engines
Size of Internet/World Wide Web • July 2000: 2.1 billion web pages, with an estimated 4 billion pages by early 2001 (some estimates are much higher if the invisible or deep web is counted) • Size of search engine databases • Google 4.28 billion • Fast (alltheweb) 2.1 billion • AltaVista 1.1 billion • Yahoo 2 million catalogued
Search strategies • Do not • use the search button • use a string of keywords without specifying Boolean operators • use upper case unless it is part of your strategy • use NOT or - unless you are absolutely sure it is necessary, because it can eliminate pages you did not anticipate and its format is not standardized across search engines
Search Strategies • Do • Consider what type of resource will best answer your question and search for that resource (e.g. a dictionary or a certain type of web page) • Think of a list of keywords that will narrow or broaden your search, keeping in mind that on the internet narrowing your search is usually better • Stick to a small list of search engines and learn the search syntax for the search engine you’re using
Types of search engines • Keyword or robot based (builds a database) • Directory based (categories indexed by people rather than computer) • Annotated directory-based search engines • Meta indexes (can combine searches or allow you to search a variety of engines individually) • Specialized search engines
Keyword or robot based Search Engines • Large database of web pages • No human involvement and no quality control • Websites can be submitted, or the robot will find some on its own • Searches full text to a certain level, but does not search the deep or invisible web • Google (www.google.com) • Alta Vista (www.altavista.com) • Fast (www.alltheweb.com) • Wisenut (www.wisenut.com)
Google (www.google.com) • Presently the largest database (ca. 4 billion pages) • Very sophisticated ranking of results, particularly good for popular sites and company sites • Advanced search can limit a search to the title of a page or to the URL • Implied AND • + for stop words • OR must be typed in capitals if you want an either/or search • Not case sensitive
Google (www.google.com) cont. • No stemming or truncation (except on an ad hoc basis controlled by Google) • Description shows keywords in context • Cached pages are helpful for sites that are not working • Searches some formats not found in other search engines (e.g. Adobe Acrobat and PostScript files, Excel, PowerPoint, and Word files, as well as rich text files) • Innovative in new features (e.g. the ability to convert measurements, such as 4 miles in km). See www.google.ca/help/features.html for a description of features.
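The Google syntax points above (implied AND, + for stop words, OR in capitals, case does not matter) can be sketched as a small query builder. This is only an illustration of the rules as the slides state them for 2004; the function name build_query and its parameters are hypothetical, and the syntax a search engine accepts today may differ.

```python
def build_query(required, alternatives=(), stop_words=()):
    """Assemble a query string using the syntax rules listed on the slides.

    required     -- keywords separated by spaces (the engine supplies the implied AND)
    alternatives -- terms joined with an upper-case OR
    stop_words   -- common words forced into the search with a leading +
    """
    # Lower-casing is harmless because the engine is not case sensitive.
    parts = [word.lower() for word in required]
    parts += ["+" + word for word in stop_words]
    if alternatives:
        # OR has to be written in capitals to be treated as an operator.
        parts.append(" OR ".join(alternatives))
    return " ".join(parts)


query = build_query(["Ottawa", "library"],
                    alternatives=["hours", "schedule"],
                    stop_words=["the"])
print(query)
```

Keeping the required keywords first and the OR group last makes it easy to see which terms narrow the search and which broaden it.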
AltaVista (www.altavista.com) • One of the larger search engines (1.3 billion pages/objects or more) • Particularly good for finding less popular sites • Implied AND, although AltaVista is noted for changing this default • Case sensitive when a word is in quotation marks • Stemming with * at the end or in the middle of words • Offers related terms, which help you focus your search
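To show what stemming with * at the end or in the middle of a word does, here is a minimal sketch using Python's standard fnmatch module. It assumes that * stands for any run of characters; the real engine may place limits on the wildcard, and the word list is made up for the example.

```python
from fnmatch import fnmatchcase


def matches(pattern, words):
    """Return the words that a *-style truncation pattern would match."""
    return [w for w in words if fnmatchcase(w, pattern)]


# Illustrative vocabulary, not taken from any search engine.
vocabulary = ["library", "libraries", "librarian", "liberty", "colour", "color"]

end_matches = matches("librar*", vocabulary)    # * at the end of a word
middle_matches = matches("col*r", vocabulary)   # * in the middle of a word

print(end_matches)
print(middle_matches)
```

A pattern such as librar* broadens a search to several word endings at once, which is why truncation is useful when a keyword has many forms.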
AltaVista Advanced Search • Has “build a Boolean search” facility or can create your own • Can specify pages be from certain country based on country codes so will not include .com etc. • Can specify dates of last modification
Directory-based Search Engines • Indexed by individuals so subject searches will be more accurate • Smaller database than Robot engines • Used mainly for finding good site on general topic • Yahoo (www.yahoo.com or ca.yahoo.com) • About (about.com ) • Looksmart (www.looksmart.com)
Yahoo (ca.yahoo.com) • Most popular of the directory-based search engines • Many different versions (international versions have the same pages as the others, but local options are listed first) • Now has its own web search, which competes with Google’s • Can search by categories and sub-categories
Annotated directory-based search engines • Because entries are annotated, the database is even smaller than that of a directory-based engine • Quality of web pages is better • Web pages are often rated • Librarian’s Index to the Internet (lii.org) • The Internet Public Library (www.ipl.org/)
Librarian’s Index to the Internet (lii.org) • Topical list of high quality websites with abstracts and qualitative analysis • Can winnow down by topic or use the search capability • Only websites which meet the standards of the editors are included • Provides the date a site was added to the index as well as the date the lii entry was last updated
Meta indexes • One site searches more than one search engine • Results can be separated or combined • A query is sometimes not interpreted equally well by every search engine • Use them if you are not sure which search engine will give you the best results and/or for obscure topics
Meta indexes examples • Dogpile (www.dogpile.com) • Metacrawler (www.metacrawler.com/index.html) • Surfwax (www.surfwax.com) • Hotbot (www.hotbot.com)
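The idea of combining results from several engines can be sketched as follows. This is not how any of the sites listed above actually works; it is a simple round-robin merge that drops duplicates, and the engine names and URLs are placeholders invented for the example.

```python
from itertools import zip_longest


def combine_results(results_by_engine):
    """Merge ranked URL lists round-robin, keeping the first copy of each URL."""
    seen = set()
    combined = []
    # Take the first result from every engine, then the second, and so on.
    for tier in zip_longest(*results_by_engine.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined


# Placeholder result lists standing in for two different search engines.
sample = {
    "engine_a": ["example.org/a", "example.org/b", "example.org/c"],
    "engine_b": ["example.org/b", "example.org/d"],
}
merged = combine_results(sample)
print(merged)
```

Showing the lists separately instead would simply mean skipping the merge and printing each engine's list under its own name.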
Specialized Search Engines • Geographic based (www.altavistacanada.com, http://www.ottawastart.com/) • Phone directories (canada411.sympatico.ca/, www.infospace.com/canada/index.htm) • Newsgroup searching (groups.google.com) • News searching (news.google.ca) • Women’s information (wwwomen.com) • Different formats (www.gimpsy.com/, www.kartoo.com/)
Specialized sites • Ottawa Public Library (www.library.ottawa.on.ca) • Reference tools (see library reference sites, e.g. lii.org, www.ipl.org/ref) • Encyclopedias (www.britannica.com, Columbia encyclopedia www.bartleby.com/65/) • Canadian information (vrl.tpl.toronto.on.ca/, Canadian information by subject www.nlc-bnc.ca/caninfo/ecaninfo.htm, Canadian encyclopedia online www.thecanadianencyclopedia.com/)
Some hints on selecting search strategies • For an introductory page on a general topic, try a directory-based search engine. If you do not need a specific level of quality, you can use the address bar search • For the web page of a major company or organization, try Google or Alta Vista • For a specific web page that would not necessarily be popular, try Alta Vista or Google
Some hints on selecting search strategies cont. • For health topics try a health website engine like www.medbroadcast.com or the Canadian Health Network www.canadian-health-network.ca/customtools/homee.html, or the library’s health database, Health Source (www.library.ottawa.on.ca/electronic/index.htm), or the health links on the library’s web page (www.library.ottawa.on.ca/english/links/PublicAdults/index.htm).
Some hints on selecting search strategies cont. • For a very obscure topic, try Google or Alta Vista or one of the meta indexes • For items in databases, try to find the correct host or search a special site for invisible websites (e.g. www.invisible-web.net/)
Interpretation of search results • Look at the results and reformulate the search using things like searching within results, Prisma, and adding new keywords. • Analytically choose which sites to look at in the result list • Anatomy of a URL: domain name + type, i.e. the name of the organization followed by the type of organization. Some popular suffixes are: .com for commercial sites, .edu for university sites (mainly American), .org for non-profit organizations, .gov for U.S. government sites, and .gc.ca for Canadian government sites.
Interpretation of search results cont. • Consider things like the authority of the author, the currency of the information, and the reason for creating the website (implications for bias) • Do not look through pages and pages of results. If the first three pages are not promising refine the search (see the first point on interpreting the results).
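The URL anatomy described above can be sketched with Python's standard urllib.parse module. The suffix table repeats only the suffixes named on the slide; the helper name describe_url is hypothetical, and a real result list will contain many suffixes that fall outside this table.

```python
from urllib.parse import urlparse

# Suffixes and meanings as listed on the slide. The longer .gc.ca suffix
# comes first so that it is checked before the shorter ones.
SUFFIX_TYPES = [
    (".gc.ca", "Canadian government"),
    (".com", "commercial"),
    (".edu", "university (mainly American)"),
    (".org", "non-profit organization"),
    (".gov", "U.S. government"),
]


def describe_url(url):
    """Return (host name, type of organization) for a URL."""
    # urlparse only fills in the host name when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    for suffix, kind in SUFFIX_TYPES:
        if host.endswith(suffix):
            return host, kind
    return host, "unknown"


print(describe_url("www.google.com"))
print(describe_url("lii.org"))
print(describe_url("http://www.library.ottawa.on.ca/english/"))
```

The suffix is only a first hint about who is behind a site; the authority of the author and the currency of the information still have to be checked on the page itself.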
Some useful tutorials for searching • See “Learning to search” section of Collection of special search engines (appears under contents on left-hand side of the page) www.leidenuniv.nl/ub/biv/specials.htm • Web searching tips www.searchenginewatch.com/facts/index.html • Net tutor (gateway.lib.ohio-state.edu/tutor/les5/)
Some useful tutorials for searching cont. • In the links section of the Ottawa Public Library’s web site, (www.library.ottawa.on.ca/english/links/PublicAdults/index.htm), look under the category WWW under the subcategory Internet
To find more info on search engines • Searchenginewatch (www.searchenginewatch.com) • Searchengineshowdown (www.searchengineshowdown.com)
For More Help on Searching • Contact the Reference Dept. of the Main Branch of OPL by phoning 236-0302, ext. 233, or email firstname.lastname@example.org • Consult this web page or other specialized web presentations on the library’s web page at http://www.library.ottawa.on.ca/english/services/reference/index.htm