Online Database vs. Web Search Engines

571-Information Access and Retrieval

Overview of Online Databases over 30 Years (William, 2006)
  • From 1975 to 2005, the number of databases increased considerably, from 301 to 17539.
  • Database records grew from 52 million to 21.02 billion, and database entries from 301 to 16532.
  • The number of producers has not grown as fast as the number of databases, because one producer might publish multiple databases.
  • The number of publishers increased from 200 to 3208 between 1975 and 2005.
  • In 2005, the average producer produced 5.13 databases.
  • Since each vendor might provide services from multiple databases, the number of vendors grew at a slower pace, from 105 to 2811.
Types of search
  • Known item search
  • Specific-information search
  • Subject search
  • Exploring/Browsing information
  • Others
General search steps
  • Search plan
  • System access
  • Database selection (Optional)
  • Search query formulation
  • Preliminary results evaluation
  • Search query reformulation (Optional)
  • Final results evaluation (Optional)
Some search strategies
  • Building blocks
    • combine sub-searches
  • Citation pearl growing
    • use the index term to retrieve further similar citations
  • Successive fractions
    • reduce the set using narrower index terms
  • Most specific facet first
    • start with the most specific concept
Search Strategy Formulation
  • Imagine the title and keywords of relevant documents
  • Boolean
    • and, or, not
  • proximity operator
    • adj, near, freq, atleast
  • search fields/segments
    • au, co, ti, de
  • Use controlled vocabulary to identify context
  • truncation
    • string
    • plural
    • single character
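The Boolean operators and string truncation listed above can be illustrated with a small sketch. It assumes a toy in-memory inverted index; the document collection, the `build_index` and `term` helpers, and the use of a trailing `*` as the truncation symbol are all hypothetical choices for illustration, since each online database defines its own syntax.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are invented for illustration.
DOCS = {
    1: "online database search strategies",
    2: "web search engines and crawlers",
    3: "database indexing for libraries",
    4: "library catalog searching online",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def term(index, word):
    """Look up one term; a trailing '*' is treated as string truncation."""
    if word.endswith("*"):
        stem = word[:-1]
        hits = set()
        for indexed_word, ids in index.items():
            if indexed_word.startswith(stem):
                hits |= ids
        return hits
    return set(index.get(word, set()))

INDEX = build_index(DOCS)

# AND is set intersection, OR is union, NOT is difference.
both = term(INDEX, "online") & term(INDEX, "database")
either = term(INDEX, "database") | term(INDEX, "engines")
without = term(INDEX, "search") - term(INDEX, "web")
truncated = term(INDEX, "librar*")  # matches "library" and "libraries"
```

Combining the result sets of separate sub-searches in this way is also the idea behind the building-blocks strategy.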
How to find related words?
  • Personal knowledge
    • terminology
    • relevant document
  • Term mapping provided by system
  • Feedback from search results
    • title, descriptor, text
  • Others
Search Strategy Reformulation
  • System
    • search fields
    • vocabulary
    • more like this
    • refine search
    • Limit/focus search
  • User
    • relevance feedback
Narrow search
  • Find the right database
  • Add another word or phrase
  • Negative feedback (exclude one aspect of the search statement)
  • Exclude related terminology
  • Restrict to certain field
    • title, descriptor, frequency, etc.
  • Restrict to certain types of publication
  • Restrict to certain time range
  • Restrict to certain language
Evaluate search results
  • Known item
    • title, author, publication, date
  • Specific information
    • Key Word In Context (KWIC)
  • Subject information
    • title, abstract, descriptor, full text
Check the tutorials for online databases
Characteristics of web IR
  • Web documents
    • Stored in distributed locations
    • Growing in size
    • Deep and surface documents
    • Multiple formats
    • Varying in quality
    • Frequently changed
    • Others
  • Users
    • Various user groups
    • Others
  • Systems
What is a search engine?
Key components
  • Data collection
    • Web spider or crawler
  • Data processing
    • Ranking
    • Indexing
  • Query formulating
    • Interface
  • Matching
  • Result displaying
How does ranking work?
  • Literal matching
    • Measure of word significance: the frequency of word occurrence (term frequency)
    • Location: the relative position of a word in the document
    • Examples
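As a rough illustration of literal matching, the sketch below scores a document by term frequency plus a bonus for terms that appear early in the text. The `score` function, the `LOCATION_WEIGHT` constant, and the sample documents are assumptions made for this example; real search engines combine many more signals and do not publish their exact formulas.

```python
# Hypothetical weight for the location signal; chosen only for illustration.
LOCATION_WEIGHT = 0.5

def score(query_terms, text):
    """Score a text by term frequency and by how early each term first appears."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for query_term in query_terms:
        positions = [i for i, word in enumerate(words) if word == query_term]
        if not positions:
            continue
        # Term frequency: share of the words that match the query term.
        term_frequency = len(positions) / len(words)
        # Location: 1.0 if the term is the first word, approaching 0 near the end.
        location_bonus = 1.0 - positions[0] / len(words)
        total += term_frequency + LOCATION_WEIGHT * location_bonus
    return total

# Hypothetical documents, invented for illustration.
SAMPLE_DOCS = {
    "a": "search engines rank pages",
    "b": "pages about cooking and search",
}

ranked = sorted(SAMPLE_DOCS, key=lambda name: score(["search"], SAMPLE_DOCS[name]), reverse=True)
```

Here document "a" ranks first because "search" makes up a larger share of its words and appears at the very start.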
How does ranking work? (Cont’)
  • Hyperlinks (Brin & Page, 1998)
    • PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
      • PR(A): PageRank of document A
      • T1 … Tn: documents that link to A
      • C(A): number of outgoing links from document A
      • d: damping factor between 0 and 1, usually set to 0.85
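The PageRank formula above can be evaluated by simple iteration. The following is a minimal sketch, assuming a tiny hypothetical link graph in which every page has at least one outgoing link; the `pagerank` function name, the fixed iteration count, and the graph itself are illustrative choices, not part of the original slides.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    ranks = {page: 1.0 for page in pages}  # start every page at rank 1
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Sum the rank shares of every page that links to this one.
            incoming = sum(
                ranks[source] / len(links[source])
                for source in pages
                if page in links[source]
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(LINKS)
```

In this graph C ends up with the highest rank, because it receives links from both A and B, while B receives only half of A's rank.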


Other Types of Search Engines
  • Directories
    • hierarchically organized indexes that allow you to browse through lists of web sites by category or subject
  • Meta-search engines
    • query multiple search engines simultaneously and return a complete set of hits
  • Specialized search engines
    • Create a database of sites on a specific topic using robots or spiders
    • For specific user groups
    • Visualization
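To make the meta-search idea concrete, the sketch below sends one query to several engines and merges the ranked lists into a single de-duplicated list. Everything here is hypothetical: `engine_a` and `engine_b` are stand-ins that return fixed example.com URLs, and merging by summed reciprocal rank is one simple choice among many, not the method of any particular meta-search engine. The engines are also called one after another rather than simultaneously, to keep the example short.

```python
from collections import defaultdict

def engine_a(query):
    """Hypothetical engine: returns a fixed ranked list of result URLs."""
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]

def engine_b(query):
    """Second hypothetical engine with partly overlapping results."""
    return ["http://example.com/2", "http://example.com/4"]

def meta_search(query, engines):
    """Merge ranked lists; each hit earns 1/rank from every engine that returns it."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = meta_search("slavery", [engine_a, engine_b])
```

A result returned by both engines, such as the second URL here, rises to the top of the merged list.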
Examples of Directories
  • Yahoo Directory
  • The Internet Public Library
  • Librarians’ Index to the Internet
  • INFOMINE, from the University of California, is a good example of an academic subject directory
Examples of Meta-Search Engines
  • MetaCrawler
  • Ixquick
  • Clusty
  • Mamma
More Examples of Specialized Search Engines
  • Career Mosaic
  • Diseases, Disorders and related topics
  • The Day in History
User Behaviors
  • Web queries are short, rarely modified, and very simple in structure
  • Users rarely use advanced search features; when they do, about half of the attempts contain mistakes
  • Users view only the first one or two pages of results
  • Users are not interested in relevance feedback
Appendix A: Tips
  • Most search engines employ the principles of Boolean logic in the formulation of search queries. If you take the time to understand the basics of Boolean logic, you will have a better chance of search success.
  • Search engines tend to have a default Boolean logic. This means that the space between multiple search terms defaults to either OR logic or AND logic. This has become a de facto standard. It is imperative that you know which logical operator is the default. Nowadays, the default logic tends to be AND, but you should always check the site's Help file to make sure.
  • Another de facto standard is the requirement to enclose phrases in quotation marks, e.g., "death penalty".
Appendix A (Cont’)
  • If proximity operators (e.g., NEAR) are available, use them rather than specifying an AND relationship between your keywords. This ensures that your search terms are located near each other in the full-text document. The closer together your terms appear, the more likely the document is to be relevant. Google does proximity searching by default.
  • Field searching is another extremely important way of limiting your search results in large search engines that contain millions of full-text files. For example,


in a search engine such as AltaVista will bring you more relevant hits than merely searching on the keyword slavery.

  • To enhance subject searches, try the URL field to narrow your results. The URL field offers a good way to search for certain subject terms. This is because of the make-up of the URL.
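The difference between an AND relationship and a proximity operator such as NEAR, discussed in the tips above, can be sketched as follows. The `near` function and its `max_distance` parameter are hypothetical; each search system defines its own NEAR semantics and default window size.

```python
def near(text, first, second, max_distance=5):
    """Return True if `first` and `second` occur within `max_distance` words of each other."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

# Both texts contain both words, so a plain AND would match each of them.
close_text = "the death penalty debate continues"
far_text = "death is discussed at length in a chapter on the penalty"

close_match = near(close_text, "death", "penalty", max_distance=1)
far_match = near(far_text, "death", "penalty", max_distance=3)
```

Only the first text passes the proximity test, which is why NEAR tends to return fewer but more relevant hits than AND.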
Appendix A (Cont’)
  • The Internet is a self-publishing medium. It is not a library of evaluated publications selected by professionals. Rather, the Internet is a bulletin board containing everything from the definitive to the spurious. Everything, everything must be analyzed for its appropriateness for research use.
  • Before you select a search tool, always think about your topic and what you are trying to find. Once you begin your research, be sure to try out a handful of sites. Don't rely on a single site.
  • Don't just Google everything! Google is great, but there are other useful tools on the Web, too. Google has become so popular that many people use this tool exclusively, and miss out on others that might be more useful for their particular search.
  • Others?
Appendix B

Anatomy of a URL

This is a URL on the CNN home page:

This URL is typical of addresses hosted in domains in the United States:

Protocol: http

Host computer name: www

Second-level domain name: cnn

Top-level domain name: com

Directory name: feedback

File name: comments.html

The directory name and file name often contain subject terms. These can be searched with the URL field. For example, URL:slavery will give you more relevant results than the keyword slavery by searching for this term as a directory name or a file name.
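The anatomy described above can be checked with Python's standard `urllib.parse` module. The URL in this sketch is reassembled from the components listed in this appendix, which is an assumption, since the original address does not appear in the transcript. The `url_field_matches` helper is a hypothetical stand-in for a search engine's URL field search.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: rebuilt from the listed protocol, host, domain, directory, and file name.
url = "http://www.cnn.com/feedback/comments.html"

parts = urlsplit(url)
protocol = parts.scheme                      # "http"
host, second_level, top_level = parts.hostname.split(".")
directory, file_name = posixpath.split(parts.path)
directory = directory.strip("/")             # drop the leading slash

def url_field_matches(candidate_url, subject_term):
    """Hypothetical URL field search: match a term in the directory or file name."""
    return subject_term.lower() in urlsplit(candidate_url).path.lower()

matches_feedback = url_field_matches(url, "feedback")
matches_slavery = url_field_matches(url, "slavery")
```

Because the helper looks only at the path, a term that appears in a directory or file name matches, while a term that appears only in the page text does not.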

Appendix C
  • Search engine comparison chart
  • Tutorials
    • Google Tutorial