EVALUATION in searching



tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/

Tefko Saracevic

Central ideas
  • Evaluation is an integral part of searching
  • But there are a number of:
    • contexts & approaches to evaluation
    • requirements for evaluation
    • criteria used in evaluation

Table of contents

  • Importance, definitions
  • Contexts & approaches
  • Requirements for evaluation
  • Web evaluation

and some pretty pictures at the end

Definition of evaluation


1. assessment of value

the act of considering or examining something in order to judge its value, quality, importance, extent, or condition

In searching:

assessment of search results on basis of given criteria as related to users and use

criteria may be specified by users or derived from professional practice, other sources or standards

Results are judged & with them the whole process, including searcher & searching

Importance of evaluation
  • Integral part of searching
    • always there - wanted or not
      • no matter what, the user will in some way or other evaluate what was obtained
    • could be informal or formal
  • Growing problem for all
    • information explosion makes finding “good” stuff very difficult
  • Formal evaluation part of professional job & skills
    • requires knowledge of evaluation criteria, measures, methods
    • more & more prized

Place of evaluation

[Figure: place of evaluation diagram; only the label "Inf. need" survives]

General application
  • Evaluation (as discussed here) is applicable to results from a variety of information systems:
    • information retrieval (IR) systems, e.g. Dialog, Scopus …
    • sources included in digital libraries, e.g. Rutgers
    • reference services e.g. in libraries or commercial on the web
    • web sources e.g. as found on many domain sites
  • Many approaches, criteria, measures, methods are similar & can be adapted for specific source or information system

Broad context

Evaluating the role that an information system plays as related to:

  • SOCIETY - community, culture, discipline ...
  • INSTITUTION - university, organization, company ...
  • INDIVIDUALS - users & potential users (nonusers)

Roles lead to broad, but hard questions as to what CONTEXT to choose for evaluation

Questions asked in different contexts
  • Social:
    • how well does an information system support social demands & roles?
      • hardest to evaluate
  • Institutional:
    • how well does it support institutional/organizational mission & objectives?
      • tied to objectives of institution
      • also hard to evaluate
  • Individual:
    • how well does it support inf. needs & activities of people?
      • most evaluations in this context

Approaches to evaluation
  • Many approaches exist
    • quantitative, qualitative …
    • effectiveness, efficiency ...
    • each has strong & weak points
  • Systems approach prevalent
    • Effectiveness: How well does a system perform that for which it was designed?
    • Evaluation related to objective(s)
    • Requires choices:
      • Which objective, function to evaluate?

Approaches … (cont.)
  • Economics approach:
    • Efficiency: at what costs?
    • Effort, time also are costs
    • Cost-effectiveness: cost for a given level of effectiveness
  • Ethnographic approach
    • practices, effects within an organization, community
    • learning & using practices & comparisons

Prevalent approach
  • System approach used in many different ways & for many purposes – in evaluation of:
    • inputs to system & contents
    • operations of a system
    • use of a system
    • outputs from a system
  • Also, in evaluation of search outputs for given user(s) and use
    • applied on the individual level
      • derived from assessments from users or their surrogates, e.g. searchers
    • this is what searchers do most often
    • this is what you will apply in your projects

Five basic requirements for system evaluation

Once a context is selected, you need to specify ALL five:

1. Construct

  • A system, process, source
    • a given IR system, web site, digital library ...
    • what are you going to evaluate?

2. Criteria

  • to reflect objective(s) of searching
    • e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs …
    • on basis of what will you make judgments?

3. Measure(s)

  • to reflect criteria in some quantity or quality
    • precision, recall, various Likert scales, $$$ ...
    • how are you going to express judgment?

Requirements … (cont.)

4. Measuring instrument

  • recording by users or user surrogates (e.g. you) on the measure
    • expressing if relevant or not, marking a scale, indicating cost
    • people are instruments – who will it be?

5. Methodology

  • procedures for collecting & analyzing data
    • how are you going to get all this done?
      • Assemble the stuff to evaluate (construct)?
      • Choose what criteria?
      • Determine what measures to use to reflect the criteria?
      • Establish who will judge and how the judgment will be done?
      • How will you analyze results?
      • Verify validity and reliability?

Requirements … (cont.)
  • Ironclad rule:

No evaluation can proceed if not ALL five of these are specified!

  • Sometimes specifications of some are informal & implied, but they are always there!
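
The ironclad rule can be sketched as a simple completeness check. This is only an illustration: the `EvaluationPlan` class, its field names and the example values are hypothetical, not part of the original slides.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvaluationPlan:
    """Hypothetical container for the five requirements; None means unspecified."""
    construct: Optional[str] = None    # what is being evaluated
    criteria: Optional[str] = None     # on what basis judgments are made
    measures: Optional[str] = None     # how a judgment is expressed
    instrument: Optional[str] = None   # who records the judgment
    methodology: Optional[str] = None  # how data are collected & analyzed

    def missing(self) -> List[str]:
        """Return the names of requirements that are still unspecified."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def can_proceed(self) -> bool:
        """Ironclad rule: proceed only when all five are specified."""
        return not self.missing()


# Example plan with the methodology left out (all values are made up)
plan = EvaluationPlan(
    construct="a given web site",
    criteria="relevance",
    measures="precision",
    instrument="searcher acting as user surrogate",
)
print(plan.missing())      # ['methodology']
print(plan.can_proceed())  # False
```

Even an informal evaluation fills all five fields implicitly; writing them down simply makes the choices visible.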

1. Constructs
  • In IR research: most done on test collections & test questions
    • Text Retrieval Conference - TREC
      • evaluation of algorithms, interactions
      • reported in research literature
  • In practice: on use & user level: mostly done on operational collections & systems, web sites
    • e.g. Dialog, LexisNexis, various files
      • evaluation, comparison of various contents, procedures, commands, user proficiencies, characteristics
      • evaluation of interactions
      • reported in professional literature

2. Criteria
  • In IR: Relevance basic & most used criterion
    • related to the problem at hand
  • On user & use level: many others
    • utility, satisfaction, success, time, value, impact, ...
  • Web sources
    • those + quality, usability, penetration, accessibility ...
  • Digital libraries, web sites
    • those + usability

2. Criteria - relevance
  • Relevance as criterion (as mentioned)
    • strengths:
      • intuitively understood, people know what it means
      • universally applied in information systems
    • weaknesses:
      • not static - changes dynamically, thus hard to pin down
      • tied to cognitive structure & situation of a user – possible disagreements
  • Relevance as area of study
      • basic notion in information science
      • many studies done about various aspects of relevance
  • Number of relevance types exist
    • indication of different relations
      • it has to be specified which ones

2. Criteria - usability
  • Increasingly used for web sites & digital libraries
  • General definition (ISO)

“extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”

  • Number of criteria
    • enhancing user performance
    • ease of operations
    • serving the intended purpose
    • learnability – how easy to learn, memorize?
    • lostness – how often got lost in using it?
    • satisfaction
    • and quite a few more

3. Measures
  • in IR: Precision & recall preferred (treated in unit 4)
    • based on relevance
    • could be two or more dimensions
      • e.g. relevant–not relevant; relevant–partially relevant–not relevant
  • Problem with recall
    • how to find what's relevant in a file?
      • e.g. estimate; broad & narrow searching or union of many outputs then comparison
  • On use & user level
    • Likert scales - semantic differentials
      • e.g. satisfaction on a scale of 1 to x (1=not satisfied, x=satisfied)
    • observational measures
      • e.g. overlap, consistency
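
As a minimal sketch of the measures above: precision is the share of retrieved items judged relevant, and recall is the share of all relevant items that were retrieved. Since the full relevant set in a file is rarely known, the example estimates it from the union of several outputs, as suggested above. The document IDs, the two searches and the judgments are invented for illustration, and the function names are hypothetical.

```python
def precision(retrieved, relevant):
    """Share of retrieved items that are judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Share of (known or estimated) relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def pooled_relevant(outputs, judged_relevant):
    """Estimate the relevant set as the judged-relevant items found in
    the union of many outputs (an estimate, not the true relevant set)."""
    pool = set().union(*outputs)
    return pool & set(judged_relevant)


# Made-up outputs of a narrow and a broad search on the same question
narrow = ["d1", "d2", "d3", "d4"]
broad = ["d1", "d2", "d5", "d6", "d7", "d8"]
judged_relevant = {"d1", "d2", "d3", "d5", "d6"}  # binary relevance judgments

relevant_estimate = pooled_relevant([narrow, broad], judged_relevant)
print(precision(narrow, relevant_estimate))  # 0.75
print(recall(narrow, relevant_estimate))     # 0.6
```

Recall computed this way is relative to the pool: relevant items that no search retrieved are never counted, so the figure tends to overstate true recall.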


4. Instruments
  • People used as instruments
    • they judge relevance, scale ...
  • But which people?
    • users, surrogates, analysts, domain experts, librarians ...
  • How do relevance, utility ... judges affect results?
    • who knows?
  • Reliability of judgments:
    • about 50 - 60% for experts
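
The slide does not say how the reliability figure was computed. As a hedged sketch, two simple ways to compare a pair of judges are the overlap of their relevant sets (intersection over union) and the share of items on which they gave the same label. The function names and the judgments below are hypothetical.

```python
def overlap(judge_a, judge_b):
    """Overlap of two judges' relevant sets: |A and B| / |A or B|."""
    a, b = set(judge_a), set(judge_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def percent_agreement(labels_a, labels_b):
    """Share of items on which both judges gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both judges must label the same items")
    if not labels_a:
        return 1.0
    same = sum(1 for x, y in zip(labels_a, labels_b) if x == y)
    return same / len(labels_a)


# Made-up judgments on documents d1..d8 (1 = relevant, 0 = not relevant)
docs = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
labels_a = [1, 1, 1, 1, 0, 0, 0, 0]
labels_b = [0, 1, 1, 1, 1, 1, 0, 0]

relevant_a = [d for d, label in zip(docs, labels_a) if label]
relevant_b = [d for d, label in zip(docs, labels_b) if label]

print(overlap(relevant_a, relevant_b))        # 0.5
print(percent_agreement(labels_a, labels_b))  # 0.625
```

The two numbers differ because percent agreement also credits items both judges rejected, while overlap looks only at items at least one judge marked relevant.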

5. Methods
  • Includes design, procedures for observations, experiments, analysis of results
  • Challenges:
    • Validity? Reliability? Reality?
      • Collection - selection? size?
      • Request - generation?
      • Searching - conduct?
      • Results - obtaining? judging? feedback?
      • Analysis - conduct? tools?
      • Interpretation - warranted? generalizable?

Evaluation of web sources
  • Web is value neutral
    • it has everything from diamonds to trash
  • Thus evaluation becomes imperative
    • and a primary obligation & skill of professional searchers – you
    • continues & expands on evaluation standards & skills in library tradition
  • A number of criteria are used
    • most derived from traditional criteria, but modified for the web, others added
    • could be found on many library sites
      • librarians provide the public and colleagues with web evaluation tools and guidelines as part of their services

Criteria for evaluation of web & Dlib sources
  • What? Content
    • What subject(s), topic(s) covered?
    • Level? Depth? Exhaustivity? Specificity? Organization?
    • Timeliness of content? Up-to-date? Revisions?
    • Accuracy?
  • Why? Intention
    • Purpose? Scope? Viewpoint?
  • For? Users, use
    • Intended audience?
    • What need satisfied?
    • Use intended or possible?
    • How appropriate?


Criteria … (cont.)
  • Who done it? Authority
    • Author(s), institution, company, publisher, creator:
      • What authority? Reputation? Credibility? Trustworthiness? Refereeing?
      • Persistence? Will it be around?
      • Is it transparent who done it?
  • How? Treatment
    • Content treatment:
      • Readability? Style? Organization? Clarity?
    • Physical treatment:
      • Format? Layout? Legibility? Visualization?
    • Usability
  • Where? Access
    • How available? Accessible? Restrictions?
    • Links persistence, stability?


Criteria … (cont.)
  • How? Functionality
    • Searching, navigation, browsing?
    • Feedback? Links?
    • Output: Organization? Features? Variations? Control?
  • How much? Effort, economics
    • Time, effort in learning it?
    • Time, effort in using it?
    • Price? Total costs? Cost-benefits?
  • In comparison to? Wider world
    • Other similar sources?
      • where & how similar or better results may be obtained?
      • how do they compare?
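
One possible way to record judgments against these criteria groups is a small checklist. This is a sketch under stated assumptions: the group names are shortened from the slide headings, and the 1 to 5 scale and the unweighted mean are hypothetical choices, not something the slides prescribe.

```python
# Criteria groups, shortened from the slide headings (names are illustrative)
CRITERIA = [
    "content", "intention", "users_use", "authority", "treatment",
    "access", "functionality", "effort_economics", "comparison",
]


def summarize(ratings, scale_max=5):
    """Summarize ratings on an assumed 1..scale_max scale.

    Returns the unweighted mean of the rated groups and the list of
    groups that have not been rated yet.
    """
    unknown = set(ratings) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    for name, value in ratings.items():
        if not 1 <= value <= scale_max:
            raise ValueError(f"rating for {name} is outside 1..{scale_max}")
    unrated = [c for c in CRITERIA if c not in ratings]
    mean = sum(ratings.values()) / len(ratings) if ratings else None
    return {"mean": mean, "unrated": unrated}


# Made-up ratings for one web source, with several groups still open
ratings = {"content": 4, "authority": 5, "access": 3}
summary = summarize(ratings)
print(summary["mean"])     # 4.0
print(summary["unrated"])  # groups still to be judged
```

Listing the unrated groups keeps a partial judgment from being mistaken for a complete one; whether the groups should be weighted equally depends on the user and the use.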

Evaluation: To what end?
  • To assess & then improve performance – MAIN POINT
    • to change searches & search results for better
  • To understand what went on
    • what went right, what wrong, what works, what doesn't & then change
  • To communicate with user
    • explain & get feedback
  • To gather data for best practices
    • conversely: eliminate or reduce bad ones
  • To keep your job
    • even more: to advance
  • To get satisfaction from job well done


  • Evaluation is a complex task
    • but also an essential part of being an information professional
  • Traditional approaches & criteria still apply
    • but new ones added or adapted to satisfy new sources, & new methods of access & use
  • Evaluation skills are in growing demand particularly because web is value neutral
  • Great professional skill to sell!

Evaluation perspectives

Possible rewards*

* but don’t bet on it!
