Presentation Transcript

University of Palestine

Topics In CIS

ITBS 3202

Ms. Eman Alajrami

2nd Semester 2008-2009

Chapter 3

Retrieval Evaluation

Why is System Evaluation Needed?
  • There are many retrieval systems on the market: which one is the best?
  • When the system is in operation, is the performance satisfactory? Does it deviate from expectations?
  • To fine-tune a query to obtain the best result (for a particular set of documents and application)
  • To provide inputs to a cost-benefit analysis of an information system (e.g., time saved compared to a manual system)
  • To determine the effects of changes made to an existing system (system A versus system B)
  • Efficiency: speed
  • Effectiveness: how good is the result?
Retrieval Evaluation
  • Before the final implementation of an information retrieval system, an evaluation of the system is carried out. The type of evaluation to be considered depends on the objectives of the retrieval system. Any software system has to provide the functionality it was conceived for, so the first type of evaluation to consider is a functional analysis.
  • Functional analysis
    • Does the system provide most of the functions that the user expects?
    • What are the unique functions of this system?
    • How user-friendly is the system?
  • Error analysis
    • How often does the system fail?

Retrieval Performance Evaluation

The most common measures of system performance are time and space. The shorter the response time and the smaller the space used, the better the system is considered to be. There is a tradeoff between space complexity and time complexity which frequently allows trading one for the other.

Retrieval Performance Evaluation
  • In a system designed for information retrieval, other metrics besides time and space are also of interest, such as recall and precision. Since the user's query request is vague, the retrieved documents are not exact answers and have to be ranked according to their relevance to the query.
  • Such a relevance ranking concept is not present in data retrieval systems.
  • IR systems therefore require an evaluation of how precise the answer set is.
Relevance for IR
  • The capability of an information retrieval system to select and retrieve data appropriate to a user's needs
  • A measurement of the outcome of a search
  • The judgment on what should or should not be retrieved
  • There are no simple answers to what is relevant and what is not: relevance is difficult to define and subjective, depending on knowledge, needs, time, situation, etc.
  • Relevance is a central concept of information retrieval.
Effectiveness of Retrieval System

Effectiveness is a measure of the ability of the system to retrieve relevant documents while at the same time holding back non-relevant ones. It can be measured by recall and precision.

Difficulties in Evaluating IR System
  • Effectiveness is related to the relevancy of the items retrieved
  • Relevancy is not a binary evaluation but a continuous function
  • Even if the relevancy judgement is binary, it is difficult to make the judgement
  • Relevancy, from a human judgement standpoint, is
    • subjective - depends upon a specific user’s judgement
    • situational - relates to user’s requirement
    • cognitive - depends on human perception and behavior
    • temporal - changes over time
RetrievalPerformance Evaluation
  • Retrieval performance evaluation for information retrieval systems is usually based on a test reference collection and on an evaluation measure.

  • The test reference collection consists of:
      • A collection of documents.
      • A set of example information requests.
      • A set of relevant documents (provided by specialists) for each example information request.
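The three components above can be captured in a small data structure. A minimal sketch in Python, where every document id, query text, and relevance judgement is a hypothetical placeholder:

```python
# A minimal sketch of a test reference collection.
# All ids and texts here are hypothetical placeholders.
test_collection = {
    # a collection of documents
    "documents": {
        "d1": "evaluation of retrieval systems",
        "d2": "space and time complexity",
        "d3": "precision and recall measures",
    },
    # a set of example information requests
    "queries": {"q1": "retrieval evaluation"},
    # relevant documents for each request, judged by specialists
    "relevance": {"q1": {"d1", "d3"}},
}
```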
Recall and Precision
  • Given a query, how many documents should a system retrieve?
    • Are all the retrieved documents relevant?
    • Have all the relevant documents been retrieved?
  • Measures for system performance:
    • The first question concerns the precision of the search.
    • The second concerns the completeness (recall) of the search.

Retrieval Effectiveness - Precision and Recall

[Figure: the entire document collection partitioned by retrieval outcome and relevance]

                   Relevant                      Not relevant
  Retrieved        retrieved & relevant          retrieved & irrelevant
  Not retrieved    not retrieved but relevant    not retrieved & irrelevant

P = |retrieved & relevant| / |retrieved documents|

R = |retrieved & relevant| / |relevant documents|
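With documents represented as plain Python sets of ids (the example data below is hypothetical), the two definitions can be sketched as:

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|;
    Recall = |retrieved & relevant| / |relevant|."""
    ra = retrieved & relevant  # retrieved AND relevant
    precision = len(ra) / len(retrieved) if retrieved else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved docs are relevant; 2 of the 4 relevant docs were found
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d6", "d8"})
# p = 0.5, r = 0.5
```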



Precision and Recall
  • Precision
    • evaluates the correlation of the query to the database
    • an indirect measure of the completeness of the indexing algorithm
  • Recall
    • the ability of the search to find all of the relevant items in the database
  • Of the three numbers involved,
    • only two are always available:
      • total number of items retrieved
      • number of relevant items retrieved
    • the total number of relevant items is usually not available
  • Unfortunately, precision and recall affect each other in opposite directions! Given a system:
    • Broadening a query will increase recall but lower precision
    • Increasing the number of documents returned has the same effect
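The tradeoff can be seen by sweeping the cutoff k down a ranked list (all ids and judgements below are hypothetical): recall can only grow as more documents are returned, while precision tends to fall once the junk starts appearing.

```python
relevant = {"d1", "d4", "d7", "d9"}  # hypothetical relevance judgements
ranking = ["d1", "d4", "d2", "d7", "d3", "d5", "d9", "d6", "d8", "d10"]

for k in (2, 5, 10):  # broadening the answer set
    top = set(ranking[:k])
    ra = top & relevant
    print(f"k={k}: precision={len(ra)/k:.2f}, recall={len(ra)/len(relevant):.2f}")
# k=2: precision=1.00, recall=0.50
# k=5: precision=0.60, recall=0.75
# k=10: precision=0.40, recall=1.00
```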
Relationship between Recall and Precision

  • High recall, low precision: return most of the relevant documents but include many junk documents.
  • High precision, low recall: return mostly relevant documents but miss many relevant ones.
  • The ideal: high recall and high precision.

Recall and Precision... Examples
  • If you knew that there were 1000 relevant documents in a database (R) and your search retrieved 100 of these relevant documents (Ra), your recall would be 10%.
  • If your search retrieves 100 documents (A) and 20 of these are relevant (Ra), your precision is 20%.
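Both worked examples reduce to a single division; a quick check in Python, using the counts stated on the slide:

```python
# Recall example: 1000 relevant documents in the database (R),
# 100 of them retrieved (Ra)
recall = 100 / 1000
assert recall == 0.10  # 10%

# Precision example: 100 documents retrieved (A), 20 of them relevant (Ra)
precision = 20 / 100
assert precision == 0.20  # 20%
```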
Fallout Measure
  • Fallout is just everything that is left over: all the junk that came up in your search that was irrelevant. (Strictly, the fallout rate is the fraction of all non-relevant documents in the collection that get retrieved; the figure below uses the informal "junk in the answer set" reading.)
  • If you retrieve 100 documents and 20 are relevant, then your fallout is 80%.
  • Fallout becomes a bigger problem as the size of your database grows and your retrieval gets larger.
Fallout Rate
  • Problems with precision and recall:
    • A query on “Hong Kong” will return most relevant documents but it doesn’t tell you how good or how bad the system is! (What is the chance that a randomly picked document is relevant to the query?)
    • number of irrelevant documents in the collection is not taken into account
    • recall is undefined when there is no relevant document in the collection
    • precision is undefined when no document is retrieved
  • A good system should have high recall and low fallout
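A sketch of the stricter fallout definition (the fraction of all non-relevant documents in the collection that were retrieved), with hypothetical document ids and a hypothetical collection size:

```python
def fallout_rate(retrieved, relevant, collection_size):
    """Fraction of all non-relevant documents that the search retrieved."""
    irrelevant_retrieved = len(retrieved - relevant)
    irrelevant_total = collection_size - len(relevant)
    return irrelevant_retrieved / irrelevant_total if irrelevant_total else 0.0

# 3 of the 5 retrieved docs are junk; the collection holds 100 docs,
# 10 of them relevant, so 90 are non-relevant
fr = fallout_rate(
    {"d1", "d2", "d3", "d4", "d5"},
    {"d1", "d2", "d6", "d7", "d8", "d9", "d10", "d11", "d12", "d13"},
    collection_size=100,
)
# fr = 3 / 90, about 0.033
```

This also explains why a good system wants high recall and low fallout: recall is undefined only when there are no relevant documents, while fallout stays well defined as long as the collection contains any non-relevant ones.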
Example 1
  • Consider an example information request for which a query q is formulated.
  • Assume that the set Rq containing the documents relevant to q has been defined:

Rq = { d3, d5, d9, d25, d39, d44, d56, d71, d89, d123 }

Example 1
  • Consider now a new retrieval algorithm which has just been designed.
  • Assume that this algorithm returns, for query q, a ranking of the documents in the answer set. (The ranked list appears on the original slide; per the computations that follow, it contains 15 documents, 5 of which are relevant.)
Example 1
  • The documents that are relevant to the query q are marked with a bullet.
  • Compute:
    • Recall
    • Precision
    • Fallout rate

Precision = number of relevant documents retrieved / total number of documents retrieved

Recall = number of relevant documents retrieved / total number of relevant documents


Precision = 5 / 15 = 33 %

Recall = 5 / 10 = 50 %
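Example 1's numbers check out in code. The ranked answer set itself is not reproduced in the transcript, so only the counts stated on the slide are used:

```python
Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}

retrieved_total = 15    # documents in the ranked answer set (from the slide)
relevant_retrieved = 5  # documents in the answer set marked with a bullet

precision = relevant_retrieved / retrieved_total  # 5/15, about 33%
recall = relevant_retrieved / len(Rq)             # 5/10 = 50%
```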