Discussion Class 5

TREC

Discussion Classes

Format:

Questions.

Ask a member of the class to answer.

Provide opportunity for others to comment.

When answering:

Stand up.

Give your name. Make sure that the TA hears it.

Speak clearly so that all the class can hear.

Suggestions:

Do not be shy about presenting partial answers.

Differing viewpoints are welcome.


Question 1: Objectives

The TREC workshop series has four goals:

(a) Encourage research in text retrieval based on large test collections

(b) Increase communication among industry, academia, and government

(c) Speed the transfer of technology from research labs into products by demonstrating methodologies on real-world problems

(d) Increase the availability of appropriate evaluation techniques

What does the ad hoc task contribute to each of these goals?


Question 2: The TREC Corpus

Source                              Size (Mbytes)    # Docs    Median words/doc
Wall Street Journal, 87-89               267          98,732         245
Associated Press newswire, 89            254          84,678         446
Computer Selects articles                242          75,180         200
Federal Register, 89                     260          25,960         391
Abstracts of DOE publications            184         226,087         111
Wall Street Journal, 90-92               242          74,520         301
Associated Press newswire, 88            237          79,919         438
Computer Selects articles                175          56,920         182
Federal Register, 88                     209          19,860         396


Question 2: The TREC Corpus

What characteristics of this data are likely to impact the results of experiments?

Explain the statement, "Disks 1-5 were used as training data."

Suppose that you were designing two search engines: (i) for use with a library catalog, (ii) for use with a Web search service. How does your data differ from the TREC corpus?


Question 3: TREC Topic Statement

<num> Number: 409

<title> legal, Pan Am, 103

<desc> Description:

What legal actions have resulted from the destruction of Pan Am Flight 103 over Lockerbie, Scotland, on December 21, 1988?

<narr> Narrative:

Documents describing any charges, claims, or fines presented to or imposed by any court or tribunal are relevant, but documents that discuss charges made in diplomatic jousting are not relevant.

A sample TREC topic statement


Question 3: TREC Topic Statement

(a) What is the relationship between TREC topic statements and queries?

(b) Distinguish between manual and automatic methods of query generation. (A sketch of a simple automatic method follows this question.)

(c) Explain the process used by the manual methods.

(d) Some of the results used a time limit (e.g., "limited to no more than 10 minutes clock time"). What was being timed?
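
For part (b), a minimal sketch of what an automatic method of query generation might look like: the query is derived purely from the words of the topic statement, with no human editing. The field names, stopword list, and topic dictionary below are illustrative assumptions, not the procedure of any particular TREC participant.

```python
# Sketch: automatic query generation from a TREC topic statement.
# Assumption: the topic has already been parsed into a dict of field -> text.
import re

# Tiny illustrative stopword list; real systems use much larger lists,
# stemming, and term weighting.
STOPWORDS = {"what", "have", "from", "the", "of", "over", "on", "in", "a", "an"}

def automatic_query(topic, fields=("title", "desc")):
    """Build a bag-of-words query from the chosen topic fields, untouched by hand."""
    terms = []
    for field in fields:
        text = topic.get(field, "").lower()
        for token in re.findall(r"[a-z0-9]+", text):
            if token not in STOPWORDS:
                terms.append(token)
    return terms

topic_409 = {
    "title": "legal, Pan Am, 103",
    "desc": ("What legal actions have resulted from the destruction of "
             "Pan Am Flight 103 over Lockerbie, Scotland, on December 21, 1988?"),
}
print(automatic_query(topic_409))
# -> ['legal', 'pan', 'am', '103', 'legal', 'actions', 'resulted', ...]
```

A manual method, by contrast, lets a person read the whole topic and compose or edit the query by hand.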


Question 4: Relevance Assessments

(a) Explain the statement, "All TRECs have used the pooling method to assemble the relevance assessments." (A sketch of the pooling idea follows this question.)

(b) How is relevance assessed?

(c) What is the impact of some relevant documents being missed from the pool?

(d) What is the problem of some relevant documents in the pool coming from only a single run? How serious is this?
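
For part (a), a minimal sketch of the pooling idea: for each topic, the top-ranked documents from every submitted run are merged into a single de-duplicated pool, and the assessor judges only the documents in that pool; everything outside the pool is treated as not relevant. The run data below are illustrative assumptions (TREC typically pooled to a depth of about 100 documents per run).

```python
# Sketch: assembling a relevance-judgment pool for one topic.
# Each run is an ordered list of document IDs, best first.

def build_pool(runs, depth=100):
    """Union of the top-`depth` documents from every run for this topic."""
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

# Hypothetical runs from three systems for a single topic.
runs = {
    "systemA": ["d12", "d07", "d33", "d90"],
    "systemB": ["d07", "d51", "d12", "d02"],
    "systemC": ["d90", "d07", "d64", "d12"],
}
pool = build_pool(runs, depth=3)
print(sorted(pool))  # only these documents are shown to the assessor
```

Note that a relevant document which no run ranked highly never enters the pool and is therefore never judged, which is the situation parts (c) and (d) ask about.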



Question 5:

What are:

(a) The recall-precision curve?

(b) The mean (non-interpolated) average precision? (A sketch of its computation follows this question.)

The report commented that "two topics are fundamental to effective retrieval performance." What are they?

How do the automatic tests differ from the manual ones?
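
For part (b), a minimal sketch of non-interpolated average precision for one topic: precision is computed at the rank of each relevant document that is retrieved, the values are summed, and the sum is divided by the total number of relevant documents, so relevant documents that are never retrieved pull the score down. Averaging this value over all topics gives the mean average precision. The ranking and relevance judgments below are made up for illustration.

```python
# Sketch: non-interpolated average precision for one topic, and the mean
# over topics (MAP). Relevant documents never retrieved contribute zero.

def average_precision(ranked, relevant):
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(results, qrels):
    """results: topic -> ranked doc IDs; qrels: topic -> set of relevant doc IDs."""
    return sum(average_precision(results[t], qrels[t]) for t in qrels) / len(qrels)

# Hypothetical ranking and judgments for one topic.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d7", "d4", "d8"}            # d8 is never retrieved
print(average_precision(ranking, relevant))   # (1/2 + 2/5) / 3 = 0.30
```

The recall-precision curve in part (a) plots precision against recall as one moves down the same ranking, so both measures are computed from the same rank and relevance information.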


Question 6: The Future

(a) Why was TREC-8 the last year for the ad hoc task?

(b) Does this mean that text-based information retrieval is now solved?

