The INFILE project: a crosslingual filtering systems evaluation campaign

Presentation Transcript
    1. LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit
    The INFILE project: a crosslingual filtering systems evaluation campaign
    Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri

    2. Overview
    • Goals and features of the INFILE campaign
    • Test collections:
      • Documents
      • Topics
      • Assessments
    • Evaluation protocol:
      • Evaluation procedure
      • Evaluation metrics
    • Conclusions

    3. Goals and features of the INFILE Campaign
    • Information Filtering Evaluation
      • filter documents according to long-term information needs (user profiles / topics)
    • Adaptive: uses simulated user feedback
      • following the TREC adaptive filtering task
    • Crosslingual
      • three languages: English, French, Arabic
    • Close to the real activity of competitive intelligence (CI) professionals
      • in particular, profiles developed by CI professionals (STI)
    • Pilot track in CLEF 2008

    4. Test Collection
    • Built from a corpus of news from the AFP (Agence France-Presse)
      • almost 1.5 million news articles in French, English and Arabic
    • For the information filtering task:
      • 100,000 documents to filter, in each language
    • NewsML format
      • standard XML format for news (IPTC)

    5. Document example (1): document identifier, keywords, headline

    6. Document example (2): IPTC category, AFP category, location, content
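Since the documents are NewsML (an XML format), extracting the fields named on the two example slides is a standard XML-parsing task. Below is a minimal sketch using Python's `xml.etree.ElementTree`; the element names here are a simplified illustration, not the full AFP NewsML schema.

```python
import xml.etree.ElementTree as ET

# Simplified NewsML-like fragment (element layout is an assumption
# for illustration; real AFP NewsML is richer).
doc = """<NewsItem>
  <Identification>
    <NewsIdentifier><NewsItemId>AFP-0001</NewsItemId></NewsIdentifier>
  </Identification>
  <NewsLines>
    <HeadLine>Sample headline</HeadLine>
    <KeywordLine>filtering</KeywordLine>
    <KeywordLine>evaluation</KeywordLine>
  </NewsLines>
</NewsItem>"""

root = ET.fromstring(doc)
item_id = root.findtext(".//NewsItemId")          # document identifier
headline = root.findtext(".//HeadLine")           # headline
keywords = [k.text for k in root.iter("KeywordLine")]  # keywords
```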

    7. Profiles
    • 50 interest profiles
      • 20 profiles in the domain of science and technology
        • developed by CI professionals from INIST, ARIST, Oto Research, Digiport
      • 30 profiles of general interest

    8. Profiles (2)
    • Each profile contains 5 fields:
      • title: a few-word description
      • description: a one-sentence description
      • narrative: a longer description of what is considered a relevant document
      • keywords: a set of key words, key phrases or named entities
      • sample: a sample relevant document (one paragraph)
    • Participants may use any subset of the fields for their filtering
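The five profile fields above map naturally onto a small record type. A minimal sketch (the class name and the example query-building step are illustrative assumptions, not part of the INFILE specification):

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """One INFILE interest profile; field names follow the slide."""
    title: str          # a few-word description
    description: str    # a one-sentence description
    narrative: str      # longer description of a relevant document
    keywords: list      # key words, key phrases or named entities
    sample: str         # one sample relevant paragraph


p = Profile(
    title="renewable energy",
    description="News about solar power technology and policy.",
    narrative="A relevant document discusses solar energy research, "
              "industry or regulation.",
    keywords=["solar", "photovoltaic"],
    sample="Installed solar capacity grew sharply last year...",
)

# Participants may use any subset of fields, e.g. title + keywords:
query_text = " ".join([p.title] + p.keywords)
```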

    9. Constitution of the corpus
    • To build the corpus of documents to filter:
      • find documents relevant to the profiles in the original corpus
      • use a pooling technique on the results of IR tools
        • the whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
        • each search engine is queried independently using the 5 profile fields separately, all fields together, and all fields but the sample
        • 28 runs in total

    10. Constitution of the corpus (2)
    • Pooling using a “Mixture of Experts” model:
      • the first 10 documents of each run are taken
      • the first pool is assessed
      • a score is computed for each run and each topic according to the assessments of the first pool
      • the next pool is created by merging runs with a weighted sum, the weights being proportional to the scores
      • assessments are ongoing
    • Keep all assessed documents
      • documents returned by the IR systems but judged not relevant form a set of difficult documents
    • Add randomly chosen documents (noise)
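The weighted-sum merging step can be sketched as follows. The slide only says runs are merged with a weighted sum whose weights are proportional to each run's score on the first assessed pool; the reciprocal-rank contribution per document and the numeric values below are illustrative assumptions.

```python
def merge_runs(runs, weights, depth=10):
    """Merge ranked runs with a weighted sum of per-run contributions.

    runs:    {run_id: [doc ids in rank order]}
    weights: {run_id: weight proportional to the run's first-pool score}
    The 1/rank contribution is an assumption for illustration.
    """
    scores = {}
    for run_id, ranking in runs.items():
        w = weights[run_id]
        for rank, doc in enumerate(ranking[:depth], start=1):
            scores[doc] = scores.get(doc, 0.0) + w / rank
    # Highest combined score first: candidates for the next pool.
    return sorted(scores, key=scores.get, reverse=True)


runs = {"lucene": ["d1", "d2", "d3"], "indri": ["d2", "d4"]}
weights = {"lucene": 0.3, "indri": 0.7}   # illustrative values
pool = merge_runs(runs, weights)
```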

    11. Evaluation procedure
    • One-pass test
    • Interactive protocol using a client-server architecture (webservice communication):
      • the participant registers
      • retrieves one document
      • filters the document
      • asks for feedback (on kept documents)
      • retrieves a new document
    • Limited number of feedbacks (50)
    • A new document becomes available only once the previous one has been filtered
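The interactive loop above can be sketched as a client driving a server object. The method names (`get_document`, `send_decision`, `ask_feedback`) and the toy in-memory server are assumptions standing in for INFILE's actual webservice calls; only the control flow (one pass, feedback on kept documents, a capped feedback budget) follows the slide.

```python
class FakeServer:
    """Toy stand-in for the INFILE webservice (method names assumed)."""

    def __init__(self, docs, relevant):
        self.docs = list(docs)        # stream, one pass only
        self.relevant = set(relevant)
        self.decisions = {}
        self.feedback_calls = 0

    def get_document(self):
        # The real server releases a new document only after the
        # previous one has been filtered; the toy just pops the stream.
        return self.docs.pop(0) if self.docs else None

    def send_decision(self, doc, keep):
        self.decisions[doc] = keep

    def ask_feedback(self, doc):
        self.feedback_calls += 1
        return doc in self.relevant   # simulated user feedback


def run_filtering_client(server, decide, n_feedbacks=50):
    """One-pass loop: retrieve, filter, optionally ask feedback on kept docs."""
    left = n_feedbacks
    while (doc := server.get_document()) is not None:
        keep = decide(doc)
        server.send_decision(doc, keep)
        if keep and left > 0:
            _relevant = server.ask_feedback(doc)  # use this to adapt the profile
            left -= 1


srv = FakeServer(["d1", "d2", "d3"], relevant={"d2"})
run_filtering_client(srv, decide=lambda d: d != "d1", n_feedbacks=1)
```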

    12. Evaluation metrics
    • Precision / Recall / F-measure
    • Utility (from TREC)

    P = a / (a + b)    R = a / (a + c)    F = 2PR / (P + R)    u = w1·a − w2·b

    where a is the number of relevant documents kept, b the number of non-relevant documents kept, and c the number of relevant documents rejected.
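These four measures follow directly from the counts a, b, c. A minimal sketch (the utility weights w1, w2 are placeholders, not INFILE's official values):

```python
def filtering_scores(a, b, c, w1=1.0, w2=0.5):
    """Precision, recall, F-measure and linear utility for one profile.

    a: relevant documents kept
    b: non-relevant documents kept
    c: relevant documents rejected
    w1, w2: utility weights (placeholder values, not INFILE's)
    """
    P = a / (a + b) if a + b else 0.0
    R = a / (a + c) if a + c else 0.0
    F = 2 * P * R / (P + R) if P + R else 0.0
    u = w1 * a - w2 * b
    return P, R, F, u


P, R, F, u = filtering_scores(a=8, b=2, c=2)
```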

    13. Evaluation metrics (2)
    • Detection cost (from TDT)
      • uses the probabilities of missed documents and false alarms
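The TDT detection cost combines the miss and false-alarm probabilities into a single number. A sketch using the standard TDT formula; the cost constants and target prior below are TDT's customary values, which may differ from those used in INFILE:

```python
def detection_cost(miss, fa, n_targets, n_nontargets,
                   c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost for one profile.

    miss: relevant documents the system rejected
    fa:   non-relevant documents the system kept
    Constants c_miss, c_fa, p_target are TDT's customary values
    (an assumption here, not necessarily INFILE's settings).
    """
    p_miss = miss / n_targets          # probability of a missed document
    p_fa = fa / n_nontargets           # probability of a false alarm
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)


cost = detection_cost(miss=2, fa=5, n_targets=10, n_nontargets=100)
```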

    14. Evaluation metrics (3)
    • Computed per profile and averaged over all profiles
    • Adaptivity: score evolution curve (values computed every 10,000 documents)
    • Two experimental measures:
      • originality: number of relevant documents a system uniquely retrieves
      • anticipation: inverse rank of the first relevant document detected
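The two experimental measures are simple to state over a system's kept-document list. A sketch under the assumption that "rank" means position in the system's own retrieval-ordered output (the slide does not spell this out):

```python
def anticipation(kept_docs, relevant):
    """Inverse rank of the first relevant document kept (0.0 if none).

    kept_docs: documents the system kept, in retrieval order.
    """
    for rank, doc in enumerate(kept_docs, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def originality(kept_by_system, kept_by_others, relevant):
    """Number of relevant documents this system alone retrieved."""
    unique = set(kept_by_system) - set(kept_by_others)
    return len(unique & set(relevant))


a = anticipation(["d3", "d1", "d7"], relevant={"d1", "d7"})
o = originality(["d1", "d2"], kept_by_others=["d2"], relevant={"d1"})
```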

    15. Conclusions
    • INFILE campaign
      • Information Filtering Evaluation: adaptive, crosslingual, close to real usage
    • Ongoing pilot track in CLEF 2008
      • corpus currently being constituted
      • dry run mid-June
      • evaluation campaign in July
      • workshop in September
    • Work in progress
      • modelling of the filtering task as carried out by CI practitioners