
The Domain-Specific Track at CLEF 2007

Vivien Petras, Stefan Baerisch & Max Stempfhuber
GESIS Social Science Information Centre, Bonn, Germany
Budapest, September 19, 2007



  1. The Domain-Specific Track at CLEF 2007
     Vivien Petras, Stefan Baerisch & Max Stempfhuber
     GESIS Social Science Information Centre, Bonn, Germany
     Budapest, September 19, 2007

  2. Outline
     • The Domain-Specific Task
     • Collections & Controlled Vocabularies
     • Topics
     • Participants, Runs & Relevance Assessments
     • Themes
     • Summary & Outlook

  3. The Domain-Specific Task
     • CLIR on structured scientific document collections:
       • social science domain
       • bibliographic metadata
       • controlled vocabularies for subject description
     • Leverage bibliographic metadata & controlled vocabularies for:
       • search
       • translation

  4. The Domain-Specific Task
     • Tasks:
       • Monolingual against German, English or Russian
       • Bilingual against German, English or Russian
       • Multilingual against combined collection

  5. Collections

  6. Controlled Vocabularies
     • 5 different subject-describing terminologies:
       • Thesaurus for the Social Sciences (GIRT-DE, -EN)
       • Thesaurus of Sociological Indexing Terms (CSA-SA)
       • INION Thesaurus (ISISS)
       • Social Sciences Classification (GIRT-DE, -EN)
       • Sociological Abstracts Classification (CSA-SA)

  7. Controlled Vocabularies – Mapping Tools
     • Translation:
       • GIRT German – GIRT English
     • Intellectual term mappings (cross-walks):
       • equivalent terms in vocabularies
       • GIRT German – CSA-SA English
       • GIRT English – CSA-SA English
       • original-term: agricultural area
       • mapped-term: Rural areas
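A cross-walk of this kind can be pictured as a lookup table from terms in one vocabulary to their equivalents in another. The following is only a minimal sketch: the table layout, the `CROSSWALK` name and the `map_terms` helper are assumptions made for illustration, and only the "agricultural area" entry is taken from the slide.

```python
from typing import Dict, List

# Hypothetical cross-walk table: source term -> equivalent terms in the
# target vocabulary. Only this one entry comes from the slide; the data
# structure itself is an assumption for illustration.
CROSSWALK: Dict[str, List[str]] = {
    "agricultural area": ["Rural areas"],
}


def map_terms(terms: List[str], crosswalk: Dict[str, List[str]]) -> List[str]:
    """Replace each term by its mapped equivalents; keep unmapped terms as-is."""
    mapped: List[str] = []
    for term in terms:
        # Lookup is case-insensitive on the source side.
        mapped.extend(crosswalk.get(term.lower(), [term]))
    return mapped


print(map_terms(["Agricultural area", "migration"], CROSSWALK))
```

Terms without a mapping are passed through unchanged here, which is one possible design choice; a real system might instead drop them or fall back to dictionary translation.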

  8. Topics
     • 25 topics in standard TREC format (title, desc, narr):
       • 15 volunteers (social scientists)
       • 2-5 suggestions from 28 subject specialties
       • checked for:
         • coverage in collections
         • variance from previous years
       • translated into English, Russian
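To make the three topic fields concrete, here is a small sketch that reads one topic into a dictionary. It assumes an XML-style layout with closing tags; the sample topic text is invented for illustration and is not one of the 25 track topics, and `parse_topic` is a hypothetical helper.

```python
import re
from typing import Dict

# Invented placeholder topic; the real track topics are not reproduced here.
SAMPLE_TOPIC = """
<top>
<num>000</num>
<title>Example title</title>
<desc>Example description of the information need.</desc>
<narr>Example narrative stating what counts as relevant.</narr>
</top>
"""

# Matches <field>...</field> for the fields named on the slide, plus the number.
FIELD_PATTERN = re.compile(r"<(num|title|desc|narr)>(.*?)</\1>", re.DOTALL)


def parse_topic(text: str) -> Dict[str, str]:
    """Return a field-name -> text mapping with whitespace normalised."""
    return {name: " ".join(body.split()) for name, body in FIELD_PATTERN.findall(text)}


topic = parse_topic(SAMPLE_TOPIC)
print(topic["title"])
print(topic["desc"])
```

Runs are often built from the title alone, or from title plus description, so keeping the fields separate makes it easy to choose which ones feed the query.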

  9. Participants 5 groups

  10. Runs

  11. Relevance Assessments
     All assessments done with Univ. of Padova's DIRECT System.
     * In Russian collection: 3 topics without relevant documents

  12. Relevance Assessments – Best MAP
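MAP stands for mean average precision. As a reminder of how the measure is commonly computed, here is a minimal sketch; the function names and the toy data are assumptions, each ranking is assumed to contain no duplicate documents, and skipping topics that have no relevant documents is a choice made in this sketch.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Mean of average precision over topics that have relevant documents."""
    topics = [t for t in qrels if qrels[t]]
    if not topics:
        return 0.0
    return sum(average_precision(runs.get(t, []), qrels[t]) for t in topics) / len(topics)


# Toy data, invented for illustration.
qrels = {"t1": {"d1", "d3"}, "t2": set()}
runs = {"t1": ["d1", "d2", "d3"], "t2": ["d9"]}
print(mean_average_precision(runs, qrels))
```

In the toy data, topic `t2` has no relevant documents and is left out of the mean, so the result equals the average precision of `t1` alone.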

  13. Themes – Retrieval models
     • Lucene
     • Language Modelling
     • Logistic Regression
     • Comparison: Vector Space, LM, Probabilistic - Okapi, DFR
     • Data fusion
     • Russian
       • word-based vs. N-gram retrieval
       • new light-weight stemmer
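The difference between word-based and N-gram indexing can be shown with two tokenizers. This is an illustrative sketch only: the n-gram length of 4 and the whitespace tokenization are assumptions, not the settings used by any participant.

```python
from typing import List


def word_tokens(text: str) -> List[str]:
    """Word-based indexing units: lower-cased whitespace tokens."""
    return text.lower().split()


def char_ngrams(text: str, n: int = 4) -> List[str]:
    """Overlapping character n-grams within each word; short words are kept whole."""
    grams: List[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(word_tokens("Information retrieval"))
print(char_ngrams("retrieval"))
```

Because inflected forms of a word share most of their n-grams, n-gram indexing can match related word forms without a stemmer, at the cost of a larger index.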

  14. Themes – Query Expansion
     • Entry Vocabulary Modules
       • query terms associated with thesaurus terms from documents
     • Thesaurus Lookup
       • combined thesaurus from all CVs
       • GIRT Thesaurus Index
     • Lexical Entailment
       • find document terms in relation to query terms
     • Blind Feedback
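Blind feedback, the last technique listed, expands a query with terms taken from the top-ranked documents of a first retrieval pass, without any relevance judgements. The sketch below uses raw term frequency to pick expansion terms, which is a simplification chosen for illustration; the function name, parameters and toy documents are all assumptions, and participants may have used different term-weighting schemes.

```python
from collections import Counter
from typing import List


def blind_feedback(
    query: List[str],
    ranked_docs: List[List[str]],
    top_docs: int = 2,
    new_terms: int = 1,
) -> List[str]:
    """Append the most frequent non-query terms from the top-ranked documents."""
    counts: Counter = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(term for term in doc if term not in query)
    expansion = [term for term, _ in counts.most_common(new_terms)]
    return query + expansion


# Toy first-pass ranking (already tokenized), invented for illustration.
ranked = [
    ["labour", "market", "youth", "unemployment"],
    ["youth", "unemployment", "policy", "youth"],
    ["unrelated", "document"],
]
print(blind_feedback(["unemployment"], ranked))
```

Only the top two documents are consulted in this example, so terms from the third document never enter the expanded query.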

  15. Themes – Translation
     • Lucene plug-in
       • Babelfish, Google, PROMT, Reverso
     • Bilingual thesaurus mapping
     • Dictionary adaption
       • disambiguate term translation given language context of feedback documents
     • Statistical machine translation
       • MATRAX
     • Commercial Software

  16. Summary & Outlook
     • Extension of Russian materials
       • Translation table DE-EN-RU for GIRT Thesaurus
       • Translation table RU-EN for INION Thesaurus
       • Mapping between GIRT – INION Thesaurus
     • More tools for Terminology mapping
       • different relationships (0T, SYN, BT, NT, RT)
       • GESIS-IZ project: > 40 mappings
         • 25 controlled vocabularies / 11 disciplines
         • ~ 125,000 terms & phrases
         • ~ 400,000 relations

  17. Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds_2007.htm
     Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
     Email: vivien.petras@gesis.org
