
Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas)

III Jornadas de Bibliotecas Digitales, El Escorial, 2002. Anselmo Peñas, Julio Gonzalo and Felisa Verdejo. Dpto. Lenguajes y Sistemas Informáticos, UNED.


Presentation Transcript


  1. III Jornadas de Bibliotecas Digitales, El Escorial, 2002. Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas). Anselmo Peñas, Julio Gonzalo and Felisa Verdejo. Dpto. Lenguajes y Sistemas Informáticos, UNED

  2. Overview
     • Motivation: problems in query formulation
     • Hand-crafted approaches
       - Controlled vocabularies
     • Automatic approaches
       - Pure string processing
       - Automatic terminology extraction
     • Website Term Browser
     • Conclusions

  3. Specifying information needs
     [Diagram labels: Information need, Formulation, Query, Refinement, Search engine, Document ranking, Docs.]
     Help users to express and specify their information needs
     • Vague need
       - The user doesn't know exactly what they are looking for
     • Broad need
       - Compile or summarize pieces of information around a topic
     Users develop strategies without system assistance

  4. Language barriers
     [Diagram labels: Information need, Formulation, Query, Refinement, Search engine, Document ranking, Docs.]
     Help users to overcome language barriers
     • Specific domain terminology
       - Find appropriate wording
     • Translinguality
       - Information available only in a foreign language
     • Natural language characteristics
       - Lexical ambiguity
       - Terminology variation

  5. General approaches
     [Diagram labels: Information Retrieval; Terminology; Controlled vocabularies (indexing & browsing)]

  6. Controlled vocabularies: problems
     • Construction & management (high cost)
     • Indexing
       - Manual keyword assignment
       - Errors in automatic keyword assignment
     • Domain specific
       - A new domain needs a new thesaurus
     • Specialist oriented (users must know the preferred descriptors)
       - Less specialized audiences get poorer results

  7. General approaches
     [Diagram labels: Information Retrieval; Terminology; String Processing; Controlled vocabularies (indexing & browsing); Free text indexing]

  8. Free text searching
     [Screenshot label: Search]
     • Does it help users to express and specify their information needs?
     • Does it help users to overcome language barriers?

  9. General approaches
     [Diagram labels: Information Retrieval; Terminology; String Processing; Controlled vocabularies (indexing & browsing); Free text indexing; Keyphrase navigation (Phrasier); Phrase indexing & browsing (Phind)]

  10. “Keyphrase” navigation (Jones 1999)
     • Automatic extraction and assignment of 10 “keyphrases” to each document (KEA, Frank 1999)
     • Navigation between documents that share “keyphrases”
     Problems
     • No translinguality
     • No terminology variation
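
The navigation idea on this slide (moving between documents that share keyphrases) can be illustrated with a small inverted index. This is only a sketch under stated assumptions: it is not the Phrasier or KEA implementation, the function names `build_keyphrase_index` and `related_documents` are hypothetical, and the keyphrases are typed in by hand rather than extracted automatically.

```python
from collections import defaultdict

def build_keyphrase_index(doc_keyphrases):
    """Map each (lowercased) keyphrase to the set of documents it is assigned to."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index

def related_documents(doc_id, doc_keyphrases, index):
    """Return the other documents, ranked by how many keyphrases they share with doc_id."""
    shared = defaultdict(set)
    for phrase in doc_keyphrases[doc_id]:
        for other in index[phrase.lower()]:
            if other != doc_id:
                shared[other].add(phrase.lower())
    # Most shared keyphrases first; ties broken by document id.
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))

# Hand-made toy data (an assumption; a real system would assign these automatically).
docs = {
    "d1": ["digital libraries", "information retrieval"],
    "d2": ["information retrieval", "term extraction"],
    "d3": ["digital libraries", "information retrieval", "thesauri"],
}
index = build_keyphrase_index(docs)
links = related_documents("d1", docs, index)
for other, phrases in links:
    print(other, sorted(phrases))
```

Because the index matches keyphrases as exact strings, a document that says "retrieval of information" is never linked to one that says "information retrieval", which is the "no terminology variation" problem listed on the slide.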

  11. Problems
     • No translinguality
     • No terminology variation

  12. Objectives
     • Develop a model
       - to help users to express and specify their information needs
       - to help users to overcome language barriers
     • Bring the collection terminology to users
       - Morpho-syntactic, semantic & translingual variations
       - Without the need for thesaurus construction
     • Establish an appropriate evaluation framework
     Website Term Browser

  13. Proposed approach
     [Diagram labels: Information Retrieval; Natural Language Processing; Terminology; String Processing; Controlled vocabularies (indexing & browsing); Free text indexing; Keyphrase navigation (Phrasier); Phrase indexing & browsing (Phind); Automatic Terminology Extraction; Disambiguation; Conceptual indexing; Terminology Retrieval & Term browsing (WTB)]

  14. Terminology Retrieval
     From Automatic Terminology Extraction...
     Obtain lists of terms relevant for a specific domain
     • Term Extraction
     • Term Weighting
     • Term Selection
     ... to Terminology Retrieval
     Retrieve terms relevant for an information need
     • The user query points to the relevant terms
     • No truncation of terminology lists
     • Favor recall by relaxing term extraction patterns
     ... & Browsing
     • Navigate through relevant terminology
     • Access information from retrieved terms
     • Bridge the gap between query and collection vocabularies
     • Cross-language
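
The shift from term extraction to terminology retrieval described above can be sketched as follows. This is a minimal illustration, not the WTB implementation: the stopword-bounded n-gram pattern stands in for whatever syntactic extraction patterns the authors used, and `STOPWORDS`, `candidate_terms`, `build_term_index` and `retrieve_terms` are hypothetical names. It keeps every candidate term (no list truncation) and lets the query select the relevant ones.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def candidate_terms(text, max_len=3):
    """Yield word n-grams that neither start nor end with a stopword.

    A deliberately relaxed pattern: it favors recall over precision.
    """
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                yield " ".join(gram)

def build_term_index(collection):
    """Map every candidate term to the documents containing it (nothing is discarded)."""
    term_docs = defaultdict(set)
    for doc_id, text in collection.items():
        for term in candidate_terms(text):
            term_docs[term].add(doc_id)
    return term_docs

def retrieve_terms(query, term_docs):
    """Return all terms sharing a word with the query as (term, matched words, doc count)."""
    query_words = set(re.findall(r"[a-z]+", query.lower())) - STOPWORDS
    hits = []
    for term, docs in term_docs.items():
        matched = query_words & set(term.split())
        if matched:
            hits.append((term, len(matched), len(docs)))
    # More matched query words first, then more documents, then alphabetical.
    hits.sort(key=lambda h: (-h[1], -h[2], h[0]))
    return hits

collection = {
    "d1": "Automatic extraction of terminology for digital libraries",
    "d2": "Terminology extraction and indexing in digital libraries",
}
term_docs = build_term_index(collection)
hits = retrieve_terms("terminology extraction", term_docs)
for term, matched, n_docs in hits:
    print(term, matched, n_docs, sorted(term_docs[term]))
```

Each retrieved term keeps its document set, so a browsing interface could go from a term to the documents that contain it, which is the "access information from retrieved terms" step.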

  15. [Screenshot labels: Query in Spanish; Hierarchy of terms; Ranking of documents; English, Spanish, Catalan]

  16. [Screenshot labels]
     • Semantic variations
     • Translingual variation
     • Morpho-syntactic variations (permutation, insertion)
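
The two morpho-syntactic variation types named here, permutation and insertion, can be illustrated with a toy classifier. This is a sketch under loud assumptions: the slides do not say how WTB detects variants, the plural stripping is a crude stand-in for real morphological analysis, and `content_words`, `is_subsequence` and `classify_variant` are hypothetical helpers.

```python
# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def content_words(phrase):
    """Lowercase, drop stopwords, and strip a final 's' as a crude plural normalization."""
    words = [w for w in phrase.lower().split() if w not in STOPWORDS]
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

def is_subsequence(short, long):
    """True if the words of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(word in remaining for word in short)

def classify_variant(term, candidate):
    """Label `candidate` as 'same', 'permutation' or 'insertion' of `term`, else None."""
    base, cand = content_words(term), content_words(candidate)
    if base == cand:
        return "same"
    if sorted(base) == sorted(cand):
        return "permutation"   # same content words, different order
    if len(cand) > len(base) and is_subsequence(base, cand):
        return "insertion"     # extra words added (this test also accepts words at the edges)
    return None

print(classify_variant("term extraction", "extraction of terms"))
print(classify_variant("information retrieval", "information and image retrieval"))
print(classify_variant("term extraction", "document ranking"))
```

Semantic and translingual variations would need lexical resources (synonym lists, bilingual dictionaries) and are outside the scope of this sketch.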

  17. Usefulness of Term Browsing
     • 2000 session logs in UNED.es comparing:
       - Use of term area from WTB
       - Use of document area from Google

  18. Conclusions
     Browsing of phrases and terminology
     • User-oriented approach
     • Interaction over terminological information
     • Intermediate way between free searching and thesaurus-guided searching
     • Without the need for thesaurus construction
     Website Term Browser
     • Brings the collection terminology to users
     • Morpho-syntactic & semantic variations
     • Translinguality
     Evaluation
     • Users appreciate Term Browsing
     • WTB phrasal information can substantially complement the document ranking provided by search engines
