Browsing by phrases: terminological information in interactive multilingual text retrieval

Joint Conference on Digital Libraries 2001, Roanoke, VA. Browsing by phrases: terminological information in interactive multilingual text retrieval. Anselmo Peñas, Julio Gonzalo and Felisa Verdejo, NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED).

Presentation Transcript


  1. Browsing by phrases: terminological information in interactive multilingual text retrieval
     Joint Conference on Digital Libraries 2001, Roanoke, VA
     Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
     NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED)

  2. Goals
     • To bridge the gap between users’ vocabulary and collection terminology
       • even across languages
       • without the need for thesaurus construction
     • Robust and efficient integration of NLP resources and tools
       • Semantic network: EuroWordNet
       • Tokeniser
       • Morphological analyser
       • POS tagger
       • Shallow parser

  3. Approach
     Perform Automatic Terminology Extraction to provide:
     • At indexing time: criteria for adding a controlled set of phrases to the index
     • At query time: term browsing, to navigate through the terminology and access the documents from complex terms

  4. Approach
     (Diagram labels: Lemma, Document, Phrase)
     The task: to retrieve terminology
     • Lexical compounds are retrieved from mono-lexical terms
     Requires:
     • A phrase indexing level
     • Query expansion
     • Query translation
     Phrasal information is used to reduce noise when expanding and translating (co-occurrence of words in the same phrase)
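A phrase indexing level of the kind this slide calls for could be sketched as below. This is only an illustration under stated assumptions: the class name `PhraseIndex`, its methods, and the sample phrases and document ids are all hypothetical and do not come from the presentation.

```python
from collections import defaultdict


class PhraseIndex:
    """Toy three-level index: lemma -> phrases -> documents (hypothetical design)."""

    def __init__(self):
        self.lemma_to_phrases = defaultdict(set)
        self.phrase_to_docs = defaultdict(set)

    def add(self, phrase_lemmas, doc_id):
        """Register a phrase (a sequence of lemmas) as occurring in a document."""
        phrase = tuple(phrase_lemmas)
        self.phrase_to_docs[phrase].add(doc_id)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)

    def phrases_for(self, lemma):
        """Lexical compounds reachable from a single (mono-lexical) term."""
        return sorted(self.lemma_to_phrases.get(lemma, set()))

    def docs_for(self, phrase):
        """Documents that contain the given phrase."""
        return sorted(self.phrase_to_docs.get(tuple(phrase), set()))


# Made-up sample data, for illustration only.
index = PhraseIndex()
index.add(["nuclear", "test"], "doc1")
index.add(["nuclear", "test", "ban"], "doc2")
index.add(["test", "case"], "doc3")

print(index.phrases_for("nuclear"))
print(index.docs_for(["test", "case"]))
```

Starting from one lemma, the user can reach every indexed phrase that contains it, and from each phrase the documents where it occurs, which is the navigation path suggested by the Lemma, Phrase and Document labels.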

  5. Terminology Extraction and Indexing
     Patterns for Spanish and Catalan:
     • N N
     • N A
     • N [A] Prep N [A]
     • N [A] Prep Art N [A]
     • N [A] Prep V
     • N [A] Prep V N [A]
     Patterns for English:
     • A N [N]
     • N N [N]
     • A A N
     • N A N
     • N Prep N
     Processing:
     • Tokenising, lemmatising, tagging
     • Shallow parsing (syntactic pattern recognition)
     Results: terminological phrases for each language, with
     • Term frequency
     • Document frequency
     • Component lemmas
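The shallow-parsing step can be illustrated with a minimal sketch that matches part-of-speech patterns over already lemmatised and tagged text. It assumes the English patterns read from the slide as A N [N], N N [N], A A N, N A N and N Prep N, with brackets marking an optional element. The tag names, the function names and the sample documents are hypothetical, and keeping every overlapping match (rather than only the longest) is a simplifying assumption, not necessarily what the authors' system does.

```python
from collections import Counter

# English patterns as read from the slide; "[N]" marks an optional noun.
PATTERNS = ["A N [N]", "N N [N]", "A A N", "N A N", "N Prep N"]


def expand(pattern):
    """Expand optional '[X]' elements into all concrete tag sequences."""
    variants = [()]
    for tok in pattern.split():
        if tok.startswith("[") and tok.endswith("]"):
            variants = variants + [v + (tok[1:-1],) for v in variants]
        else:
            variants = [v + (tok,) for v in variants]
    return variants


# Sorted list so that matching order is deterministic.
TAG_SEQUENCES = sorted({seq for p in PATTERNS for seq in expand(p)})


def extract_phrases(tagged):
    """Return every lemma n-gram whose tag sequence matches a pattern.

    `tagged` is a list of (lemma, tag) pairs from an upstream tagger.
    Overlapping matches are all kept (an assumption of this sketch).
    """
    found = []
    for start in range(len(tagged)):
        for seq in TAG_SEQUENCES:
            window = tagged[start:start + len(seq)]
            if len(window) == len(seq) and tuple(t for _, t in window) == seq:
                found.append(tuple(lemma for lemma, _ in window))
    return found


def collect_stats(docs):
    """Compute term frequency and document frequency for extracted phrases."""
    term_freq, doc_freq = Counter(), Counter()
    for tagged in docs.values():
        phrases = extract_phrases(tagged)
        term_freq.update(phrases)
        doc_freq.update(set(phrases))
    return term_freq, doc_freq


# Made-up tagged documents, for illustration only.
docs = {
    "d1": [("nuclear", "A"), ("test", "N"), ("ban", "N"), ("treaty", "N")],
    "d2": [("ban", "N"), ("on", "Prep"), ("test", "N")],
}
term_freq, doc_freq = collect_stats(docs)
for phrase in sorted(term_freq):
    print(" ".join(phrase), term_freq[phrase], doc_freq[phrase])
```

Each extracted phrase is stored as a tuple of its component lemmas, so the three results listed on the slide (term frequency, document frequency, component lemmas) are all available from the two counters.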

  6. Query Expansion and Translation
     Query word: Tratados
     • Expansion: acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar
     • Translation: accord, discourse, handle, manage, pact, process, treat, treatise, treaty
     Query word: de Prohibición
     • Expansion: embargo, entredicho, interdicción, interdicto, proscripción
     • Translation: ban, interdiction, prohibition, proscription
     Query word: de Pruebas
     • Expansion: cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo
     • Translation: demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try
     Query word: Nucleares
     • Expansion: nuclear
     • Translation: nuclear
     Candidate combinations:
     • Nuclear fitting interdiction manage?
     • Nuclear taste proscription process?
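One way to picture how phrasal co-occurrence can prune such candidates is the sketch below. It is not the authors' algorithm: the candidate lists are a small subset of the translations on the slide, the indexed phrases are invented for the example, and the function names are hypothetical. The rule assumed here is that a candidate is kept only if it appears in the same indexed phrase as a candidate for a different query word.

```python
from itertools import combinations, product

# Candidate translations per query word, a subset of those on the slide.
CANDIDATES = {
    "tratados": ["treaty", "pact", "manage", "process"],
    "prohibición": ["ban", "interdiction", "proscription"],
    "pruebas": ["test", "taste", "fitting"],
    "nucleares": ["nuclear"],
}

# Hypothetical phrases extracted from the target-language collection.
INDEXED_PHRASES = [
    ("nuclear", "test"),
    ("test", "ban", "treaty"),
    ("nuclear", "test", "ban"),
]


def cooccurring_pairs(candidates, phrases):
    """Keep candidate pairs (from different query words) found in one phrase."""
    kept = set()
    for w1, w2 in combinations(candidates, 2):
        for c1, c2 in product(candidates[w1], candidates[w2]):
            if any(c1 in p and c2 in p for p in phrases):
                kept.add(frozenset((c1, c2)))
    return kept


def supported_candidates(candidates, phrases):
    """Per query word, the candidates that appear in at least one kept pair."""
    kept = cooccurring_pairs(candidates, phrases)
    supported = set().union(*kept) if kept else set()
    return {w: [c for c in cs if c in supported] for w, cs in candidates.items()}


result = supported_candidates(CANDIDATES, INDEXED_PHRASES)
for word, kept in result.items():
    print(word, "->", kept)
```

With this made-up phrase list, unlikely combinations such as "taste" with "proscription" find no supporting phrase and drop out, while "treaty", "ban", "test" and "nuclear" survive because they co-occur in indexed phrases.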

  7. (Screenshot) Query in Spanish; hierarchy of terms; ranking of documents; languages: English, Spanish, Catalan

  8. (Diagram) User actions: QUERY, EXPLORE DOCUMENT, EXPLORE PHRASE, RECONSULT WITH PHRASE

  9. Evaluation
     • 1523 sessions with interaction
     • An average of 5.11 actions per session
     • “Explore phrase” is used in 65.13% of sessions

     First action after QUERY:
                    All queries   1-word queries   >1-word queries
     DOC            40.70%        45.49%           37.30%
     PHRASE         51.14%        45.65%           55.05%
     RECONSULT      8.141%        8.846%           7.640%

     Last action before finishing the session with explore DOC:
                    All queries   1-word queries   >1-word queries
     QUERY          48.74%        53.38%           45.15%
     PHRASE         42.95%        40.85%           44.57%
     RECONSULT      8.306%        5.764%           10.27%

  10. Conclusions
      • Development of a search engine based on terminology extraction
      • Using terminological phrases as a middle ground between free searching and thesaurus-guided searching
      • Without the need for thesaurus construction
      • Bridging the distance between the terms used in the query and the terminology used in the collection (even in different languages)
      • Users appreciate phrasal information for document selection
      • Phrases give higher expectations of relevance than Google’s ranking
      • WTB phrasal information can substantially complement the document ranking provided by search engines
