Thesauri and Ontologies for Digital Libraries

Pavel Smrž, Anna Sinopalnikova, Martin Povolny

{smrz, anna, xpovolny}

Faculty of Informatics, Masaryk University in Brno, Czech Republic

Outline


  • Motivation

  • Role of Thesauri and Ontologies in Present DLs, Relations Covered

  • Word-Association Thesaurus, CLIR

  • XML Document Management System

  • XML Family Standards, XSLT Processor Extension

  • Conclusions and Future Directions

Motivation


  • the size and complexity of digital libraries (DLs) grow rapidly

  • future DLs will need algorithms to process and understand contained data

  • intelligent procedures must be implemented to transform natural-language knowledge into a more appropriate representation

  • description of concepts and relations between them becomes crucial


  • common understanding of application domains is provided by ontologies

  • creation of broad-coverage ontologies from scratch is extremely labour-intensive

  • efforts to reuse (clean up, refine, merge) existing resources: wordnet-like semantic networks, lexical databases, thesauri, ...

Thesauri and Ontologies in Present Digital Libraries

  • structuring and classification of digital data (bibliographic classification supplemented/replaced by automatic conceptual document indexing)

  • contradictory results in the area of information retrieval (IR)

    Standard IR measures (precision/recall) vs. navigation through documents, user-interface aspects
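
For reference, the standard measures named above are precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. The following is a minimal Python sketch of that arithmetic; the function name and the document identifiers are hypothetical and do not come from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Neither number says anything about how easily a user can navigate the result set, which is the contrast the slide draws.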

Relations Covered

  • Synonymy – query expansion (validated by the user)

    • true synonyms

    • style, register, regional variants

    • orthographic variants (proper names)

  • Hierarchical relations (hyponymy, meronymy) – query expansion, named entity recognition, ...

  • “see-also”, “related-to” relations – definition of topics
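
Query expansion over the relations listed above can be sketched as follows. This is an illustrative Python sketch only: the toy thesaurus, the function names, and the `accept` callback (standing in for validation by the user) are assumptions, not part of the system described in the presentation.

```python
# Hypothetical toy thesaurus; the entries are illustrative only.
SYNONYMS = {
    "car": {"automobile", "motorcar"},
    "colour": {"color"},  # regional/orthographic variant
}
HYPONYMS = {
    "vehicle": {"car", "bicycle"},
    "car": {"taxi"},
}


def suggest_expansions(terms):
    """Collect candidate terms: synonyms and direct hyponyms of each query term."""
    candidates = set()
    for term in terms:
        candidates |= SYNONYMS.get(term, set())
        candidates |= HYPONYMS.get(term, set())
    # Never suggest a term that is already in the query.
    return candidates - set(terms)


def expand_query(terms, accept=lambda candidate: True):
    """Return the original terms plus every candidate the user accepts.

    `accept` stands in for the user-validation step; by default
    every suggestion is accepted.
    """
    accepted = {c for c in suggest_expansions(terms) if accept(c)}
    return set(terms) | accepted


# The user rejects "motorcar" but keeps the other suggestions.
expanded = expand_query(["car"], accept=lambda c: c != "motorcar")
```

Only one level of the hierarchy is followed here; a real system would have to decide how far down the hyponymy tree an expansion may go.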

Word-Association Thesauri

  • Large-scale psycholinguistic experiments (free association test)

  • Large numbers of stimulus-reaction pairs (170 000), many subjects (1 500) of different age, sex, profession, ...

  • Availability for English, German, Russian, Czech

  • Concept search rather than context search
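
One way to turn stimulus-reaction pairs into a usable thesaurus is to count how often each reaction follows a stimulus and normalise the counts. The Python sketch below assumes that simple relative-frequency scheme; the sample pairs are invented for illustration and are not taken from the experimental data mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical stimulus-reaction pairs, one per test answer.
PAIRS = [
    ("bread", "butter"),
    ("bread", "butter"),
    ("bread", "food"),
    ("doctor", "nurse"),
]


def association_strengths(pairs):
    """Map each stimulus to {reaction: relative frequency among its answers}."""
    counts = defaultdict(Counter)
    for stimulus, reaction in pairs:
        counts[stimulus][reaction] += 1
    strengths = {}
    for stimulus, reactions in counts.items():
        total = sum(reactions.values())
        strengths[stimulus] = {r: n / total for r, n in reactions.items()}
    return strengths


strengths = association_strengths(PAIRS)
```

Reactions with a high relative frequency are candidates for concept-level links between terms, independent of the contexts in which the words occur in documents.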

Cross-Lingual Information Retrieval and Extraction

  • CLIR = finding documents in a language different from the one used in the query

  • Multilingual resources (wordnets) for many languages (EuroWordNet, BalkaNet) linked by the Inter-Lingual Index (ILI)

  • CLIE = translation of answers back to the language of the user query

  • Visualisation of terms referring to hierarchically organized concepts
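
The role of the ILI in CLIR can be illustrated with a small Python sketch: words in each language point to language-independent index records, and a query term is translated by following those links. The miniature wordnets and the identifier format (`ili:dog`) are hypothetical and do not reflect the actual EuroWordNet or BalkaNet data formats.

```python
# Hypothetical miniature wordnets: word -> set of ILI record identifiers.
CZECH = {"pes": {"ili:dog"}, "kočka": {"ili:cat"}}
ENGLISH = {"dog": {"ili:dog"}, "hound": {"ili:dog"}, "cat": {"ili:cat"}}


def translate_terms(terms, source, target):
    """Translate query terms from `source` to `target` through shared ILI ids."""
    # Invert the target wordnet: ILI id -> set of target-language words.
    ili_to_target = {}
    for word, ili_ids in target.items():
        for ili_id in ili_ids:
            ili_to_target.setdefault(ili_id, set()).add(word)

    translations = {}
    for term in terms:
        words = set()
        for ili_id in source.get(term, ()):
            words |= ili_to_target.get(ili_id, set())
        translations[term] = words  # empty set if the term is unknown
    return translations


# A Czech query term mapped to its English counterparts.
english_terms = translate_terms(["pes"], CZECH, ENGLISH)
```

The same lookup run in the opposite direction covers the CLIE step, in which answers are mapped back to the language of the query.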

XML Document Management System Integrating Ontologies

  • Several systems allow storing data and metadata together

  • BUT no support for efficient integration of thesauri and ontologies

  • DEB – open-source client/server system for efficient storage and retrieval of arbitrary XML collections

  • XML-family standards employed in the data format, customization of UI, query language, visualisation, ...

XML-Family Standards in DEB

  • DEB clients use XSLT for transforming XML data into HTML (presented with the help of an HTML widget)

  • User-defined data views by means of XSLT

  • Client-side caching of parsed DOM objects

  • XPath for accessing information

  • OWL for storing ontologies transformed automatically from
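
Two of the points above, XPath access and caching of parsed documents, can be sketched in Python. The sketch makes several assumptions: the sample entry and its element names are hypothetical, `xml.etree.ElementTree` supports only a subset of XPath, and Python's standard library has no XSLT processor, so the transformation step is not shown. It is not DEB client code.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical dictionary entry; element names are illustrative only.
ENTRY_XML = """<entry id="e1">
  <headword>dog</headword>
  <synonym>hound</synonym>
  <hypernym>canine</hypernym>
</entry>"""


@lru_cache(maxsize=128)
def parse_cached(xml_text):
    """Parse an XML string once and reuse the parsed tree on later calls."""
    return ET.fromstring(xml_text)


def synonyms_of(xml_text):
    """Use an XPath-style expression to pull the synonyms out of an entry."""
    root = parse_cached(xml_text)
    return [element.text for element in root.findall("./synonym")]


synonyms = synonyms_of(ENTRY_XML)
```

Because the cache hands back the same tree object each time, callers should treat it as read-only; modifying it would change what every later lookup sees.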

Extension of the Standard XSLT Processor

  • nested queries for efficient processing

  • XSLT sheets can request data from DEB server based on information processed

  • A special URI scheme (deb://) creates a virtual space of XML documents that are the results of queries

  • Server data are accessed from the XSLT processor in the same way as any other external resource
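
The idea of a virtual document space behind a URI scheme can be sketched as a resolver that turns a deb:// address into an XML document built from query results. The Python sketch below is an assumption-laden illustration: the URI layout (collection name as host, search criteria as query string), the in-memory `COLLECTIONS` store, and the `resolve` function are all hypothetical, and the real DEB server protocol is not described in the presentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory stand-in for a server-side collection.
COLLECTIONS = {
    "dict": [
        {"headword": "dog", "pos": "n"},
        {"headword": "run", "pos": "v"},
    ]
}


def resolve(uri):
    """Return a <results> document for a deb:// URI (assumed layout).

    Example of the assumed layout: deb://dict/?pos=n
    """
    parts = urlsplit(uri)
    if parts.scheme != "deb":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Keep the first value of each query parameter as a search criterion.
    criteria = {key: values[0] for key, values in parse_qs(parts.query).items()}

    results = ET.Element("results")
    for record in COLLECTIONS.get(parts.netloc, []):
        if all(record.get(key) == value for key, value in criteria.items()):
            entry = ET.SubElement(results, "entry")
            for key, value in record.items():
                ET.SubElement(entry, key).text = value
    return results


nouns = resolve("deb://dict/?pos=n")
```

A stylesheet processor that delegates unknown URIs to such a resolver can then load query results in the same way as it loads any other external document, which is the design point the slide makes.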

Conclusions and Future Directions

  • Our research on the role of thesauri and ontologies in DL influenced the development of the Czech part of the multilingual lexical resource developed under the current BalkaNet project and the last extensions to the RussNet project.

  • DEB is currently used as the core DL engine at NLP Lab, FI MU, Brno, Czech Republic. It manipulates standard document collections as well as dictionaries, lexical semantic databases, e-learning materials, ...

Future Directions

  • Open research problems related to the conceptual design of lexical resources (integration of generative concepts into the structure of knowledge bases)

  • DEB development – specialized modules for new W3C standards, three-level architecture (thin clients), simplification of UI customization by means of automatically generated XSLT, reimplementation in Ruby, ...