Controlled Vocabularies in TELPlus

Antoine ISAAC, Vrije Universiteit Amsterdam. EDLProject Workshop, 22-23 November 2007.


Presentation Transcript


  1. Controlled Vocabularies in TELPlus Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop 22-23 November 2007

  2. Agenda • TELPlus Context • Improving subject access • 3 sub-tasks • Services for TEL

  3. TELPlus Context • Started October 2007 • Running 27 months • Content WPs • OCRing previously digitised material • Improving the usability of TEL through OAI-PMH compliance • Improving Access • Integrating services with TEL portal • User personalisation services • Extending TEL to Bulgaria & Romania

  4. WP3 – Improving Access • Task 1: Indexing for usability • Review/test state-of-the-art semantic search engines • On content of documents • Task 2: Improving subject access • Task 3: FRBR aggregation, search and browsing • Create/exploit FRBR metadata repositories • Task 4: Focus on users • Focus groups on prototypes

  5. WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Search through collections • Using metadata • In a controlled setting • Paving the way for enhanced usages • Advanced treatments mentioned in TELPlus need conceptual structures and links between these structures • E.g. clustering

  6. WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Reference: MACS project • Manually-built semantic equivalences between Rameau, SWD & LCSH headings

  7. MACS: Querying Collections

  8. MACS: Query Reformulation Options

  9. WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Reference: MACS project • Manual equivalences between Rameau, SWD, LCSH headings • Here: an experiment on deploying automatic alignment techniques • Determining possible strategies • Assessing feasibility and usefulness • MACS context

  10. WP3.2 Sub-tasks • 3.2.1. Converting the subjects to standard representation language • Semantic web format (SKOS) • 3.2.2. Aligning the vocabularies • Semantic correspondences between subjects • 3.2.3. Deploying the alignment knowledge obtained into TEL framework • E.g. using links to reformulate queries from one subject list to the other

  11. Converting subjects to standard representation language Goal: solving syntactic heterogeneity between vocabularies • Enabling the use of standard tools • E.g. for query (re)formulation • Paving the way for dealing with semantic heterogeneity • Definitions of concepts expressed according to a common model

  12. Converting subjects to standard representation language Approach: Semantic Web and SKOS • Semantic Web • Knowledge objects as web resources (URIs) • Description by linking resources (RDF) • Description using shared formal vocabularies (ontologies) • SKOS • A standard Semantic Web model (ontology) • For knowledge organization systems (thesauri, subject heading lists…)

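  13. SKOS: Example [Figure: RDF graph] • http://www.iconclass.nl/ rdf:type skos:ConceptScheme • http://www.iconclass.nl/s_11F rdf:type skos:Concept • http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/ • http://www.iconclass.nl/s_11F skos:prefLabel “the Virgin Mary”@en • http://www.iconclass.nl/s_11F skos:prefLabel “la Vierge Marie”@fr • http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11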

  14. Converting subjects to standard representation language - Process • Getting processable versions from owners • E.g. XML • Analyzing the models • Converting to SKOS
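
The conversion process above could be sketched roughly as follows. This is only an illustration: the XML layout (`<heading>`, `<label>`, `<broader>`), the `BASE` namespace and the function name `xml_to_skos_ntriples` are hypothetical, and a real export from a vocabulary owner would need its own model analysis first. The SKOS and RDF namespace URIs are the standard ones; the output is a list of N-Triples lines.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/shl/"  # hypothetical namespace for converted headings

# Hypothetical export format; an owner-supplied XML file will look different.
SAMPLE_XML = """
<vocabulary>
  <heading id="h1">
    <label lang="en">Literature</label>
  </heading>
  <heading id="h2">
    <label lang="en">Dutch literature</label>
    <label lang="nl">Nederlandse letterkunde</label>
    <broader ref="h1"/>
  </heading>
</vocabulary>
"""


def xml_to_skos_ntriples(xml_text):
    """Convert the hypothetical XML export into a list of N-Triples lines."""
    root = ET.fromstring(xml_text)
    scheme = BASE + "scheme"
    lines = [f"<{scheme}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for heading in root.findall("heading"):
        uri = BASE + heading.get("id")
        # Each heading becomes a skos:Concept inside the scheme.
        lines.append(f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"<{uri}> <{SKOS}inScheme> <{scheme}> .")
        for label in heading.findall("label"):
            # Escape backslashes and quotes for the N-Triples literal.
            text = label.text.replace("\\", "\\\\").replace('"', '\\"')
            lang = label.get("lang")
            lines.append(f'<{uri}> <{SKOS}prefLabel> "{text}"@{lang} .')
        for broader in heading.findall("broader"):
            target = BASE + broader.get("ref")
            lines.append(f"<{uri}> <{SKOS}broader> <{target}> .")
    return lines


if __name__ == "__main__":
    print("\n".join(xml_to_skos_ntriples(SAMPLE_XML)))
```

Emitting plain N-Triples keeps the sketch free of third-party dependencies; an actual conversion would more likely use an RDF library and a richer SKOS model (alternative labels, notes, related terms).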

  15. WP3.2 Sub-tasks • 3.2.1. Converting the subjects to standard representation language • Semantic web format (SKOS) • 3.2.2. Aligning the vocabularies • Semantic correspondences between subjects • 3.2.3. Deploying the alignment knowledge obtained into TEL framework • E.g. using links to reformulate queries from one subject list to the other

  16. Vocabulary Alignment • Specifying required alignment format (links) • Type of mapping links: equivalence, broader • Cardinality: one-to-one, one-to-many • Taking application context (TEL) into account
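
One possible way to make the alignment format above concrete is a small record type for mapping links. The class name `MappingLink`, the field names and the heading identifiers are assumptions made for illustration; the slide only names the link types (equivalence, broader) and the cardinalities (one-to-one, one-to-many).

```python
from collections import defaultdict
from dataclasses import dataclass

# Link types named on the slide; the string values are illustrative.
LINK_TYPES = {"equivalence", "broader"}


@dataclass(frozen=True)
class MappingLink:
    source: str     # heading in the source subject heading list
    target: str     # heading in the target subject heading list
    link_type: str  # "equivalence" or "broader"

    def __post_init__(self):
        if self.link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {self.link_type}")


def cardinality_by_source(links):
    """Classify each source heading by how many distinct targets it maps to.

    This looks at the source side only; it does not check whether a
    target is shared by several sources.
    """
    targets = defaultdict(set)
    for link in links:
        targets[link.source].add(link.target)
    return {
        source: "one-to-one" if len(found) == 1 else "one-to-many"
        for source, found in targets.items()
    }


# Hypothetical identifiers, not real headings.
EXAMPLE_LINKS = [
    MappingLink("shl1:a", "shl2:x", "equivalence"),
    MappingLink("shl1:b", "shl2:y", "broader"),
    MappingLink("shl1:b", "shl2:z", "broader"),
]
```

Fixing the link types up front makes it easier to decide later, per application scenario, which kinds of links a service is allowed to follow.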

  17. Vocabulary Alignment • Specifying required alignment format (links) • Selecting (& running) alignment techniques/tools • Inspired by semantic web approaches

  18. Vocabulary Alignment Techniques • Similar to ontology alignment problem • Existing approaches for (semi-)automatic ontology alignment • Using techniques from linguistics, computer science, statistics • Problem: performance does not allow 100% automatic alignment • Problem: multilingual case • Some techniques cannot be used

  19. Potential Technique: Using Background Knowledge • Using a shared conceptual reference to find links [Figure: “Publication” and “Calendar”, from SHL 1 and SHL 2, linked through shared background knowledge]

  20. Potential Technique: Statistical Alignment • Object information (book indexing) [Figure: “Dutch Literature” and “Dutch”, from SHL 1 and SHL 2, linked through dually-indexed books]
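
A minimal sketch of the statistical idea, assuming each subject heading list is available as a mapping from heading to the set of books indexed with it. The Jaccard overlap measure, the 0.5 threshold and the toy data are assumptions chosen for illustration; the slide does not say which statistic would be used.

```python
from itertools import product


def jaccard(a, b):
    """Share of books indexed with either heading that carry both headings."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def statistical_alignment(index1, index2, threshold=0.5):
    """Propose links between headings whose book sets overlap strongly.

    index1 and index2 map a heading to the set of ids of the books indexed
    with it. Only books present in both collections can create overlap.
    Returns (heading1, heading2, score) tuples, best score first.
    """
    candidates = []
    for (h1, books1), (h2, books2) in product(index1.items(), index2.items()):
        score = jaccard(books1, books2)
        if score >= threshold:
            candidates.append((h1, h2, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Toy data with invented book ids.
SHL1_INDEX = {"Dutch Literature": {1, 2, 3, 4}, "Calendar": {9}}
SHL2_INDEX = {"Dutch": {2, 3, 4, 5}, "Publication": {9, 10, 11}}

if __name__ == "__main__":
    for h1, h2, score in statistical_alignment(SHL1_INDEX, SHL2_INDEX):
        print(h1, "->", h2, round(score, 2))
```

Comparing every pair of headings is quadratic in the vocabulary sizes, so a real run over full subject heading lists would need an inverted index from books to headings rather than this brute-force loop.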

  21. Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Considering application

  22. Evaluation of Alignments • MACS has produced mappings! • Possible gold standard • But: has MACS produced all mappings? • Which proportion of the SHLs is covered? • Taking into account all indexing strings? • Are MACS mappings the only interesting ones? • “Serendipity” mappings • Concepts that are not equivalent but could bring useful results when added to queries • Compensating for indexing variability
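
The questions above suggest that a partial gold standard has to be used with care. The sketch below shows one way this might be handled, under the assumption that a candidate link can only be judged when its source heading is covered by the gold standard; links outside that coverage are set aside for manual inspection instead of being counted as wrong. The function name and the toy pairs are hypothetical.

```python
def evaluate(candidate, gold):
    """Compare candidate links with a possibly incomplete gold standard.

    candidate and gold are sets of (source, target) pairs.
    Returns (precision, recall, unjudged), where unjudged holds candidate
    links whose source heading the gold standard does not cover at all.
    """
    covered_sources = {source for source, _ in gold}
    judged = {(s, t) for s, t in candidate if s in covered_sources}
    correct = judged & gold
    precision = len(correct) / len(judged) if judged else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall, candidate - judged


# Invented identifiers, not actual MACS data.
GOLD = {("a", "x"), ("b", "y")}
CANDIDATE = {("a", "x"), ("b", "z"), ("c", "w")}
```

The unjudged set is where "serendipity" mappings would show up, so it is returned rather than discarded.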

  23. Evaluation of Alignments • Several scenarios for using and evaluating alignments • Concept-based search • Re-indexing • Integration of one SHL into the other • SHL Merging • Free-text search • Navigation

  24. Evaluation of Alignments • Several scenarios for using and evaluating alignments • Concept-based search • Retrieving books indexed by SHL1 using SHL2 concepts • Re-indexing • Integration of one SHL into the other • SHL Merging • Free-text search • Matching user search terms to both SHL1 and SHL2 concepts • Navigation • Browsing several collections using one SHL structure
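
The concept-based search scenario can be illustrated with a small query reformulation sketch: concepts from one subject heading list are replaced by their aligned concepts in the other, so that a collection indexed with the second list can be searched. The data layout, the function name `reformulate` and the example identifiers are assumptions, not part of TEL or MACS.

```python
def reformulate(query_concepts, alignment, allowed_types=("equivalence",)):
    """Translate query concepts from one subject heading list to another.

    alignment maps a source concept to a list of (target, link_type) pairs.
    Only links whose type is in allowed_types are followed.
    Returns (reformulated, unmapped): the target concepts to search with,
    and the source concepts for which no usable link exists.
    """
    reformulated, unmapped = [], []
    for concept in query_concepts:
        targets = [
            target
            for target, link_type in alignment.get(concept, [])
            if link_type in allowed_types
        ]
        if targets:
            reformulated.extend(targets)
        else:
            unmapped.append(concept)
    return reformulated, unmapped


# Invented identifiers for illustration only.
ALIGNMENT = {
    "shl2:dutch": [("shl1:dutch-literature", "equivalence")],
    "shl2:calendar": [("shl1:publication", "broader")],
}
```

Restricting `allowed_types` to equivalence links corresponds to a cautious, fully automatic reformulation; also allowing broader links would return more candidates, which fits an assisted setting where the user picks among them.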

  25. Evaluation of Alignments • Several settings for a single scenario • Fully automatic reformulation vs assisted reformulation (candidates) • Different evaluation measures • Good mappings vs acceptable ones • Number of candidates for reformulation • Semantic closeness to original query

  26. Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Assessment of the approach • Efforts required, quality, extendibility

  27. WP3.2 Sub-tasks • 3.2.1. Converting the subjects to standard representation language • Semantic web format (SKOS) • 3.2.2. Aligning the vocabularies • Semantic correspondences between subjects • 3.2.3. Deploying the alignment knowledge obtained into TEL framework • E.g. using links to reformulate queries from one subject list to the other

  28. Deploying the alignment knowledge obtained into TEL framework • Observing integration of MACS data into TEL • Conceptual input for alignment requirements • Integration of the obtained alignment in TEL • Assessment of the alignment integration • Technical aspects, usage aspects

  29. Reminder • Alignment is a difficult problem • Application-specific alignment is largely unexplored in Semantic Web research • More a feasibility study than a complete solution to the problem • Practical goal: investigate how automatic techniques could help MACS-like initiatives • Manual mapping is labour-intensive

  30. Agenda • TELPlus Context • Improving subject access • 3 sub-tasks • Services for TEL

  31. WP4 – Integrating services with the European Library portal Theo van Veen (KB) Tasks: • Identifying services that are going to give the user the greatest return • Creating new services • Integrating services within TEL …

  32. WP4 – Some Services Mentioned Preliminary inventory: no official commitment! Services based on controlled vocabularies: • Thesaurus and name authority service • Providing terms linked to query terms • Semantic enrichment service • Users can annotate search results with terms • Distance between terms and related terms

  33. WP4 – Some Services Mentioned Preliminary inventory: no official commitment! Services based on controlled vocabularies: • Thesaurus and name authority service • Semantic enrichment service • Distance between terms and related terms Adding more value from controlled vocabularies and alignments between them

  34. Thanks!
