TUE / Information Retrieval Reference Structures part 2 – NSR architecture Trezorix & RNA-project


  1. TUE / Information Retrieval: Reference Structures, part 2 – NSR architecture. Trezorix & RNA-project

  2. TUE / Information Retrieval / Reference Structures (2). Contents. Chapter 1 – flat file system: advantages / disadvantages. Chapter 2 – search engine: features / performance comparison with database. Chapter 3 – identifiers: scientific names / tcn code / URIs as identifiers / digital object identifiers. Chapter 4 – external links: live queries / server side – client side / findability. Chapter 5 – NSR between source databases: what are source databases? / related issues. Chapter 6 – role of NSR towards companion databases or websites: naming system / presentation of taxonomic structure / how far can we go? Chapter 7 – datamodels: reference structures / taxonomic objects / extra data.

  4. TUE / Information Retrieval / Reference Structures (2). Flat file system / advantages and disadvantages. Advantages: all relevant information is stored together with the concept, so the concept data do not have to be assembled by a query; speed. Disadvantages: redundancy; no guaranteed integrity.
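The trade-off on this slide can be illustrated with a minimal sketch. The record layout, the field names and the sample taxa below are assumptions made for illustration; they are not taken from the NSR itself.

```python
# Flat layout (hypothetical): each concept record carries everything
# needed to display it, including data that is repeated across records.
flat_records = {
    "parus-major": {
        "name": "Parus major",
        "family": "Paridae",
        "family_common": "tits",
    },
    "cyanistes-caeruleus": {
        "name": "Cyanistes caeruleus",
        "family": "Paridae",
        "family_common": "tits",
    },
}


def describe(record_id):
    """Advantage: a single lookup returns all display data, no joins needed."""
    r = flat_records[record_id]
    return f'{r["name"]} ({r["family"]}, {r["family_common"]})'


def inconsistent_families(records):
    """Disadvantage: redundant values can drift apart; report families
    whose common name differs between records."""
    seen = {}
    for r in records.values():
        seen.setdefault(r["family"], set()).add(r["family_common"])
    return {family for family, labels in seen.items() if len(labels) > 1}


# Updating the duplicated value in only one record breaks integrity,
# and nothing in the flat layout prevents it.
flat_records["parus-major"]["family_common"] = "titmice"
print(describe("cyanistes-caeruleus"))
print(inconsistent_families(flat_records))
```

A normalised database would store the family's common name once and join it in at query time, which costs speed but rules out this kind of drift.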

  6. TUE / Information Retrieval / Reference Structures (2). Search engine / features: phrase searching; boolean operators; proximity searching; directed proximity searching; phonic searching; stemming; numeric range searching; fuzzy searching; concept searching; automatic term weighting; positional scoring; variable term weighting; combining nearly all search types.
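As an illustration of one feature from this list, fuzzy searching, here is a small sketch using Python's standard `difflib` module. It is a stand-in for the idea only and does not reflect how dtSearch or any other search engine implements fuzzy matching; the name list, the `fuzzy_lookup` helper and the cutoff value are assumptions.

```python
import difflib

# Hypothetical index of known scientific names.
NAMES = ["Parus major", "Cyanistes caeruleus", "Periparus ater"]


def fuzzy_lookup(query, names=NAMES, cutoff=0.8):
    """Return known names whose similarity to `query` is at least `cutoff`.

    Matching is case-insensitive; the original spelling is returned.
    """
    by_lower = {n.lower(): n for n in names}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lower), n=3, cutoff=cutoff
    )
    return [by_lower[h] for h in hits]


# A misspelled query still finds the intended name.
print(fuzzy_lookup("Parus mayor"))
```

Lowering `cutoff` makes the search more tolerant of spelling mistakes at the price of more false matches.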

  7. TUE / Information Retrieval / Reference Structures (2). Search engine / performance comparison with database. Search comparison of MS-SQL vs. dtSearch for 14 queries.

  8. TUE / Information Retrieval / Reference Structures (2). Search engine / performance comparison with database. Search comparison of MS-SQL vs. dtSearch for 14 queries: the columns show the respective times for the query (in seconds), the total number of documents returned by both programs, and the documents returned by one program that were not returned by the other.

  9. TUE / Information Retrieval / Reference Structures (2). Search engine / performance comparison with database. Search comparison of MS-SQL vs. dtSearch for 14 queries: dtSearch is faster in all except two, indicated by asterisks.

  10. TUE / Information Retrieval / Reference Structures (2). Search engine / performance comparison with database. Search comparison of MS-SQL vs. dtSearch for 14 queries: the documents returned by dtSearch are always equal to, or a superset of, the documents returned by MS-SQL. MS-SQL missed some documents because of malformed punctuation, e.g., no space after a period, so that a term of interest is “conjoined” to the first word of the next sentence.

  11. TUE / Information Retrieval / Reference Structures (2). Search engine / performance comparison with database. Search comparison of MS-SQL vs. dtSearch for 14 queries. Source: Journal of the American Medical Informatics Association.
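The punctuation problem noted in the comparison (a missing space after a period joins two words) can be reproduced with a toy tokenizer. This is a sketch of the general effect only: neither function below is the actual tokenization used by MS-SQL or dtSearch, and the sample sentence is invented.

```python
import re

# Invented sample text with a missing space after the first period.
text = "The patient had a fever.Pneumonia was suspected."


def whitespace_tokens(s):
    """Split on whitespace only and trim punctuation at token edges.
    'fever.Pneumonia' survives as a single token."""
    return [t.strip(".,;:!?").lower() for t in s.split()]


def word_tokens(s):
    """Treat every non-alphanumeric character as a separator,
    so the conjoined words are split apart."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", s)]


# The term of interest is missed by the first tokenizer, found by the second.
print("pneumonia" in whitespace_tokens(text))
print("pneumonia" in word_tokens(text))
```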

  13. TUE / Information Retrieval / Reference Structures (2). Identifiers / scientific names. Advantages: straightforward; human readable. Disadvantages: spelling mistakes; difficult to define as unique IDs (special characters, etc.); homonyms occur, so a name does not identify uniquely; there is a ‘system behind’, which is dangerous; in fact, there are even more ‘systems behind’.

  14. TUE / Information Retrieval / Reference Structures (2). Identifiers / tcn code (tcn code: taxon code nederland). Advantages: good identifier, can’t get lost; when splitting a concept, two new identifiers are created. Disadvantages: when splitting a concept, strange things happen; not generally accepted (mainly fresh water organisms); there is a ‘system behind’, which is dangerous.

  15. TUE / Information Retrieval / Reference Structures (2). Identifiers / URIs as identifiers. Advantages: flexible; human readable (to a certain extent). Disadvantages: bound to a domain name, no guarantee of persistence; local rules within domains.

  16. TUE / Information Retrieval / Reference Structures (2). Identifiers / digital object identifiers. Advantages: accepted for ISO standardisation; can be used to identify any media or content; already over 20 million DOIs assigned. Disadvantages: a DOI has to be registered with a Registration Agency (a small fee per DOI).

  18. TUE / Information Retrieval / Reference Structures (2). External links. Live queries to external websites: data of the Ministry of Agriculture (LNV); nature observations site (waarnemingen.nl).

  19. TUE / Information Retrieval / Reference Structures (2). External links. Server side: caching possibilities. Client side: scalability, Ajax.

  20. TUE / Information Retrieval / Reference Structures (2). External links. Findability: dynamically obtained, so cannot be queried. Solution: spidering, indexing with free text search engine. Solution: integration of databases.

  22. TUE / Information Retrieval / Reference Structures (2). NSR between source databases. Source databases: external databases with authorized data which supply essential elements to the system. Examples: taxonomic thesaurus; image library. Related issues: bringing data from different sources together for one presentation; external source keeps its own independent existence; live imports/updates, deletion of old records.

  24. TUE / Information Retrieval / Reference Structures (2). NSR towards companion databases or websites. Naming system: image library; whale beachings and observations site (walvisstrandingen.nl). Presentation of taxonomic structure: nature observations site (waarnemingen.nl). How far can we go? Generalisation? Specific applications? Google and other web search engines.

  26. TUE / Information Retrieval / Reference Structures (2). Datamodels / reference structures. SKOS (Simple Knowledge Organisation System): W3C standard; RDF vocabulary; compliant with ISO 2788 and ISO 5964 (thesaurus standards); for defining ‘simple’ structures, like thesauri, glossaries, taxonomies, etc.; undemanding in terms of expertise and effort; complementary to OWL. OWL (Web Ontology Language): W3C standard; RDF vocabulary; for defining complex conceptual structures; demanding in terms of expertise and effort; complementary to SKOS.
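To give an impression of how ‘simple’ a SKOS structure is, the following sketch writes a tiny taxonomy as SKOS in Turtle syntax using plain string formatting. The `skos:` terms used (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) are part of the W3C vocabulary; everything else is an assumption made for illustration: the `http://example.org/nsr/` namespace, the local identifiers, the sample taxa, and the choice to express the taxonomic parent with `skos:broader`.

```python
BASE = "http://example.org/nsr/"  # hypothetical namespace, not a real NSR URI

# (local id, scientific name, English name, parent local id or None)
TAXA = [
    ("paridae", "Paridae", "tits", None),
    ("parus-major", "Parus major", "great tit", "paridae"),
    ("cyanistes-caeruleus", "Cyanistes caeruleus", "blue tit", "paridae"),
]


def to_turtle(taxa):
    """Serialise the taxa as SKOS concepts in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix nsr: <{BASE}> .",
        "",
    ]
    for local_id, scientific, english, parent in taxa:
        props = [
            "a skos:Concept",
            f'skos:prefLabel "{scientific}"',
            f'skos:altLabel "{english}"@en',
        ]
        if parent is not None:
            # Assumption: the taxonomic parent is modelled as the broader concept.
            props.append(f"skos:broader nsr:{parent}")
        lines.append(f"nsr:{local_id} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


print(to_turtle(TAXA))
```

Data that do not fit this label-and-hierarchy pattern, such as typed relations between taxa, are the kind of thing the slide assigns to OWL.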

  27. TUE / Information Retrieval / Reference Structures (2). Datamodels / taxonomic objects. Darwin Core: recommended standard (Taxonomic Databases Working Group); small set of data element definitions (44); flat structure; for sharing and integration of primary biodiversity data. ABCD: recommended standard (Taxonomic Databases Working Group); access to biological collections data (hence ABCD); comprehensive set of data elements (700); hierarchical structure, ontology; exchange of primary biodiversity data; compatible with Darwin Core. NBN datamodel: National Biodiversity Network (UK); reliable exchange of biodiversity data from heterogeneous sources; hierarchical structure, ontology; mapping to Darwin Core, ABCD, etc.

  28. TUE / Information Retrieval / Reference Structures (2). Datamodels / extra data. Species counter: for each taxon the number of underlying species is displayed. Availability of photographs of a species: for each taxon, images of underlying taxa are displayed.
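The species counter described on this slide amounts to a recursive walk over the taxonomic tree. The sketch below assumes a hypothetical in-memory representation (a dictionary with a rank and a list of children per taxon) and invented sample data; how the NSR actually stores or computes the counter is not stated in the slides.

```python
# Hypothetical mini taxonomy: each taxon has a rank and a list of child taxa.
TAXONOMY = {
    "Paridae": {"rank": "family", "children": ["Parus", "Cyanistes"]},
    "Parus": {"rank": "genus", "children": ["Parus major"]},
    "Cyanistes": {"rank": "genus", "children": ["Cyanistes caeruleus"]},
    "Parus major": {"rank": "species", "children": []},
    "Cyanistes caeruleus": {"rank": "species", "children": []},
}


def species_count(taxon, taxonomy=TAXONOMY):
    """Count the species at or below `taxon` by walking the tree recursively."""
    node = taxonomy[taxon]
    own = 1 if node["rank"] == "species" else 0
    return own + sum(species_count(child, taxonomy) for child in node["children"])


for name in ("Paridae", "Parus"):
    print(name, species_count(name))
```

Because the count is derived from the tree, it can either be computed on request or stored as ‘extra data’ alongside each taxon and refreshed after imports.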

  29. TUE / Information Retrieval: Reference Structures. Trezorix & RNA-project. End of part 2. www.rnaproject.org, www.soortenregister.nl

  30. TUE / Information Retrieval: Reference Structures, assignment. Trezorix & RNA-project

  31. TUE / Information Retrieval / Reference Structures / assignment. Assignment: make a website for digital access to (part of) the NSR collection. Goal: to illustrate the use of different reference structures for browsing and searching of digital collections. Steps: describe an NSR datamodel; represent the structure part of the NSR in SKOS; represent as much of the ‘extra data’ as possible in SKOS; represent those data which don’t fit into SKOS in OWL; present the result in a website; use the open source RDF framework Sesame for storage of structures; use the open source search engine Lucene for findability of NSR elements. Applicable ‘extra data’: change record data; data for species counter; data about availability of photographs of a species. What we supply: a limited NSR data set (for instance ‘songbirds’); extra data; more technical background about the NSR.

  32. TUE / Information Retrieval: Reference Structures. Trezorix & RNA-project. End. www.rnaproject.org, www.soortenregister.nl
