The PLAZI Markup System

Universität Karlsruhe (TH), Research University, founded 1825. The PLAZI Markup System, by Donat Agosti, Terry Catapano, Robert “Bob” Morris and Guido Sautter. Topics: document markup and external referencing; taxonomic data sources & web services; taxon LSIDs and GeoData.

Presentation Transcript


  1. The PLAZI Markup System
     Donat Agosti, Terry Catapano, Robert “Bob” Morris, Guido Sautter
     Universität Karlsruhe (TH), Research University, founded 1825

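  2. The PLAZI Markup System
     • GoldenGATE Document Editor: document markup, external referencing
     • External Data Sources: taxonomic data sources & web services
     • PLAZI Server: XML & PDF storage, treatment server
     • PLAZI Search Portal: search portal, TAPIR provider, RSS feed
     • Data flows:
        • Taxon LSIDs, GeoData and new taxon names, exchanged between the editor and the external data sources
        • Marked-up documents, from the editor to the PLAZI Server
        • Queries from the search portal to the server; treatments, detail data and PDF document handles back to the portal
        • Links and materials citations, from the search portal to the external data sources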

  3. The PLAZI Server
     [Figure: SRS architecture. A web service and document management layer on top of pluggable indexers FT (full text), MD (document meta data), MC (materials citations) and TN (taxonomic names); XML documents are stored in the file system, index data in PostgreSQL.]
     • GoldenGATE Search & Retrieval Server (SRS)
        • Extracts individual treatments from XML documents
        • Stores and indexes treatments
        • Based on independent, pluggable indexers:
           • Taxonomic names
           • Materials citations
           • Document meta data
           • Full text
        • Serves treatments or indexed details
     • DSpace
        • Stores PDF and XML documents
        • Issues Handles for documents
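The slide does not show how the SRS indexers are implemented. The following is a minimal Python sketch of the pluggable-indexer idea only, not the SRS code: a server splits an XML document into treatments and hands each treatment to every registered indexer. The element names (`treatment`, `taxonomicName`), the class names and the in-memory dictionaries (standing in for the file system and PostgreSQL) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from collections import defaultdict

# Hypothetical element names; the actual PLAZI/GoldenGATE markup may differ.
TREATMENT_TAG = "treatment"
TAXON_TAG = "taxonomicName"


class Indexer(ABC):
    """A pluggable indexer: sees every stored treatment, answers lookups by key."""

    name = "abstract"

    @abstractmethod
    def index(self, treatment_id: str, treatment: ET.Element) -> None: ...

    @abstractmethod
    def lookup(self, key: str) -> set[str]: ...


class TaxonomicNameIndexer(Indexer):
    """TN: maps each marked-up taxon name to the treatments containing it."""

    name = "TN"

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    def index(self, treatment_id, treatment):
        for el in treatment.iter(TAXON_TAG):
            # Normalize whitespace so nested or line-broken names compare equal.
            taxon = " ".join("".join(el.itertext()).split())
            self._index[taxon].add(treatment_id)

    def lookup(self, key):
        return set(self._index.get(key, set()))


class FullTextIndexer(Indexer):
    """FT: a simple inverted index over lowercase words."""

    name = "FT"

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    def index(self, treatment_id, treatment):
        for word in " ".join(treatment.itertext()).lower().split():
            self._index[word.strip(".,;:()")].add(treatment_id)

    def lookup(self, key):
        return set(self._index.get(key.lower(), set()))


class RetrievalServer:
    """Splits documents into treatments and hands each one to every indexer."""

    def __init__(self, indexers: list[Indexer]) -> None:
        self.indexers = {ix.name: ix for ix in indexers}
        self.treatments: dict[str, ET.Element] = {}

    def store_document(self, doc_id: str, xml_text: str) -> list[str]:
        root = ET.fromstring(xml_text)
        ids = []
        for n, treatment in enumerate(root.iter(TREATMENT_TAG)):
            treatment_id = f"{doc_id}/{n}"
            self.treatments[treatment_id] = treatment
            for ix in self.indexers.values():
                ix.index(treatment_id, treatment)
            ids.append(treatment_id)
        return ids

    def search(self, indexer_name: str, key: str) -> list[str]:
        return sorted(self.indexers[indexer_name].lookup(key))


# Made-up sample document, for illustration only.
SAMPLE = """<document>
  <treatment><taxonomicName>Probolomyrmex tani</taxonomicName> First example treatment.</treatment>
  <treatment><taxonomicName>Genus species</taxonomicName> Second example treatment.</treatment>
</document>"""

server = RetrievalServer([TaxonomicNameIndexer(), FullTextIndexer()])
ids = server.store_document("doc1", SAMPLE)
print(server.search("TN", "Probolomyrmex tani"))
```

Adding a materials-citation or meta-data indexer in this sketch means writing one more `Indexer` subclass; the server itself does not change, which is the point of making indexers pluggable.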

  5. The PLAZI Search Portal
     • Series of Java servlets running in Apache Tomcat
     • Front end for the SRS web service
     • Linker plug-ins create hyperlinks to other web sites
     • HTML-based search portal for humans
        • Search treatments & index data
        • Links submitting new search queries
        • Links to external data sources (e.g. HNS, Google Maps)
        • Links to PDF documents & XML versions of treatments
     • XML document access in various XML schemas
     • TAPIR provider
        • Taxonomic names
        • Materials citations
     • RSS feed for new treatments
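The linker plug-ins are only named on the slide. A minimal Python sketch of the idea, under stated assumptions: each linker is a function that inspects one marked-up annotation and either returns a URL or declines with `None`, and the portal collects the links from all registered linkers. The `Annotation` model, the `latitude`/`longitude` attribute names and the portal URL `https://portal.example.org/search` are hypothetical; the Google Maps link assumes Google's public "Maps URLs" search format.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import urlencode


@dataclass
class Annotation:
    """A marked-up span: its element type, text and attributes (hypothetical model)."""

    kind: str
    text: str
    attributes: dict[str, str] = field(default_factory=dict)


# A linker plug-in returns a URL for an annotation it understands, or None.
Linker = Callable[[Annotation], Optional[str]]


def google_maps_linker(ann: Annotation) -> Optional[str]:
    """Links a georeferenced materials citation to a Google Maps search."""
    if ann.kind != "materialsCitation":
        return None
    lat, lon = ann.attributes.get("latitude"), ann.attributes.get("longitude")
    if lat is None or lon is None:
        return None
    query = urlencode({"api": "1", "query": f"{lat},{lon}"})
    return f"https://www.google.com/maps/search/?{query}"


def portal_search_linker(ann: Annotation) -> Optional[str]:
    """Links a taxon name to a new portal search (hypothetical URL and parameter)."""
    if ann.kind != "taxonomicName":
        return None
    return "https://portal.example.org/search?" + urlencode({"taxonomicName": ann.text})


LINKERS: list[Linker] = [google_maps_linker, portal_search_linker]


def links_for(ann: Annotation) -> list[str]:
    """Asks every registered linker plug-in and collects the URLs they produce."""
    return [url for linker in LINKERS if (url := linker(ann)) is not None]


taxon = Annotation("taxonomicName", "Probolomyrmex tani")
# Made-up coordinates, for illustration only.
citation = Annotation(
    "materialsCitation", "example locality", {"latitude": "-1.5", "longitude": "30.25"}
)
for ann in (taxon, citation):
    print(links_for(ann))
```

Because a linker simply returns `None` for annotations it does not handle, new external sites can be supported by appending a function to `LINKERS` without touching the page-rendering code.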

  6. The PLAZI Search Portal (screenshot: Probolomyrmex tani)

  8. The GoldenGATE Editor
     • Java-based editor for semi-automated document markup
     • Extensible through a plug-in mechanism
     • Independent of any specific XML schema
     • Element-level XML editing (the XML syntax is generated)
     • Flexible display for a clear view at all levels of detail
     • Existing plug-ins provide a broad spectrum of functionality:
        • NLP-based markup generation
           • Regular expressions, gazetteers, GATE JAPE
           • Homegrown and third-party NLP components
        • Import of data from external sources (e.g. LSIDs)
        • Specialized document views for correcting NLP results
        • Markup transformation & filtering
        • IO components for different data formats & storage locations (e.g. for uploading XML documents to the PLAZI server)
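To illustrate the regular-expression and gazetteer style of markup generation listed above, here is a minimal Python sketch. It is not GoldenGATE's plug-in API: the binomial pattern, the one-entry gazetteer and the element names `taxonomicName` and `location` are hypothetical, and the input sentence is made up. Overlapping matches are resolved by keeping the earliest-starting (then longest) span.

```python
import re
from xml.sax.saxutils import escape

# Naive binomial pattern: capitalized genus, space, lowercase epithet of 3+ letters.
# On ordinary prose it produces false positives, which is why manual correction matters.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")

# Hypothetical gazetteer; a real one would be loaded from a curated list.
GAZETTEER = {"Karlsruhe"}


def find_annotations(text: str) -> list[tuple[int, int, str]]:
    spans = [(m.start(), m.end(), "taxonomicName") for m in BINOMIAL.finditer(text)]
    for name in GAZETTEER:
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.append((m.start(), m.end(), "location"))
    # Resolve overlaps: earliest start wins, and among equal starts the longest span.
    spans.sort(key=lambda s: (s[0], -s[1]))
    kept, last_end = [], 0
    for start, end, kind in spans:
        if start >= last_end:
            kept.append((start, end, kind))
            last_end = end
    return kept


def to_xml(text: str) -> str:
    """Wraps each kept span in an element named after its annotation type."""
    out, pos = [], 0
    for start, end, kind in find_annotations(text):
        out.append(escape(text[pos:start]))
        out.append(f"<{kind}>{escape(text[start:end])}</{kind}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


print(to_xml("Specimens of Probolomyrmex tani were examined in Karlsruhe."))
```

A pattern this simple would also tag phrases such as "Worker from", so in practice its output is a draft that a specialized correction view lets a human accept or reject.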

  9. The GoldenGATE Editor

  11. The External Data Sources
     • Hymenoptera Name Server (HNS)
        • Retrieve LSIDs for taxon names
        • Enter new taxon names in the HNS database
     • Further LSID sources: ZooBank, Index Fungorum
     • GBIF pulls materials citations via TAPIR
     • EOL pulls treatments via TAPIR (to start soon)
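The LSIDs retrieved from HNS, ZooBank or Index Fungorum are URNs of the form `urn:lsid:<authority>:<namespace>:<object id>[:<revision>]`. The following minimal Python sketch parses such an identifier and attaches it to a taxon-name element; it does not contact any name server. The example LSID (authority `example.org`) is made up, and the `LSID` attribute name on `taxonomicName` is a hypothetical choice, not taken from the PLAZI schema.

```python
import re
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# LSID URN syntax: urn:lsid:<authority>:<namespace>:<object id>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+):(?P<object_id>[^:]+)"
    r"(?::(?P<revision>[^:]+))?$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class LSID:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


def parse_lsid(text: str) -> LSID:
    """Splits an LSID into its parts; raises ValueError for anything else."""
    match = LSID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(**match.groupdict())


def attach_lsid(name: str, lsid: LSID) -> str:
    """Renders a taxon-name element carrying its LSID (hypothetical attribute name)."""
    return f"<taxonomicName LSID={quoteattr(str(lsid))}>{escape(name)}</taxonomicName>"


# Made-up LSID, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:taxonnames:12345")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
print(attach_lsid("Probolomyrmex tani", lsid))
```

Parsing the identifier into its parts, rather than storing an opaque string, makes it easy to check which authority issued an LSID before deciding where to resolve it.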

  12. Outlook
     • Tighter integration of the GoldenGATE editor with the server
        • Load plug-ins from the server, for easier update distribution
        • Upload documents directly after OCR
        • Host documents on the server throughout markup, so that users can share markup work (e.g. experts add the LSIDs) and treatments become available in the search portal as soon as they are marked up
        • Auto-distribute documents to different storage locations
        • Run automated markup generation on the server side
        • Get corrections from the community via online feedback forms
     • Other extensions of the GoldenGATE editor
        • Simplified, more flexible plug-in architecture
        • Extensible user interface

  13. Thank you! Questions?
     • Donat Agosti: agosti@amnh.org
     • Terry Catapano: catapanoth@gmail.com
     • Robert “Bob” Morris: ram@cs.umb.edu
     • Guido Sautter: sautter@ipd.uka.de
     • PLAZI homepage: http://plazi.org
     • PLAZI search portal: http://plazi.org:8080/GgSRS
     • GoldenGATE homepage: http://idaho.ipd.uka.de/GoldenGATE

  14. The GoldenGATE Editor V3
     • Plug-in GUI extensions (hideable)
     • Simplified, more flexible architecture
     • Document navigator for finding content more quickly
     • Pre-OCR page images for correcting OCR errors