CHLT Integration

lorand

Presentation Transcript


  1. CHLT Integration

  2. Integration in two directions
     • Interoperability with the indexing structures of the Perseus Digital Library
     • Integration of parsers into the indexing module of the search and visualization tool

  3. Integration with the Structure of the Perseus Digital Library (PDL)
     • The Perseus text display system transforms XML and legacy SGML files tagged according to an arbitrary DTD and creates a consistent set of core data files that can be read by any application:
       • Sentences
       • Chunks
       • Lemmatized
       • Inflected
       • Catalog of works (PTEXT DB)
       • Morphological databases
       • Short definitions

  4. File Locations
     • The surrogate files are written to a location associated with the unique ID assigned to the document in the PDL.
     • Each chunk or sentence also has a unique identifier.
     • These two pieces of information can be used:
       • To generate URLs to access the full text in the digital library
       • To generate human-readable citations of the sentences according to scholarly conventions
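The two uses of the identifiers described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the base URL, the query parameter names, the ID formats, and the `ChunkRef` record are all hypothetical, since the slides do not give the actual PDL URL scheme or citation rules.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlencode

# Hypothetical endpoint; the real PDL address is not given in the slides.
BASE_URL = "https://example.org/text"


@dataclass(frozen=True)
class ChunkRef:
    """A sentence or chunk located by document ID plus chunk ID (formats assumed)."""
    doc_id: str                      # unique ID assigned to the document
    chunk_id: str                    # unique ID of the chunk or sentence
    author: str
    work: str
    citation_parts: Tuple[str, ...]  # e.g. ("1", "23") for book 1, line 23


def full_text_url(ref: ChunkRef) -> str:
    """Build a link to the full text from the two identifiers."""
    query = urlencode({"doc": ref.doc_id, "chunk": ref.chunk_id})
    return f"{BASE_URL}?{query}"


def citation(ref: ChunkRef) -> str:
    """Build a human-readable citation such as 'Homer, Iliad 1.23'."""
    return f"{ref.author}, {ref.work} {'.'.join(ref.citation_parts)}"


ref = ChunkRef("doc-0001", "s42", "Homer", "Iliad", ("1", "23"))
print(full_text_url(ref))
print(citation(ref))
```

Keeping both identifiers in one record means the link and the citation are always derived from the same source, which is the point the slide makes about the surrogate files.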

  5. WP2 Integration: Word Profile Tool
     • The Word Profile tool reads the lemmatized files to acquire a complete list of words in the IGL corpus.
     • All frequency counts, display sentences, human-readable citations, and links to full text are based on surrogate files generated by the PDL.
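A minimal sketch of how a word list and frequency counts might be derived from a lemmatized surrogate file. The tab-separated layout (`sentence_id`, inflected form, lemma) is an assumption made for illustration; the slides do not describe the actual file format, and `profile` is a hypothetical helper, not part of the Word Profile tool.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def profile(lines: Iterable[str]) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count lemma frequencies and record which sentences contain each lemma.

    Each line is assumed to look like: sentence_id<TAB>form<TAB>lemma
    """
    counts: Counter = Counter()
    sentences: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sentence_id, _form, lemma = line.split("\t")
        counts[lemma] += 1
        if sentence_id not in sentences[lemma]:
            sentences[lemma].append(sentence_id)
    return counts, dict(sentences)


sample = [
    "s1\tarma\tarma",
    "s1\tvirumque\tvir",
    "s2\tviri\tvir",
]
counts, sentences = profile(sample)
print(sorted(counts))      # complete word list
print(counts["vir"])       # frequency of one lemma
print(sentences["vir"])    # sentence IDs usable for display and citation
```

Because the sentence IDs are kept alongside the counts, the same pass can feed the display sentences and citations mentioned on the slide.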

  6. WP2 Integration: Multi-Lingual IR Tool
     • Author and language selection routines in the MLIR tool are dynamically generated from the PDL metadata catalog.
     • The database of translation equivalents is created directly from SGML/XML and saved as a core data file that is available to other applications in the system.
     • The Translation Equivalence Program works with any TEI-conformant dictionary. The dictionary selection screen updates dynamically.
     • The translated query is handed off to the current PDL search engine and the visualization tool based on documented APIs.
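The idea of building a table of translation equivalents directly from dictionary markup can be sketched as follows. The element names (`entry`, `orth`, `tr`) follow a simplified, assumed dictionary structure; real TEI dictionaries vary in encoding, and the sample entries and the `translation_equivalents` function are hypothetical rather than the project's Translation Equivalence Program.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Tiny made-up sample in a simplified, assumed markup.
SAMPLE = """<dictionary>
  <entry><form><orth>hestr</orth></form><sense><trans><tr>horse</tr></trans></sense></entry>
  <entry><form><orth>hross</orth></form><sense><trans><tr>horse</tr></trans></sense></entry>
  <entry><form><orth>konungr</orth></form><sense><trans><tr>king</tr></trans></sense></entry>
</dictionary>"""


def translation_equivalents(xml_text: str) -> Dict[str, List[str]]:
    """Map each English gloss to the headwords it translates."""
    root = ET.fromstring(xml_text)
    table: Dict[str, List[str]] = defaultdict(list)
    for entry in root.iter("entry"):
        headword = (entry.findtext(".//orth") or "").strip()
        if not headword:
            continue  # entry without a headword cannot be indexed
        for tr in entry.iter("tr"):
            gloss = (tr.text or "").strip().lower()
            if gloss and headword not in table[gloss]:
                table[gloss].append(headword)
    return dict(table)


equivalents = translation_equivalents(SAMPLE)
print(equivalents["horse"])  # headwords to search for an English query term
```

An English query term is looked up in the table and the resulting headwords are what would be passed on to the search engine.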

  7. WP4 Integration: Old Norse Text and Parser
     • Middleware translates Old Norse parser output to the format used by the PDL.
     • ISO language tags in the texts tell the system to use Old Norse morphology and link to the Old Norse lexicon.
     • The PDL short definition program automatically extracts information from Zoega.
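The middleware and language-tag dispatch described here can be illustrated with a small adapter registry. Everything below is an assumption for illustration: the pipe-separated parser output, the `Analysis` record standing in for the PDL format, and the function names. Only the language code `non` is taken from a standard (ISO 639-2 for Old Norse).

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Analysis(NamedTuple):
    """Stand-in for the common record the rest of the system reads (assumed)."""
    form: str
    lemma: str
    features: str


def adapt_old_norse(raw_line: str) -> Analysis:
    """Convert one line of assumed parser output: form|lemma|features."""
    form, lemma, features = raw_line.strip().split("|")
    return Analysis(form, lemma, features)


# Language tag found in the text -> adapter for that language's parser.
ADAPTERS: Dict[str, Callable[[str], Analysis]] = {
    "non": adapt_old_norse,  # ISO 639-2 code for Old Norse
}


def analyses_for(lang: str, raw_lines: Iterable[str]) -> List[Analysis]:
    """Pick the adapter by language tag and normalize the parser output."""
    try:
        adapter = ADAPTERS[lang]
    except KeyError:
        raise ValueError(f"no morphology registered for language {lang!r}")
    return [adapter(line) for line in raw_lines if line.strip()]


print(analyses_for("non", ["hesti|hestr|noun masc dat sg\n"]))
```

Adding another parser then means registering one more adapter under its language code, without touching the code that consumes `Analysis` records.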

  8. WP4 & 6: Corpus Integration
     • TEI makes corpus integration easy.
     • Old Norse texts and lexica and Neo-Latin texts are tagged according to TEI standards.
     • Documentation of tagging conventions.

  9. Parser Integration with WP1
     • Similar middleware can link LemLat to the PDL.
     • The WP1 visualization tool also includes a parsing/stemming step.
     • This program is designed to work generally with many systems, not only those created by the PDL.
     • Source code for LemLat and the Old Norse parser, so that the search/visualization tool can be used to search Old Norse and Latin texts that are not part of the PDL.

  10. Next Steps:
     • Implementation of parser integration with WP1
     • Seamless integration of the MLIR tool and production deployment
     • Improved documentation of tags required for OAI linking
