
From Papyrus to Digital: UCI’s Thesaurus Linguae Graecae Project

Maria Pantelia, July 2006. Thesaurus Linguae Graecae® (TLG®), Latin for ‘Treasury of the Greek Language’. 3450 Berkeley Place, UC Irvine. Special Research Project.


Presentation Transcript


  1. From Papyrus to Digital: UCI’s Thesaurus Linguae Graecae Project. Maria Pantelia, July 2006

  2. Thesaurus Linguae Graecae® (TLG®), Latin for ‘Treasury of the Greek Language’. 3450 Berkeley Place, UC Irvine • Special Research Project • Comprehensive digital library of Greek literature from antiquity to the present era • Preservation and access

  3. New Testament Aristotle Homer Aeschylus, Oresteia

  4. Audience: researchers in Classics, Byzantine Studies, Ancient History and Philosophy, Lexicography, Religious Studies, Linguistics, etc.

  5. Classics and Technology • Fragmentary texts (papyri, inscriptions) • Dating of materials • Reconstruction of antiquity (virtual)

  6. Classics Databases • Perseus Project (Tufts University) Texts, large collection of images, lexicographical tools • Packard Humanities Institute (Inscriptions, documentary papyri, Latin texts) • Database of Classical Bibliography (L'Année philologique ) • Classical Atlas (Ancient World Mapping Center)

  7. History of the Project • UCI (1972) • Dr. Marianne McDonald (UCSD) • Collection of texts in digital form • Mirror image of printed critical editions • International collaboration

  8. The Ibycus system • David Packard (PHI) • Ibycus Computer • Beta code • Magnetic tapes (1976) • CD-ROM (1985)

  9. From Ibycus to…the modern era Stephanus Ibycus

  10. Current status of the collection • Homer to A.D. 400 (complete) • Byzantine period, 4th-15th centuries A.D. (in progress) • Expansion to medieval and modern works to follow • 5-6 million new words added annually • Contents: 3,800 authors; 15,000 works; 95 million words; 1.365 million distinct forms

  11. Use and advantages • Preservation • Access to rare texts and editions • Portability • Access from any place • Browsing (Full-text) • Ability to search the corpus for particular words or phrases • Research and pedagogy

  12. Creating a Digital Collection • Digitization • Data Management • Dissemination

  13. Dissemination: TLG CD ROMs • 1985 TLG A (27 million words) • 1988 TLG C (42 million words) • 1992 TLG D (57 million words) • 2000 TLG E (76 million words) • 2001 Online TLG

  14. The Online TLG • TLG developed Search Engine • Quarterly updates • Bibliographies and Demo version open to the public • Full-Text Browsing and Searching • Search Full-Corpus or selection of Authors • Fonts (input and display Greek characters) Unicode Project (http://repositories.cdlib.org/tlg/unicode/)
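The "Search Full-Corpus or selection of Authors" option on this slide can be illustrated with a minimal sketch. It is not the TLG search engine: the `CORPUS` records, the `search` function, and the choice to ignore accents and breathings when matching are all assumptions made for illustration.

```python
import unicodedata

def strip_diacritics(text):
    """Remove accents, breathings and iota subscripts, then case-fold.

    casefold() also maps final sigma to medial sigma, so both forms match.
    """
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return bare.casefold()

# Hypothetical toy corpus: one short snippet per work.
CORPUS = [
    {"author": "Homer", "work": "Odyssey",
     "text": "ἄνδρα μοι ἔννεπε μοῦσα πολύτροπον"},
    {"author": "Homer", "work": "Iliad",
     "text": "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"},
    {"author": "New Testament", "work": "John",
     "text": "ἐν ἀρχῇ ἦν ὁ λόγος"},
]

def search(phrase, authors=None):
    """Return (author, work) pairs whose text contains the phrase.

    authors=None searches the full corpus; otherwise only the given
    authors are searched. Matching is plain substring matching.
    """
    needle = strip_diacritics(phrase)
    return [
        (rec["author"], rec["work"])
        for rec in CORPUS
        if (authors is None or rec["author"] in authors)
        and needle in strip_diacritics(rec["text"])
    ]
```

Because the match is a plain substring test, a short query can also hit the inside of a longer word; a production engine would tokenize first.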

  15. Distribution • 58 countries • TLG E (CD ROM): 1,100 institutions, 1,500 individuals • Online TLG: 250 institutions, 50,000 users, 5 million hits in 2005

  16. Data Management: Collection maintenance

  17. Canon of Greek authors and works: 15,000 entries (including information such as dates, genre, origin, etc.)

  18. Digitization: 1. Selection of text editions 2. Text markup (beta code) 3. Data entry 4. Correction in-house • Importance of ‘Verification and Correction’ (…where Google has a long way to go…)

  19. Digitization • The Critical Edition • Homer, Odyssey 16.180-193

  20. 2. Text Markup

  21. 3. Data entry in beta code:

  22. 4. Text Correction

  23. 5. Converted Greek text
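The step from beta code to converted Greek text can be sketched as follows. This is a minimal illustration, not the TLG's own converter: it covers only the basic letters, the common diacritic codes, the `*` capital marker, and final sigma, and it leaves every other Beta Code escape (punctuation, brackets, formatting codes) untouched. The function name `beta_to_unicode` is an assumption.

```python
import re
import unicodedata

# Basic Beta Code letters mapped to lowercase Greek.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ",
    "H": "η", "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ",
    "N": "ν", "C": "ξ", "O": "ο", "P": "π", "R": "ρ", "S": "σ",
    "T": "τ", "U": "υ", "F": "φ", "X": "χ", "Y": "ψ", "W": "ω",
}

# Diacritic codes mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

def beta_to_unicode(beta):
    """Convert a simple Beta Code string to precomposed Unicode Greek."""
    out = []
    pending = []      # diacritics typed between '*' and a capital letter
    upper = False
    for ch in beta.upper():
        if ch == "*":
            upper = True
        elif ch in DIACRITICS:
            # After '*', diacritics precede the letter, so hold them back.
            (pending if upper else out).append(DIACRITICS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if upper else letter)
            out.extend(pending)
            pending, upper = [], False
        else:
            out.append(ch)  # spaces and unhandled codes pass through
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "ς", text)
```

For instance, `beta_to_unicode("MH=NIN A)/EIDE QEA/")` yields μῆνιν ἄειδε θεά, the opening words of the Iliad (an example chosen here, not the Odyssey passage shown on the slides). Composing through NFC normalization keeps the mapping tables small, since each letter and diacritic is listed once rather than in every combination.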

  24. Challenges • Dealing with a large corpus developed over a period of 30+ years: editorial choices and markup, corpus retrofitting, accuracy, non-Roman script • Conversion to standard encoding (Unicode, TEI/XML)

  25. Lexical Database • Used for fast data retrieval • Goal: full corpus lemmatization • Morpheus (Perseus) • 1,365,000 unique forms (approx. 250,000 lemmata) • Morphological recognition for a highly inflected language
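The idea behind a lexical database, many inflected forms resolving to one lemma for fast retrieval, can be sketched with a small index. Everything here is hypothetical: the `FORM_TO_LEMMA` table is hand-written, whereas a real system would obtain such analyses from a morphological analyzer such as Morpheus, and the function names are illustrative only.

```python
import unicodedata
from collections import defaultdict

def nfc(text):
    """Normalize to NFC so visually identical forms compare equal."""
    return unicodedata.normalize("NFC", text)

# Hypothetical hand-written table: inflected form -> lemma.
FORM_TO_LEMMA = {
    nfc(form): nfc(lemma)
    for form, lemma in {
        "λόγος": "λόγος",
        "λόγου": "λόγος",
        "λόγον": "λόγος",
        "ἄνδρα": "ἀνήρ",
        "ἀνδρός": "ἀνήρ",
    }.items()
}

def build_index(works):
    """Map each distinct form to its (work_id, word_position) occurrences."""
    index = defaultdict(list)
    for work_id, text in works.items():
        for position, form in enumerate(nfc(text).split()):
            index[form].append((work_id, position))
    return index

def search_lemma(index, lemma):
    """Find every occurrence of any known form of the given lemma."""
    lemma = nfc(lemma)
    hits = []
    for form, form_lemma in FORM_TO_LEMMA.items():
        if form_lemma == lemma:
            hits.extend(index.get(form, []))
    return sorted(hits)

# Toy input: two short snippets keyed by a made-up work id.
SAMPLE_WORKS = {
    "odyssey": "ἄνδρα μοι ἔννεπε μοῦσα",
    "john": "ἐν ἀρχῇ ἦν ὁ λόγος",
}
INDEX = build_index(SAMPLE_WORKS)
```

With this index, `search_lemma(INDEX, "ἀνήρ")` finds the accusative ἄνδρα in the first snippet even though the dictionary form never appears in the text, which is the point of lemmatizing a highly inflected language.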
