
Istituto di Linguistica Computazionale – Pisa Andrea Bozzi

Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents. Andrea Bozzi, Istituto di Linguistica Computazionale, Pisa. NEH/CNR Meeting, Washington DC, October 5, 2007.


Presentation Transcript


  1. Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents. Andrea Bozzi, Istituto di Linguistica Computazionale, Pisa. NEH/CNR Meeting, Washington DC, October 5, 2007

  2. Presentation contents
     • An EU-supported system for Greek papyrology;
     • A special application for browsing and searching demotic documents on ostraka;
     • A philological workstation for digital medieval manuscripts;
     • CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;
     • How to integrate all these modules in a web-based open source application.

  3. Presentation contents (repeated from slide 2)

  4. The philological workstation: image and text transcription

  5. Image segmentation and semi-automatic word linking

  6. Annotations and critical apparatus

  7. Wordforms list and specific indexes

  8. The web philological workstation to manage documents of the Istituto Papirologico Vitelli in Florence (restricted use)

  9. Presentation contents (repeated from slide 2). Andrea Bozzi, andrea.bozzi@ilc.cnr.it. NEH/CNR Meeting, Washington, October 5, 2007

  10. Special system for teaching and retrieving linguistic information from demotic texts on ostraka
     • OMM 1381: E. Bresciani, S. Pernigotti, M.C. Betrò, Ostraka demotici da Narmuti, Pisa, 1983, pp. 16-18;
     • OMM 300: P. Gallo, Ostraca demotici e ieratici dall’archivio bilingue di Narmouthis, Pisa, 1997, pp. 113-114;
     • OMM 393: R. Pintaudi, P.J. Sijpesteijn, Ostraka greci da Narmuthis, Pisa, 1993, p. 40.

  11. The digital image archive and the table of demotic signs

  12. Search results: the blue areas (see arrow) show where the selected sign has been found

  13. Presentation contents (repeated from slide 2)

  14. Textual criticism for medieval manuscripts. Link to the list of collated sources

  15. Evaluation of the variant reading in the collated source. Selection of the variant eixens

  16. Recording of the variant Eixens in the critical apparatus

  17. Variant search in different ancient printed editions of the same work. Link to the list of collated books

  18. Image of the corresponding page

  19. Presentation contents (repeated from slide 2)

  20. Lemmatization results (C. Sallustius Crispus, De coniuratione Catilinae, 1-2)

  21. Lemmatization results of selected wordforms

  22. Presentation contents (repeated from slide 2)

  23. Pinakes 3.0 (http://pinakes.imss.fi.it)
     • Aim: web-based open source application to manage cultural heritage historical data in digital format.
     • Partners:
        • Fondazione Rinascimento Digitale, Florence;
        • Istituto e Museo della Storia della Scienza, Florence;
        • Ministero per i Beni Culturali, Rome;
        • CNR, Istituto di Linguistica Computazionale, Pisa.

  24. Technology
     • Programming language: Java (JDK 1.5)
     • Servlet engine: Tomcat 5.5.x + Apache HTTP Connectors
     • Web server: Apache httpd server 2.2.x
     • Web application framework: Jakarta Struts
     • Web service framework: Apache Axis 1.4
     • Database engine: PostgreSQL 8.1
     • Programming environment: NetBeans 5.5.1
     • Final development: Hibernate 3.2.5

  25. Standards
     • DCMI (Dublin Core Metadata Initiative)
     • TEI (Text Encoding Initiative)
     • OWL (Web Ontology Language)
     • RDF/XML (Resource Description Framework, XML syntax)
     • SPARQL (query language for RDF)
     • UTF-8 (Unicode Transformation Format, 8-bit)
