Developing a Digital Library for the Humanities

Developing a Digital Library for the Humanities. Gregory Crane (gcrane@tufts.edu) Winnick Family Chair in Technology and Entrepreneurship Professor of Classics Director, Perseus Digital Library Project Http://www.perseus.tufts.edu/About/grc.html. Perseus Digital Library.


Presentation Transcript


  1. Developing a Digital Library for the Humanities • Gregory Crane (gcrane@tufts.edu) • Winnick Family Chair in Technology and Entrepreneurship • Professor of Classics • Director, Perseus Digital Library Project • Http://www.perseus.tufts.edu/About/grc.html

  2. Perseus Digital Library • On-going areas of Development • 1987: DL on Classical Greek Culture • 1993: History of Science • 1996: Began work on Latin and Rome • 1997: Early Modern English • 1999: History and Topography of London • 2000: Ancient Egyptian Giza • 2000: Slavery and the US Civil War

  3. Partner Institutions • Max Planck Institute for the History of Science (Berlin) • Museum of Fine Arts, Boston • Stoa Publishing Consortium • New Variorum Shakespeare Series, Modern Language Association • Special Collections at Tufts, Brandeis, the University of Pennsylvania

  4. On-Going Support • National Endowment for the Humanities (DLI2, Preservation & Access, Education) • National Science Foundation (DLI2) • Fund for the Improvement of Postsecondary Education, Dept. of Education • Max Planck Society

  5. The Whole greater than the sum • Tufts Health Sciences Database: • An on-line Medical School Curriculum • First iteration: 70% of the value • Second Iteration: 90% • Third Iteration: 130% • “Data” and “system” interact in increasingly dynamic ways.

  6. Persistent value over time & space • “How many ages hence Shall this our lofty scene be acted over, In states unborn and accents yet unknown?” • Cassius in Julius Caesar • How do we structure data for • Contemporary users we can’t directly anticipate? • Systems not yet designed?

  7. Radically New Documents • Reconstructions of Historical Spaces, e.g. • UVA’s Crystal Palace (London) • UCLA’s Rome and VR Lab • Integrating Virtual Spaces with Sources • Museum of Fine Arts, Tombs at Giza • Greek Sculpture • The Streets of 19th Century London

  8. Traditional Docs Rethought • Concordance: “Obsolete” • Bibliographies: databases • Encyclopedias: automatic linking • Lexica and lexicography: • Automatically discovered semantic relations • THEN lexicographic work

  9. Development is two part • Ultimate end: Radically new docs? • Short term: Electronic Incunabula • New Variorum Shakespeare • Electronic Marlowe • Tallis Street Maps • FIRST we thoroughly analyze what we have • THEN radical redesign emerges

  10. Technology outruns Practice • The 3D Reconstruction/Virtual Space • Cutting edge technology • Still nascent scholarly practices • Mature Document Structures • Textual Notes: 1908 Richard III • Traditional Text Citations: 1887 Commentary

  11. The More Things Stay the same... • “Content” can remain unchanged • “Presentation” is dynamic and flexible • The Dictionary knows what you are reading • Citations —> Bidirectional links • Automatic Linking by keyword • Text and Atlas: Plot sites in a document
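The two linking ideas on this slide (automatic linking by keyword, and citations turned into bidirectional links) can be sketched roughly as follows. This is a minimal illustration, not the Perseus implementation: the `LEXICON` entries, the identifier scheme, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: keyword -> identifier of a reference entry.
LEXICON = {
    "Coleman Street": "gazetteer:coleman-street",
    "Stow": "dnb:stow-john",
}

def link_keywords(text, lexicon):
    """Return (start, end, target) spans for every lexicon keyword found in text."""
    if not lexicon:
        return []
    # Try longer keywords first so a multi-word name wins over a shorter overlap.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:%s)\b" % "|".join(re.escape(k) for k in keys)
    return [(m.start(), m.end(), lexicon[m.group(0)])
            for m in re.finditer(pattern, text)]

def build_bidirectional(citations):
    """Turn one-way (source, target) citations into forward and reverse maps."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in citations:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward
```

The content stays unchanged in this sketch; the links are computed at display time, which is the point the slide makes about dynamic presentation.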

  12. Current Paradigm: DL Diplomacy • Monolithic Systems (e.g., Perseus!) • One way to view each document • Intercommunication via metadata • DL as metadata for “opaque” objects • Major Problems • Renting access, rather than collecting content • All publications become ephemera

  13. Three Strategies • 1) The Editing Problem — • How do real authors create structured docs? • 2) Developing Radically New Docs — • Archimedes DL on Mechanics • MFA Excavations at Giza • 3) Radical Repurposing of Print • Bolles Collection on London

  14. Bolles Collection at Tufts • Documenting the history and topography of London and its environs • 35 “full-size” maps • 320 more specialized maps • 400 books (284 linear feet of shelf space) • 1,000 pamphlets • “Paper Hypertexts” • 10,000+ “extra illustrations”

  15. Bolles Electronic Archive • A Testbed for the Perseus Digital Library • “Level 5” TEI Encoded Full Text • Quotes, languages, proper names, dates, money • High-end OCR and Double Keyboarding • OCR ideal for some but not all • Keyboarding much the best — money permitting

  16. Bolles — Initial Texts • Five Million Words now in L5 TEI • Will exceed 10 million by year’s end • Surveys of London History and Topography • Stow, Maitland, Wilkinson, Allen, Thornbury • Commentary on social conditions • Mayhew, Archer, Hollingshead, Booth • Literary works with London as backdrop • Defoe, Dickens, “Sherlock Holmes”

  17. Images • 10,000 Grayscale Images • Mainly engravings of people and places • “opportunistic” metadata (=captions & context) • 2,400 Contemporary Images • Well catalogued and geo-referenced • QTVR Panoramas • 70 Tallis Map “Elevations”

  18. Geospatial Data • Bartholomew 1:5000 data set for London • Modern data as reference and interchange • Historical maps georeferenced to Bartholomew data • 10 so far (c. 2 hours each) • Urban maps do not easily “line up” • How to create an historical GIS? • GPS Waypoints • As of May 2000, good to within 10 m or better
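One common way to georeference a scanned map against modern reference data is to fit a transform from a few control points. The sketch below fits an exact affine transform from three point pairs; it is an assumption for illustration only, since the slides do not say which method was used, and an affine fit is at best a first approximation for historical urban maps that do not "line up". All function names are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * p = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det(replaced) / d)
    return solution

def fit_affine(pixel_pts, ref_pts):
    """Fit X = a*x + b*y + c, Y = d*x + e*y + f from exactly three control points."""
    if len(pixel_pts) != 3 or len(ref_pts) != 3:
        raise ValueError("exactly three control point pairs are required")
    m = [[x, y, 1.0] for x, y in pixel_pts]
    a, b, c = solve3(m, [X for X, _ in ref_pts])
    d, e, f = solve3(m, [Y for _, Y in ref_pts])
    # Return a function mapping scanned-map pixels to reference coordinates.
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)
```

With more than three control points, a least-squares fit or a piecewise (rubber-sheet) transform would likely be needed to absorb local distortion.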

  19. Feature Extraction • Easy identification: Dates, Money • Known Keywords and Classes • The Getty TGN (1 million places with lon/lats) • The Bartholomew Gazetteer (10,000) • Indices to Maps (e.g. Cruchley 1826, 4200) • The Index/Abstract of the DNB (30,000+) • Clean-up with rule-based Proper Name classification: Mr NAME; NAME street
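The "easy" features and the two proper-name rules named above (Mr NAME; NAME street) lend themselves to simple patterns. The following is a rough sketch under stated assumptions: the regular expressions, the restriction to years 1000 to 1999, and the pound-sterling money format are illustrative choices, not the rules actually used in the project.

```python
import re

YEAR = re.compile(r"\b1[0-9]{3}\b")                       # four-digit years 1000-1999
MONEY = re.compile(r"£\s?\d[\d,]*(?:\.\d+)?")              # e.g. £2 or £1,200
PERSON = re.compile(r"\b(?:Mr|Mrs|Dr)\.? ([A-Z][a-z]+)")   # rule: "Mr NAME"
STREET = re.compile(r"\b([A-Z][a-z]+) [Ss]treet\b")        # rule: "NAME street"

def extract_features(text):
    """Return candidate dates, money amounts, person names and street names."""
    return {
        "dates": YEAR.findall(text),
        "money": MONEY.findall(text),
        "persons": PERSON.findall(text),
        "streets": STREET.findall(text),
    }
```

Rules like these over-generate, which is presumably why the slide pairs them with known keyword lists such as gazetteers and map indices.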

  20. “Runtime” Links • Runtime links supplement in-file tagging • 1) Where metadata is less precise • Metadata from unedited headers and captions • 2) Where the source does not contain data • If no dates, then scan for them • Use tagging for “high confidence” data • Ideal situation: automated tags, hand-proofed
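The fallback described here (trust in-file tags when they exist, otherwise scan at runtime) can be sketched as below. The `date` element name follows TEI usage, but the year pattern and the function name are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

YEAR = re.compile(r"\b1[0-9]{3}\b")  # assumed pattern: years 1000-1999

def dates_for(xml_fragment):
    """Return (dates, source): tagged dates if the file has any, else scanned ones."""
    root = ET.fromstring(xml_fragment)
    tagged = [el.text for el in root.iter("date") if el.text]
    if tagged:
        # High-confidence data: explicit tags in the file.
        return tagged, "tagged"
    # No tags present: fall back to scanning the plain text at runtime.
    text = "".join(root.itertext())
    return YEAR.findall(text), "runtime"
```

Returning the source alongside the dates lets a display layer treat hand-proofed tags and runtime guesses differently.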

  21. Strategic Questions • “Editions” a foundation for scholarship • Where does the editor’s job start? • How does editor’s job change? • How do we define “Corpus Editors”? • People with domain expertise in content • Expertise in software and Library systems • Need for scholarly automated processing

  22. Delivering Integrated Data • “Good” and “rough” maps for Cicero’s Letters • Coleman delivers quite useful results • Map locates Coleman Street • Streets in description of “Portsoken Ward” • Historical Views of this section of London • Timeline 1: A Linear History • Timeline 2: “Encyclopedic Scatter”

  23. Further Work • Disambiguation, auto-cataloguing, Time/Space • VR Interface: Tallis 1, 2 and Headset • New challenging document types • Geospatial Data in Patterson's Journeys • Urban data in Booth and City Directories • Tallis Map for Oxford Street with overall and more focused directories

  24. Research Projects • Robert Jacob and VR Interfaces • Figure: Tallis VR Conversion 1 • Figure: Tallis VR Conversion 2 • Figure: Head mounted VR navigation • Holly Taylor and Cognitive Analysis • Spatial Cognition • Text Comprehension

  25. Conclusions • Baseline Knowledge Environment • Practical and useful • “Corpus Editions” • Midway between editions and library digitization • Requires a new configuration of skills • The “Diplomatic” Federated DL model is weak • Need access to full data for visualizations

  26. Perseus Document Manager • Works with XML • Multiple granularities: sentence, section, chapter • Deals with overlapping doc hierarchies • Combines internal and external metadata • Our metadata in RDF and can be XML • Since all data and metadata —> XML • Well suited to Federated DL Applications
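Addressing one XML document at multiple granularities can be illustrated with a small sketch. The markup below is a hypothetical TEI-like fragment (nested `div` elements with a `type` attribute and `s` for sentences), and `units` is an invented helper; it shows only the granularity idea and does not attempt the overlapping hierarchies the slide also mentions.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample: chapter > section > sentence.
SAMPLE = """<text>
  <div type="chapter" n="1">
    <div type="section" n="1"><s>First sentence.</s><s>Second sentence.</s></div>
    <div type="section" n="2"><s>Third sentence.</s></div>
  </div>
</text>"""

def units(root, granularity):
    """Yield the plain text of each unit at the requested granularity."""
    if granularity == "sentence":
        nodes = root.iter("s")
    else:
        nodes = (d for d in root.iter("div") if d.get("type") == granularity)
    for node in nodes:
        # Join text fragments, dropping whitespace-only indentation.
        yield " ".join(t.strip() for t in node.itertext() if t.strip())
```

For example, `list(units(ET.fromstring(SAMPLE), "section"))` returns one string per section, while `"sentence"` returns the individual sentences from the same file.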

  27. Scalable DL • SGML/XML need translation for display • Can’t maintain stylesheets for millions of docs • Intelligent display of various DTDs • “Cheaply” acquires XML/SGML docs • Individual Custom Style sheets allowed • Integration of Geo-spatial Data • Multilingual support, feature extraction • Integrated multi-resolution image support

  28. Perseus Document Manager • Short term development: • Collecting new datasets to the Perseus DL • (leveraging Internet 2 investment) • Adding value: e.g., • Sources for the History of Mechanics (Max Planck) • Duke Databank of Documentary Papyri • Books, maps etc. on the City of London • Shakespeare and Early modern English

  29. Perseus Document Manager • Longer Term: Distribution of the System • How best to maintain and expand the system? • Open source? • Commercial Licensing? • Wait for third party to match PDM features?

  30. Automatic Integration • Content Analysis: Various Languages • Time: extracting and visualizing dates • Space: Integrating historical Geographic Data • Names: establishing authority lists • Getty Thesaurus of Geographic Names • Names and Coordinates • Encyclopedias: e.g., Harpers, DNB • Names and Dates
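For the "Time" item, extracted dates can be binned before they are visualized. The sketch below counts year mentions per decade and renders crude text bars; the year pattern and the decade bin size are assumptions, and a real timeline would of course be graphical.

```python
import re
from collections import Counter

YEAR = re.compile(r"\b1[0-9]{3}\b")  # assumed pattern: years 1000-1999

def decade_histogram(text):
    """Count year mentions per decade, sorted chronologically."""
    counts = Counter((int(y) // 10) * 10 for y in YEAR.findall(text))
    return sorted(counts.items())

def render(hist):
    """Render the histogram as text bars, one line per decade."""
    return "\n".join(f"{decade}s {'#' * n}" for decade, n in hist)
```

A document with a few dense decades suggests a linear history, while one with mentions spread thinly across many decades would look more like the scattered, encyclopedic case.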

  31. Our Research Agenda • Developing self-sustaining models • Publication of documents • Maintenance of software • Exploring Problem Sets in different domains • E.g., sparse data (antiquity) vs. rich (London) • Helping humanists rethink their position • Reaching new audiences • Changing habits

  32. Technology matters: e.g., 19th c. Printing in England • 20th Century Radio/Film/TV: ambiguous • 19th Century Print Technology • 1810: c. 10,000 copies for a successful book • Audience for literature mainly upper class • 1850: hundreds of thousands • Audience vastly expands • Huge numbers read Dickens, etc. • 21st Century Network Technology?

  33. The Future? • Two models: • Reproduce current world in new form • Narrow/expensive distribution • Think about how that world may change • Broader/inexpensive distribution • What happens now sets the stage for … • “talk show” cyber culture? or • a new dispersal of intellectual life?
