Developing a Digital Library for the Humanities

  • Gregory Crane
  • Winnick Family Chair in Technology and Entrepreneurship
  • Professor of Classics
  • Director, Perseus Digital Library Project

Perseus Digital Library
  • On-going areas of Development
  • 1987: DL on Classical Greek Culture
  • 1993: History of Science
  • 1996: Began work on Latin and Rome
  • 1997: Early Modern English
  • 1999: History and Topography of London
  • 2000: Ancient Egyptian Giza
  • 2000: Slavery and the US Civil War
Partner Institutions
  • Max Planck Institute for the History of Science (Berlin)
  • Museum of Fine Arts, Boston
  • Stoa Publishing Consortium
  • New Variorum Shakespeare Series, Modern Language Association
  • Special Collections at Tufts, Brandeis, the University of Pennsylvania
On-Going Support
  • National Endowment for the Humanities(DLI2, Preservation & Access, Education)
  • National Science Foundation (DLI2)
  • Fund for the Improvement of Postsecondary Education, Dept of Ed.
  • Max Planck Society
The Whole Greater than the Sum
  • Tufts Health Sciences Database:
  • An on-line Medical School Curriculum
    • First iteration: 70% of the value
    • Second Iteration: 90%
    • Third Iteration: 130%
  • “Data” and “system” interact in increasingly dynamic ways.
Persistent Value over Time & Space
  • “How many ages hence / Shall this our lofty scene be acted over, / In states unborn and accents yet unknown?”
    • Cassius in Julius Caesar
  • How do we structure data for
    • Contemporary users we can’t directly anticipate?
    • Systems not yet designed?
Radically New Documents
  • Reconstructions of Historical Spaces, e.g.
    • UVA’s Crystal Palace (London)
    • UCLA’s Rome and VR Lab
  • Integrating Virtual Spaces with Sources
    • Museum of Fine Arts, Tombs at Giza
    • Greek Sculpture
    • The Streets of 19th Century London
Traditional Docs Rethought
  • Concordance: “Obsolete”
  • Bibliographies — databases
  • Encyclopedias — automatic linking
  • Lexica and lexicography —
    • Automatically discovered semantic relations
    • THEN lexicographic work
Development Is Two-Part
  • Ultimate end: Radically new docs?
  • Short term: Electronic Incunabula
    • New Variorum Shakespeare
    • Electronic Marlowe
    • Tallis Street Maps
  • FIRST we thoroughly analyze what we have
  • THEN radical redesign emerges
Technology Outruns Practice
  • The 3D Reconstruction/Virtual Space
    • Cutting edge technology
    • Still nascent scholarly practices
  • Mature Document Structures
    • Textual Notes: 1908 Richard III
    • Traditional Text Citations: 1887 Commentary
The More Things Stay the Same...
  • “Content” can remain unchanged
  • “Presentation” is dynamic and flexible
    • The Dictionary knows what you are reading
    • Citations → bidirectional links
    • Automatic Linking by keyword
    • Text and Atlas: Plot sites in a document
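The step from citations to bidirectional links on this slide can be sketched by inverting a citation map, so that each cited passage also knows which documents point at it. This is only a minimal illustration, not the Perseus implementation; the function name `build_backlinks` and the identifiers such as `commentary:note-12` and `text:1.1` are hypothetical.

```python
from collections import defaultdict


def build_backlinks(citations):
    """Invert a source -> targets citation map into target -> sorted sources."""
    backlinks = defaultdict(set)
    for source, targets in citations.items():
        for target in targets:
            backlinks[target].add(source)
    return {target: sorted(sources) for target, sources in backlinks.items()}


# Hypothetical identifiers: a commentary note and a lexicon entry both cite passages.
citations = {
    "commentary:note-12": ["text:1.1", "text:1.2"],
    "lexicon:entry-7": ["text:1.1"],
}

backlinks = build_backlinks(citations)
print(backlinks["text:1.1"])  # ['commentary:note-12', 'lexicon:entry-7']
```

With the inverted map in hand, a reader looking at passage `text:1.1` can be shown every note or entry that cites it, without any change to the stored content.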
Current Paradigm: DL Diplomacy
  • Monolithic Systems (e.g., Perseus!)
    • One way to view each document
  • Intercommunication via metadata
    • DL as metadata for “opaque” objects
  • Major Problems
    • Renting access, rather than collecting content
    • All publications become ephemera
Three Strategies
  • 1) The Editing Problem —
    • How do real authors create structured docs?
  • 2) Developing Radically New Docs —
    • Archimedes DL on Mechanics
    • MFA Excavations at Giza
  • 3) Radical Repurposing of Print
    • Bolles Collection on London
Bolles Collection at Tufts
  • documenting the history and topography of London and its environs
    • 35 “full-size” maps
    • 320 more specialized maps
    • 400 books (284 linear feet of shelf space)
    • 1,000 pamphlets.
    • “Paper Hypertexts”
      • 10,000+ “extra illustrations”
Bolles Electronic Archive
  • A Testbed for the Perseus Digital Library
  • “Level 5” TEI Encoded Full Text
    • Quotes, languages, proper names, dates, money
  • High-end OCR and Double Keyboarding
    • OCR ideal for some but not all
    • Keyboarding much the best — money permitting
Bolles — Initial Texts
  • Five Million Words now in L5 TEI
    • Will exceed 10 million by year’s end
  • Surveys of London History and Topography
    • Stow, Maitland, Wilkinson, Allen, Thornbury
  • Commentary on social conditions
    • Mayhew, Archer, Hollingshead, Booth
  • Literary works with London as backdrop
    • Defoe, Dickens, “Sherlock Holmes”
  • 10,000 Grayscale Images
    • Mainly engravings of people and places
    • “opportunistic” metadata (=captions & context)
  • 2,400 Contemporary Images
    • Well catalogued and geo-referenced
  • QTVR Panoramas
  • 70 Tallis Map “Elevations”
Geospatial Data
  • Bartholomew 1:5000 Data set for London
    • Modern data as reference and interchange
  • Historical maps georeferenced to the Bartholomew data
    • 10 so far (c. 2 hours each)
    • Urban maps do not easily “line up”
    • How to create an historical GIS?
  • GPS Waypoints
    • As of May 2000, accurate to within 10 m or better
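One simple way to picture georeferencing a scanned historical map against modern reference data is an affine fit through three control points. The sketch below is an assumption for illustration, not the method used for the Bolles maps: the function names are hypothetical, the pixel and reference coordinates are invented, and, as the slide notes, urban maps do not easily line up, so real work would likely need more control points and some non-linear warping.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pixel_pts, world_pts):
    """Return a function mapping map pixels (x, y) to reference coordinates."""
    m = [[x, y, 1.0] for x, y in pixel_pts]
    a, b, c = solve3(m, [wx for wx, _ in world_pts])
    d, e, f = solve3(m, [wy for _, wy in world_pts])

    def to_world(x, y):
        return (a * x + b * y + c, d * x + e * y + f)

    return to_world


# Invented control points: three pixels on the scan and their reference positions.
pixels = [(0, 0), (1000, 0), (0, 1000)]
reference = [(530000.0, 181000.0), (532000.0, 181000.0), (530000.0, 179000.0)]

to_world = fit_affine(pixels, reference)
print(to_world(500, 500))
```

Comparing the fitted position of a fourth known point with its true reference position gives a quick measure of how badly a given map fails to line up.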
Feature Extraction
  • Easy identification: Dates, Money
  • Known Keywords and Classes
    • The Getty TGN (1 million places with longitudes/latitudes)
    • The Bartholomew Gazetteer (10,000)
    • Indices to Maps (e.g. Cruchley 1826, 4200)
    • The Index/Abstract of the DNB (30,000+)
  • Clean-up with rule-based proper name classification: Mr NAME; NAME Street
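The “easy” features and the two context rules named on this slide (Mr NAME for a person, NAME Street for a place) can be sketched with regular expressions. The patterns below are deliberately crude assumptions for illustration, not the rules used in the Perseus pipeline; the function name `extract_features` and the sample sentence are hypothetical.

```python
import re

# Four-digit years from 1000 to 1999, a crude stand-in for date detection.
YEAR = re.compile(r"\b1[0-9]{3}\b")
# Sums written with a pound sign, e.g. "£40" or "£1,200".
MONEY = re.compile(r"£\s?\d[\d,]*(?:\.\d+)?")
# Rule "Mr NAME": a title followed by a capitalized word is a person.
PERSON = re.compile(r"\b(?:Mr|Mrs|Dr)\.?\s+([A-Z][a-z]+)")
# Rule "NAME Street": a capitalized word before a street term is a place.
STREET = re.compile(r"\b([A-Z][a-z]+)\s+(?:Street|Lane|Row)\b")


def extract_features(text):
    """Return the years, sums of money, person names and street names found."""
    return {
        "years": YEAR.findall(text),
        "money": MONEY.findall(text),
        "persons": PERSON.findall(text),
        "streets": STREET.findall(text),
    }


sample = "In 1826 Mr Coleman paid £40 for a house in Coleman Street."
features = extract_features(sample)
print(features["persons"], features["streets"])  # ['Coleman'] ['Coleman']
```

The same string, “Coleman”, is classed as a person in one place and a street in another purely from its neighbours, which is the point of the context rules.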
“Runtime” Links
  • Runtime links supplement in-file tagging
  • 1) Where metadata is less precise
    • Metadata from unedited headers and captions
  • 2) Where the source does not contain data
    • If no dates, then scan for them
  • Use tagging for “high confidence” data
    • Ideal situation: automated tags, hand-proofed
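The division of labour on this slide, with in-file tags for high-confidence data and a runtime scan where the source carries none, might look like the following. The document layout (a dict with `text` and `tagged_dates` keys) and the function name `dates_for` are assumptions made for this sketch only.

```python
import re

# Crude runtime detector: any four-digit year from 1000 to 1999.
YEAR = re.compile(r"\b1[0-9]{3}\b")


def dates_for(document):
    """Prefer in-file date tags; scan the text at runtime only if none exist."""
    tagged = document.get("tagged_dates", [])
    if tagged:
        return [(date, "tagged") for date in tagged]
    return [(date, "runtime") for date in YEAR.findall(document["text"])]


# Hypothetical documents: one hand-proofed, one with no date tagging at all.
proofed = {"text": "Rebuilt after 1666.", "tagged_dates": ["1666"]}
untagged = {"text": "The map of 1826 shows the ward as it stood in 1799."}

print(dates_for(proofed))   # [('1666', 'tagged')]
print(dates_for(untagged))  # [('1826', 'runtime'), ('1799', 'runtime')]
```

Keeping the provenance label next to each value lets a display layer treat hand-proofed tags and runtime guesses differently.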
Strategic Questions
  • “Editions” a foundation for scholarship
  • Where does the editor’s job start?
  • How does the editor’s job change?
  • How do we define “Corpus Editors”?
    • People with domain expertise in content
    • Expertise in software and Library systems
      • Need for scholarly automated processing
Delivering Integrated Data
  • “Good” and “rough” maps for Cicero’s Letters
  • A search for “Coleman” delivers quite useful results
  • Map locates Coleman Street.
  • Streets in description of “Portsoken Ward”.
  • Historical Views of this section of London
  • Timeline 1: A Linear History
  • Timeline 2: “Encyclopedic Scatter”
Further Work
  • Disambiguation, auto-cataloguing, time/space
  • VR Interface: Tallis 1, 2 and Headset
  • New challenging document types
  • Geospatial data in Patterson's Journeys
  • Urban data in Booth and City Directories.
    • Tallis Map for Oxford Street with overall and more focused directories.
Research Projects
  • Robert Jacob and VR Interfaces
    • Figure: Tallis VR Conversion 1.
    • Figure: Tallis VR Conversion 2.
    • Figure: Head mounted VR navigation.
  • Holly Taylor and Cognitive Analysis
    • Spatial Cognition
    • Text Comprehension
  • Baseline Knowledge Environment
    • Practical and useful
  • “Corpus Editions”
  • Midway between editions and library digitization
  • Requires a new configuration of skills
  • The “Diplomatic” Federated DL model is weak
    • Need access to full data for visualizations
Perseus Document Manager
  • Works with XML
    • Multiple granularities: sentence, section, chapter
    • Deals with overlapping doc hierarchies
    • Combines internal and external metadata
    • Our metadata is in RDF and can be expressed as XML
  • Since all data and metadata can be expressed as XML
    • Well suited to Federated DL Applications
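To make “multiple granularities” concrete, the sketch below pulls a chapter, a section or a single sentence out of one XML text using Python's standard `xml.etree.ElementTree`. It is not the Perseus Document Manager itself: the element names (`chapter`, `section`, `s`), the `n` attribute and the function name `fetch` are assumptions, and the sketch does not attempt the overlapping hierarchies mentioned on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: numbered chapters, sections and sentences.
SAMPLE = """<text>
  <chapter n="1">
    <section n="1">
      <s n="1">First sentence.</s>
      <s n="2">Second sentence.</s>
    </section>
    <section n="2">
      <s n="1">Third sentence.</s>
    </section>
  </chapter>
</text>"""


def fetch(root, chapter, section=None, sentence=None):
    """Return the text of a chapter, a section, or one sentence."""
    path = f"chapter[@n='{chapter}']"
    if section is not None:
        path += f"/section[@n='{section}']"
        if sentence is not None:
            path += f"/s[@n='{sentence}']"
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    # Join all text below the node and normalize whitespace.
    return " ".join("".join(node.itertext()).split())


root = ET.fromstring(SAMPLE)
print(fetch(root, 1, 1, 2))  # Second sentence.
print(fetch(root, 1, 2))     # Third sentence.
```

The same call serves a reader who wants one sentence and a service that wants a whole chapter, which is what a federated application would need from a shared XML store.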
Scalable DL
  • SGML/XML need translation for display
    • Can’t maintain stylesheets for millions of docs
  • Intelligent display of various DTDs
    • “Cheaply” acquires XML/SGML docs
    • Individual Custom Style sheets allowed
  • Integration of Geo-spatial Data
  • Multilingual support, feature extraction
  • Integrated multi-resolution image support
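The argument against maintaining a stylesheet per document can be illustrated with a single generic renderer: a small table maps common element names to display elements, anything unknown falls back to a neutral default, and a collection may still override individual entries. The table contents and the function name `render` are assumptions for this sketch, not the display logic of the Perseus system.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed defaults shared by many DTDs; unknown elements fall back to <span>.
DEFAULT_RULES = {
    "head": "h2",
    "title": "h2",
    "p": "p",
    "q": "blockquote",
    "quote": "blockquote",
}


def render(elem, rules=DEFAULT_RULES):
    """Render an XML element as HTML using a name -> display element table."""
    tag = rules.get(elem.tag, "span")
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render(child, rules))
        parts.append(escape(child.tail or ""))
    return f"<{tag}>{''.join(parts)}</{tag}>"


doc = ET.fromstring("<doc><head>London</head><p>A <name>Stow</name> survey.</p></doc>")

# Generic display with no document-specific stylesheet.
print(render(doc))

# A custom rule set layered over the defaults for one collection.
custom = {**DEFAULT_RULES, "name": "em"}
print(render(doc, custom))
```

A document in an unfamiliar DTD still displays in some reasonable form under the defaults, and custom rules are only written where a collection warrants the effort.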
Perseus Document Manager
  • Short term development:
    • Collecting new datasets to the Perseus DL
      • (leveraging Internet 2 investment)
    • Adding value: e.g.,
      • Sources for the History of Mechanics (Max Planck)
      • Duke Databank of Documentary Papyri
      • Books, maps etc. on the City of London
      • Shakespeare and Early modern English
Perseus Document Manager
  • Longer Term: Distribution of the System
  • How best to maintain and expand the system?
    • Open source?
    • Commercial Licensing?
    • Wait for third party to match PDM features?
Automatic Integration
  • Content Analysis: Various Languages
  • Time: extracting and visualizing dates
  • Space: Integrating historical Geographic Data
  • Names: establishing authority lists
    • Getty Thesaurus of Geographic Names
      • Names and Coordinates
    • Encyclopedias: e.g., Harpers, DNB
      • Names and Dates
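An authority list that pairs names with coordinates can be used to locate the places a passage mentions. The sketch below assumes a tiny hand-made gazetteer; its two entries and their coordinates are rough illustrative values, not data taken from the Getty Thesaurus of Geographic Names, and the function name `plot_sites` is hypothetical.

```python
import re

# Hypothetical authority list: name -> (latitude, longitude), rough values only.
GAZETTEER = {
    "Coleman Street": (51.516, -0.090),
    "Oxford Street": (51.515, -0.142),
}


def plot_sites(text, gazetteer=GAZETTEER):
    """Return (offset, name, lat, lon) for each gazetteer name found in text."""
    sites = []
    for name, (lat, lon) in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            sites.append((match.start(), name, lat, lon))
    # Sort by offset so sites come back in reading order.
    return sorted(sites)


passage = "From Oxford Street we walked to Coleman Street."
for offset, name, lat, lon in plot_sites(passage):
    print(offset, name, lat, lon)
```

Exact string matching like this leaves disambiguation untouched; a name shared by several places would need context to resolve.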
Our Research Agenda
  • Developing self-sustaining models
    • Publication of documents
    • Maintenance of software
  • Exploring Problem Sets in different domains
    • E.g., sparse data (antiquity) vs. rich (London)
  • Helping humanists rethink their position
    • Reaching new audiences
    • Changing habits
Technology Matters: e.g., 19th-Century Printing in England
  • 20th Century Radio/Film/TV: ambiguous
  • 19th Century Print Technology
    • 1810: c. 10,000 copies for a successful book
      • Audience for literature mainly upper class
    • 1850: hundreds of thousands
      • Audience vastly expands
      • Huge numbers read Dickens, etc.
  • 21st Century Network Technology?
The Future?
  • Two models:
    • Reproduce current world in new form
      • Narrow/expensive distribution
    • Think about how that world may change
      • Broader/inexpensive distribution
  • What happens now sets the stage for …
    • “talk show” cyber culture? or
    • a new dispersal of intellectual life?