Developing a Digital Library for the Humanities


  • Gregory Crane

  • Winnick Family Chair in Technology and Entrepreneurship

  • Professor of Classics

  • Director, Perseus Digital Library Project

  • Http://

Perseus Digital Library

  • Ongoing areas of development

  • 1987: DL on Classical Greek Culture

  • 1993: History of Science

  • 1996: Began work on Latin and Rome

  • 1997: Early Modern English

  • 1999: History and Topography of London

  • 2000: Ancient Egyptian Giza

  • 2000: Slavery and the US Civil War

Partner Institutions

  • Max Planck Institute for the History of Science (Berlin)

  • Museum of Fine Arts, Boston

  • Stoa Publishing Consortium

  • New Variorum Shakespeare Series, Modern Language Association

  • Special Collections at Tufts, Brandeis, the University of Pennsylvania

Ongoing Support

  • National Endowment for the Humanities (DLI2, Preservation & Access, Education)

  • National Science Foundation (DLI2)

  • Fund for the Improvement of Postsecondary Education, U.S. Dept. of Education

  • Max Planck Society

The Whole Greater than the Sum

  • Tufts Health Sciences Database:

  • An online medical school curriculum

    • First iteration: 70% of the value

    • Second iteration: 90%

    • Third iteration: 130%

  • “Data” and “system” interact in increasingly dynamic ways.

Persistent Value over Time & Space

  • How many ages hence / Shall this our lofty scene be acted over / In states unborn and accents yet unknown?

    • Cassius in Julius Caesar

  • How do we structure data for

    • Contemporary users we can’t directly anticipate?

    • Systems not yet designed?

Radically New Documents

  • Reconstructions of Historical Spaces, e.g.

    • UVA’s Crystal Palace (London)

    • UCLA’s Rome and VR Lab

  • Integrating Virtual Spaces with Sources

    • Museum of Fine Arts, Tombs at Giza

    • Greek Sculpture

    • The Streets of 19th Century London

Traditional Docs Rethought

  • Concordance: “Obsolete”

  • Bibliographies → databases

  • Encyclopedias → automatic linking

  • Lexica and lexicography:

    • Automatically discovered semantic relations

    • THEN lexicographic work
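Calling a printed concordance "obsolete" rests on the fact that one can be generated on demand from the full text. The sketch below is a minimal keyword-in-context (KWIC) generator in Python. It assumes plain-text input and whole-word, case-insensitive matching. The function name `kwic` and its `width` parameter are illustrative only and do not come from any Perseus software.

```python
import re


def kwic(text: str, keyword: str, width: int = 20) -> list[str]:
    """Return one keyword-in-context line per whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword falls in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = "Friends, Romans, countrymen, lend me your ears; I come to bury Caesar, not to praise him."
for line in kwic(sample, "caesar"):
    print(line)
```

Because the concordance is computed at request time, it always reflects the current text and can use any context width.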

Development in Two Parts

  • Ultimate end: Radically new docs?

  • Short term: Electronic Incunabula

    • New Variorum Shakespeare

    • Electronic Marlowe

    • Tallis Street Maps

  • FIRST we thoroughly analyze what we have

  • THEN radical redesign emerges

Technology Outruns Practice

  • The 3D Reconstruction/Virtual Space

    • Cutting edge technology

    • Still nascent scholarly practices

  • Mature Document Structures

    • Textual Notes: 1908 Richard III

    • Traditional Text Citations: 1887 Commentary

The More Things Stay the Same...

  • “Content” can remain unchanged

  • “Presentation” is dynamic and flexible

    • The Dictionary knows what you are reading

    • Citations → bidirectional links

    • Automatic Linking by keyword

    • Text and Atlas: Plot sites in a document
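Turning one-way citations into bidirectional links amounts to inverting the citation graph, so that each cited passage also knows what cites it. The sketch below assumes each document's outgoing citations are already available as canonical identifiers. The document names and passage identifiers are hypothetical examples.

```python
from collections import defaultdict


def invert_citations(cites: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each cited passage to the sorted list of documents that cite it."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for source, targets in cites.items():
        for target in targets:
            # A set ignores repeated citations of the same passage by one source.
            cited_by[target].add(source)
    return {target: sorted(sources) for target, sources in cited_by.items()}


# Hypothetical data: two commentaries citing passages of a primary text.
cites = {
    "commentary-A": ["Thuc. 1.22", "Thuc. 2.40", "Thuc. 1.22"],
    "commentary-B": ["Thuc. 1.22"],
}
print(invert_citations(cites))
```

The stored content (each commentary's citations) is unchanged. The reverse links are part of the presentation layer and can be recomputed whenever new documents arrive.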

Current Paradigm: DL Diplomacy

  • Monolithic Systems (e.g., Perseus!)

    • One way to view each document

  • Intercommunication via metadata

    • DL as metadata for “opaque” objects

  • Major Problems

    • Renting access, rather than collecting content

    • All publications become ephemera

Three Strategies

  • 1) The Editing Problem:

    • How do real authors create structured docs?

  • 2) Developing Radically New Docs:

    • Archimedes DL on Mechanics

    • MFA Excavations at Giza

  • 3) Radical Repurposing of Print

    • Bolles Collection on London

Bolles Collection at Tufts

  • Documenting the history and topography of London and its environs

    • 35 “full-size” maps

    • 320 more specialized maps

    • 400 books (284 linear feet of shelf space)

    • 1,000 pamphlets

    • “Paper Hypertexts”

      • 10,000+ “extra illustrations”

Bolles Electronic Archive

  • A Testbed for the Perseus Digital Library

  • “Level 5” TEI Encoded Full Text

    • Quotes, languages, proper names, dates, money

  • High-end OCR and Double Keyboarding

    • OCR ideal for some texts but not all

    • Keyboarding much the best, money permitting
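Double keyboarding works because two independent typists rarely make the same mistake in the same place. Disagreements between their transcriptions therefore flag likely errors for a proofreader. The sketch below shows the comparison step using Python's standard `difflib`, assuming whitespace-separated word tokens. It illustrates the general technique and is not the Bolles project's actual procedure.

```python
import difflib


def keying_discrepancies(a: str, b: str) -> list[tuple[str, str, str]]:
    """Compare two transcriptions word by word; return (op, text_a, text_b) for each disagreement."""
    words_a, words_b = a.split(), b.split()
    # autojunk=False stops frequent words such as "the" being treated as junk on long texts.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (op, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


first = "the streets of London were narrow and crooked"
second = "the streets of Londou were narrow and crooked"
print(keying_discrepancies(first, second))
```

Only the flagged spans need human attention. This is where double keying gains its accuracy over a single OCR pass.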

Bolles: Initial Texts

  • Five million words now in Level 5 TEI

    • Will exceed 10 million by year’s end

  • Surveys of London History and Topography

    • Stow, Maitland, Wilkinson, Allen, Thornbury

  • Commentary on social conditions

    • Mayhew, Archer, Hollingshead, Booth

  • Literary works with London as backdrop

    • Defoe, Dickens, “Sherlock Holmes”

Images

  • 10,000 Grayscale Images

    • Mainly engravings of people and places

    • “Opportunistic” metadata (i.e., captions and context)

  • 2,400 Contemporary Images

    • Well catalogued and geo-referenced

  • QTVR Panoramas

  • 70 Tallis Map “Elevations”

Geospatial Data

  • Bartholomew 1:5,000 dataset for London

    • Modern data as reference and interchange

  • Historical maps georeferenced to the Bartholomew data

    • 10 so far (c. 2 hours each)

    • Urban maps do not easily “line up”

    • How to create an historical GIS?

  • GPS Waypoints

    • As of May 2000, accurate to within 10 m or better
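Georeferencing a scanned map usually means picking control points whose positions are known on both the scan and the reference data, then fitting a transform between the two. The sketch below assumes a single affine transform fitted by least squares. The actual workflow and the coordinate system of the Bartholomew data are not specified here, and the control-point values are made up.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m @ x = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear or otherwise degenerate")
    solution = []
    for col in range(3):
        # Replace one column of m with v and take the ratio of determinants.
        mc = [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(pixels, world):
    """Least-squares fit of world = (a*px + b*py + c, d*px + e*py + f)."""
    if len(pixels) < 3 or len(pixels) != len(world):
        raise ValueError("need at least three matching control points")
    # Accumulate the normal equations (A^T A) p = A^T y, with rows of A = [px, py, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (px, py), (wx, wy) in zip(pixels, world):
        row = (px, py, 1.0)
        for i in range(3):
            atx[i] += row[i] * wx
            aty[i] += row[i] * wy
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atx), _solve3(ata, aty)


def apply_affine(coeffs, px, py):
    """Map a pixel position on the scan to reference coordinates."""
    (a, b, c), (d, e, f) = coeffs
    return a * px + b * py + c, d * px + e * py + f


# Made-up control points: pixel positions on a scan and matching reference coordinates.
pixels = [(0, 0), (100, 0), (0, 100), (100, 100)]
world = [(2 * x + 0.5 * y + 100, -0.3 * x + 1.5 * y + 50) for x, y in pixels]
coeffs = fit_affine(pixels, world)
print(apply_affine(coeffs, 50, 50))  # approximately (225.0, 110.0)
```

A single affine fit cannot absorb the local distortions of hand-surveyed urban maps, which is one reason such maps do not easily line up. Piecewise or higher-order transforms are common alternatives.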

Feature Extraction

  • Easy identification: Dates, Money

  • Known Keywords and Classes

    • The Getty TGN (1 million places, with latitudes/longitudes)

    • The Bartholomew Gazetteer (10,000)

    • Indices to maps (e.g., Cruchley 1826, 4,200)

    • The Index/Abstract of the DNB (30,000+)

  • Clean-up with rule-based proper-name classification: “Mr NAME”; “NAME Street”
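The two-stage approach above can be sketched with regular expressions: first a gazetteer lookup for known names, then rules to clean up what remains. The patterns, the tiny gazetteer and the pounds/shillings/pence money format are illustrative assumptions. A production extractor would be far more thorough.

```python
import re

YEAR = re.compile(r"\b1[5-9]\d\d\b")  # four-digit years 1500-1999
MONEY = re.compile(r"£\s?\d+(?:\s\d+s\.)?(?:\s\d+d\.)?")  # "£5", "£2 10s.", "£2 10s. 6d."
PERSON = re.compile(r"\b(?:Mr|Mrs|Dr|Sir)\.?\s+([A-Z][a-z]+)")  # rule: "Mr NAME"
STREET = re.compile(r"\b([A-Z][a-z]+) Street\b")  # rule: "NAME Street"


def extract(text: str, gazetteer: set[str]) -> dict[str, list[str]]:
    """Return dates, money, gazetteer places, and rule-classified names found in text."""
    places = [
        name for name in sorted(gazetteer)
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    ]
    return {
        "years": YEAR.findall(text),
        "money": MONEY.findall(text),
        "places": places,
        "persons": PERSON.findall(text),
        "streets": STREET.findall(text),
    }


sample = "In 1851 Mr Mayhew walked down Coleman Street and paid £2 10s. for a map of Cheapside."
print(extract(sample, {"Cheapside", "Smithfield"}))
```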

“Runtime” Links

  • Runtime links supplement in-file tagging

  • 1) Where metadata is less precise

    • Metadata from unedited headers and captions

  • 2) Where the source does not contain data

    • If no dates, then scan for them

  • Use tagging for “high confidence” data

    • Ideal situation: automated tags, hand-proofed
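This slide's policy is to trust in-file tags where they exist and otherwise compute links at runtime. The sketch below illustrates that policy for dates using Python's `xml.etree.ElementTree`. It assumes an un-namespaced `date` element with a `when` attribute, modelled loosely on TEI (real TEI documents use a namespace). It also uses a crude year pattern for the fallback scan.

```python
import re
import xml.etree.ElementTree as ET

YEAR = re.compile(r"\b1[5-9]\d\d\b")  # crude fallback: four-digit years 1500-1999


def dates_for(xml_text: str) -> tuple[str, list[str]]:
    """Prefer in-file <date when="..."> tags; otherwise scan the text at runtime."""
    root = ET.fromstring(xml_text)
    tagged = [d.get("when") for d in root.iter("date") if d.get("when")]
    if tagged:
        return "tagged", tagged  # high-confidence data stored in the file
    text = "".join(root.itertext())
    return "runtime", YEAR.findall(text)  # lower-confidence, computed on demand


print(dates_for('<p>Rebuilt <date when="1667">the year after the fire</date>.</p>'))
print(dates_for("<p>Rebuilt in 1667 and altered again in 1720.</p>"))
```

Returning the source label lets an interface display runtime links differently from proofed tags.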

Strategic Questions

  • “Editions” are a foundation for scholarship

  • Where does the editor’s job start?

  • How does the editor’s job change?

  • How do we define “Corpus Editors”?

    • People with domain expertise in content

    • Expertise in software and library systems

      • Need for scholarly automated processing

Delivering Integrated Data

  • “Good” and “rough” maps for Cicero’s Letters

  • A search for “Coleman” delivers quite useful results

  • Map locates Coleman Street

  • Streets in description of “Portsoken Ward”

  • Historical Views of this section of London

  • Timeline 1: A Linear History

  • Timeline 2: “Encyclopedic Scatter”

Further Work

  • Disambiguation, auto-cataloguing, time/space

  • VR Interface: Tallis 1, 2 and Headset

  • New challenging document types

  • Geospatial data in Patterson’s Journeys

  • Urban data in Booth and City Directories

    • Tallis Map for Oxford Street with overall and more focused directories

Research Projects

  • Robert Jacob and VR Interfaces

    • Figure: Tallis VR Conversion 1

    • Figure: Tallis VR Conversion 2

    • Figure: Head-mounted VR navigation

  • Holly Taylor and Cognitive Analysis

    • Spatial Cognition

    • Text Comprehension

Conclusions

  • Baseline Knowledge Environment

    • Practical and useful

  • “Corpus Editions”

  • Midway between editions and library digitization

  • Requires a new configuration of skills

  • The “diplomatic” federated DL model is weak

    • Need access to full data for visualizations

Perseus Document Manager

  • Works with XML

    • Multiple granularities: sentence, section, chapter

    • Deals with overlapping doc hierarchies

    • Combines internal and external metadata

    • Our metadata is in RDF and can be expressed as XML

  • Since all data and metadata can be expressed as XML

    • Well suited to Federated DL Applications
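Serving a document at several granularities can be sketched as walking its nested divisions and cutting at the requested level. The sketch below assumes a single, properly nested hierarchy of `div` elements with TEI-style `type` and `n` attributes. Truly overlapping hierarchies (for example, pages cutting across sections) need extra machinery such as milestone elements, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

DOC = """<text>
  <div type="chapter" n="1">
    <div type="section" n="1"><p>First section.</p></div>
    <div type="section" n="2"><p>Second section.</p></div>
  </div>
  <div type="chapter" n="2">
    <div type="section" n="1"><p>Third section.</p></div>
  </div>
</text>"""


def chunks(elem, unit, prefix=()):
    """Return (citation, text) pairs for every div of the requested type."""
    out = []
    for child in elem:
        if child.tag != "div":
            continue
        path = prefix + (child.get("n", "?"),)
        if child.get("type") == unit:
            # Collapse the whitespace left over from indentation in the XML source.
            text = " ".join(" ".join(child.itertext()).split())
            out.append((".".join(path), text))
        else:
            out.extend(chunks(child, unit, path))
    return out


root = ET.fromstring(DOC)
print(chunks(root, "chapter"))
print(chunks(root, "section"))
```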

Scalable DL

  • SGML/XML need translation for display

    • Can’t maintain stylesheets for millions of docs

  • Intelligent display of various DTDs

    • “Cheaply” acquires XML/SGML docs

    • Individual custom stylesheets allowed

  • Integration of Geo-spatial Data

  • Multilingual support, feature extraction

  • Integrated multi-resolution image support
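One way to avoid maintaining a stylesheet per document is to combine three things: a shared mapping from common element names to display tags, per-document overrides, and a safe fallback for anything unrecognized. The sketch below works under those assumptions. Its rule table is hypothetical and much smaller than a real one would be.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical shared rules: element names from several DTDs mapped to HTML tags.
DEFAULT_RULES = {"head": "h2", "title": "h2", "p": "p", "para": "p", "emph": "em", "hi": "em"}


def render(elem, overrides=None):
    """Render an XML element as HTML using shared rules plus optional per-document overrides."""
    rules = {**DEFAULT_RULES, **(overrides or {})}
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child, overrides)
        inner += escape(child.tail or "")
    tag = rules.get(elem.tag)
    # Unrecognized elements are not dropped: their content is rendered without a wrapper.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


doc = ET.fromstring("<doc><head>Stow</head><para>A <emph>Survey</emph> of London</para></doc>")
print(render(doc))
print(render(doc, {"emph": "strong"}))
```

New documents in an unfamiliar DTD still display legibly at no extra cost. A custom override is needed only where the default looks wrong.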

Perseus Document Manager

  • Short term development:

    • Collecting new datasets for the Perseus DL

      • (leveraging Internet2 investment)

    • Adding value: e.g.,

      • Sources for the History of Mechanics (Max Planck)

      • Duke Databank of Documentary Papyri

      • Books, maps etc. on the City of London

      • Shakespeare and Early modern English

Perseus Document Manager

  • Longer Term: Distribution of the System

  • How best to maintain and expand the system?

    • Open source?

    • Commercial Licensing?

    • Wait for a third party to match PDM features?

Automatic Integration

  • Content Analysis: Various Languages

  • Time: extracting and visualizing dates

  • Space: Integrating historical Geographic Data

  • Names: establishing authority lists

    • Getty Thesaurus of Geographic Names

      • Names and Coordinates

    • Encyclopedias: e.g., Harper’s, DNB

      • Names and Dates
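Establishing an authority list from several sources involves, at minimum, matching name variants across them. The sketch below merges a gazetteer (names and coordinates) with an encyclopedia index (names and dates) on a normalized key. The entries are illustrative and are not taken from the Getty TGN or the DNB. Normalization alone cannot separate different entities that share a name, which is why disambiguation remains a separate problem.

```python
def normalize(name: str) -> str:
    """Crude matching key: lowercase, drop full stops, collapse whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def build_authority(gazetteer: dict, encyclopedia: dict) -> dict[str, dict]:
    """Merge name->coordinates and name->dates sources into one list keyed by normalized name."""
    authority: dict[str, dict] = {}
    for name, coords in gazetteer.items():
        authority.setdefault(normalize(name), {"name": name})["coords"] = coords
    for name, dates in encyclopedia.items():
        authority.setdefault(normalize(name), {"name": name})["dates"] = dates
    return authority


# Illustrative entries only; spelling variants are deliberate.
gazetteer = {"St. Paul's Cathedral": (51.5138, -0.0984), "Cheapside": (51.514, -0.093)}
encyclopedia = {"St Paul's  Cathedral": "1675-1710"}
print(build_authority(gazetteer, encyclopedia))
```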

Our Research Agenda

  • Developing self-sustaining models

    • Publication of documents

    • Maintenance of software

  • Exploring Problem Sets in different domains

    • E.g., sparse data (antiquity) vs. rich (London)

  • Helping humanists rethink their position

    • Reaching new audiences

    • Changing habits

Technology Matters: e.g., 19th-Century Printing in England

  • 20th Century Radio/Film/TV: ambiguous

  • 19th Century Print Technology

    • 1810: c. 10,000 copies for a successful book

      • Audience for literature mainly upper class

    • 1850: hundreds of thousands of copies

      • Audience vastly expands

      • Huge numbers read Dickens, etc.

  • 21st Century Network Technology?

The Future?

  • Two models:

    • Reproduce current world in new form

      • Narrow/expensive distribution

    • Think about how that world may change

      • Broader/inexpensive distribution

  • What happens now sets the stage for …

    • “talk show” cyber culture? or

    • a new dispersal of intellectual life?