Introduction to Digital Libraries

etracy

Presentation Transcript


  1. Introduction to Digital Libraries. Thanks to Michael L. Nelson Robert Allen

  2. The Ascent of Homo Nettus

  3. DLs vs archives vs repositories All deal with collections, objects, processes • DLs change over time • Archives are never touched • Unpublished works - grey literature • Repository - more general term • A mechanism for storing any information about the definition of a system at any point in its life-cycle. Repository services would typically be provided for extensibility, recovery, integrity, naming standards, and a wide variety of other management functions. • A repository is a network-accessible storage system in which digital objects may be stored for possible subsequent access or retrieval.

  4. SOAP Model for collection management • Selection • Organization • Access • Persistence

  5. What is a Library? • Main Entry: li·brary • Pronunciation: 'lI-"brer-E; British usually and US sometimes -br&r-E; US sometimes -brE, ÷-"ber-E • Function: noun • Inflected Form(s): plural -brar·ies • Etymology: Middle English, from Medieval Latin librarium, from Latin, neuter of librarius of books, from libr-, liber inner bark, rind, book • 1 a: a place in which literary, musical, artistic, or reference materials (as books, manuscripts, recordings, or films) are kept for use but not for sale b: a collection of such materials • 2 a: a collection resembling or suggesting a library <a library of computer programs> <wine library> b: MORGUE 2 • 3 a: a series of related books issued by a publisher b: a collection of publications on the same subject • 4: a collection of sequences of DNA and especially recombinant DNA that are maintained in a suitable cellular environment and that represent the genetic material of a particular organism or tissue • http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=library&x=0&y=0

  6. A Tool For Communicating With The Future… SCROLLS FROM THE DEAD SEA The Ancient Library of Qumran and Modern Scholarship http://www.ibiblio.org/expo/deadsea.scrolls.exhibit/intro.html

  7. A History of Libraries in 1 Slide • Lyceum - Ancient Greece • http://en.wikipedia.org/wiki/Lyceum • Alexandria - Ancient Egypt • http://en.wikipedia.org/wiki/Library_of_Alexandria • (…skipping a bit…) • Boston Public Library - First US public lending library (1848) • http://en.wikipedia.org/wiki/Boston_Public_Library • http://www.bpl.org/ • “The commonwealth requires the education of the people as the safeguard of order and liberty” more info: http://www.dlib.org/dlib/january00/01levy.html

  8. “Lone Scientist” Stereotypes Max Munk http://history.nasa.gov/SP-4103/ch4.htm H. J. E. Reid http://history.nasa.gov/SP-4103/ch4.htm Enrico Fermi http://www.anl.gov/Media_Center/logos20-1/fermi01.htm John Stack http://www.hq.nasa.gov/office/pao/History/x1/stack.html Albert Einstein http://www.artnet.com/artist/92724/Vishniac_Roman.htm

  9. Vannevar Bush (1890-1974) • Director of the Office of Scientific Research and Development • led 6000 scientists in R&D for WWII • previously, science lacked large scale teams • also director of NACA (1939)! • Predicted many technological advances • the “memex” is one whose spirit we are implementing • the purpose was to provide scientists the capability to exchange information; to have access to the totality of recorded information image from: http://www.ibiblio.org/pioneers/bush.html

  10. Memex • Integrated computer, keyboard, and desk • “mechanized private file and library” • remove drudgery from information retrieval • suggested implementation was microfilm • various user operations are suggested • Associative indexing was the main purpose • “the process of tying two items together is the important thing” • prelude to hypertext... Image from: http://www.dynamicdiagrams.com/case_studies/mit_memex.html

  11. Memex • Information could come pre-associatively indexed, but the key point was user customization • WWW still does not provide that today • Bush observes that tools change our way of doing, and expand the horizons before us • full impact of WWW and DLs still not known • Interesting: Bush’s Atlantic Monthly article did not predict free-text searching... • knowledge trails only; DMOZ without keyword searching

  12. digital preservation research www.digitalpreservation.gov “digital information lasts forever -- or 5 years, whichever comes first” -- Jeff Rothenberg from Lesk, http://www.lesk.com/mlesk/

  13. from Lesk, http://www.lesk.com/mlesk/

  14. What is a Digital Library (DL)? • “…a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network” (Arms) • there are any number of alternate definitions, but this seems fair enough • no mention of architecture, implementation, content, etc.

  15. How is a DL different from a database? • A traditional SQL database has as its basic element data items in a relation: SELECT name FROM employee, project WHERE employee.deptnumber = '25' AND project.number = '100' • databases exploit known structures and relations • DBMS retrieval is not probabilistic (Frakes, Baeza-Yates)
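
To make the contrast on this slide concrete, here is a minimal sketch that runs the slide's query against an in-memory SQLite database and then ranks a few free-text documents by a simple term-count score. The table layouts, sample rows, sample documents, and the `rank` function are all hypothetical, invented for illustration. The scoring is only a crude stand-in for the ranked retrieval that information-retrieval systems perform, not a specific published model.

```python
import sqlite3
from collections import Counter


def exact_query():
    """Run the slide's query: a row either satisfies the predicate or it does not."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema and rows; the slide only names the tables and two columns.
    con.executescript("""
        CREATE TABLE employee (name TEXT, deptnumber TEXT);
        CREATE TABLE project (number TEXT, title TEXT);
        INSERT INTO employee VALUES ('Ada', '25'), ('Grace', '25'), ('Alan', '30');
        INSERT INTO project VALUES ('100', 'Memex'), ('200', 'Archive');
    """)
    rows = con.execute(
        "SELECT name FROM employee, project "
        "WHERE employee.deptnumber = '25' AND project.number = '100'"
    ).fetchall()
    con.close()
    return sorted(name for (name,) in rows)


def rank(query, docs):
    """Order documents by how often the query terms occur in them, best first."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical free-text "holdings".
DOCS = {
    "d1": "digital library services for digital collections",
    "d2": "a card catalog points to a physical library",
    "d3": "wolf spiders do not build webs",
}

if __name__ == "__main__":
    print(exact_query())
    print(rank("digital library", DOCS))
```

The database answer is a set: every returned row matches the predicate exactly. The ranked answer is an ordering by estimated relevance, in which a partially matching document can still appear, only lower down.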

  16. How is a DL different from the WWW? • The keyword is “managed” • The WWW is not managed • Some meta searchers (Yahoo, Lycos, DMOZ) attempt to add an organizational framework to their web holdings • However, most are focused on keyword searching (e.g., Google)

  17. A Garden vs. Desert Scrub http://www.gingerbread-mansion.com/ourgarden.html http://www.filmdeserts.com/Open%20Desert%20Areas_1.htm

  18. How is a DL different from the WWW? • Another key difference is who controls the input into the system • most web searchers hunt down their holdings • Lycos is short for Lycosidae lycosa (the “wolf spider”), which pursues its prey and does not build a web (Mauldin, IEEE Expert, 1/97) • some (DMOZ, Yahoo) have humans in the loop for review and classification • DLs are generally more tightly controlled, and have a targeted customer set

  19. DL = Content + Services • “Why not just use the WWW?” • WWW by itself has low archival & management characteristics • “Why not use an RDBMS?” • In the same way that a card catalog is not a TL, an RDBMS is not a DL; it is a candidate technology for use in DLs • DL is the union of the content and services defined on the content

  20. The Study of Digital Libraries is Multidisciplinary • computer science: tools, protocols, transport, indexing, ontology, databases • information science: information access and storage, services • human factors: usability, adaptability • law: rights management, open access • economics: business models, services

  21. How is a DL Different from a Traditional Library? • TL has as its focus physical objects • even if the card catalog (metadata) is electronic, the purpose is to point you to a physical location • trafficking in physical objects has both obvious and subtle implications • object can exist only in 1 place • if you have it, I can’t have it (zero-sum distribution) • I have to go to the object, or wait for it to come to me

  22. TLs vs. DLs • DLs clearly better than TLs at: • Dissemination, storing information variety • TL objects are more survivable • Who will archive the research information? • the publishers? • the institutions? • the authors? • Will the average DL object still be accessible in 10 years? • Internet Archive • image from: http://www.ancientegypt.co.uk/writing/rosetta.html

  23. How is a DL Different from a Traditional Library? • Digital Library • removing the physical restriction has obvious benefits • multiple access, multiple listings, electronic transmission • also complicates many other issues... • intellectual property, terms and conditions, etc. • Note that a TL offers additional social and educational benefits • Most TLs offer hybrid services too.

  24. from Lesk, http://www.lesk.com/mlesk/

  25. TLs vs. DLs • Where does publishing stop, and libraries begin? • there have always been tensions between TLs and traditional publishers, but the roles were fairly well defined • DLs can muddle the separation of these responsibilities • result: conflict, and/or new models

  26. Traditional Players • book store • publisher • service • library • archive • (diagram label: responsibility over time)

  27. What is Scientific and Technical Information (STI)? • STI is the collection of materials, independent of format, used in research, development, and other technical activities • Papers, reports, data sets, images, videos, software, etc. • It is also the output of such R&D activities • STI includes both white and grey literature

  28. White and Grey Literature • The line between the two is not always clear • GreyNet offers an admittedly obsolete definition of grey literature: • “Information produced on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishing” • http://www.greynet.org/ • CiteSeer indexes the grey literature and counts citations

  29. White and Grey Literature • Intuitively: • White: author and publisher are often different, the work has been independently reviewed, how to obtain the work is straightforward • Grey: may not be reviewed, often “published” from the source origin, may be difficult to obtain

  30. Literature Examples • White • Journals, books, edited conference proceedings, etc. • Grey • technical reports, government reports, unedited proceedings, non-document STI, etc. • others?

  31. So Why Worry About Grey Literature? • White is generally perceived as having a higher pedigree, easier to obtain (in a sense), etc. • it is generally less timely • and is often a summary or abstract of a larger body of work • (figure: Pyramid of STI)

  32. History of STI Distribution • Originally, scientists published books to document their findings • but the delay was terribly long • Then, scientists exchanged personal letters among themselves for rapidity • but this is point-to-point communication, not broadcast

  33. History of STI Distribution • The current system of journals evolved in the 17th century as the synthesis of both previous models • more timely than books, more available than letters • in fact, some journals with the emphasis on “speed” still have “Letters” in their title • historical information from (Odlyzko, 1995)

  34. But Are Journals Still Relevant? • People still publish in them (tenure and promotions are still largely “count the journal publications” exercises) • But do people read them? • The current use of journals is now: • “a medium for priority claiming, quality control, and archiving scientific work” (Bennion, 1994)

  35. Unavailable, or Not Worth Citing? M. Lesk

  36. But Are Journals Still Relevant? • How important is refereeing anyway? • Most rejected papers end up published somewhere else (Lesk) • Referees have rejected many worthy papers, including some that are the most cited in their respective journals (Campanario, 1996)

  37. But Are Journals Still Relevant? • Different disciplines have adapted: • physics - “the small amount of filtering provided by refereed journals plays no effective role in our research” (Ginsparg, 1994) • math - “it is rare for experts in any mathematical subject to learn of a major new development in their area through a journal publication” (Odlyzko, 1995)

  38. But Are Journals Still Relevant? • computer science - • “in his area, journals have become irrelevant” (Odlyzko, quoting Rob Pike) • “if it did not happen at a conference, it didn’t happen” (Odlyzko, quoting Joan Feigenbaum) • “if I read it in a journal, I’m not in the loop” (Grycz, 1992)

  39. Solutions by Discipline • Physics • pre-prints • arXiv • Mathematics • pre-prints • Computer Science • technical reports, conference proceedings • CiteSeer • Chemistry • still mainly journals, but review is cursory (Quinn, 1995) • Economics • working papers

  40. Journal System - Economic Problems • 20,000 primary research journals (Bennion, 1994) • the number of scientific papers published annually doubles every 10-15 years (Price, 1956) • STI does not enjoy economies of scale • intended audiences are generally static; the content becomes more specialized (Odlyzko, 1995)
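
The doubling-time figure on this slide can be restated as a yearly rate. Assuming constant exponential growth (an assumption the slide does not state), a quantity that doubles every T years grows by 2^(1/T) - 1 per year, which is roughly 7.2% for T = 10 and 4.7% for T = 15. The function names in this short sketch are illustrative only.

```python
def annual_growth_rate(doubling_years):
    """Constant yearly growth rate that doubles a quantity in `doubling_years` years."""
    return 2 ** (1 / doubling_years) - 1


def growth_factor(years, doubling_years):
    """How many times larger the quantity is after `years` years."""
    return 2 ** (years / doubling_years)


if __name__ == "__main__":
    for t in (10, 15):
        print(f"doubling every {t} years = {annual_growth_rate(t):.1%} per year")
    # Three doublings in 30 years at T = 10 gives an eightfold increase.
    print(growth_factor(30, 10))
```

Against a library acquisition budget that is flat or shrinking, even the slower of these two rates means the share of the literature a library can afford falls every year.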

  41. Journal System - Economic Problems • Because of the academic pressures, journals tend to stay the same size, but the number of titles goes up (Quandt, 1996) • The acquisition budget of a library is constant (or decreasing), so it must be more selective in which titles it provides • If libraries cancel subscriptions, the cost to the remainder of the subscribers goes up

  42. Journal System - Economic Problems • The rising cost causes other libraries to cancel subscriptions, causing the price to go up further... • Journals driving themselves out of business is a well-studied problem - contact me for more information • Odlyzko estimates that American universities spend as much buying mathematics journals as the NSF spends doing mathematical research
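
The cancellation feedback loop described on the last two slides can be sketched as a toy simulation. The model is an assumption made for illustration, not something taken from the slides: the journal has a fixed yearly production cost split evenly among its subscribers, and each library cancels once the price exceeds its own limit. All numbers are hypothetical.

```python
def cancellation_spiral(fixed_cost, price_limits):
    """Return (subscribers, price) for each round until no further library cancels.

    fixed_cost: yearly cost of producing the journal, split evenly among subscribers.
    price_limits: the highest price each subscribing library is willing to pay.
    """
    subscribers = sorted(price_limits)
    history = []
    while subscribers:
        price = fixed_cost / len(subscribers)
        history.append((len(subscribers), price))
        # Libraries whose limit is below the current price cancel.
        remaining = [limit for limit in subscribers if limit >= price]
        if len(remaining) == len(subscribers):
            break  # nobody cancelled, so the price is stable
        subscribers = remaining
    return history


if __name__ == "__main__":
    # Hypothetical journal: cost 600 per year, five libraries with different limits.
    for count, price in cancellation_spiral(600, [100, 120, 140, 160, 180]):
        print(f"{count} subscribers at {price:.2f} each")
```

With these made-up numbers the price climbs from 120 to 150 to 300 as subscribers drop from five to four to two, after which the last two libraries also cancel. If every library could afford the starting price, the loop would stop after one round.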

  43. DL Economic Drivers Google Scholar? M. Lesk

  44. original data from the ALA; slide from http://lib-www.lanl.gov/~herbertv/presentations/vala-2004-hvds.pdf

  45. Journal System - Economic Problems • Chemical Abstracts (Lesk) • begun in 1950s, used to cost “dozens of dollars per year, and individual chemists subscribed” • today, it costs $17,400 / year. • Okerson & Stubbs, 1992 • university book purchases down 15% 1986-1991 • journals/faculty 14 -> 12 in same period • by year 2017, libraries would buy nothing at all!

  46. from Lesk, http://community.bellcore.com/lesk/columbia/session1/ figure 9.2 in text

  47. Journal System - Coverage Problems • But journals only cover a fraction of available STI • approximately 100K domestic, unrestricted STI technical reports (grey literature) produced annually (Esler & Nelson, 1998) • Print journals, by definition, cannot provide access to non-report STI • software, datasets, etc.

  48. Electronic Journals? • An experiment that most scholars agree is good, is the eventual path, and is a great idea for everyone else’s papers... • until tenure is given based on publications in electronic journals, they will not be fully accepted

  49. Many DL Projects Are “Journal Centric” • Many DL projects (JSTOR, TULIP, etc.) are focused on automating the traditional journal methods • this is acceptable for archiving past issues, but seems unsatisfying for future STI

  50. Prediction for Journals M. Nelson • Highly specialized titles will go completely electronic, driven by the rising cost and static readership • economics and academic acceptance will determine when this happens • “Popular” titles with broader appeal will exist in a hybrid format, with both paper and electronic versions • “subscribers” are likely to receive the value added material (soft copy, additional materials, etc.)
