e-Science, Information Infrastructure, and Digital Libraries

e-Science, Information Infrastructure, and Digital Libraries. Libraries in the Digital Age, Dubrovnik, 30 May 2005. Christine L. Borgman, Professor & Presidential Chair in Information Studies, University of California, Los Angeles & Oxford Internet Institute (2004-05).

Presentation Transcript


  1. e-Science, Information Infrastructure, and Digital Libraries Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05)

  2. Outline of talk • What are we building, for whom, and why? • e-Science / e-Social Science / Cyberinfrastructure • Digital libraries • Case studies • Alexandria Digital Earth Prototype (ADEPT) • Center for Embedded Networked Sensing (CENS) • Summary of issues

  3. Goals of Cyberinfrastructure • Cyberinfrastructure (U.S. initiative)* • “technology … now make(s) possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.” • “NSF must ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad, long-term access by scientists everywhere.” *Atkins, D., Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon panel on Cyberinfrastructure. 2003, January. Executive summary. http://www.communitytechnology.org/nsf_ci_report/

  4. Goals of e-Science • e-Science (U.K. initiative) • “e-Science: large scale science … carried out through distributed global collaborations enabled by the Internet.” * • “collaborative scientific enterprises … require access to very large data collections, very large scale computing resources and high performance visualisation” * • “e-Science data generated from sensors, satellites, high-performance computer simulations, high-throughput devices, scientific images … will soon dwarf all of the scientific data collected in the whole history of scientific exploration.” ** *About the UK e-Science Programme. http://www.rcuk.ac.uk/escience/ **Hey & Trefethen, 2003, The data deluge: an E-science perspective.

  5. Digital Libraries • Systems that support searching, use, creation of content • Institutions with people, digital collections, and services • Repositories of digital data and documents, as a component of cyberinfrastructure, e-research, e-science, e-social science, e-learning… • Institutional repositories • Open archives • Data collections

  6. Digital libraries and e-Science: Architecture • JISC/NSF working group on digital libraries and e-science / cyberinfrastructure • Layered model of cyberinfrastructure • Venn diagram model Images courtesy of Norman Wiseman, JISC

  7. Cyberinfrastructure: An Evolving View • Applications Space • Content • Digital Libraries • User Interfaces & Tools • Scientific DBs • Information & knowledge layer • Middleware services layer • ITC Infrastructure • Processors, memory, network

  8. The Focus for DL Infrastructure: Version 2 • E-Science or E-Research • E-Learning • E-resources: common to all, need demonstrators of how it will work • Hybrid Libraries

  9. Digital libraries and e-Science: Content • Scholarly communication: Secondary resources • Conference proceedings • Journal articles • Working papers, manuscripts, pre-prints • Books • Research data: Primary sources • Raw data from research • Instrumented data collection (labs, sensor networks) • Field notes • Archival sources • Unique documents • Records of individuals and organizations • Value chain: linking data, documents, and related sources E-bank image courtesy of Jeremy Frey and Liz Lyon; Crystallography image courtesy of Tony Hey

  10. eBank Project • Diagram labels: Virtual Learning Environment; Reprints; Peer-Reviewed Journal & Conference Papers; Technical Reports; Local Web; Preprints & Metadata; Institutional Archive; Publisher Holdings; Certified Experimental Results & Analyses; Data, Metadata & Ontologies; Undergraduate Students; Digital Library; Graduate Students; E-Scientists; Grid; E-Experimentation • Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning

  11. Crystallographic e-Prints • Direct access to raw data from scientific papers • Raw data sets can be very large; they are stored at the National Datastore using an SRB server

  12. Digital libraries and e-Science: Curation • Curation: maintenance of data over its entire life-cycle and over time for current and future generations of users. Involves active management and appraisal… adding value… to a trusted body of information.* • Scholarly communication: Secondary resources • Libraries • Disciplinary repositories (arXiv) • Institutional repositories (ePrints, DSpace) • Publishers • Research data: Primary sources • Archives (arts and humanities) • Disciplinary repositories (e.g., IVOA, BADC, IRIS, BIRN, NEON, GEON) • Research groups *What is digital curation? (2004). Edinburgh: Digital Curation Centre. Image of British Atmospheric Data Centre courtesy of Bryan Lawrence

  13. Simulations • Assimilation • Complexity + Volume + Remote Access = Grid Challenge • British Atmospheric Data Centre • British Oceanographic Data Centre

  14. Digital libraries: Uses of secondary resources • Community orientation of researchers • “open science” depends upon sharing, transparency, rapid dissemination • documents may be self-archived, deposited, shared with peers • incentive and reward system based on publication • researchers contribute to digital collections via publication • Individual orientation of students • searchers of digital collections, not contributors • reliant upon search mechanisms and bibliographic control • Scholarly publications: inputs and outputs of research • Digital libraries: “boundary objects” between experts and novices

  15. Digital libraries: Uses of primary sources • Community orientation of researchers • e-Science goal to facilitate creation and use of research data by collaborating groups • Sharing practices vary widely by research topic and discipline • Agreements made for access to data, credit for publications • Data: input and output of research • Providing context to interpret data • Scholarly publications provide context • Digital libraries remove context

  16. Digital libraries: Sharing primary sources • Incentives to share data • Tradition of “open science” to promote access, transparency • Establish trust and reciprocity within a research group • Ability to mine large data sets, compare results • Ability to replicate experiments, studies • Requirement of some funding agencies • Incentives not to share data • Rewards for publication, not for data management • Benefits of contributing data may accrue to other parties • Risks of misinterpretation of your data • Risks of losing control over data • Risks of loss of intellectual property

  17. Digital libraries, e-Science and e-Learning • Case studies in the design of scientific digital libraries that will support research and teaching applications • Alexandria Digital Earth Prototype (ADEPT) • http://is.gseis.ucla.edu/adept/ • Center for Embedded Networked Sensing (CENS) • http://www.cens.ucla.edu

  18. Alexandria DL of Distributed Spatial Information Objects • If it has a latitude and longitude then it can be in ADL • Diagram labels: Museum Artifacts; Earth Art; Other Digital Archives; Zoological Habitat Study; Botanical Study; Ocean Science Data; Archeological Dig • What information do you have about her?

  19. Data models for habitat monitoring and sensor networks Contaminant Transport Group • Multimedia, Multiscale problems (time and space) • Multidisciplinary (current and as yet unknown) problems • Management, visualization, exploration of massive, heterogeneous data streams

  20. Center for Embedded Networked Sensing: Education and Data Management Projects • Goals • Make sensor data useful for scientists • Make sensor data useful for teaching high school science • Facilitate inquiry learning with scientific data • User communities • Research scientists (habitat ecology, seismology) • High school science teachers (biology and physics) • High school students • Activities to be supported • Scientific data management by scientists • Enable students to formulate and test hypotheses • Enable students to design experiments “tasking” sensors

  21. Some CENS Results-1 • CENS committed in principle to sharing data • Data management practices vary widely by discipline • Seismic: High experience, use of community repository (IRIS) and metadata format (SEED) • Habitat ecology: Low experience, exploring new community repository (Morpho) and metadata (Environmental Metadata Language) • Avian biology (localization of birdsongs): High experience across multiple disciplines • Education: Standards exist for educational modules but not for data

  22. Some CENS Results-2 • No single metadata model or crosswalk will serve all CENS scientific applications • Metadata for individual disciplines, e.g., • Environmental Metadata Language: biocomplexity data • SEED: seismic data • Metadata for technical devices, e.g., Sensor Markup Language • Geospatial coordinates required for most applications • Geospatial data standards exist for 2D points • Context descriptors also needed (distance from sea level, local distance from ground, above/below leaf, north/south side of tree)
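The last point on slide 22, that a 2D point is not enough and context descriptors are also needed, can be illustrated with a small record type. This is only a sketch: the class name, field names, units, and the sample coordinates are hypothetical and are not taken from the CENS schema or from any metadata standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LeafPosition(Enum):
    ABOVE = "above"
    BELOW = "below"


class TreeSide(Enum):
    NORTH = "north"
    SOUTH = "south"


@dataclass(frozen=True)
class SensorPlacement:
    """Hypothetical record: a 2D point plus the context descriptors from the slide."""

    latitude: float   # decimal degrees, the standard 2D point
    longitude: float  # decimal degrees
    elevation_m: Optional[float] = None            # distance from sea level
    height_above_ground_m: Optional[float] = None  # local distance from ground
    leaf_position: Optional[LeafPosition] = None   # above/below leaf
    tree_side: Optional[TreeSide] = None           # north/south side of tree

    def __post_init__(self) -> None:
        # Reject coordinates that cannot be a valid latitude/longitude pair.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def has_context(self) -> bool:
        """True if any descriptor beyond the 2D point has been recorded."""
        return any(
            value is not None
            for value in (
                self.elevation_m,
                self.height_above_ground_m,
                self.leaf_position,
                self.tree_side,
            )
        )


# Illustrative values only.
placement = SensorPlacement(
    latitude=34.07,
    longitude=-118.44,
    height_above_ground_m=1.5,
    tree_side=TreeSide.NORTH,
)
```

Keeping the descriptors optional reflects the slide's observation that requirements differ by application: a seismic station may need only the point, while a habitat sensor may need all four.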

  23. Metadata crosswalks for habitat monitoring

      Metadata for sensor data for habitat monitoring

      CENS Schema | SensorML | EML 2.0
      --- | --- | ---
      CENS_Node.Node_Name: Name of Node | Sml:IdentifiedAs (2.2.2) |
      CENS_Node.Node_Desc: Description of Node | AssetDescription: sml:description (2.2.12) |
      CENS_Location.Location_ID: Unique location ID | CrsID (2.2.5) | Eml-Coverage (2.4.4)
      CENS_Location.X_Pos: Position on X axis | HasCRS (2.2.5); ObjectState (3.3.6) | Eml-Coverage-GeographicCoverage (2.4.4)
      CENS_Location.Time_Recorded: Time location was captured | | Eml-Coverage-TemporalCoverage (2.4.4)
      CENS_Location.Time_Type_ID: Refers to type of time of Time_Type ID table | | Eml-Coverage (2.4.4)

      Metadata for education modules for habitat monitoring

      LOM | GEM | ADN
      --- | --- | ---
      Educational-Typical Age Range (5.7) | Audience-Age | Audience
      Life Cycle-Contribute (2.3) | Creator | Resource Creator
      General-Coverage (1.6) | Coverage-Spatial, Temporal | Coverage (spatial and temporal)
      Life Cycle-Date (2.3.3); DateTime (8) | Date | Creation date; Accession date
      General-Description (1.4) | Description | Description
      Educational (5) | Pedagogy | Educational
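A crosswalk like the one on slide 23 amounts to a lookup from local field names to element names in each target schema. The sketch below shows that idea for four of the CENS fields. The element names are copied from the slide text and have not been checked against the SensorML or EML specifications; the pairing of fields with elements is read from the flattened slide and may not match the original table exactly. The `translate` function and the sample record are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Crosswalk rows as read from the slide; None means no target element is listed.
CROSSWALK: Dict[str, Dict[str, Optional[str]]] = {
    "CENS_Node.Node_Name": {
        "SensorML": "Sml:IdentifiedAs",
        "EML": None,
    },
    "CENS_Node.Node_Desc": {
        "SensorML": "AssetDescription: sml:description",
        "EML": None,
    },
    "CENS_Location.Location_ID": {
        "SensorML": "CrsID",
        "EML": "Eml-Coverage",
    },
    "CENS_Location.Time_Recorded": {
        "SensorML": None,
        "EML": "Eml-Coverage-TemporalCoverage",
    },
}


def translate(
    record: Dict[str, str], target: str
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map a CENS record onto one target schema.

    Returns the mapped values keyed by target element, plus the list of
    source fields that have no equivalent in that schema.
    """
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        element = CROSSWALK.get(field, {}).get(target)
        if element is None:
            unmapped.append(field)
        else:
            # Several source fields may share one target element, so collect lists.
            mapped.setdefault(element, []).append(value)
    return mapped, unmapped


# Illustrative record; the values are invented.
record = {
    "CENS_Node.Node_Name": "node-7",
    "CENS_Location.Time_Recorded": "2005-05-30T12:00:00Z",
}
eml_mapped, eml_unmapped = translate(record, "EML")
sml_mapped, sml_unmapped = translate(record, "SensorML")
```

Returning the unmapped fields alongside the mapped ones makes the gaps visible: each target schema covers only part of the record, which is one way to see why no single crosswalk serves every application.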

  24. CENS Research Directions • Infrastructure goals for CENS • Support scientists in collecting, managing, preserving, sharing data • Develop modular, extensible metadata architecture (XML-based) • Develop filtering tools to extract and visualize scientific data for educational use • Conduct behavioral studies of scientists, teachers, and students • How do they determine their data requirements? • What are their criteria for selecting, preserving data? • How do they use scientific data? • How do their uses evolve over time? • What are their incentives and disincentives to contribute data to repositories?

  25. Summary • Technical infrastructure of e-Science under construction • Social aspects of e-Science infrastructure poorly understood • Value chain of e-Science requires linkage between data and documents • Scholarly communication infrastructure evolving, but exists • Data infrastructure fragmented across disciplines, communities • Digital libraries: intersection of e-science, e-learning, and libraries • Data practices vary more widely than do publication practices • Communities may represent, organize, and use the same data differently • Individuals may use the same content differently for different purposes • Research on digital libraries and e-learning can identify some of the challenges for building an e-science infrastructure

  26. Acknowledgements • ADEPT is funded by National Science Foundation grant no. IIS-9817432, Terence R. Smith, University of California, Santa Barbara, Principal Investigator. http://is.gseis.ucla.edu/adept/ • ADEPT collaborators: Gregory Leazer, Anne Gilliland, Richard Mayer • CENS is funded by National Science Foundation Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator. http://www.cens.ucla.edu • CENS collaborators: Jonathan Furner, William Sandoval, Noel Enyedy, Michael Hamilton

  27. ADEPT Project: Geospatial digital libraries • Goals • Add services to Alexandria Digital Library for teaching undergraduate courses in geography • Facilitate inquiry learning by providing access to primary sources • User communities • Faculty, as researchers • Faculty, as teachers of undergraduate courses • Undergraduate students • Activities to be supported • Information searching and retrieval • Composing lectures that incorporate text, concepts, and objects • Constructing learning modules in which students can formulate and test hypotheses

  28. Some ADEPT Results (1999-2004)-1 • Technology use by geographers • Research: sophisticated users of IT; mapping software, atmospheric simulations, sensor networks • Teaching: minimal users of IT; rely on overhead projectors, chalk, paper maps, boxes of rocks, lecturing from hand-written notes • Information seeking by geographers • Research: typical library use, online searching • Teaching: irregular, non-directed, often a by-product of research activities • Information resources used by geographers • Research: varies by specialty; all want maps and images • Teaching: varies by course; all want maps and images

  29. Some ADEPT Results (1999-2004)-2 • Search queries of geographers • Research: concept, place (place name, latitude/longitude) • Teaching: concept, place, process (examples of erosion, population movements, etc.) • Selection of digital resources for teaching • Clarity of presentation more important than familiarity of location • Desire to annotate, highlight, resize, for presentation • Use of primary data in instruction • Preference for use of own research data • Tools to manage own research data would make DL teaching services more attractive • Most prefer to hold their data privately; want control over personal digital libraries of their teaching and research resources
