Cyberinfrastructure, E-Science and the San Diego Supercomputer Center



Presentation Transcript



Cyberinfrastructure, E-Science and the San Diego Supercomputer Center

Chaitan Baru

San Diego Supercomputer Center

California Institute for Telecommunications and Information Technology

University of California, San Diego



Acknowledgements

  • US National Science Foundation

    • Sponsors of GEON, and GEON international activities

  • The University of Auckland

    • Local hosts



Cyberinfrastructure and E-science

  • Cyberinfrastructure:

    • “…The comprehensive infrastructure needed to capitalize on dramatic advances in information technology…”

    • “…essential to support the frontiers of research and education in this field…”

    • From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006

  • “E-Science”: the science enterprise enabled by the use of such cyberinfrastructure

    • “Science increasingly performed through distributed global collaborations enabled by the Internet, using very large data collections, terascale computing resources and high performance visualizations.”

    • From Oxford e-Science Center, http://e-science.ox.ac.uk/public/general/definitions.xml



SDSC’s Support for CI and e-Science

  • Production Services

    • For nationally allocated supercomputer platforms, as well as computational platforms and storage systems for other projects

  • User Services

    • For nationally allocated supercomputers

  • Research and Development Collaborations

    • In support of computational science and informatics in a wide variety of science, engineering, humanities, and other disciplines

    • To develop common cyberinfrastructure (software) components

  • R&D constitutes >50% of SDSC’s activities

    • In funding as well as staffing



Integrated Cyberinfrastructure System

Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee

Figure: layered architecture.

  • Applications: Geosciences, Environmental Sciences, Neurosciences, High Energy Physics, …

  • Domain-specific Cybertools (software)

  • Shared Cybertools (software)

  • Distributed Resources (computation, storage, communication, etc.)

  • Cross-cutting: Development Tools & Libraries, Middleware Services, Hardware, Education and Training, Discovery & Innovation



TeraGrid Network

Figure: network map. Grid Infrastructure Group (UChicago) coordinates the Resource Provider (RP) and Software Integration Partner sites: UW, PSC, UC/ANL, NCAR, PU, NCSA, UNC/RENCI, IU, Caltech, ORNL, U Tenn., USC/ISI, SDSC, LSU, TACC.



TeraGrid Science Gateways

  • Provide entry points into TeraGrid for community-specific tools

  • Community-led initiative for the TeraGrid

  • URL

    • http://www.teragrid.org/programs/sci_gateways/



Computational Science and Informatics: And the CS/IT context

  • Computational physics and chemistry

    • Born at the time of Fortran, file-based systems, expensive supercomputers, the Internet, ftp, and HTML

  • Bioinformatics

    • Born at the time of Relational Database Management Systems (RDBMS), microprocessors, client-server computing, the Web, 3-tier architectures, CORBA, XML

  • Geoinformatics

    • Being born at the time of Web2.0, Google, mySpace, YouTube, mashups, social networking, and ontologies…

      Ref: Caring and Sharing of e-Science Data, C. Baru, Commentary, International Journal of Digital Libraries, October 2007



Community Cyberinfrastructure Projects

Figure: layered view of community projects.

  • Community projects: Ecological Observatories (NEON), High Energy Physics (GriPhyN), Ocean Observing (OOI), Biomedical Informatics (BIRN), Geosciences (GEON), Earthquake Engineering (NEES)

  • Your specific tools & user apps

  • Shared tools for science domains: friendly work-facilitating portals (authentication, authorization, auditing, workflows, visualization, analysis); development tools & libraries

  • Middleware services

  • Hardware: distributed computing, instruments, and data resources

Adapted from: Mark Ellisman, UC San Diego



Portal-based Science Environments

Support for resource sharing and collaborations



Common CI Software Elements

  • NSF Software Development for Cyberinfrastructure (SDCI) Program

    • ROCKS -- Cluster Management Software

    • SRB/IRODS -- Collection-based Data Management

    • Kepler -- Scientific Workflow Software

    • Open Source DataTurbine -- Streaming Data Middleware

    • Inca -- Testing and Monitoring Software

  • Other Common Software

    • GAMA -- Grid Account Management Architecture

    • GridSphere -- Portlet-based Portal Infrastructure

    • RDV -- Realtime Data Viewer

  • Common Portlets

    • GEON portlets: Registration, Search, myWorkspace, TeraGrid Gateway

    • Used in several other CI projects
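The ring-buffered streaming model behind middleware such as DataTurbine can be illustrated with a toy sketch (this is not DataTurbine's actual API; the class and method names below are invented for illustration):

```python
from collections import deque

class RingBufferChannel:
    """Toy ring-buffered channel, loosely inspired by the idea behind
    streaming-data middleware such as DataTurbine (NOT its real API)."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest samples fall off the back

    def put(self, timestamp, value):
        self.buf.append((timestamp, value))

    def latest(self, n):
        """Return the n most recent (timestamp, value) samples."""
        return list(self.buf)[-n:]

chan = RingBufferChannel(capacity=3)
for t, v in [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5)]:
    chan.put(t, v)
print(chan.latest(2))  # the two newest samples; (0, 1.0) has been evicted
```

The bounded buffer is the key design point: consumers that keep up see a live stream, while slow consumers simply lose the oldest samples instead of stalling the producer.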



Observing Systems

  • An important area for several US agencies, including National Science Foundation

    • Several agencies support observing system networks, e.g. USGS, NOAA, EPA, DoE, DOD, NASA, DHS, etc.

  • A range of projects

    • Major research equipment: deployment of coordinated, regional, continental, international-scale instrumentation and sensor networks

      •  standardized instrumentation and protocols

    • Cyberinfrastructure: development of IT and software for managing sensor networks; collecting, analyzing, distributing data; data assimilation and execution of forecasting models

      •  standardized IT infrastructure (interfaces, technology implementations)

    • Individual investigator, or small group-driven research:

      • Local (regional) sensor networks, to study specific phenomena

      • Analysis of collected data

      • Modeling and data assimilation



Observing Systems Efforts

  • Some NSF Projects

    • EarthScope: Obtain “snapshot” of the lithospheric structure of the continental US

      • US Array; Plate Boundary Observatory (PBO); San Andreas Fault Observatory at Depth (SAFOD)

    • Ocean Observing Initiative: Understand ocean phenomena in the deep ocean and at the coastal margins

      • Regional Coastal Observatory; Global Observatory

    • National Ecological Observatory Network (NEON): model and predict the state of the ecosystem of the US

      • 17 climatic domains across contiguous states + 2 in Alaska + 1 in Hawaii

    • Long-Term Ecological Research Network (LTER): intensive studies at local and regional scales

      • >30 LTER sites across US

    • WATERS: monitor watersheds across US to study hydrologic as well as environmental engineering issues

      • CLEANER: Environmental engineering-based observatory projects

      • Hydrologic Information System (HIS): Hydrology-based observing systems projects

    • NEES, NVO, …

  • Moore Foundation-funded Projects

    • CAMERA: Metagenomics and marine microbials

    • GLEON: Global Lake Observatory Network

    • TEAM: Tropical Ecological Assessment and Monitoring Network



Cyberinfrastructure (CI) Components in Observing Systems

  • “Embedded CI”

    • Software for managing instruments, dataloggers, and data in sensor networks, including metadata generation

    • “Cyberdashboard” for management of instruments/sensor networks

  • Data Management

    • of data streams (with metadata) from dataloggers (in the field) to data repositories, to data archives

    • “Cyberdashboard” to keep track of data collection protocols

  • Analysis and Computation

    • Support for model runs, data assimilation, data analysis, data mining, including periodic reprocessing of archived data

  • Data Access

    • Authenticated access to a range of data products, from raw to highly derived, including the ability to “push” data to client applications
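The flow of data with metadata from dataloggers to repositories and archives might be sketched as follows (the field names and the `package_reading`/`ingest` helpers are hypothetical, not a real observatory schema):

```python
import datetime

def package_reading(site, sensor, value, units):
    """Wrap a raw sensor value with the metadata downstream stores need.
    Field names are illustrative only."""
    return {
        "site": site,
        "sensor": sensor,
        "value": value,
        "units": units,
        # fixed timestamp so the sketch is deterministic
        "collected_at": datetime.datetime(2007, 7, 1, 12, 0).isoformat(),
    }

repository = []  # near-real-time data repository
archive = []     # long-term archive, kept for periodic reprocessing

def ingest(record):
    repository.append(record)      # streamed in from the datalogger
    archive.append(dict(record))   # replicated downstream with its metadata

ingest(package_reading("James Reserve", "air_temp", 21.4, "degC"))
print(repository[0]["units"])  # degC
```

The point of the sketch is that metadata travels with the measurement from the field onward, so the archive can later reprocess raw data without consulting the original site.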



NSF Ocean Observing Initiative (OOI)

Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego



OOI - Coastal Scale Observatory

Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego



OOI - Regional

Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego



OOI - Global Node

Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego



OOI - From Construction to Operations

Courtesy: John Orcutt, SIO Matt Arrott, Calit2, University of California, San Diego



OOI - Conceptual View of the Cyberinfrastructure



NEON Cyberinfrastructure

Figure: NEON domains map.



The NEON “Single String” Testbed

Figure: NEON Single String Testbed (SSTB), linking James Reserve, CA, to SDSC, San Diego.



MoveBank

For Animal Tracking and Photo Monitoring Data

  • A data repository

  • A live data pipeline

  • Online mapping and analysis tools

  • An educational tool

  • A community of collaborators

  • www.movebank.org

  • NSF BD&I: 0756920

PIs: Roland Kays (NY History Museum), Martin Wikelski (Princeton), Tony Fountain (SDSC, UCSD), Sameer Tilak (SDSC, UCSD)



MoveBank

Current Activities

  • Designing Data System

    • Requirements analysis

    • Schema definitions for camera trap and tracking data (trajectories)

  • Extending DataTurbine streaming data system for animal tracking and photo monitoring

    • Integration of cameras into the data acquisition system

    • Event detection and notification system design

  • Building a knowledge base of best practices

  • Networking with other animal tracking communities and researchers to build collaborations
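A minimal stand-in for the event detection mentioned above, flagging implausibly large jumps between consecutive GPS fixes (the threshold, the track format, and the `detect_events` helper are all illustrative assumptions, not MoveBank's actual design):

```python
import math

def detect_events(track, max_step_km):
    """Flag suspiciously large jumps between consecutive GPS fixes.
    track: list of (time, lat, lon) tuples in time order."""
    events = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        # rough equirectangular distance in km (adequate for a sketch)
        dx = (lon1 - lon0) * 111.32 * math.cos(math.radians((lat0 + lat1) / 2))
        dy = (lat1 - lat0) * 111.32
        if math.hypot(dx, dy) > max_step_km:
            events.append((t0, t1))
    return events

track = [(0, 40.0, -74.0), (1, 40.01, -74.0), (2, 41.0, -74.0)]
print(detect_events(track, max_step_km=5))  # the 1-degree jump is flagged
```

A detector like this would sit on the incoming stream and feed a notification system, which is the design problem the slide describes.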



Moore Observing Systems Projects

  • Some projects funded by Gordon and Betty Moore Foundation at UCSD

  • CAMERA: Metagenomics project

    • Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (Craig Venter, Larry Smarr)

    • Provide access to metagenomics databases collected from ocean water samples from around the world

  • OceanLife: Biodiversity in seamounts

    • Karen Stocks & Amarnath Gupta, SDSC

    • Integrated information source for seamount biodiversity

  • GLEON: Global Lake Ecological Observatory Network

    • Peter Arzberger, Calit2/UCSD, Tony Fountain, SDSC

    • Tim Kratz, Paul Hanson, U.Wisc

  • Cyberinfrastructure for TEAM

    • Tropical Ecology Assessment and Monitoring



Source: Paul Hanson, U.Wisc

Courtesy: Peter Arzberger, Calit2/UCSD



GLEON’s Mission

Facilitate interaction and build collaborations among an international, multidisciplinary community of researchers focused on understanding, predicting, and communicating the impact of natural and anthropogenic influences on lake ecosystems by developing, deploying, and using networks of emerging observational system technologies and associated cyberinfrastructure.

http://gleon.org

Source: Tim Kratz, U.Wisc



  • 19 countries participating

  • More than 120 scientists

  • Most sites are developing

Source: Paul Hanson, U.Wisc



Figure: 3 networks: people, data, and lake observatories.

Source: Paul Hanson



Tropical Ecology Assessment and Monitoring (TEAM) Network

  • Conservation International project

    • PI: Sandy Andelman, Vice President, Conservation International

    • Funded by Gordon and Betty Moore Foundation

  • Monitor wildland plots in tropical regions

    • Current sites: Brazil (3), Costa Rica, Suriname

    • Upcoming site: Madagascar

  • Cyberinfrastructure provided by SDSC



TEAM Cyberinfrastructure Goals

  • Provide secure, reliable access to near real-time data from all TEAM sites

  • Facilitate timely, efficient, consistent data entry

    • By assisting with adherence to site-specific protocols

    • Providing up-to-date status of data entry

    • Providing ready visualizations of cross-site, network-level data

  • Manage a variety of different data types

    • Field collections, sensor data, museum collections, remote sensing data

    • Sensor data includes images and acoustic data

  • Provide customized portals (portlets)

    • E.g. site-specific information (with multi-lingual support), and project specific data and tools

  • CI goals are similar to those of other environmental observatory projects, e.g. NEON…



TEAM Initial Implementation

  • Local PoP node: e.g. at a site in a given country, or one PoP node for a country

  • Future capability



TEAM Portal and Data Management

  • Portal based on

    • Drupal: for content management

    • GridSphere: for sharing and collaboration of data and tools

  • Support different data types

    • Observational data

      • Climate data; Photos / images

    • Spatial (GIS) data

      • Different layers, e.g. including socioeconomic data

    • Museum collections

      • E.g. MetaCat, EcoGrid

    • Acoustic data

      • Algorithms for classifying acoustic data

    • Remote sensing data

      • Landsat, MODIS, ASTER, LiDAR



CUAHSI Hydrologic Information System (HIS)

  • Hydrology Data Portal

  • Digital Watershed

  • Hydrologic Analysis

(Source: David Maidment, UT Austin)



HIS Service Oriented Architecture

Figure: service-oriented architecture.

  • Web portal interface (HDAS): information input, display, query, and output services; preliminary data exploration and discovery (see what is available and perform exploratory analyses)

  • Data servers: 3rd-party servers (e.g. USGS, NCDC), observatory servers, workgroup HIS, and SDSC HIS servers

  • Client tools: GIS, Matlab, IDL, Splus/R, D2K/I2K, and programming environments (Fortran, C, VB)

  • Data access and storage through WaterOneFlow web services (WSDL/SOAP), with HTML/XML downloads and uploads



WaterML and CUAHSI HIS Mediation

  • Develop WaterML as an interchange standard for hydrologic data

  • HIS serves as a mediator across multiple agency and individual PI data

    • Provides identifiers for sites, variables, etc. across observation networks

    • Manages and publishes controlled vocabularies, and provides vocabulary/ontology management and update tools

    • Provides common structural definitions for data interchange

    • Provides a sample protocol implementation

    • Governance framework: a consortium of universities, MOUs with federal agencies, collaboration with key commercial partners, led by renowned hydrologists, and NSF support for core development and test beds
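A rough sketch of what a WaterML-style interchange document might look like when built programmatically; the element names below are simplified for illustration and are not the normative WaterML schema:

```python
import xml.etree.ElementTree as ET

def to_waterml_like(site_code, variable_code, values):
    """Build a minimal, WaterML-flavored time series document.
    Element names are a simplified sketch, not the real WaterML schema."""
    ts = ET.Element("timeSeries")
    ET.SubElement(ts, "siteCode").text = site_code          # network-wide site id
    ET.SubElement(ts, "variableCode").text = variable_code  # controlled vocabulary
    vals = ET.SubElement(ts, "values")
    for when, v in values:
        el = ET.SubElement(vals, "value", dateTime=when)
        el.text = str(v)
    return ts

# hypothetical site and value for illustration
doc = to_waterml_like("USGS:08158000", "discharge", [("2007-10-01T00:00", 132.0)])
print(ET.tostring(doc, encoding="unicode"))
```

The mediation role described above amounts to guaranteeing that every data source emits the same structure, site identifiers, and variable vocabulary, so any client can consume any source.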



NEESit and the NEES User Community

  • NEES Equipment Sites (15 large-scale labs)

  • NEESR Research Grants (>40 NSF projects)

  • Earthquake Engineering Researchers

  • Earthquake Engineering Practitioners

  • K-12 and Undergraduate students



The NEESit System

Figure: the Scientific Collaboration Environment (NEES Portal), with a graphical user interface, connects:

  • Telepresence: video, data, and audio; archiving; secure communication

  • Data repository: structured metadata; curated physical/computational data; EOT; cyber accessibility; community content; on-line experiments; e-publications

  • Computational tools: high performance computing, hybrid simulation (physical/computational), visualization, scientific workflows



NEES Portal: Parallel Computing & TeraGrid Access



Emergency Response Projects

  • Katrinasafe and Disastersafe

    • Collaboration between American Red Cross and SDSC during Hurricane Katrina

    • Continuing now as disastersafe.redcross.org

    • Funded by an NSF grant for exploratory research on cyberinfrastructure preparedness

  • Spatiotemporal analysis of 911 call data

    • Collaboration with Public Safety Network

    • Funded by the NSF Digital Government program

  • UCSD Hazards Initiative



disastersafe.redcross.org

  • Outcome of collaboration on Katrinasafe

    • Site hosted at SDSC



Spatiotemporal Analysis of 9-1-1 Emergency Call Streams

  • Funded by NSF Digital Government program

  • Project Goals

    • Provide situational awareness at a command and decision level (vs operational)

    • Assist local and State level emergency responses by

      • Generating immediate and dynamic information about the impact of medium- to large-scale events

      • Facilitating dynamic resource allocation

      • Serving as an early warning system of emergency events

  • Collaboration among

    • California Office of Emergency Services (OES)

    • University of California, San Diego

    • Public Safety Network



Temporal Extent of Collected Data

  • San Francisco Bay Area:

    30 months of data

  • San Diego County:

    16 months of data

  • Total of

    5,301,191 calls



Spatial Extent of Collected Data

Figure: maps of one day of 9-1-1 call activity, with landline and cellular calls marked separately, for the San Francisco Bay Area (69 PSAPs) and San Diego County (20 PSAPs). Locations dithered to approx. 300 m.



Call Stream Shows Temporal Regularity

Average daily call volume for the San Francisco Combined Emergency Communications Center (CECC) PSAP.

Average hourly call volume for the San Francisco Combined Emergency Communications Center (CECC) PSAP.
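The hourly averaging behind such charts can be sketched in a few lines (the call timestamps below are invented for illustration; the real analysis would run over the millions of calls in the PSAP logs):

```python
from collections import Counter

# hypothetical 9-1-1 calls as (day, hour-of-day) pairs
calls = [(1, 17), (1, 17), (1, 2), (2, 17), (2, 9)]

def avg_hourly_volume(calls, n_days):
    """Average number of calls per hour-of-day across n_days."""
    per_hour = Counter(hour for _, hour in calls)
    return {h: per_hour[h] / n_days for h in sorted(per_hour)}

print(avg_hourly_volume(calls, n_days=2))  # hour 17 averages 1.5 calls/day here
```

Once such a baseline profile exists per PSAP, deviations from it are what make the anomaly detection in the following slides possible.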



Daily Call Volume

Figure: histogram and time series of daily call volume for the collected data (SF); annotations mark the 4th of July and a period when the data collection process was offline.



Animation: Clustering of phone calls



Cessna plane collision in San Diego



Other projects

  • PRAGMA: Pacific Rim Application and Grid Middleware Assembly

    • PI: Dr. Peter Arzberger, UCSD; Co-PI: Phil Papadopoulos

    • GEON is a participant in PRAGMA, and co-chairs the PRAGMA Geosciences Working Group

  • Optic fiber links and Lambda Grid

    • PI: Prof. Larry Smarr



Technical Interoperability Issues

  • Authentication

    • Need a common authentication framework, to provide role-based access to distributed resources

    • Otherwise, users will be burdened with too many accounts and passwords, one for each site

  • Information security

    • Provenance, IP issues

  • Distributed Data (and Metadata)

    • Metadata search interoperability

      • Large archives will remain distributed. Metadata search interoperability is needed so that a single query can span several metadata catalogs

    • Caching and replication of frequently used (large) data

    • “Distributed curation with centralized hosting” could be an option
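The idea of a single search spanning several metadata catalogs can be sketched as a fan-out query that merges results for the user (the catalog names reuse projects from this talk, but their contents below are invented for illustration):

```python
# toy metadata catalogs; in practice each would be a remote service
catalogs = {
    "GEON": [{"id": "g1", "keywords": ["lidar", "fault"]}],
    "NEES": [{"id": "n1", "keywords": ["shake table"]}],
    "LTER": [{"id": "l1", "keywords": ["lidar", "canopy"]}],
}

def federated_search(term):
    """Fan a keyword query out to every catalog and merge the hits."""
    hits = []
    for name, records in catalogs.items():
        for rec in records:                      # each catalog searched in place
            if term in rec["keywords"]:
                hits.append((name, rec["id"]))   # merged result set for the user
    return sorted(hits)

print(federated_search("lidar"))  # [('GEON', 'g1'), ('LTER', 'l1')]
```

The archives stay distributed; only the query and the merged hit list cross catalog boundaries, which is exactly the interoperability requirement the slide states.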



Technical Interoperability Issues (continued)

  • Distributed Computing

    • “Portlet aggregation”

      • A set of functionality, e.g. data+Web services, can be implemented as a portlet

      • A portal can be deployed containing a number of such distributed portlets

    • Portals can provide “gateways” to large storage and computing resources, e.g. including the TeraGrid

    • “Federated portlets”

      • A set of portlets shared by more than one community

  • Technologies to Support Collaborations in Virtual Organizations

    • Standard tools (email, forums, wikis)

    • Social networking

    • Development of ontologies, and recommendation systems



Thanks!

  • email: baru@sdsc.edu

