
SDSC and CIEG Overview
CIEG Workshop, April 2007

Anke Kamrath

Division Director, San Diego Supercomputer Center

[email protected]

SDSC Overview

  • ~400 staff: production HPC and data staff, numerous science research projects and computational scientists, and software & technology R&D

  • Organizational areas (labels from the slide diagram): Data and Knowledge Systems, Science Research and Development, Next-Generation Storage, User Services and Training

Science is a Team Sport

Slide diagram labels: Data Management and Mining · Modeling and Simulation · Life Sciences

Cyberinfrastructure – A Unifying Concept

Cyberinfrastructure = resources (computers, data storage, networks, scientific instruments, experts, etc.) + “glue” (integrating software, systems, and organizations).

NSF’s “Atkins Report” provided a compelling vision for integrated Cyberinfrastructure

Empowering Science and Engineering Communities

Slide example: Next Generation Biology Workbench

  • Empowering Scientific Communities involves deep community collaborations and the development of software tools for transforming data to discovery

A Deluge of Data

Slide diagram labels: data from instruments · data from sensors · data from simulations · data from analysis

  • Today data comes from everywhere

    • “Volunteer” data

    • Scientific instruments

    • Experiments

    • Sensors and sensornets

    • Computer simulations

    • New devices (personal digital devices, computer-enabled clothing, cars, …)

  • And is used by everyone

    • Researchers, educators

    • Consumers

    • Practitioners

    • General public

  • Turning the deluge of data into usable information for the research and education community requires an unprecedented level of integration, globalization, scale, and access


Using Data as a Driver: SDSC Cyberinfrastructure

Slide diagram: SDSC Data Cyberinfrastructure hub, with surrounding components:

  • Community databases and data collections; data management, mining, and preservation

  • Data-oriented HPC resources, high-end storage, and large-scale data analysis, simulation, and modeling

  • Data-oriented tools, SW applications, and community codes

  • Data- and computational science education and training (Summer Institute)

  • Collaboration, service, and community leadership for data-oriented projects


SDSC Production Resources


  • 2.4 PB Storage-area Network (SAN)

  • 25 PB StorageTek/IBM tape library

  • HPSS and SAM-QFS archival systems

  • DB2, Oracle, MySQL

  • Storage Resource Broker

  • Supporting servers: IBM 32-way p690s, 72-CPU SunFire 15K, etc.


Support for community data collections and databases

Data management, mining, analysis, and preservation


  • DataStar

    • 15.6 TFLOPS Power 4+ system

    • 7.125 TB total memory

    • Up to 4 GBps I/O to disk

    • 115 TB GPFS filesystem

  • Blue Gene Data

    • First academic IBM Blue Gene system

    • 17.1 TF

    • 1.5 TB total memory

    • 3 racks, each with 2,048 PowerPC processors and 128 I/O nodes

  • TeraGrid Cluster

    • 524 Itanium2 IA-64 processors

    • 2 TB total memory

    • Also 16 2-way data I/O nodes



  • User Services

  • Application/Community Collaborations

  • Education and Training

  • SDSC/Cal-IT2 Synthesis Center

  • Data-oriented Community SW, toolkits, portals, codes


Getting an Allocation – It’s Free!

  • Open to researchers affiliated with U.S. academic and non-profit research institutions

  • Proposals reviewed quarterly

  • Several types of allocations (thresholds sketched in code after this list):

    • Development Allocations

      • Quick turnaround

      • Up to 10,000 service units (CPU-hours)

    • Medium Allocations

      • Reviewed quarterly

      • Between 10,000 and 500,000 service units

    • Large Allocations

      • Reviewed twice a year

      • Over 500,000 service units
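As a rough illustration of the tiers above, a minimal sketch (a hypothetical helper, not an SDSC tool) mapping a requested number of service units to the tier it would fall under:

    # Hypothetical sketch: map a requested number of service units (SUs)
    # to the allocation tier described in the list above. Not an SDSC tool.
    def allocation_tier(service_units: int) -> str:
        if service_units <= 10_000:
            return "Development (quick turnaround)"
        if service_units <= 500_000:
            return "Medium (reviewed quarterly)"
        return "Large (reviewed twice a year)"

    for request in (5_000, 250_000, 2_000_000):
        print(f"{request:>9,} SUs -> {allocation_tier(request)}")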

  • Getting Started:

  • SDSC Data Allocations:

    • Getting Started:


Serving Many Disciplines


SDSC Strategic Applications Collaborations (SAC) Program

  • Goal:

    • Make a significant impact in enabling and enhancing HPC, data, and visualization for the user community

  • Approach:

    • SDSC’s domain science and HPC/data/viz expert staff paired with PIs for projects lasting 3-12+ months

    • Recruit new users from both traditional and non-traditional fields

    • Generalize solutions applicable to wider user community

  • Examples:

    • “Scale up” newly recruited users to allocations of hundreds of thousands to millions of SUs

    • Optimize parallel algorithms, I/O performance, parallel scaling (extreme scaling for petascale), and single-processor performance


SAC Example: SDSC TeraShake


SAC Example: DNS Turbulence

  • DNS (Direct Numerical Simulation) code used for years to simulate a range of phenomena in turbulence and turbulent mixing.

  • Over the years, the PI has been allocated millions of SUs on SDSC and other NSF centers' machines

  • The original code's scalability was limited to N processors for an N^3 grid problem; SAC improvements increased this to N^2 processors.

    • Allows significantly bigger problems

    • Allows faster time to solution

  • Now computing at 2048^3 grid resolution on DataStar. Would like to reach the grid size achievable on the Earth Simulator, i.e. 4096^3 resolution, to better understand physics at micro scales


SAC Example: DNS Turbulence (cont)

  • Reimplemented with a 2-D parallel decomposition of the compute-intensive part (the 3D FFT)

  • Now capable of scaling up to N^2 processors

    • 16M processors for a 4096^3 grid

  • New code successfully tested on 32,768 BG processors at IBM Watson lab

    • Achieved 4096^3 grid

    • First ever in US!

  • By-product: an optimized library for scalable 3D FFT, for use in other codes. A beta version is available on the SDSC Web site. The library has been used in another turbulence code (PI: Krishnana, U. Minn); other PIs and IBM are also interested.

Figure: execution speed (number of steps per second of execution), normalized by problem size, plotted on the Y-axis.
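To make the scaling limits concrete, here is a minimal sketch (not the PI's production code) of why a 1-D "slab" decomposition of an N^3 grid caps at N processors while the 2-D "pencil" decomposition reaches N^2:

    # Minimal sketch (not the production DNS code): processor-count limits
    # for 1-D "slab" vs. 2-D "pencil" decompositions of an N^3 FFT grid.
    # Slab: each rank holds an (N/P) x N x N slab, so P <= N.
    # Pencil: each rank holds an (N/P1) x (N/P2) x N pencil, so
    # P = P1 * P2 can grow to N * N.
    def max_ranks_slab(n: int) -> int:
        return n

    def max_ranks_pencil(n: int) -> int:
        return n * n

    for n in (2048, 4096):
        print(f"N = {n}: slab limit {max_ranks_slab(n):,} ranks, "
              f"pencil limit {max_ranks_pencil(n):,} ranks")
    # N = 4096 gives a pencil limit of 16,777,216 ranks -- the "16M
    # processors for a 4096^3 grid" figure quoted above.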

Better Neurosurgery Through Cyberinfrastructure

  • PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue

  • The brain deforms during surgery, so preoperative brain images must be aligned with intra-operative images to give surgeons the best opportunity for intra-surgical navigation

  • Radiologists and neurosurgeons at Brigham and Women's Hospital, Harvard Medical School are exploring transmission of 30-40 MB brain images (generated during surgery) to SDSC for analysis and alignment

  • Transmission is repeated every hour during a 6-8 hour surgery; transmission and output must take on the order of minutes

  • A finite element simulation on a biomechanical model for volumetric deformation is performed at SDSC; output results are sent back to BWH, where updated images are shown to surgeons
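A back-of-envelope sketch of the transfer-time budget (the 30-40 MB image size is from the slide; the link speeds are illustrative assumptions, not figures from the talk):

    # Back-of-envelope sketch: time to move one intra-operative brain image.
    # Image size (30-40 MB) is from the slide; the link speeds below are
    # illustrative assumptions.
    IMAGE_MB = 40  # upper end of the 30-40 MB range

    for label, mbit_per_s in (("100 Mbit/s link", 100),
                              ("1 Gbit/s link", 1_000)):
        seconds = IMAGE_MB * 8 / mbit_per_s  # MB -> megabits, then divide
        print(f"{label}: {seconds:.1f} s per image")
    # At these assumed speeds the raw transfer takes seconds, leaving most
    # of the minutes-scale budget for the finite element simulation and
    # alignment at SDSC.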


Community Data Repository: SDSC DataCentral

  • Provides “data allocations” on SDSC resources to national science and engineering community

    • Data collection and database hosting

      • Batch oriented access

      • Collection management services

    • First broad program of its kind to support research and community data collections and databases

  • Comprehensive resources

    • Disk: 300 TB accessible via HPC systems, Web, SRB, GridFTP

    • Databases: DB2, Oracle, MySQL

    • SRB: collection management

    • Tape: 25 PB, accessible via file system, HPSS, Web, SRB, GridFTP

    • 24/7 operations, collection specialists

DataCentral infrastructure includes: Web-based portal, security, networking, UPS systems, web services and software tools

Sampling of Public Data Collections Hosted in DataCentral

Collections named on the slide (the original grouped them by domain, e.g. Earth Sciences): Tsunami Data, 3D Ground Motion Collection, AfCS Molecule Pages, Bee Behavior, BioCyc (SRI), Encyclopedia of Life, Gene Ontology, InterPro Mirror, TreeBase, Yeast Regulatory Network, Apoptosis Database, Backbone Header Traces, Backscatter Data, NVO - Digsky, Hayden Planetarium, Galactic ALFA HI Survey, Neural Basis of Visual Perception, Human Brain Dynamics Resource, Merced Library, Stripe Glasses, Higgs Boson at LHC

Data: A Fundamental Component of New Discovery in Science and Engineering

  • Identifying brain disorders: remote visualization allows analysis of multi-TB brain datasets without high data-transfer costs, expanding productivity by more than ten-fold

  • New information about the heavens: aggregate information from the world's largest telescopes is compared to provide new information on the existence and behavior of astronomical objects

  • Simulating the universe from first principles: large-scale ENZO runs enable spatial mapping and simulated sky surveys; 26 TB of output

How Much Data Is There?*

  • 1 low-resolution photo = 100 KiloBytes

  • 1 novel = 1 MegaByte

  • iPod Shuffle (up to 120 songs) = 512 MegaBytes

  • Printed materials in the Library of Congress = 10 TeraBytes

  • 1 human brain at the micron level = 1 PetaByte

  • SDSC HPSS tape archive = 25 PetaBytes

  • All worldwide information in one year = 2 ExaBytes

* Rough/average estimates
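The same scale expressed in bytes, as a small sketch using decimal (SI) prefixes (the slide's figures are rough estimates):

    # Small sketch: the slide's rough estimates expressed in bytes,
    # using decimal (SI) prefixes.
    UNITS = {"KB": 10**3, "MB": 10**6, "TB": 10**12,
             "PB": 10**15, "EB": 10**18}

    examples = [
        ("1 low-resolution photo",            100, "KB"),
        ("1 novel",                             1, "MB"),
        ("iPod Shuffle (120 songs)",          512, "MB"),
        ("Library of Congress (print)",        10, "TB"),
        ("1 human brain at micron level",       1, "PB"),
        ("SDSC HPSS tape archive",             25, "PB"),
        ("All worldwide info, one year",        2, "EB"),
    ]

    for name, qty, unit in examples:
        print(f"{name:<32} {qty * UNITS[unit]:>22,} bytes")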

SDSC Services, Tools, and Technologies for Data Management and Synthesis

Data Systems
Data Services

Data migration/upload, usage and support (SRB)

Database selection and schema design (Oracle, DB2, MySQL)

Database application tuning and optimization

Portal creation and collection publication

Data analysis (e.g. Matlab) and mining (e.g. WEKA)


Data-oriented Toolkits and Tools

Biology Workbench

Montage (astronomy mosaicking)

Kepler (Workflow management)

Vista Volume renderer (visualization), etc.



Cyberinfrastructure Experiences for Graduate Students (CIEG) Program

  • Preparing students for high-end computational and data science and engineering

  • Using cutting-edge resources

  • Anticipating future technology directions and their applicability to your field

  • A 10-week summer program that partners students with SDSC experts on the SAC team (compute, data, vis)

From the NSF Announcement

  • “help foster a generation of researchers for whom such tools are incorporated naturally into advancing the research field.”

  • “expand the community of researchers with the necessary skills and experience to conduct sophisticated research involving cyberinfrastructure.”

Thank You

[email protected]
