
SDSC and CIEG Overview
CIEG Workshop, April 2007

Anke Kamrath

Division Director, San Diego Supercomputer Center

[email protected]



SDSC Overview

  • ~400 Staff

  • Production HPC and Data Staff

  • Numerous Science Research Projects and Computational Scientists

  • Software & Technology R&D

SDSC organizational areas (from the slide diagram): HPC; Data and Knowledge Systems; Grids; Science Research and Development; Next-Generation Storage; User Services and Training



Science is a Team Sport

(Slide diagram of communities and capabilities: Geosciences, Astronomy, Physics, Life Sciences, GAMESS, QCD, Data Management and Mining, Modeling and Simulation)



Cyberinfrastructure – A Unifying Concept

Cyberinfrastructure = resources (computers, data storage, networks, scientific instruments, experts, etc.) + “glue” (integrating software, systems, and organizations).

NSF’s “Atkins Report” provided a compelling vision for integrated Cyberinfrastructure



Empowering Science and Engineering Communities

  • Empowering scientific communities involves deep community collaborations and the development of software tools for transforming data to discovery

(Slide examples: Next Generation Biology Workbench, GPFS, Synthesis Center)



A Deluge of Data

(Slide graphic: data from instruments, sensors, simulations, and analysis)

  • Today data comes from everywhere

    • “Volunteer” data

    • Scientific instruments

    • Experiments

    • Sensors and sensornets

    • Computer simulations

    • New devices (personal digital devices, computer-enabled clothing, cars, …)

  • And is used by everyone

    • Researchers, educators

    • Consumers

    • Practitioners

    • General public

  • Turning the deluge of data into usable information for the research and education community requires an unprecedented level of integration, globalization, scale, and access




Using Data as a Driver: SDSC Cyberinfrastructure

SDSC Data Cyberinfrastructure (slide diagram) spans:

  • Community databases and data collections; data management, mining, and preservation

  • Data-oriented HPC resources, high-end storage, and large-scale data analysis, simulation, and modeling

  • Data-oriented tools, SW applications, and community codes

  • Data- and computational science education and training

  • Collaboration, service, and community leadership for data-oriented projects

(Other labels on the diagram: SRB, Biology Workbench, Summer Institute, IT)



SDSC Production Resources

  • SDSC DATA COLLECTIONS, ARCHIVAL AND STORAGE SYSTEMS

  • 2.4 PB Storage-area Network (SAN)

  • 25 PB StorageTek/IBM tape library

  • HPSS and SAM-QFS archival systems

  • DB2, Oracle, MySQL

  • Storage Resource Broker

  • Supporting servers: IBM 32-way p690s, 72-CPU SunFire 15K, etc.

  • http://datacentral.sdsc.edu/

(Support for community data collections and databases; data management, mining, analysis, and preservation)

  • SDSC HIGH PERFORMANCE COMPUTING SYSTEMS

  • DataStar

    • 15.6 TFLOPS Power 4+ system

    • 7.125 TB total memory

    • Up to 4 GBps I/O to disk

    • 115 TB GPFS filesystem

  • Blue Gene Data

    • First academic IBM Blue Gene system

    • 17.1 TF

    • 1.5 TB total memory

    • 3 racks, each with 2,048 PowerPC processors and 128 I/O nodes

  • TeraGrid Cluster

    • 524 Itanium2 IA-64 processors

    • 2 TB total memory

    • Also 16 2-way data I/O nodes

      http://www.sdsc.edu/user_services/

  • SDSC SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES

  • User Services

  • Application/Community Collaborations

  • Education and Training

  • SDSC/Cal-IT2 Synthesis Center

  • Data-oriented Community SW, toolkits, portals, codes

  • http://www.sdsc.edu/
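As a rough cross-check of the figures above (arithmetic on the quoted numbers only; these are not official specifications), a short Python sketch:

```python
# Rough arithmetic on the quoted Blue Gene Data figures (assumptions, not official specs).
racks = 3
procs_per_rack = 2048
peak_tflops = 17.1                      # quoted aggregate peak

total_procs = racks * procs_per_rack    # 6,144 PowerPC processors
per_proc_gflops = peak_tflops * 1000 / total_procs

print(f"Blue Gene Data: {total_procs:,} processors, ~{per_proc_gflops:.2f} GFLOPS each at peak")
# DataStar, for comparison: 15.6 TFLOPS peak and 7.125 TB of memory (quoted above).
```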



Getting an Allocation – It’s Free!

  • Open to researchers affiliated with U.S. academic and non-profit research institutions

  • Proposals reviewed quarterly

  • Several types of allocations (a small sizing sketch follows this list):

    • Development Allocations

      • Quick turnaround

      • Up to 10,000 service units (CPU-hours)

    • Medium Allocations

      • Reviewed quarterly

      • Between 10,000 and 500,000 service units

    • Large Allocations

      • Reviewed twice a year

      • Over 500,000 service units

  • Getting Started: http://www.sdsc.edu/user_services/

  • SDSC Data Allocations:

    • Getting Started: http://datacentral.sdsc.edu
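To make the tiers above concrete, here is a minimal sketch (thresholds taken from the slide; the function name and the handling of requests exactly at a boundary are assumptions of this sketch):

```python
def allocation_tier(service_units: int) -> str:
    """Bucket a request by service units (CPU-hours), per the tiers above.
    Boundary handling at exactly 10,000 or 500,000 SUs is an assumption."""
    if service_units <= 10_000:
        return "Development (quick turnaround)"
    if service_units <= 500_000:
        return "Medium (reviewed quarterly)"
    return "Large (reviewed twice a year)"

for request in (5_000, 250_000, 750_000):
    print(f"{request:>9,} SUs -> {allocation_tier(request)}")
```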



Serving Many Disciplines



SDSC Strategic Applications Collaborations (SAC) Program

  • Goal:

    • Make a significant impact in enabling and enhancing HPC, data, and visualization capabilities for the user community.

  • Approach:

    • SDSC’s domain science and HPC/data/viz expert staff paired with PIs for projects lasting 3-12+ months

    • Recruit new users from both traditional and non-traditional fields

    • Generalize solutions applicable to wider user community

  • Examples:

    • “Scale up” newly recruited users into the range of hundreds of thousands to millions of SUs

    • Optimize parallel algorithms, I/O performance, parallel scaling (extreme scaling for petascale), and single-processor performance



SAC Example: SDSC TeraShake



SAC Example: DNS Turbulence

  • The DNS (Direct Numerical Simulation) code has been used for years to simulate a range of phenomena in turbulence and turbulent mixing.

  • Over the years the PI has been allocated millions of SUs on SDSC and other NSF centers' machines.

  • The original code's scalability was limited to N processors for an N^3 grid problem; SAC improvements raised this to N^2 processors.

    • Allows significantly bigger problems

    • Allows faster time to solution

  • Now computing at 2048^3 grid resolution on DataStar. The goal is to reach the grid size achievable on the Earth Simulator, i.e., 4096^3 resolution, to better understand physics at micro scales (see the scaling sketch below).
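The scalability limits mentioned above follow directly from the domain decomposition: a 1-D (slab) split of an N^3 grid cannot use more than N processors, while a 2-D (pencil) split can use up to N^2. A quick sketch of that arithmetic, using the grid sizes quoted on this and the next slide:

```python
# Maximum usable processor counts for an N^3 grid under the two decompositions
# discussed above (slab = original code, pencil = SAC-improved code).
for n in (2048, 4096):
    slab_limit = n          # 1-D decomposition: at most N processors
    pencil_limit = n * n    # 2-D decomposition: up to N^2 processors
    print(f"N = {n}: slab limit = {slab_limit:,}, pencil limit = {pencil_limit:,}")
# N = 4096 gives a pencil limit of 16,777,216 (~16M), the figure on the next slide.
```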



SAC Example: DNS Turbulence (cont)

  • Reimplemented the compute-intensive part (the 3D FFT) with a 2-D parallel domain decomposition (sketched below)

  • Now capable of scaling up to N^2 processors

    • Up to 16M processors for a 4096^3 grid

  • New code successfully tested on 32,768 Blue Gene processors at the IBM Watson lab

    • Achieved a 4096^3 grid

    • First ever in the U.S.!

  • By-product: an optimized library for scalable 3D FFTs, for use in other codes. A beta version is available on the SDSC Web site. The library has already been used in another turbulence code (PI: Krishnana, U. Minn.); other PIs and IBM are also interested.

[Figure: execution speed (number of steps per second), normalized by problem size, plotted on the Y-axis]
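To illustrate what the 2-D (pencil) decomposition looks like in practice, here is a toy sketch (illustrative only; the processor-grid shape, index conventions, and function name are this sketch's assumptions, not the PI's code):

```python
import numpy as np

def pencil_slice(N, Pr, Pc, pr, pc):
    """Index ranges of the x-pencil owned by task (pr, pc) on a Pr x Pc
    processor grid: full extent in x, an (N/Pr) x (N/Pc) patch in y and z."""
    assert N % Pr == 0 and N % Pc == 0
    ys = slice(pr * (N // Pr), (pr + 1) * (N // Pr))
    zs = slice(pc * (N // Pc), (pc + 1) * (N // Pc))
    return ys, zs

N, Pr, Pc = 8, 2, 4                 # toy sizes; production runs used N = 4096
grid = np.zeros((N, N, N))
ys, zs = pencil_slice(N, Pr, Pc, pr=1, pc=2)
local = grid[:, ys, zs]             # this task's portion of the grid
print(local.shape)                  # (8, 4, 2) = N x N/Pr x N/Pc
# A distributed 3-D FFT then does 1-D FFTs along x, transposes so each task
# holds full lines in y, FFTs along y, transposes again, and FFTs along z.
```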



Better Neurosurgery Through Cyberinfrastructure

  • PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue

  • The brain deforms during surgery

  • Preoperative brain images must be aligned with intra-operative images to give surgeons the best opportunity for intra-surgical navigation

Radiologists and neurosurgeons at Brigham and Women’s Hospital, Harvard Medical School are exploring transmission of 30-40 MB brain images (generated during surgery) to SDSC for analysis and alignment:

  • Transmission is repeated every hour during a 6-8 hour surgery

  • Transmission and output must take on the order of minutes

  • A finite element simulation on a biomechanical model of volumetric deformation is performed at SDSC; the output is sent back to BWH, where updated images are shown to surgeons (a rough transfer-time estimate follows below)
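A back-of-the-envelope check on the minutes-scale requirement (image size from the slide; the link speeds below are illustrative assumptions, not measured figures):

```python
# Rough transfer time for one intra-operative brain image.
image_mb = 40                              # 30-40 MB per image (slide figure)
image_bits = image_mb * 8 * 1e6            # using 1 MB = 10^6 bytes

for label, mbps in (("100 Mbps link", 100), ("1 Gbps research network", 1000)):
    seconds = image_bits / (mbps * 1e6)
    print(f"{label}: ~{seconds:.1f} s per image")
# Even at 100 Mbps the raw copy takes only a few seconds, so most of the
# minutes-scale budget is available for the finite element alignment step.
```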



Community Data Repository: SDSC DataCentral

  • Provides “data allocations” on SDSC resources to the national science and engineering community

    • Data collection and database hosting

      • Batch oriented access

      • Collection management services

    • First broad program of its kind to support research and community data collections and databases

  • Comprehensive resources

    • Disk: 300 TB accessible via HPC systems, Web, SRB, GridFTP

    • Databases: DB2, Oracle, MySQL

    • SRB: Collection management

    • Tape: 25 PB, accessible via file system, HPSS, Web, SRB, GridFTP

    • 24/7 operations, collection specialists

DataCentral infrastructure includes: Web-based portal, security, networking, UPS systems, web services and software tools



Sampling of Public Data Collections Hosted in DataCentral

Earth Sciences: Nexrad, ERESE, UCI ESMF, Earthref.org, ERDA, ERR, Tsunami Data

Biology: AfCS Molecule Pages, Bee Behavior, Biocyc (SRI), CKAAPS, CIPRES, DigEmbryo, Encyclopedia of Life, Gene Ontology, Interpro Mirror, JCSG Data, PDB, TreeBase, Yeast Regulatory Network, Apoptosis Database

Networking: Backbone Header Traces, Backscatter Data, HPWREN, IMDC, Skitter

Seismology: 3D Ground Motion Collection, Terashake, CyberShake

Astronomy: NVO - Digsky, SLOAN, Hayden Planetarium, LUSciD/ENZO, Galactic ALDA HI Survey

Neuroscience: Salk, Neural Basis of Visual Perception, Human Brain Dynamics Resource

Education: Merced Library, Transana, NSDL

Physics: AMANDA, Stripe Glasses, Higgs Boson at LHC



Data: A Fundamental Component of New Discovery in Science and Engineering

  • Identifying Brain Disorders: Remote visualization allows analysis of multi-TB brains without high data-transfer costs, expanding productivity by more than ten-fold

  • New Information about the Heavens: Aggregated information from the world’s largest telescopes is compared to provide new information on the existence and behavior of astronomical objects

  • Simulating the Universe from First Principles: Large-scale ENZO runs enable spatial mapping and simulated sky surveys; 26 TB output



How much Data is there?*

  • 1 Low Resolution Photo = 100 KiloBytes

  • 1 novel = 1 MegaByte

  • iPod Shuffle (up to 120 songs) = 512 MegaBytes

  • Printed materials in the Library of Congress = 10 TeraBytes

  • 1 human brain at the micron level = 1 PetaByte

  • SDSC HPSS tape archive = 25 PetaBytes

  • All worldwide information in one year = 2 ExaBytes

* Rough/average estimates
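A quick way to relate these magnitudes to one another (rough arithmetic on the slide's own estimates):

```python
# Rough comparisons using the estimates above (decimal prefixes).
MB, TB, PB, EB = 1e6, 1e12, 1e15, 1e18     # bytes

library_of_congress = 10 * TB              # printed materials
sdsc_archive = 25 * PB                     # HPSS tape archive
world_per_year = 2 * EB                    # all worldwide information in one year

print(f"Library of Congress ~ {library_of_congress / MB:,.0f} novels")
print(f"SDSC archive ~ {sdsc_archive / library_of_congress:,.0f} Libraries of Congress")
print(f"One year of worldwide information ~ {world_per_year / sdsc_archive:,.0f} SDSC archives")
```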



SDSC Services, Tools, and Technologies for Data Management and Synthesis

  • Data Systems: SAM-QFS, HPSS, GPFS, SRB

  • DataCentral

  • Data Services:

    • Data migration/upload, usage, and support (SRB)

    • Database selection and schema design (Oracle, DB2, MySQL)

    • Database application tuning and optimization

    • Portal creation and collection publication

    • Data analysis (e.g., Matlab) and mining (e.g., WEKA)

  • Data-oriented Toolkits and Tools:

    • Biology Workbench

    • Montage (astronomy mosaicking)

    • Kepler (workflow management)

    • Vista volume renderer (visualization), etc.



Cyberinfrastructure Experiences for Graduate Students (CIEG) Program

  • Preparing Students for high-end computational and data science and engineering

  • Using Cutting-Edge Resources

  • Anticipating Future Technology Directions and their applicability to your field

  • 10-week summer program that partners students with SDSC experts on the SAC team (compute, data, vis)

From the NSF announcement:

  • “help foster a generation of researchers for whom such tools are incorporated naturally into advancing the research field.”

  • “expand the community of researchers with the necessary skills and experience to conduct sophisticated research involving cyberinfrastructure.”



Thank You

[email protected]

www.sdsc.edu

