
Scientific Computing for SLAC Science


Presentation Transcript


  1. Scientific Computing for SLAC Science
  Bebo White, Stanford Linear Accelerator Center, October 2006

  2. Scientific Computing
  The relationship between Science and the components of Scientific Computing

  3. Drivers for SLAC Computing
  • Computing to enable today's data-intensive science
    • Clusters, interconnects, networks, mass storage, etc.
  • Computing research to prepare for tomorrow's challenges
    • Massive memory, low latency, petascale databases, detector simulation, etc.

  4. SLAC Scientific Computing

  5. Data Challenge in High Energy Physics (2006 example: SLAC)
  [Diagram: ~10 TB/s into the online system, which performs selection and compression.]
  • Raw data written to tape: 10 MB/s
  • Simulated and derived data: 20 MB/s
  • International network data flow to "Tier A Centers": 50 MB/s (400 Mb/s)
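
  As a back-of-the-envelope check on these rates, the Python sketch below converts the sustained rates into annual volumes; it assumes continuous year-round running, which the slide does not state.

    # Annual data volumes implied by the sustained rates on the slide,
    # assuming (hypothetically) continuous year-round running.
    SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 s

    rates_mb_per_s = {
        "raw data to tape": 10,          # MB/s
        "simulated + derived data": 20,  # MB/s
        "flow to Tier A Centers": 50,    # MB/s (= 400 Mb/s)
    }

    for name, rate in rates_mb_per_s.items():
        tb_per_year = rate * SECONDS_PER_YEAR / 1e6  # MB -> TB (decimal units)
        print(f"{name}: ~{tb_per_year:.0f} TB/year")

  Even the modest-sounding 10 MB/s to tape works out to roughly 300 TB per year.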

  6. Data Challenge in High Energy Physics: CERN / LHC (2008 onwards)
  [Diagram: the tiered LHC computing model. The CMS detector (12,500 tons, $700M) feeds the online system at ~PB/s; event reconstruction and HPSS mass storage sit at Tier 0+1 (CERN), with event simulation contributing ~100 MB/s; 2.5-40 Gbps links feed Tier 1 centers in Germany, Italy, France, and Fermilab (USA); ~0.6-2.5 Gbps links feed Tier 2 analysis centers and Tier 3 institutes (~0.25 TIPS); 100-1000 Mbps links feed Tier 4 physics data caches.]
  • 2000 physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player.
  • Grid infrastructure spread over the US and Europe coordinates the data analysis.
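
  To get a feel for the link rates in the tier diagram, here is a minimal sketch of how long a petabyte would take to move over each link class, assuming (optimistically) full and exclusive use of the link.

    # Time to move 1 PB over the link classes in the tier diagram,
    # assuming full, exclusive use of each link.
    PETABYTE_BITS = 8e15  # 1 PB in bits, decimal units

    links_gbps = {
        "Tier 1 link, upper end (40 Gbps)": 40,
        "Tier 1 link, lower end (2.5 Gbps)": 2.5,
        "Tier 2 link, lower end (0.6 Gbps)": 0.6,
    }

    for name, gbps in links_gbps.items():
        days = PETABYTE_BITS / (gbps * 1e9) / 86400
        print(f"{name}: ~{days:.0f} days per PB")

  Weeks to months per petabyte on the lower-end links is part of why the tiered model pushes analysis toward the data rather than shipping all raw data outward.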

  7. SLAC-BaBar Computing Fabric
  [Diagram: client, disk-server, and tape-server tiers connected by IP networks (Cisco).]
  • Clients: 1700 dual-CPU Linux nodes (over 3700 cores) running HEP-specific ROOT software (Xrootd) + the Objectivity/DB object database, plus some NFS
  • Disk servers: 120 dual/quad-CPU Sun/Solaris nodes with ~700 TB of Sun RAID arrays (FibreChannel + some SATA)
  • Tape servers: 25 dual-CPU Sun/Solaris nodes running HPSS + SLAC enhancements to ROOT and Objectivity server code, driving 40 STK 9940B and 6 STK 9840A drives in 6 STK Powderhorn silos, storing over 1 PB of data
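
  Data access in this fabric goes through Xrootd. As a hedged illustration only, the sketch below shows what a client-side read looks like using the present-day XRootD Python bindings (which post-date this 2006 talk); the host and file path are placeholders.

    # Sketch of a client reading a byte range over xrootd, using the modern
    # XRootD Python bindings (pip install xrootd). Host and path are placeholders.
    from XRootD import client

    f = client.File()
    status, _ = f.open("root://xrootd.example.org//store/babar/run123/events.root")
    if not status.ok:
        raise IOError(status.message)

    # Clients fetch just the byte ranges they need rather than staging whole
    # files locally -- the sparse access pattern the later PetaCache slides target.
    status, data = f.read(offset=0, size=4096)
    print(f"read {len(data)} bytes")
    f.close()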

  8. Used/Required Space

  9. ESnet: Source and Destination of the Top 30 Flows, Feb. 2005
  [Chart: the top 30 ESnet flows, a few terabytes/month each, grouped as DOE Lab-International R&E, Lab-U.S. R&E (domestic), Lab-Lab (domestic), and Lab-Commercial (domestic). The largest flows are SLAC (US) to RAL (UK) and Fermilab (US) to WestGrid (CA), followed by SLAC (US) to IN2P3 (FR), Karlsruhe (DE), and INFN CNAF (IT); other flows involve Fermilab with US and European sites, CERN with Fermilab and BNL, BNL with LLNL, LIGO with Caltech, and NERSC with LBNL.]
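
  The chart's unit, terabytes per month, is hard to compare directly with the link rates quoted elsewhere in the talk; the small conversion below helps. The sample values are approximate readings of the chart's scale, not exact figures.

    # Convert terabytes/month into sustained Mb/s for comparison with the
    # link rates on other slides. Sample values are approximate chart readings.
    def tb_per_month_to_mbps(tb_per_month):
        return tb_per_month * 1e12 * 8 / (30 * 86400) / 1e6  # 30-day month

    for tb in (12, 6, 2):
        print(f"{tb} TB/month ~= {tb_per_month_to_mbps(tb):.0f} Mb/s sustained")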

  10. Growth and Diversification
  • Continue shared cluster growth as much as possible
  • Increasing MPI (parallel) capacity and support (astro, accelerator, and more)
  • Grid interfaces and support (ATLAS et al.)
  • Large SMPs (astro)
  • Visualization

  11. Research - PetaCache
  • The PetaCache architecture aims to revolutionize the querying and analysis of scientific databases with complex structure
  • Generally this applies to feature databases (terabytes to petabytes) rather than bulk data (petabytes to exabytes)
  • The original motivation comes from HEP:
    • Sparse (~random) access to tens of terabytes today, petabytes tomorrow
    • Access by thousands of processors today, tens of thousands tomorrow
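
  A rough sense of what sparse access by thousands of processors demands of storage; the client count and per-client request rate below are illustrative assumptions, not figures from the talk.

    # Aggregate random-read load implied by many clients doing sparse access.
    # Client count and per-client request rate are illustrative assumptions.
    clients = 10_000                 # "tens of thousands tomorrow"
    reads_per_client_per_s = 100     # small, scattered object reads
    iops_needed = clients * reads_per_client_per_s

    DISK_IOPS = 100                  # order of magnitude for one spinning disk
    print(f"{iops_needed:,} random reads/s -> "
          f"~{iops_needed // DISK_IOPS:,} disks if served from disk alone")

  Served from DRAM instead of disk, the same load is trivial, which is the PetaCache argument.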

  12. Latency: Ideal

  13. Latency: Current Reality

  14. Latency: Practical Goal

  15. PetaCache Summary
  • Data-intensive science increasingly requires low-latency access to terabytes or petabytes
  • Memory is one key:
    • Commodity DRAM today (increasing total cost by ~2x)
    • Storage-class memory (whatever that will be) in the future
  • Revolutions in scientific data analysis will be another key:
    • Current HEP approaches to data analysis assume that random access is prohibitively expensive
    • As a result, permitting random access brings much-less-than-revolutionary immediate benefit
  • Use the impressive motive force of a major HEP collaboration with huge data-analysis needs to drive the development of techniques for revolutionary exploitation of an above-threshold machine
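
  The "prohibitively expensive" point is easy to quantify; a minimal sketch with assumed order-of-magnitude latencies (these particular numbers are not from the talk):

    # Why scattered small reads are prohibitive on disk but cheap in memory:
    # time for one million random accesses, using assumed order-of-magnitude
    # latencies (not figures from the talk).
    DISK_SEEK_S = 5e-3      # ~5 ms per random disk access
    DRAM_FETCH_S = 100e-9   # ~100 ns per random DRAM access

    n_reads = 1_000_000
    print(f"disk: ~{n_reads * DISK_SEEK_S / 3600:.1f} hours")
    print(f"DRAM: ~{n_reads * DRAM_FETCH_S:.1f} s")

  A gap of four to five orders of magnitude, which is what keeping the frequently touched feature data in memory is meant to close.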

  16. Research - Very Large Databases
  • 10 years of unique experience with VLDBs
  • Designing, building, deploying, and managing peta-scale production datasets/databases (BaBar: 1.4 PB)
  • Assisting LSST (Large Synoptic Survey Telescope) in solving data-related challenges (effort started 4Q 2004)

  17. LSST: Data-Related Challenges (1/2)
  • Large volumes
    • 7 PB/year (image and catalog data)
    • 500 TB/year (database)
    • Today's VLDBs are in the ~10s of TB range
  • High availability
    • Petabytes -> 10s of 1000s of disks -> daily disk failures
  • Real-time requirement
    • Transient alerts generated in < 60 sec
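
  The chain "petabytes -> 10s of 1000s of disks -> daily disk failures" as rough arithmetic; the drive size and failure rate below are assumed era-typical figures, not numbers from the slide.

    # Disk-failure arithmetic behind the high-availability bullet.
    # Drive size and failure rate are assumed, era-typical figures.
    volume_tb = 7000            # ~7 PB of image and catalog data per year
    disk_tb = 0.5               # ~500 GB drives
    annual_failure_rate = 0.03  # ~3% of drives fail per year (assumed)

    n_disks = volume_tb / disk_tb
    failures_per_day = n_disks * annual_failure_rate / 365
    print(f"~{n_disks:.0f} disks -> ~{failures_per_day:.1f} failures/day")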

  18. LSST: Data-Related Challenges (2/2)
  • Spatial and temporal aspects
    • Most surveys focus on a single dimension
  • All data made public with minimal delay
    • Wide range of users: professional and amateur astronomers, students, the general public

  19. VLDB Work by SCCS
  • Prototyping at SCCS
  • Close collaboration with key MySQL developers
  • Working closely with world-class database gurus
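
  One of the core prototyping problems is spreading a petascale sky catalog across many database nodes. The sketch below is not the SCCS/LSST design, only an illustration of the kind of coarse spatial partitioning such a catalog needs so that a small-region query touches only a few nodes.

    # Illustration only (not the SCCS/LSST schema): map sky positions to coarse
    # chunks on a simple RA/Dec grid, so each chunk can live on its own database
    # node and a spatial query is routed to only the chunks it overlaps.
    def chunk_id(ra_deg, dec_deg, chunks_per_axis=100):
        i = int((ra_deg % 360.0) / 360.0 * chunks_per_axis)
        j = min(chunks_per_axis - 1,
                int((dec_deg + 90.0) / 180.0 * chunks_per_axis))
        return j * chunks_per_axis + i

    # Example: a source at RA = 150.1 deg, Dec = +2.2 deg
    print(chunk_id(150.1, 2.2))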

  20. Research - Geant4
  • A toolkit for simulating elementary particles passing through and interacting with matter, and for modeling the detector apparatus that measures the passage of those particles and records the energy and dose deposited
  • Geant4 is developed and maintained by an international collaboration
  • SLAC is the second-largest Geant4 center, after CERN
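
  Geant4 itself is a C++ toolkit; the toy sketch below (plain Python, not Geant4's API) only illustrates the core loop it implements: stepping a particle through material, sampling an energy loss per step, and recording the deposited energy.

    # Toy 1-D illustration of the stepping loop Geant4 implements -- not
    # Geant4's API. Sample an energy loss per step with crude fluctuations
    # and record the deposits until the particle stops.
    import random

    def transport(energy_mev, step_cm=0.1, dedx_mev_per_cm=2.0):
        deposits = []
        while energy_mev > 0:
            loss = max(0.01, random.gauss(dedx_mev_per_cm, 0.3)) * step_cm
            dep = min(energy_mev, loss)
            deposits.append(dep)
            energy_mev -= dep
        return deposits

    hits = transport(100.0)  # a 100 MeV particle
    print(f"{len(hits)} steps, {sum(hits):.1f} MeV deposited")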

  21. Acknowledgements
  • Richard Mount, Director, SCCS
  • Chuck Boeheim, SCCS
  • Randall Melen, SCCS

  22. WWW 2008, April 21-25, 2008, Beijing, China

  23. Host Institution and Partners
  • Beihang University
    • School of Computer Science
  • Tsinghua University, Peking University, Chinese Academy of Sciences, …
  • Microsoft Research Asia
  • City Government of Beijing (pending)

  24. BICC: Beijing International Convention Center

  25. Key Personnel
  • General Chairs:
    • Jinpeng Huai, Beihang University
    • Robin Chen, AT&T Labs
  • Conference Vice Chair:
    • Yunhao Liu, HKUST
  • Local volunteers:
    • 6-10 grad students led by Dr. Zongxia Du
    • In cooperation with John Miller (TBD)
  • IW3C2 Liaison: Ivan Herman
  • PCO: two candidates under consideration

  26. Local Organizing Committee
  • Vincent Shen, The Hong Kong University of Science and Technology
  • Zhongzhi Shi, Chinese Academy of Sciences
  • Hong Mei, Peking University
  • Dianfu Ma, Beihang University
  • Guangwen Yang, Tsinghua University
  • Hsiao-Wuen Hon, Microsoft Research Asia
  • Minglu Li, Shanghai Jiao Tong University
  • Hai Jin, Huazhong University of Science and Technology
  • … and Chinese Internet/Software/Telecom companies
