
Midrange Computing

DOE Perspective on Cyberinfrastructure - LBNL
Gary Jung, Manager, High Performance Computing Services, Lawrence Berkeley National Laboratory
Educause CCI Working Group Meeting, November 5, 2009


Presentation Transcript


  1. DOE Perspective on Cyberinfrastructure - LBNL
  Gary Jung
  Manager, High Performance Computing Services
  Lawrence Berkeley National Laboratory
  Educause CCI Working Group Meeting
  November 5, 2009

  2. Midrange Computing
  • DOE ASCR hosted a workshop in October 2008 to assess the role of midrange computing in the Office of Science; the workshop found that this class of computation plays an increasingly important role in enabling Office of Science research.
  • Although it is not part of ASCR's mission, midrange computing and the associated data management play a vital and growing role in advancing science in disciplines where capacity is as important as capability.
  • Demand for midrange computing services is:
    • growing rapidly at many sites (>30% growth annually at LBNL)
    • the direct expression of a broad scientific need
  • Midrange computing is a necessary adjunct to leadership-class facilities.

  3. Berkeley Lab Computing
  • Gap between the desktop and the National Centers
  • Midrange Computing Working Group formed in 2001
  • Cluster support program started in 2002
  • Services for PI-owned clusters include: pre-purchase consulting, development of specs and RFPs, facilities planning, installation and configuration, ongoing cluster support, user services consulting, cybersecurity, and computer room colocation
  • Currently 32 clusters in production, with over 1,400 nodes and 6,500 processor cores
  • Funding: the institution covers infrastructure costs and technical development; researchers pay for the cluster and the incremental cost of support (see the cost sketch below)
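The funding split above can be read as a simple cost model: the institution absorbs infrastructure and development as overhead, while a PI pays for the cluster purchase plus the incremental cost of supporting it. A minimal sketch of that split follows; every dollar figure and parameter name is a placeholder invented for illustration, not an actual LBNL rate.

```python
# Minimal sketch of the cost split described on the slide: the institution
# covers infrastructure and technical development, while the researcher pays
# for the cluster hardware plus the incremental cost of its support.
# All dollar amounts below are invented placeholders, not LBNL rates.

def researcher_cost(cluster_hw: float, support_per_node: float, nodes: int) -> float:
    """What the PI pays: hardware plus incremental support for their nodes."""
    return cluster_hw + support_per_node * nodes

def institution_cost(infrastructure: float, tech_development: float) -> float:
    """What the institution absorbs as overhead."""
    return infrastructure + tech_development

print(researcher_cost(cluster_hw=120_000, support_per_node=300, nodes=40))   # 132000
print(institution_cost(infrastructure=80_000, tech_development=50_000))      # 130000
```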

  4. Cluster Support Phase II: Perceus Metacluster
  • All clusters interconnected into a shared cluster infrastructure
  • Permits sharing of resources and storage
    • Global home file system
  • One 'super master' node used to boot nodes across all clusters
    • Multiple system images supported
  • One master job scheduler submitting to all clusters
  • Simplifies provisioning of new systems and ongoing support
  • Metacluster model made possible by Perceus software
    • Successor to Warewulf (http://www.perceus.org)
    • Can run jobs across clusters, recapturing stranded capacity (conceptual placement sketch below)
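The "stranded capacity" point is the key payoff of the metacluster: a single submit host can place work on whichever member cluster has idle cores. The sketch below is only a conceptual illustration of that placement idea, assuming a greedy "most free cores" rule; it is not the actual Perceus software or the production scheduler's policy, and the cluster names and sizes are made up.

```python
# Conceptual sketch (not the actual Perceus or scheduler configuration):
# a single submit host choosing among member clusters of a metacluster
# so that idle ("stranded") cores on any cluster can absorb waiting jobs.

from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    total_cores: int
    busy_cores: int

    @property
    def free_cores(self) -> int:
        return self.total_cores - self.busy_cores

def place_job(clusters: list[Cluster], cores_needed: int) -> str | None:
    """Return the cluster with the most free cores that can still fit the
    job, or None if no member cluster has room."""
    candidates = [c for c in clusters if c.free_cores >= cores_needed]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.free_cores)
    best.busy_cores += cores_needed
    return best.name

# Illustrative member clusters only; names and sizes are invented.
metacluster = [
    Cluster("pi_cluster_a", total_cores=256, busy_cores=240),
    Cluster("pi_cluster_b", total_cores=512, busy_cores=100),
    Cluster("lawrencium",   total_cores=1584, busy_cores=1500),
]

print(place_job(metacluster, cores_needed=64))  # -> "pi_cluster_b"
```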

  5. November 5, 2009

  6. Laboratory-Wide Cluster - Drivers
  "Computation lets us understand everything we do." – LBNL Acting Lab Director Paul Alivisatos
  • 38% of scientists depend on cluster computing for their research.
  • 69% of scientists are interested in cycles on a Lab-owned cluster.
    • Early-career scientists are twice as likely to be 'very interested' as their later-career peers.
  • Why do scientists at LBNL need midrange computing resources?
    • 'On-ramp' activities in preparation for running at supercomputing centers (development, debugging, benchmarking, optimization)
    • Scientific inquiry not connected with 'on-ramp' activities

  7. Laboratory-Wide Cluster "Lawrencium"
  • Overhead-funded program
    • Capital equipment dollars shifted from business computing
    • Overhead-funded staffing: 2 FTE
  • In production in Fall 2008
  • General-purpose Linux cluster suitable for a wide range of applications
    • 198 nodes, 1,584 cores, DDR InfiniBand interconnect (see the quick arithmetic below)
    • 40 TB NFS home directory storage; 100 TB Lustre parallel scratch
    • Commercial job scheduler and banking system
    • #500 on the November 2008 Top500 list
  • Open to all LBNL PIs and collaborators on their projects
  • Users are required to complete a survey when applying for accounts and later provide feedback on science results
  • No user allocations at this time; this has been successful to date
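The hardware figures above imply eight cores per node; the snippet below simply makes that arithmetic and the user-visible storage total explicit, using only the numbers given on the slide.

```python
# Quick arithmetic from the slide's figures (no assumptions beyond the slide).
nodes = 198
cores = 1584
print(cores / nodes)            # 8.0 cores per node

home_tb, scratch_tb = 40, 100   # NFS home + Lustre scratch
print(home_tb + scratch_tb)     # 140 TB of user-visible storage
```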

  8. Networking - LBLNet
  • Peers with ESnet at 10 GbE
  • 10 GbE at the core; moving to 10 GbE to the buildings
  • Goal is sustained high-speed data flows with cybersecurity
  • Network-based IDS approach: traffic is innocent until proven guilty
    • Reactive firewall (illustrated below)
    • Does not impede data flow; no stateful firewall
    • A Bro cluster allows the IDS to scale to 10 GbE
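"Innocent until proven guilty" means traffic flows unimpeded by default and blocks are inserted only after the IDS has flagged a source. The toy sketch below illustrates that reactive pattern in the abstract; it is not LBLNet's actual Bro policy or block mechanism, and the threshold, addresses, and alert names are invented.

```python
# Toy illustration of a "reactive firewall": traffic is allowed by default,
# and a block is added only after the IDS reports enough evidence against a
# source. The threshold, alert names, and block action are hypothetical;
# this is not LBLNet's Bro configuration.

from collections import Counter

BLOCK_THRESHOLD = 3          # hypothetical: alerts required before reacting
blocked: set[str] = set()
alert_counts: Counter[str] = Counter()

def handle_ids_alert(src_ip: str, signature: str) -> None:
    """Count IDS alerts per source and block once the threshold is crossed."""
    if src_ip in blocked:
        return
    alert_counts[src_ip] += 1
    print(f"IDS alert from {src_ip}: {signature}")
    if alert_counts[src_ip] >= BLOCK_THRESHOLD:
        blocked.add(src_ip)
        print(f"Reactively blocking {src_ip} (would push an ACL or null route)")

# Example alerts (made-up address and signature).
for _ in range(3):
    handle_ids_alert("198.51.100.7", "ssh-bruteforce")
```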

  9. Communications and Governance
  • General announcements at the IT Council
  • Steering committees used for scientific computing
    • Small group of stakeholders, technical experts, and decision makers
    • Helps to validate and communicate decisions
    • Accountability

  10. Challenges
  • Funding (past)
    • Difficult for IT to shift funding from other areas of computing to support for science
    • Recharge can constrain adoption; full cost recovery definitely will
  • New technology (ongoing)
  • Facilities (current)
    • Computer room is approaching capacity despite upgrades:
      • Environmental monitoring
      • Plenum in ceiling converted to a hot-air return
      • Tricks to boost underfloor pressure
      • Water-cooled doors
    • Underway:
      • DCiE measurement in process (see the sketch below)
      • Tower and heat exchanger replacement
      • Data center container investigation
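For readers unfamiliar with the metric: DCiE (Data Center infrastructure Efficiency) is IT equipment power divided by total facility power, i.e. the reciprocal of PUE. The sketch below shows the arithmetic with made-up power readings; the slides do not give the measured values for the LBNL computer room.

```python
# DCiE/PUE arithmetic with made-up power readings; the actual measured
# values for the LBNL computer room are not given in the slides.

def dcie(it_power_kw: float, facility_power_kw: float) -> float:
    """DCiE = IT equipment power / total facility power (a fraction, 0..1)."""
    return it_power_kw / facility_power_kw

it_kw, facility_kw = 500.0, 800.0                   # hypothetical readings
print(f"DCiE = {dcie(it_kw, facility_kw):.2f}")     # 0.62
print(f"PUE  = {facility_kw / it_kw:.2f}")          # 1.60 (the reciprocal)
```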

  11. Next Steps
  • Opportunities presented by cloud computing
    • Amazon investigation earlier this year; others ongoing
    • Latency-sensitive applications ran poorly, as expected
    • Performance depends on the specific use case
    • Data migration: the economics of storing vs. moving data (see the sketch below)
    • Certain LBNL factors favor the cost of building instead of buying
  • Large storage and computation for data analysis
  • GPU investigation
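The "storing vs. moving" question is a break-even comparison: keep the dataset resident in cloud storage, or re-transfer it for each analysis run. The sketch below shows the shape of that comparison only; the unit prices are placeholders, not 2009 Amazon rates or any measured LBNL costs.

```python
# Back-of-the-envelope comparison of storing a dataset in the cloud versus
# moving it in and out for each analysis run. All unit prices are
# placeholders, not actual 2009 (or current) cloud pricing.

def monthly_cost_store(tb: float, price_per_tb_month: float) -> float:
    """Keep the dataset resident in cloud storage."""
    return tb * price_per_tb_month

def monthly_cost_move(tb: float, runs_per_month: int, price_per_tb_moved: float) -> float:
    """Re-transfer the dataset for every analysis run instead of storing it."""
    return tb * runs_per_month * price_per_tb_moved

dataset_tb = 50
store = monthly_cost_store(dataset_tb, price_per_tb_month=150.0)   # hypothetical $/TB-month
move = monthly_cost_move(dataset_tb, runs_per_month=4,
                         price_per_tb_moved=100.0)                 # hypothetical $/TB moved
print(f"store: ${store:,.0f}/mo   move: ${move:,.0f}/mo")
```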

  12. Points of Collaboration
  • UC Berkeley HPCC
    • Recent high-profile joint projects between UCB and LBNL encourage close collaboration
    • 25-30% of scientists have a dual appointment
    • UC Berkeley's proximity to LBNL facilitates the use of cluster services
  • University of California Shared Research Computing Services pilot (SRCS)
    • LBNL and SDSC joint pilot for the ten UC campuses
    • Two 272-node clusters located at UC Berkeley and SDSC
    • Shared computing is more cost-effective
    • Dedicated CENIC L3 network connecting the sites for integration
    • Pilot consists of 24 research projects

  13. November 5, 2009
