Research Computing

Research Computing
University of South Florida
Providing Advanced Computing Resources for Research and Instruction through Collaboration



Mission

Provide advanced computing resources required by a major research university:

Software

Hardware

Training

Support


User Base

40 Research groups

6 Colleges

100 faculty

300 students



The system was built on the condominium model and consists of 300 nodes (2,400 processors)

University provides infrastructure and some computational resources

Faculty funding provides bulk of computational resources



Over 50 scientific codes






Support Personnel

Provide all systems administration

Software support

One-on-one consulting

System efficiency improvements

Users are no longer just the traditional “number crunchers”


Current Projects

Consolidating the last standalone cluster (of appreciable size)

Advanced Visualization Center

A group of 19 faculty applied for funding



Large high-resolution 3D display


Current Projects

New computational resources

Approximately 100 nodes

GPU resources

Upgrade parallel file system

Virtual Clusters

HPC for the other 90%



Florida State University's Shared HPC

Building and Maintaining Sustainable Research Computing at FSU


Shared-FSU HPC Mission

  • Support multidisciplinary research

  • Provide a general access computing platform

  • Encourage cost sharing by departments with dedicated computing needs

  • Provide a broad base of support and training opportunities


Turn-key Research Solution: Participation is Voluntary

  • University provides staffing

  • University provides general infrastructure

    • Network fabrics

    • Racks

    • Power/Cooling

  • Additional buy-in incentives

    • Leverage better pricing as a group

    • Matching funds

  • Offer highly flexible buy-in options

    • Hardware purchase only

    • Short-term Service Level Agreements

    • Long-term Service Level Agreements

  • Aim for 50% of hardware costs covered by buy-in


Research Support @ FSU

  • 500 plus users

  • 33 Academic Units

  • 5 Colleges


HPC Owner Groups

  • 2007

    • Department of Scientific Computing

    • Center for Ocean-Atmosphere Prediction Studies

    • Department of Meteorology

  • 2008

    • Gunzburger Group (Applied Mathematics)

    • Taylor Group (Structural Biology)

    • Department of Scientific Computing

    • Kostov Group (Chemical & Biomedical Engineering)

  • 2009

    • Department of Physics (HEP, Nuclear, etc.)

    • Institute of Molecular Biophysics

    • Bruschweiler Group (National High Magnetic Field Laboratory)

    • Center for Ocean-Atmosphere Prediction Studies (with the Department of Oceanography)

    • Torrey Pines Institute of Molecular Studies

  • 2010

    • Chella Group (Chemical Engineering)

    • Torrey Pines Institute of Molecular Studies

    • Yang Group (Institute of Molecular Biophysics)

    • Meteorology Department

    • Bruschweiler Group

    • Fajer Group (Institute of Molecular Biophysics)

    • Bass Group (Biology)


Research Support @ FSU

  • Publications

    • Macromolecules

    • Bioinformatics

    • Systematic Biology

    • Journal of Biogeography

    • Journal of Applied Remote Sensing

    • Journal of Chemical Theory and Computation

    • Physical Review Letters

    • Journal of Physical Chemistry

    • Proceedings of the National Academy of Sciences

    • Biophysical Journal


    • PLoS Pathogens

    • Journal of Virology

    • Journal of the American Chemical Society

    • The Journal of Chemical Physics

    • PLoS Biology

    • Ocean Modelling

    • Journal of Computer-Aided Molecular Design


FSU’s Shared-HPC, Stage 1: InfiniBand-Connected Cluster

[Diagram: InfiniBand-connected cluster in the Sliger Data Center]


FSU’s Shared-HPC, Stage 2: Alternative Backfilling

[Diagram: backfilling between the DSL Building and the Sliger Data Center]



Condor Usage

  • ~1000 processor cores available for single processor computations

  • 2,573,490 processor hours used since Condor was made available to all HPC users in September

  • Seven users have been using Condor from HPC

  • Dominant users are in Evolutionary Biology, Molecular Dynamics, and Statistics (the same users that were submitting numerous single-processor jobs)

  • Two workshops introducing it to HPC users
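
As a rough sketch of how such single-processor jobs reach the pool, a minimal HTCondor submit description might look like the following (the executable and file names here are hypothetical placeholders, not from the slides):

```
# hypothetical submit description; analyze.sh and input.dat are placeholders
universe     = vanilla
executable   = analyze.sh
arguments    = input.dat
output       = job.$(Process).out
error        = job.$(Process).err
log          = job.log
request_cpus = 1
queue 100
```

Submitting this with `condor_submit` would queue 100 identical single-core jobs, one per `$(Process)` index, which is the pattern suited to the backfilling workloads described above.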


FSU’s Shared-HPC, Stage 3: Scalable SMP

[Diagram: resources in the DSL Building and the Sliger Data Center]



FSU’s Shared-HPC, Stage 3: Scalable SMP

  • One Moab queue for SMP or very large-memory jobs

  • Three “nodes”

    • M905 blade with 16 cores and 64GB mem

    • M905 blade with 24 cores and 64GB mem

    • 3Leaf system with up to 132 cores and 528 GB mem
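
A job bound for such a queue would typically carry a Moab/TORQUE-style resource request; a sketch follows (the queue name, memory figure, and application name are illustrative assumptions, not taken from the slides):

```shell
#!/bin/bash
# illustrative submission script for a large-memory SMP queue
#PBS -q smp                # assumed queue name
#PBS -l nodes=1:ppn=16     # all cores on one SMP "node"
#PBS -l mem=60gb           # large shared-memory request
#PBS -l walltime=24:00:00

cd "$PBS_O_WORKDIR"
./large_memory_app         # placeholder application
```

Such a script would be submitted with `msub` (Moab) or `qsub` (TORQUE), and the scheduler would route it to whichever of the three SMP nodes satisfies the core and memory request.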

[Diagram: Condor in the DSL Building; clusters in the Sliger and DSL Data Centers]




Interactive Cluster Functions

  • Facilitates data exploration

  • Provides venue for software not well suited for a batch scheduled environment

    • (e.g., some MATLAB, VMD, R, Python, etc.)

  • Provides access to hardware not typically found on standard desktops/laptops/mobile devices (e.g., lots of memory, high-end GPUs)

  • Provides licensing and configuration support for software applications and libraries


Interactive Cluster Hardware Layout

  • 8 high-end CPU based host nodes

    • Multi-core Intel or AMD processors

    • 4 to 8 GB of memory per core

    • 16X PCIe connectivity

    • QDR IB connectivity to Lustre storage

    • IP (read-only) connectivity to Panasas

    • 10 Gbps connectivity to campus network backbone

  • One C410x external PCI chassis

    • Compact

    • IPMI management

    • Supports up to 16 NVIDIA Tesla M2050 GPUs

      • Up to 16.48 teraflops

[Diagram: Condor in the DSL Building; clusters in the Sliger and DSL Data Centers]





Web/Database Hardware Function

  • Facilitates creation of Data analysis Pipelines/Workflows

  • Favored by external funding agencies

    • Demonstrated cohesive Cyberinfrastructure

    • Fits well into required Data Management Plans (NSF)

  • Intended to facilitate access to data on Secondary storage or cycles on owner share of HPC

  • Basic software install; no development support

  • Bare Metal or VM


Web/Database Hardware Examples



FSU Research CI



[Diagram: FSU research cyberinfrastructure, including DB and web services plus visualization and interactive resources]



Florida State University's Shared HPC

  • Universities are by design multifaceted and lack a singular focus of support

  • Local HPC resources should also be multifaceted and have a broad base of support


University of Florida

HPC Center

HPC Summit


Short History

Started in 2003

2004 Phase I: CLAS – Avery – OIT

2005 Phase IIb: COE – 9 investors

2007 Phase IIb: COE – 3 investors

2009 Phase III: DSR – 17 investors – ICBR – IFAS

2011 Phase IV: 22 investors



Budget

Total budget ($6.9 M over seven years):

2003–2004: $0.7 M
2004–2005: $1.8 M
2005–2006: $0.3 M
2006–2007: $1.2 M
2007–2008: $1.6 M
2008–2009: $0.4 M
2009–2010: $0.9 M



Hardware

4,500 cores

500 TB storage

InfiniBand connected

In three machine rooms

Connected by 20 Gbit/sec Campus Research Network



System Software

RedHat Enterprise Linux
  • via the free CentOS distribution
  • upgraded once per year

Lustre file system
  • mounted on all nodes
  • scratch only

Backup provided through the CNS service
  • requires a separate agreement between the researcher and CNS



Other Software

Moab scheduler (commercial license)

Intel compilers (commercial license)

Numerous applications

Open and commercial



Operation

Shared cluster (plus some hosted systems)

300 users

90–95% utilization



Investor Model

Normalized Computing Unit (NCU)
  • $400 per NCU
  • one core in a fully functional system (RAM, disk, shared file system)
  • for 5 years



Investor Model

Optional Storage Unit (OSU)
  • $140 per OSU
  • 1 TB of file storage (RAID) on one of a few global parallel file systems (Lustre)
  • for 1 year



Other Options

Hosted system
  • buy all hardware, we operate
  • no sharing

Pay as you go
  • agree to pay a monthly bill
  • equivalent (almost) to the $400 NCU prorated on a monthly basis
  • rate is about $0.009 per core-hour
  • cheaper than Amazon Elastic Compute Cloud (EC2)



Mission Statement

  • The UM CCS is establishing nationally and internationally recognized research programs, focusing on those of an interdisciplinary nature, and actively engaging in computational research to solve the complex technological problems of modern society. We provide a framework for promoting collaborative and multidisciplinary activities across the University and beyond.


CCS Overview

  • Started in June 2007

  • Faculty Senate approval in 2008

  • Four Founding Schools: A&S, CoE, RSMAS, Medical

  • Offices across all campuses

  • ~30 FTEs

  • Data Center at the NAP of the Americas


UM CCS Research Programs and Cores

  • Physical Science

  • Data Mining

  • Computational Biology

  • Social Systems Informatics

  • High Performance Computing

Quick Facts

  • Over 1,000 UM users

  • 5,200 cores of Linux Based Cluster

  • 1,500 cores of Power-based Cluster

  • ~2.0 PB of storage

  • 4.0 PB of backup

  • More at:





High Performance Computing

  • A UM-wide resource providing the academic community & research partners with comprehensive HPC resources:

    • Hardware & Scientific Software Infrastructure

    • Expertise in Designing & Implementing HPC Solutions

    • Designing & Porting Algorithms & Programs to Parallel Computing Models

  • Open access to compute processing (first come, first served)

    • Peer Review for large projects – Allocation Committee

    • Cost Center for priority access

  • HPC services

    • Storage Cloud

    • Visualization and Data Analysis Cloud

    • Processing Cloud