IBM Platform LSF Industry-leading workload management

Presentation Transcript

  1. IBM Platform LSF: Industry-leading workload management

  2. IBM Platform Computing: Leader in cluster, grid and HPC cloud management software • Acquired by IBM in 2012 as part of mainstream Technical Computing strategy • 20-year history delivering leading workload and resource management software for high-performance application environments • 2500+ global customers including 23 of 30 largest enterprises • Market-leading scheduling engine with high performance, mission-critical reliability and extreme scalability • Comprehensive capability footprint from ready-to-deploy complete cluster systems to large global grids • Heterogeneous systems support • Large ISV and global partner ecosystem • Global professional services, training and support coverage. Callouts: De facto standard for commercial HPC • 60% of top Financial Services • Over 5 million CPUs under management

  3. Businesses Need to Overcome Infrastructure Limitations: To maximize the value of compute and data-intensive applications. Application examples: • Simulation • Analysis • Design • Big data. Today: Application A (Packaged ISV), Application B (Custom, MPI) and Application C (Big Data) each run on separate infrastructure • IT constrained • Long wait times • Low utilization • IT sprawl • Repeated for many applications and groups. Future: applications share Platform Computing resources • Clusters • Grid • HPC Cloud. Benefits: • High utilization • Throughput • Performance • Prioritization • Reduced cost. Faster time to results; use fewer resources.

  4. IBM Platform Computing Accelerates Business Results: For technical computing and analytics distributed computing environments. Optimizes Workload Management: • Batch and highly parallelized • Policy & resource-aware scheduling • Service level agreements • Automation / workflow. Aggregates Resource Pools: • Compute & data intensive apps • Heterogeneous resources • Physical, virtual, cloud • Easy user access. Delivers Shared Services: • Multiple user groups, sites • Multiple applications and workloads • Governance (intelligent policies) • Administration / Reporting / Analytics. Transforms Static Infrastructure to Dynamic: • Workload-driven dynamic clusters • Bursting and “in the cloud” • Enhanced self-service / on-demand • Multi-hypervisor and multi-boot

  5. IBM Platform Computing Offerings: Powerful. Comprehensive. Intuitive. Technical Computing - High Performance Computing: Platform LSF Family (scalable, comprehensive workload management suite for demanding, mission-critical heterogeneous environments) and Platform HPC (simplified, integrated, purpose-built HPC management software bundled with systems). Analytics Infrastructure: Platform Symphony Family • High-throughput, low-latency compute and data intensive analytics applications • Highest performance & utilization • Complex computations (e.g., risk) • Big Data analytics via MapReduce • Extract Transform Load (ETL). Flexible Clusters: Platform Cluster Manager (provisioning and management of HPC clusters, including self-service creation and optimization of heterogeneous HPC clusters by multiple user groups)

  6. IBM Platform LSF Powerful, comprehensive high-performance technical computing workload management for demanding, distributed environments.

  7. Common HPC Customer Challenges Faster Time to Results Simplify Management Boost Productivity Maximize Efficiency

  8. Common HPC Customer Challenges

  9. IBM Platform LSF Product Family

  10. The Choice of Industry Leaders
     Electronics: • AMD • ARM • Broadcom • Cisco • Cypress • Freescale • Infineon • Hitachi • Nvidia • PMC Sierra • Qualcomm • Samsung • SanDisk • Sony • ST Micro • TI • Toshiba
     Financial Services: • Aviva • BNP • Citigroup • Fortis • HSBC • JPMC • LBBW • International Monetary Fund • Mass Mutual • Morgan Stanley • MUFG • Nomura • Prudential • Société Générale • Unicredit
     Industrial Mfg.: • Airbus • BAE Systems • Boeing • Bombardier • Bridgestone • Ericsson • Honda • General Electric • General Motors • Goodrich • Lockheed Martin • Northrop Grumman • Pratt & Whitney • RedBull Racing • Toyota • TaylorMade • Volkswagen
     Oil & Gas: • Agip • BP • British Gas • ConocoPhillips • EMGS • Gaz de France • Hess • Kuwait Oil • Nexen • Petrobras • PetroChina • StatoilHydro • Suncor • Total • Woodside
     Gov & Edu: • Atomic Energy Canada • Broad Institute • CERN • DE Shaw • Harvard Medical • Harvard Business • NC State • NEON • Sanger Institute • St Jude Children • Stanford • Tufts University • U of Miami • U. of Oklahoma • UNC • Washington U.
     Life Sciences: • Abbott Labs • Amgen • AppliedBio/LifeTech • AstraZeneca • Covance • Covidien • DuPont • Forest Labs • GenomeQuest • Genentech • GlaxoSmithKline • Health Dialog • Monsanto • Partners Health • Pfizer • Pioneer Hi-Bred • Sanofi-Aventis
     Other Industries: • Walmart • Bell Canada • AT&T • IRI • Telecom Italia • Telefonica

  11. IBM Platform LSF: The HPC Workload Management Standard

  12. Integrated Product Suite • Minimize internal integration & development costs • Reduce operational risk & risk of obsolescence • Deploy solutions more quickly & efficiently • Focus on core competencies

  13. Intelligent Workload Scheduling to Clusters, Grids and Clouds: Regardless of where the work is submitted from, the IBM Platform LSF workload manager schedules and dispatches it to the most eligible compute node. [Diagram: stages Model Creation, Job Submission, Workload Management (Job Scheduling Master) and Application Execution; work is submitted from client-side desktops through (1) scripts and open/community source codes, (2) integrated applications and (3) a web portal, and runs on the HPC cluster or on remote clusters.]
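
The idea of dispatching work to the "most eligible" node can be illustrated with a minimal sketch. This is not how Platform LSF selects hosts; the `Host` fields, the `pick_host` helper and the single load index are all assumptions made for illustration. The sketch filters out hosts that cannot satisfy a job's resource request and then picks the least loaded of the rest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    # Hypothetical host record; a real scheduler tracks many more attributes.
    name: str
    free_cores: int
    free_mem_gb: float
    load: float  # hypothetical load index; lower means less busy


def pick_host(hosts: List[Host], cores: int, mem_gb: float) -> Optional[Host]:
    """Return the least-loaded host that satisfies the request, or None."""
    # Step 1: keep only hosts with enough free cores and memory.
    eligible = [
        h for h in hosts
        if h.free_cores >= cores and h.free_mem_gb >= mem_gb
    ]
    if not eligible:
        return None  # nothing fits right now, so the job would stay pending
    # Step 2: among eligible hosts, prefer the one with the lowest load.
    return min(eligible, key=lambda h: h.load)


hosts = [
    Host("node01", free_cores=2, free_mem_gb=64.0, load=0.1),
    Host("node02", free_cores=16, free_mem_gb=32.0, load=0.7),
    Host("node03", free_cores=8, free_mem_gb=16.0, load=0.3),
]
chosen = pick_host(hosts, cores=4, mem_gb=8.0)
print(chosen.name if chosen else "job stays pending")
```

Separating the eligibility filter from the ranking step keeps the two concerns independent: resource requirements decide which hosts may run the job, while a policy (here, lowest load) decides which of them should.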

  14. Advanced Features • Heterogeneous platform support • Supports traditional as well as private & public cloud infrastructures • Comprehensive, intelligent scheduling policies • GPU-aware scheduling • Advanced self-management • Extensible security • Always-on availability • Delegated administration • Resizable jobs • Live cluster re-configuration • Complete API, web-services interfaces • Ease-of-use features • and much more …

  15. Intelligent, Policy-driven Scheduling Features: Advanced scheduling features are the key to maximizing efficiency and productivity while minimizing cost. • Fairshare scheduling • Topology & core-aware scheduling • Preemption • Backfill scheduling • Resource reservations • Serial/Parallel controls • Advanced reservation • Job starvation • License scheduling • SLA based scheduling • Absolute priority scheduling • Checkpoint / resume • Job arrays • GPU-aware scheduling • Plug-in schedulers

  16. Multi-Dimensional Scalability * Scalability is dependent on a variety of factors, including type of workload (e.g. parallel vs. sequential), and product edition

  17. Product Functionality Scales to Meet Your Evolving Requirements. Advanced Edition: Architecture to support extreme scalability and throughput (100k+ cores & concurrent jobs). Standard Edition: Full-featured version supporting a wide range of scheduling policies and job management functions. Express Edition: Entry-level product for those with simple scheduling requirements and small clusters.

  18. Global Scalability: Manage your assets as a single, virtual computer from anywhere in the world. • Transparently combine multiple Platform LSF clusters into a single grid without costly & complex meta-scheduler implementations. • Flexible, hierarchical, peer-to-peer sharing models • Local administrators retain control, while sharing assets. [Diagram: Clusters A, B, C, D and E joined into a single grid.]

  19. Tangible Business Value • Do the same work in less time • Do the same work with less hardware • Do more work on the same hardware. Outcomes: reduced cost, better productivity, faster time to market. Not just in theory, but in practice! [Chart: productivity (number of simulations, verifications and analyses) against time to market, with Platform LSF and without LSF.]

  20. Balancing the Needs of Users and Administrators. Users need: • Intuitive interfaces • Remote application access • Control over jobs & data • Access to resources when needed • Fast job submission & turnaround • Reporting and statistics. Administrators need: • Sophisticated management controls • Better cluster automation • Better diagnosability • Tools to reduce support requests • Tools to improve utilization and service levels while reducing cost

  21. Fully Certified and Supported • Commercially supported by IBM Platform Computing • Two decades of HPC experience • Significant investment in certification & validation • Professional Services delivers speedy installation and application integration • Training courses available to help you fully understand the HPC software and get full value from it • Support “from the source” • 24x7 global support • eSupport • Automated subscriptions for bulletins, patches & updates • Software upgrades available with paid professional support & maintenance service • Hardware and software support from the same source • IBM, the Technical Computing market leader

  22. Thank You

  23. IBM Platform LSF 9 Family • IBM Platform Application Center • IBM Platform Process Manager • IBM Platform License Scheduler • IBM Platform Session Scheduler • IBM Platform Dynamic Cluster • IBM Platform RTM • IBM Platform Analytics

  24. IBM Platform Application Center

  25. IBM Platform Application Center: The most powerful and comprehensive web-based portal for HPC application job submission and management • Access resources from any browser • Monitor cluster health • Integrated job submission & management • Customization of application forms & pages • Drive jobs, arrays & workflows from application forms • Fine-grained access control • Support for 2D/3D remote visualization • Integrated with Platform License Scheduler, Platform Process Manager and Platform Analytics

  26. Intuitive Application Interfaces: Guided, self-documenting interfaces boost productivity, reduce training and lower support costs. Integrated console boosts productivity, enabling access to multiple interactive applications through the browser.

  27. Monitor and Manage Jobs and Data: Users can monitor and manage jobs from any device with a browser. Upload local data or access sharable server-side repositories and improve collaboration. Proactive notification of job status changes makes application users more efficient.

  28. Customizable Interface Builder: Extensible set of built-in application integrations, selectively sharable by user or group. Intuitive drag-and-drop interface builder enables forms to be easily tailored. Use native field types or plug in your own, with a simple, flexible scripting interface.

  29. Role-based Access Controls: Fine-grained access controls govern what users and groups can do on the cluster.

  30. Easily Integrate Custom Applications: Extend the Platform Application Center with sophisticated custom applications. Reliably trigger workflows behind the scenes to automate processes or manage efficient data movement. Simulation results are presented through the portal with files available in one or more repositories.

  31. Integrated Reporting: Extensive library of built-in, relevant reports related to resource usage and jobs. Access reporting and analysis functions directly through the Platform Application Center.

  32. Summary. Benefits for Users: • Increase productivity • Enable collaboration • Reduce helpdesk calls • Minimize user errors. Benefits for Administrators: • Simplify application integration • Powerful HPC management • Reduce training and support • Improve resource security • Scalable for large HPC datacenters

  33. IBM Platform Process Manager

  34. IBM Platform Process Manager: Sophisticated flow logic, sub-flows, alarms and scriptable interfaces improve process reliability & dramatically reduce administrator workload.

  35. A Powerful, Flexible Interface for Designing & Executing HPC Workflows • Enables automation of workflows over a distributed infrastructure. • Integrated environment to design, publish and manage flows. • Reliable, with rich conditional logic and error handling: inspect variables and roll back to the point of resume. • Modular management: sub-flows dynamically updated, sub-flow version control. • Built for IBM Platform LSF environments. • Best practices built in.

  36. Key Components. Platform Process Manager Flow Editor: • Intuitive drag-and-drop interface • Creates self-documenting flows • Support for sub-flows, job arrays • Rich error-handling / retry capability • Save workflows in XML format • Publish flows directly to Flow Manager. Platform Process Manager Flow Manager: • Manages multiple flows for multiple users and groups simultaneously • Monitor workflow execution graphically • Trigger flows automatically through calendar events, the Flow Manager or the command line.

  37. Key Benefits • Fully visual environment, no programming required • Enables capture of repeatable best-practices • Quickly design and deploy complex workflows • Run HPC processes faster & more reliably • Scale seamlessly • Dramatically reduce administration

  38. IBM Platform License Scheduler

  39. IBM Platform License Scheduler: Intelligent, policy-driven license sharing for IBM Platform LSF environments • Share application licenses according to policy • Ensure licenses are allocated to critical projects • Establish cross-functional license sharing policies • Enforce license ownership with optional pre-emption • Workloads “pend” while awaiting license resources • Support for “WAN-able” licenses • Tightly integrated with FlexNet Manager • Integrated with Platform RTM and Platform Analytics for sophisticated analysis • Correlate license use with users, projects and design centers

  40. “WAN-able” Policy-driven License Optimization: Allocate licenses between clusters or based on local sharing policies. License allocations flex based on supply & demand. Example: Cluster A has a target of 70% of VCS licenses and Cluster B a target of 30%. Allocation seeks to meet this target policy, and can flex when a cluster is not using its allocation. [Diagram: Flex Servers A, B and C serve the license features HSIM, VCS, NCSIM, Specman, Calibre and ModelSim to Projects Alpha, Beta and Charlie running in Clusters A and B.]
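
The 70% / 30% target with flexing can be worked through in a short sketch. This is an illustration of the policy as the slide describes it, not Platform License Scheduler's actual algorithm; the `allocate_licenses` function, its integer-percentage targets and the rule of handing spare tokens to the cluster with the largest target first are all assumptions.

```python
def allocate_licenses(total, targets_pct, demand):
    """Split `total` license tokens across clusters.

    targets_pct: cluster name -> target share as an integer percentage.
    demand:      cluster name -> number of tokens the cluster wants now.
    """
    # First pass: each cluster gets its target share, capped by its demand.
    alloc = {}
    for name, pct in targets_pct.items():
        entitled = total * pct // 100
        alloc[name] = min(entitled, demand.get(name, 0))

    # Second pass ("flex"): tokens left unused by one cluster are offered
    # to clusters with unmet demand, largest target share first.
    spare = total - sum(alloc.values())
    for name in sorted(targets_pct, key=targets_pct.get, reverse=True):
        if spare <= 0:
            break
        extra = min(spare, demand.get(name, 0) - alloc[name])
        alloc[name] += extra
        spare -= extra
    return alloc


targets = {"cluster_a": 70, "cluster_b": 30}

# Both clusters are busy: the split follows the target policy.
busy = allocate_licenses(10, targets, {"cluster_a": 10, "cluster_b": 10})

# Cluster A is mostly idle: Cluster B borrows A's unused tokens.
flexed = allocate_licenses(10, targets, {"cluster_a": 2, "cluster_b": 10})

print(busy, flexed)
```

With 10 VCS tokens, the busy case yields 7 for Cluster A and 3 for Cluster B, while the flexed case yields 2 and 8: Cluster B exceeds its 30% target only because Cluster A is not using its allocation.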

  41. Multiple Scheduling Models. Cluster mode (across clusters): license token allocation is static or dynamic, independent of the scheduling cycle. Project mode (within clusters): license token allocation is based on project demand each scheduling cycle. [Diagram: Cluster 1 and Cluster 2, each running Project 1 and Project 2, shown once for each mode.]

  42. Robust, Easy-to-use Monitoring: Monitor license use by feature, job, project or cluster. Monitor license availability across multiple FlexLM servers and service domains. Graphically monitor jobs that check out multiple license features and visually identify bottlenecks. Monitor license usage in real time (including non-LSF usage).

  43. Key Benefits • Significantly improved license utilization • Designed for extensibility • Improve service levels • Increase user productivity • Improve visibility to license usage • Get maximum return from software license investments

  44. IBM Platform Session Scheduler

  45. Scheduling Large Job Volumes • Submit large volumes of jobs as a single job • Higher throughput / lower latency • Superior management of related tasks • Supports > 50,000 tasks per user • Two-tier scheduling preserves existing job semantics • Particularly beneficial for large volumes of short-duration jobs • Example: # bsub -n 100 ssched -task infile • Syntax similar to job arrays • Run extremely large numbers of tasks without impacting the scheduler • Supports up to 1,000 simultaneous session schedulers. [Diagram: the LSF Scheduler dispatching to multiple ssched instances.]
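
The two-tier idea can be sketched conceptually: the cluster scheduler grants a fixed number of slots once, and a lightweight session-level scheduler then feeds many short tasks through those slots without going back to the cluster scheduler for each one. The sketch below is not the `ssched` implementation and uses no LSF API; `run_session` is a hypothetical helper and a local thread pool stands in for the allocated slots.

```python
from concurrent.futures import ThreadPoolExecutor


def run_session(tasks, slots):
    """Run every callable in `tasks` using at most `slots` workers.

    The outer scheduler only has to grant `slots` once; the queueing of
    individual tasks happens inside the session (the second tier).
    """
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # map() preserves the input order of the results.
        return list(pool.map(lambda task: task(), tasks))


# 1,000 short tasks submitted as one session holding 4 slots.
tasks = [lambda i=i: i * i for i in range(1000)]
results = run_session(tasks, slots=4)
print(len(results))
```

The design point is that per-task overhead is paid inside the session rather than in the cluster scheduler, which is why this approach suits large volumes of short-duration work.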

  46. IBM Platform Dynamic Cluster

  47. IBM Platform Dynamic Cluster: Manages & allocates infrastructure dynamically • Workload-driven dynamic node re-provisioning • Dynamically switch nodes between physical & virtual machines • Automated VM live migration and checkpoint / restart • Flexible policy controls • Smart performance controls • Automated provisioning of VM templates based on pending job requirements

  48. IBM Platform Dynamic Cluster. Static LSF Environment: • Resources are static in number and type • Jobs are locked onto hardware once running • VM definitions are static and immobile • Overall utilization & throughput is not optimized. Dynamic Cluster Enabled (LSF 9.x): • Resources are dynamically allocated for demand • Jobs can float between hardware resources • VMs start, stop, move, expand, shrink automatically • Overall utilization & throughput is optimized. [Diagram: LSF over fixed virtual and physical machines, compared with LSF plus Dynamic Cluster (DC) and physical provisioning over virtual and physical machines.]

  49. Platform Dynamic Cluster Complements Platform Cluster Manager IBM Platform Dynamic Cluster + Platform Cluster Manager = Dynamic, Flexible Cloud Environment

  50. IBM Platform Dynamic Cluster: Transform static, low-utilization clusters into highly dynamic, shared HPC Cloud resources • Optimize resource utilization • Maximize throughput, reduce time to results • Eliminate costly, inflexible silos • Increase reliability of critical workload • Maintain maximum performance • Improve user and administrator productivity • Increase automation, decrease manual effort