
Summary of OpenClovis Training


Presentation Transcript


  1. Summary of OpenClovis Training
     Claude Saunders
     Global Design Effort

  2. Participants
     • Trainer: Ron Yorgason of OpenClovis
     • Trainees
       • From FNAL Linux Cluster Control and Monitoring Project
         • Jim Kowalkowski
         • Amitoj Singh
         • Nirmal Seenu
       • From FNAL ILC/NML and Instrumentation
         • Kevin Krause
         • Alexei Semenov
       • From ANL
         • Claude Saunders
         • Shifu Xu
     • We were trained on the not-yet-released new version of OpenClovis (was called 2.3, now called 3.0)

  3. Product Overview
     (Block diagram; labels in the order they appear)
     • High Availability
     • System Management
     • SNMP (sub-)agents
     • CLI
     • Checkpointing service
     • HA management
     • Managed object repository
     • Chassis management
     • Fault management
     • Group membership service
     • Alarm management
     • Event service
     • Name service
     • Developer friendly infrastructure
     • Communication infrastructure
     • Remote procedure calls
     • Inter-process communication
     • GUI based IDE
     • Code Generation
     • Sys and binary Logging
     • HA container
     • Debug CLI
     • Timers
     • Basic infrastructure
     • Debug Support
     • Memory management
     • OS abstraction
     • Hardware abstraction

  4. High Availability: Techniques to achieve it, and OpenClovis solutions
     • Offline Techniques
       • Engineering
       • Design
       • Discipline
       • Over design for spare capacity
     • Runtime Techniques: Infrastructure
       • Monitor the system
       • Recovery and repair
       • Services to offload application need to deal with failure
     • IDE automation results in fewer bugs and errors
       • MIB Import
       • Code Generation
     • Code base “hardened” in multiple environments
     • Professional design services
     • Platform Management
     • Alarm Management
     • Component Management
     • Availability Management Framework
     • Fault Management

  5. HA Modeling
     • SAF System Model made easy
     • 1:1 UML constructs
     • Wizards to automate and simplify the process
     • Pull-down menus to customize properties
     • Validation tools to ensure correctness of model
     (Figure: SAF HA Model)

  6. AMF: SAF HA Model Example
     (Diagram of one cluster; labels shown)
     • Nodes: U, V, W, X
     • Service groups: SG1, SG2
     • Service units: S1, S2, S3, S4, S5
     • Components: C1, C2, C3, C4, C5, C6, C7
     • Service instances: SI:A, SI:B, SI:C
     • Component service instances: CSI:A1, CSI:A2, CSI:B1, CSI:C1
     • Assignment roles: active, standby
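
The active/standby idea in this diagram can be sketched in a few lines of Python. This is only an illustration of the SAF concepts (service group, service unit, service instance) under a 2N-style assumption; it is not the OpenClovis or SAF API. The class names, the `assign` and `fail` methods, and the placement of S1 on Node U and S2 on Node V are all hypothetical, since the slide's actual mapping of service units to nodes cannot be recovered from the transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServiceUnit:
    name: str
    node: str            # hypothetical placement, not taken from the slide
    healthy: bool = True


@dataclass
class ServiceGroup:
    """Toy 2N-style group: each service instance has one active and one standby SU."""
    name: str
    units: List[ServiceUnit]
    # SI name -> {"active": SU name or None, "standby": SU name or None}
    assignments: Dict[str, Dict[str, Optional[str]]] = field(default_factory=dict)

    def _spare(self, exclude: Optional[str]) -> Optional[str]:
        """Return the first healthy SU other than `exclude`, or None."""
        for unit in self.units:
            if unit.healthy and unit.name != exclude:
                return unit.name
        return None

    def assign(self, si: str) -> None:
        """Give the service instance an active SU and, if possible, a standby SU."""
        active = self._spare(exclude=None)
        if active is None:
            raise RuntimeError(f"no healthy service unit in {self.name}")
        self.assignments[si] = {
            "active": active,
            "standby": self._spare(exclude=active),
        }

    def fail(self, su_name: str) -> None:
        """Mark an SU as failed and repair every assignment that used it."""
        for unit in self.units:
            if unit.name == su_name:
                unit.healthy = False
        for roles in self.assignments.values():
            if roles["active"] == su_name:
                # Failover: the standby takes over the active role.
                roles["active"] = roles["standby"]
                roles["standby"] = None
            elif roles["standby"] == su_name:
                roles["standby"] = None
            if roles["active"] is None:
                roles["active"] = self._spare(exclude=None)
            if roles["active"] is not None and roles["standby"] is None:
                roles["standby"] = self._spare(exclude=roles["active"])


# Usage: SG1 with two service units protecting service instance SI:A.
sg1 = ServiceGroup("SG1", [ServiceUnit("S1", "Node U"), ServiceUnit("S2", "Node V")])
sg1.assign("SI:A")
before = dict(sg1.assignments["SI:A"])
sg1.fail("S1")
after = dict(sg1.assignments["SI:A"])
print(before, "->", after)
```

After S1 fails, S2 is promoted to active and SI:A is left without a standby until another healthy service unit joins the group. A real AMF implementation also tracks components, component service instances and repair actions, which this sketch leaves out.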

  7. OpenClovis Release 2.2
     • High Availability
       • Availability Management (2N redundancy models)
       • Checkpointing Service
       • Asynchronous
       • Collocated
       • Fault Repair
       • Cluster Membership
     • Platform Management
       • Alarm Management
       • Provisioning Support
       • Pre-provisioning
       • HW proxies
       • Chassis Management
       • HPI
       • MIB support
     • Core Infrastructure
       • Component Management
       • Boot Management
       • Component health monitoring
     • Basic Infrastructure
       • Debug and Logging support
       • Transaction with two-phase commit and rollback
     • Platforms
       • Big-endian support
       • Mixed-endian support
       • Linux 2.6 based distributions
       • SMP
       • Itanium 64-bit support

  8. OpenClovis Release 2.3 Features
     • Overall
       • Upgradeability support
       • Integrated with Wind River tool chain
     • Platforms
       • SUN Netra ATCA Platform
       • Radisys Promentum 60x0 ATCA chassis and other chassis
       • Additional Linux 2.6 based distributions (WindRiver PNE Linux)
     • Basic Infrastructure
       • Intelligent memory management
       • Support for binary log streams
       • Runtime and Offline Log Viewers
     • Communication
       • TIPC migration
     • High Availability
       • Generalized Group Membership
       • N+M redundancy model
       • Node Management
     • Infrastructure
       • Arbitrarily nested managed objects (MOs)
       • Node/blade independent MOs
       • MO attribute access modes
       • Transient MO attributes
       • Run-time metadata retrieval
     • System Manageability
       • Integrated platform management
     • SDK / OpenClovis IDE
       • Improved work-flow support
       • Tool integration via XML files
       • Enhanced MIB import
       • Usability improvements
       • SNMP code auto-generation
       • Project model templates
       • Improved application code generation
       • SAF API Generation

  9. OpenClovis Release 3.0 – in planning
     • Beta in Q3 of 2008
     • Platform
       • Full AMC support
       • MicroTCA support
       • Additional Processors (e.g. Cavium)
       • Multi-Chassis support
       • Solaris Support (Collaboration with Sun)
     • Middleware Manageability
       • Run-time model configuration upgrade
       • Run-time north bound middleware configuration
     • Upgrade Services
       • Currently in-process in SAF
     • IDE
       • Increase code generation coverage
       • Tighter build integration
       • Target deploy and debug capability

  10. Summary
     • We were duly impressed with the depth and breadth of OpenClovis, despite the fact that:
       • Ron the trainer was
         • A) Not very good at teaching
         • B) Only familiar with parts of OpenClovis
       • A good part of what we got from training was a result of concentrated self-study and discussion (i.e. being locked in a room for 3 days).
       • Many questions remain.
     • Product behaved well.
       • One problem with the multi-failover lab.
     • Scalability is an open question. Hopefully the Linux Cluster project can test these limits (1000+ nodes).
     • A good portion of what we learned was about what the SAF specifications mean in practice.
       • This knowledge should be transferable to other SAF implementations if needed.
