
ILC Controls: High Availability Software




  1. ILC Controls: High Availability Software

  2. Outline
     • Opening comments
     • ILC software architecture refresher
     • The HA stack
     • Primary and management protocols
     • HPI (Hardware Platform Interface) summary
     • AIS (Application Interface Specification) summary
     • Bottom-up, are these a good fit?
       • HPI and HPI-ATCA
       • AIS
     • Conclusions
     • A proposed “stack” for ILC HA research
     • Tasks

  3. Opening Comments
     • Don’t build any critical-path software infrastructure without access to source code
     • HA software is a hard problem
     • SAF (Service Availability Forum) specifications are an impressive unification of known techniques
     • SAF implementations won’t “solve” the HA problem
       • You still have to determine what you want to do and encode it in the framework; this is where the work lies
         • What counts as a failure
         • How to identify failure
         • How to compensate (redundancy or reconfiguration or both)
     • How long until known reliable, SAF-compliant products come out?
       • Compare to the time between the OMG CORBA spec and good implementations…
     • Is the resultant software complexity manageable?
       • Potential fix worse than the problem

  4. Architecture Refresher

  5. SAF and ILC Controls Architecture
     [Diagram: three tiers, each reporting faults upwards]
     • Client Tier: GUI
     • Services Tier (middleware): AIS, Cluster Membership Service (CLM), containers holding objects and checkpoints
       • Crashed middleware container: escalate
     • Real-Time Tier: HPI, shelf with CPU1, CPU2, I/O 1, I/O 2, SM (Shelf Manager) and sensors
       • Hung task: escalate
       • Failed I/O card or power supply: fix locally (localization)
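
The escalation rules on this slide can be read as a small policy table. The following is only a sketch of that reading, not part of HPI or AIS: the fault labels, the `Tier` enum, and the `route` function are hypothetical names, and sending unknown faults to the top tier is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Tier(Enum):
    REAL_TIME = 1  # shelf level: CPUs, I/O cards, shelf manager
    SERVICES = 2   # middleware containers
    CLIENT = 3     # GUI / operator level


@dataclass(frozen=True)
class Fault:
    kind: str          # hypothetical fault label
    detected_at: Tier  # tier where the fault was observed


# Hypothetical policy: which tier should handle each kind of fault.
POLICY = {
    "io_card_failed": Tier.REAL_TIME,       # fix locally (localization)
    "power_supply_failed": Tier.REAL_TIME,  # fix locally (localization)
    "task_hung": Tier.SERVICES,             # report upwards
    "container_crashed": Tier.CLIENT,       # report upwards
}


def route(fault: Fault) -> Tuple[Tier, str]:
    """Return the handling tier and whether the fault stays local or escalates."""
    # Assumption: a fault with no policy entry goes to the top tier.
    handler = POLICY.get(fault.kind, Tier.CLIENT)
    action = "fix locally" if handler == fault.detected_at else "escalate"
    return handler, action
```

For example, `route(Fault("task_hung", Tier.REAL_TIME))` hands the fault to the Services Tier, while a failed I/O card seen at the Real-Time Tier stays there.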

  6. Primary and Management Protocols
     • How do they interact?
       • Primary connection management is informed by the management protocol
       • Specific actions are carried out over the primary protocol based on info from the management protocol
     [Diagram: Level N+1 and Level N linked by the Primary Controls Protocol and the HA Management Protocol, with “State Info” passed between them]
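
One way to picture this interaction is a connection manager at Level N+1 that never probes Level N itself: it only consumes state info delivered by the management protocol and uses it to decide which endpoint the primary protocol should talk to. This is a minimal sketch under that assumption; the class, method, and endpoint names are hypothetical and no real protocol is involved.

```python
from typing import Dict, List, Optional


class ConnectionManager:
    """Picks the Level N endpoint used by the primary protocol."""

    def __init__(self, endpoints: List[str]) -> None:
        # List order expresses preference: the first healthy endpoint wins.
        self._endpoints = list(endpoints)
        self._healthy: Dict[str, bool] = {e: True for e in self._endpoints}
        self.active: Optional[str] = self._pick()

    def _pick(self) -> Optional[str]:
        for endpoint in self._endpoints:
            if self._healthy[endpoint]:
                return endpoint
        return None  # nothing healthy: the primary protocol has no target

    def on_management_event(self, endpoint: str, healthy: bool) -> bool:
        """Apply state info from the management protocol.

        Returns True when the primary connection has to move.
        """
        if endpoint not in self._healthy:
            raise KeyError(f"unknown endpoint: {endpoint}")
        self._healthy[endpoint] = healthy
        new_active = self._pick()
        moved = new_active != self.active
        self.active = new_active
        return moved
```

With `ConnectionManager(["cpu1", "cpu2"])`, a management event marking `cpu1` unhealthy moves the primary connection to `cpu2`. Note that this sketch also fails back to `cpu1` as soon as it is reported healthy again, which may or may not be the desired policy.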

  7. HPI (Hardware Platform Interface) Summary
     • HPI subsumes IPMI (established), SNMP, and others
       • Sessions: client access to events
       • Domains manage Resources
       • Resources (RDR repository, SNMP OIDs) manage Entities
       • Entities: physical components
     • HPI passes info as IPMI packets over RMCP
     • HPI-ATCA
       • Exposes ATCA entities through HPI (hot swap LEDs, etc.)

  8. AIS (Application Interface Specification) Summary
     • C-code interface specification
       • No protocols or other language bindings given
     • AMF (Availability Management Framework): the tie that binds
       • Object lifecycle state diagrams (behavior)
     • Services
       • Message: similar to JMS, MQSeries, Tuxedo
       • Log, Notification, Events
       • Cluster Membership: redundant instances within a “group”
       • Checkpoint: save my state so a standby can take over
       • Distributed Lock: basic need of a distributed, coordinated system
       • IMMS (Information Model Management Service): what is out there, configured and deployed
     • LDAP-like DNs (Distinguished Names) identify resources
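
The checkpoint idea (“save my state so standby can take over”) can be sketched in a few lines. This is an illustration of the pattern only, not the AIS C interface: `CheckpointStore` is a hypothetical in-memory stand-in for a replicated checkpoint service, and `Counter` is a toy component invented for the example.

```python
import copy
from typing import Any, Dict, Optional


class CheckpointStore:
    """Hypothetical in-memory stand-in for a replicated checkpoint service."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def write(self, name: str, state: Any) -> None:
        # Store a copy so later changes in the writer do not leak in.
        self._data[name] = copy.deepcopy(state)

    def read(self, name: str) -> Optional[Any]:
        return copy.deepcopy(self._data.get(name))


class Counter:
    """Toy component whose state must survive a failover."""

    def __init__(self, store: CheckpointStore, name: str = "counter") -> None:
        self._store = store
        self._name = name
        saved = store.read(name)
        # A standby starting up resumes from the last checkpoint, if any.
        self.value = saved["value"] if saved else 0

    def increment(self) -> int:
        self.value += 1
        # Checkpoint after every state change.
        self._store.write(self._name, {"value": self.value})
        return self.value
```

If the active `Counter` increments twice and then disappears, a standby constructed against the same store starts at 2 instead of 0. A real service would also have to replicate the store itself across nodes, which this sketch leaves out.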

  9. Bottom-up, Are These a Good Fit?
     • HPI and HPI-ATCA
       • Yes! IPMI and SNMP implementations are all gravitating to HPI
       • Interoperability is very useful to us here
       • Unified view of hardware resources
         • Front-end CPUs and I/O cards
         • Servers (database and application)
         • NADs (network attached devices)
     • AIS
       • Hard problem
       • Anyone claiming to have produced a solid, 100% compliant AIS product is probably exaggerating
       • C-code interface only so far
       • Not clear that components will be interoperable
       • Are we really going to be shopping for COTS control system middleware components?

  10. HA Middleware: The Contenders (SAF presentation dated 4/26/05) (note: not a good story…)
      • Commercial Cluster SW
        • Pro: Transparent to application; ISV support
        • Con: Failover too slow; Proprietary
      • FT OS Single System Image
        • Pro: Transparent
        • Con: Scalability; Very complex to implement
      • FT CORBA
        • Pro: Reasonably Transparent; Industry Standard
        • Con: Failover times; Heterogeneity; Management
      • Telco HA Middleware
        • Pro: Fast Fail-over; Extensible; Management
        • Con: Intrusive; Non-Intuitive Model

  11. FT-CORBA (fault tolerant)

  12. FT-CORBA
      • No existing CORBA-based control system is HA
        • Tango uses open-source JacORB
        • ACS uses open-source ORBacus
        • NIF uses Visibroker with custom connection management
      • No commercial FT-CORBA ORB as of the beginning of 2004
        • Spec out since 2001: not a good sign
      • Very little open-source FT-CORBA exists (mostly academic)
        • GroupPAC
        • OCI (Object Computing Inc.) TAO

  13. CORBA Alternative - ZeroC ICE
      • ICE (Internet Communications Engine), www.zeroc.com
      • High performance middleware
      • Open-source, GPL licensed
      • Multiple language bindings (C++, Java, PHP, Python, C# so far)
      • Used by Hewlett Packard and FCS (Future Combat Systems)
      • Very much like CORBA, but addresses substantial complexity and performance issues with CORBA (not designed by committee)
      • HA Features
        • Has explicit support for storing object state to db
        • Coarse-grain failover only so far (server to server)
      • Could possibly even use this to unify RTP (Real Time Protocol) and DOP (Distributed Object Protocol)

  14. Options from the World of Java Web Development
      • JBoss
        • Open source middleware container
        • Lots of sophisticated, solid features for redundant deployment
      • JINI
        • Java RMI service lookup/discovery protocol
        • Very useful for connection management
      • Spring Framework
        • Lightweight middleware container
        • Alternative to EJB 2.0
      • EJB 3.0
        • Response to Spring and flaws in EJB 2.0

  15. Middleware HA – my conclusions
      • This is a hard problem to solve
        • It’s OK if this part of our effort takes longer to solidify
      • OS-based clustering is too slow and complex
      • SAF AIS specification is great on paper, but…
        • No implementations yet that offer full compliance
        • No bindings other than C so far as I can tell
      • FT-CORBA not looking good
      • Proprietary Telco solutions: need I say more
      • Success stories seem to use non-HA standards to build HA systems
        • Use a set of standards that matches your culture
          • e.g. Java (JINI/RMI) or non-FT CORBA
        • Build needed HA behavior custom to your requirements
          • Add in checkpointing, active/standby, connection mgmt, etc.
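
As an illustration of building active/standby behavior by hand on top of non-HA middleware, the sketch below uses heartbeats with a timeout to decide which of two nodes should be active. It is one common approach, not something the slides prescribe; the class and function names are hypothetical, and timestamps are passed in explicitly so the logic stays deterministic.

```python
from typing import Dict, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat time reported by each node."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        self._last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        last = self._last_seen.get(node)
        # A node that never sent a heartbeat counts as not alive.
        return last is not None and (now - last) <= self._timeout


def choose_active(monitor: HeartbeatMonitor, primary: str,
                  standby: str, now: float) -> Optional[str]:
    """Prefer the primary; fall back to the standby; None if both are silent."""
    if monitor.is_alive(primary, now):
        return primary
    if monitor.is_alive(standby, now):
        return standby
    return None
```

In a real system `now` would come from a monotonic clock such as `time.monotonic()`, and the timeout would have to be tuned against the required failover time.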

  16. Middleware HA – conclusions (2)
      • My inclination is to look at ICE and/or standard CORBA
        • Build basic HA features following the model of SAF AIS where reasonable
      • Need more knowledge to even evaluate SAF AIS compliant products
        • Wait for commercial and open-source implementations of AIS…
        • In the meantime, build à la carte from known stable frameworks

  17. Proposed Stack for ILC HA Research
      [Diagram: layered stack, top to bottom]
      • Java GUI Applications
      • Link: ICE protocol
      • ICE Middleware Tier
        • Examine suitability
        • Build prototype HA features
      • Links: IPMI V1.5 over RMCP; Channel Access
      • Arrow ATCA Starter Kit
        • Pigeon Point shelf manager
          • Need SM SDK?
        • Dual (Quad) x86 processors
          • We need the board developer’s kit
        • Shelf contents: CPU1, CPU2, SM; boards labeled COTS and Custom
        • Run EPICS iocCore on dual CPUs

  18. Tasks
      • Study and document points of failure (look at FNAL project…)
        • How to identify failure
        • How to recover (redundancy and/or reconfiguration)
      • Port EPICS iocCore to ATCA CPUs
        • RTOS?
        • Explore redundancy and checkpointing within iocCore
      • Establish middleware server
        • Explore HA feature development within ICE
        • RMCP to ATCA shelf manager
        • Channel Access to ATCA CPUs
      • Look at custom hardware development in ATCA, including potential associated additions to shelf manager software
