
Middleware renovation – technical overview and plans for migration, CMW@GSI, 25th April 2013

Wojciech Sliwinski (BE-CO-IN) for the Middleware team: Felix Ehm, Kris Kostro, Joel Lauener, Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak]


Presentation Transcript


  1. Middleware renovation – technical overview and plans for migration, CMW@GSI, 25th April 2013. Wojciech Sliwinski, BE-CO-IN, for the Middleware team: • Felix Ehm, Kris Kostro, Joel Lauener, • Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak] • Special thanks to: Vito Baggiolini and Pierre Charrue

  2. Agenda • Context & Motivation for Renovation • Middleware Review process • Technical evaluation of the transport layer • Changes in the MW Architecture in LS1 • MW Upgrade milestones in 2013 • Risk assessment and mitigation • Conclusions Wojciech Sliwinski, Middleware Renovation

  3. Agenda: Context & Motivation for Renovation

  4. MW Mandate & Scope • Standard set of MW solutions • Centrally managed services • Track & optimize runtime parameters • Well-defined feedback channel for users • Provide support & follow up on issues • Scope: CERN Accelerator Complex • Operational 24*7*365 • Must be Reliable & High Quality • 73’000 HW devices, 3’150 servers • In all Eqp. groups (4 departments: BE, EN, GS, TE)

  5. CMW in the Controls System CMW client (C++/Java) JAPC GUIs, LabView, RADE JMS client (Java) GUIs CMW client (Java) JAPC Logging, LSA, InCA, SIS CMW client/server (C++/Java) Proxy, DIP, AlarmMon, AQ JMS client (Java) Servers: Logging, InCA, SIS CMW server (C++) PVSS (Cryo, Vacuum) CMW server (C++) FESA, FGC, GM

  6. Motivations for MW Renovation • Current CORBA-based CMW-RDA • Integrated in the Control system • Used to operate all CERN accelerators • Provides widely accepted Device/Property model • > 10 years old • Why review & upgrade MW? • CORBA was chosen 15 years ago • Technical limitations of CORBA-based transport • Functional limitations of the current CMW-RDA • Codebase with a long history, difficult to maintain, needs architecture review • Major issue of long-term support & future evolution • Evolution of technology over last 10 years: HW, OS, middleware, 3rd party libraries • Human factor → less & less CORBA expertise on the market

  7. Technical limitations of CORBA transport • Became legacy, not actively supported → maintenance issue • Shrinking community, slow response time • omniORB (C++): 1 developer/maintainer, last release mid-2011 • JacORB (Java): few developers, small community • Major technical limitations • Lack of fully asynchronous processing channel • Blocking communication → infamous JacORB blocking issue • Lack of low-level control of IO resources (sockets, request queues) • Development issues • Difficult to extend the wire protocol → backward compatibility issue • Complex, error-prone API • Heavy in memory usage

  8. Summary: Why change CORBA? • CORBA was chosen 15 years ago • Not actively maintained → big risk for the MW project • Better solutions exist on the market • Invest in a future solution rather than maintaining the old one

  9. Functional limitations of CMW-RDA • Several pending operational issues • Difficult (or hardly possible) to resolve with current library • Any major change very difficult to introduce • Technical Stops & Xmas breaks too short for massive deployment • High risk → major impact on front-end frameworks and applications • No protection against 'slow/bad' client applications • Misbehaving application may destabilise front-end server • Affects reliability of the subscription channel • Workaround: introduction of Proxy • Poor scalability when many clients subscribed • Stability issues observed when >200 clients subscribed (even for Proxy) • Threading model doesn’t scale well with many clients • Missing support for priority clients (e.g. SIS, PM, InCA, Logging) • Non-critical clients (e.g. GUIs) have the same communication priority • + others…

  10. Summary: Why change CMW-RDA? • With current CORBA-based middleware we can’t solve the pending operational issues • We can’t provide better scalability & reliability • CMW-RDA is difficult to evolve & extend

  11. Agenda: Middleware Review process

  12. Middleware Renovation process • MW Renovation = MW Review + MW Upgrade • MW Review aims to provide the most appropriate technical solution satisfying the user requirements • MW Upgrade establishes the plan & strategy for introduction of the new MW • Objective: LS1 is the unique opportunity for the major MW upgrade • Middleware Review Process • Gathering of user feedback and requirements (2010-11) • Review of communication and serialization libraries (2011-12) • Prototyping using selected communication products (2012) • Design & implementation of new RDA3: Data, Client & Server (2012-13) • Testing & validation of core MW infrastructure (summer ’13) • Upgrade of all dependent MW libraries & services (2013-14) • JAPC, Directory Service, Proxy, DIP Gateway

  13. Review of user requirements • 2010-11: series of interviews with major users • Lars Jensen, Stephen Jackson (BI) • Andy Butterworth, Frode Weierud, Roman Sorokoletov (RF) • Brice Copy, Clara Gaspar (DIP, DIM) • Frederic Bernard, Herve Milcent, Alexander Egorov (PVSS) • Alexey Dubrovskiy (CTF), Kris Kostro (DIP gateways) • Marine Gourber-Pace, Nicolas Hoibian (Logging) • Nicolas De Metz-Noblat (Front-Ends), Alastair Bland (Infrastructure) • Michel Arruat (FESA), Stephen Page (FGC) • Niall Stapley, Mark Buttner, Marek Misiowiec (LASER & DIAMON) • Nicolas Magnin, Christophe Chanavat (ABT) • Stephane Deghaye, Jakub Wozniak (InCA, SIS) • Vito Baggiolini, Roman Gorbonosov (JAPC & DA systems) • + regular feedback from OP • + internal team input • http://wikis/display/MW/Interviews+with+Experts

  14. New RDA3: Accepted requirements (slide legend: New requirement) • General • Java & C++ API, Win (64-bit) & Linux (SLC5 32-bit & SLC6 64-bit) • Accelerator Device Model (i.e. Device/Property) • Get, Set, Async-Get, Async-Set, Subscribe • Early detection of communication failures • Improve error reporting in all the layers: client, server, gateways • Admin interface & runtime diagnostics & statistics • Data support • Data object: primitives, n-dim arrays, data structures • Subscription mechanism • Subscription behaviour the same regardless of the condition of the server (active, down) • Several client subscription policies (default: continuous) • Provide subscription notification ordering • First-Update enforced via CMW on server-side • Provide callback to front-end framework for the server-side Get • Drop support for on-change flag • Standardise use of subscription filters and update flags (e.g. immediate update) • Add header for acquired Data common metadata (e.g. acq. stamp, cycle name) • All loss of data (dropped updates) must be notified to clients
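
The Device/Property operations listed above (Get, Set, Subscribe, with a First-Update delivered on subscription) can be illustrated with a minimal in-memory sketch. This is not the RDA3 API: the class `ToyDeviceServer`, its method names, and the device `Magnet1` with property `Current` are hypothetical, and no networking is involved.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (device name, property name)
Listener = Callable[[Any], None]


class ToyDeviceServer:
    """Hypothetical in-memory stand-in for a Device/Property server."""

    def __init__(self) -> None:
        self._values: Dict[Key, Any] = {}
        self._listeners: Dict[Key, List[Listener]] = defaultdict(list)

    def get(self, device: str, prop: str) -> Any:
        """Get: return the current value of device/property."""
        return self._values[(device, prop)]

    def set(self, device: str, prop: str, value: Any) -> None:
        """Set: store the value and notify subscribers in subscription order."""
        key = (device, prop)
        self._values[key] = value
        for listener in list(self._listeners[key]):
            listener(value)

    def subscribe(self, device: str, prop: str, listener: Listener) -> None:
        """Subscribe: register the listener and send a first update if a value exists."""
        key = (device, prop)
        self._listeners[key].append(listener)
        if key in self._values:
            listener(self._values[key])  # First-Update enforced on the server side


server = ToyDeviceServer()
server.set("Magnet1", "Current", 10.0)

received: List[Any] = []
server.subscribe("Magnet1", "Current", received.append)  # gets 10.0 immediately
server.set("Magnet1", "Current", 12.5)                   # gets 12.5 as an update

print(received)  # [10.0, 12.5]
```

Here the server, not the client, is responsible for the first update, which mirrors the "First-Update enforced via CMW on server-side" requirement in spirit only.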

  15. New RDA3: Accepted requirements (slide legend: New requirement) • Client side • RDA3 client API connects with both: RDA2 (old) & RDA3 (new) servers • Efficient mechanism for: connection, disconnection & reconnection • Must be able to recover from any interruption of communication with the server • Server restarts, IP address change, rename/move of a device to another server • Improved semantics of ArrayCalls, i.e. handling of individual parameters • Enhanced diagnostics & collection of statistics • Server side • Policies for discarding notifications, i.e. deal with overflows and 'bad clients' • Instrument with counters & timings to diagnose notification delivery • Prioritisation of Get/Set requests for high-priority clients • Server-side subscription tree fully managed by CMW • Server does not need to manage client subscriptions anymore • Manage the client connections, e.g. forced disconnect of a client • Client lifetime callbacks (i.e. connected, disconnected)
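
One possible shape for a "policy for discarding notifications" is a bounded per-client buffer that drops the oldest update on overflow and counts the losses, so that the client can be told about dropped updates. The sketch below assumes a drop-oldest policy; the slides do not say which policies RDA3 implements, and the class `ClientNotificationQueue` is hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class ClientNotificationQueue:
    """Hypothetical per-client buffer with a drop-oldest discard policy."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._pending: Deque[Any] = deque()
        self.dropped = 0  # updates discarded since the last drain

    def publish(self, update: Any) -> None:
        """Queue an update; on overflow discard the oldest one and count it."""
        if len(self._pending) >= self._capacity:
            self._pending.popleft()
            self.dropped += 1
        self._pending.append(update)

    def drain(self) -> Tuple[List[Any], int]:
        """Return (pending updates, number dropped) and reset both."""
        updates = list(self._pending)
        dropped = self.dropped
        self._pending.clear()
        self.dropped = 0
        return updates, dropped


slow_client = ClientNotificationQueue(capacity=3)
for value in range(5):          # the client does not read while 5 updates arrive
    slow_client.publish(value)

updates, dropped = slow_client.drain()
print(updates, dropped)  # [2, 3, 4] 2
```

A slow client therefore costs the server a fixed amount of memory, and the `dropped` counter doubles as one of the diagnostic counters the requirements ask for.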

  16. New RDA3: Accepted requirements (slide legend: New requirement) • Server side (cont.) • Client discovery for diagnostics purposes (i.e. connected clients with payload) • Enhanced diagnostics & collection of statistics • Ongoing discussions (not accepted yet) • Prioritisation of subscription notifications for high-priority clients • Technical notes • Invest in asynchronous & non-blocking communication • Prefer 0-copy & lock-free data structures, message queues • http://wikis/display/MW/Design+of+New+RDA

  17. New RDA3: Summary of requirements • Unchanged • Device/Property model • Set of basic operations (Get, Set, Subscribe) • Fixes & improvements • Subscription mechanism • Connection management • Diagnostics & statistics • New functionality • Policies for subscription management (client & server) • Client priorities • Server-side subscription tree • Extended Data support • Standardise First-Update concept
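
The "client priorities" item can be pictured as a request queue in which high-priority clients are served before non-critical ones, with arrival order preserved inside each priority level. This is only a sketch under assumptions: the two priority levels, the class `RequestQueue`, and the request strings are hypothetical; SIS and GUI are taken from the slides as examples of a priority client and a non-critical client.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority levels; a lower number is served first.
HIGH, NORMAL = 0, 1


class RequestQueue:
    """Hypothetical server-side queue ordering Get/Set requests by client priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._arrival = itertools.count()  # tie-breaker: FIFO within one priority

    def submit(self, priority: int, client: str, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), client, request))

    def next_request(self) -> Tuple[str, str]:
        _priority, _seq, client, request = heapq.heappop(self._heap)
        return client, request

    def __len__(self) -> int:
        return len(self._heap)


pending = RequestQueue()
pending.submit(NORMAL, "GUI", "Get Magnet1/Current")
pending.submit(HIGH, "SIS", "Get Magnet1/Status")
pending.submit(NORMAL, "GUI", "Set Magnet1/Current")

served = [pending.next_request() for _ in range(len(pending))]
for client, request in served:
    print(client, request)
```

The SIS request is served first even though it arrived second, while the two GUI requests keep their original order.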

  18. Agenda: Technical evaluation of the transport layer

  19. Middleware transport requirements Desirable • Lightweight • Friendly API, documentation • Request/reply & pub/sub patterns • Asynchronous • Performance & Scalability • Stability, Maturity & Longevity Mandatory • Active community • Open source license • C++/Java • Linux/Windows • Over TCP/IP LAN Fundamental

  20. Evaluation process → our criteria • Appearance • Simple usage • Testing • Communication patterns • Performance • Exceptional situations • QoS • Configuration • Creators • specification • documentation • Users • forums • bug reports • Internet • Download • licensing • Compile • Linux & gcc • Run examples CRITERIA: Resources, binary size, memory; QoS; Community, maturity; API, look & feel, documentation; Communication patterns; Performance (Andrzej Dworak, ICALEPCS 2011)

  21. Evaluated middleware products: OpenAMQ, CoreDX, RTI DDS, Qpid, ZeroMQ, OpenSplice DDS, RabbitMQ, YAMI, Ice, omniORB, MQtt RSMB, JacORB, Thrift, Mosquitto. All opinions are based only on our knowledge and evaluation. Each of the products, depending on the requirements, may constitute a good solution. (Andrzej Dworak, ICALEPCS 2011)

  22. Products comparison (according to the criteria) (Andrzej Dworak, ICALEPCS 2011)

  23. Conclusions • Several good middleware solutions available • The choice is dictated by the most critical requirements • Not easy: performance matters but also ease of use, community, … • Prototyping was done with the most promising candidates: • ZeroMQ, Ice & YAMI • Finally we decided to choose ZeroMQ (http://www.zeromq.org/) • Asynchronous & non-blocking communication • 0-copy & lock-free data structures, message queues • Nice API, good documentation & active community

  24. New RDA3 Java: Sync Get round-trip time. Test setup: 1 kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts

  25. New RDA3 Java: subscription notification latency. Test setup: 1 kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts

  26. New RDA3 Java: subscription notification latency. Test setup: 1 kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts

  27. Agenda: Changes in the MW Architecture in LS1

  28. Current MW Architecture User written Middleware Java Control Programs Central services VB, Excel, LabView C++ Programs Administration console JAPC API Passerelle C++ Clients RDA Client API (C++/Java) Device/Property Model Directory Service Directory Service RBAC A1 Service RBAC Service Configuration Database CCDB CMW Infrastructure CORBA-IIOP RDA Server API (C++/Java) Device/Property Model CMW integr. CMW int. CMW int. CMW int. CMW int. CMW int. Servers Virtual Devices (Java) FESA Server FGC Server PS-GM Server PVSS Gateway More Servers Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)

  29. Changes in MW Architecture in LS1 User written Middleware Central services Java Control Programs Upgrade in LS1 VB, Excel, LabView C++ Programs Administration console JAPC API Passerelle C++ Clients RDA Client API (C++/Java) Device/Property Model Directory Service Directory Service RBAC A1 Service RBAC Service Configuration Database CCDB CMW Infrastructure ZeroMQ RDA Server API (C++/Java) Device/Property Model CMW integr. CMW int. CMW int. CMW int. CMW int. CMW int. Servers Virtual Devices (Java) FESA Server FGC Server PS-GM Server PVSS Gateway More Servers Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)

  30. Agenda: MW Upgrade milestones in 2013

  31. MW Upgrade Milestones in 2013 July’13 July-Oct’13 September’13 December’13 Winter’13/14 August’14 End-of-Life for RDA2: LS2

  32. MW Upgrade strategy in LS1 and towards LS2 • No BIG-BANG migration but gradual • Backward compatible (connection-wise) new RDA3 client library • New RDA3 clients can communicate with RDA2 & RDA3 servers • FESA3 will exist with both: old RDA2 (FESA3.1) and new RDA3 (FESA3.2) • Diagram labels: Client apps will migrate during LS1; Only for justified, exceptional cases; Old JAPC; New JAPC; RDA2/RDA3 Gateway; Old RDA2 client; New RDA3 client; Old RDA2 server; Old RDA2 server; FEC developers should migrate to FESA3.2 ASAP; New RDA3 server; FESA2.10; FESA3.1; FESA3.2

  33. LS1: Changes in JAPC • New major JAPC version → upgrade for RDA3 (September’13) • Public API backward compatible • Possible API extensions, but always compatible • Announcement via accsoft-java-announce list • Required Actions for JAPC Users • Update JAPC jars (via CommonBuild) • Re-release your product (via CommonBuild) • New JAPC will support communication with RDA2 & RDA3 servers

  34. LS1: Changes in RDA • New major version: RDA3 (June’13: alpha version) • Public API NOT backward compatible • New protocol, new architecture, new design • Same Device/Property model & Get/Set/Subscribe calls • Announcement via cmw-news & accsoft-java-announce lists • Required Actions for RDA Users • For Java: Use new version of JAPC (API unchanged) • For Java: New JAPC will support communication with RDA2 & RDA3 servers • For C++: Upgrade user code to new RDA3 API • For C++: RDA3 will support communication with RDA2 & RDA3 servers • Consequences if NO Action → staying with old RDA2 • NOT possible to communicate with new RDA3 servers (FESA3, FGC, etc.)
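
The connectivity rules stated on this slide can be summarised as a small compatibility check. The function `can_connect` and the version strings are hypothetical names chosen for illustration; the sketch covers direct connections only and ignores the gateway that appears on the upgrade strategy slide.

```python
VERSIONS = ("RDA2", "RDA3")


def can_connect(client: str, server: str) -> bool:
    """Direct client-to-server connectivity as described on the slides.

    RDA3 clients reach both RDA2 and RDA3 servers;
    RDA2 clients reach only RDA2 servers.
    """
    if client not in VERSIONS or server not in VERSIONS:
        raise ValueError(f"unknown version: {client!r} / {server!r}")
    return client == "RDA3" or server == "RDA2"


# Print the full compatibility matrix.
for client in VERSIONS:
    for server in VERSIONS:
        status = "ok" if can_connect(client, server) else "NOT possible"
        print(f"{client} client -> {server} server: {status}")
```

The only failing pair is an old RDA2 client facing a new RDA3 server, which is the consequence of taking no action listed above.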

  35. Agenda: Risk assessment and mitigation

  36. Risk assessment and mitigation

  37. Risk: Wrong product developed (wrong requirements) Mitigation: Early and continuous involvement of clients & experts • We have involved clients and experts since 2010 • Requirements review with all major clients • Technical discussions with eqp. experts • Iterative development involving the Review team • Design meetings (API and internals) since January 2013 • Alpha versions will be available for feedback and validation several months before the final release • Feedback is continuously integrated in development (= iterative)

  38. Risk: Product is (too) late Mitigation: Careful planning and follow-up; fall-back to less ambitious goals • Planning prepared and followed by the MW team • Taking into account needs and priorities of other CO projects and clients • Regular follow-up • In CO internally by TEC coordinator • In informal meetings with the MW experts (as done so far) • Fall-back to less ambitious goals • Plan priorities of functionality • Drop (postpone) work with lower priority

  39. Risk: Product has bugs or incompatibilities Mitigation: Early, continuous testing (unit, functional & integration tests) • Unit tests to assess quality inside the MW project • Required dev. phase in the MW team • Functionality tests in CO Testbed • Functionality of CMW only • Integration tests to check interoperability • Integration with FESA in CO Testbed • Integration with FGC in FGC Lab

  40. Risk: Bugs affect operations Mitigation: Gradual Migration (1) • No BIG-BANG migration but gradual • Backward compatible (connection-wise) new RDA3 client library • New RDA3 clients can talk to old RDA2 servers • FESA3 will exist with both: old RDA2 and new RDA3 • Diagram labels: Old JAPC; New JAPC; Old RDA2 client; New RDA3 client; Old RDA2 server; Old RDA2 server; New RDA3 server; FESA2; FESA3; FESA3

  41. Risk: Bugs affect operations Mitigation: Gradual Migration (2) • Deploy first on systems controlled by the MW team • E.g. Proxies, Gateways • Gain experience and confidence • Start deployment with less critical systems first

  42. Risk: Bugs affect operations Mitigation: Fast deployment of bugfixes • If (in spite of all) something goes wrong in operations • Fast reaction from the MW team • In CO, we will study the need and mechanisms to also upgrade servers quickly

  43. Conclusions • We have to replace CORBA with a new solution • We collected updated user requirements • MW upgrade will be performed during LS1 • Interoperability between RDA2 and RDA3 • Gradual control system migration until LS2 • End-of-Life for RDA2: LS2
