OVERVIEW

Presentation Transcript


  1. OVERVIEW Status of the system Future development Planning elements

  2. STATUS OF THE SYSTEM CDOP 2 - PROCESSING CHAINS

  3. STATUS OF THE SYSTEM PROCESSING CHAINS
     Operational infrastructure (2 chains)
       • MSG (29 PCs, 48 CPUs)
       • EPS (6 PCs, 12 CPUs)
     Parallel infrastructure (2 chains)
       • MSG (29 PCs, 48 CPUs)
       • EPS (8 PCs, 16 CPUs)
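
     As a quick check of the figures on this slide, the per-chain sizes can be added up. This is only a sketch: the dictionary layout and the `totals` helper are illustrative names, and it assumes that no machine is shared between chains, which the slide does not state.

```python
# Chain sizes as listed on the slide: (PCs, CPUs) per chain.
chains = {
    "operational": {"MSG": (29, 48), "EPS": (6, 12)},
    "parallel": {"MSG": (29, 48), "EPS": (8, 16)},
}


def totals(infrastructure):
    """Sum PCs and CPUs over the chains of one infrastructure.

    Assumption: machines are not shared between chains.
    """
    pcs = sum(p for p, _ in infrastructure.values())
    cpus = sum(c for _, c in infrastructure.values())
    return pcs, cpus


operational_totals = totals(chains["operational"])
parallel_totals = totals(chains["parallel"])
print("operational:", operational_totals)
print("parallel:", parallel_totals)
```

     Under that assumption the operational infrastructure comes to 35 PCs and 60 CPUs, and the parallel infrastructure to 37 PCs and 64 CPUs.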

  4. STATUS OF THE SYSTEM PROCESSING CHAINS Development, test and operational support infrastructure

  5. STATUS OF THE SYSTEM
     ADVANTAGES OF THE DESIGN
       • Costs: just hardware, without any maintenance
       • Easier system upgrade (new hardware)
       • Software internally developed
     DISADVANTAGES
       • Different generations of machines being maintained and coexisting in the same system
       • System highly dependent on communications
       • Large amount of data transferred through the network
     EXAMPLES OF IMPACT
       • Old version of CORBA used; recompilation of the full system would be needed
       • Last hardware renovation made in July 2009 (failure of several machines over a short period)

  6. STATUS OF THE SYSTEM MAJOR FAILURE POINTS
     • POWER SYSTEM
       • Some problems with a large impact in the past. All problems are fixed now and the system seems very reliable. No problems are expected now. External maintenance is in place.
     • NETWORK
       • The impact can be very large. Difficult to monitor.
     • LSA SAF DATABASE (MySQL free)
       • Communication with the machines is critical.
       • Synchronization of the two machines causes problems from time to time.
       • Exponential growth of the database is not always accompanied by a hardware upgrade.
     • VCS RECEIVING SYSTEM
       • It is now working without big issues, but last year it caused several problems in data reception. Maintenance is external (VCS).
     • ON-CALL SERVICE
       • Interventions outside working days are difficult.

  7. STATUS OF THE SYSTEM IM ARCHIVE SYSTEM (Major failure)
     • Manages all relevant data in the LSA SAF.
     • Interfaces with all LSA SAF processes to produce or acquire data, whether for archiving, distribution or visualization.
     • Includes the cache of the LSA SAF processing system, sharing data among all computers (first line of the archive).
     • Critical system:
       • A problem in the archiving system always has a large impact on the performance of the system.
       • Most large-impact anomalies were linked to problems detected in the archive system.

  8. STATUS OF THE SYSTEM IM ARCHIVE SYSTEM
     [Diagram labels: SAF SYSTEM; 1st Line (SAF SYSTEM Cache); Data mover (CFA); 2nd Line; Abnormal performance]
     • External storage (SAAB)
       • All SAF products (Output) except 2011
       • Not possible to distribute to the users

  9. STATUS OF THE SYSTEM DISTRIBUTION
     • Sporadic users (OFF-LINE): mainly via web server (automatic distribution suspended since December 2011); ftp, CD, DVD, …; user formats (HDF5); help desk services
     • Regular users: ftp (distributed also in NRT)
     • NRT users: EUMETCast (timeliness)

  10. STATUS OF THE SYSTEM IM ARCHIVE SYSTEM
      [Diagram labels: SAF SYSTEM; 1st Line (SAF SYSTEM Cache); Distribution; Data mover (CFA); 2nd Line; Abnormal performance]
      • External storage (interim limit archive)
        • All SAF products (Output) except 2011
        • Not possible to distribute to the users
      • External storage (SAAB)
        • All SAF products (Output) except 2011
        • Not possible to distribute to the users

  11. STATUS OF THE SYSTEM Daily distribution of MSG data

  12. STATUS OF THE SYSTEM
      [Chart panels: 2010; 2011 + 2012 (Jan-Apr)]
      • Distribution in 2010: 1,829,152 files
      • Distribution Jan to Apr 2010: 725,485 files
      • Distribution in 2011: 1,079,940 files
      • Distribution Jan to Apr 2011: 371,730 files
      • Distribution Jan to Apr 2012: 359 files (all in March)
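
      To put the file counts on this slide in proportion, the relative change between periods can be computed directly. The sketch below uses only the numbers quoted above; the variable names and the `percent_change` helper are illustrative.

```python
# File counts quoted on the slide.
full_year_files = {2010: 1_829_152, 2011: 1_079_940}
jan_apr_files = {2010: 725_485, 2011: 371_730, 2012: 359}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return 100.0 * (new - old) / old


year_change = percent_change(full_year_files[2010], full_year_files[2011])
jan_apr_change_2011 = percent_change(jan_apr_files[2010], jan_apr_files[2011])
jan_apr_change_2012 = percent_change(jan_apr_files[2011], jan_apr_files[2012])

print(f"2010 to 2011 (full year): {year_change:.1f}%")
print(f"Jan-Apr 2010 to 2011:     {jan_apr_change_2011:.1f}%")
print(f"Jan-Apr 2011 to 2012:     {jan_apr_change_2012:.1f}%")
```

      The full-year volume fell by roughly 41% from 2010 to 2011, and the Jan to Apr volume fell by about 99.9% from 2011 to 2012, consistent with the suspension of automatic distribution.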

  13. STATUS OF THE SYSTEM 2010

  14. STATUS OF THE SYSTEM
      • Due to loss of performance of the archive system, LSA SAF has disabled the automatic off-line dissemination (via the webpage) since last December.
      • Alternative solutions (interim storage) are being tested, to be in place while a new LSA SAF archive system is built.
      • This interim solution, making use of existing storage facilities, will soon be in place. At the moment tests are running:
        1) using the full range of LSA SAF products for 2010 + part of 2012 (increasing) already available in this archive. It was necessary to improve the management of requests via the web;
        2) the interim archive will afterwards be progressively extended (2005 onwards).
      • A new archive system (non-hierarchical LSA SAF archive system) will be available ASAP, depending on funds (transferred from EUMETSAT and internal to IM/IPMA) and time for administrative procedures.

  15. STATUS OF THE SYSTEM EUMETSAT Land SAF Technical and Operational Performance Analysis
      Analysis of the LSA SAF operational services reported in the period from Jan 2009 to Dec 2011, including an assessment of the underpinning support system and the performance of the HW and SW infrastructure

  16. STATUS OF THE SYSTEM LAND SAF TECHNICAL AND OPERATIONAL PERFORMANCE ANALYSIS
      • The assessment reflects the findings of a team of LSA-SAF/IM and EUMETSAT experts tasked to:
        • Assist the LSA SAF Leading Entity in the understanding of the issues incurred in the availability of the LSA SAF products
        • Jointly assess these issues and identify related problems on the LSA SAF System Infrastructure
        • Jointly identify the root causes of these problems
        • Propose recovery actions/solutions and a roadmap for their urgent implementation as part of the LSA SAF CDOP-2
        • Jointly draw lessons learnt to the benefit of the LSA SAF CDOP-2
        • Provide a written joint report on the outcome of the technical assessment.

  17. STATUS OF THE SYSTEM LAND SAF TECHNICAL AND OPERATIONAL PERFORMANCE ANALYSIS
      Actions Proposed and Roadmap
      • Review the technical solution and planning elements for the implementation of the new archiving system, involving EUMETSAT experts, and agree on a formal review for this purpose in Autumn 2012.
      • Complete the testing and the roll-out of the interim archive, including the porting of data from the system, and review the testing results by September 2012.
      • Complete the backup of the data, retrieving from the EMC system the complete sets of LSA SAF data and products stored.
      • Improve and clarify the service performance requirements, addressing monthly availability and off-line performance targets. Implement new metrics from H2 2012.
      • Restore the integrity of the archive and re-enable processing of orders generated by the users with the EUMETSAT UMARF. An action on the UMARF side should be considered to disable visibility and ordering of the LSA SAF products, and to inform users accordingly, until automated order processing for the full time range can be supported again on the LSA SAF side.
      • Describe the Long Term Data Preservation mechanism at LSA SAF and the disaster recovery policy, submitting the information for review at the PCR.
      • Improve the performance reporting and evaluation, including feedback and performance data collected by EUMETSAT concerning the dissemination and the catalogue entries added in the reporting periods.
      • Perform the backlog processing of the missing metadata in order to achieve a coherent catalogue by recovering the missing files (as already achieved for other SAFs, e.g. OSI SAF).

  18. FUTURE DEVELOPMENT FUTURE DEVELOPMENT

  19. FUTURE DEVELOPMENT FUTURE DEVELOPMENT

  20. FUTURE DEVELOPMENT
      • More compact solutions (blades)
      • LSA SAF archive
      • Recompiling everything (system + algorithms) with new versions (HDF libraries, latest version of CORBA, etc.), using new, free compilers (gfortran and gcc).
      • Use of ECFlow, ex-SMS (Supervisor Monitor Scheduler)
      • Better separation between production and distribution

  21. PLANNING Planning drivers
      • Heritage continuity
      • Configuration level
        • Keeping high customisation levels
      • Engineering vs. Science cooperation
        • Ease teams interaction
      • Scalability
        • Software and hardware upgrade shall be a transparent process having no major impact
      • Load distribution
        • Parallel processing
      • Plug-in concept
        • Several chains to process MSG, EPS, etc.

  22. LSA SAF IM System future
      Main driver
        • Use ECFlow (based on the SMS from ECMWF)
        • Unsupported! Still not provided
      Meanwhile...
        • Current architecture should be kept running for 2 more years (estimated)
        • Prepare existing algorithms to support the upgraded version of the HDF5 library and the gfortran (free) compiler.
        • Integration on LSA SAF System (following APID)
        • Algorithm output verification
        • Assess the effort / feasibility of supporting generation of Full Disk if the necessary input data is provided in the Product Configuration File.
      Algorithm integration process
      Near future goal (until the end of the year 2012)
        • Full library and compiler upgrade for the LSA SAF System.

  23. Compilers and libraries versions
      Compilers
        Current versions baseline
          • pgf90 5.1-6
          • Gcc package 3.3.3 (12-04-2004): g++ (c++), gcc (c), gfortran (fortran) - Unavailable
        Future versions
          • pgf90: drop usage
          • Gcc package 4.6.3 (01-03-2012): g++ (c++), gcc (c), gfortran (fortran)
      Libraries
        Current versions baseline
          • HDF5-1.6.2 (15-06-2004)
          • ACE 5.3a_p15
          • TAO 1.3a_p15
          • JacORB 2.2 (2004)
          • Xerces 2.5 (01-10-2004)
          • MySQL++ 1.7.9 (2001)
        Future versions
          • HDF5-1.8.9 (05-2012)
          • gribAPI 1.9.16 (07-03-2012)
          • ACE 6.1.0
          • TAO 2.1.0
          • JacORB 2.2 (23-05-2012)
          • Xerces 2.8 (21-11-2011)
          • MySQL++ 3.1 (03-06-2010)
        (02-09-2005) (07-03-2012)

  24. Envisaged Support for algorithm Programming Languages
      Programming Languages
        • Fortran
        • C
        • C++
      Support for scripting languages is not foreseen and would be subject to license costs. Applications written in scripting languages have reduced performance.

  25. Algorithm Plugging interface routines
      Configuration Files
      • Algorithm Configuration File
        • Format and content defined by the algorithm developer (the system won't read or write this file).
        • Presented to the algorithm through the initialise routine.
        • Not mandatory
      • Product Configuration File
        • Fixed format defined in the APID and written by the System. Input files, generated output files and their paths are defined here.
        • Presented to the algorithm through the start routine.
        • Mandatory

  26. Algorithm Plugging interface routines
      Routines called by the System
      • initialise
        • Could be used to initialise variable structures to prepare the algorithm for the actual processing run.
        • Implementation not mandatory, although the routine must exist (even if empty) since the system will call it.
      • start
        • Implementation of the actual algorithm processing run.
        • Mandatory
      • clean
        • Could be used to deallocate dynamically instantiated variables and / or remove any temporary files created during the algorithm processing run.
        • Implementation not mandatory, although the routine must exist (even if empty) since the system will call it.
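
      The call sequence described on this slide can be sketched as follows. This is an illustration only: the slides foresee Fortran, C and C++ for real algorithms and do not give the routine signatures (those are defined in the APID), so Python is used here purely for brevity. `ExampleAlgorithm`, `run_algorithm`, the argument names and the return code are all hypothetical, and calling `clean` even when `start` fails is an assumption of this sketch.

```python
class ExampleAlgorithm:
    """Hypothetical algorithm exposing the three routines the system calls."""

    def __init__(self):
        self.calls = []       # records the order in which routines are called
        self.settings = None

    def initialise(self, algorithm_config_path):
        # Optional work: prepare variable structures before the run.
        # The routine must exist even if its body is empty.
        self.calls.append("initialise")
        self.settings = {"config": algorithm_config_path}

    def start(self, product_config_path):
        # Mandatory: the actual algorithm processing run.
        self.calls.append("start")
        return 0  # hypothetical status code meaning success

    def clean(self):
        # Optional work: release resources and remove temporary files.
        # The routine must exist even if its body is empty.
        self.calls.append("clean")
        self.settings = None


def run_algorithm(algorithm, algorithm_config_path, product_config_path):
    """Hypothetical system-side driver: initialise, then start, then clean."""
    algorithm.initialise(algorithm_config_path)
    try:
        return algorithm.start(product_config_path)
    finally:
        # Assumption: clean is called whether or not start succeeded.
        algorithm.clean()


algorithm = ExampleAlgorithm()
status = run_algorithm(algorithm, "algorithm.cfg", "product.cfg")
print(algorithm.calls, status)
```

      Keeping `initialise` and `clean` present but possibly empty lets the system call all three routines unconditionally, which matches the "must exist (even if empty)" requirement.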

  27. Algorithm Plugging interface routines
      Routines called by Algorithm
      • reportLog
        • Called by the algorithm to report log messages to the system.
        • Not mandatory
      • getStopStatus
        • To check (regularly) whether the system has requested the algorithm to stop processing.
        • Mandatory
      • stopping
        • Must be called immediately before the algorithm stops its execution (whatever the reason).
        • Mandatory
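
      A minimal sketch of how an algorithm might use these three routines is given below. The routine names come from the slide, but their signatures are not shown there, so the arguments and return values are assumptions. `FakeSystem` is a hypothetical stand-in for the real system, and Python is used only for brevity (the slides foresee Fortran, C and C++).

```python
class FakeSystem:
    """Hypothetical stand-in for the system side of the interface."""

    def __init__(self, stop_after):
        self.stop_after = stop_after  # number of polls before a stop is requested
        self.polls = 0
        self.log = []
        self.stopping_called = False

    def reportLog(self, message):
        # Receives log messages from the algorithm.
        self.log.append(message)

    def getStopStatus(self):
        # Returns True once the system wants the algorithm to stop.
        self.polls += 1
        return self.polls > self.stop_after

    def stopping(self):
        # Notified immediately before the algorithm ends its execution.
        self.stopping_called = True


def start(system, work_items):
    """Hypothetical processing run that polls the stop status regularly."""
    processed = 0
    try:
        for _item in work_items:
            if system.getStopStatus():
                system.reportLog("stop requested")
                break
            processed += 1  # placeholder for the real processing of one item
    finally:
        # Called whatever the reason for stopping, including errors.
        system.stopping()
    return processed


interrupted = FakeSystem(stop_after=2)
interrupted_count = start(interrupted, range(5))

uninterrupted = FakeSystem(stop_after=10)
uninterrupted_count = start(uninterrupted, range(3))
print(interrupted_count, uninterrupted_count)
```

      Placing the `stopping` call in a `finally` block is one way to honour the "whatever the reason" requirement, since it also runs when the processing loop raises an error.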
