CIC Portal/COD Activities

Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France.

Presentation Transcript


  1. CIC Portal/COD Activities Hélène Cordier IN2P3/CNRS Computing Centre, Lyon, France

  2. Contents • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover

  3. Use tools • Each actor can use a set of operational tools (provided, integrated or interfaced). • [Diagram] Actors: site, user, VO manager, operator, regional center. • Activities shown: communicate; report on site activity, submit tests, configure; manage static information about my VO; track, report, diagnose and follow up problems. • All of these go through the tools (CIC Portal). The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007) 22/08/2014

  4. Average connections, Dec 2004 to Dec 2007 • What do people connect to the CIC portal for?

  5. Connections and process

  6. Tasks handled by the CIC portal development team between February 2007 and January 2008

  7. Contents • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover

  8. Latest changes in 6 months • Latest technical changes: authentication is now based on the full certificate DN instead of the CN • Work on VO ID cards: changes in the database schema for VO/VOMS information; VO ID card interface improved; integration of the YAIM VO Configurator into the CIC portal; downloadable XML dump of VO ID card info • Scheduled downtimes procedure • Integration of the regional first-line support dashboard (prototype with CE)
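The move from CN-based to DN-based authentication can be illustrated with a minimal sketch. It assumes slash-separated subject DNs and an in-memory account table; the DNs, roles and function names below are hypothetical and are not taken from the portal's code. It shows why the CN alone is ambiguous when two certificates share a common name.

```python
from typing import Dict, List, Optional


def common_name(dn: str) -> str:
    """Return the last CN component of a slash-separated subject DN.

    Simplification: assumes no "/" appears inside an attribute value.
    """
    parts = [p for p in dn.split("/") if p]
    cns = [p[3:] for p in parts if p.startswith("CN=")]
    if not cns:
        raise ValueError(f"no CN in DN: {dn!r}")
    return cns[-1]


# Hypothetical certificates: same CN, issued under different organisations.
DN_A = "/C=FR/O=ExampleOrg/OU=SiteA/CN=Jane Doe"
DN_B = "/C=DE/O=OtherOrg/OU=SiteB/CN=Jane Doe"

# Hypothetical account table keyed by the full DN.
accounts_by_dn: Dict[str, str] = {DN_A: "operator", DN_B: "vo_manager"}


def role_by_dn(dn: str) -> Optional[str]:
    """DN-based lookup: at most one account can match."""
    return accounts_by_dn.get(dn)


def roles_by_cn(cn: str) -> List[str]:
    """CN-based lookup: every account sharing the CN matches."""
    return sorted(
        role for dn, role in accounts_by_dn.items() if common_name(dn) == cn
    )


print(role_by_dn(DN_A))
print(roles_by_cn("Jane Doe"))
```

With the full DN as the key, each certificate resolves to a single account, whereas the CN lookup returns both hypothetical accounts.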

  9. On-going developments • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover

  10. What is left for the next release in March • 2159 Adapt to new components released into production, cf. YAIM tool. • 1559 Development of a new version of the report, taking into account the feedback received. • 1920 Follow the SAM migration to gridview on the CIC portal side → IDLE • Internal tasks include quick fixes/bug fixes, documentation, background clean-up work, code optimization/prospective for EGEE-III.

  11. COD activity • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover ARM Meeting, EGEE’07, Budapest 22/08/2014

  12. A tool for Grid Operators: COD dashboard • [Diagram, two panels] • Start of EGEE, many entry points: the operator works directly with monitoring tools #1 to #n, sites info, a mail client and the ticketing system. • Now, single entry point: the operator works with the dashboard, which sits in front of monitoring tools #1 to #n, sites info, a mail sender and the ticketing system.

  13. Interaction with EGEE services • [Diagram] Operations portal: IN2P3-CC, Lyon, France. • GGUS (FZK, Karlsruhe, Germany), over SOAP: view ticket, create ticket, update ticket. • SAM (CERN, Geneva, Switzerland): test results on nodes. • GOC-DB: site info, scheduled downtimes. • Gstat (ASGC, Taipei, Taiwan), over http: GIIS status per site. • Access methods shown for SAM and GOC-DB: SQL queries and an XSQL-based service. • Example dashboard rows, each with two status columns: Site1, ticket #28; Site2, ticket #32; Site3, no ticket; Site4, ticket #14.
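The aggregation shown on this slide can be sketched as follows. This is a minimal illustration, not the portal's implementation: the status values, the `SiteRow` type, the sample data and the rule in `needs_ticket` are all assumptions. Only the site names and ticket numbers echo the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class SiteRow:
    """One dashboard row: per-site status from each source plus ticket info."""
    site: str
    sam_status: str        # hypothetical values: "ok", "error", "unknown"
    gstat_status: str
    in_downtime: bool      # scheduled downtime, as GOC-DB would report it
    ticket: Optional[int]  # open GGUS ticket number, if any


def build_rows(
    sam: Dict[str, str],
    gstat: Dict[str, str],
    downtimes: Set[str],
    tickets: Dict[str, int],
) -> List[SiteRow]:
    """Merge the per-source views into one row per site."""
    rows = []
    for site in sorted(set(sam) | set(gstat)):
        rows.append(
            SiteRow(
                site=site,
                sam_status=sam.get(site, "unknown"),
                gstat_status=gstat.get(site, "unknown"),
                in_downtime=site in downtimes,
                ticket=tickets.get(site),
            )
        )
    return rows


def needs_ticket(row: SiteRow) -> bool:
    """Assumed rule: failing, not in scheduled downtime, no ticket open yet."""
    failing = "error" in (row.sam_status, row.gstat_status)
    return failing and not row.in_downtime and row.ticket is None


# Illustrative data only.
sam = {"Site1": "error", "Site2": "ok", "Site3": "error", "Site4": "ok"}
gstat = {"Site1": "ok", "Site2": "error", "Site3": "ok", "Site4": "error"}
downtimes: Set[str] = set()
tickets = {"Site1": 28, "Site2": 32, "Site4": 14}

rows = build_rows(sam, gstat, downtimes, tickets)
print([row.site for row in rows if needs_ticket(row)])
```

Keeping the merge separate from the decision rule lets each source be replaced by a real client (SOAP, SQL, http) without touching the ticketing logic.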

  14. Outline • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover

  15. Statistics

  16. Contents • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Duties and Working groups • Zoom on Failover

  17. COD Duties • Rotations of 10 federations/teams, 1/5 weeks. • Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits. • 10 federations over 18 months in EGEE-I • Working groups for over 18 months now.

  18. There is more to it... • Working groups with a straightforward mandate: GSTAT (TW), SAM (CERN), SAMAP (CE), topped by • Tools for Improvement for COD, TIC (CE, EGEE’07)

  19. Working groups mandate • Integration of the existing tools, CIC (FR): integration platform for all COD tools to ease the daily operational job • Improvement of best practices (DE-CH): identify, raise and analyse with COD how to achieve homogeneous operations • Release of updated documentation, OPM (SE): documentation under constant evolution • Set-up of failover mechanisms for grid core services (SWE): what is done at the federation level, what is done at the project level (need help from JShiers group), what could be done (operational point of view) and what is needed at the ROC/site level (from a m/w point of view) • Set-up of a high-availability strategy for the operational tools for CODs, FAILOVER (IT)

  20. Failover working group • CIC Portal Usage : who/how • Latest Release Portal Characteristics • On-going developments • CIC portal overview for COD • Statistics and results • Working groups • Zoom on Failover for Operational Tools

  21. EGEE Failover: purpose • Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG Grid. • The solution is based on DNS and consists of: • mapping the service name to one or more destinations • updating this mapping whenever a failure is detected • Reference: Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria BC, Canada (September 2007)
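The two steps of the DNS-based approach (map a service name to one or more destinations, update the mapping when a failure is detected) can be sketched as below. This is an in-memory illustration under stated assumptions: the `FailoverMap` class, the host names and the health check are hypothetical, and no real DNS update is performed. A production set-up would replace the dictionary update with an update of the DNS record.

```python
from typing import Callable, Dict, List


class FailoverMap:
    """In-memory stand-in for a DNS zone: service name -> active destination."""

    def __init__(self, candidates: Dict[str, List[str]]) -> None:
        # Candidate destinations per service name, in order of preference.
        self.candidates = candidates
        # Initially each service name points at its first candidate.
        self.active: Dict[str, str] = {
            name: hosts[0] for name, hosts in candidates.items()
        }

    def refresh(self, name: str, is_healthy: Callable[[str], bool]) -> str:
        """Re-point `name` at a healthy destination if the current one fails."""
        current = self.active[name]
        if is_healthy(current):
            return current
        for host in self.candidates[name]:
            if host != current and is_healthy(host):
                self.active[name] = host  # the "DNS update" step
                return host
        # No healthy destination: leave the mapping unchanged.
        return current


# Hypothetical service name and replicas.
down = {"portal-main.example.org"}
zone = FailoverMap(
    {"portal.example.org": ["portal-main.example.org", "portal-replica.example.org"]}
)
active = zone.refresh("portal.example.org", lambda host: host not in down)
print(active)
```

In this sketch the mapping does not switch back automatically once the original destination recovers. Whether to fail back, and when, is a policy choice the slide does not specify.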

  22. COD work aspects to keep in EGEE-III • Dedication: working groups are recognized within federations as a source of expertise, and by federations as the channel that brings their needs to central operations. • Collaboration: up to now, each federation has found a way to contribute actively to improving the COD work environment, when not proactively leading a working group. Also, each person, tool developer or expert recognized as being of « global interest », even though outside the COD scope, has been happily integrated into this « closed community », e.g. SAMAP → TIC scope, to monitor this aspect with a Nagios prototype, for example. • Flexibility: the groups are meant to evolve together with their mandate over time and as needs arise, e.g. core grid services HA, EGI. • Anticipation: e.g. the strategy of the Operational Failover Working Group. • Experiment: e.g. regionalisation of tools and the future modular « NGI dashboards » to widen the CE first-line support experience.

  23. COD work aspects to evolve in EGEE-III • Mandate and assessment of the COD activity: - Integration of NDGF/NE as a COD team; other teams? - Catch-all and global operations center -- which core services are to be monitored centrally, how to monitor them and how to switch properly to backup -- how to aggregate local data, and which local data would be concerned - Define metrics to identify the most problematic m/w components and the recurrently unreliable sites - Operational tools reliability assessment: ENOC test as a starting point? - Stronger need for HA/failover of operational tools and grid core services • Vision of the long-term evolution of the COD tools: 1 set of tools per federation + aggregation? Which set of tools is to be regionalized? SAM, GOC DB, COD? What else? How are they going to interact => need for a global schema, NOW.

  24. COD work aspects to evolve in EGEE-III • Leverage on « project labeled » tools so that operational use-cases do not remain « pending »: - development strategy and priorities are coherent -- data workflow: synchronization of GOCDB/BDII/SAM/COD -- development strategy: depends on the strategy for the long-term evolution of the COD tools -- priority decision workflow: who drives, and how, the priority of requests on « project labeled » tools so that operational use-cases do not remain « pending », e.g. critical tests monitoring/accounting or ARC CE, CA update procedure, need for SAM failover… - staffing is adequate for proper reactivity, not only for bug fixes. • Interoperability/interoperations (item to be followed up) • OSG: rather informal for the moment, but users now do have problems and sites act as the relay for their users, cf. GGUS ticket 31037. • NDGF: is there existing critical test monitoring, and what are the consequences for operational procedures?

  25. Conclusions and References • Where, how and when do we address these topics? • Some can be addressed here or considered at COD meetings; some are relevant to OCC/ROC first, and COD working groups can then make suggestions/recommendations. • References: • CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations, Grid 2007 (IEEE), Austin TX, United States (September 2007) • Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria BC, Canada (September 2007)