AMOD report 6 Feb – 12 Feb 2012

Fernando H. Barreiro Megino, CERN IT-ES-VOS

Overview: Analysis

Overview: Production
Claire Gwenlan: “[…] we are now on the tail end of MC11c […] the load is not going to be like what you've seen for the past few weeks/months […] Until… MC12… coming soon…”

Overview: DDM
The ATLAS membership of the ddmadmin certificate expired on 11 Feb 2012, and transfer jobs were rejected or failed.

CERN and ADC
• Sun 5th: CERN-PROD_DATADISK, GGUS:78923
  • lcg-cr failures
  • Caused by the latest EMI release on the "preprod" WNs (10%)
  • Rolled back to the LCG WN on Wed morning
• Mon 6th: Schedconfig failed to update
  • IT and TW clouds set offline in Panda over the morning
  • Recovery from a dump; only expert procedures available
  • Dedicated postmortem
• Tue 7th: ADCR & ATLR intervention
  • Oracle security updates
  • Almost transparent: Panda & DDM unavailable for a few minutes at 9:00

CERN and ADC: PandaMon issues
• voatlas140 & 141 out of production
  • 2 of the 6 servers out of production for a week to prevent session-count overload errors
• Wed 8th-Thu 9th: curl control commands failing intermittently
• Machines using a large amount of swap space
  • Alarm about voatlas180 using 50 GB during Thu night
[Figure: Utilization of swap space, 9th Feb and 10th Feb]

ddmadmin certificate renewal (1)
• ddmadmin is the robot certificate used to authenticate DDM and other ADCops agents
• The yearly ddmadmin proxy expired on 9th Feb
• On 23rd Jan (>2 weeks before), a campaign was started to renew the proxy on all DDM and ADCops machines
• Some machines were forgotten:
  • ddmusr01@voatlas125: Victor
  • ddmusr03@voatlas161: Functional Test subscription
  • ddmusr01@voatlas244: ADC monitoring collector
  • Maybe more → Need to compile a clear list of the places where the ddmadmin proxy is installed
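
One way to keep the list called for above is a small inventory of every account@host that holds the ddmadmin proxy, audited after each renewal campaign. A minimal Python sketch, assuming a hypothetical `ProxyInstall` record that stores the expiry (not-after time) of the proxy currently installed on each machine; in practice that time could be read on each host with, e.g., `openssl x509 -noout -enddate`. The three forgotten installations come from this slide; the fourth entry and the new expiry date are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProxyInstall:
    """One place where the ddmadmin proxy is installed (hypothetical record)."""
    account: str         # local account, e.g. "ddmusr01"
    host: str            # machine name, e.g. "voatlas125"
    purpose: str         # what uses the proxy on that machine
    not_after: datetime  # expiry of the proxy currently installed there


def not_renewed(inventory, new_not_after):
    """Return the installations still holding a proxy that expires before the renewed one."""
    return [p for p in inventory if p.not_after < new_not_after]


OLD = datetime(2012, 2, 9)  # old proxy expired on 9th Feb (from the slide)
NEW = datetime(2013, 2, 9)  # hypothetical expiry of the renewed proxy

inventory = [
    ProxyInstall("ddmusr01", "voatlas125", "Victor", OLD),
    ProxyInstall("ddmusr03", "voatlas161", "Functional Test subscription", OLD),
    ProxyInstall("ddmusr01", "voatlas244", "ADC monitoring collector", OLD),
    ProxyInstall("ddmusr02", "example-host", "hypothetical renewed agent", NEW),
]

for p in not_renewed(inventory, NEW):
    print(f"{p.account}@{p.host}: {p.purpose} still has the old proxy")
```

Keeping the inventory as data rather than in people's heads means the audit can be rerun at every renewal, and a host that was forgotten shows up explicitly instead of at expiry time.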

ddmadmin certificate renewal (2)
• The ATLAS membership of ddmadmin expired on Sat 11th Feb… and caught everybody by surprise
  • All FTS job submissions were rejected
• A few hours after the problem was reported, the membership was renewed
• Proxies are cached via proxy delegation, and it took several hours until the change had propagated to all services (FTS, SEs, …)
  • glite-delegation-destroy & init did not seem to have any effect
  • e.g. Hiro deleted all proxies from /tmp on all FTS agent hosts to speed up the recovery in the US cloud
  • RAL had to roll out the grid-mapfiles manually after the incident: GGUS:79137

ddmadmin certificate renewal (3)
We need recovery procedures, a tested backup proxy, and notifications about the proxy sent to the AMOD mailing list.
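
The notification item could take the form of a daily check that compares each credential's expiry with the current date and produces warning lines for the AMOD mailing list. A minimal Python sketch, assuming the expiry dates are already known (the two dates below are the ones from these slides), a hypothetical 14-day warning window and a hypothetical check date; how the dates are obtained and how the mail is sent are left out.

```python
from datetime import datetime, timedelta


def expiry_warnings(expiries, now, warn_before=timedelta(days=14)):
    """Return (name, days_left) for credentials expiring within warn_before, soonest first.

    days_left is negative once a credential has already expired.
    """
    soon = [
        (name, (not_after - now).days)
        for name, not_after in expiries.items()
        if not_after - now <= warn_before
    ]
    return sorted(soon, key=lambda item: item[1])


def format_notice(warnings):
    """Build a plain-text body for a mailing-list notification."""
    lines = [
        f"{name}: expires in {days} day(s)" if days >= 0
        else f"{name}: EXPIRED {-days} day(s) ago"
        for name, days in warnings
    ]
    return "\n".join(lines)


expiries = {
    "ddmadmin proxy": datetime(2012, 2, 9),                 # from the slides
    "ddmadmin ATLAS VO membership": datetime(2012, 2, 11),  # from the slides
}
now = datetime(2012, 2, 1)  # hypothetical check date
print(format_notice(expiry_warnings(expiries, now)))
```

Tracking the VO membership as its own entry, separate from the proxy, reflects the fact that the two expired on different days and the second one was the surprise.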

Tier1s
• IN2P3-CC downtime Tue 7th
  • Maintenance and upgrade of various services and servers
  • Affecting LFC, dCache, FTS, batch system, worker nodes, etc.
  • Complete cloud offline in Panda and DDM
  • Downtime for CE and SE extended until Wed 8th
• SARA downtime Tue 7th
  • Replacement of 6620 SAN storage hardware and firmware updates
  • Affecting services such as SRM, dCache and UI
• RAL downtime Wed 8th
  • Intervention on the core network
  • Affecting all services (LFC, FTS, SE, CE, …)
  • UK cloud set offline
• Failing jobs at SARA on Thu 9th: GGUS:79089
  • Not a site issue
  • Panda brokerage did not recognize NIKHEF-ELPROD_PHYS-TOP as a NIKHEF location
  • Tadashi fixed it immediately
• FZK transfer and staging failures on Sun 12th: GGUS:79145
  • High load and full disks
• INFN-MILANO-ATLASC SRM problems: GGUS:78998
  • Recurring problem over many days: “failed to contact on remote SRM [httpg://t2cmcondor.mi.infn.it:8444/srm/managerv2]”
  • /etc/grid-security/vomsdir/atlas/vo.racf.bnl.gov.lsc was missing on the StoRM servers, so they rejected all proxies with VOMS extensions issued by the BNL VOMS server
  • Later, a problem with the fetch-crl cron job
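
The INFN-MILANO-ATLASC failure came down to a single missing file, which a simple per-host check could catch. A minimal Python sketch, assuming the `<vomsdir>/<vo>/<voms-host>.lsc` layout visible in the path above, and assuming a usable .lsc file holds at least two non-empty lines (the VOMS server certificate subject DN followed by its issuer DN). The function name `check_lsc` is hypothetical; only the vo.racf.bnl.gov entry comes from the ticket.

```python
from pathlib import Path


def check_lsc(vomsdir, vo, voms_hosts):
    """Return {host: problem} for VOMS servers whose .lsc file is missing or malformed.

    Assumes the layout <vomsdir>/<vo>/<host>.lsc and that a valid file has at
    least two non-empty lines (server subject DN, then issuer CA DN).
    """
    problems = {}
    for host in voms_hosts:
        lsc = Path(vomsdir) / vo / f"{host}.lsc"
        if not lsc.is_file():
            problems[host] = "missing"
            continue
        dns = [line.strip() for line in lsc.read_text().splitlines() if line.strip()]
        if len(dns) < 2:
            problems[host] = "malformed"
    return problems


if __name__ == "__main__":
    # Check the file that was missing on the StoRM servers (path from GGUS:78998).
    print(check_lsc("/etc/grid-security/vomsdir", "atlas", ["vo.racf.bnl.gov"]))
```

Listing every VOMS server that issues ATLAS proxies, not just the CERN one, is the point of the check: a server whose .lsc file is absent only breaks users whose proxies it signed, which is why the failure looked intermittent.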

Thanks to ADC experts and ADCoS shifters for their support
• BEWARE: No AMODs in the coming weeks
