This report covers the operational issues experienced from November 12-18, 2012. Key highlights include several problems with the LSF system, some of which were resolved, though not all root causes are understood. A minor incident occurred with a Castor server, promptly addressed by the DSS team. Contention issues on ADCR due to repeated attempts to read from Oracle's redo logs were noted, alongside DATADISK space issues. Additionally, 5 hours of INFN-T1 tape maintenance caused downtime. Overall, Tier2 operations were smooth with minimal issues reported.
AMOD report, 12-18 Nov 2012
Rodney Walker, Guido Negri
Tier0
• several problems with LSF, mostly fixed, though not all are understood
• small incident with a Castor server (alarm ticket opened, but the problem was already being handled by the DSS team)
• contention on ADCR due to repeated attempts to read redo blocks from Oracle's redo log files; the index's partition was corrupted and has been reconstructed
Tier1
• DATADISK space issues at several T1s; deletion at the beginning of the week helped
• problems with ASGC tape, caused by hardware decommissioning and network glitches
• INFN-T1 tape downtime of 5 hours for maintenance, no problems
Tier2
• no major problems with Tier2s; only a few GGUS tickets were opened, and they were handled promptly