
AMOD Report, ADC weekly, CERN, 29 November 2011

Alexei Klimentov, Brookhaven National Lab. DDM: throughput by cloud and by activity (MB/s); extra TOP datasets planned replicas. MC, Group Production and Data Processing: 0.86 M completed jobs (0.57 M MC production).


Presentation Transcript


  1. AMOD Report, ADC weekly, CERN, 29 November 2011. Alexei Klimentov, Brookhaven National Lab

  2. DDM: Throughput (clouds), MB/s; Throughput (activities), MB/s; Extra TOP datasets planned replicas. Alexei Klimentov – ADC weekly

  3. MC, Group Production and Data Processing: 0.86 M completed jobs (0.57 M MC production)

  4. Grid Analysis: 1.6 M completed jobs

  5. “Databases”
     • Tue Nov 22: ~30 min ADCR outage.
       • A transparent intervention was announced. It was not transparent, because of a human factor. The issue was quickly fixed by the IT DB team.
     • Wed Nov 23: the LFC database at CERN contained ~9500 files without a parent file ID. The issue was fixed, with no impact on database or application performance.
     • Wed Nov 23 (21:45): ADCR problem caused by a multiple-disk failure. Some records (8 in a PanDA table) were corrupted. The issue was fixed by the IT DB team, and the corrupted rows were cleaned by Gancho.
     • Thu Nov 24: LFC problem (prod-lfc-atlas.cern.ch) and high ADCR rate (matview refresh issue).
       • Both issues were caused by the previous problem. Fixed by the IT DB team.
     • Fri Nov 25: ADCR high load (aggressive deletion service).
     • Fri Nov 25 22:00 to Sat Nov 26 11:00: ‘DDM dashboard statistics is not updated’.
       • LCGR database issue. Marcin Blaszyk and David Tuckett worked during Sat night. The problem was identified and fixed by 11:00 am. Statistics were regenerated (David).
       • Dashboard agent instructions to be reviewed.
       • ddmusr01 has no access to the dashboard machine.

  6. Tier-1s
     • Nov 21, 19:00: SARA and INFN-T1 are in full production.
     • Nov 28: 6 min UPS outage at IN2P3-CC; the site was in full production 3 h later.

  7. Tier-2s
     • DE, FR, IL, IT, US Tier-2s and Tier-3s off/on in DDM and Production.
     • Subscriptions to Grid sites were found although the sites were not in DDM FT. The missing sites have been added to DDM FT.

  8. False Security Alarm
     • Fri Nov 26. Excellent work by the CERN IT security team, Central Services (SB) and Operations (ADiGi).

  9. Misc.
     • CASTOR to EOS migration (ADiGi, GN)
       • Physics groups space
       • Started on Thu Nov 24
       • Final step Mon Nov 25
     • Issue with the Python version of dq2.cfg on one of the VOBOXes; production was affected. Fixed (TM, SB).
     • Distributed Analysis monitoring tables were empty (reported by users). Fixed (VF).

  10. ADCoS and COMP@P1 Shifts
      • Many issues were identified and covered by the ADC shift team.
      • Excellent work by the shifters and excellent organization.
