AMOD report 24 – 30 September 2012. Fernando H. Barreiro Megino CERN IT-ES. Workload. Data transfers. > 1M files a day. High number of transfer failures caused by a few NL T2s. Tue25 - High load on PanDA Servers.


AMOD report 24 – 30 September 2012

Fernando H. Barreiro Megino

CERN IT-ES

Data transfers

> 1M files a day

High number of transfer failures caused by a few NL T2s

Tue25 - High load on PanDA Servers
  • The average time for DQ2+LFC registration increased dramatically, causing high load on the PanDA Servers
    • Some LFC timings in the logs indicated that the registration slowness was in DQ2

[Figure: number of sessions open on the ADCR3 instance, mostly by the ATLAS_LFC_W user. Panels: CC writer 1, CC writer 2]

Tue25 - High load on PanDA Servers (2)
  • Other observations that came up during the investigation
    • Some improvements to the LFC client will be discussed at the “DB technical meeting on the LFC” on Wednesday 3rd Oct
    • PanDA server LFC registration should be activated for all sites in order to avoid individual registrations by the pilot
    • aCT registers in bursts without bulk methods: in the LFC logs we saw 4k accesses over 1 hour and only 7 accesses over another hour
    • There were 2 SS machines serving the DE cloud (i.e. the same sites twice) with similar configuration
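
The burst pattern noted for aCT (4k accesses in one hour, 7 in another) is the kind of thing that can be read off log timestamps by counting accesses per clock hour. The following is only a minimal sketch: the function names are hypothetical, the timestamps are synthetic (they are not taken from the LFC logs), and it assumes the access times have already been parsed into `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def accesses_per_hour(timestamps):
    """Count accesses falling into each clock hour."""
    buckets = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its hour.
        buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return buckets


def burst_ratio(buckets):
    """Ratio between the busiest and the quietest non-empty hour."""
    counts = list(buckets.values())
    return max(counts) / min(counts)


# Synthetic example only: 4000 accesses in one hour, 7 in the next.
sample = (
    [datetime(2000, 1, 1, 0, i % 60, 0) for i in range(4000)]
    + [datetime(2000, 1, 1, 1, i, 0) for i in range(7)]
)
hourly = accesses_per_hour(sample)
for hour, count in sorted(hourly.items()):
    print(hour.isoformat(), count)
print("busiest/quietest ratio:", round(burst_ratio(hourly), 1))
```

A large ratio between hours points at bursty, non-bulk access; what threshold counts as a problem is left open here.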
Thu27 - SS callbacks to dashboard piling up

[Figure: SS-FR]

  • Initially we thought it was exclusively due to the CERN network intervention
  • After checking the logs we saw slow callbacks on different SS machines already before the intervention
  • D. Tuckett is checking the situation
Other incidents and downtimes
  • Monday
    • New PanDA proxy had not been updated on PanDAMonitor machines (Savannah: 97737)
    • INFN-T1 scheduled downtime for ~1 hour
  • Tuesday
    • RAL 6h upgrade to CASTOR 2.1.12-10. Alastair set the UK cloud to brokeroff on the previous evening
  • Thursday
    • CERN network intervention to replace some switches. Services at risk were CASTOR, EOS, elog and dashboard. Smooth intervention - NTR.
  • Friday
    • BNL to ASGC transfer errors. Being investigated by both sides during the weekend. ASGC FTS is blocked from accessing the BNL SRM and the routing path has been changed. (GGUS:86537)
Other incidents and downtimes (2)
  • Saturday
    • SS-SARA had CRITICAL errors. MySQL DB corruption? Problem to be understood by DDM experts.
  • Sunday
    • PVSS DCS replication with large delays due to high insertion rate. DCS expert had to be called on Sunday
    • RAL had failing jobs due to put errors and transfer errors – including T0 export. Caused by a problem with the Stager databases and resolved late on Sunday evening (GGUS:86552)
Acknowledgements
  • Except for occasional highlights, it has been a very quiet week
  • Thanks a lot to
    • ADCoS expert & shifters, and to the Comp@P1 shifter for the good work
    • experts of the different components and sites for the quick reaction
    • Alessandro, Ueda for their support