
CCRC08 post-mortem LHCb activities at PIC



1. CCRC08 post-mortem: LHCb activities at PIC. G. Merino, PIC, 19/06/2008

2. LHCb Computing
• Main user analysis supported at CERN + 6 Tier-1s
• Tier-2s are essentially Monte Carlo production facilities

3. CCRC08: Planned tasks
• May activities: maintain the equivalent of 1 month of data taking, assuming a 50% machine cycle efficiency
• Raw data distribution from the pit → T0 centre
• Raw data distribution from T0 → T1 centres
  • Use of FTS - T1D0 (a transfer sketch follows below)
• Reconstruction of raw data at CERN & T1 centres
  • RAW (T1D0) → rDST (T1D0)
• Stripping of data at CERN & T1 centres
  • RAW & rDST (T1D0) → DST (T1D1)
• Distribution of DST data to all other centres
  • Use of FTS - T0D1 (except CERN, T1D1)
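A minimal sketch of how one such FTS transfer could be driven from a script, assuming the gLite FTS command-line clients of that era (glite-transfer-submit, glite-transfer-status); the service URL and SURLs below are illustrative placeholders, not LHCb production values:

```python
import subprocess

# Hypothetical endpoints/SURLs for illustration; in production the LHCb
# DIRAC system fills these in.
FTS_SERVICE = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"
SRC = "srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/data/RAW/run1234_0001.raw"
DST = "srm://srmlhcb.pic.es/pnfs/pic.es/data/lhcb/data/RAW/run1234_0001.raw"

def submit_transfer(src, dst):
    """Submit one source -> destination copy to FTS; returns the job ID."""
    out = subprocess.check_output(
        ["glite-transfer-submit", "-s", FTS_SERVICE, src, dst])
    return out.strip().decode()

def transfer_status(job_id):
    """Poll FTS for the state of a previously submitted job."""
    out = subprocess.check_output(
        ["glite-transfer-status", "-s", FTS_SERVICE, job_id])
    return out.strip().decode()

job = submit_transfer(SRC, DST)
print(job, transfer_status(job))
```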

4. Activities across the sites
[Chart: planned breakdown of processing activities (CPU needs) across sites, prior to CCRC08]

5. Tier 0 → Tier 1
• FTS transfers from CERN to the Tier-1 centres
• Transfer of RAW only occurs once the data has migrated to tape & the checksum is verified
• Rate out of CERN ~35 MB/s averaged over the period
• Peak rate far in excess of the requirement
• In smooth running, sites matched the LHCb requirements
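The "only after tape migration and checksum verification" gate can be pictured as below. is_migrated_to_tape() and stored_checksum() are hypothetical stand-ins for the SRM/CASTOR queries the real system makes; the Adler-32 computation itself is the checksum conventionally used for grid file integrity:

```python
import zlib

def adler32(path, bufsize=1 << 20):
    """Adler-32 checksum of a file, computed in streaming fashion."""
    value = 1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return "%08x" % (value & 0xFFFFFFFF)

def ready_for_distribution(path):
    # Queue the RAW file for FTS only once both conditions hold.
    return is_migrated_to_tape(path) and stored_checksum(path) == adler32(path)
```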

6. Tier 0 → Tier 1

7. Tier 0 → Tier 1
• To first order, all transfers eventually succeeded
• The plot shows the efficiency on the 1st attempt; the annotated incidents were:
  • an issue with UK certificates
  • a restart of the IN2P3 SRM endpoint
  • a CERN outage
  • CERN SRM endpoint problems

8. Reconstruction
• Used SRM 2.2
• LHCb space tokens: LHCb_RAW (T1D0); LHCb_RDST (T1D0)
• Data shares need to be preserved
  • Important for resource planning
• Input: 1 RAW file; output: 1 rDST file (1.6 GB)
• Reduced the number of events per reconstruction job from 50k to 25k (job of ~12 hours' duration on a 2.8 kSI2k machine)
  • In order to fit within the available batch queues
  • Need queues at all sites that match our processing time
  • Alternative: reduce the file size!
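The halving of events per job follows directly from the slide's own numbers; a back-of-envelope sketch, with the per-event cost derived purely from "25k events in ~12 h on a 2.8 kSI2k node":

```python
# 25k events take ~12 h on a 2.8 kSI2k worker node
KSI2K_SEC_PER_EVENT = 12 * 3600 * 2.8 / 25000   # ~4.8 kSI2k*s per event

def walltime_hours(n_events, node_power_ksi2k):
    """Estimated job length; shows why halving events/job halves queue needs."""
    return n_events * KSI2K_SEC_PER_EVENT / node_power_ksi2k / 3600

print(walltime_hours(50000, 2.8))  # ~24 h: too long for typical queues
print(walltime_hours(25000, 2.8))  # ~12 h: fits the available queues
```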

9. Reconstruction
• After data transfer the file should be online, as the job is submitted immediately
  • NOTE: in principle only LHCb has this requirement of "online reconstruction" → reco jobs read the input data from the T1D0 write buffer
• Just in case, LHCb pre-stages the files (srm_bringonline) & then checks the status of the file (srm_ls) before submitting the pilot job, via GFAL
  • Pre-staging should ensure the file is accessible from the cache
• The only issue was at NL-T1, with the reporting of the file status
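A minimal sketch of this pre-stage-then-check pattern; srm_bringonline(), srm_ls() and submit_pilot_job() are hypothetical wrappers standing in for the GFAL/DIRAC calls the production system actually makes:

```python
import time

SURL = "srm://srmlhcb.pic.es/pnfs/pic.es/data/lhcb/data/RAW/run1234_0001.raw"

def wait_until_online(surl, poll_s=60, timeout_s=6 * 3600):
    srm_bringonline(surl)                    # ask the SE to stage from tape
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if srm_ls(surl)["locality"] == "ONLINE":   # file is on the disk cache
            return True
        time.sleep(poll_s)                   # still staging; poll again
    return False

if wait_until_online(SURL):
    submit_pilot_job(SURL)                   # only now submit the pilot
```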

10. Reconstruction
• 41.2k reconstruction jobs submitted
• 27.6k jobs proceeded to the Done state
• Done/created ~67%

11. Reconstruction
• 27.6k reconstruction jobs in the Done state
• 21.2k jobs processed the full 25k events
  • Done with all 25k events: ~77%
• 3.0k jobs failed to upload the rDST to the local SE
  • Only 1 attempt is made before trying the failover (see the sketch below)
  • Failover with 25k events: ~13%
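The failover behaviour amounts to one local attempt followed by uploads to a list of failover SEs; upload_to() and the SE names below are illustrative placeholders for the DIRAC data-management layer, not real LHCb configuration:

```python
FAILOVER_SES = ["CERN-FAILOVER", "CNAF-FAILOVER", "GRIDKA-FAILOVER"]

def store_rdst(local_file, local_se):
    if upload_to(local_file, local_se):      # single attempt at the local SE
        return local_se
    for se in FAILOVER_SES:                  # then walk the failover SEs
        if upload_to(local_file, se):
            return se                        # file parked; migrated home later
    raise RuntimeError("all upload attempts failed for %s" % local_file)
```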

12. Human error at PIC: a WN with a misconfigured network on 24-27 May acted as a black hole for jobs (ticket-4386)

13. Reconstruction
CPU efficiency: ratio of CPU time to wall-clock time for running jobs
• CNAF: more jobs than cores on a WN...
• IN2P3 & RAL: problems reading the input data
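For reference, the quantity plotted is (assuming the conventional definition, where values near 1.0 mean the job was neither I/O- nor scheduling-bound; the example numbers are illustrative):

```python
def cpu_efficiency(cpu_time_s, wall_time_s):
    """CPU time consumed divided by elapsed wall-clock time."""
    return cpu_time_s / wall_time_s

print(cpu_efficiency(10.5 * 3600, 12.0 * 3600))  # 0.875
```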

14. Reconstruction
CPU efficiency: ratio of CPU time to wall-clock time for running jobs
• PIC: the most CPU-efficient T1

15. dCache Observations
• Official LCG recommendation: 1.8.0-15p3
• LHCb ran smoothly at half of the T1 dCache sites
  • PIC OK - version 1.8.0-12p6 (dcap)
  • GridKa OK - version 1.8.0-15p2 (dcap)
• IN2P3 problematic - version 1.8.0-12p6 (gsidcap)
  • Seg faults - needed to ship our own version of GFAL to run
  • Could this explain the CGSI-gSOAP problem?
• NL-T1 problematic (gsidcap)
  • Many versions deployed during CCRC to solve a number of issues: 1.8.0-14 → 1.8.0-15p3 → 1.8.0-15p4

16. Databases
• Conditions DB used at CERN & the Tier-1 centres
  • No replication tests of the conditions DB Pit ↔ Tier-0 (and beyond)
  • Switched to using the Conditions DB for reconstruction on 15th May
• LFC
  • "Streaming" is used to populate the read-only instances at the T1s from CERN
  • A problem with the CERN instance revealed that the local instances were not being used by LHCb!
  • Testing is underway now (see the sketch below)
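Making a job use the site-local read-only replica is, for the LFC clients, just a matter of pointing them at the local host. LFC_HOST is the real environment variable the LFC tools honour; the hostname and catalogue path below are illustrative:

```python
import os
import subprocess

# Point the LFC client at a (hypothetical) local read-only replica instead of
# the CERN master, so catalogue lookups stay inside the site.
env = dict(os.environ, LFC_HOST="lfclhcb.pic.es")
subprocess.check_call(["lfc-ls", "/grid/lhcb"], env=env)
```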

17. Stripping
• Stripping runs on the rDST files
  • Input: 1 rDST file & the associated RAW file
  • Space tokens: LHCb_RAW & LHCb_rDST
• DST files & ETC produced during the process are stored locally on T1D1 (an additional storage class)
  • Space token: LHCb_M-DST
• DST & ETC files are then distributed to all other computing centres on T0D1 (except CERN, which uses T1D1)
  • Space tokens: LHCb_DST (LHCb_M-DST at CERN)
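The token-to-storage-class mapping used in the stripping step, collected from the slides into one lookup table (the names come from the slides; the dict itself is just an illustration, not LHCb code):

```python
# role -> (space token, storage class)
SPACE_TOKENS = {
    "RAW input":        ("LHCb_RAW",   "T1D0"),  # tape-backed, disk copy transient
    "rDST input":       ("LHCb_RDST",  "T1D0"),
    "DST+ETC local":    ("LHCb_M-DST", "T1D1"),  # tape + permanent disk copy
    "DST+ETC replicas": ("LHCb_DST",   "T0D1"),  # disk-only at the other sites
}

for role, (token, storage_class) in SPACE_TOKENS.items():
    print("%-18s %-10s %s" % (role, token, storage_class))
```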

18. Stripping
• 31.8k stripping jobs were submitted
• 9.3k jobs ran to "Done"
• Major issues with the LHCb book-keeping

19. Stripping: T1-T1 transfers
• The stripping test was limited to 4 T1 centres: CNAF, PIC, GridKa, RAL (one plot per site)
• Initial problems uploading to the M-DST token at PIC; transfers caught up OK once this was solved

20. Conclusions
• Despite being the smallest LHCb Tier-1, PIC's quality of service was the highest in CCRC08
• The following Tier-1 processes were tested:
  • Reception of data from CERN
  • Reconstruction
  • Stripping and distribution of DSTs to the other Tier-1s
• The results at PIC were positive:
  • Reception of data from CERN (~5 MB/s)
  • Reading data from the WNs (dcap) - OK
  • Demonstrated replication of DSTs to the other Tier-1s at faster than the required rate (catch-up)
• The exercise was also useful for LHCb in detecting the weak points of its DIRAC Grid infrastructure
  • Improve the book-keeping system, log files, etc.
