Statistics of CAF usage, Interaction with the GRID Marco MEONI CERN - Offline Week – 11.07.2008
Outline • CAF Usage and Users’ grouping • Disk monitoring • Datasets • CPU Fairshare monitoring • User query • Conclusions & Outlook
CERN Analysis Facility • Cluster of 40 machines, in operation for two years • 80 CPUs, 8 TB of disk pool • 35 machines in the PRO partition, 5 in DEV • The head node is the xrootd redirector and the PROOF master • The other nodes are xrootd data servers and PROOF slaves
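As an illustration of the master/slave layout above, a minimal ROOT sketch for opening a PROOF session on the CAF head node; the hostname and the printed worker count are placeholders, not values from the slides.

// connect_caf.C - sketch of opening a PROOF session on the CAF master
// (the hostname is an illustrative placeholder)
void connect_caf()
{
   // The master forwards the query to the PROOF slaves running
   // on the xrootd data servers
   TProof *p = TProof::Open("alice-caf.cern.ch");
   if (!p || !p->IsValid()) {
      Printf("could not connect to the PROOF master");
      return;
   }
   Printf("connected, %d workers available", p->GetParallel());
}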
CAF Usage • The resources available on CAF must be shared fairly • Particular attention is paid to how disks and CPUs are used • Users are grouped • at present by sub-detector and physics working group (PWG) • users can belong to several groups (PWG takes precedence over sub-detector) • Each group • has a disk space quota, used to stage datasets from AliEn • has a CPU fairshare target (priority) that regulates concurrent queries
CAF Groups • Quotas are not absolute • 18 registered groups • ~60 users registered in groups • 165 users have used CAF: please register to a group!
Resource Monitoring • MonALISA ApMon running on each node • sends monitoring information every minute • Default monitoring (load, CPU, memory, swap, disk I/O, network) • Additional information: • status of the PROOF and disk servers (xrootd/olbd) • number of PROOF sessions (proofd on the master) • number of queued staging requests and hosted files (DS manager)
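A hedged sketch of how a node-level daemon could publish one of these extra values through the MonALISA ApMon C++ client; the configuration file, cluster/node names and the exact sendParameter() overload are assumptions for illustration, not the actual CAF monitoring code.

// apmon_sketch.cpp - hedged sketch of publishing one monitoring value
// via the MonALISA ApMon C++ client; names and the exact sendParameter()
// overload are assumptions, not taken from the CAF setup.
#include <stdexcept>
#include "ApMon.h"

int main()
{
   try {
      // MonALISA destinations are read from a local configuration file
      ApMon apm((char *) "destinations.conf");
      // Publish the number of active PROOF sessions on this node
      apm.sendParameter((char *) "CAF", (char *) "lxb6047",
                        (char *) "proof_sessions", 7);
   } catch (std::runtime_error &err) {
      return 1;
   }
   return 0;
}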
Hosted files and Disk Usage • Figure: per-node tables (lxb6047-lxb6080) listing the number of hosted simulated and raw files and the corresponding disk usage; one node (lxb6054) holds no files • #Raw files: ~11k, #Sim files: ~54k • Raw on disk: 154 GB, Sim on disk: 4.5 TB • ESDs from RAW data production are ready to be staged
Interaction with the GRID • Datasets (DS) are used to stage files from AliEn • A DS is a list of files (usually ESDs or archives) registered by users for processing with PROOF • DSs may share the same physical files • The staging script issues new staging requests and touches the files every 5 minutes • Files are distributed uniformly across the nodes by the xrootd data manager
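For illustration, a minimal ROOT sketch of how a user could build and register a dataset and later process it with PROOF; the dataset name, AliEn path and selector are placeholders (RegisterDataSet/Process are the standard TProof dataset calls).

// register_ds.C - sketch of registering and processing a dataset on CAF
// (dataset name, AliEn path and selector are illustrative placeholders)
void register_ds()
{
   TProof *p = TProof::Open("alice-caf.cern.ch");
   if (!p) return;

   // Build a file collection pointing to ESD files stored in AliEn
   TFileCollection *fc = new TFileCollection("myESDs");
   fc->Add(new TFileInfo("alien:///alice/sim/2008/run12345/001/AliESDs.root"));

   // Register it under the user's group; the DS manager stages the files
   p->RegisterDataSet("/PWG0/myuser/myESDs", fc);

   // Once staged, process the dataset with a selector
   p->Process("/PWG0/myuser/myESDs", "MySelector.cxx+");
}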
Dataset Manager • The DS manager enforces the quotas at the file level • The physical location of files is handled by xrootd • The DS manager daemon reports: • the overall number of files • the number of new, touched, disappeared and corrupted files • staging requests • disk utilization for each user and each group • the number of files on each node and their total size
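From the user side, the staged datasets and the group disk quotas can be inspected from an open PROOF session, as in the short sketch below; ShowDataSets is the standard call, while the availability of ShowDataSetQuota on the CAF ROOT version is an assumption.

// ds_status.C - sketch of listing datasets and quota usage,
// assuming a PROOF session is already open (gProof)
void ds_status()
{
   if (!gProof) return;
   // List the datasets visible to this user, with their staging status
   gProof->ShowDataSets();
   // Print per-group disk quota usage (assumed to be enabled on CAF)
   gProof->ShowDataSetQuota();
}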
Dataset Monitoring • PWG1 is using 0% of its 1 TB quota • PWG3 is using 5% of its 1 TB quota
Datasets List • Jury produced pT spectrum plots by staging his own DS (run #40825, TPC+ITS, field on) • Start staging common DSs of reconstructed runs?
• Dataset | # files | tree | # events | size | staged
• /COMMON/COMMON/ESD5000_part | 1000 | /esdTree | 100000 | 50 GB | 100 %
• /COMMON/COMMON/ESD5000_small | 100 | /esdTree | 10000 | 4 GB | 100 %
• /COMMON/COMMON/run15034_PbPb | 967 | /esdTree | 939 | 500 GB | 97 %
• /COMMON/COMMON/run15035_PbPb | 962 | /esdTree | 952 | 505 GB | 98 %
• /COMMON/COMMON/run15036_PbPb | 961 | /esdTree | 957 | 505 GB | 99 %
• /COMMON/COMMON/run82XX_part1 | 10000 | /esdTree | 999500 | 289 GB | 99 %
• /COMMON/COMMON/run82XX_part2 | 10000 | /esdTree | 922600 | 289 GB | 92 %
• /COMMON/COMMON/run82XX_part3 | 10000 | /esdTree | 943100 | 288 GB | 94 %
• /COMMON/COMMON/sim_160000_esd | 95 | /esdTree | 9400 | 267 MB | 98 %
• /PWG0/COMMON/run30000X_10TeV_0.5T | 2167 | /esdTree | 216700 | 90 GB | 100 %
• /PWG0/COMMON/run31000X_0.9TeV_0.5T | 2162 | /esdTree | 216200 | 57 GB | 100 %
• /PWG0/COMMON/run32000X_10TeV_0.5T_Phojet | 2191 | /esdTree | 219100 | 83 GB | 100 %
• /PWG0/COMMON/run33000X_10TeV_0T | 2191 | /esdTree | 219100 | 108 GB | 100 %
• /PWG0/COMMON/run34000X_0.9TeV_0T | 2175 | /esdTree | 217500 | 65 GB | 100 %
• /PWG0/COMMON/run35000X_10TeV_0T_Phojet | 2190 | /esdTree | 219000 | 98 GB | 100 %
• /PWG0/phristov/kPhojet_k5kG_10000 | 100 | /esdTree | 1100 | 4 GB | 11 %
• /PWG0/phristov/kPhojet_k5kG_900 | 97 | /esdTree | 2000 | 4 GB | 20 %
• /PWG0/phristov/kPythia6_k5kG_10000 | 99 | /esdTree | 1600 | 4 GB | 16 %
• /PWG0/phristov/kPythia6_k5kG_900 | 99 | /esdTree | 1100 | 4 GB | 11 %
• /PWG2/COMMON/run82XX_test4 | 10 | /esdTree | 1000 | 297 MB | 100 %
• /PWG2/COMMON/run82XX_test5 | 10 | /esdTree | 1000 | 297 MB | 100 %
• /PWG2/akisiel/LHC500C0005 | 100 | /esdTree | 97 | 663 MB | 100 %
• /PWG2/akisiel/LHC500C2030 | 996 | /esdTree | 995 | 4 GB | 99 %
• /PWG2/belikov/40825 | 1355 | /HLTesdTree | 1052963 | 143 GB | 99 %
• /PWG2/hricaud/LHC07f_160033DataSet | 915 | /esdTree | 91400 | 2 GB | 99 %
• /PWG2/hricaud/LHC07f_160038_root_archiveDataSet | 862 | /esdTree | 86200 | 449 GB | 100 %
• /PWG2/jgrosseo/sim_1600XX_esd | 33568 | /esdTree | 3293900 | 103 GB | 98 %
• /PWG2/mvala/PDC07_pp_0_9_82xx_1 | 99 | /rsnMVTree | 990000 | 1 GB | 100 %
• /PWG2/mvala/RSNMV_PDC06_14TeV | 677 | /rsnMVTree | 6442101 | 24 GB | 100 %
• /PWG2/mvala/RSNMV_PDC07_09_part1 | 326 | /rsnMVTree | 2959173 | 5 GB | 100 %
• /PWG2/mvala/RSNMV_PDC07_09_part1_new | 326 | /rsnMVTree | 2959173 | 5 GB | 100 %
• /PWG2/pganoti/FirstPhys900Field_310000 | 1088 | /esdTree | 108800 | 28 GB | 100 %
• /PWG3/arnaldi/PDC07_LHC07g_200314 | 615 | /HLTesdTree | 45000 | 787 MB | 94 %
• /PWG3/arnaldi/PDC07_LHC07g_200315 | 594 | /HLTesdTree | 42600 | 744 MB | 95 %
• /PWG3/arnaldi/PDC07_LHC07g_200316 | 366 | /HLTesdTree | 30700 | 513 MB | 99 %
• /PWG3/arnaldi/PDC07_LHC07g_200317 | 251 | /HLTesdTree | 20100 | 333 MB | 100 %
• /PWG3/arnaldi/PDC08_170167_001 | 1 | N/A | 33 MB | 0 %
• /PWG3/arnaldi/PDC08_LHC08t_170165 | 976 | /HLTesdTree | 487000 | 4 GB | 99 %
• /PWG3/arnaldi/PDC08_LHC08t_170166 | 990 | /HLTesdTree | 495000 | 4 GB | 100 %
• /PWG3/arnaldi/PDC08_LHC08t_170167 | 975 | /HLTesdTree | 424500 | 8 GB | 87 %
• /PWG3/arnaldi/myDataSet | 975 | /HLTesdTree | 424500 | 8 GB | 87 %
• /PWG4/anju/myDataSet | 946 | /esdTree | 94500 | 27 GB | 99 %
• /PWG4/arian/jetjet15-50 | 9817 | /esdTree | 973300 | 630 GB | 99 %
• /PWG4/arian/jetjetAbove_50 | 94 | /esdTree | 8000 | 7 GB | 85 %
• /PWG4/arian/jetjetAbove_50_real | 958 | /esdTree | 90500 | 73 GB | 94 %
• /PWG4/elopez/jetjet15-50_28000x | 7732 | /esdTree | 739800 | 60 GB | 95 %
• /PWG4/elopez/jetjet50_r27000x | 8411 | /esdTree | 793100 | 92 GB | 94 %
• ~4.7 TB used out of 6 TB (34 nodes x 200 GB - 10%)
CPU Fairshare • Usages are retrieved every 5 minutes and averaged over 6 hours • New priorities are computed by applying a correction formula when a group's usage lies in [α·quota, β·quota], with α = 0.5 and β = 2 • Priority curve: f(x) = q + q·exp(k·x), with k = (1/q)·ln(1/4) • Figure: priority vs. usage (0-100%), ranging from priorityMax = 40% down to priorityMin = 10%, for a quota q = 20%
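A small standalone sketch of how such a priority curve can be evaluated; the interpretation of x as the group's measured CPU usage and the clamping to the 10-40% priority window are my reading of the figure, not the actual CAF scheduler code.

// fairshare_sketch.cpp - hedged sketch of the priority-vs-usage curve
// f(x) = q + q*exp(k*x), k = (1/q)*ln(1/4); the correction is applied
// for usage in [alpha*quota, beta*quota]. The interpretation of the
// figure is an assumption, not the real scheduler implementation.
#include <algorithm>
#include <cmath>
#include <cstdio>

double priority(double usage, double quota)
{
   const double alpha = 0.5, beta = 2.0;        // correction window
   const double prioMin = 0.10, prioMax = 0.40; // bounds from the figure
   const double k = std::log(0.25) / quota;     // k = (1/q) * ln(1/4)

   // Outside the correction window the priority stays at its bounds
   if (usage <= alpha * quota) return prioMax;
   if (usage >= beta * quota)  return prioMin;

   // Inside the window follow f(x), clamped to [prioMin, prioMax]
   double f = quota + quota * std::exp(k * usage);
   return std::min(prioMax, std::max(prioMin, f));
}

int main()
{
   const double quota = 0.20;  // 20% group quota, as in the example figure
   for (double u = 0.0; u <= 1.0; u += 0.1)
      std::printf("usage %3.0f%% -> priority %4.1f%%\n",
                  100 * u, 100 * priority(u, quota));
   return 0;
}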
Priority Monitoring • Priorities drive the CPU fairshare and converge towards the quotas • Usages are averaged so that priorities converge gracefully to the quotas • If there is no competition, users get the maximum number of CPUs • Only relative priorities are modified!
CPU quotas in practice • only PWGs + the default group • the default group usually has the highest usage
Query Monitoring • When a user query completes, the PROOF master sends statistics: • bytes read • consumed CPU time (the basis for CPU fairshare) • number of processed events • user waiting time • Values are aggregated per user and per group
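The same per-query quantities can also be read back by the user from the ROOT session through TQueryResult, as sketched below; the getter names are the standard TQueryResult interface to the best of my knowledge.

// query_stats.C - sketch of reading per-query statistics,
// assuming a PROOF session (gProof) on which a query has already run
void query_stats()
{
   if (!gProof) return;
   TQueryResult *qr = gProof->GetQueryResult();   // last finished query
   if (!qr) return;
   Printf("events processed : %lld", qr->GetEntries());
   Printf("bytes read       : %lld", qr->GetBytes());
   Printf("CPU time used    : %.1f s", qr->GetUsedCPU());
   Printf("processing time  : %.1f s", qr->GetProcTime());
}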
Query Monitoring • Plots: query statistics per interval and accumulated
Outlook • User sessions monitoring • on average 4-7 parallel sessions (daytime hours, EU time), with peaks of 15-20 users during tutorial sessions: running history still missing • need to monitor the number of workers per user once load-based scheduling is introduced • Additional monitoring per single query (disk used and files/sec not implemented yet) • Network • traffic correlation among nodes • xrootd activity with the new bulk staging requests • Debug • tool to monitor and kill a hanging session when Reset doesn't work (currently the cluster has to be restarted) • Hardware • new ALICE Mac cluster "ready" (16 workers) • new IT 8-core machines coming • Training • PROOF/CAF is the key setup for interactive user analysis (and more) • the number of people attending the monthly tutorial is increasing (20 people last week!)