
DØ Computing Status and Budget Requirements


Presentation Transcript


  1. DØ Computing Status and Budget Requirements. Amber Boehnlein, DØ International Finance Committee, April 20, 2005

  2. Computing Model
  (Diagram: data flow among Raw Data, RECO Data, RECO MC, Remote Farms, Central Farms, User Data, Data Handling Services, Central Storage, User Desktops, Central Analysis Systems, and Remote Analysis Systems.)

  3. Recent Achievements
  • Operations are smooth for DØ
  • Joint operations department formed from the CDF and DØ CD departments, combining pager rotations and expanding use of automated tools
  • Second-generation deployments:
    - Completion of calibration DB access in RECO for DØ
    - TMB++ and common analysis format; reduction in skim sizes
    - 30% speed-up of reco
  • Monte Carlo production for DØ using automated submission tools
  • Global reprocessing for DØ has started: 100M events reprocessed offsite in 2003; goal of 800M events reprocessed offsite for 2005
  • Data handling developments for improved functionality, operations, and product longevity
  • Hardware: replacing aging infrastructure components such as D0mino

  4. Computing Contributions
  • Use the FNAL equipment budget to provide a very basic level of functionality:
    - Databases, networking and other infrastructure
    - Primary reconstruction
    - Robotic storage and tape drives
    - Disk cache and basic analysis computing
    - Support for data access to enable offsite computing
  • Estimate costs based on experience or the need for replacements
  • Remote contributions:
    - Monte Carlo production takes place at remote centers
    - Reprocessing (or primary processing)
    - Analysis at home institutions
    - Contributions at FNAL to project disk and to CLuED0
    - Collaboration-wide analysis

  5. Virtual Center
  • For the value basis, determine the cost of the full computing system at FNAL costs, purchased in the yearly currency:
    - Disk, servers and CPU for FNAL analysis
    - Production activities such as MC generation, processing and reprocessing
    - Mass storage, cache machines and drives to support extensive data export
  • Assign a fractional value to remote contributions (see the sketch below): merit-based assignment of value. Assigning equipment purchase cost as value (the “BaBar model”) takes into account neither the life cycle of the equipment nor system efficiency or use; while shown as a predictor, it is most useful after the fact.
  • Computing planning board includes strong remote participation and representation
  • Not yet included as part of the value estimate: wide area networking, infrastructure, desktop computing, analysis
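  A minimal sketch of how the merit-based fractional valuation described above could be computed against the full-system cost at FNAL prices. The component costs, fractions, and site names are hypothetical placeholders, not figures from this talk:

    # Hypothetical "virtual center" valuation: price the full system at FNAL
    # costs, then credit each remote site a fraction of the relevant component
    # based on its share of delivered work (merit), not on purchase price.
    fnal_system_cost = {              # full-system cost in k$, illustrative only
        "analysis_cpu_disk": 600,
        "production_cpu": 500,
        "mass_storage_export": 400,
    }

    remote_delivery = {               # fraction of each activity delivered remotely
        "site_A": {"production_cpu": 0.30, "analysis_cpu_disk": 0.05},
        "site_B": {"production_cpu": 0.15},
    }

    def site_value(site):
        """Credit = sum over components of (delivered fraction x FNAL cost)."""
        return sum(frac * fnal_system_cost[comp]
                   for comp, frac in remote_delivery[site].items())

    for site in remote_delivery:
        print(f"{site}: ~{site_value(site):.0f} k$ of equivalent value")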

  6. Data Handling/Production
  • 15M-25M events logged per week
  • Production capacity sized to keep up with data logging
  • MC production at remote sites: ~1M events/week
  • Tape writes/reads: 7 TB/week average writes, 30 TB/week reads
  • Analysis requests at FNAL: 750-1100M events; DØ: 50 TB/week in 1000 requests
  (Plot: CAB analysis stations, files per 30 minutes; red shows errors.)
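  A back-of-the-envelope check of the rates quoted above, assuming the weekly tape writes are dominated by the logged events (an assumption, not a statement from the slide):

    # Rough implied per-event size from the quoted rates (illustrative only).
    events_per_week = 20e6        # midpoint of the 15M-25M events logged per week
    writes_tb_per_week = 7.0      # average tape writes
    reads_tb_per_week = 30.0      # average tape reads

    kb_per_event = writes_tb_per_week * 1e9 / events_per_week   # 1 TB = 1e9 kB
    print(f"~{kb_per_event:.0f} kB written per logged event")    # ~350 kB
    print(f"tape read/write ratio: ~{reads_tb_per_week / writes_tb_per_week:.1f}x")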

  7. Central Analysis
  • Supports a peak load of 200 users: TMB- and ntuple-based analysis, some user MC generation
  • Supports post-processing “fixing” as a common activity (moving to the production platform)
  • B physics tends to be the most CPU- and event-intensive
  • DØ Central Analysis Backend: ~2 THz
  • Past year: short of cache, over-reliance on tape access; deployed 100 TB as SAM cache on CABSRV
  • 70 TB of user-controlled space, primarily on CLuED0
  • ASA nodes still not in production
  • Tickets per hardware type per year tracked using the Remedy system; tracking in this way helps us to understand which operational issues arise and how to mitigate them
  (Plot: tickets/hardware/year.)

  8. Central Robotics
  • Daily Enstore traffic for CDF, DØ, and other users: 30 TB at peak
  • DØ tape holdings: 9940, 638 TB; LTO-I, 175 TB; LTO-II, 159 TB; 971 TB total
  • Diversity of robotics/drives maintains flexibility
  • 5000 mounts/day at peak
  • Known data loss due to robotics/Enstore for DØ: >10 GB
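  A quick consistency check of the per-medium holdings against the stated total; the small difference reflects rounding of the per-medium figures:

    # DØ tape holdings by medium, in TB, as quoted on the slide.
    holdings_tb = {"9940": 638, "LTO-I": 175, "LTO-II": 159}
    print(f"sum over media: {sum(holdings_tb.values())} TB "
          f"(slide quotes 971 TB total)")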

  9. Wide Area Networking
  • OC-12 to ESnet, filling the production link; anticipate an upgrade
  • R&D: fiber link to StarLight, used to support reprocessing for WestGrid
  (Plot: outbound traffic at the border router since Dec 2004, peak stressing the OC-12; CDF in green, DØ in blue.)
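  For scale, a rough estimate of how much a fully used OC-12 could move per week, ignoring protocol overhead and the fact that the link is shared with other laboratory traffic:

    # Rough weekly ceiling of an OC-12 link (~622 Mbit/s).
    oc12_bits_per_s = 622.08e6
    seconds_per_week = 7 * 24 * 3600
    tb_per_week = oc12_bits_per_s * seconds_per_week / 8 / 1e12
    print(f"OC-12 ceiling: ~{tb_per_week:.0f} TB/week")   # ~47 TB/week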

  10. Cost Estimate (Sept 2004)
  • The guidance in 2002 was $2M, cut to $1.5M; in 2003, $1.5M, cut to $1.35M ($0.05M off the top, $0.1M for the Wideband tax); see the arithmetic below.
  • Added replacement of mover nodes to infrastructure, relative to the document.
  • We did not add a “tax cost” to the price of the nodes, and probably should consider doing so ($535/node in FY2004).
  • Reco farm sized to keep up with 25 Hz weekly.
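  The FY2003 cut decomposes as quoted on the slide; the second part shows how the suggested $535/node “tax cost” would scale, using a purely hypothetical node count:

    # FY2003 guidance cut, figures from the slide ($M).
    guidance = 1.50
    off_the_top = 0.05
    wideband_tax = 0.10
    print(f"available after cuts: ${guidance - off_the_top - wideband_tax:.2f}M")  # $1.35M

    # Adding the suggested $535/node tax (FY2004 figure) to a hypothetical
    # purchase of 100 nodes; the node count is illustrative only.
    nodes = 100
    print(f"node tax on {nodes} nodes: ${nodes * 535 / 1e6:.3f}M")  # $0.054M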

  11. FY 2005
  • Bottom-up estimate for the FNAL budget: $1.8M for equipment, $250K for tapes
  • Actual budget: $1.25M in equipment funds, $125K for tapes (shortfall arithmetic below)
  • Possible mitigations and trade-offs:
    - 30% speed-up of reco
    - Go to a 4-year retirement cycle on farm and analysis nodes
    - Rely more on remote computing, particularly in the out-years
    - Postpone the 10 Gb uplink to FY2006
    - Reduce skim size and assume only one skimming pass
    - Rely more heavily on LTO-II media, which costs half as much as STK media for the same density
    - Drastically reduce the amount of MC DSTs stored
    - Recycle STK tapes
  • 2006 bottom-up estimate in progress
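  The shortfall between the bottom-up estimate and the actual FY2005 allocation, straightforward arithmetic on the figures above:

    # FY2005 shortfall: bottom-up estimate vs. actual allocation, in $M.
    estimate = {"equipment": 1.800, "tapes": 0.250}
    actual   = {"equipment": 1.250, "tapes": 0.125}
    for item in estimate:
        print(f"{item} shortfall: ${estimate[item] - actual[item]:.3f}M")
    total = sum(estimate[item] - actual[item] for item in estimate)
    print(f"total shortfall: ${total:.3f}M")   # $0.675M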

  12. Conclusions
  • The DØ computing model is successful.
  • We have developed tools to enable us to target computing spending at FNAL; we use metrics from SAM and system monitoring to provide estimators.
  • We use the Virtual Center concept to calculate the “value” that remote computing gives the collaboration.
  • DØ continues to pursue a global vision for the best use of resources by moving towards interoperability with LCG and OSG.
  • DØ computing remains effort-limited: a few more dedicated people could make a huge difference.
  • Short budgets, the need for continued construction projects, and aging computing infrastructure are a serious cause for concern.
