
Computing tasks and the computing model



  1. Computing tasks and the computing model
     • Computing tasks
       • Identify tasks
       • Identify data types
       • Requirements to constrain the computing model
     • Computing model
       • Baseline model including:
         • Computing facilities
         • Data storage model
         • Network constraints
         • Cost + manpower estimates
     • Details in two LHC-B technical notes
       • Editors: P. Binko, A. Pacheco

  2. Dataflow architecture
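
  The slide itself carried only a diagram. As a rough stand-in, the stages named elsewhere in this deck can be lined up as a simple pipeline; the stage names and annotations below are inferred from the other slides, not taken from the original figure.

  ```python
  # Illustrative only: dataflow stages inferred from the rest of this deck;
  # the original slide's diagram is not reproduced here.
  DATAFLOW = [
      ("detector / SICB simulation", "raw data, ~500 kBytes/event"),
      ("event filter farm",          "software triggers, online reconstruction"),
      ("data server",                "raw data stored at 20 MB/s (ODBMS, HSS)"),
      ("reconstruction",             "reconstruction objects, event tag data"),
      ("analysis farms",             "analysis objects, physics results"),
  ]

  for stage, produces in DATAFLOW:
      print(f"{stage:28s} -> {produces}")
  ```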

  3. Simulation
     • Existing program (SICB) used to:
       • Estimate the distribution of event sizes (input to the DAQ and data storage models)
       • Estimate the complexity of the data (input to trigger, reconstruction and analysis algorithm studies)
       • Estimate CPU and memory requirements (input to the baseline computing model)
     • Evolution of the simulation
       • Assess the impact of new technologies (e.g. GEANT4, geometry from CAD, ODBMS) on extrapolations from SICB
       • Assess the impact of more detailed simulations and of parameterization of the detector response
       • Estimate simulation needs (CPU, data volume) as a function of time; a rough projection is sketched below
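
  A minimal sketch of such a projection. The per-event CPU cost comes from the SICB measurements on the next slide; the annual event target and the price/performance doubling time are assumptions made up for illustration.

  ```python
  # Minimal sketch: simulation CPU needs as a function of time.
  # Assumptions (not from the slides): 10^7 simulated events/year and a
  # hardware price/performance doubling time of 1.5 years.
  CPU_PER_EVENT_KMIPS_S = 10.0   # SICB measurement (next slide)
  EVENTS_PER_YEAR = 1e7          # assumed annual simulated-event target
  DOUBLING_TIME_Y = 1.5          # assumed price/performance doubling time
  SECONDS_PER_YEAR = 3.15e7

  def equivalent_kmips_today(year_offset: float) -> float:
      """Sustained CPU power needed, expressed in units of today's
      hardware, which shrink as technology improves."""
      kmips_sustained = CPU_PER_EVENT_KMIPS_S * EVENTS_PER_YEAR / SECONDS_PER_YEAR
      return kmips_sustained / 2 ** (year_offset / DOUBLING_TIME_Y)

  for y in range(6):
      print(f"year +{y}: {equivalent_kmips_today(y):5.2f} kMIPS of today's hardware")
  ```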

  4. SICB measurements
     • Mean raw event size: 500 kBytes
       • Breakdown by subdetector available
       • Includes all GEANT hit information
       • If only the detector digitizings, truth tracks and the relationships between the two are stored, this reduces to 200 kBytes/event
     • CPU time: 10 kMIPS-seconds/event
       • No parameterization of calorimeter showers
     • Memory requirement: 50 MBytes
     • A worked example using these numbers follows below
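
  A short worked example with these measurements; the sample size and the 100 MIPS reference processor are assumptions for illustration, not numbers from the slide.

  ```python
  # Worked example using the SICB measurements above.
  raw_event_kb  = 500        # full GEANT hit information
  slim_event_kb = 200        # digitizings + truth tracks + relations only
  cpu_kmips_s   = 10         # CPU cost per simulated event
  events        = 1_000_000  # assumed sample size (not from the slide)

  print(f"raw sample:  {events * raw_event_kb / 1e9:.1f} TB")   # 0.5 TB
  print(f"slim sample: {events * slim_event_kb / 1e9:.1f} TB")  # 0.2 TB

  # On an assumed 100 MIPS CPU, 10 kMIPS-s/event means 100 s per event:
  hours = events * cpu_kmips_s * 1000 / 100 / 3600
  print(f"CPU time on one 100 MIPS processor: {hours:.0f} hours")
  ```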

  5. Requirements from tasks
     • For each of:
       • Software triggers
       • Reconstruction
       • Calibration and alignment
       • Analysis
     • Detailed input from each subsystem:
       • Description of algorithms
       • Dataflow dependencies
       • Input and output data types and volumes, calibration, …
       • Reliability: quality assurance, monitoring, documentation, …
       • Performance: CPU requirements, frequency, rejection factors, …
     • Input to the computing model; one way to record such requirements is sketched below
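
  One way to record these requirements uniformly is a record per task and subsystem. The sketch below is hypothetical; all field names and example values are ours, not the collaboration's.

  ```python
  # Hypothetical requirement record mirroring the checklist above;
  # all field names and example values are invented for illustration.
  from dataclasses import dataclass, field

  @dataclass
  class TaskRequirement:
      task: str                          # e.g. "software trigger"
      subsystem: str                     # e.g. "vertex detector"
      algorithm: str                     # short description
      depends_on: list = field(default_factory=list)   # dataflow dependencies
      input_types: list = field(default_factory=list)  # consumed data types
      output_types: list = field(default_factory=list) # produced data types
      cpu_kmips_s: float = 0.0           # CPU cost per event
      frequency_hz: float = 0.0          # invocation rate
      rejection: float = 1.0             # rejection factor (triggers)

  example = TaskRequirement(
      task="software trigger", subsystem="vertex detector",
      algorithm="impact-parameter selection",
      input_types=["raw data"], output_types=["event tag data"],
      cpu_kmips_s=0.5, frequency_hz=1e5, rejection=25.0,
  )
  print(example)
  ```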

  6. Requirements from data
     • Identify the types of data in terms of their contents:
       • Raw data (real and simulated)
       • Event tag data
       • Reconstruction objects
       • Analysis objects
       • Calibration data, configuration data, detector description
     • For each data type:
       • Estimate its volume
       • Understand its access patterns
     • Input to the computing model; an illustrative catalogue follows below
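
  As an illustration, the data types above can be catalogued with a volume estimate and an access pattern per type. The entries below are placeholders, not the experiment's estimates.

  ```python
  # Illustrative catalogue of the data types listed above; volumes and
  # access patterns are placeholders, not the experiment's estimates.
  DATA_TYPES = {
      "raw data (real)":         ("dominant", "sequential, read rarely"),
      "raw data (simulated)":    ("large",    "sequential, read rarely"),
      "event tag data":          ("small",    "random, read frequently"),
      "reconstruction objects":  ("medium",   "selective reads"),
      "analysis objects":        ("small",    "random, read very frequently"),
      "calibration/configuration/detector description":
                                 ("tiny",     "keyed lookups"),
  }

  for name, (volume, access) in DATA_TYPES.items():
      print(f"{name:48s} volume={volume:9s} access={access}")
  ```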

  7. Estimated volume of stored data
     • (table of estimated volumes per data type; not reproduced in this transcript)
     • A rough cross-check of the overall scale follows below
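
  The table itself did not survive the transcript, but the PByte scale discussed on the next slide can be cross-checked from numbers elsewhere in the deck; the live time per year is our assumption.

  ```python
  # Rough cross-check: raw data written at 20 MB/s (slide 10) over an
  # assumed 10^7 live seconds per year.
  rate_mb_s = 20
  live_seconds = 1e7
  raw_tb_per_year = rate_mb_s * live_seconds / 1e6
  print(f"raw data: {raw_tb_per_year:.0f} TB/year")  # 200 TB/year
  # With simulated data, reconstruction and analysis objects added over
  # several years of running, the total plausibly reaches the PByte scale.
  ```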

  8. How to store and access a PByte of data?
     • Database approach
       • ODBMS
       • Direct access via queries (cf. AltaVista); a toy example follows below
       • Hierarchical storage
     • Where is the data physically?
       • All at CERN, duplicated at regional centres, or distributed?
       • Sociological constraints
     • Network constraints
       • Fast networking: 10^7 Internet users within a few years; will bandwidth follow?
       • Security?
       • Internet vs. guaranteed bandwidth
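
  A toy illustration of the query-driven access pattern: select events by their small, disk-resident tag data first, so that only the matching events touch the deeper layers of the storage hierarchy. All names, fields and values are invented.

  ```python
  # Toy sketch of query-driven event selection; all data is invented.
  EVENT_TAGS = [
      {"run": 1, "event": 7, "n_tracks": 32, "b_candidate": True,  "tier": "disk"},
      {"run": 1, "event": 9, "n_tracks": 12, "b_candidate": False, "tier": "tape"},
      {"run": 2, "event": 3, "n_tracks": 41, "b_candidate": True,  "tier": "tape"},
  ]

  def query(predicate):
      """Scan the small tag database; only matching events are then
      fetched from the (possibly tape-resident) bulk store."""
      return [t for t in EVENT_TAGS if predicate(t)]

  for t in query(lambda t: t["b_candidate"] and t["n_tracks"] > 20):
      print(f"fetch run {t['run']} event {t['event']} from {t['tier']}")
  ```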

  9. How to analyse a PByte of data?
     • 10^6 CERN Units = 3000 quad Pentium II (300 MHz) machines
       • Current production farms have ~100 nodes: scalability?
       • 60 MCHF at today's prices: technology trends? (this arithmetic is checked below)
     • Follow industry standards
       • Commodity components
         • Cheap
         • Possibility to mix & match components: CPUs, disks, memory, video displays, network cards, …
       • Software libraries, databases, tools
     • Follow technology evolution closely
       • To avoid inappropriate decisions
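
  Checking the slide's own arithmetic:

  ```python
  # The slide's numbers, checked.
  cern_units_needed = 1e6
  nodes = 3000            # quad Pentium II (300 MHz) boxes
  cost_mchf = 60

  print(f"{cern_units_needed / nodes:.0f} CERN Units per quad node")  # ~333
  print(f"{cost_mchf * 1e6 / nodes / 1e3:.0f} kCHF per node")         # 20 kCHF
  # A 100-node production farm covers only ~3% of the need, which is
  # why scalability is flagged above.
  ```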

  10. Computing facilities
      • Event filter farm and online reconstruction
        • Also available for reprocessing/simulation
        • Homogeneous online/offline computing architecture
      • Simulation farms
        • Can (should?) be distributed
      • Data server(s)
        • Storage of raw data at 20 MB/s, ODBMS, HSS
        • Regional centres?
      • Analysis farms
        • Close to the data?
      • Desktop
        • Collaborative tools, SDE, analysis tools
      • A sketch of this layout as a configuration table follows below
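
  The layout above, summarised as a small configuration table; the question marks mirror the open questions on the slide. This is purely descriptive, not a proposal.

  ```python
  # Descriptive configuration table for the facilities listed above;
  # question marks mirror the open questions on the slide.
  FACILITIES = {
      "event filter farm": {"location": "CERN (online)",
                            "also used for": "reprocessing, simulation"},
      "simulation farms":  {"location": "distributed?"},
      "data server(s)":    {"location": "CERN; regional centres?",
                            "raw data rate": "20 MB/s into ODBMS/HSS"},
      "analysis farms":    {"location": "close to the data?"},
      "desktop":           {"location": "institutes",
                            "tools": "collaborative tools, SDE, analysis tools"},
  }

  for name, props in FACILITIES.items():
      print(name, props)
  ```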

  11. Conclusion
      • Collect detailed requirements from the subsystems
        • Dataflow requirements
        • CPU requirements
        • …
      • Evaluate the requirements (and constraints) of the computing model
        • Data storage
        • Data access
        • Computing facilities
        • …
      • Summarised in (draft) documents
