Data Production Data Analysis at CERN

Data Production Data Analysis at CERN. Nestlé Research Center, 23 May 2007. René Brun / CERN. The Large Hadron Collider (LHC) is being built in a circular tunnel 27 km in circumference. The tunnel is buried around 50 to 175 m underground. It straddles the Swiss and French borders on the outskirts of Geneva.

Presentation Transcript


  1. Data Production, Data Analysis at CERN. Nestlé Research Center, 23 May 2007. René Brun / CERN

  2. The Large Hadron Collider (LHC) is being built in a circular tunnel 27 km in circumference. The tunnel is buried around 50 to 175 m underground. It straddles the Swiss and French borders on the outskirts of Geneva. Data Production & Analysis at CERN

  3. Overview of CERN's accelerator layout

  4. Overview of CERN's accelerator layout

  5. CERN accelerators lifetime (timeline figure, axis 1975, 1982, 1994, 1998, 2000, 2015; accelerators: SPS, LEP, LHC; phases: design/simulation, construction, run)

  7. Lowering a dipole magnet (one of 1232) into the tunnel

  8. The key element: the 1232 dipoles bend the beam around the 27 km circumference

  10. The LHC collaborations (1995->20XX)
      • ALICE, ATLAS, CMS, LHCb
      • > 5000 physicists
      • > 500 universities or labs
      • Many PetaBytes per year
      • Billions of events

  11. A typical detector component

  12. More than 10 million electronic channels per experiment

  13. A simulated collision

  15. How Much Data is Involved? (Plot: Level-1 trigger rate in Hz, 10^2 to 10^6, versus event size in bytes, 10^4 to 10^7. Experiments shown: LHCb, ATLAS, CMS, HERA-B, KLOE, CDF II, CDF, H1, ZEUS, ALICE, NA49, UA1, STAR, LEP. Annotations: high Level-1 trigger (1 MHz); high number of channels, high bandwidth (500 Gbit/s); high data archive (5 PetaBytes/year); 10 Gbit/s into the database; for comparison, 1 billion people surfing the Web.)

  18. The LHC collaborations (some parameters)

  20. From Mainframes to the GRID

  22. LHC collaborations (Analysis Steps)
      Raw data (PetaBytes)
        DAQ -> T0 -> T1
      After reconstruction (100 TeraBytes)
        T1 -> T2
      Ready for analysis (10 TeraBytes)
        T2 -> T3
      Analysis per physicist (1 TeraByte)

  23. Tools for Data Storage (timeline figure, axis 1975, 1982, 1994, 1998, 2000, 2020; tools shown: zbook, hydra, Objectivity, oracle, ROOT)

  24. Tools for Data Visualization & Analysis (timeline figure, axis 1975, 1982, 1994, 1998, 2000, 2020; tools shown: PAW, LHC++, JAS, ROOT)

  25. Data Storage and Access Tools (diagram labels: ROOT, VOBOX::SA, xrootd (master), xrootd (worker), xrootd emulation (worker), SRM, DPM, Castor, dCache, Disk, MSS)

  26. The ROOT Open Source Project, http://root.cern.ch (Figure: functionality versus time, 1995 to 2005. Versions shown: ROOT 0.5, ROOT 1.0, ROOT 2.0, ROOT 3.0, ROOT 5.12. User communities shown: LEP, HERA, SPS; Babar, KEK, SPS, FNAL; RHIC, FNAL/RUN II; LHC (Large Hadron Collider).)

  27. From plotters to objects: all items are clickable objects

  28. Can take advantage of graphics accelerators

  29. GUI Examples

  30. ROOT Math Libraries

  31. Multivariate Analysis / Cluster Analysis

  32. Self-describing files
      • A dictionary for persistent classes is written to the file.
      • ROOT files can be read by foreign readers.
      • Support for backward and forward compatibility.
      • Files created in 2001 must be readable in 2015.
      • Classes (data objects) for all objects in a file can be regenerated via TFile::MakeProject:
          root > TFile f("demo.root");
          root > f.MakeProject("dir","*","new++");

  33. A ROOT file pippa.root with two levels of directories. Objects in directory /pippa/DM/CJ, e.g. /pippa/DM/CJ/h15

  34. Memory <--> Tree. Each node is a branch in the Tree. (Diagram: entries 0 to 17 of tree T, with T.Fill() at entry 18 and T.GetEntry(6) pointing at entry 6, exchanged with memory.)
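The Fill/GetEntry exchange on this slide can be pictured with a toy model in plain C++. This is only a sketch under stated assumptions: `Event` and `ToyEventTree` are hypothetical names invented for illustration, the storage is an in-memory vector rather than branches in a file, and nothing here is ROOT's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event record living in the user's memory.
struct Event {
    int nTracks;
    double energy;
};

// Toy tree (not ROOT's TTree): bound to one Event object owned by the caller.
class ToyEventTree {
public:
    // Bind the tree to the caller's Event object (the "memory" side).
    explicit ToyEventTree(Event* buffer) : buffer_(buffer) {
        assert(buffer != nullptr);
    }

    // Append a copy of the bound object as a new entry; returns its index.
    std::size_t Fill() {
        entries_.push_back(*buffer_);
        return entries_.size() - 1;
    }

    // Copy entry i back into the bound object; false if i is out of range.
    bool GetEntry(std::size_t i) {
        if (i >= entries_.size()) return false;
        *buffer_ = entries_[i];
        return true;
    }

    std::size_t GetEntries() const { return entries_.size(); }

private:
    Event* buffer_;               // memory side
    std::vector<Event> entries_;  // tree side
};
```

In use, the caller sets the fields of its `Event`, calls `Fill()` once per event, and later calls `GetEntry(6)` to have the seventh stored event copied back into the same object.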

  35. (Tree browser screenshot.) 8 branches of T; 8 leaves of branch Electrons; a double-click histograms the leaf

  36. Chains of Trees
      • A TChain is a collection of Trees.
      • Same semantics for TChains and TTrees.
      • root > .x h1chain.C
      • root > chain.Process("h1analysis.C")
      Contents of h1chain.C:
      {
         // creates a TChain to be used by the h1analysis.C class
         // the symbol H1 must point to a directory where the H1 data sets
         // have been installed
         TChain chain("h42");
         chain.Add("$H1/dstarmb.root");
         chain.Add("$H1/dstarp1a.root");
         chain.Add("$H1/dstarp1b.root");
         chain.Add("$H1/dstarp2.root");
      }
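One way to read "same semantics for TChains and TTrees" is that a chain presents several trees as a single contiguous range of entries. The sketch below illustrates that idea in plain C++; `ChainElement`, `ToyChain` and `Locate` are hypothetical names, the entry counts passed to `Add` would be made up by the caller, and this is not how ROOT implements TChain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one tree stored in one file: a file name and an entry count.
struct ChainElement {
    std::string file;
    std::size_t entries;
};

// Toy chain (not ROOT's TChain): several trees seen as one range of entries.
class ToyChain {
public:
    void Add(const std::string& file, std::size_t entries) {
        elements_.push_back({file, entries});
    }

    // Total number of entries over all chained trees.
    std::size_t GetEntries() const {
        std::size_t total = 0;
        for (const auto& e : elements_) total += e.entries;
        return total;
    }

    // Translate a global entry number into (element index, local entry).
    std::pair<std::size_t, std::size_t> Locate(std::size_t global) const {
        assert(global < GetEntries());
        std::size_t index = 0;
        while (global >= elements_[index].entries) {
            global -= elements_[index].entries;
            ++index;
        }
        return {index, global};
    }

    const ChainElement& Element(std::size_t index) const {
        return elements_[index];
    }

private:
    std::vector<ChainElement> elements_;
};
```

A loop over `0 .. GetEntries()-1` then looks exactly like a loop over a single tree; only `Locate` knows where one file ends and the next begins.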

  37. Access Transparency
      TFile *f1 = TFile::Open("local.root");
      TFile *f2 = TFile::Open("root://cdfsga.fnal.gov/bigfile.root");
      TFile *f3 = TFile::Open("rfio:/castor.cern.ch/alice/aap.root");
      TFile *f4 = TFile::Open("dcache://main.desy.de/h1/run2001.root");
      TFile *f5 = TFile::Open("chirp://hep.wisc.edu/data1.root");
      TFile *f6 = TFile::Open("http://root.cern.ch/geom/atlas.root");
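Access transparency of this kind rests on choosing a backend from the prefix of the file name. A minimal sketch of that first step, in plain C++: `SchemeOf` is a hypothetical helper, and the rule it uses (text before the first colon, unless a slash comes first) is an assumption made for illustration, not ROOT's actual lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the protocol prefix of a file name, or "local" when there is none.
// Examples: "root://host/file" gives "root", "rfio:/path" gives "rfio",
// and "local.root" gives "local".
std::string SchemeOf(const std::string& name) {
    const std::size_t colon = name.find(':');
    if (colon == std::string::npos || colon == 0) return "local";
    // A '/' before the colon means the colon belongs to a path, not a scheme.
    if (name.find('/') < colon) return "local";
    return name.substr(0, colon);
}
```

A real dispatcher would map the returned string to a protocol-specific file class; the calling code keeps the single `Open(name)` form whatever the backend.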

  38. Data Sets Hierarchy
      • More than 100 million files per experiment, copied and distributed to many sites around the world.
      • A TFile typically contains 1 TTree.
      • A TChain is a collection of TTrees and/or TChains.
      • A TChain is typically the result of a query to the file catalogue.
      (Figure: data set size scale 100 MB, 1 GB, 10 GB, 100 GB, 1 TB, 10 TB, 100 TB, 1 PB; counts 1, 1, 5, 50, 500, 5000, 50000; range labelled from TTree to TChain.)

  39. Interactive and batch tasks. Same interface for batch and interactive systems.
      • Medium-term jobs, e.g. analysis design and development, also using non-local resources
      • Analysis jobs with well-defined algorithms (e.g. production of personal trees)
      • Interactive analysis using local resources, e.g. end-analysis calculations, visualization

  40. Sample of analysis activity (G. Ganis, CHEP06, 15 Feb 2006)
      Monday at 10h15: ROOT session on my laptop
      • AQ1: a 1 s query produces a local histogram
      • AQ2: a 10 min query submitted to PROOF1
      • AQ3->AQ7: short queries
      • AQ8: a 10 h query submitted to PROOF2
      Monday at 16h25: ROOT session on my laptop
      • BQ1: browse results of AQ2
      • BQ2: browse temporary results of AQ8
      • BQ3->BQ6: submit four 10 min queries to PROOF1
      Wednesday at 8h40: browse from any web browser
      • CQ1: browse results of AQ8, BQ3->BQ6

  41. From Laptop to the GRID: parallelism at all levels. Data analysis tools must be able to exploit parallelism on multi-core laptops, and to use remote computers in parallel, as well as storage elements and networks, in a transparent way. (Diagram labels: Laptop, Online/Offline Farms, Local/remote Storage, GRID.)

  42. Batch: Classical Approach
      (Diagram labels: catalog, query, files, data file splitting, jobs, myAna.C, submit, queues, manager, batch farm, storage, outputs, merging, final analysis.)
      • "static" use of resources
      • jobs frozen, 1 job / worker node
      • "manual" splitting, merging
      • limited monitoring (end of single job)

  43. Interactive Parallel ROOT/PROOF
      (Diagram labels: catalog, query, files, PROOF query: data file list and myAna.C, scheduler, MASTER, PROOF farm, storage, feedbacks (merged), final outputs (merged).)
      • farm perceived as extension of local PC
      • more dynamic use of resources
      • real time feedback
      • automated splitting and merging
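The "automated splitting and merging" bullet can be sketched as three small steps: cut the entry range into packets, let each worker histogram its packet, and have the master add the partial histograms. The plain C++ below is a toy model under stated assumptions: `Packet`, `SplitEntries`, `ProcessPacket`, `Merge` and `RunQuery` are hypothetical names, the workers run one after another in a loop rather than on a farm, and PROOF's real packetizer and merger are more elaborate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Histogram = std::vector<long>;  // bin contents

// Half-open range of entries [first, last) given to one worker.
struct Packet {
    std::size_t first;
    std::size_t last;
};

// Split n entries into packets of at most packetSize entries.
std::vector<Packet> SplitEntries(std::size_t n, std::size_t packetSize) {
    assert(packetSize > 0);
    std::vector<Packet> packets;
    for (std::size_t first = 0; first < n; first += packetSize)
        packets.push_back({first, std::min(n, first + packetSize)});
    return packets;
}

// Worker side: histogram the values of one packet (bin index = value).
Histogram ProcessPacket(const std::vector<int>& values, Packet p,
                        std::size_t nbins) {
    Histogram h(nbins, 0);
    for (std::size_t i = p.first; i < p.last; ++i) {
        if (values[i] >= 0 && static_cast<std::size_t>(values[i]) < nbins)
            ++h[static_cast<std::size_t>(values[i])];
    }
    return h;
}

// Master side: add a partial histogram into the total, bin by bin.
void Merge(Histogram& total, const Histogram& part) {
    assert(total.size() == part.size());
    for (std::size_t i = 0; i < total.size(); ++i) total[i] += part[i];
}

// Split, process every packet, and merge the partial results.
Histogram RunQuery(const std::vector<int>& values, std::size_t packetSize,
                   std::size_t nbins) {
    Histogram total(nbins, 0);
    for (const Packet& p : SplitEntries(values.size(), packetSize))
        Merge(total, ProcessPacket(values, p, nbins));
    return total;
}
```

Because merging is a plain sum, the final histogram does not depend on the packet size or on the order in which packets finish, which is what lets the splitting be automated and the partial results be shown as feedback while the query is still running.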
