PROOF and AnT in PHOBOS Kristjan Gulbrandsen March 25, 2004 Collaboration Meeting
What is PROOF?
• A system integrated into ROOT that allows interactive analysis of large data sets using parallel processing and I/O
• Transparent: the difference between running a local session and running over multiple computers is minimal
• Adaptable: can react to network conditions, system performance and multiple architectures
• Scalable: no manifest limitations on size
PROOF Architecture
[Diagram: the user's client connects over the Internet to a master server, which is connected to four slaves]
• Client connects to a master server local to the cluster
• Master server talks to slaves on nodes where (ideally) the data is located
• Slaves run in parallel
• Master server collects results, minimizing slow interaction with the client
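The "slaves on nodes where (ideally) data is located" idea can be sketched in plain C++. This is a toy model under stated assumptions, not PROOF's actual scheduling code: `ToyFile`, `AssignFiles` and the least-loaded fallback rule are all hypothetical names and choices made for illustration.

```cpp
// Toy model of locality-aware work assignment; plain C++, not ROOT/PROOF code.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one input file and the node that stores it.
struct ToyFile {
    std::string name;
    std::string node;
};

// Returns a plan mapping each slave node to the files it should process.
// A file goes to the slave on its own node when one exists; otherwise
// (assumption) it goes to the slave that currently has the fewest files.
std::map<std::string, std::vector<std::string>>
AssignFiles(const std::vector<ToyFile>& files,
            const std::vector<std::string>& slaveNodes) {
    std::map<std::string, std::vector<std::string>> plan;
    for (const auto& n : slaveNodes) plan[n];  // every slave appears, even if idle
    if (plan.empty()) return plan;             // no slaves: nothing to assign

    for (const auto& f : files) {
        auto target = plan.find(f.node);
        if (target == plan.end()) {
            // No slave on the file's node: pick the least-loaded slave.
            target = plan.begin();
            for (auto it = plan.begin(); it != plan.end(); ++it) {
                if (it->second.size() < target->second.size()) target = it;
            }
        }
        target->second.push_back(f.name);
    }
    return plan;
}
```

Calling `AssignFiles` with files on node1 and node2 plus one file on a node without a slave keeps the first two local and hands the third to whichever slave has the least work.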
TSelector Interface
class TSelector { Begin() SlaveBegin() Process() SlaveTerminate() Terminate() }
• Client: Begin(), Terminate()
• (n) Slaves: SlaveBegin(), Process() … Process(), SlaveTerminate()
• Diagram labels: "Create histograms"; "code normally in for loops"
• If a tree exists, tree->MakeSelector() creates a skeleton class deriving from TSelector
• A copy of each object exists in each slave
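The call order on this slide can be illustrated with a small stand-alone model. It is a sketch in plain C++, not ROOT code: `ToyHist`, `ToySelector` and `RunToyProof` are hypothetical names, the round-robin split of entries is an assumption, and creating the histogram in `SlaveBegin()` is one plausible reading of the "Create histograms" label.

```cpp
// Toy model of the TSelector call sequence; plain C++, not ROOT code.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram with integer bin indices.
struct ToyHist {
    std::vector<long> bins;
    explicit ToyHist(std::size_t nbins = 0) : bins(nbins, 0) {}
    void Fill(std::size_t bin) { if (bin < bins.size()) ++bins[bin]; }
    void Add(const ToyHist& other) {
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Mirrors the five method names listed on the slide.
class ToySelector {
public:
    ToyHist hist;                             // each slave holds its own copy
    void Begin() {}                           // client, before processing
    void SlaveBegin() { hist = ToyHist(4); }  // assumed: histograms created here
    void Process(int value) {                 // the body of the usual for loop
        hist.Fill(static_cast<std::size_t>(value % 4));
    }
    void SlaveTerminate() {}                  // per-slave cleanup
    void Terminate() {}                       // client, after results are merged
};

// Runs the full sequence over `data`, split round-robin among `nSlaves`
// selector copies, and returns the merged histogram.
ToyHist RunToyProof(const std::vector<int>& data, std::size_t nSlaves) {
    if (nSlaves == 0) nSlaves = 1;            // guard against an empty cluster
    ToySelector client;
    client.Begin();

    std::vector<ToySelector> slaves(nSlaves);
    for (auto& s : slaves) s.SlaveBegin();
    for (std::size_t i = 0; i < data.size(); ++i) {
        slaves[i % nSlaves].Process(data[i]);
    }

    ToyHist merged(4);
    for (auto& s : slaves) {
        s.SlaveTerminate();
        merged.Add(s.hist);                   // the master collects the results
    }
    client.hist = merged;
    client.Terminate();
    return merged;
}
```

Because each slave fills only its own copy and the copies are summed at the end, the merged histogram is the same whether the entries are processed by one slave or by several.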
Using PROOF
• Call gROOT->Proof("proof://<cluster>") to begin a PROOF session
• A set of file names must be added to a TDSet, similar to adding files to a TChain
• Call TDSet->Process(<selector file>), where <selector file> contains the TSelector code
• Additional supporting files/libraries can be used by creating PAR files
PROOF Execution
[Diagram: a local PC running root with ana.C, and a remote PROOF cluster with one proof master server and proof slave servers on node1 to node4, each node holding *.root files. File access is labelled TFile and TNetFile; stdout/objects are returned to the local PC.]
proof.conf on the cluster:
  #proof.conf
  slave node1
  slave node2
  slave node3
  slave node4
Session built up step by step on the slide:
  $ root
  root [0] tree->Process("ana.C")
  root [1] gROOT->Proof("remote")
  root [2] dset->Process("ana.C")
The purely local starting point shown first is: root [0] .x ana.C
PROOF in PHOBOS
• PROOF is installed on the Pharm cluster
• The newest ROOT version (4.00/03) is needed and exists in /usr/local/root
• Proofserver is compiled with libnew (for now) to allow PhatII classes to be used without modification
• The PhatII structure is ideal for transferring individual libraries among slave nodes
AnT Trees
• A tree format has been created to hold summary information for analyses
• Trees are designed to hold the basic summary information used for analyses and to allow pieces of data to be ignored (not read), decreasing I/O
• TRefs allow partial information to be read in while maintaining the ability to cross-reference information (e.g. tracks referring to their hits)
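The combination of cross-references and skippable data can be sketched as follows. This is a simplified stand-in in plain C++, not the AnT classes or ROOT's TRef: `ToyHit`, `ToyTrack` and `ToyEvent` are hypothetical, references are modelled as plain indices, and "reading" the hits is modelled as a copy performed on first access.

```cpp
// Toy model of tracks referring to hits that are read only on demand;
// plain C++, not ROOT's TRef or the AnT classes.
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct ToyHit {
    int layer;
    double dE;
};

struct ToyTrack {
    int charge;
    std::vector<std::size_t> hitIndex;  // positions in the event's hit list
};

class ToyEvent {
public:
    ToyEvent(std::vector<ToyTrack> tracks, std::vector<ToyHit> storedHits)
        : tracks_(std::move(tracks)), stored_(std::move(storedHits)) {}

    // Track-level information is available without touching the hits.
    const std::vector<ToyTrack>& Tracks() const { return tracks_; }

    // Hits are copied from "storage" only the first time they are requested.
    const std::vector<ToyHit>& Hits() {
        if (!loaded_) {
            hits_ = stored_;
            loaded_ = true;
            ++hitReads_;
        }
        return hits_;
    }

    // Follows a track's references to its hits and sums their dE.
    double SumdE(const ToyTrack& track) {
        const std::vector<ToyHit>& hits = Hits();
        double sum = 0.0;
        for (std::size_t i : track.hitIndex) {
            if (i < hits.size()) sum += hits[i].dE;
        }
        return sum;
    }

    int HitReads() const { return hitReads_; }  // how often hits were read

private:
    std::vector<ToyTrack> tracks_;
    std::vector<ToyHit> stored_;  // stands in for data still on disk
    std::vector<ToyHit> hits_;
    bool loaded_ = false;
    int hitReads_ = 0;
};
```

An analysis that only looks at `Tracks()` never triggers a hit read, while one that calls `SumdE()` pays for the read once and can still navigate from each track to its hits.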
AnT Structure
EventInfo: Run, Seq, Ev_No, Date, Time, Polarity, Prim_vtx->
TriggerInfo: IsCol, L0, L1, EOct, ERing, TrgT_Extra[], TrgE_Extra[]
Tracks[]: PID, Charge, MeandE, SigmadE, Prob, Chi^2, Xprod[3], Mom[3], HitArray[]->
Vertex[]: Status, ID, Prob, Pos[3], Sigma[3]
Hits[]: Layer, SensorLabel, dE, Pos[3], Pad[2]
Paddle: TruncMeanP, TruncMeanN, SumP, SumN, TDiff
ZDC: SumP, SumN, TZDCP, TZDCN
TOF Info? PCAL Info?
HitArrays are being developed
Current AnT Trees
• Prototype AnT trees currently exist on Pharm (10 runs, 56 Seqs) and can be used
• Analysis personnel are needed to use the trees and to provide information about the additions necessary to make them useful for many analyses
Analysis using AnT/PROOF
• AnT/PROOF has been used to generate pt distributions from current data
• Using AnT/PROOF speeds up analysis from an hour to a minute
• Disabling hit read-in speeds up processing by more than a factor of 10
Summary
• PROOF is ready for use on Pharm
• Simple example macros exist explaining how to use PROOF
• AnT trees have been created for quick analysis of large data sets in conjunction with PROOF
• Users are needed to test/try both PROOF and AnT, to provide information on the data format and to stress the PROOF architecture