Learn how to leverage PROOF for distributed data analysis on clusters and the grid. Explore topics like setup, running sessions, error handling, authentication, and new features. Gain insights into PROOF's architecture and implementation.
Distributed Data Analysis with PROOF
Fons Rademakers
"Bring the KB to the PB, not the PB to the KB"
PROOF Tutorial
Outline
• What is PROOF: basic principles
• How it is implemented
• Setting up PROOF on a cluster
• PROOF on the Grid
• Running a PROOF session
• Exciting new hardware
• Conclusions
• Demo
PROOF Original Design Goals
• Interactive parallel analysis on a local cluster
• Transparency
  • Same selectors, same chain Draw(), etc. in PROOF as in a local session
• Scalability
  • Quite good and well understood up to 1000 nodes (the most extreme case)
  • Extensive monitoring capabilities
  • MLM (Multi-Level Master) improves scalability on wide-area clusters
• Adaptability
  • Partly achieved: the system handles varying load on cluster nodes
  • MLM allows much better latencies on wide-area clusters
  • No support yet for worker nodes joining and leaving during a session
PROOF Parallel Execution
• Diagram: a local PC running ROOT connects to a remote PROOF cluster; the master server (proof) coordinates slave servers on node1..node4, each reading its local *.root files via TFile/TNetFile; stdout and output objects return to the client
• The cluster is described in proof.conf:
  slave node1
  slave node2
  slave node3
  slave node4
• Client session:
  $ root
  root [0] tree->Process("ana.C")          // local processing
  root [1] gROOT->Proof("remote")          // connect to the PROOF cluster
  root [2] chain->Process("ana.C")         // same selector, now in parallel
PROOF Error Handling
• Handling death of PROOF servers
  • Death of the master: fatal, the client needs to reconnect
  • Death of a slave: the master can resubmit the dead slave's packets to other slaves
• Handling of Ctrl-C
  • An OOB message is sent to the master and forwarded to the slaves, causing a soft or hard interrupt
Authentication and Authorization
• Uses the (x)rootd authentication plugins
• Certificates (login and user name)
  • Single experiment-wide login
  • User name used for the sandbox
• Authorization to the sandbox and the shared global space
  • Not to other users' sandboxes under the same account
PROOF New Features
• Support for "interactive batch" mode
  • Allows submission of long-running queries
  • Allows client/master disconnect and reconnect
• Powerful, friendly and complete GUI
• Works in Grid environments
  • Startup of agents via the Grid job scheduler
  • Agents calling out to the master (firewalls, NAT)
  • Dynamic master-worker setup
PROOF Scalability
• Data set: 8.8 GB in 128 files, 9 million events
• 1 node: 325 s; 32 nodes in parallel: 12 s
• 32 nodes: dual Itanium II 1 GHz CPUs, 2 GB RAM, 2x75 GB 15K SCSI disks, 1 Fast Ethernet NIC, 1 Gb Ethernet NIC (not used)
• Each node holds one copy of its part of the data set (4 files, 277 MB total)
Architecture and Implementation
TSelector – The User Code
• Basic ROOT TSelector:

  // Abbreviated version
  class TSelector : public TObject {
  protected:
     TList *fInput;
     TList *fOutput;
  public:
     void   Init(TTree *tree);
     void   Begin(TTree *tree);
     void   SlaveBegin(TTree *tree);
     Bool_t Process(Long64_t entry);
     void   SlaveTerminate();
     void   Terminate();
  };
Interactive Analysis – Selectors (1)
• Create a selector:
  TFile *fESD = TFile::Open("AliESDs.root");
  TTree *tESD = (TTree *) fESD->Get("esdTree");
  tESD->MakeSelector();
• Modify the generated selector accordingly.
TChain – The Data Set
• A TChain is a collection of TTrees:
  root [0] TChain *c = new TChain("esd");
  root [1] c->Add("root://rcrs4001/a.root");
  ...
  root [10] c->Print("a");
  root [11] c->Process("mySelector.C", "", nentries, first);
• Returned by a DB or File Catalog query, etc.
• Use logical file names ("lfn:...")
Running a PROOF Session
Running PROOF

  TGrid *alien = TGrid::Connect("alien");
  TGridResult *res;
  res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");
  TChain *chain = new TChain("AOD");
  chain->Add(res);
  gROOT->Proof("master");
  chain->Process("myselector.C");
  // plot/save objects produced in myselector.C
  ...
New PROOF GUI
Conclusions
People Working on PROOF
• Maarten Ballintijn
• Bertrand Bellenot
• Gerri Ganis
• Jan Iwaszkiewicz
• Guenter Kickinger
• Andreas Peters
• Fons Rademakers
Conclusions
• The PROOF system on local clusters provides efficient parallel performance on up to O(1000) nodes
• Combined with Grid middleware it becomes a powerful environment for "interactive" parallel analysis of globally distributed data
PROOF for ALICE
• CAF = CERN Analysis Facility
  • A cluster running PROOF
• The CAF will be used for prompt processing of data
  • pp: analysis
  • PbPb: calibration & alignment, pilot analysis
CAF Test System
• 40 machines (each with 2 CPUs and 200 GB of disk)
• Disks organized as an 8 TB xrootd pool
• Goal: up to 500 CPUs
• Next months: intensive testing
  • Usability
  • Performance
CAF Schema
• Diagram: data from the experiment disk buffer is exported to Tier-1 and tape storage; a moderated subset is staged onto the local disks of the PROOF nodes in the CAF computing cluster
BACKUP
ROOT in a Nutshell
• An efficient data storage and access system designed to support structured data sets in very large distributed databases (petabytes)
• A query system to extract information from these distributed data sets
• The query system can transparently use parallel systems on the Grid (PROOF)
• A scientific visualization system with 2D and 3D graphics
• An advanced graphical user interface
• A C++ interpreter allowing calls to user-defined classes
• An Open Source project (LGPL)
ROOT - An Open Source Project
• The project is developed as a collaboration between:
  • Full-time developers:
    • 10 people full time at CERN
    • 2 developers at FermiLab
    • 1 key developer in Japan
    • 2 key developers at MIT
  • Many contributors (> 50) spending a substantial fraction of their time in specific areas
  • Key developers in large experiments using ROOT as a framework
  • Several thousand users giving feedback and a very long list of small contributions, comments and bug fixes
The ROOT Web Pages: http://root.cern.ch
• General information and news
• Download source and binaries
• HowTo's & tutorials
• User Guide & Reference Guide
• Roottalk mailing list & forum
Data Access Strategies
• Each slave is assigned, as much as possible, packets representing data in local files
• If no (more) local data is left, it gets remote data via (x)rootd (needs a good LAN, e.g. Gigabit Ethernet)
• In the case of SAN/NAS, just use a round-robin strategy
Workflow for Tree Analysis – Pull Architecture
• The master initializes the slaves and starts the packet generator
• Each slave repeatedly calls GetNextPacket() and processes the returned entry range (e.g. 0,100 then 200,100 ...)
• Packet sizes vary (100, 40, 50, 60 entries in the example), so faster slaves come back more often and get more work: dynamic load balancing
• When the data set is exhausted, each slave sends its output back via SendObject(histo); the master adds the histograms, displays them, and waits for the next command
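The pull loop above can be sketched as a short piece of plain C++. This is a minimal sketch of the pattern, not the PROOF implementation (PacketGenerator and RunWorker are hypothetical names): the master hands out entry ranges on demand, and a worker simply pulls until nothing is left, so the load balances itself.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// One packet: a [first, first+n) range of tree entries.
struct Packet { long first; long n; };

// Master side: hands out fixed-size packets until the data set is exhausted.
class PacketGenerator {
public:
    PacketGenerator(long total, long size) : fNext(0), fTotal(total), fSize(size) {}

    // The analogue of GetNextPacket(): empty optional means "no more work".
    std::optional<Packet> GetNextPacket() {
        if (fNext >= fTotal) return std::nullopt;
        long n = std::min(fSize, fTotal - fNext);
        Packet p{fNext, n};
        fNext += n;
        return p;
    }

private:
    long fNext, fTotal, fSize;
};

// Worker side: pull packets until the master says there is nothing left.
// Faster workers simply ask again sooner -- that is the load balancing.
long RunWorker(PacketGenerator &gen) {
    long processed = 0;
    while (auto p = gen.GetNextPacket()) {
        processed += p->n;   // stand-in for running Process() over the range
    }
    return processed;
}
```

The key design point is that workers never receive a fixed share up front; work is granted one packet at a time, so a slow node holds up at most one packet.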
Interactive/Batch Queries
• Commands and scripts: stateful
• Batch: stateless
• GUI: stateful or stateless
Analysis Session Example
• Monday at 10h15, ROOT session on my laptop:
  • AQ1: 1 s query, produces a local histogram
  • AQ2: a 10 min query submitted to PROOF1
  • AQ3-AQ7: short queries
  • AQ8: a 10 h query submitted to PROOF2
• Monday at 16h25, ROOT session on my laptop:
  • BQ1: browse results of AQ2
  • BQ2: browse temporary results of AQ8
  • BQ3-BQ6: submit four 10 min queries to PROOF1
• Wednesday at 8h40, Carrot session on any web browser:
  • CQ1: browse results of AQ8, BQ3-BQ6
Sandbox – The Cache
• Minimizes the number of file transfers
• One cache per file space
• Locking to guarantee consistency
• File identity and integrity ensured using
  • MD5 digest
  • Time stamps
• Transparent via TProof::SendFile()
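The identity check above can be illustrated with a small self-contained sketch. The real cache uses an MD5 digest plus time stamps; as a simplified stand-in this sketch records file size and modification time (CacheStamp, StampOf, CacheValid are illustrative names, not the PROOF API): a cached copy is reused only while the file still matches the recorded stamp.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Identity stamp for a cached file. The real PROOF cache uses an MD5
// digest and time stamps; size + mtime is a simplified stand-in here.
struct CacheStamp {
    std::uintmax_t size;
    fs::file_time_type mtime;
};

// Record the current identity of a file.
CacheStamp StampOf(const fs::path &p) {
    return CacheStamp{fs::file_size(p), fs::last_write_time(p)};
}

// A cached copy is valid only while the file still matches the stamp
// recorded at transfer time; otherwise it must be transferred again.
bool CacheValid(const fs::path &p, const CacheStamp &recorded) {
    CacheStamp now = StampOf(p);
    return now.size == recorded.size && now.mtime == recorded.mtime;
}
```

A fresh stamp is taken whenever a file is (re)transferred, so an edited ana.C on the client is detected and re-sent while an unchanged one costs nothing.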
Sandbox – Package Manager
• Provides a collection of files in the sandbox
• Binary or source packages
• PAR files: PROOF ARchive, like a Java jar
  • A tar file with a ROOT-INF directory
  • BUILD.sh to build the package
  • SETUP.C for per-slave setup
• API to manage and activate packages
Merge API
• Output lists are collected in the master server
• Objects are identified by name
• Partial results are combined via the member function Merge(TCollection *)
  • Executed via CINT, so no inheritance is required
• Standard implementations exist for histograms and (in-memory) trees
• Otherwise the individual objects are returned
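The merge-by-name step can be sketched in plain C++. This is a simplified stand-in, not the ROOT implementation (Histo and MergeOutputs are hypothetical names; the real system matches TObjects by name and invokes their Merge(TCollection *) via CINT): partial results arriving from the workers are matched by object name and combined into one final object each.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A stand-in for a named partial result (think of the bin contents
// of a histogram filled by one worker).
struct Histo {
    std::string name;
    std::vector<double> bins;

    // Combine another partial result into this one, bin by bin --
    // the analogue of a histogram's Merge() on the master.
    void Merge(const Histo &other) {
        for (std::size_t i = 0; i < bins.size() && i < other.bins.size(); ++i)
            bins[i] += other.bins[i];
    }
};

// Master-side merge: partial results from all workers are matched by
// object name; the first occurrence seeds the entry, later ones merge in.
std::map<std::string, Histo> MergeOutputs(const std::vector<Histo> &partials) {
    std::map<std::string, Histo> merged;
    for (const Histo &h : partials) {
        auto it = merged.find(h.name);
        if (it == merged.end()) merged[h.name] = h;
        else it->second.Merge(h);
    }
    return merged;
}
```

Matching by name is what lets the user's selector stay oblivious to parallelism: each worker fills its own "hpt", and the client still sees exactly one "hpt".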
TPacketizer – The Work Distributor
• The packetizer is the heart of the system
• It runs on the master and hands out work to the workers
• Different packetizers allow for different data access policies:
  • All data on disk, network access allowed
  • All data on disk, no network access
  • Data on mass storage: go file by file
  • Data on the Grid: distribute per Storage Element
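A locality-aware hand-out policy, as used when local data is preferred over network access, can be sketched as follows. This is an illustrative sketch, not TPacketizer itself (Packet and NextPacketFor are hypothetical names): a requesting worker first gets packets whose files live on its own node, and only falls back to remote packets, which it would then read via (x)rootd over the LAN, when no local work remains.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A packet tagged with the node that holds the file it comes from.
struct Packet {
    std::string host;   // node where the file resides
    long first, n;      // entry range [first, first+n)
    bool taken = false;
};

// Locality-aware hand-out: prefer a packet whose file is local to the
// requesting worker; otherwise fall back to the first untaken remote
// packet. Returns the index of the chosen packet, or -1 when done.
int NextPacketFor(std::vector<Packet> &packets, const std::string &worker) {
    int remote = -1;
    for (std::size_t i = 0; i < packets.size(); ++i) {
        if (packets[i].taken) continue;
        if (packets[i].host == worker) {   // local data: take it first
            packets[i].taken = true;
            return static_cast<int>(i);
        }
        if (remote < 0) remote = static_cast<int>(i);
    }
    if (remote >= 0) packets[remote].taken = true;  // remote fallback
    return remote;
}
```

Swapping this policy for a stricter one (return -1 instead of the remote fallback) gives the "no network access" variant listed above; the rest of the system is unchanged.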
Interactive Analysis with PROOF on a Large Cluster
• The user session connects through Grid service interfaces (TGrid UI/queue UI), with Grid/ROOT authentication and a Grid access-control service
• The client retrieves the list of logical files (LFN + MSN) from the Grid file/metadata catalogue
• A proofd startup Grid service launches the PROOF master and sub-master servers; sub-masters call out to the master (agent technology), guaranteeing site access
• The slave servers access data via xrootd from local disk pools
Exciting New Hardware
Multi-Core CPUs
• Multi-core CPUs are the next step to stay on Moore's performance curve
• AMD is currently the price/performance leader
  • Dual-core Athlon64 X2
  • Dual-core Opteron
  • Quad-core Opteron later this year
• Intel is going the same way, following AMD, and is now shipping dual-core Pentium-M laptop CPUs
Multi-Core CPUs: They Deliver
• We recently acquired a dual-core Athlon AMD64 X2 4800+
• One of the fastest machines ROOT has been compiled on (8 min with a parallel compile), and surely the cheapest machine ever to deliver that performance
• Two CPUs for the price and power consumption of one; forget about traditional SMP machines
  • But not about SMP multi-core machines ;-)
TB Disks
• The current maximum for 3.5" disks is 500 GB
• By the end of next year, 1 TB 3.5" disks are within reach
• This means we will pass the break-even point where it becomes cheaper to keep all the data on disk instead of on tape
Sandbox – The Environment
• Each slave runs in its own sandbox
  • Identical, but independent
• Multiple file spaces in a PROOF setup
  • Shared via NFS or AFS, or shared-nothing
• File transfers are minimized via
  • The cache
  • Packages
Setting Up PROOF
Setting Up PROOF
• Install the ROOT system
• Add proofd and rootd to /etc/services
  • The rootd (1094) and proofd (1093) port numbers have been officially assigned by IANA
• Start the rootd and proofd daemons (put them in /etc/rc.d for startup at boot time)
  • Use the scripts provided in $ROOTSYS/etc
• Set up the proof.conf file describing the cluster
• Set up the authentication files (globally; users can override)
PROOF Configuration File

  # PROOF config file. It has a very simple format:
  #
  #   node  <hostname> [image=<imagename>]
  #   slave <hostname> [perf=<perfindex>]
  #         [image=<imagename>] [port=<portnumber>]
  #         [srp | krb5]
  #   user  <username> on <hostname>

  node  csc02 image=nfs
  slave csc03 image=nfs
  slave csc04 image=nfs
  slave csc05 image=nfs
  slave csc06 image=nfs
  slave csc07 image=nfs
  slave csc08 image=nfs
  slave csc09 image=nfs
  slave csc10 image=nfs
PROOF on the Grid
PROOF Grid Interfacing
• Grid file catalog
• Data set creation
  • Meta data: #events, time, run, etc.
  • Event catalog
• PROOF agent creation
  • Agents call out (no incoming connections needed)
• Config file generation / zero config
• Grid-aware packetizer
• Scheduled execution
• Limiting processing to a specific part of the data set
TGrid – Abstract Grid Interface

  class TGrid : public TObject {
  public:
     virtual Int_t        AddFile(const char *lfn, const char *pfn) = 0;
     virtual Int_t        DeleteFile(const char *lfn) = 0;
     virtual TGridResult *GetPhysicalFileNames(const char *lfn) = 0;
     virtual Int_t        AddAttribute(const char *lfn,
                                       const char *attrname,
                                       const char *attrval) = 0;
     virtual Int_t        DeleteAttribute(const char *lfn,
                                          const char *attrname) = 0;
     virtual TGridResult *GetAttributes(const char *lfn) = 0;
     virtual void         Close(Option_t *option = "") = 0;
     virtual TGridResult *Query(const char *query) = 0;

     static TGrid *Connect(const char *grid, const char *uid = 0,
                           const char *pw = 0);

     ClassDef(TGrid,0)  // ABC defining interface to GRID services
  };
AliEn Grid Interface
• Diagram: the abstract Grid base classes (TGrid) are implemented by Grid plugin classes (TAlien) that talk to the AliEn API service for authentication, the job interface, the catalogue interface, and file listing/location/query
• TGrid::Connect("alien://...") selects the TAlien plugin
• TFile::Open("alien://...") uses the TAlienFile plugin implementation for the "alien://" protocol; it uses TAlien methods to translate "alien://" URLs into "root://" (or other) URLs, served by xrootd via TXNetFile