
PROOF - Parallel ROOT Facility


Presentation Transcript


  1. PROOF - Parallel ROOT Facility Maarten Ballintijn http://root.cern.ch Bring the KB to the PB, not the PB to the KB

  2. Agenda • Architecture • TSelector • TDSet • Environment • Implementation • TProofPlayer • TPacketizer • Dynamic Binning Histograms • Merge API

  3. TSelector – The algorithms • Basic ROOT TSelector + small changes

     // Abbreviated version
     class TSelector : public TObject {
     protected:
        TList *fInput;
        TList *fOutput;
     public:
        void   Init(TTree*);
        void   Begin(TTree*);
        Bool_t Process(int entry);
        void   Terminate();
     };
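  A minimal user selector, following the abbreviated interface above, could look like the sketch below. The class name MySelector and the histogram are illustrative, not taken from the slides.

     // Illustrative user selector (names are hypothetical)
     class MySelector : public TSelector {
     private:
        TTree *fTree;   // tree assigned to this slave
        TH1F  *fHist;   // partial result, shipped back via fOutput

     public:
        void Init(TTree *tree) { fTree = tree; }

        void Begin(TTree*) {
           // book partial results and register them for the merge
           fHist = new TH1F("pt", "track pt", 100, 0., 10.);
           fOutput->Add(fHist);
        }

        Bool_t Process(int entry) {
           fTree->GetEntry(entry);
           // fill fHist from the entry ...
           return kTRUE;
        }

        void Terminate() {
           // runs after the merge; fit or draw the combined histogram
        }
     };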

  4. TDSet – The data • Specify a collection of TTrees or files with objects

     [] TDSet *d = new TDSet("TTree", "tracks", "/");
     [] TDSet *d = new TDSet("TEvent", "", "/objs");

     [] d->Add("root://rcrs4001/a.root", "tracks", "dir", first, num);
     …
     [] d->Print("a");
     [] d->Process("mySelector.C", nentries, first);

   • Returned by DB or File Catalog query etc.
   • Use logical filenames ("lfn:…")
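  Put together with a selector, a complete session might look like the following sketch; the master host name and the connection call are assumptions, only the TDSet calls come from the slide above.

     // Illustrative session (host name is a placeholder)
     TProof *proof = new TProof("proofmaster.cern.ch");   // connect to the master

     TDSet *d = new TDSet("TTree", "tracks", "/");
     d->Add("root://rcrs4001/a.root", "tracks", "dir");   // add files to the set

     d->Process("mySelector.C");   // run the selector over all entries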

  5. Sandbox – The Environment • Each slave runs in its own sandbox • Identical, but independent • Multiple file spaces in a PROOF setup. • Shared via NFS, AFS, multi CPU node • File transfers are minimized • Cache • Packages

  6. Sandbox – The Cache • Minimize the number of file transfers • One cache per file space • Locking to guarantee consistency • File identity and integrity ensured using • MD5 digest • Time stamps • Transparent via TProof::SendFile()
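  The identity check can be pictured with ROOT's TMD5 class; the helper below is a sketch of the idea only, and assumes the master ships its digest string along with the file. It is not the actual cache code.

     // Sketch: is the cached copy still the file the master sent?
     #include "TMD5.h"
     #include <cstring>

     Bool_t CacheIsCurrent(const char *cachedFile, const char *masterDigest)
     {
        TMD5 *local = TMD5::FileChecksum(cachedFile);   // MD5 of the local copy
        Bool_t same = (local != 0 && strcmp(local->AsString(), masterDigest) == 0);
        delete local;
        return same;
     }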

  7. Sandbox – Package Manager • Provides a collection of files in the sandbox • Binary or source packages • PAR files: Proof ARchive, like a Java jar • Tar file with a ROOT-INF directory • BUILD.C or BUILD.sh • SETUP.C, per-slave setup • API to manage and activate packages
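  As an illustration, a PAR file for a package called "event" (the name is made up) is simply a tar file with a ROOT-INF directory; the two calls after the layout show how such a package would be uploaded and activated, with the exact call names being an assumption rather than a quote from the slides.

     event.par (tar file)
       event/
         ROOT-INF/
           BUILD.sh        builds the package on each slave
           SETUP.C         per-slave setup, e.g. load the library
         src/, include/    the package sources or binaries

     // Illustrative package API usage (call names are assumed),
     // given an open connection such as TProof *proof
     proof->UploadPackage("event.par");   // copy the PAR file into each file space
     proof->EnablePackage("event");       // run BUILD and SETUP on every slave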

  8. Implementation Highlights • TProofPlayer class hierarchy • Basic API to process events in PROOF • Implements the event loop • Implements the proxy for remote execution • TEventIter • Access to TTree or TObject-derived collections • Caches file, directory, and tree
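  The split between the local event loop and the remote proxy can be sketched as follows, using the abbreviated class names from the diagram on the next slide; this is an illustration of the structure, not the actual declarations.

     // Sketch of the player hierarchy (illustrative, not the real headers)
     class TProofPlayer : public TObject {
     public:
        // common API: process a data set with a selector
        virtual Int_t Process(TDSet *set, const char *selector,
                              Int_t nentries, Int_t first) = 0;
     };

     // Runs the event loop locally, stepping through the entries
     // or objects with a TEventIter.
     class TPPSlave : public TProofPlayer { /* ... */ };

     // Proxy: forwards Process() to the remote servers and
     // collects the results they return.
     class TPPRemote : public TProofPlayer { /* ... */ };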

  9. TProofPlayer [Diagram: the Client runs TProof with a TPPRemote player; the Master runs TProofServ plus TProof with its own TPPRemote; each Slave runs TProofServ with a TPPSlave player]

  10. Simplified Message Flow [Diagram: the Client sends SendFile and Process(dset,sel,inp,num,first) to the Master; the Master sends SendFile, GetEntries and Process(dset,sel,inp,num,first) to the Slave(s); the Slave(s) request work with GetPacket and send back ReturnResults(out,log), which the Master forwards to the Client as ReturnResults(out,log)]

  11. Dynamic Histogram Binning • Implemented by extending THLimitsFinder • Avoids synchronization between slaves • Keep a score-board in the master • Use the histogram name as key • The first slave posts the limits • Others use these values
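  The score-board can be pictured as a small map on the master, keyed by histogram name; the sketch below is conceptual only and does not reflect the actual THLimitsFinder extension.

     // Conceptual score-board sketch (not the actual implementation)
     #include <map>
     #include <string>

     struct HistLimits { double xmin; double xmax; int nbins; };

     class LimitsScoreBoard {
        std::map<std::string, HistLimits> fLimits;   // keyed by histogram name
     public:
        // The first slave to report a histogram posts its limits;
        // later slaves get the posted values back, so every partial
        // histogram ends up with identical binning and merges cleanly.
        HistLimits GetOrPost(const std::string &name, const HistLimits &proposed) {
           std::map<std::string, HistLimits>::iterator it = fLimits.find(name);
           if (it == fLimits.end()) {
              fLimits[name] = proposed;   // first poster wins
              return proposed;
           }
           return it->second;
        }
     };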

  12. Merge API • Collect output lists in the master server • Objects are identified by name • Combine partial results • Member function: Merge(TCollection*) • Executed via CINT, no inheritance required • Standard implementation for histograms • Otherwise return the individual objects
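  For a user class the hook is simply a Merge(TCollection*) member; a sketch of such a class is given below. MyCounter is hypothetical, and the return type is a guess since the call is made through CINT by name rather than through a base-class interface.

     // Illustrative mergeable class (name and details are hypothetical)
     #include "TObject.h"
     #include "TCollection.h"

     class MyCounter : public TObject {
     public:
        Int_t fCount;

        MyCounter() : fCount(0) {}

        // Called on the master for the object of the same name in each
        // partial output list; folds the partial counts into this object.
        Int_t Merge(TCollection *list) {
           TIter next(list);
           while (TObject *obj = next()) {
              MyCounter *c = dynamic_cast<MyCounter*>(obj);
              if (c) fCount += c->fCount;
           }
           return list->GetSize();
        }
     };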

  13. Near Future • A few more weeks of testing in Phobos • Beta test with a few other experiments • Basic documentation • Install and Configure guide • User HowTo • First release in the next major ROOT release

  14. Future • Ongoing development • Event lists • Friend Tree • Scalability to O(100) nodes • Multi site PROOF sessions • The GRID

  15. Demo! • The H1 example analysis code • Use the output list for histograms • Move fitting to the client • 15-fold H1 example dataset at CERN • 4.1 GByte • 4.3 million events • 4-fold H1 example dataset at MIT

  16. Demo! • Client machine • PIII @ 1 GHz / 512 MB • Standard IDE disk • Cluster with 15 nodes at CERN • Dual PIII @ 800 MHz / 384 MB • Standard IDE disk • Cluster with 4 nodes at MIT • Dual Athlon MP @ 1.4 GHz / 1 GB • Standard IDE disk
