
BU SciDAC Meeting

Balint Joo Jefferson Lab. BU SciDAC Meeting. Anisotropic Clover. Why do it ? Anisotropy -> Fine Temporal Lattice Spacing at moderate cost Combine with Group Theoretical Baryon Operators -> Access to Excited States Nice preliminary results – with just Wilson Excited states


Presentation Transcript


  1. Balint Joo Jefferson Lab BU SciDAC Meeting

  2. Anisotropic Clover • Why do it? • Anisotropy -> Fine Temporal Lattice Spacing at moderate cost • Combine with Group Theoretical Baryon Operators -> Access to Excited States • Nice preliminary results – with just Wilson • Excited states • States with spin 5/2+ http://arxiv.org/pdf/hep-lat/0601029 http://arxiv.org/pdf/hep-lat/0609052

  3. Anisotropic Clover • Why do it? • Part of Jlab 3-prong Lattice QCD programme • Prong 1: Dynamical Anisotropic Clover • Prong 2: DWF on a staggered sea (MILC Configs) • Prong 3: Large Scale Dynamical DWF • This programme was specially commended by the DOE at our recent Science and Technology Review • Anisotropic Clover is a major part of the INCITE proposal (for XT3 and BG/? machines)

  4. Anisotropic Clover • Level 2 • Clover Term and Inverse & Force Term • Wired into Chroma -> Provides HMC/RHMC • Our Choice of Gauge Action: • Plaquette + Rectangle + Adjoint Term • Fermion Action • Anisotropic Clover + Stout Smearing • Stout Force Recursion • Usual Barrage of DF techniques • Hasenbusch + Chronology for 2 flavours • RHMC for the +1 flavour • Multi time scale integrators
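
The "multi time scale integrators" bullet can be illustrated with a nested leapfrog scheme. This is a minimal sketch on a toy one-dimensional system, not Chroma's implementation: the function names, the harmonic forces and the step counts are all assumptions chosen for illustration. The force assumed to be expensive is applied with the coarse step, and the cheap but stiff force is applied with a finer step nested inside it.

```python
# Toy stand-ins (assumptions): a weak "expensive" force and a stiff "cheap" force.
K_SLOW = 1.0
K_FAST = 25.0


def slow_force(q):
    """Force that is assumed costly to evaluate; applied on the coarse step."""
    return -K_SLOW * q


def fast_force(q):
    """Force that is assumed cheap to evaluate; applied on the fine step."""
    return -K_FAST * q


def hamiltonian(q, p):
    """Total energy of the toy system, used to monitor the integration error."""
    return 0.5 * p * p + 0.5 * (K_SLOW + K_FAST) * q * q


def nested_leapfrog(q, p, tau, n_outer, n_inner):
    """Integrate dq/dt = p, dp/dt = slow_force + fast_force over time tau."""
    dt = tau / n_outer   # coarse step for the slow force
    h = dt / n_inner     # fine step for the fast force
    for _ in range(n_outer):
        p += 0.5 * dt * slow_force(q)       # half kick, slow force
        for _ in range(n_inner):
            p += 0.5 * h * fast_force(q)    # half kick, fast force
            q += h * p                      # drift
            p += 0.5 * h * fast_force(q)    # half kick, fast force
        p += 0.5 * dt * slow_force(q)       # half kick, slow force
    return q, p


if __name__ == "__main__":
    q0, p0 = 1.0, 0.0
    q1, p1 = nested_leapfrog(q0, p0, tau=1.0, n_outer=20, n_inner=5)
    print("energy change:", hamiltonian(q1, p1) - hamiltonian(q0, p0))
```

In this sketch the slow force is evaluated about 2 * n_outer times per trajectory while the fast force is evaluated n_inner times more often. The scheme is symmetric, so flipping the momentum and integrating again returns to the starting point up to rounding.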

  5. CG Inverter Performance • We only got 7.3 Tflops on 8K CPUs :( - but we didn't work much at all on optimization
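
To put the aggregate number in per-processor terms, the arithmetic can be sketched as below. The slide does not say whether "8K" means 8000 or 8192 CPUs, so both readings are shown as assumptions; the helper name is made up for this example.

```python
def per_cpu_mflops(total_tflops, n_cpus):
    """Convert an aggregate Tflops figure into Mflops per CPU."""
    return total_tflops * 1.0e6 / n_cpus


if __name__ == "__main__":
    # Both CPU counts are assumed readings of "8K" from the slide.
    for n_cpus in (8000, 8192):
        print(n_cpus, "CPUs:", round(per_cpu_mflops(7.3, n_cpus), 1), "Mflops per CPU")
```

Either reading gives roughly 0.9 Gflops sustained per CPU.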

  6. Clover Work Under SciDAC 2 • Performance is OK but want better... • Optimizations • Clover • SSE Optimizations for Clusters & XT3 • BAGEL terms for BG/??? • Multi Mass Inverter, Trace Terms • Would like to optimize the actual bottleneck • CG Inverter is not the current bottleneck • Help from our friends at RENCI in identifying the exact hotspots? (Right now we rely on gprof) • Algorithmic: Temporal Preconditioning (later)

  7. Thoughts at the back of my mind • Are we actually going to get any time at ORNL? • We asked for a lot • I think 20M CPU hours just for the clover stuff • Incite proposal was extremely hurried • We had to respond very quickly • Many small groups did not have (stand?) a chance • How much effort should we be investing? • Should we be focusing on BlueGene/? and clusters more?

  8. CRE and ILDG • Progress on CRE has been slow. Why? • Manpower reasons in SciDAC 1? • People are happily running production already without it? In which case is it just LOW VALUE? • where are the 'armies of new users' who need it? • What are the issues? • Intimately tied to infrastructure at each site. • site infrastructure leverages off experiments • different everywhere • High Maintenance • PBS, LoadLeveler, NSF? dcache anyone? • upgrade of mvapich, OpenMPI, IB fabric etc • Inherently non-portable (what about ANL/ORNL)

  9. CRE and ILDG • If it has low value, no user demand and is high maintenance and won't work outside our sites.... • is it worth doing? • can we just drop it ? PLEASE? • Anyway common environments are so passe and 90s. Nowadays we should think about 'interoperable grid environments' – they're IN!

  10. ILDG • Middleware Progressed • but still on eXist MDC • dumb RC: (just remap the LFN to a FNAL dcache name) • Issues: • Where is all the markup? • Eventually need more sophisticated RC? • Markup is NOT anisotropy aware (future fights in the MDWG – will take time) • working towards interoperability • Meeting at Jlab Dec 11-13. Can folks from BNL and FNAL come?
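
The "dumb RC" idea, a replica catalogue that only rewrites a logical file name (LFN) into a dcache name, can be sketched as a prefix substitution. Both prefixes below are hypothetical placeholders, not the real ILDG or FNAL naming scheme, and the function name is invented for this example.

```python
# Hypothetical prefixes, for illustration only.
LFN_PREFIX = "lfn://example-collab/"
DCACHE_PREFIX = "dcap://dcache.example.org/pnfs/lqcd/"


def resolve_lfn(lfn):
    """Map a logical file name to a storage name by swapping the prefix."""
    if not lfn.startswith(LFN_PREFIX):
        raise ValueError("not a recognised LFN: " + lfn)
    return DCACHE_PREFIX + lfn[len(LFN_PREFIX):]


if __name__ == "__main__":
    print(resolve_lfn(LFN_PREFIX + "ensemble1/cfg_100.lime"))
```

A catalogue like this holds no state and knows about one replica per file only, which is why the slide asks whether a more sophisticated RC will eventually be needed.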

  11. Testing and Release • Unit Testing vs. End to End Testing • Too much existing code • We intermix • QMP, QDP++, QIO, XpathReader, LIME, Chroma, Wilson Dslash or BAGEL Dslash, possibly BAGEL linear algebra, level 3 CG-DWF • Unit testing all of these is difficult • End to End Tests: Compare the final result • e.g. correlation functions • Lots of output – selective diffs? • QDP++ Uses XML, Selective Diffs through XMLDiff
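
A selective diff of XML output can be sketched as follows. This is not the XMLDiff tool's actual interface: the function name, the metric layout (a list of element paths with absolute tolerances) and the sample XML are assumptions for illustration. Only the listed elements are compared, so fields such as timings can differ freely between runs.

```python
import xml.etree.ElementTree as ET


def selective_diff(expected_xml, actual_xml, metric):
    """Compare only the elements named in metric.

    metric is a list of (element_path, absolute_tolerance) pairs; element
    text is read as whitespace-separated floats. Returns a list of failure
    messages, empty when everything agrees.
    """
    exp_root = ET.fromstring(expected_xml)
    act_root = ET.fromstring(actual_xml)
    failures = []
    for path, tol in metric:
        exp_nodes = exp_root.findall(path)
        act_nodes = act_root.findall(path)
        if not exp_nodes:
            failures.append(path + ": not present in expected output")
            continue
        if len(exp_nodes) != len(act_nodes):
            failures.append(path + ": element count differs")
            continue
        for i, (e, a) in enumerate(zip(exp_nodes, act_nodes)):
            ev = [float(x) for x in (e.text or "").split()]
            av = [float(x) for x in (a.text or "").split()]
            if len(ev) != len(av) or any(abs(x - y) > tol for x, y in zip(ev, av)):
                failures.append("%s[%d]: values differ beyond %g" % (path, i, tol))
    return failures


if __name__ == "__main__":
    expected = "<run><timing>12.3</timing><corr><re>1.0 0.5 0.25</re></corr></run>"
    actual = "<run><timing>99.9</timing><corr><re>1.0 0.5000001 0.25</re></corr></run>"
    print(selective_diff(expected, actual, [("corr/re", 1.0e-5)]))
```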

  12. Structure • Test Consists of • Executable, Input XML, Expected Output XML • Metric file to decide which bits of the Output we need to check • Runner – abstract away running • Trivial Runner (just re-echoes your commands) • MPIRUN runner (runs on 2 Jlab IB nodes) • prototype YOD runner (for XT3) • LoadLeveler runner (for BG/L) – yucky • Driver Scripts • run interactively (e.g. scalar targets) & check • submit jobs to a queue, check later (for queues)
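
The runner abstraction can be sketched as small classes that turn a bare command into whatever the target platform needs. The class names, the wrap method and the build_test_command helper are invented for this example and are not the framework's real code; the -i and -o arguments on the executable are likewise assumed for illustration.

```python
class TrivialRunner:
    """Echoes the command back unchanged (e.g. for scalar targets)."""

    def wrap(self, cmd):
        return list(cmd)


class MpirunRunner:
    """Prefixes the command with an mpirun launch line."""

    def __init__(self, n_procs):
        self.n_procs = n_procs

    def wrap(self, cmd):
        return ["mpirun", "-np", str(self.n_procs)] + list(cmd)


def build_test_command(runner, executable, input_xml, output_xml):
    """Build the full command line for one regression test."""
    # The -i/-o argument names are an assumption for this sketch.
    return runner.wrap([executable, "-i", input_xml, "-o", output_xml])


if __name__ == "__main__":
    for runner in (TrivialRunner(), MpirunRunner(2)):
        print(" ".join(build_test_command(runner, "./chroma", "in.xml", "out.xml")))
```

A driver script then needs to know only which runner to instantiate; a batch-queue runner would implement the same wrap method but emit a job submission instead of a direct launch.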

  13. What has testing taught us? • We run through this regression framework nightly: gcc3, gcc4, scalar, parscalar-ib • What runs fine with gcc3.x on RHEL won't necessarily run fine with gcc4.x on FC5 • Maintenance: • Keep up with compilers – identify problems • ICC – catastrophic error: can't allocate register (SSE inline) • VACPP (XLC) – 'Internal Compiler error: Please contact IBM representative' on templates • PGI: No inline assembler? intrinsics? • we really MUST focus on this issue • or will it be GCC 3.4.x forever (seems most stable so far)

  14. SciDAC Release Pages? • What's the actual problem here? • Jlab page has releases that live in the JLAB CVS • release directory previous versions (by vox populi) • We strive to keep the pages up to date • Not everyone uses Jlab CVS. Why? • do you prefer to run your own repository? • do you want to use Subversion? • do you think only sissies use version control? • Centralizing release management is bad • imagine if I had to be responsible for the release of a code that I myself could only pick up by web page? • Is it only John Kogut who is unhappy?

  15. A possible solution ... • ... to the problem which may or may not exist • A SourceForge-like setup (Gforge) • Provides Per Project • Web-Space, • Release Tarball Space • Source Code Management Modules (CVS & SVN) • May be able to 'proxy' for your own repo. • Mailing Lists, Bugtracker, Newsfeeds yadda yadda • Wiki-like authentication • Our new Sysadmins are installing this at JLAB • But all the effort is wasted if folks don't use it...
