
Issues in managing HEP Software Development in a distributed environment

Elizabeth Buckley-Geer, Fermilab. CHEP 2000, Padova, Italy. Contents: characterizing the problem; key issues and solutions from CDF/D0 Collider Run II; some thoughts on the development process; conclusions.


Presentation Transcript


  1. Issues in managing HEP Software Development in a distributed environment Elizabeth Buckley-Geer Fermilab CHEP 2000, Padova, Italy E. Buckley-Geer, CHEP 2000

  2. Contents • Characterizing the problem • Key issues and solutions from CDF/D0 Collider Run II • Some thoughts on the development process • Conclusions

  3. Characterizing the problem • Developer community of about 150 people (both collaborations) from North and South America, Europe, Asia, India, Russia • Widely varying quality of network connections between FNAL and remote locations • Widely varying abilities of groups to afford to purchase commercial tools

  4. Characterizing the problem • One common denominator since mid-1997: • Everyone can buy a cheap PC and run Linux on it • No more $10-20K workstations. Every member of the group can have a PC • They don’t want to rely on connecting to a central machine at FNAL to do code development • They want to make use of these PCs at their own location to do their code development • First release of CDF code for Linux was January 1998 – several years after the basic development environment was designed

  5. The situation during Run I (CDF - but similar for D0) • Highly centralized code development. • Could only realistically develop code on the central machine at FNAL (VMS cluster) – no distributed development was supported even on other VMS systems • Code was ported to run on IRIX and AIX but only frozen releases were available on these platforms • Frozen releases were distributed to remote sites as tar files or VMS save sets • Development version of the code was available to desktop VMS nodes at FNAL from 1993 onwards but code could not be committed to the repository from these machines

  6. Run I development tools • Code was mostly Fortran with some small amounts of C. About 50 packages. • Used proprietary VMS tools for version control and package building (CMS and MMS) • Used vendor compilers and debuggers. Only UNIX vendors who supported VMS extensions were considered. Luckily the list was sufficiently long! • No serious use of design tools – some early attempts at D0 but they didn’t survive • No tools to locate memory leaks due to the nature of the memory management packages in use – YBOS and ZEBRA

  7. Goals for Run II development environment – early 1996 • Obviously needed to migrate from VMS as a primary platform • Provide ability to do remote development – recognized as important even before the Linux revolution • Reduce the need for proprietary tools for base system • Handle move from Fortran to C++ • Identify useful software engineering tools

  8. Configuration Management Joint Project • Formed joint D0, CDF, FNAL Computing Division working group to study configuration management in early 1996 (see E248 for more on Run II joint projects) • Charge was to find and implement a common solution for CDF and D0 for software management • Version control • Package and release organization • Building packages • Distribution • Validation

  9. Configuration Management Joint Project • Group looked at existing tools in use in HEP and elsewhere • Chose • CVS for version control with customizations from Sloan Digital Sky Survey (SDSS) • SoftRelTools from BaBar for package organization and building • UPS/UPD from FNAL for product setup and distribution tools

  10. CVS • Run in client/server mode – adopted from SDSS • Repository on server + cvsuser pseudo account running a restricted shell CVSH that only allows cvs commands to be executed • Local and remote access are identical so users do not need to be on a FNAL computer to access repository – necessary condition for remote development
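
The restricted-shell idea on this slide can be illustrated with a minimal Python sketch. It is not the CVSH implementation, whose details are not given in the talk; the function name `is_allowed`, the rejected character set, and the rule "first word must be exactly cvs" are all assumptions made for illustration.

```python
import shlex

# Assumption: the only program the pseudo account may run is "cvs".
ALLOWED_COMMANDS = {"cvs"}

# Assumption: any shell metacharacter is refused outright, so a request
# cannot chain, pipe, redirect, or substitute a second command.
FORBIDDEN_CHARS = ";|&<>`$\n"


def is_allowed(command_line: str) -> bool:
    """Return True only for a single, plain cvs invocation."""
    if any(ch in command_line for ch in FORBIDDEN_CHARS):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess what was meant.
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS
```

A wrapper of this kind would run the request only when `is_allowed` returns True. Because the check is the same for every connection, local and remote users are treated identically, which is the property the slide emphasizes.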

  11. SoftRelTools (SRT) • Adapted from BaBar experiment • Uses cpp to create dependencies and gmake to build libraries & binaries • BaBar and FNAL agreed to diverge on development • It was becoming difficult to add new features given the original structure of the package • Have since done a re-write (Spring 1999) of the package at FNAL to make it more maintainable
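
To show what "create dependencies" means in practice, here is a small Python sketch that follows quoted `#include` lines and emits a make-style rule. It is a stand-in for the idea only: SRT uses the real C preprocessor, and the function names and the in-memory `sources` dictionary are assumptions for illustration.

```python
import re

# Matches quoted includes only; system headers in <...> are ignored here.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"')


def direct_includes(source_text):
    """Return the quoted headers included directly by one file."""
    headers = []
    for line in source_text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            headers.append(match.group(1))
    return headers


def make_rule(target, source_name, sources):
    """Build a 'target: prerequisites' line, following includes transitively.

    `sources` maps a file name to its text (hypothetical in-memory file set).
    """
    seen = []
    stack = [source_name]
    while stack:
        name = stack.pop()
        for header in direct_includes(sources.get(name, "")):
            if header not in seen:
                seen.append(header)
                stack.append(header)
    return f"{target}: {' '.join([source_name] + sorted(seen))}"
```

With rules of this form, make rebuilds an object file whenever any header it depends on changes, directly or indirectly.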

  12. UPS – Unix Product Setup • FNAL product in use since 1991 • Supports existence of multiple versions of a product. Choice is made using a ‘setup’ command. • Re-write for Run II • Completed in summer 1998 • In use by both CDF and D0
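
The notion of several installed versions with one chosen at setup time can be modelled with a toy Python registry. This is not the UPS code or its command-line interface; the class, its method names, the `<PRODUCT>_DIR` variable, and the example product are assumptions for illustration.

```python
class ProductRegistry:
    """Toy model: many versions of a product coexist, one is selected."""

    def __init__(self):
        self._versions = {}  # product -> {version: install directory}
        self._current = {}   # product -> default version

    def declare(self, product, version, install_dir, current=False):
        """Register an installed version; optionally mark it as the default."""
        self._versions.setdefault(product, {})[version] = install_dir
        if current or product not in self._current:
            self._current[product] = version

    def setup(self, product, version=None, env=None):
        """Return a copy of `env` pointing at the chosen version.

        Raises KeyError if the product or version was never declared.
        """
        new_env = dict(env or {})
        chosen = version if version is not None else self._current[product]
        new_env[f"{product.upper()}_DIR"] = self._versions[product][chosen]
        return new_env
```

Returning a new environment rather than modifying a global one mirrors the point of the slide: two users on the same machine can select different versions of the same product without interfering with each other.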

  13. Use of these tools at CDF • ~ 65 code developers • 1.3 million lines of code • 71% C++, 20% Fortran, 8% C, 0.6% Java + external packages • 144 packages • Development release built every night on IRIX, TRU64, SUN, Linux • Daily build logs scanned for errors and reported to developers. Build logs are posted on web • Development builds lead to timely detection and fixing of bugs • Create frozen releases about every 2 months. Also create releases to capture code used for certain milestones.
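
The nightly log scan described on this slide might look roughly like the Python sketch below. The actual CDF scripts are not shown in the talk; the `<<< building NAME >>>` marker, the error pattern, and the function name are assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumption: any line containing the word "error" counts as a failure.
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)

# Assumption: the build prints this hypothetical marker before each package.
PACKAGE_RE = re.compile(r"^<<< building (\S+) >>>$")


def errors_by_package(log_text):
    """Group error lines from one build log by the package being built."""
    errors = defaultdict(list)
    package = "unknown"
    for line in log_text.splitlines():
        marker = PACKAGE_RE.match(line)
        if marker:
            package = marker.group(1)
        elif ERROR_RE.search(line):
            errors[package].append(line.strip())
    return dict(errors)
```

Grouping by package makes it straightforward to send each error list to the people responsible for that package instead of mailing the whole log to everyone.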

  14. Use of these tools at CDF • Success of development rebuild varies. Somewhat correlated with number of files changed

  15. Use of these tools at D0 • ~60 code developers have write access to repository • Essentially 100% C++ except for external packages • 280 packages – but big variation in size • Test release of entire package weekly on IRIX and Linux. Goal is to have an operational reconstruction executable at the end of every release. Currently 80% success rate. • Production releases occur at intervals determined by the management. Used to capture important milestones and provide stable working versions. • 5 production releases to date

  16. Code Distribution • CDF has a set of custom scripts to distribute code to remote sites. • Both frozen releases and development releases are distributed • Fairly straightforward to get distribution. • Currently fairly manpower intensive for development release on remote nodes – ½ FTE devoted to fixing problems • Working on switching to UPD for ease of maintenance • No significant automatic code distribution happening in D0 yet

  17. Code Distribution • Majority of distribution is to Linux machines

  18. Compilers • We wanted to write code that adhered to the C++ ANSI standard – not get into the Fortran extensions quagmire! • GCC and vendor compilers were not thought sufficiently compliant in summer 1997 • Chose KAI compiler from Kuck and Associates • Compiler was available on the relevant platforms – including Linux • Has led to issues with availability of KAI versions of external products that must be built with the CDF/D0 software – e.g. we paid for a port of Open Inventor • We still believe it was the right choice at the time but expect to use EGCS and vendor compilers in the future

  19. Debuggers and other tools • Quality of the debugging tools has left a lot to be desired • This was one of the few downsides of choosing KAI. Things have been particularly problematic on Linux • Have purchased TotalView which is in use on IRIX and will shortly be available for Linux – seems to improve the situation • CASE tools – used GDPro and Rational Rose • Mostly used to document design – did not use automatic code generation features • Purify and Insure++ used to look for memory leaks – but not currently available for Linux

  20. Licensed products • Has been very beneficial to negotiate license agreements that cover use of a product by all Run II developers independent of their location • Have done this with KAI, Open Inventor • Get better price - all licenses must be ordered through Fermilab

  21. Thoughts on the development process • Borrowing from the terminology and observations presented in “The Cathedral and the Bazaar” by Eric Raymond – O’Reilly Books • Our code is clearly Open Source because (by and large) it is freely available to anyone who wants to use it from another experiment • However, both CDF and D0 software projects are run using the traditional “cathedral” style of software development • This is necessitated by the requirements to provide schedules, obtain manpower resources from a limited pool, meet milestones and convince review committees that you know what you are doing • We can make some comparisons between aspects of the Open Source (aka Linux) model and what we are doing in HEP

  22. Thoughts on the development process • “Treat your users as co-developers” • Two user communities in an experiment • Those working on the software project – programmers and physicists • The rest of the experiment – the physicist-user • The first group tends to be like the Linux community – working on the project because they are interested in the problem and want to improve the product • The second group just wants to use the software to get physics results – they want to improve their physics analysis software but not the infrastructure

  23. Thoughts on the development process • “Release early, release often” • CDF has shown that this leads to more timely bug fixes and shorter integration time and is very desirable for the project developers • However, it drives the physicist-user to distraction because he/she just wants something that works! • Have to have stable frozen releases in addition

  24. Thoughts on the development process • Some of the skills necessary to co-ordinate a successful Open Source project are relevant to managing an HEP computing project • Must have good people and communication skills • Need to be able to attract people to the project and keep them interested and happy • These can often be more important than possessing great technical prowess • It often feels like we are in a bazaar rather than a cathedral!

  25. Conclusions • CDF and D0 are successfully managing their software development projects with ~ 60 – 70 developers per experiment and 1 million lines of C++ each • We are expected to have schedules, milestones and reviews which makes it unlikely that we can ever manage a project using the bazaar model • However, some of the Open Source concepts are applicable to HEP projects

  26. Use of these tools at CDF • On days when the development release builds, we create a rawhide release. This satisfies developers who need the up-to-date code but also need the whole release to actually build
