  1. LFC and DPM
     Jean-Philippe Baud, IT-GD, CERN
     November 2007

  2. Agenda
     • Goals for LFC and DPM
     • DPM architecture
     • Simple design
     • Good coding practices
     • Secure services
     • Testing
     • Operations
     Reliability Workshop: LFC-DPM

  3. Goals for LFC and DPM
     • LFC: LCG File Catalogue
       • Replace the EDG RLS (Replica Location Service)
       • Provide a hierarchical namespace, access control lists, sessions and transactions
     • DPM: Disk Pool Manager
       • Provide a scalable solution to replace the Classic Storage Elements at Tier-2 sites
       • Focus on manageability
         • Easy to install
         • Easy to configure
         • Low effort for ongoing maintenance
         • Easy to add/remove resources
       • Integrated security (authentication/authorization)

  4. DPM architecture
     [Diagram labels: CLI, C API, SRM-enabled client, etc.; DPM head node; DPMCOPY node; data transfer; DPM disk servers]
     • DPM head node
       • DPM Name Server
         • Namespace
         • Authorization
         • Physical file locations
       • DPM Server
         • Request queuing and processing
         • Space management
       • SRM servers (v1.1, v2.1, v2.2)
     • Disk servers
       • Physical files
       • Direct data transfer from/to the disk server (no bottleneck)
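The division of labour on this slide, with metadata and scheduling on the head node and data flowing straight to and from the disk servers, can be sketched as follows. Everything here is hypothetical: the class names, the `gsiftp` default protocol string, the transfer-URL layout and the "most free space" filesystem choice are assumptions made for illustration, not DPM's actual selection policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FileSystem:
    server: str       # hypothetical disk server host name
    mount: str        # mount point on that disk server
    free_bytes: int


@dataclass
class HeadNode:
    """Hypothetical head node: holds metadata and schedules, never file contents."""
    pool: List[FileSystem]
    replicas: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # path -> (server, physical path)

    def prepare_put(self, path: str, size: int, proto: str = "gsiftp") -> str:
        # Space management (assumed policy): the filesystem with the most free space wins.
        candidates = [fs for fs in self.pool if fs.free_bytes >= size]
        if not candidates:
            raise OSError("no space left in pool")
        fs = max(candidates, key=lambda f: f.free_bytes)
        fs.free_bytes -= size
        physical = fs.mount + path
        self.replicas[path] = (fs.server, physical)
        # The client gets a transfer URL naming the disk server itself,
        # so the data never passes through the head node.
        return f"{proto}://{fs.server}{physical}"

    def prepare_get(self, path: str, proto: str = "gsiftp") -> str:
        server, physical = self.replicas[path]   # KeyError if the file is unknown
        return f"{proto}://{server}{physical}"
```

Calling `prepare_put` on a pool with a 1 GB and a 5 GB filesystem returns a URL on the 5 GB server; the client then talks to that disk server directly, which is why the head node does not become a data bottleneck.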

  5. DPM architecture (head node)
     [Diagram of the head node. Recovered component labels: DPM daemon (server side), with space manager, request scheduler and persistency; SRM v1 and v2 daemons; DPNS daemon ("maestro of metadata"); database backend holding the DPM tables and the DPNS tables; clients: DPM client, SRM client, lcg-util/gfal. Recovered connection labels: interoperability, asynchronous requests to DB, synchronous requests, insert/select data to/from the DPM tables, insert/select data to/from the DPNS tables, authentication, control data, control interface, metadata.]
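One way to read the "asynchronous requests to DB" arrows on this slide: the SRM front end persists a request and hands back a token at once, and the DPM daemon's request scheduler later picks queued rows up and records the outcome in the same table. A minimal sketch, with SQLite standing in for the production database; the table, column and status names are hypothetical and are not the DPM schema.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical request table; not the real DPM schema.
    db.execute("""CREATE TABLE pending_req (
                      token  TEXT PRIMARY KEY,
                      op     TEXT NOT NULL,
                      path   TEXT NOT NULL,
                      status TEXT NOT NULL DEFAULT 'QUEUED')""")
    return db


def srm_submit(db: sqlite3.Connection, op: str, path: str) -> str:
    """Front-end side: persist the request and return a token immediately."""
    token = str(uuid.uuid4())
    with db:  # commits on success, rolls back on exception
        db.execute("INSERT INTO pending_req (token, op, path) VALUES (?, ?, ?)",
                   (token, op, path))
    return token


def scheduler_step(db: sqlite3.Connection, handler) -> int:
    """Back-end side: process queued requests in arrival order, store the outcome."""
    rows = db.execute("SELECT token, op, path FROM pending_req "
                      "WHERE status = 'QUEUED' ORDER BY rowid").fetchall()
    for token, op, path in rows:
        try:
            handler(op, path)
            status = "DONE"
        except Exception:
            status = "FAILED"
        with db:
            db.execute("UPDATE pending_req SET status = ? WHERE token = ?",
                       (status, token))
    return len(rows)


def srm_status(db: sqlite3.Connection, token: str) -> str:
    """Client polling: the status comes from the DB, not from daemon memory."""
    row = db.execute("SELECT status FROM pending_req WHERE token = ?",
                     (token,)).fetchone()
    return row[0] if row else "UNKNOWN"
```

Because every request lives in the database rather than in a daemon's memory, a restarted scheduler simply finds the rows still marked `QUEUED` and carries on.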

  6. Simple design (1)
     • DPM architecture is database-centric
       • Only 2 DBs
       • Fairly simple schema
       • No complex queries (mostly key access)
       • Use of bind variables, indices, transactions and integrity constraints
       • Automatic reconnection to the DB (allows transparent failover when using Oracle)
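The database practices listed above (key access, bind variables, a unique index doubling as an integrity constraint, one transaction per update) can be illustrated on a toy namespace table. This is a sketch only: SQLite stands in for the real backend, and the table layout, column names and root-directory convention are hypothetical simplifications, not the DPNS/LFC schema.

```python
import sqlite3
from typing import Optional

# Hypothetical, heavily simplified namespace table (not the real DPNS/LFC schema).
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_metadata (
    fileid        INTEGER PRIMARY KEY,
    parent_fileid INTEGER NOT NULL,
    name          TEXT    NOT NULL,
    filemode      INTEGER NOT NULL,
    owner_uid     INTEGER NOT NULL,
    UNIQUE (parent_fileid, name)   -- integrity constraint; also an index for key access
);
"""

ROOT_ID = 1


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    with db:  # root directory row; parent 0 means "none" in this sketch
        db.execute("INSERT OR IGNORE INTO file_metadata VALUES (?, 0, '/', ?, 0)",
                   (ROOT_ID, 0o40755))
    return db


def lookup(db: sqlite3.Connection, parent: int, name: str) -> Optional[int]:
    """Key access only: (parent_fileid, name) hits the UNIQUE index."""
    row = db.execute("SELECT fileid FROM file_metadata WHERE parent_fileid = ? AND name = ?",
                     (parent, name)).fetchone()
    return row[0] if row else None


def resolve(db: sqlite3.Connection, path: str) -> Optional[int]:
    """Walk the path one component at a time, one key lookup per level."""
    fileid: Optional[int] = ROOT_ID
    for comp in (c for c in path.split("/") if c):
        fileid = lookup(db, fileid, comp)
        if fileid is None:
            return None
    return fileid


def mkdir(db: sqlite3.Connection, path: str, uid: int) -> int:
    parent_path, _, name = path.rstrip("/").rpartition("/")
    if not name:
        raise ValueError("cannot create the root directory")
    with db:  # one transaction: the row is committed, or nothing changes
        parent = resolve(db, parent_path or "/")
        if parent is None:
            raise FileNotFoundError(parent_path)
        # Bind variables: values are never spliced into the SQL text.
        cur = db.execute("INSERT INTO file_metadata (parent_fileid, name, filemode, owner_uid) "
                         "VALUES (?, ?, ?, ?)", (parent, name, 0o40755, uid))
        return cur.lastrowid
```

A second `mkdir` of the same path fails with `sqlite3.IntegrityError` from the unique constraint, so duplicate entries are rejected by the database itself rather than by application-level locking.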

  7. Simple design (2)
     • Few daemons
       • Mainly communicating through the DB
       • Stateless
         • Configuration is kept in DB
         • A given daemon can be restarted on a different server
     • Scalability and high availability
       • All servers (except the DPM one) can be replicated if needed (DNS load balancing)
       • Daemons can be restarted independently
       • Automatic retries in clients
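Because the daemons are stateless, a client that loses its connection can simply retry, on the same server or on another replica behind a DNS alias. A minimal sketch of such a retry loop, assuming the alias has already been resolved to a list of host names and that `send` is a hypothetical callable performing one request; the backoff-with-jitter policy is an assumption, not the documented behaviour of the LFC/DPM clients.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(hosts: Sequence[str],
                      send: Callable[[str], T],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Retry one request against replicated, stateless front-ends.

    No session state lives in the server (it is in the DB), so any replica
    can serve the retry.
    """
    if not hosts or max_attempts < 1:
        raise ValueError("need at least one host and one attempt")
    if rng is None:
        rng = random.Random()
    last_error: Optional[ConnectionError] = None
    for attempt in range(max_attempts):
        host = hosts[attempt % len(hosts)]        # rotate over the replicas
        try:
            return send(host)
        except ConnectionError as err:            # only transient network errors are retried
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff with jitter: 0.5x to 1x of the nominal delay.
                sleep(base_delay * (2 ** attempt) * (0.5 + rng.random() / 2))
    raise last_error
```

Rotating over the replicas is what lets a single frontend be stopped for an upgrade without clients noticing, as long as another one answers.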

  8. Good coding practices
     • For long-term maintainability of the code
       • Portable code (compiled and tested on several platforms)
       • Modular code with enough comments
     • Protect against buffer overruns
     • Check validity of parameters
     • Check for memory leaks
     • Avoid mutexes in multi-threaded applications for performance reasons (good design is needed)
     • Code profiling
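The parameter and buffer checks on this slide matter most in C, where an unchecked copy into a fixed-size buffer overruns it. The same discipline is sketched here in Python for consistency with the other examples: validate every argument before use, and refuse to truncate silently when packing into a fixed-size field. The limits `MAX_PATH_LEN` and `MAX_NAME_LEN` are assumed values chosen for the sketch.

```python
MAX_PATH_LEN = 1023   # assumed limit: a 1024-byte C buffer including the NUL terminator
MAX_NAME_LEN = 255    # assumed per-component limit


def check_path(path) -> str:
    """Validate a namespace path before it reaches any fixed-size buffer."""
    if not isinstance(path, str):
        raise TypeError("path must be a string")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if "\0" in path:
        raise ValueError("embedded NUL byte")
    if len(path.encode("utf-8")) > MAX_PATH_LEN:
        raise ValueError("path too long")          # ENAMETOOLONG in C terms
    for comp in path.split("/"):
        if len(comp.encode("utf-8")) > MAX_NAME_LEN:
            raise ValueError("path component too long")
    return path


def pack_fixed(value: str, size: int) -> bytes:
    """Copy into a NUL-padded field of `size` bytes; reject rather than truncate."""
    raw = value.encode("utf-8")
    if len(raw) >= size:                           # keep room for the terminating NUL
        raise ValueError(f"value does not fit in {size} bytes")
    return raw.ljust(size, b"\0")
```

Lengths are measured in encoded bytes, not characters, because that is what a C buffer actually has to hold.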

  9. Security
     • All control and I/O services have security built in (GSI)
     • Entries in the namespace can be protected by POSIX Access Control Lists
     • All privileged operations can only be done with a host certificate on a trusted host
     • VOMS integration: groups, sub-groups and roles are supported
       • DNs and VOMS FQANs are mapped to virtual ids (no pool accounts)
       • All the groups present in the proxy are used for authorization in the namespace
       • Only the primary group/role is used in disk pool selection
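The authorization rules above can be sketched under a simplified model: virtual ids are allocated on first contact, a POSIX.1e-style ACL evaluation tries every group carried by the proxy, and pool selection looks only at the primary group, taken here to be the first FQAN. The class and function names and the starting id of 101 are hypothetical; this is not the DPNS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class VirtualIdMap:
    """Hypothetical DN/FQAN -> virtual id table; no Unix pool accounts are involved."""

    def __init__(self, first_uid: int = 101, first_gid: int = 101) -> None:
        self._uids: Dict[str, int] = {}
        self._gids: Dict[str, int] = {}
        self._next_uid, self._next_gid = first_uid, first_gid

    def uid(self, dn: str) -> int:
        if dn not in self._uids:                  # first contact allocates a new id
            self._uids[dn] = self._next_uid
            self._next_uid += 1
        return self._uids[dn]

    def gid(self, fqan: str) -> int:
        if fqan not in self._gids:
            self._gids[fqan] = self._next_gid
            self._next_gid += 1
        return self._gids[fqan]


@dataclass
class Acl:
    owner_uid: int
    owner_gid: int
    user_perms: int                               # rwx bits, e.g. 0o7
    group_perms: int
    other_perms: int
    named_users: Dict[int, int] = field(default_factory=dict)
    named_groups: Dict[int, int] = field(default_factory=dict)
    mask: Optional[int] = None                    # POSIX ACL_MASK entry, if any


def access_allowed(acl: Acl, uid: int, gids: List[int], want: int) -> bool:
    """POSIX.1e-style ACL check in which *all* gids from the proxy take part."""
    mask = 0o7 if acl.mask is None else acl.mask
    if uid == acl.owner_uid:
        return (acl.user_perms & want) == want
    if uid in acl.named_users:
        return (acl.named_users[uid] & mask & want) == want
    matched = False
    group_entries = [(acl.owner_gid, acl.group_perms), *acl.named_groups.items()]
    for gid, perms in group_entries:
        if gid in gids:
            matched = True
            if (perms & mask & want) == want:
                return True
    if matched:                   # a group entry matched, but none grants `want`
        return False
    return (acl.other_perms & want) == want


def select_pool(pools: Dict[str, List[int]], gids: List[int], default: str) -> str:
    """Only the primary group (assumed: first FQAN in the proxy) picks the disk pool."""
    for name, allowed_gids in pools.items():
        if gids and gids[0] in allowed_gids:
            return name
    return default
```

Note the asymmetry the slide describes: a secondary group can grant namespace access, yet it has no effect on which disk pool receives the data.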

  10. Testing
     • Unit tests
       • Test of new features
       • Test after bug fixes
     • Functional tests
       • Full test suite
       • Interoperability testing (SRM)
     • Stress tests
       • Find the limits of the system
       • Discover timing problems and corner cases
     • Pilot service (LFC only)
       • Test of bulk methods by ATLAS
       • Test of new permission and ownership scheme (LHCb)
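A stress test in the spirit of this slide can be as small as a thread pool hammering one operation and reporting call count, errors and latency. A sketch, where `operation` is a hypothetical placeholder for a real client call and the 95th percentile uses a simple index-based approximation rather than a formal definition.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def stress(operation: Callable[[int], None], clients: int, calls_per_client: int) -> dict:
    """Run `operation` from `clients` threads and summarise latency and errors."""

    def worker(client_id: int):
        latencies, errors = [], 0
        for i in range(calls_per_client):
            start = time.perf_counter()
            try:
                operation(client_id * calls_per_client + i)   # unique request number
            except Exception:
                errors += 1                                   # count, keep hammering
            latencies.append(time.perf_counter() - start)
        return latencies, errors

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(worker, range(clients)))
    elapsed = max(time.perf_counter() - t0, 1e-9)
    latencies = sorted(lat for lats, _ in results for lat in lats)
    return {
        "calls": len(latencies),
        "errors": sum(errs for _, errs in results),
        "throughput_per_s": len(latencies) / elapsed,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }
```

Raising `clients` step by step until throughput stops growing or errors appear is one simple way to find the limits the slide mentions.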

  11. Operations
     • Common logging format with timestamps and user identity
     • An LFC upgrade is transparent if there is no DB schema change and 2 front-ends are used
     • We limit DB schema updates to about once a year
     • LFC and DPM databases do not need to run on the same machine as the front-end server
     • Monitoring scripts (LFC)
       • Number of threads, response time, DB errors
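A sketch of a common log format and a monitoring script that reads it back. The field layout, the `DB_ERROR` status value and the way active threads are approximated (distinct thread ids seen in the log) are all assumptions for illustration, not the actual LFC log format or monitoring scripts.

```python
import re
from datetime import datetime
from typing import Dict, Iterable


def format_log(op: str, dn: str, elapsed_ms: float, status: str,
               when: datetime, thread: int) -> str:
    """One line per request: timestamp, thread, user identity, operation, outcome."""
    return (f"{when:%Y-%m-%d %H:%M:%S} tid={thread} user=\"{dn}\" "
            f"op={op} elapsed_ms={elapsed_ms:.1f} status={status}")


LINE = re.compile(r'tid=(\d+) user="([^"]*)" op=(\S+) elapsed_ms=([\d.]+) status=(\S+)$')


def summarise(lines: Iterable[str]) -> Dict[str, float]:
    """Monitoring view: threads seen, mean response time, DB error count."""
    threads, times, db_errors = set(), [], 0
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue                      # ignore lines not in the common format
        threads.add(int(m.group(1)))
        times.append(float(m.group(4)))
        if m.group(5) == "DB_ERROR":
            db_errors += 1
    return {
        "active_threads": len(threads),
        "mean_response_ms": sum(times) / len(times) if times else 0.0,
        "db_errors": db_errors,
    }
```

Keeping the user identity on every line means the same log serves both troubleshooting a single user's problem and aggregate monitoring.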

  12. Conclusion
     • LFC and DPM have become very popular (more than 100 sites use them for many VOs)
     • The simple and robust design allows us to provide external site support with less than one FTE at CERN
     • Documentation:
       • Reference man pages
       • Admin guide
       • Troubleshooting