
On Distributed Database Deployment for the LHC Experiments

This project aims to increase the availability and scalability of LCG and experiment components by providing a distributed database infrastructure for accessing and replicating relational databases. It will bring service providers closer to users and developers, allowing consistent and location-independent access to data.


Presentation Transcript


  1. On Distributed Database Deployment for the LHC Experiments
     I. Bird, M. Lamanna, D. Düllmann, M. Girone, J. Shiers (CERN); A. Vaniachine, D. Malon (ANL)
     CHEP 2004, Interlaken, Switzerland

  2. Regional Centres Connected to the LCG
     • more than 70 sites worldwide
     • more than 7,000 CPUs
     • reached 100,000 jobs per week on the grid

  3. Why an LCG Database Deployment Project?
     • LCG today provides an infrastructure for distributed access to file-based data and file replication
     • Physics applications (and grid services) require similar services for data stored in relational databases
     • Several applications and services already use RDBMS
     • Several sites already have experience in providing RDBMS services
     • Goals for a common project as part of LCG:
       • increase the availability and scalability of LCG and experiment components
       • allow applications to access data in a consistent, location-independent way
       • allow existing database services to be connected via data replication mechanisms
       • simplify shared deployment and administration of this infrastructure during 24x7 operation
     • Need to bring service providers (site technology experts) closer to database users/developers to define an LCG database service
     • Time frame: first deployment in the 2005 data challenges (autumn '05)

  4. Project Non-Goals
     • Store all database data
       • Experiments are free to deploy databases and distribute data under their own responsibility
     • Set up a single monolithic distributed database system
       • Given constraints such as WAN connections, one cannot assume that a single synchronously updated database would work and provide sufficient availability.
     • Set up a single-vendor system
       • Technology independence and a multi-vendor implementation will be required to minimize long-term risks and to adapt to the different requirements/constraints on different tiers.
     • Impose a CERN-centric infrastructure on participating sites
       • CERN is an equal partner of the other LCG sites on each tier
     • Decide on an architecture, implementation, new services, or policies
       • Instead, produce a technical proposal for all of those to the LCG PEB/GDB

  5. Situation on the Application Side
     • Databases are used by many applications in the physics production chain
     • Currently many of these applications run centralized
     • Several of these applications expect to move to a distributed model for scalability and availability reasons
       • This move can be simplified by a generic LCG database distribution infrastructure, but it still will not happen by magic
     • Choice of the supported database
       • is often made by application developers
       • not necessarily yet with the full deployment environment in mind
     • Need to continue to make key applications vendor-neutral
       • DB abstraction layers exist or are being implemented in many foundation libraries
       • OGSA-DAI, ODBC, JDBC, ROOT, POOL, … are steps in this direction (a sketch of the idea follows below)
       • The degree of abstraction achieved varies
     • Still many applications are only available for one vendor
       • or have significant schema differences which forbid DB<->DB replication
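The vendor-neutral access the slide calls for amounts to application code talking to one interface while a driver is selected per backend. A minimal Python sketch of that idea, assuming standard DB-API drivers are installed; the connect() helper and its URL scheme are hypothetical illustrations, not part of POOL/RAL or any LCG API:

```python
# Minimal sketch of a vendor-neutral access layer in the spirit of
# ODBC/JDBC/POOL: a hypothetical connect() dispatches on a backend
# URL, and the application never imports a vendor module directly.
import sqlite3


def connect(db_url):
    """Open a connection from a backend-neutral URL such as
    'sqlite:///conditions.db', 'mysql://host/conditions' or
    'oracle://dsn'."""
    scheme, _, rest = db_url.partition("://")
    if scheme == "sqlite":
        return sqlite3.connect(rest.lstrip("/"))
    if scheme == "mysql":
        import MySQLdb            # requires the MySQL client bindings
        host, _, db = rest.partition("/")
        return MySQLdb.connect(host=host, db=db)
    if scheme == "oracle":
        import cx_Oracle          # requires the Oracle client bindings
        return cx_Oracle.connect(dsn=rest)
    raise ValueError("unsupported backend: %s" % scheme)


# Application code stays identical regardless of the backend, as long
# as the SQL avoids vendor-specific constructs.
conn = connect("sqlite:///conditions.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS conditions (tag TEXT, value REAL)")
```

The schema-difference caveat in the last bullet is exactly what such a layer cannot hide: it can unify the connection and query API, but not two incompatible table layouts.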

  6. Database Services at LCG Sites Today
     • Several sites provide Oracle production services for HEP and non-HEP applications
       • Deployment experience and procedures exist…
       • … but cannot be changed easily without affecting other site activities
     • MySQL is very popular in the developer community
       • Used for some production purposes in LHC, though not at large scales
       • Expected to be deployable with limited database administration resources
       • So far no larger-scale production service exists at LCG sites
       • But several applications are bound to MySQL
     • Expect a significant role for both database flavors
       • to implement different parts of the LCG infrastructure

  7. Local Database vs. Local Cache
     • FNAL experiments deploy a combination of http-based database access with web proxy caches close to the client (sketched below)
     • Performance gains
       • reduced real database access for largely read-only data
       • reduced transfer overhead compared to low-level SOAP RPC based approaches
     • Deployment gains
       • Web caches (e.g. squid) are much simpler to deploy than databases and could remove the need for a local database deployment on some tiers
       • No vendor-specific database libraries on the client side
       • "Firewall friendly" tunneling of requests through a single port
     • Expect cache technology to play a significant role towards the higher tiers, which may not have the resources to run a reliable database service
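The pattern described on this slide boils down to serving read-mostly database results over plain HTTP so that a site-local squid can cache them. A minimal sketch, assuming a hypothetical HTTP conditions endpoint and a site squid on its default port 3128; the server URL and query format are inventions for the example:

```python
# Minimal sketch of http-based database access through a web cache:
# the client sends a plain HTTP GET and lets a site-local squid cache
# the response. Only one port needs to cross the firewall, and no
# vendor database libraries are needed on the client.
import json
import urllib.request

# Route all requests through the site-local web cache (squid listens
# on port 3128 by default).
proxy = urllib.request.ProxyHandler({"http": "http://squid.example.site:3128"})
opener = urllib.request.build_opener(proxy)

# Repeated identical queries for read-only data are answered from the
# cache instead of hitting the central database again.
url = "http://conditions.example.org/get?table=calib&run=1234"
with opener.open(url) as response:
    payload = json.load(response)
print(payload)
```

With this pattern only the cache needs to sit close to the jobs; the database itself can stay central, which is why it appeals to tiers without the resources for a local database service.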

  8. Application s/w Stack and Distribution Options
     [Diagram: the client software (APP) accesses data through RAL, a relational abstraction layer; requests travel over the network to the db & cache servers — Oracle, MySQL, an SQLite file, and web caches — backed by db file storage.]

  9. Tiers, Resources and Level of Service
     • Different requirements and service capabilities for different tiers (see the policy sketch below)
     • Tier1 database backbone
       • High volume, often complete replication of RDBMS data
       • Can expect good network connection to other T1 sites
       • Asynchronous, possibly multi-master replication
       • Large-scale central database service, local DBA team
     • Tier2
       • Medium volume, often only sliced extraction of data
       • Asymmetric, possibly only uni-directional replication
       • Part-time administration (shared with fabric administration)
     • Tier3/4 (e.g. laptop extraction)
       • Support fully disconnected operation
       • Low volume, sliced extraction from T1/T2
     • Need to deploy several replication/distribution technologies
       • each addressing specific parts of the distribution problem
       • but all together forming a consistent distribution model
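The level-of-service differences on this slide could be encoded as a per-tier policy that deployment tooling consults. The sketch below is purely illustrative: the field names and values are assumptions mirroring the slide, not an agreed 3D configuration format:

```python
# Hypothetical per-tier replication policy capturing the slide's
# level-of-service picture; illustrative only.
TIER_POLICY = {
    "T1": {                        # Tier1 database backbone
        "replication": "full",     # complete copy of the RDBMS data
        "mode": "async-multi-master",
        "service": "24x7, local DBA team",
    },
    "T2": {
        "replication": "sliced",   # only the subset the site needs
        "mode": "uni-directional",
        "service": "part-time administration",
    },
    "T3": {                        # e.g. laptop extraction
        "replication": "sliced",
        "mode": "pull-on-demand",  # supports fully disconnected operation
        "service": "none",
    },
}


def replication_mode(tier):
    """Look up the replication mode a site at this tier should configure."""
    return TIER_POLICY[tier]["mode"]


print(replication_mode("T2"))  # -> uni-directional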

  10. Starting Point for a Service Architecture?
      [Diagram: T0 runs an autonomous, reliable Oracle service; the T1 database backbone replicates all data between Oracle sites via Oracle Streams and also provides a reliable service; T2 sites hold a local database cache with a subset of the data (local service only), fed by cross-vendor extraction from Oracle to MySQL; T3/4 sites work from extracted files and proxy caches.]

  11. LCG 3D Project
      • WP1 - Data Inventory and Distribution Requirements
        • Members are s/w providers from experiments and grid services that use RDBMS data
        • Gather data properties (volume, ownership) and requirements, and integrate the provided service into their software
      • WP2 - Database Service Definition and Implementation
        • Members are site technology and deployment experts
        • Propose a deployment implementation and common deployment procedures
      • WP3 - Evaluation Tasks
        • Short, well-defined technology evaluations against the requirements delivered by WP1
        • Evaluations are proposed by WP2 (evaluation plan), typically executed by the people proposing a technology for the service implementation, and result in a short evaluation report

  12. Data Inventory
      • Collect and maintain a catalog of the main RDBMS data types
      • Select from a catalog of well-defined replication options
        • which can be supported as part of the service
        • Conditions and collection/bookkeeping data are likely candidates
      • Experiments and grid s/w providers fill in a table for each data type which is a candidate for storage and replication via the 3D service (an illustrative entry follows below)
        • Basic storage properties
          • Data description, expected volume on T0/1/2 in 2005 (and evolution)
          • Ownership model: read-only, single-user update, single-site update, concurrent update
        • Replication/caching properties
          • Replication model: site-local, all T1, sliced T1, all T2, sliced T2, …
          • Consistency/latency: how quickly do changes need to reach other sites/tiers?
          • Application constraints: DB vendor and DB version constraints
        • Reliability and availability requirements
          • essential for whole-grid operation, for site operation, for experiment production
        • Backup and recovery policy
          • acceptable time to recover, location of backup(s)
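As an illustration, one filled-in inventory entry might look like the record below. Every field value is invented for the example; only the field names mirror the properties listed on the slide:

```python
# Hypothetical filled-in inventory entry for the 3D data catalog.
# Field names follow the slide's property list; all values are
# made up for illustration, not real experiment numbers.
conditions_entry = {
    "data_type": "conditions",
    "description": "detector calibration and alignment constants",
    "volume_2005_GB": {"T0": 100, "T1": 20, "T2": 5},  # invented figures
    "ownership": "single-site update",   # written at T0, read elsewhere
    "replication_model": "all T1, sliced T2",
    "latency": "changes visible at T1 within one hour",
    "db_constraints": "Oracle at T0/T1, MySQL acceptable at T2",
    "availability": "essential for experiment production",
    "backup": {"recovery_time": "4 hours", "backup_site": "T0"},
}
```

Collecting one such record per candidate data type is what lets WP2 match replication technologies to actual requirements rather than guesses.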

  13. Service Definition and Implementation
      • DB service discovery
        • How does a job find a close-by replica of the database it needs?
        • Need transparent (re)location of services, e.g. via a database replica catalog (sketched below)
      • Connectivity, firewalls and connection constraints
      • Access control - authentication and authorization
        • Integration between DB vendor and LCG security models
      • Installation and configuration
        • Database server and client installation kits
        • Which database client bindings are required (C, C++, Java (JDBC), Perl, …)?
        • Server and client version upgrades (e.g. security patches)
        • Are transparent upgrades required for critical services?
      • Server administration procedures and tools
        • Need basic agreements to simplify shared administration
        • Monitoring and statistics gathering
      • Backup and recovery
        • Backup policy templates, responsible site(s) for a particular data type
        • Acceptable latency for recovery
      • Bottom line: the service effort should not be underestimated!
        • We are rather close to LHC startup and can only afford to propose models that have a good chance of working!
        • Do not just hope for good luck: these services will be a critical part of the experiments' infrastructure and should be handled accordingly!
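The replica-catalog idea from the first bullet can be sketched as a lookup that maps a logical database name to its closest physical replica, so jobs never hard-code a server. The catalog contents and distance heuristic below are hypothetical:

```python
# Minimal sketch of DB service discovery through a replica catalog:
# a job asks for the nearest replica of a logical database name.
# Catalog entries and the ranking heuristic are invented examples.
REPLICA_CATALOG = {
    "conditions": [
        "oracle://t0-db.cern.ch/cond",
        "oracle://t1-db.example.org/cond",
        "mysql://t2-db.example.edu/cond",
    ],
}


def site_distance(replica_url, local_site):
    """Rank replicas; a real service might use network topology or
    explicit site affinity. Here: prefer replicas in the local domain."""
    return 0 if local_site in replica_url else 1


def find_replica(logical_name, local_site):
    """Return the closest replica of a logical database, giving the
    transparent (re)location of services the slide asks for."""
    replicas = REPLICA_CATALOG[logical_name]
    return min(replicas, key=lambda url: site_distance(url, local_site))


print(find_replica("conditions", "example.org"))
# -> oracle://t1-db.example.org/cond
```

A real catalog would itself need to be replicated or served through the caching layer from slide 7, but the lookup interface the jobs see can stay this simple.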

  14. Summary
      • Together with the LHC experiments, LCG will define and deploy a distributed database service at Tier 0-2 sites
      • Several potential experiment applications and grid services exist, but they need to be coupled to the upcoming services
        • development work will be required on both the 3D service and the application side
      • Differences in available T0/1/2 manpower resources will result in different levels of service
        • a multi-vendor environment has been requested to avoid vendor coupling and to support the existing s/w base
      • The 3D project ( http://lcg3d.cern.ch ) has been started in the LCG deployment area to coordinate this activity
        • Meetings in the different working groups are starting to define the key requirements and verify/adapt the proposed model
        • Prototyping of reference implementations of the main model elements has started and should soon be extended to a (small) multi-site test-bed
      • Need to start pragmatic and simple to allow for a first deployment in 2005
        • A 2005 service infrastructure can only draw on already existing resources
        • Requirements in some areas will only become clear during first deployment, as the computing models in this area firm up
