Developing Scalable High Performance Petabyte Distributed Databases

Presentation Transcript


  1. Developing Scalable High Performance Petabyte Distributed Databases
  CHEP ‘98
  Andrew Hanushevsky, SLAC Computing Services
  Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy

  2. BaBar & The B-Factory
  • High precision investigation of B-meson decays
    • Cosmic ray tracking starts October 1998
    • Experiment starts April 1999
  • 500 physicists collaborating from >70 sites in 10 countries
    • USA, Canada, China, France, Germany, Italy, Norway, Russia, UK, Taiwan
  • The experiment produces large quantities of data
    • 200 - 400 TB/year for 10 years
    • Data stored as objects using Objectivity
  • Heavy computational load
    • 5,000 SPECint95 (526 Sun Ultra 10s or 312 Alpha PW600s)
  • Work will be distributed across the collaboration

  3. Handling The Data & Computation
  [Architecture diagram; box labels: RS/6000-F50s (AIX 4.2), Sun ES10000 (Veritas FS/VM), Sun Ultra 2s (Solaris 2.6), Sun ES4500s (Veritas FS/VM, Solaris 2.5), HPSS, Compute Farm, Network Switch, AMS Farm, External Collaborators.]

  4. High Performance Storage System
  [Diagram: the application talks to hpss over the control network; movers transfer data between disk and tape over a separate data network.]
  • HPSS server components:
    • Bitfile Server
    • Name Server
    • Storage Servers
    • Physical Volume Library
    • Physical Volume Repositories
    • Storage System Manager
    • Migration/Purge Server
    • Metadata Manager
    • Log Daemon
    • Log Client
    • Startup Daemon
    • Encina/SFS
    • DCE

  5. Advanced Multithreaded Server
  [Diagram: the client speaks the ams protocol to the AMS server, which accesses its disk through the ufs protocol.]
  • Client/Server Application
    • Serves “pages” (512 to 64K byte blocks)
    • Similar to other remote filesystem interfaces (e.g., NFS)
  • Objectivity client can read and write database “pages” via AMS
    • Pages range from 512 bytes to 64K in powers of 2 (e.g., 1K, 2K, 4K, etc.)
  • Enables Data Replication Option (DRO)
  • Enables Fault Tolerant Option (FTO)
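  As a side note, here is a minimal sketch (not from the AMS sources) of the page-size rule above: a legal Objectivity/AMS page is a power of two between 512 bytes and 64K.

      #include <cstdint>
      #include <iostream>

      // True when 'size' is a power of two in the range 512 bytes .. 64K inclusive,
      // i.e. one of the page sizes AMS serves (512, 1K, 2K, ..., 64K).
      bool isValidPageSize(std::uint32_t size) {
          if (size < 512 || size > 64 * 1024) return false;
          return (size & (size - 1)) == 0;        // power-of-two test
      }

      int main() {
          std::cout << std::boolalpha;
          for (std::uint32_t s = 512; s <= 64 * 1024; s *= 2)
              std::cout << s << " bytes -> " << isValidPageSize(s) << '\n';
          std::cout << "3000 bytes -> " << isValidPageSize(3000) << '\n';  // not a power of two
      }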

  6. Veritas File System & Volume Manager
  [Diagram: the file system sits on top of the volume manager, which spans multiple RAID devices.]
  • Volume Manager
    • Catenates disk devices to form very large capacity logical devices
    • Also s/w RAID-0,1,5 and dynamic I/O multi-pathing
  • File System
    • High performance journaled file system for fast recovery
    • Maximizes device speed/size performance (30+ MB/Sec for h/w RAID-5)
    • Supports 1TB+ files and file systems

  7. Together Alone…
  • Veritas Volume Manager + Veritas File System
    • Excellent I/O performance (10 - 30 MB/Sec) but
    • Insufficient capacity (1TB) and online cost too high
  • AMS
    • Efficient database protocol and highly flexible but
    • Limited security, low scalability, tied to local filesystem
  • HPSS
    • Highly scalable, excellent I/O performance for large files but
    • High latency for small block transfers (i.e., Objectivity/DB)
  • Need to synergistically mate these three systems but
    • Want to keep them independent so any can be changed

  8. The Extensible AMS
  [Architecture diagram: ams layered over the oofs interface, with system-specific glue (ooss) binding it to either a local vfs/ufs filesystem or to vfs/hpss, plus hpss security.]

  9. An Object Oriented Interface
    class oofsDesc {
        // General File System Methods
    };

    class oofsDir {
        // Directory-Specific Methods
    };

    class oofsFile {
        // File-Specific Methods
    };

  10. The oofs Interface
  • Provides a standard interface for AMS to get at a filesystem
  • Any filesystem can be used that can implement the functions:
    • close, closedir, exists, getmode, getsectoken, getsize, open, opendir, read, readdir, remove, rename, sync, truncate, write
    • Includes all current POSIX-like filesystems
  • The oofs interface is linked with AMS to create an executable
    • Normally transparent to client applications
    • Timing may not be transparent
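  As a sketch of how the three oofs classes from the previous slide might carry the functions listed above: the class and function names come from the talk, but the signatures, the assignment of functions to classes, and the pure-virtual layout are assumptions made for illustration only.

      #include <sys/types.h>
      #include <cstddef>

      class oofsFile {                       // File-Specific Methods
      public:
          virtual ~oofsFile() {}
          virtual int     open(const char *path, int oflag, mode_t mode)      = 0;
          virtual ssize_t read(void *buff, size_t blen, off_t offset)         = 0;
          virtual ssize_t write(const void *buff, size_t blen, off_t offset)  = 0;
          virtual int     sync()               = 0;
          virtual int     truncate(off_t size) = 0;
          virtual off_t   getsize()            = 0;
          virtual int     close()              = 0;
      };

      class oofsDir {                        // Directory-Specific Methods
      public:
          virtual ~oofsDir() {}
          virtual int     opendir(const char *path)          = 0;
          virtual int     readdir(char *name, size_t maxlen) = 0;
          virtual int     closedir()                         = 0;
      };

      class oofsDesc {                       // General File System Methods
      public:
          virtual ~oofsDesc() {}
          virtual int     exists(const char *path)                 = 0;
          virtual int     getmode(const char *path, mode_t &mode)  = 0;
          virtual int     getsectoken(char *token, size_t maxlen)  = 0;
          virtual int     remove(const char *path)                 = 0;
          virtual int     rename(const char *from, const char *to) = 0;
          virtual oofsFile *newFile() = 0;   // hypothetical factory helpers,
          virtual oofsDir  *newDir()  = 0;   // not named in the talk
      };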

  11. The HPSS Interface
  • HPSS implements a “POSIX” filesystem
  • The HPSS API library provides sufficient oofs functionality:
    • close()       → hpss_Close()
    • closedir()    → hpss_Closedir()
    • exists()      → hpss_Stat()
    • getmode()     → hpss_Stat()
    • getsectoken() → not applicable
    • getsize()     → hpss_Fstat()
    • open()        → hpss_Open() [+ hpss_Create()]
    • opendir()     → hpss_Opendir()
    • read()        → hpss_SetFileOffset() + hpss_Read()
    • readdir()     → hpss_Readdir()
    • remove()      → hpss_Unlink()
    • rename()      → hpss_Rename()
    • sync()        → not applicable
    • truncate()    → hpss_Ftruncate()
    • write()       → hpss_SetFileOffset() + hpss_Write()
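  As an illustration of the read() mapping above, here is a hedged sketch of an oofs-style positioned read on top of HPSS. The two prototypes are simplified stand-ins declared only for this example; the real HPSS client API uses different argument lists (64-bit offsets, class-of-service hints), so this is not the actual SLAC glue code.

      #include <sys/types.h>
      #include <cstddef>

      // Simplified, illustrative prototypes -- NOT the real HPSS signatures.
      ssize_t hpss_Read(int fd, void *buff, size_t count);
      int     hpss_SetFileOffset(int fd, off_t offset, int whence);

      // oofs read(): position the HPSS file cursor, then transfer the block.
      ssize_t hpssRead(int fd, void *buff, size_t blen, off_t offset) {
          if (hpss_SetFileOffset(fd, offset, 0 /* SEEK_SET-like */) < 0)
              return -1;                     // seek failed
          return hpss_Read(fd, buff, blen);  // returns bytes read or -1
      }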

  12. Additional Issues
  [Diagram labels: app, ams, security, ooss, vfs, hpss security.]
  • Security
  • Performance
    • Access patterns (e.g., random vs sequential)
    • HPSS staging latency
  • Scalability

  13. Object Based Security Model
  • Protocol Independent Client Authentication Model
    • Public or private key
      • PGP, RSA, Kerberos, etc.
    • Can be negotiated at run-time
    • Provides for server authentication
  • AMS Client must call a special routine to enable security
    • oofs_Register_Security()
    • Supplied routine responsible for creating the oofsSecurity object
  • Client Objectivity Kernel creates security objects as needed
    • Security objects supply context-sensitive authentication credentials
  • Works only with Extensible AMS via oofs interface
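  A hedged sketch of the registration step described above: only the names oofs_Register_Security() and oofsSecurity come from the talk; the class members, the factory signature, and the Kerberos-flavored example are hypothetical, chosen just to show the flow.

      class oofsSecurity {
      public:
          virtual ~oofsSecurity() {}
          // Supply context-sensitive credentials for the current request.
          virtual const char *credentials() = 0;
      };

      // Hypothetical Kerberos-flavored implementation (illustrative only).
      class KerberosSecurity : public oofsSecurity {
      public:
          const char *credentials() override { return "krb5:ticket-goes-here"; }
      };

      // Hypothetical factory supplied by the client; the Objectivity kernel
      // would call it whenever a new security object is needed.
      static oofsSecurity *makeSecurity() { return new KerberosSecurity; }

      // Assumed prototype of the special registration routine named on the slide.
      void oofs_Register_Security(oofsSecurity *(*factory)());

      void enableSecurity() {
          // Client calls this once before opening any databases.
          oofs_Register_Security(makeSecurity);
      }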

  14. Supplying Performance Hints
  • Need additional information for optimum performance
    • Different from Objectivity clustering hints
    • Database clustering
    • Processing mode (sequential/random)
    • Desired service levels
  • Information is Objectivity independent
    • Need a mechanism to tunnel opaque information
  • Client supplies hints via oofs_set_info() call
    • Information relayed to AMS in a transparent way
    • AMS relays information to underlying file system via oofs()
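  A minimal sketch of the hint-tunnelling step above: only the name oofs_set_info() comes from the talk; the argument list, the database path, and the hint syntax are assumptions made for illustration.

      // Assumed prototype; the real argument list is not given in the talk.
      int oofs_set_info(const char *database, const char *hint);

      void hintSequentialScan() {
          // Hypothetical opaque hint: tell the storage layer that this database
          // will be scanned sequentially, so staging/prefetch can be aggressive.
          oofs_set_info("/bfactory/run1/events.db", "access=sequential;service=bulk");
      }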

  15. Dealing With Latency
  • Hierarchical filesystems may have high latency bursts
    • Mounting a tape file
  • Need mechanism to notify client of expected delay
    • Prevents request timeout
    • Prevents retransmission storms
  • Also allows server to degrade gracefully
    • Can delay clients when overloaded
  • Defer Request Protocol
    • Certain oofs() requests can tell client of expected delay
    • For example, open()
    • Client waits indicated amount of time and tries again
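  A sketch of the client side of the Defer Request Protocol described above. The reply structure and the amsOpen() call are hypothetical; the talk specifies only the behavior (the server returns an expected delay, the client waits that long and retries), not the wire format.

      #include <chrono>
      #include <thread>

      struct OpenReply {
          int fd;            // valid handle when deferSeconds == 0
          int deferSeconds;  // > 0 means "try again after this many seconds"
      };

      OpenReply amsOpen(const char *path);  // hypothetical AMS client call

      int openWithDefer(const char *path) {
          for (;;) {
              OpenReply r = amsOpen(path);
              if (r.deferSeconds <= 0)
                  return r.fd;  // file is ready (e.g., staged in from tape)
              // Server asked us to wait (tape mount, overload, ...); no timeout,
              // no retransmission storm -- just sleep and try again.
              std::this_thread::sleep_for(std::chrono::seconds(r.deferSeconds));
          }
      }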

  16. Balancing The Load I
  • Dynamically distributed databases
    • Single machine can’t manage over a terabyte of disk cache
    • No good way to statically partition the database
  • Dynamically varying database access paths
    • As load increases, add more copies
      • Copies accessed in parallel
    • As load decreases, remove copies to free up disk space
  • Objectivity catalog independence
    • Copies managed outside of Objectivity
    • Minimizes impact on administration

  17. Balancing The Load II
  • Request Redirect Protocol
    • oofs() routines supply alternate AMS location
    • oofs routines responsible for update synchronization
  • Typically, read/only access provided on copies
    • Only one read/write copy conveniently supported
    • Client must declare intention to update prior to access
    • Lazy synchronization possible
    • Good mechanism for largely read/only databases
  • Load balancing provided by an AMS collective
    • Has one distinguished member recorded in the catalogue
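  A sketch of the client side of the Request Redirect Protocol above. The reply structure and calls are hypothetical; the talk specifies only that the distinguished member (the one recorded in the Objectivity catalogue) may redirect a request to another member of the collective.

      #include <string>

      struct RedirectReply {
          int         fd;            // valid when redirect is false
          bool        redirect;      // true => retry against redirectHost
          std::string redirectHost;  // alternate AMS collective member
      };

      // Hypothetical client call; the wire protocol is not described in the talk.
      RedirectReply amsOpenAt(const std::string &host, const char *path);

      int openViaCollective(std::string host, const char *path) {
          // 'host' starts as the distinguished member recorded in the catalogue.
          for (;;) {
              RedirectReply r = amsOpenAt(host, path);
              if (!r.redirect)
                  return r.fd;          // served by this collective member
              host = r.redirectHost;    // follow the redirect and try again
          }
      }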

  18. The AMS Collective
  [Diagram: a client is redirected between AMS Collective 1 and AMS Collective 2, each a group of ams servers with a distinguished member.]
  • Collective members are effectively interchangeable

  19. Overall Effects
  • Extensible AMS
    • Allows use of any type of filesystem via oofs layer
  • Generic Authentication Protocol
    • Allows proper client identification
  • Opaque Information Protocol
    • Allows passing of hints to improve filesystem performance
  • Defer Request Protocol
    • Accommodates hierarchical filesystems
  • Redirection Protocol
    • Accommodates terabyte+ filesystems
    • Provides for dynamic load balancing

  20. Dynamic Load Balancing Hierarchical Secure AMS
  [Diagram: a client dynamically selects among several ams servers, each with its own vfs disk cache, backed by hpss and Redwood tape drives.]

  21. Summary
  • AMS is capable of high performance
    • Ultimate performance limited by disk speeds
    • Should be able to deliver average of 20 MB/Sec per disk
  • The oofs interface + other protocols greatly enhance performance, scalability, usability, and security
  • SLAC will be using this combination to store physics data
    • BaBar experiment will produce a database of over 2 PB in 10 years
    • 2,000,000,000,000,000 = 2×10^15 bytes ≈ 200,000 3590 tapes
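  For reference, the tape count is a straightforward division, assuming the 10 GB native capacity of a 3590 cartridge (the slide does not state the per-cartridge figure): 2×10^15 bytes ÷ 1×10^10 bytes per cartridge = 2×10^5, i.e. about 200,000 tapes.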
