An Overview of a Scalable Distributed Database System: SD-SQL Server

An Overview of a Scalable Distributed Database System: SD-SQL Server Witold LITWIN, Soror SAHRI &Thomas SCHWARZ Witold.litwin@dauphine.frSoror.sahri@dauphine.frtjschwarz@scu.edu Ceria Laboratory Comp. Eng. Dep. Paris-Dauphine University Santa Clara U. BNCOD 2006

Overview • Introduction • Architecture • Command Interface • Processing • Performance • Conclusion & Future Work

Partitioned Tables • Most DBSs have distributed/parallel versions with partitioned tables • SQL Server, Oracle, DB2, MySQL, Postgres…

BENEFITS OF PARTITIONING BENEFITS OF PARTITIONING Partitioning can provide tremendous benefits to a wide variety of applications by improving manageability, performance, and availability. It is not unusual for partitioning to improve the performance of certain queries or maintenance operations by an order of magnitude. Moreover, partitioning can greatly simplify common administration tasks. Partitioning also enables database designers and administrators to tackle some of the toughest problems posed by cutting-edge applications. Partitioning is a key tool for building multi-terabyte systems or systems with extremely high availability requirements. Partitioning in Oracle Database 10g Release 2 An Oracle White Paper May 2005

Hassle of Partitioning • DBSs require manual partitioning • And manual repartitioning when tables scale-up • DBSs do not provide dynamically scalable tables

Facts • http://ceria.dauphine.fr/CERIA-publications.html • Research Report, December 2005 • [Oracle Database 10g]

SD-SQL Server Goal ScalableDistributed Partitioning of Relational Tables Scalable Distributed Database System SD-DBS

Role Model: an SDDS • A scalable distributed data structure • Specifically designed for possibly very large data on multi-computers or networks of WSs • P2P & Grids in modern vocabulary • Why SDDS Role Model? • Several SDDS schemes are well-known by now: • LH*, RP*, k-RP*, LH*RS…Chord, VBI & most of P2P schemes • The domain has over 20.000 references on Google • An SD-DBS reuses SDDS design principles • With DB management specificity

SD-SQL Server • The first and yet the only SD-DBS • Implements the SD-DBS architecture • Litwin, Schwartz & Risch (2002) • Runs on Microsoft SQL Server 2000 • Shared Nothing Architecture • Up to 250 nodes at present

User/Application User/Application sd_insert sd_create_table SD-SQL ServerManagers SD-SQLserver SD-SQLpeer SD-SQLclient SD-SQLserver LinkedSQLServers T Split _D1_T _D1_T _D1_T NDBs C P S S D2 Di Di+1 D1 Gross Architecture D1_T

…… Node1 Node2 Node3 Nodei DB1 DB2 DB1 DB2 DB1 DB2 SDB DB1 SDB MDB Nodes, SDBs & NDBs

NDB Types • Client NDB • Interfaces applications & users • Carries only images • No actual tables with application data • Server NDB • Carries actual tables • segments • Peer NDB • Both functions • Primary NDB • First for an SDB

Scalable (Distributed) Table • For the application: a table of an SDB • Internally: a collectionof segments behind client images • A segment is an SQL table • One per NDB of the SDB • Sharing the scalable table scheme • Except its check constraint • Min and Max value of the partition key • With size limit • Splitting when overflows occur • The check constraintspartition the partitionkey space

Scalable (Distributed) Table • The primary segment • First allocated for a new table • At some server or peer NDB of SDB • The peer creating the table • The primary server of the client creating the table

Scalable (Distributed) Table • For the application, it the client image is the table • The image name is the table name • Primary image • Created during the scalable table creation • at the client or peer NDB creating the table • Secondary images • Created later on • On other NDBs of the SDB • For local applications • By a dedicated command • sd_Create Image….

Scalable (Distributed) Table • Internally, every image is a specific SQL Server view of the segments: • Distributed partitioned union view CREATE VIEW TAS SELECT * FROM N2.DB1.SD._N1_TUNION ALL SELECT * FROM N3.DB1.SD._N1_T UNION ALL SELECT * FROM N4.DB1.SD._N1_T • Updatable • Through the check constraints • With or without Lazy Schema Validation

SD SQL Server Meta-Tables • Store various SD-SQL Server meta-data • In particular about each scalable table • At each server or peer NDB • SD.Size meta-table • Segment capacity • The number of stored tuples triggering a split • Same for every segment at present • SD.RPmeta-table • the actual partitioning of the scalable table • The location of each segment • SD.Primary table • The location of the SD.RP table for each segment in the NDB

SD SQL Server Meta-Tables • At every client or peer NDB • In SD.Image table • All the local images • The name of the image • The type • Primary or secondary • The number of segments • As seen by an image • Not necessarily the actual one

SD SQL Server Meta-Tables • At every NDB • SD.SDBNode points towards the primary NDB • SD.MDBNode points towards the MDB • At MDB • SD.Nodesindicates all the available SD-SQL Server nodes • Over linked SQL Server nodes • SD.SDB describes all the SDBs • At every primary NDB • SD.NDBpoints to every NDB of the SDB

DB1 SDB ……. N1.DB1 N2.DB1 N3.DB1 Ni.DB1 Primary N1.DB1 N1.DB1 RP N2.DB1 N3.DB1 1000 Size Ni.DB1 Nodes Meta-Tables Scalable Tables: Meta-data T Scalable Table SDBNode Meta-Tables

Scalable Table Expansion • The number of segments in a scalable table may grow • An overflowing segment splits • Creating one or more new segments • A split occurs when an insert overflows the segmentcapacity • The trigger launches the split as an asynchronous job called splitter • To avoid the application level timeout

Scalable Table Expansion • Every new segment • Is basically created at an existing NDB that does not yet have any segments of the expanding table • provided there is any • Otherwise a new NDB is first appended to SDB • Provided there is an available SD SQL Server node • Inherits the “father”’s schema • Gets its new check constraint • Gets indexes as defined at the “father”

Single Segment SplitSingle Tuple Insert p=INT(b/2) C( S)= { c: c < h = c (b+1-p)} C( S1)={c: c > = c (b+1-p)} b+1-p p S S S1 Check Constraint? b+1 b SELECT TOP Pi * INTO Ni.Si FROM S ORDER BY C ASC SELECT TOP Pi * WITH TIES INTO Ni.S1 FROM S ORDER BY C ASC

Single Segment SplitBulk Insert Single segment split

Multi-Segment SplitBulk Insert Multi-segment split

N1 N2 N3 Ni N3 sd_create_node_database N4 NDBDB1 NDBDB1 NDBDB1 NDBDB1 NDBDB1 NDBDB1 SDB DB1 SDB DB1 SDB DB1 sd_insert sd_insert sd_insert ScalableTable T Split with SDB Expansion sd_create_node sd_create_node_database …….

Image Adjustment • The splits do not modify synchronously the images • Any split makes every image outdated • The client or peer verifies every image dynamically when a query to the image comes in • Image checking • Image adjustment if necessary

Image Adjustment • Get the number of segments presented in the image, N1 • Get the number of segments of the scalable table, N2 • Compare N1 and N2: • If N1<N2 then Image Adjustment • Alter the partitioned view definition

N1.DB1 N2.DB1 N3.DB1 N4.DB1 PrimaryImage Image: Example DB1 SDB T Image T Scalable Table CREATE VIEW TAS SELECT * FROM N2.DB1.SD._N1_T CREATE VIEW TAS SELECT * FROM N2.DB1.SD._N1_TUNION ALL SELECT * FROM N3.DB1.SD._N1_T UNION ALL SELECT * FROM N4.DB1.SD._N1_T

INSERTsd_insert • CREATE TABLEsd_create_table Application Interface • The application interface manipulates scalable tables through SD-SQL Server commands • The SD-SQL Server commandsstart with ‘sd_’ to distinguish from SQL Server commands for static tables

Nodes Management • Node Creation • sd_create_node‘Dell1’/* Serverby default */ • sd_create_node‘Ceria’, ‘client’ • Node Alteration • sd_alter_node‘Ceria’, ‘ADD server’/* Becomes peer*/ • Node Removal • sd_drop_node‘Ceria’

SDB & NDB Management • SDB Creation • sd_create_scalable_database ‘SkyServer’, ‘Dell1’, ‘Server’,2 /* Creates the primary SkyServer NDB as well at Dell1*/ • SDB Alteration • sd_create_node_database‘SkyServer’, ‘Ceria’, ‘Client’ • SDB Removal • sd_drop_scalable_database‘SkyServer’

Scalable Tables Management • Scalable Table Creation • sd_create_table‘PhotoObj (objid BIGINT PRIMARY KEY…)’, 10000 • No foreign keys yet • Scalable Table Alteration • sd_alter_table ‘PhotoObj ADD t INT’, 1000 • sd_create_index‘run_index ON Photoobj (run)’ • sd_drop_index‘PhotoObj.run_index’ • Scalable Table Removal • sd_drop_table‘PhotoObj’

Image Adjustment • Secondary Image Creation • sd_create_image‘Ceria’, ‘PhotoObj’ • sd_create_image‘Dell2’, ‘PhotoObj’ • Secondary Image Removal • sd_drop_image'PhotoObj’

Scalable View • A view of an image • Involving perhaps static tables • And perhaps static views • … • Declared under SD-SQL Server by the SQL Server CREATE VIEW command

Scalable Queries Management USE SkyServer /* SQL Server command */ • Scalable Update Queries • sd_insert‘INTO PhotoObj SELECT * FROM Ceria5.Skyserver-S.PhotoObj’ • Scalable Search Queries • sd_select‘* FROM PhotoObj’ • sd_select‘TOP 5000 * INTO PhotoObj1 FROM PhotoObj’, 500

Image Binding Command Processing • Let Q a scalable query using the PhotoObj image: • sd_select‘COUNT (*) FROM PhotoObj’ Find Images in Q Check PhotoObj Image for Correctness Adjust PhotoObj Image if needed Send Q’to SQL Server for Execution

Concurrency • SD-SQL Server processes every command as SQL distributed transaction at Repeatable Read isolation level • Tuple level locks • Shared locks • Exclusive 2PL locks • Much less blocking than the Serializable Level

Concurrency • Splits use exclusive locks on segments and tuples in RP meta-table. • Shared locks on other meta-tables: Primary, NDB meta-tables • Scalable queries use basically shared locks on meta-tables and any other table involved • All the conccurent executions can be shown serializable

Splitter sd_alter_table Dell1 RP Dell2 Dell3 PhotoObj Dell1.SkyServer Concurrency: Example X Exclusive Lock Waiting Shared Lock X Exclusive Lock Exclusive Lock

Experimental Environment • 6 Machines Pentium IV 1.7 GHz • RAM: 780 Mb & 1 Gb • Operating System: Windows 2K Server • Ethernet Network: max bandwidth of 1 Gb/s • Use of SQL Analyzer for editing queries • Use of SQL Profiler to take measurements

The SkyServer Benchmark • We use SkyServer database as benchmark • Provided and installed at Ceria by Dr. Gray • SkyServer brings the entire database of the Sloan Digital Sky Survey, SDSS • We use of thePhotoObj table as an example scalable table • In our experiments, PhotoObjhas almost 159 K tuples (about 260 MB) • Originally, it has 14 M tuples

Split Time Splitting PhotoObj with 160 k tuplesinto 2…5 segments, according to segment capacity

Split Time Splitting PhotoObj with 160 k tuples and indexes into 2… 5 segments according to segment capacity

Split Time Analysis • Longer split time may timeout a query put on wait • Future solution: Incremental Splitting • The splitter moves tuples by an increment at a time • Let us say 1000 tuples • Then ends up by calling upon itself • The query may proceed as the splitter releases the exclusive lock on the RP tuple • The process continues for next increment etc as long as there are tuples to move

Image Adjustment (Q)sd_select‘COUNT (*) FROM PhotoObj’ Query (Q1) execution time

Scalable View Processing (Q)sd_select‘COUNT (*) FROM Ti’

An Overview of a Scalable Distributed Database System: SD-SQL Server