1 / 88

Distributed Databases

Distributed Databases. Distributed Databases. Parallel databases Architecture Query evaluation Query optimization Distributed databases Architectures Data storage Catalog management Query processing Transactions. Introduction.

lyris
Download Presentation

Distributed Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Databases

  2. Distributed Databases • Parallel databases • Architecture • Query evaluation • Query optimization • Distributed databases • Architectures • Data storage • Catalog management • Query processing • Transactions

  3. Introduction • A parallel database system is designed to improve performance through parallelism • Loading data, building indexes, evaluating queries • Data may be stored in a distributed way, but solely for performance reasons • A distributed database system is physically stored across several sites • Each site is managed by an independent DBMS • Distribution is affected by local ownership, and availability as well as performance

  4. Parallel Databases

  5. Motivation • How long does it take to scan a 1 terabyte table at 10MB/s? • 1,099,511,627,776 bytes = 1,0244 or 240 bytes • 10MB = 10,485,760 bytes • 1,099,511,627,776 / 10,485,760 = 104,858 • 104,858 / (60 * 20 * 24) = 1.2 days! • Using 1,000 processors in parallel the time can be reduced to 1.5 minutes

  6. Coarse-Grain and Fine-Grain • A coarse-grain parallel machine consists of a small number of processors • Most current high-end computers • A fine-grain parallel machine uses thousands of smaller processors • Also referred to as a massively parallel machine

  7. Performance • Both throughput and response time can be improved by parallelism • Throughput – the number of tasks completed in a given time • Processing many small tasks in parallel increases throughput • Response time – the time it takes to complete a single task • Subtasks of large transactions can be performed in parallel increasing response time

  8. Speed-Up, Scale-Up ideal • Speed-up • More resources means less time for a given amount of data • Scale-up • If resources increase in proportion to increase in data size, time is constant throughput degreeof parallelism ideal responsetime degree of parallelism

  9. Parallel Database Architecture • Where possible a parallel database should carry out evaluation steps in parallel • There are many opportunities for parallelism in a relational database • There are three main parallel DBMS architectures • Shared nothing • Shared memory • Shared disk

  10. P P P D D D Shared Memory Architecture • Multiple CPUs attached to an interconnection network • Accessing a common region of main memory • Similar to a conventional system • Good for moderate parallelism • Communication overhead is low • OS services control the CPUs • Interference increases with size • As CPUs are added memory contention becomes a bottleneck • Adding more CPUs eventually slows the system down … interconnectionnetwork globalsharedmemory

  11. P P P D D D Shared Disk Architecture • Each CPU has private memory and direct access to data • Through the interconnection network • Good for moderate parallelism • Suffers from interference in the interconnection network • Which acts as a bottleneck • Not a good solution for a large scale parallel system M … M M interconnectionnetwork

  12. P P P D D D Shared Nothing Architecture • Each CPU has local memory and disk space • No two CPUs access the same storage area • All CPU communication is through the network • Increases complexity • Linear speed-up • Operation time decreases proportional to increase in CPUs • Linear scale-up • Performance maintained if CPU increase is proportional to data interconnectionnetwork … M M M

  13. Parallel Query Evaluation • A relational query execution plan is a tree, or graph, of relational algebra operators • Operators in a query tree can be executed in parallel • If one operator consumes the output of another, there is pipelined parallelism • Otherwise the operators can be evaluated independently • An operator blocks if it does not produce any output until it has consumed all its inputs • Pipelined parallelism is limited by blocking operators

  14. Single Operator Evaluation • Individual operators can be evaluated in a parallel way by partitioning input data • In data-partitioned parallel evaluation the input data is partitioned, and worked on in parallel • The results are then combined • Tables are horizontally partitioned • Different rows are assigned to different processors

  15. Data Partitioning • Partition using a round-robin algorithm • Partition using hashing • Partition using ranges of field values

  16. Round-Robin • Partition using a round-robin algorithm • Assign record i to processori mod n • Similar to RAID systems • Suitable for evaluating queries that access the entire table • Less efficient for queries that access ranges of values and queries on equality

  17. Hash Partitioning • Partition using hashing • A hash function based on selected attributes is applied to each record to determine its processor • The data remains evenly distributed as the table grows, or shrinks over time • Good for equality selections • Only one disk is used, leaving the others free • Also useful for sequential scans where the partitioning attributes are a candidate key

  18. Range Partitioning • Partition using ranges of field values • Ranges are chosen from the sort key values, each range should contain the same number of records • Each disk contains one range • If a range is too large can lead to data skew • Skew can lead to the processors with large partitions becoming bottlenecks • Good for equality selections, and range selections

  19. Data Skew • Both hash and range partitioning may result in data skew • Where some partitions are larger or smaller • Skew can dramatically reduce the speed-up obtained from parallelism • In range partitioning skew can be reduced by using histograms • The histograms contain the number of attributes and are used to derive even partitions

  20. Parallel Evaluation Code • Parallel data streams are used to provide data for relational operators • The streams can come from different disks, or • Output of other operators • Streams are merged or split • Merged to provide the inputs for of a relational operator • Split as needed to parallelize processing • These operations can buffer data, and should be able to halt operators that provide their input data • A parallel evaluation consists of a network of relational, merge and split operators

  21. Types of Parallelism • Interqueryparalellism • Different queries or transactions execute in parallel • Throughput is increased but response time is not • Easy to support in a shared-memory system • Intraquery parallelism • Executing a single query in parallel to speed up large queries • Which in turn can entail either intraoperation or interoperation parallelism, or both

  22. Parallel Operations • Scanning and loading • Pages can be read in parallel while scanning a relation • The results can be merged • If hash or range partitioning is used selections can be directed to the relevant processors • Sorting • Joins

  23. Sorting • The simplest sort method is for each processor to sort its portion of the table • Then merge the sorted records • The merging phase may limit the amount of parallelism • A better method is to first redistribute the records over the processors using range partitioning • Using the sort attributes • Each processor sorts its set of records • The sets of sorted records are then retrieved in order • To make the partitions even the data in the processors can be sampled

  24. Joins • Join algorithms can be parallelized • Parallelization is most effective for hash or sort-merge joins • Parallel hash join is widely used • The process for parallel hash join is • First partition the two tables across the processors using the same hash function • Join the records locally, using any join algorithm • Merge the results of the local joins, the union of these results is the join of the two tables

  25. Improved Parallel Hash Join • If tables are very large parallel hash join may have a high cost at each processor • If each partition is large, multiple passes will be required for the local joins • An alternative approach is to use all processors for each partition • Partition the tables using h1 • Each partition of the smaller relation should fit into the combined memory of the processors • Process each partition using all processors • Use h2 to determine which processor to end records to

  26. Joins on Inequalities • Partitioning is not suitable for joins on inequalities • Such as R ⋈R.a < S.b S • Since all records in R could join with a record in S • Fragment and replicate joins can be used • In asymmetric fragment and replicate join • One of the relations is partitioned • The other relation is replicated across all partitions

  27. Fragment and Replicate • Each relation can be both fragmented and replicated • Into m fragments of R and n of S • However m * n processors are required • This works with any join condition • When partitioning is not possible S0 S1 S2 … Sn-1 R0 Pm-1,n-1 P0,0 P0,1 P2,0 P0,2 P1,1 P1,0 R1 R2 … Rm-1

  28. Other Operations • Selection – the table may be partitioned on the selection attribute • If not, it can be scanned in parallel • Duplicate elimination – use parallel sorting • Projection – can be performed by scanning • Aggregation – partition the table on the grouping attribute • If records do have to be transferred between processors it may be possible to just send partial result • The final result can then be calculated from the partial results • e.g. sum

  29. Costs of Parallelism • Using parallel processors reduces the time to perform an operation • Possibly to as little as 1/n * original cost • Where n is the number of processors • However there are also additional costs • Start-up costs for initiating the operation • Skew which may reduce the speed-up • Contention for resources resulting in delays • Cost of assembling the final result

  30. Parallel Query Optimization • As well as parallelizing individual operators, different operators can be processed in parallel • Different processors perform different operations • Result of one operator can be pipelined into another • Note that sorting and the hash-join partitioning block pipelines • Multiple independent operations can be executed concurrently • Using bushy, rather than left-deep, join trees

  31. Parallel vs. Serial Plans • The best serial plan may not be the best parallel plan • Also note that parallelization introduces further complexity into query optimization • Consider a table partitioned into two nodes, with a local secondary index • Node 1 contains names between A and M • Node 2 contains names between N and Z • Consider the selection: name < “Noober“ • Node 1 should scan its partition, but • Node 2 should use the name index

  32. Parallel System Design • In a large-scale parallel system the chances of failure increase • Such systems should be designed to operate even if a processor disk fails • Data can be replicated across multiple processors • Failed processors or disks are tracked • And request re-routed to the backup

  33. Summary • Architecture • Shared-memory is easy, but costly and does not scale well • Shared-nothing is cheap and scales well, but is harder to implement • Both intraoperation, and interoperation parallelism are possible • Most relational algebra operations can be performed in parallel • How the data is partitioned across processors is very important

  34. Distributed Databases

  35. Introduction • A distributed database is motivated by a number of factors • Increased availability • If a site containing a table goes down, the table may still be available if a copy is maintained at another site • Distributed access to data • An organization may have branches in several cities • Access patterns are typically affected by locality • Analysis of distributed data • Distributed systems must support integrated access

  36. Ideal • Data is stored at several sites • Each site is managed by an independent DBMS • The system should make the fact that data is distributed transparent to the user • Distributed Data Independence • Users should not need to know where the data is located • Queries that access several sites should be optimized • Distributed Transaction Atomicity • Users should be able to write transactions that access several sites, in the same way as local transactions

  37. Reality • Users may have to be aware of where data is located • Distributed data independence and distributed transaction atomicity may not be supported • These properties may be hard to support efficiently • Sites may be connected by a slow long-distance network • Consider a global system • Administrative overheads for viewing data as a single unified collection may be prohibitively expensive

  38. Distributed vs. Parallel • Distributed and shared-nothing parallel systems appear similar • In practice these are often very different since distributed DBs are typically • Geographically separated • Separately administered • Have slower interconnections • May have both local and global transactions

  39. Types of Distributed Database • Homogeneous • Data is distributed but every site runs the same DBMS software • Heterogeneous, or multidatabase • Different sites run different DBMSs, and the sites are connected to enable access to data • Require standards for gateway protocols • A gateway protocol is an API that allows external applications access to the database • e.g. ODBC and JDBC • Gateways add a layer of processing, and may not be able to entirely mask differences between servers

  40. Distributed DBMS Architecture • Client-Server • Collaborating Server • Middleware

  41. Client-Server Systems • One or more client processes and one or more server processes • A client process sends a query to any one server process • Clients are responsible for UI • Servers manage data and execute transactions • A popular architecture • Relatively simple to implement • Servers do not have to deal with user-interactions • Users can run a GUI on clients • Communication between client and server should be as set-oriented as possible • e.g. stored procedures vs. cursors

  42. Collaborating Server Systems • Client-server systems do not allow a single query to access multiple servers as this would require • Breaking the query into subqueries to be executed at different sites, and merging the answers to the subqueries • To do this the client would have to be overly complex • In a collaborating server system the distinction between clients and servers is eliminated • A collection of DB servers, each able to run transactions against local data • When a query is received that requires data from other servers the server generates appropriate subqueries

  43. Middleware Systems • Designed to allow a single query to access multiple servers, but • Without requiring all servers to be capable of managing multi-site query execution • Often used to integrate legacy systems • Requires one database server (the middleware) capable of managing multi-server queries • Other servers only handle local queries and transactions • The special server coordinates queries and transactions • The middleware server typically doesn’t maintain any data

  44. Distributed Data

  45. Storing Distributed Data • In a distributed system tables are stored across several sites • Accessing a table stored elsewhere incurs message-passing costs • A single table may be replicated or fragmented across several sites • Fragments are stored at the sites where they are most often accessed • Several replicas of a table may be stored at different sites • Fragmentation and replication can be combined

  46. Fragmentation • Fragmentation consists of breaking a table into smaller tables, or fragments • The fragments are stored instead of the original table • Possibly at different sites • Fragmentation can either be vertical or horizontal horizontal vertical

  47. Managing Fragmentation • Records that belong to a horizontal fragment are usually identified by a selection query • e.g. all the records that relate to a particular city, achieving locality, reducing communication costs • A horizontally fragmented table can be recreated by computing the union of the fragments • Fragments are usually required to be disjoint • Records belonging to a vertical fragment are identified by a projection query • The collection of vertical fragments must be a lossless-join decomposition • A unique tuple ID is often assigned to records

  48. Replication • Replication entails storing several copies of a table or of table fragments for • Increased availability of data, which protects against • Failure of individual sites, and • Failure of communication links • Faster query evaluation • Queries can execute faster by using a local copy of a table • There are two kinds of replication, synchronous, and asynchronous • These differ in how replicas are kept current when the table is modified

  49. Managing Distributed Catalogs • Distributing data across sites adds complexity • It is important to track where replicated or fragmented tables are stored • Each replica or fragment must be uniquely named • Naming should be performed locally • A global relation name consists of {birth site, local name} • The birth site is the site where the table was created • A site catalog records fragments and replicas at a site, and tracks replicas of tables created at the site • To locate a table, look up its birth site catalog • The birth site never changes, even if the table is moved

  50. Distributed Queries

More Related