Investigating Distributed Database Systems

Challenges and Technology

Kishore Puppala Rao


Definitions

  • A database is a logically related collection of data, stored in one or many files

  • A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network


Architecture

  • Client/server architectures

  • Multiple clients, single server – this is the most common and straightforward implementation

  • Multiple clients, multiple servers – more flexible. DB distributed over multiple servers. Each client directs requests to a “home” server.

Architecture (cont’d)

  • DB is physically distributed by fragmenting and replicating data (discussed later)

  • Regardless of architecture, implementation details of queries, transactions and DB operations should be transparent to users.

Architecture (Peer-to-peer)

  • No distinction between client and server

  • Each site has functionality of both client and server

  • E.g. file-sharing apps such as BearShare and LimeWire

  • Sophisticated protocols needed to manage data distributed across multiple sites


Fragmentation

  • Partitions the data

  • Subdivides each relation either vertically (by the projection operation) or horizontally (by the selection operation)

  • Facilitates the placement of data close to its place of use, reducing transmission costs
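
To make the two fragmentation styles concrete, here is a minimal Python sketch, assuming a relation is represented as a list of dicts. The `EMP` relation, its attributes, and the helper names are hypothetical. Horizontal fragments are selections and are rebuilt by union; vertical fragments are projections that each keep the key `eno`, so the relation can be rebuilt by a join on it.

```python
from typing import Callable

Row = dict[str, object]

def horizontal_fragment(relation: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Selection: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation: list[Row], attrs: list[str]) -> list[Row]:
    """Projection: keep only the listed attributes of every row."""
    return [{a: row[a] for a in attrs} for row in relation]

# Hypothetical global relation EMP(eno, name, site, salary)
EMP = [
    {"eno": 1, "name": "Ada", "site": "Paris", "salary": 50},
    {"eno": 2, "name": "Bo", "site": "Tokyo", "salary": 60},
    {"eno": 3, "name": "Cy", "site": "Paris", "salary": 70},
]

# Horizontal: one fragment per site, so each can be stored where it is used most
emp_paris = horizontal_fragment(EMP, lambda r: r["site"] == "Paris")
emp_tokyo = horizontal_fragment(EMP, lambda r: r["site"] == "Tokyo")

# Vertical: both fragments repeat the key "eno" so EMP can be rebuilt by a join
emp_public = vertical_fragment(EMP, ["eno", "name", "site"])
emp_payroll = vertical_fragment(EMP, ["eno", "salary"])

# Reconstruction: union of horizontal fragments, join of vertical fragments on the key
assert sorted(emp_paris + emp_tokyo, key=lambda r: r["eno"]) == EMP
payroll_by_key = {r["eno"]: r for r in emp_payroll}
rebuilt = [{**r, **payroll_by_key[r["eno"]]} for r in emp_public]
assert rebuilt == EMP
```

Keeping the key in every vertical fragment is what makes the decomposition lossless; without it the projected pieces could not be matched up again.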


Replication

  • Refers to duplication of data for access and/or security purposes

  • Fragments or whole database may be replicated

  • Replication involves keeping physically separate copies of data at different sites

Distributed vs. Parallel

  • Distributed DBMS are not parallel DBMS, although the distinction may be unclear

  • Distributed DBMS assume a loose connection between processors that operate independently, possibly under different operating systems

Parallel DBMS

  • Multiple processors under the same operating system

  • Architecture: shared-nothing, shared-disk, or shared-memory

  • Shared-nothing: Each processor has exclusive access to its own main memory and disk. Each processing element (PE) is a local site.

Parallel DBMS (cont’d)

  • Shared-memory: Each PE has access to any memory module or disk through a fast interconnect (e.g. a high-speed bus or cross-bar switch)

  • Shared-disk: Each PE has exclusive access to its own memory, but shared access to any disk via a fast interconnect. A PE reads DB pages from the shared disk and copies them to its local cache


  • Distributed (and parallel) DBMS must provide the same functionality and consistency as a centralized DBMS.

  • Transparency implies presenting a consistent view that shields the user from implementation details such as fragmentation, replication, and distribution.

  • Introduces major challenges


Challenges

  • Query processing and optimization

  • Concurrency control

  • Reliability protocols

  • Replication protocols

Query Processing and Optimization

  • Techniques are needed to address difficulties arising from data distribution and fragmentation; localization techniques are employed.

  • Algebraic queries on global relations are transformed to operate on fragments

  • Opportunities for parallel processing are identified (fragments are stored at different sites), and unnecessary work is eliminated (not all fragments may be involved in the query)
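
A minimal sketch of localization, assuming a relation `EMP` horizontally fragmented on `dept` with one equality predicate per fragment (the fragment names, sites, and helpers are hypothetical). A fragment whose defining predicate contradicts the query predicate cannot contribute rows and is skipped; the remaining fragments are scanned, potentially in parallel at their sites, and the results are unioned.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    name: str
    stored_at: str   # hypothetical site holding this fragment
    attr: str        # fragmentation attribute
    value: object    # the fragment holds exactly the rows with attr == value
    rows: list = field(default_factory=list)

def localize(fragments: list[Fragment], attr: str, value: object) -> list[Fragment]:
    """Drop fragments whose defining predicate contradicts the query predicate."""
    return [f for f in fragments if not (f.attr == attr and f.value != value)]

def select_eq(fragments: list[Fragment], attr: str, value: object) -> list[dict]:
    """Selection attr = value on the global relation: scan relevant fragments, union results."""
    result = []
    for f in localize(fragments, attr, value):
        # Each of these scans could run in parallel at f.stored_at
        result.extend(row for row in f.rows if row[attr] == value)
    return result

# Hypothetical relation EMP, horizontally fragmented on "dept"
emp1 = Fragment("EMP1", "site_A", "dept", "Sales",
                [{"eno": 1, "dept": "Sales"}, {"eno": 3, "dept": "Sales"}])
emp2 = Fragment("EMP2", "site_B", "dept", "R&D", [{"eno": 2, "dept": "R&D"}])
EMP = [emp1, emp2]

print([f.name for f in localize(EMP, "dept", "Sales")])  # only EMP1 needs to be read
print([f.name for f in localize(EMP, "eno", 2)])         # both: the predicate is on another attribute
print(select_eq(EMP, "eno", 2))
```

A real localizer reasons about arbitrary predicates and about vertical fragments as well; this sketch handles only equality predicates on horizontal fragments.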

Query optimization

  • Determining the execution sites for distributed operations

  • Identifying the best distributed algorithm for distributed operations

  • Changing the order of operations in a query
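
Choosing an execution site can be illustrated with a simple cost model. The sketch below is hypothetical: it assumes only network transfer counts (local processing is ignored) and that operand and result sizes have already been estimated. It evaluates each candidate site for a two-relation join and keeps the cheapest.

```python
def choose_join_site(sizes: dict[str, int], locations: dict[str, str],
                     result_size: int, result_site: str) -> tuple[str, int]:
    """Pick the execution site for a join that minimizes bytes sent over the network.

    sizes:       estimated size in bytes of each operand relation
    locations:   site that stores each operand relation
    result_size: estimated size of the join result
    result_site: site where the query was issued (the result must end up there)
    """
    candidates = sorted(set(locations.values()) | {result_site})
    best_site, best_cost = None, None
    for site in candidates:
        # Ship every operand that is not already stored at the candidate site
        cost = sum(sizes[rel] for rel, loc in locations.items() if loc != site)
        # Ship the result back to the querying site if the join ran elsewhere
        if site != result_site:
            cost += result_size
        if best_cost is None or cost < best_cost:
            best_site, best_cost = site, cost
    return best_site, best_cost

# Hypothetical sizes: a large EMP at site A, a small DEPT at site B, query issued at C
print(choose_join_site({"EMP": 1_000_000, "DEPT": 10_000},
                       {"EMP": "A", "DEPT": "B"}, 50_000, "C"))  # ('A', 60000)
```

With these numbers it is cheapest to ship the small DEPT relation to A and send the result on to C; if the result were larger than EMP itself, running the join at C would win instead.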

Concurrency Control

  • The challenge in synchronizing user transactions is to extend both the serializability criterion and concurrency control algorithms to the distributed execution environment

  • Serializability: concurrent execution of a set of transactions has the same effect as some serial (one-after-another) execution of them. In a distributed DBMS this requires that:

  • (a) execution of the set of transactions at each site must be serializable

  • (b) the serialization orders of these transactions at all these sites must be identical
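
Both conditions can be checked with precedence (serialization) graphs. A minimal sketch, assuming each site reports its local history as hypothetical `(transaction, "r" or "w", item)` triples: an edge Ti → Tj records that Ti must come before Tj because they conflict at that site. A single order consistent with every site exists exactly when the union of the local graphs is acyclic; a topological sort finds it.

```python
from collections import defaultdict
from typing import Optional

Op = tuple[str, str, str]  # (transaction, "r" or "w", data item)

def precedence_edges(history: list[Op]) -> set[tuple[str, str]]:
    """Edges Ti -> Tj: an operation of Ti precedes a conflicting operation of Tj at this site."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(history):
        for tj, op_j, y in history[i + 1:]:
            # Conflict: different transactions, same item, at least one write
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def global_serial_order(site_histories: dict[str, list[Op]]) -> Optional[list[str]]:
    """Return one serial order consistent with every site's history, or None."""
    edges = set().union(*(precedence_edges(h) for h in site_histories.values()))
    txns = {t for h in site_histories.values() for t, _, _ in h}
    indegree = {t: 0 for t in txns}
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)
        indegree[b] += 1
    # Kahn's topological sort; transactions left over mean the union has a cycle
    ready = sorted(t for t in txns if indegree[t] == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for u in sorted(successors[t]):
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None

# Each site alone is serializable, but S1 orders T1 before T2 and S2 the reverse
conflicting = {
    "S1": [("T1", "w", "x"), ("T2", "r", "x")],
    "S2": [("T2", "w", "y"), ("T1", "r", "y")],
}
print(global_serial_order(conflicting))  # None
```

This example violates condition (b): every local history is serializable, yet no single global order satisfies both sites.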

Concurrency (cont’d)

  • If locking-based algorithms are used, lock management may be centralized or distributed

  • Deadlocks must be avoided

  • Deadlock detection and management in a distributed database can be difficult
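
Detection is hard because each site sees only part of the wait-for graph. A minimal sketch of one common approach, centralized detection, assuming each site periodically sends its local wait-for edges `(waiter, holder)` to a single detector; the site and transaction names are hypothetical. The detector takes the union of the edges and searches it for a cycle with depth-first search.

```python
from typing import Optional

def find_global_deadlock(local_wfgs: dict[str, set[tuple[str, str]]]) -> Optional[list[str]]:
    """Union each site's wait-for edges (waiter, holder) and return one cycle, or None."""
    graph: dict[str, set[str]] = {}
    for edges in local_wfgs.values():
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
            graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current DFS path, finished
    color = {t: WHITE for t in graph}
    path: list[str] = []

    def dfs(t: str) -> Optional[list[str]]:
        color[t] = GREY
        path.append(t)
        for u in sorted(graph[t]):
            if color[u] == GREY:                 # back edge closes a cycle
                return path[path.index(u):]
            if color[u] == WHITE:
                cycle = dfs(u)
                if cycle is not None:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in sorted(graph):
        if color[t] == WHITE:
            cycle = dfs(t)
            if cycle is not None:
                return cycle
    return None

# Site A: T1 waits for T2; site B: T2 waits for T1. Neither site sees a cycle locally.
print(find_global_deadlock({"A": {("T1", "T2")}, "B": {("T2", "T1")}}))  # ['T1', 'T2']
```

The detector would then abort one transaction on the cycle (the victim). Because the reported edges may be out of date by the time they are combined, a real detector must also guard against reporting deadlocks that no longer exist.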

Reliability protocols

  • Several types of failures: system, media, transaction, communication

  • It may be difficult to determine which type of failure has occurred

  • Distributed reliability protocols enforce transaction atomicity (commit all or commit nothing)

Reliability (cont’d)

  • Example of an atomic commitment protocol: two-phase commit (2PC)

  • All sites involved in the execution of a distributed transaction must agree to commit the transaction before it is made permanent.
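
A minimal in-memory sketch of the two phases, assuming hypothetical `Participant` objects in place of real sites. It omits the logging, retries, and recovery a real implementation needs, and it does not model coordinator failure, which is the case where two-phase commit can block.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"

class Participant:
    """Hypothetical in-memory stand-in for one site in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True, reachable: bool = True):
        self.name = name
        self.can_commit = can_commit   # False simulates e.g. a failed local constraint check
        self.reachable = reachable     # False simulates a site or link failure
        self.state = "active"

    def prepare(self) -> Vote:
        if not self.reachable:
            raise TimeoutError(self.name)
        # Voting YES is a promise to commit if told to (a real site would log this first)
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision: str) -> None:
        if self.reachable:             # unreachable sites would learn the outcome on recovery
            self.state = decision

def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (voting): every site must vote; a missing vote counts as NO
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append(Vote.NO)
    decision = "committed" if all(v is Vote.YES for v in votes) else "aborted"
    # Phase 2 (decision): send the same outcome to every site
    for p in participants:
        p.finish(decision)
    return decision

print(two_phase_commit([Participant("A"), Participant("B"), Participant("C")]))  # committed
print(two_phase_commit([Participant("A"), Participant("B", reachable=False)]))   # aborted
```

A single NO or missing vote aborts the transaction everywhere, which is how the protocol enforces "commit all or commit nothing".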

Replication protocols

  • Each logical data item has a number of physical instances

  • Challenge is to maintain (or approximate) consistency among physical copies as user updates logical data

  • Example criterion: one-copy equivalence – all physical copies of a logical data item should be identical when the transaction that updates it terminates

  • Read-One/Write-All (ROWA) protocol: enforces one-copy equivalence. Disadvantage: the failure of one site may block the entire transaction
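
A minimal sketch of ROWA over in-memory replicas; `Replica`, `SiteDown`, and the site names are hypothetical. Reads use any available copy, while a write is refused unless every copy is up, which is the blocking drawback listed above.

```python
from typing import Optional

class SiteDown(Exception):
    pass

class Replica:
    """Hypothetical physical copy of the data stored at one site."""
    def __init__(self, site: str):
        self.site = site
        self.up = True
        self.data: dict[str, int] = {}

def rowa_read(replicas: list[Replica], item: str) -> Optional[int]:
    """Read-one: any available copy will do, because all copies are kept identical."""
    for r in replicas:
        if r.up:
            return r.data.get(item)
    raise SiteDown("no copy of the data is available")

def rowa_write(replicas: list[Replica], item: str, value: int) -> None:
    """Write-all: update every copy, or none if any copy is unreachable."""
    down = [r.site for r in replicas if not r.up]
    if down:
        # A single failed site blocks the update
        raise SiteDown(f"cannot write {item!r}; unavailable: {down}")
    # In a real system these updates would form one distributed transaction (e.g. 2PC)
    for r in replicas:
        r.data[item] = value

copies = [Replica("A"), Replica("B"), Replica("C")]
rowa_write(copies, "x", 1)
copies[0].up = False
print(rowa_read(copies, "x"))   # 1, served by B
try:
    rowa_write(copies, "x", 2)
except SiteDown as e:
    print(e)
```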

Replication (cont’d)

  • Alternative algorithms relax ROWA by mapping each write to a subset of the physical copies

  • Quorum-based voting: copies are assigned votes; read and (especially) write operations must collect enough votes to reach a quorum before they can proceed (see class notes)
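
One common form of quorum-based voting is weighted voting: with V total votes, a read quorum r and a write quorum w are chosen so that r + w > V and 2w > V. The sketch below assumes that scheme, with hypothetical sites, one vote per copy, and a version number stored with each copy so that a read can pick the newest value.

```python
def quorums_are_valid(votes: dict[str, int], read_q: int, write_q: int) -> bool:
    total = sum(votes.values())
    # r + w > V: every read quorum overlaps every write quorum, so reads see the latest write
    # 2w > V: any two write quorums overlap, so two writes cannot proceed on disjoint copies
    return read_q + write_q > total and 2 * write_q > total

def has_quorum(votes: dict[str, int], available: set[str], needed: int) -> bool:
    return sum(votes[s] for s in available) >= needed

def quorum_write(copies: dict[str, tuple[int, str]], votes: dict[str, int],
                 available: set[str], write_q: int, value: str) -> None:
    """copies maps site -> (version, value); write only if the available copies hold a quorum."""
    if not has_quorum(votes, available, write_q):
        raise RuntimeError("write quorum not reached")
    # Any write quorum overlaps the previous one, so this max is the current version
    new_version = max(copies[s][0] for s in available) + 1
    for s in available:
        copies[s] = (new_version, value)

def quorum_read(copies: dict[str, tuple[int, str]], votes: dict[str, int],
                available: set[str], read_q: int) -> str:
    """Return the newest value among the available copies, if they hold a read quorum."""
    if not has_quorum(votes, available, read_q):
        raise RuntimeError("read quorum not reached")
    _, value = max((copies[s] for s in available), key=lambda c: c[0])
    return value

votes = {"A": 1, "B": 1, "C": 1}
copies = {s: (0, "old") for s in votes}
assert quorums_are_valid(votes, read_q=2, write_q=2)
quorum_write(copies, votes, {"A", "B"}, 2, "new")   # C is down during the write
print(quorum_read(copies, votes, {"B", "C"}, 2))    # new (B is in both quorums)
```

Unlike ROWA, the write above succeeds with one site down; the price is that every read must also contact a quorum rather than a single copy.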

Research and Trends

  • Workflow models (advanced transaction models)

  • Network scaling problems

  • Multi-database systems and interoperability

  • Distributed object management

Trends (cont’d)

  • Primitive objects are not necessarily simple structured data; they can consist of programs, voice, images, etc.

  • Distributed DBMS must handle increasingly large data objects. E.g. one digital X-ray image of 1024x1024 pixels at 8 bits/pixel needs 1 MB of storage

  • Most commercial DBMS (e.g. MS SQL Server 2000, Oracle 8i) provide some sort of distribution

  • The emergence of broadband networks reduces the network's role as a bottleneck
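
The storage figure in the X-ray example follows directly from the image dimensions:

```latex
1024 \times 1024\ \text{pixels} \times 8\ \frac{\text{bits}}{\text{pixel}}
  = 8\,388\,608\ \text{bits}
  = 1\,048\,576\ \text{bytes}
  = 2^{20}\ \text{bytes} = 1\ \text{MiB} \approx 1.05\ \text{MB}
```

So the "1 MB" is one binary megabyte, and uncompressed storage grows linearly with both pixel count and bit depth.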

Trends (cont’d)

  • Mobile computing is growing in both interest and prevalence

  • Mobile stations may download data as needed

  • Alternatively, more powerful mobile stations may store native data for sharing with others

  • Mobility raises issues of address migration, maintenance of directories, and determining the location of stations

  • Distributed object computing platforms, e.g. CORBA (platform-independent) and COM/OLE (Microsoft-specific)


CORBA

  • Common Object Request Broker Architecture

  • Facilitates maintaining and accessing data from a number of autonomous and heterogeneous sources (e.g. file systems, spreadsheets) via a multidatabase approach

  • Provides a generic platform for distributed computing

CORBA (cont’d)

  • In multidatabase systems, the main problem is heterogeneity, which exists at four levels: platform, communication, database system, and semantic.

  • CORBA supports implementation transparency by giving clients access through interfaces defined in its Interface Definition Language (IDL), independent of the databases' actual software and hardware environment.

  • Provides location transparency, allowing clients to access DB objects independent of location and communication protocols

CORBA (cont’d)

  • Provides a common interface to mask heterogeneity among native database system implementations based on different data models (e.g. flat-file, relational, spreadsheet) and query languages

  • Common interface overcomes semantic conflicts such as schema and data conflicts

