Investigating Distributed Database Systems

Challenges and Technology

Kishore Puppala Rao

Definitions
  • A database is a logically related collection of data, stored in one or more files
  • A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network
Architecture
  • Client/server architectures:
  • Multiple clients, single server – the most common and straightforward implementation
  • Multiple clients, multiple servers – more flexible. The database is distributed over multiple servers, and each client directs its requests to a “home” server.
Architecture (cont’d)
  • DB is physically distributed by fragmenting and replicating data (discussed later)
  • Regardless of architecture, implementation details of queries, transactions and DB operations should be transparent to users.
Architecture (Peer-to-peer)
  • No distinction between client and server
  • Each site has functionality of both client and server
  • E.g. file-sharing applications such as BearShare and LimeWire
  • Sophisticated protocols needed to manage data distributed across multiple sites
Fragmentation
  • Partitions the data
  • Subdivides each relation either vertically (by a projection operation) or horizontally (by a selection operation)
  • Facilitates the placement of data close to its place of use, reducing transmission costs
Replication
  • Refers to duplication of data for access and/or security purposes
  • Fragments or the whole database may be replicated
  • Replication involves keeping physically separate copies of data at different sites
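The fragmentation described above can be illustrated with a small sketch. The EMPLOYEE relation, its attribute names, and the helper functions are hypothetical and exist only for this example; a real DBMS would express the same idea with selection and projection over stored relations.

```python
# Hypothetical EMPLOYEE relation; attribute names are illustrative only.
employees = [
    {"emp_id": 1, "name": "Ada", "site": "Boston", "salary": 90000},
    {"emp_id": 2, "name": "Lin", "site": "Denver", "salary": 85000},
    {"emp_id": 3, "name": "Sam", "site": "Boston", "salary": 70000},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, key, attributes):
    """Projection: keep the chosen columns plus the key, so rows can be rejoined."""
    return [{a: row[a] for a in (key, *attributes)} for row in relation]

# Horizontal fragments: each site keeps the rows it uses most.
boston = horizontal_fragment(employees, lambda r: r["site"] == "Boston")
denver = horizontal_fragment(employees, lambda r: r["site"] == "Denver")

# Vertical fragments: salary data is kept apart from directory data.
public = vertical_fragment(employees, "emp_id", ["name", "site"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

# Reconstruction: union for horizontal fragments, key join for vertical ones.
rebuilt_h = sorted(boston + denver, key=lambda r: r["emp_id"])
payroll_by_id = {r["emp_id"]: r for r in payroll}
rebuilt_v = [{**p, **payroll_by_id[p["emp_id"]]} for p in public]
```

Keeping the key in every vertical fragment is what makes the join-based reconstruction possible; a replicated design would simply store one of these fragments at more than one site.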
Distributed vs. Parallel
  • A distributed DBMS is not the same as a parallel DBMS, although the distinction can be blurry
  • Distributed DBMSs assume loosely coupled processors that operate independently, possibly under different operating systems
Parallel DBMS
  • Multiple processors under the same operating system.
  • Architecture: shared-nothing, shared-disk, or shared-memory
  • Shared-nothing: Each processor has exclusive access to its own main memory and disk. Each processing element (PE) acts as a local site.
Parallel DBMS (cont’d)
  • Shared-memory: Each PE has access to any memory module or disk through a fast interconnect (e.g. a high-speed bus or cross-bar switch)
  • Shared-disk: Each PE has exclusive access to its own memory, but shared access to any disk via a fast interconnect. A PE reads DB pages from the shared disk and copies them into its local cache
  • Distributed (and parallel) DBMSs must provide the same functionality and consistency as a centralized DBMS.
  • Transparency implies presenting a consistent view that shields the user from implementation details such as fragmentation, replication, and distribution.
  • Providing this transparency introduces major challenges:
  • Query processing and optimization
  • Concurrency control
  • Reliability protocols
  • Replication protocols
Query Processing and Optimization
  • Techniques are needed to handle the difficulties that arise from data distribution and fragmentation; localization techniques are used for this.
  • Algebraic queries on global relations are transformed to operate on fragments
  • Opportunities for parallel processing are identified (fragments are stored at different sites), and unnecessary work is eliminated (not all fragments may be involved in the query)
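A minimal sketch of localization, assuming a global ORDERS relation that is horizontally fragmented by ranges of order_id. The site names, ranges, and rows are made up for illustration; the point is only that a range query on the global relation is rewritten to touch the fragments whose range overlaps it.

```python
# Hypothetical horizontal fragments of a global ORDERS relation, one per site.
# Each fragment records the inclusive (low, high) range of order_id it holds.
fragments = {
    "site_a": {"range": (1, 1000),
               "rows": [{"order_id": 10, "total": 50}, {"order_id": 900, "total": 75}]},
    "site_b": {"range": (1001, 2000),
               "rows": [{"order_id": 1500, "total": 20}]},
    "site_c": {"range": (2001, 3000),
               "rows": [{"order_id": 2500, "total": 99}]},
}

def localize(lo, hi):
    """Return the sites whose fragment range overlaps the query range [lo, hi]."""
    return [site for site, frag in fragments.items()
            if frag["range"][0] <= hi and lo <= frag["range"][1]]

def run_query(lo, hi):
    """Evaluate the selection on the relevant fragments only, then union the results."""
    result = []
    for site in localize(lo, hi):  # in a real system these could run in parallel
        rows = fragments[site]["rows"]
        result.extend(r for r in rows if lo <= r["order_id"] <= hi)
    return sorted(result, key=lambda r: r["order_id"])
```

For example, `run_query(800, 1600)` never contacts site_c, because that fragment's range cannot contain a matching row.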
Query optimization
  • Determining the execution sites for distributed operations
  • Identifying the best distributed algorithm for distributed operations
  • Changing the order of operations in a query
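As a rough illustration of choosing an execution site, the sketch below picks where to run a join of two relations stored at different sites. The cost model (count only the bytes shipped between sites) and all names are simplifying assumptions made for this example, not a description of any particular optimizer.

```python
def cheapest_join_site(size_r, size_s, size_result, site_r, site_s, site_result):
    """Return (bytes_shipped, site) for the join site that ships the fewest bytes.

    R lives at site_r, S at site_s, and the answer is needed at site_result.
    Anything not already at the chosen join site must be shipped to it, and the
    join result must be shipped on if the join did not run at site_result.
    """
    def shipped(join_site):
        cost = 0
        if join_site != site_r:
            cost += size_r
        if join_site != site_s:
            cost += size_s
        if join_site != site_result:
            cost += size_result
        return cost

    candidates = sorted({site_r, site_s, site_result})
    return min((shipped(site), site) for site in candidates)
```

With a 1,000,000-byte R at "s1", a 3,500-byte S at "s2", and a 400,000-byte result needed at "s3", running the join at "s1" ships only S and the result, which is cheaper than moving R anywhere.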
Concurrency Control
  • The challenge in synchronizing user transactions is to extend serializability and concurrency control algorithms to the distributed execution environment
  • Serializability: the concurrent execution of a set of transactions has the same effect as some serial execution of those transactions. In a distributed DBMS this requires:
  • (a) the execution of the set of transactions at each site is serializable
  • (b) the serialization orders of these transactions at all sites are identical
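Condition (b) can be checked mechanically once each site reports its local serialization order. The sketch below is a slight generalization under stated assumptions: a site may see only some of the transactions, so instead of requiring identical lists it asks whether one global order exists that agrees with every local order. It uses `graphlib` from the Python standard library (Python 3.9 or later); the function name and input format are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def global_serial_order(local_orders):
    """Return one global order consistent with every site's order, or None.

    local_orders maps a site name to the list of transaction ids in that
    site's local serialization order.
    """
    predecessors = {}
    for order in local_orders.values():
        for txn in order:
            predecessors.setdefault(txn, set())
        # Each adjacent pair contributes a "must come before" edge.
        for earlier, later in zip(order, order[1:]):
            predecessors[later].add(earlier)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        # Two or more sites disagree, so no single serial order exists.
        return None
```

If site A serializes T1 before T2 while site B serializes T2 before T1, the combined graph has a cycle and the function returns `None`, even though each site is locally serializable.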
Concurrency (cont’d)
  • If locking-based algorithms are used, lock management may be centralized or distributed
  • Deadlocks must be avoided
  • Deadlock detection and management in a distributed database can be difficult
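One reason distributed deadlock detection is hard is that no single site may see the whole cycle. The sketch below assumes a centralized detector that collects each site's local wait-for edges and searches the merged graph; the function name and the edge format are illustrative assumptions.

```python
def find_deadlock(local_wait_for_graphs):
    """Merge per-site wait-for edges and return one cycle of transactions, or None.

    local_wait_for_graphs maps a site name to a list of (waiter, holder) pairs,
    meaning the waiter is blocked on a lock that the holder owns at that site.
    """
    graph = {}
    for edges in local_wait_for_graphs.values():
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
            graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {txn: WHITE for txn in graph}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph[txn]):
            if color[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        color[txn] = BLACK
        path.pop()
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

In the case where T1 waits for T2 at one site and T2 waits for T1 at another, each local graph is acyclic, and the deadlock only appears after the graphs are merged.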
Reliability protocols
  • Several types of failures: System, media, transaction, communication
  • It may be difficult to tell which type of failure has occurred
  • Distributed reliability protocols enforce transaction atomicity (either all sites commit or none do)
Reliability (cont’d)
  • Example of an atomic commitment protocol: two-phase commit (2PC)
  • All sites involved in the execution of a distributed transaction must agree to commit the transaction before it is made permanent.
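The two phases can be sketched as a small in-memory simulation. This is a deliberately simplified illustration: the `Participant` class and its methods are hypothetical, and the logging, timeouts, and failure recovery that a real two-phase commit implementation needs are left out.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (and enter the prepared state) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases as the coordinator and return the global decision."""
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (decision): tell every participant the outcome.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote is enough to abort the transaction everywhere, which is how the protocol keeps the outcome atomic across sites.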
Replication protocols
  • Each logical data item has a number of physical instances
  • The challenge is to maintain (or approximate) consistency among the physical copies as users update the logical data
  • Example criterion: One-copy equivalence – All physical copies of logical data should be equivalent after being updated by a transaction
  • Read-One/Write All (ROWA) protocol – enforces one-copy equivalence. Disadvantage: failure of one site may block entire transaction
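A minimal sketch of ROWA, assuming in-memory replicas with an `up` flag standing in for site availability. The class and function names are hypothetical; the sketch only shows why a single unavailable copy is enough to block a write while reads can still proceed.

```python
class SiteDown(Exception):
    """Raised when a required replica cannot be reached."""


class Replica:
    """One physical copy of a logical data item (illustrative only)."""

    def __init__(self, up=True):
        self.up = up
        self.value = None

    def read(self):
        if not self.up:
            raise SiteDown("replica unavailable")
        return self.value

    def write(self, value):
        if not self.up:
            raise SiteDown("replica unavailable")
        self.value = value


def rowa_write(replicas, value):
    """Write all: refuse the write unless every copy can be updated."""
    if not all(r.up for r in replicas):
        raise SiteDown("a replica is unavailable; ROWA cannot complete the write")
    for r in replicas:
        r.write(value)


def rowa_read(replicas):
    """Read one: any available copy is current, because writes reach all copies."""
    for r in replicas:
        if r.up:
            return r.read()
    raise SiteDown("no replica is available")
```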
Replication (cont’d)
  • Alternative algorithms relax ROWA by mapping each write to a subset of the physical copies
  • Quorum-based voting: Copies are assigned votes; read and (especially) write operations have to collect votes and reach a quorum to commit data. (see class notes)
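The slides defer the details of quorum voting to the class notes, so the sketch below uses the commonly stated weighted-voting conditions as an assumption: the read and write quorums together must exceed the total number of votes, and the write quorum must exceed half of it. Function names and the vote assignment are illustrative.

```python
def valid_quorums(votes, read_quorum, write_quorum):
    """Check the usual overlap conditions for a vote assignment.

    read_quorum + write_quorum > total  ensures every read sees the latest write.
    2 * write_quorum > total            ensures two writes cannot proceed in isolation.
    """
    total = sum(votes.values())
    return read_quorum + write_quorum > total and 2 * write_quorum > total


def has_quorum(votes, reachable_sites, quorum):
    """True if the reachable copies together hold at least `quorum` votes."""
    return sum(votes[site] for site in reachable_sites) >= quorum
```

With one vote per copy on three sites, quorums of 2 and 2 are valid and tolerate the loss of any single site. ROWA corresponds to a read quorum of 1 and a write quorum equal to the total number of votes.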
Research and Trends
  • Workflow models (advanced transaction models)
  • Network scaling problems
  • Multi-database systems and interoperability
  • Distributed object management
Trends (cont’d)
  • Primitive objects are no longer simply structured data; they can consist of programs, voice, images, etc.
  • Distributed DBMSs must handle increasingly large data objects. E.g. about 1 MB of storage is needed for one digital X-ray image (1024 × 1024 pixels at 8 bits/pixel)
  • Most commercial DBMS (e.g. MS SQL Server 2000, Oracle 8i) provide some sort of distribution
  • Emergence of broadband networks eliminates the network as a bottleneck
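The X-ray storage figure above can be checked with a few lines of arithmetic, assuming an uncompressed image with no header or metadata. The function name is hypothetical.

```python
def image_bytes(width, height, bits_per_pixel):
    """Uncompressed size of a raster image in bytes (no header or metadata)."""
    return width * height * bits_per_pixel // 8

xray = image_bytes(1024, 1024, 8)  # 1,048,576 bytes
xray_mib = xray / 2**20            # exactly 1.0 MiB
```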
Trends (cont’d)
  • Mobile computing is escalating in interest and prevalence
  • Mobile stations may download data as needed
  • Alternatively, more powerful mobile stations may store native data for sharing with others
  • Mobility raises issues of address migration, maintenance of directories, and determining the location of stations
  • Distributed object computing platforms, e.g. CORBA (platform-independent) and COM/OLE (Microsoft-specific)
CORBA
  • CORBA stands for Common Object Request Broker Architecture
  • Facilitates the maintenance and DB access of data from a number of autonomous and heterogeneous sources (e.g. file systems, spreadsheets) via a multidatabase approach
  • Provides a generic platform for distributed computing
CORBA (cont’d)
  • In multidatabase systems, the main problem is the heterogeneity that exists at four levels: platform, communication, database system, and semantic.
  • CORBA facilitates implementation transparency by providing client access via interfaces defined in a special Interface Definition Language (IDL), independent of the database's actual software and hardware environment.
  • Provides location transparency, allowing clients to access DB objects independent of location and communication protocols
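The idea of client access through a common interface can be shown by analogy in plain Python. This is not CORBA code and involves no ORB or IDL compiler: the abstract class stands in for an IDL-defined interface, and the two source classes are hypothetical wrappers around a flat file and a relational table.

```python
from abc import ABC, abstractmethod


class CustomerSource(ABC):
    """Hypothetical common interface, playing the role of an IDL interface."""

    @abstractmethod
    def find_name(self, customer_id):
        """Return the customer's name, or None if this source does not know it."""


class FlatFileSource(CustomerSource):
    def __init__(self, lines):
        self._lines = lines  # "id,name" strings, as in a simple text file

    def find_name(self, customer_id):
        for line in self._lines:
            cid, name = line.split(",", 1)
            if int(cid) == customer_id:
                return name
        return None


class RelationalSource(CustomerSource):
    def __init__(self, rows):
        self._rows = rows  # list of dicts, standing in for table rows

    def find_name(self, customer_id):
        for row in self._rows:
            if row["id"] == customer_id:
                return row["name"]
        return None


def lookup(sources, customer_id):
    """Client code sees only the interface, not each source's data model."""
    for source in sources:
        name = source.find_name(customer_id)
        if name is not None:
            return name
    return None
```

The client function never inspects how a source stores its data, which is the kind of implementation transparency the interface is meant to provide.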
CORBA (cont’d)
  • Provides a common interface to mask heterogeneity among native database system implementations based on different data models (e.g. flat-file, relational, spreadsheet) and query languages
  • Common interface overcomes semantic conflicts such as schema and data conflicts
References
  • M.T. Ozsu and P. Valduriez, "Distributed and Parallel Database Systems – Technology and Current State-of-the-Art", ACM Computing Surveys, 28(1): 125-128, March 1996.
  • A. Dogac, C. Dengi and M.T. Ozsu, "Distributed Object Computing Platforms", Communications of ACM, 41(9): 95-103, September 1998.
  • J. N. Gray, “Notes on Data Base Operating Systems.” Operating Systems: An Advanced Course. R. Bayer, R.M. Graham (eds.) New York: Springer-Verlag, 1979, pp. 393-481.
References (cont’d)
  • M.T. Ozsu, "The Push/Pull Effect - Can Distributed Database Technology Meet The Challenges of New Applications?", Database Programming & Design, April 1997.