Chapter 1.3: Data Models and DBMS Architecture
  • Title: Anatomy of a Database System
  • Authors: J. Hellerstein, M. Stonebraker
  • Pages: 43-95
Anatomy of a Database System
  • Problem
    • Problem Statement
    • Why is this problem important?
    • Why is this problem hard?
  • Approaches
    • Approach description, key concepts
    • Contributions (novelty, improvements)
    • Assumptions
Problem Statement – DBMS Architecture
  • Given
    • A data model
    • Platform, i.e., operating system and computer hardware architecture
  • Find - a DBMS architecture
    • A set of building-block components
    • Interactions among building blocks
  • Objectives
    • Efficiency, Scalability
    • Extensibility
  • Constraints
    • Relational Data Model
Why is this problem important?
  • Why review Relational DBMS architectural innovations?
    • Backbone of infrastructure applications
      • Banking, airline reservation, medical records, CRM, SCM, …
    • Well-understood point of reference for
      • New extensions and future revolutions
  • Architecture allows
    • Analysis of properties
      • Availability, fault-tolerance, reliability
    • Mapping of multiple views
      • User requirements to components - validation and acceptance tests
      • Software developers, maintainers, …
      • Software operational support group
Why is this problem hard?
  • Complexity
    • Mid-1970s – Efficient implementation of a Relational DBMS
    • Declarative Query Language
    • Logical and physical independence
  • Changes
    • Platforms evolve
      • Computer Hardware, Languages, Operating Systems
      • Storage: Tapes  Disks (1960s)  RAID (1990s)  SAN …
      • CPUs: Mainframe  Mini  Desktops  Multi-core CPUs (2000s)
    • Integrate many views
      • Enterprise – performance level, transaction reliability, …
      • Data Processing Needs – data warehouses, reports, OLTP, Web,…
Contributions, Validation Methodology
  • Contributions
    • A simple yet relatively comprehensive RDBMS architecture
    • Decomposition into 4 components
    • Identification of dependencies
  • Validation
    • Ability to explain academic and commercial RDBMSs
    • Expert opinion, authors have architected multiple DBMSs
Proposed Approach
  • Four Components (Figure 1, p. 44)
    • A Process Manager
    • Query Processing Engine
    • Transactional Storage Subsystem
    • Shared Utilities, e.g. Disk space management
  • Interactions among components
    • Not explicit in Figure 1
    • Implicit:
      • Top-left to lower-right flow
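A minimal Python sketch of the implied flow, assuming a single request is handed from component to component by direct calls. All class and method names (ProcessManager, QueryProcessor, TransactionalStorage, SharedUtilities, handle, execute, scan, gather_stats) are hypothetical; parsing, optimization, locking and logging are omitted.

```python
class TransactionalStorage:
    """Component 3 (hypothetical): owns the data; here just in-memory tables."""

    def __init__(self, tables):
        self.tables = tables  # table name -> list of row dicts

    def scan(self, table):
        # A real storage manager would take locks, log changes and use a buffer pool.
        return iter(self.tables[table])


class SharedUtilities:
    """Component 4 (hypothetical): services used by the other components."""

    def __init__(self, storage):
        self.storage = storage

    def gather_stats(self):
        # Batch utility: row counts that an optimizer could use for cost estimates.
        return {name: len(rows) for name, rows in self.storage.tables.items()}


class QueryProcessor:
    """Component 2 (hypothetical): turns a request into a plan and runs it."""

    def __init__(self, storage):
        self.storage = storage

    def execute(self, table, predicate):
        # The "plan" is a single filtered scan; no parsing or optimization.
        return [row for row in self.storage.scan(table) if predicate(row)]


class ProcessManager:
    """Component 1 (hypothetical): accepts a client request and dispatches it."""

    def __init__(self, query_processor):
        self.query_processor = query_processor

    def handle(self, request):
        table, predicate = request
        return self.query_processor.execute(table, predicate)


storage = TransactionalStorage({"emp": [{"name": "Ann", "salary": 100},
                                        {"name": "Bob", "salary": 80}]})
utilities = SharedUtilities(storage)
manager = ProcessManager(QueryProcessor(storage))
```

For example, `manager.handle(("emp", lambda r: r["salary"] > 90))` returns only Ann's row, and `utilities.gather_stats()` returns `{"emp": 2}`.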
Component 1 – Process Manager
  • Responsibilities - Organization of processes
  • Platform: Uni-processor, High-performance OS threads
  • Two Options
    • Process per user (connection)
      • Issues - scalability
    • Server Process (+ I/O Process per disk)
      • Dispatcher thread, log manager thread
      • Pool of worker threads
      • Shared data (e.g. log, I/O buffer) in common heap space
      • Issues – asynchronous I/O, protection across threads, …
  • Client – Server communication
    • network socket
  • Q? What is new in this paper relative to the Parallel Database paper by DeWitt et al.?
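A rough sketch of the server-process option, using Python threads in place of DBMS threads. A dispatcher places incoming requests on a shared queue, a fixed pool of worker threads executes them, and results go into a shared dictionary guarded by a lock, standing in for shared data in the common heap. The function names and the fake "execution" step are assumptions for illustration; there is no networking, asynchronous I/O, or log-manager thread.

```python
import queue
import threading

_STOP = None  # sentinel that tells a worker to exit


def worker(work_queue, results, results_lock):
    """Worker thread: repeatedly take a request from the shared queue and run it."""
    while True:
        item = work_queue.get()
        if item is _STOP:
            break
        conn_id, sql = item
        answer = f"result of {sql!r}"  # stand-in for real query execution
        with results_lock:  # shared heap data must be protected across threads
            results[conn_id] = answer


def run_server(requests, num_workers=4):
    """Dispatcher: hand each (connection id, SQL) request to the worker pool."""
    work_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()
    pool = [threading.Thread(target=worker, args=(work_queue, results, results_lock))
            for _ in range(num_workers)]
    for t in pool:
        t.start()
    for request in requests:
        work_queue.put(request)
    for _ in pool:  # one sentinel per worker, queued after all real work
        work_queue.put(_STOP)
    for t in pool:
        t.join()
    return results
```

Unlike process-per-user, the number of OS threads stays at `num_workers` regardless of how many connections submit requests, which is the scalability argument above.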
Component 1 – Issues
  • Mapping DBMS threads to OS Processes
    • Absence of OS threads – p. 50
    • Commercial examples – last paragraph, Sec. 2.2.1, p. 51
  • Parallelism (Figures 5-7, pp. 52-54)
    • Shared memory – previous architectures port easily
    • Shared nothing
      • Query processing parallelizes w/ horizontal data partitioning
      • Two-phase commit needs inter-node communication
      • Partial failure
    • Shared disk
      • Distributed lock manager, cache coherency protocol, …
  • Admission Control
    • Avoid thrashing (working set > memory buffers)
    • Control number of connections, number of queries
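One simple form of admission control, sketched under the assumption that the DBMS limits only the number of concurrently executing queries: a counting semaphore holds the remaining slots. Real systems may also estimate each query's memory needs before admitting it; that estimation is omitted here, and all names are hypothetical.

```python
import threading
import time


class AdmissionController:
    """Caps the number of concurrently running queries (a multiprogramming limit)."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0  # queries currently executing
        self.peak = 0    # highest concurrency observed

    def run(self, query_fn):
        with self._slots:  # blocks while max_active queries are already running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.active -= 1


def demo(num_queries=10, max_active=3):
    """Submit num_queries short fake queries concurrently; return the observed peak."""
    controller = AdmissionController(max_active)
    threads = [threading.Thread(target=controller.run, args=(lambda: time.sleep(0.01),))
               for _ in range(num_queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return controller.peak
```

However many queries arrive, `demo(10, 3)` never observes more than three running at once; the rest wait instead of competing for memory buffers.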
Component 2 – Query Processor
  • Responsibility:
    • SQL query  execution plan (Fig. 8, pp. 64)
  • Subcomponents
    • Parsing and Authorization
    • Catalogs
    • Query rewrite – views, constant expressions, semantic optimization, sub-query flattening
    • Optimizer – plan space, selectivity estimation, search, parallelism, extensibility, auto-tuning, …
    • Executor – iterator model (Figure 9, p. 68)
  • Q? What is new in the optimizer since Selinger?
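The executor's iterator model can be sketched as operators that all share one small interface. This sketch borrows the init()/get_next() naming used for the access-method API and adds a close() call; the operator classes, the None end-of-stream marker and the in-memory rows are assumptions. Each parent pulls one row at a time from its child, so a plan is simply a tree of these objects.

```python
class Scan:
    """Leaf operator: returns rows of an in-memory table one at a time."""

    def __init__(self, rows):
        self.rows = rows

    def init(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Select:
    """Filter operator: pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Project:
    """Projection operator: keeps only the named columns."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def init(self):
        self.child.init()

    def get_next(self):
        row = self.child.get_next()
        return None if row is None else {c: row[c] for c in self.columns}

    def close(self):
        self.child.close()


def run(plan):
    """Drive the plan from the root; every operator exposes the same three calls."""
    plan.init()
    out = []
    while (row := plan.get_next()) is not None:
        out.append(row)
    plan.close()
    return out
```

For instance, `run(Project(Select(Scan(rows), lambda r: r["dept"] == "toy"), ["name"]))` returns the names of the rows in the toy department, produced one row at a time without materializing intermediate results.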
Component 2 – Query Processor Issues
  • Data Modification Statements
    • Plans are more complex
    • Ex. Halloween problem (Fig. 10, p. 71)
  • Access Methods
    • Unordered files, B+-tree, R-tree and bit-map indexes
    • API methods – init(), get_next(), …
    • Search by logical conditions (sarg) or record-id
    • Interacts with concurrency and recovery sub-components
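The Halloween problem can be reproduced in a few lines, assuming an update such as "give every employee earning under 25,000 a 10% raise" is executed by scanning a salary index that is updated in place. The index is modelled as a sorted Python list of (salary, emp_id) pairs and the cursor as "the first entry after the last key visited"; both are simplifications. Raised entries move ahead of the cursor and are visited again. A simple remedy (one of several) is to materialize the qualifying record ids before updating, so each raise is applied once.

```python
import bisect


def naive_raise(salaries, limit):
    """Update through the salary index while scanning it (exhibits the Halloween problem)."""
    index = sorted((s, emp) for emp, s in salaries.items())  # (salary, emp_id), ascending
    updates = 0
    last = None  # last index key the cursor visited
    while True:
        # Cursor: first entry strictly after the last key visited.
        i = 0 if last is None else bisect.bisect_right(index, last)
        if i >= len(index) or index[i][0] >= limit:
            break
        salary, emp = index[i]
        new_salary = salary * 11 // 10  # 10% raise, integer arithmetic
        salaries[emp] = new_salary
        del index[i]                             # remove the old index entry...
        bisect.insort(index, (new_salary, emp))  # ...and re-insert it further along the scan
        last = (salary, emp)
        updates += 1
    return updates


def safe_raise(salaries, limit):
    """Materialize the qualifying record ids first, then update each one exactly once."""
    targets = [emp for emp, s in sorted(salaries.items(), key=lambda kv: kv[1]) if s < limit]
    for emp in targets:
        salaries[emp] = salaries[emp] * 11 // 10
    return len(targets)
```

With salaries `{"a": 20000, "b": 24000, "c": 30000}` and a limit of 25000, `naive_raise` performs 4 updates (employee "a" is raised three times, ending at 26620), while `safe_raise` performs 2 (a: 22000, b: 26400).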
Component 3 – Transactional Storage Manager
  • Responsibilities – ACID properties
  • Subcomponents
    • Lock Manager
      • Serializability, 2PL, Isolation levels (p. 76)
    • Log Manager
      • WAL – 3 rules (p. 78), performance tuning
    • Buffer pool
    • Access methods
      • Latches in B+trees (p. 80) – conservative, latch-coupling, right-link
      • Predicate locks – next-key locking
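The WAL rules (p. 78) can be illustrated with a toy log manager and buffer pool. As commonly stated, the rules are: (1) a change's log record must reach the log device before the changed page does, (2) log records are flushed in order, and (3) a commit record must be flushed before the commit returns. The class names, the list-based log and the dict standing in for the disk are assumptions; there is no real I/O and no recovery logic.

```python
class LogManager:
    """Toy log: LSN = position in the list; the durable prefix ends at flushed_lsn."""

    def __init__(self):
        self.records = []
        self.flushed_lsn = -1  # highest LSN assumed to be on stable storage

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the new record's LSN

    def flush(self, up_to_lsn):
        # Rule 2: a flush always makes a prefix durable, so records reach disk in order.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


class BufferPool:
    """Toy buffer pool whose page writes obey the write-ahead rule."""

    def __init__(self, log):
        self.log = log
        self.pages = {}  # page_id -> (value, page_lsn) held in memory
        self.disk = {}   # page_id -> value "on disk"

    def update(self, txn, page_id, new_value):
        old_value = self.pages.get(page_id, (None, -1))[0]
        # Rule 1 (first half): every change produces a log record.
        lsn = self.log.append(("update", txn, page_id, old_value, new_value))
        self.pages[page_id] = (new_value, lsn)

    def write_page(self, page_id):
        value, page_lsn = self.pages[page_id]
        # Rule 1 (second half): force the log up to this page's LSN before writing the page.
        self.log.flush(page_lsn)
        self.disk[page_id] = value

    def commit(self, txn):
        lsn = self.log.append(("commit", txn))
        # Rule 3: the commit record is durable before commit returns.
        self.log.flush(lsn)
        return lsn
```

Updating a page leaves the log unflushed; only writing the page back or committing forces the log, which is where WAL's performance tuning (e.g., batching commit flushes) comes in.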
Component 3 – Transactional Storage Manager
  • Interdependencies among subcomponents
    • Lock Manager, Log Manager
      • WAL assumes strict 2PL (p. 82)
      • Q? What would happen without strict 2PL ?
    • Concurrency control, Access Methods
      • Concurrency-control methods are specific to each index type
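The lock manager side of this dependency rests on strict 2PL: locks are released only when the transaction ends. A minimal shared/exclusive lock table under that discipline might look as follows. It is a sketch under stated assumptions: a conflicting request simply returns False instead of waiting, and there is no deadlock detection, no wait queue, and no intention or predicate locks. All names are hypothetical.

```python
from collections import defaultdict

# Compatibility matrix for shared (S) and exclusive (X) locks: (held, requested) -> ok?
COMPATIBLE = {("S", "S"): True, ("S", "X"): False, ("X", "S"): False, ("X", "X"): False}


class LockManager:
    def __init__(self):
        self.table = {}               # resource -> {txn: mode}
        self.held = defaultdict(set)  # txn -> resources it has locked

    def acquire(self, txn, resource, mode):
        """Grant the lock and return True, or return False if the caller would have to wait."""
        holders = self.table.setdefault(resource, {})
        current = holders.get(txn)
        if current == "X" or current == mode:
            return True  # already held in a sufficient mode
        others = [m for t, m in holders.items() if t != txn]
        if not all(COMPATIBLE[(m, mode)] for m in others):
            return False  # conflict; a real lock manager would enqueue the request
        holders[txn] = mode  # also covers an S -> X upgrade when no one else holds the lock
        self.held[txn].add(resource)
        return True

    def end_transaction(self, txn):
        """Strict 2PL: all of a transaction's locks are released together at commit or abort."""
        for resource in self.held.pop(txn, set()):
            del self.table[resource][txn]
            if not self.table[resource]:
                del self.table[resource]
```

Because nothing is released before `end_transaction`, no other transaction can read or overwrite a page changed by an uncommitted transaction, which keeps undo via the log straightforward.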
Component 4 – Shared Utilities
  • Sub-components
    • Memory allocator (p. 84)
    • Disk management subsystem
      • Map tables to devices or files
      • New issues with RAIDs (pp. 86-87)
    • Replication services
      • Physical, trigger based, log-based
    • Batch utilities
      • Optimizer statistics gathering, backup/export, physical reorg and index construction
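Of the replication options listed, log-based replication is the easiest to sketch: the replica tails the primary's log and replays every record after the last one it applied. The sketch assumes each log record is a simple (key, new_value) redo entry whose list position is its LSN; the function name is hypothetical, and real systems ship log records over the network and must also handle transaction boundaries and failures.

```python
def apply_log(primary_log, replica, applied_lsn):
    """Replay primary log records with LSN > applied_lsn onto the replica.

    primary_log: list of (key, new_value) redo records; a record's LSN is its index.
    replica:     dict holding the replica's copy of the data (updated in place).
    applied_lsn: LSN of the last record already applied (-1 if none).
    Returns the new applied LSN, which the replica keeps for the next round.
    """
    for lsn in range(applied_lsn + 1, len(primary_log)):
        key, new_value = primary_log[lsn]
        replica[key] = new_value
        applied_lsn = lsn
    return applied_lsn
```

Calling `apply_log` repeatedly with the returned LSN ships only the new suffix of the log each time, so the replica catches up incrementally rather than re-copying the data.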
Summary
  • Paper’s focus
    • DBMS Architectures – components and dependencies
  • Insights - Four Components (Figure 1, p. 44)
    • A Process Manager
    • Query Processing Engine
    • Transactional Storage Subsystem
    • Shared Utilities, e.g. Disk space management
  • Interactions among components
    • Not explicit in Figure 1
    • Q? List a few interactions discussed in the paper.
Assumptions, Rewrite today
  • Assumptions
    • Focus on Relational DBMS
    • Centralized DBMS (Recall T2.6 on R*)
    • Four component architecture reminds one of Ingres!
    • Lessons translate over to new domains
  • Rewrite today
    • Cover a post-relational DBMS, e.g., stream or XML
    • Illustrate how the lessons translate to web services, e-mail repositories, network monitors, etc.