Advanced Topics in Databases

Advanced Topics in Databases Fahime Raja Niloofar Razavi Melody Siadaty Spring 2005 Technical Report I

Outline: • Main Memory databases • Transaction Processing Monitors • Transactional Workflows • Real Time Databases

Main Memory Databases: • Memory resident database systems (MMDB’s) store their data in main physical memory and provide very high-speed access. • Traditional database systems rely on • the disk subsystem to retrieve and update data • use an offline storage device such as magnetic tape for backup. • MMDB will use • physical memory as primary storage • a disk subsystem for backup.

Main Memory Databases: • Advantages: • Achieving significant performance improvements over conventional database systems • Improve of processing time • Improve of throughput rates • due to elimination of I/O overhead.

Main Memory Databases: • Vs. conventional DBMS: • Volatility of main memory • using a small stable main memory to support recovery processing • High overhead for the initial load • frequently create the archives, • distribute them across several secondary storage devices • Implementation issues • Inherent differences between disk and memory storage

Main Memory Databases: • Concurrency Control: • access to main memory is so much faster than disk access, transactions may complete more quickly in a main memory system. • So, locks will not be held as long, • Lock contention may not be as important as it is when the data is disk resident. • The advantage of small locking granules (fields or records) (when data are memory resident), are removed. • Very large lock granules (e.g., relations) are most appropriate for memory resident data.

Main Memory Databases: • Implementation of the locking mechanism : • conventional system: a hash table that contains entries for the objects currently locked . • MMDB: a small number of bits in data to represent data’s lock status. • the first bit is set  the object is locked, • If it is locked and the second bit is set, • there are one or more waiting transaction • else it is free.

Main Memory Databases: • Commit Processing: • necessary to have a backup copy and to keep a log of transaction activity • Before a transaction can commit, its activity records must be written to the log. • Logging can impact response time, since each transaction must wait for at least one stable write before committing. • Logging can also affect throughput if the log becomes a bottleneck • In MMDBs, the logging represents the only disk operation each transaction will require. • Using a small amount of stable main memory hold a portion of the log. • eliminating the response time problem, since transactions need never wait for disk operations. • Pre-committing: is accomplished by releasing a transaction’s locks as soon as its log record is placed in the log, without waiting for the information to be propagated to the disk. • Group commits: can be used to relieve a log bottleneck.

Main Memory Databases: • Access Methods: • in main memory access methods, data values on which the index is built, need not be stored in the index itself, as is done in B-Trees. • random access is fast in main memory  pointers can be followed quickly index structures can store pointers to the indexed data, rather than the data itself. • eliminating the problem of storing variable length fields in an index and , • saving space as long as the pointers are smaller than the data they point to!

Main Memory Databases: • Data Representation: • Relational tuples can be represented as a set of pointers to data values. • use of pointers is space efficient when large values appear multiple times in the database, • (the actual value needs to only be stored once.) • Pointers also simplify the handling of variable length fields since variable length data can be represented using pointers into a heap. • Query Processing • In MMDB: sequential access is not faster than random access . • E.g.: sort merge join processing loses its advantage! • Although the sorted relations could be represented easily in a main memory database using pointer lists, there is really no need for this since much of the motivation for sorting is already lost .

Main Memory Databases: • Recovery: • Backups of memory resident databases must be maintained on disk or other stable storage to insure against loss of the volatile data . • 1.the procedure used during normal database operation to keep the backup up-to-date (Checkpointing), • 2.the procedure used to recover from a failure. (Failure Recovery) • loading blocks of the database “on demand” until all of the data has been loaded. • using disk striping or disk array • there must be independent paths from the disks to memory

Main Memory Databases: • Papers: • Main Memory Database Systems: An overview Hector Garcia-Molina, Kenneth Salem, IEEE 1992 • This paper surveys the major memory residence optimizations and briefly discusses some of the memory resident systems that have been designed or implemented. • Transaction Management for a Main-Memory Database Piyush Burte, Boanerges Aleman-Meza, D. Brent Weatherly, Rong Wu • In this paper, the details of thread concurrency and resource locking protocols are examined, the deadlock prevention scheme, and the Java-based implementation of these design decisions. Also the effectiveness of the design with performance tests that simulate typical transactions on a highly concurrent database system are shown.

Main Memory Databases: • Main Memory Database Recovery Margaret H. Eich, IEEE 1986 • This paper examines MMDB recovery, identifies differences from traditional DBMS recovery, composes a “wish list” of MMDB recovery requirements, describes why previously proposed techniques do not satisfy these requirements, and proposes a new MMDB recovery technique which does. • * -Understanding, Modeling and Improvement Main-Memory Database Performance Stefan Manegold, 2002, Amsterdam university

Main Memory Databases: • Some possible future works: • Efficient loading for MMDBs including the idea of partitioning. • examining methods for distributing log and archive database information across multiple secondary storage devices. • Simulation and testing the results

Transaction Processing Monitor • A Transaction Processing (TP) application is a program that performs an administrative function by accessing a shared data based on behalf of an online user. • A TP system is an integrated set of products that supports TP applications ,including : • hardware such as processors, memories, disks, and communication controllers • software, such as operating systems , DBMSs, computer networks • TP monitors • The main function of a TP monitor is to • coordinate the flow of transaction requests between terminals or other devices and application programs that can process these requests • it imposes a certain structure on the software components of a TP system and offers functions to support the activities of each component

Transaction Processing Monitor • Most of TP applications are structured to perform the following steps for each terminal : • Interact with the terminal user to collect the transaction’s input, usually through forms and menus. • Translate the transaction input into a standard-format request message. • Start the transaction. • Examine the request’s header to determine its type. • Execute the request type’s application program, which may in turn invoke DBMSs and other application programs. • Commit the transaction after the application has finished. • Send the transaction’s output to the terminal.

Transaction Processing Monitor • A TP monitor divides an application into 3 components that perform the above steps: • Message Manager • performs steps 1, 2, and 7 • Request Control • performs steps 3, 4, and 6 • Application Server • performs step 5, in collaboration with DBMSs

Transaction Processing Monitor • Papers: • Transaction Processing Monitors Philip.A.Bernstein November 1990,Vol 33,No.11,Communication of ACM • In this article, it was shown that TP monitors have evolved to solve distributed computing problems that are not solved by the underlying OS, DBMS, and network. In particular, they support multithreaded processes, message routing, queuing, and system management and recovery. Sometimes, they support the transaction abstraction • Chapter 2-(Transaction Processing systems) of Distributed Database Systems Michael Gertz, spring 2003

Transactional Workflows: • Workflow: is an activity in which multiple tasks are executed in a coordinated way by different processing entities. • Task: some work to be done; • Textual description in a file,a form,a msg,a computer program • Processing entity: that performs the tasks • Person / sw system • A workflow process : • directed graph P=< N, A > • a set of nodes N= {n1,n2 .... } • and a set of arcs

Transactional Workflows: • architecture of a WFMS • Increasingly, workflow management systems (WFMSs) are being used as the primary technology for organizations to perform their daily business processes (workflows)

Transactional Workflows: • Execution: • Scheduler: program  submitting tasks, monitoring events…. • Task agents: controlling exec. of a task • Query mechanisms : for the state of the workflow • 3 architectures: • Centralized: single scheduler • Partially distributed: one instance of the scheduler for each workflow • Fully distributed: no scheduler, communicating agents

Transactional Workflows: • Papers: • Logic Bases Modeling and Analysis of Workflows Hasan Davulcu , Michael Kifer, CR. Ramakrishnan, I.V. Ramakrishnan,ACM 1998 • In this paper Concurrent Transaction Logic (CTL) as the language for specifying, analyzing, and scheduling of workflows is proposed. Also it has been shown that both local and global properties of workflows can be naturally represented as CTL formulas and reasoning can be done with the use of the proof theory and the semantics of this logic. • Virtual Transaction Model to Support Workflow Vasudev Krishnamoorthy,Ming-Chien Shan,ACM 2000 • This paper presents a model called the virtual transaction ,to provide transactional support to the workflow applications. and also discusses on how ACID properties will be achieved using this model.

Transactional Workflows: • Correctness Issues in Workflow Management Mohan Kamathy and Krithi Ramamritham Department of Computer Science, University of Massachusetts, Amherst MA, 1996 • This paper discusses about issues happened due to failures in workflows and then describes techniques for ensuring correctness of workflows.

Real-Time Database systems • Real-time database systems inherit many properties from both database systems and real-time systems. • ACID properties • Transactions process in a RTDBS are associated with timing constraints deadlines: • Hard: serious problem if the deadline is contradicted • Firm: no worth if completed after the deadline • Soft: diminishing value if completed after the deadline

Real-Time Database systems • In traditional databases only the data consistency should be preserved, • But in RTDBS : both timing and data consistency constraints •  need of time-critical scheduling methods for • Concurrency control • Resource scheduling • Commit processing • Buffer management

Real-Time Database systems • Goal: • In traditional Dbs: • minimizing the transaction response time • And maximizing the throughput • In RTDBS : • Maximizing the # of transactions that satisfy their deadlines. • assigning a priority to each transaction based on its deadline • Earliest Deadline First • Least Slack First • Slack Time: max length of the time ,the transaction can be delayed and still satisfy its deadline.

Real-Time Database systems • Papers: • Advances in Real-Time Database Systems Research Azer Bestavros, Sigmod 1996 • Research Issues in Real-Time Database Systems Ozgur Ulusoy,Bilkent University • In this paper a basic understanding of the issues in real-time databases is provided and research efforts in this area are introduced. • Maintaining Security in Firm Real-time Systems Quazi N. Ahmed and Susan V. Vrbsky, Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A. • In this papera new concurrency control algorithm for secure firm real-time databases is produced. Also, the results show that the algorithm performs fairly well in terms of security and timeliness compared to a non-secure algorithm. • *- Value-Based Scheduling in Real-Time Database Systems Jayant R. Haritsa, Michael J. Carey, and Miron Livny ,VLDB Journal 1992

Advanced Topics in Databases