High Availability and Fault-Tolerance in Real-Time Databases

Jan Lindström

University of Helsinki

Department of Computer Science


Overview

  • The causes of downtime

  • Availability solutions

  • CASE 1: Clustra

  • CASE 2: TelORB

  • CASE 3: RODAIN


The Causes of Downtime

  • Planned downtime

    • Hardware expansion

    • Database software upgrades

    • Operating system upgrades

  • Unplanned downtime

    • Hardware failure

    • OS failure

    • Database software bugs

    • Power failure

    • Disaster

    • Human error


Traditional Availability Solutions

  • Replication

  • Failover

  • Primary restart


CASE 1: Clustra

  • Developed for telephony applications such as mobility management and intelligent networks.

  • Relational database with location and replication transparency.

  • Real-time data is locked in main memory, and the API provides precompiled transactions.

  • NOT a real-time database!




How Clustra Handles Failures

  • Real-Time failover: Hot-standby data is up to date, so failover occurs in milliseconds.

  • Automatic restart and takeback: Restart of the failed node and takeback of operations is automatic, and again transparent to users and operators.

  • Self-repair: If a node fails completely, its data is copied from the complementary node to a standby node. This is also automatic and transparent.

  • Limited failure effects
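The failover, restart, takeback, and self-repair behaviour listed above can be illustrated with a small in-memory model. This is a minimal sketch, not Clustra's implementation: the `Fragment` class, its methods, and the node names are hypothetical, and the sketch assumes that every write is applied synchronously to both the primary and the hot-standby replica.

```python
# Hypothetical model of hot-standby failover, takeback and self-repair.
# Fragment and its methods are illustrative names, not a Clustra API.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Fragment:
    """One data fragment held by a primary node and a hot-standby node."""
    primary: str
    standby: Optional[str]
    home: str = ""  # node that normally holds the primary role
    replicas: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.home = self.home or self.primary
        for node in (self.primary, self.standby):
            if node is not None:
                self.replicas.setdefault(node, {})

    def write(self, key: str, value: str) -> None:
        # Assumption: a write reaches every live replica before it is
        # acknowledged, so the standby is always up to date.
        for replica in self.replicas.values():
            replica[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.replicas[self.primary].get(key)

    def fail(self, node: str) -> None:
        # Failover: the standby already holds current data, so it becomes
        # the primary at once, with no log replay.
        self.replicas.pop(node, None)
        if node == self.primary:
            self.primary, self.standby = self.standby, None
        elif node == self.standby:
            self.standby = None

    def repair(self, node: str) -> None:
        # Restart or self-repair: copy the data from the surviving replica
        # to the restarted (or a spare) node, which becomes the new standby.
        self.replicas[node] = dict(self.replicas[self.primary])
        self.standby = node

    def take_back(self) -> None:
        # Takeback: once the home node is back as standby, swap the roles.
        if self.standby == self.home:
            self.primary, self.standby = self.standby, self.primary
```

For example, `Fragment(primary="node1", standby="node2")` keeps serving reads after `fail("node1")`, and `repair("node1")` followed by `take_back()` returns the primary role to `node1`.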


How Clustra Handles Upgrades

  • Hardware, operating system, and database software are upgraded without the database ever going down.

    • This process is called a “rolling upgrade”:

      • The required changes are performed node by node.

      • Each upgraded node catches up with the state of its complementary node.

      • When this is complete, the upgrade proceeds to the next node.
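The node-by-node procedure can be sketched as a loop. This is a simplified model, not Clustra's upgrade tool: `Node`, `applied_lsn`, and `rolling_upgrade` are hypothetical names, the sketch assumes at least one complementary node keeps serving while a node is down, and "catching up" is modelled as adopting the peers' latest log position.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    version: str
    applied_lsn: int = 0  # hypothetical: last log record applied on this node


def rolling_upgrade(nodes: List[Node], new_version: str,
                    writes_while_down: int = 1) -> List[str]:
    """Upgrade one node at a time so that the service never goes down."""
    if len(nodes) < 2:
        raise ValueError("a complementary node must keep serving")
    events: List[str] = []
    for node in nodes:
        if node.version == new_version:
            continue  # already upgraded
        peers = [n for n in nodes if n is not node]
        events.append(f"{node.name} down")
        # While this node is down, its peers keep committing transactions.
        for peer in peers:
            peer.applied_lsn += writes_while_down
        node.version = new_version
        # Catch up with the complementary node before moving on.
        node.applied_lsn = max(peer.applied_lsn for peer in peers)
        events.append(f"{node.name} up at {new_version}, lsn {node.applied_lsn}")
    return events
```

Only one node is ever offline at a time, which is what lets the cluster as a whole stay available during the upgrade.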


CASE 2: TelORB

Characteristics

  • Very high availability (HA), with robustness implemented in software

  • Soft real time

  • Scalability by using loosely coupled processors

Openness

  • Hardware: Intel/Pentium

  • Language: C++, Java

  • Interoperability: CORBA/IIOP, TCP/IP, Java RMI

  • Third-party software: Java


TelORB Availability

  • Real-time object-oriented DBMS supporting

    • Distributed Transactions

    • ACID properties expected from a DBMS

    • Data Replication (providing redundancy)

    • Network Redundancy

  • Software Configuration Control

    • Automatic restart, on working processors, of processes that were executing on a faulty processor

    • Self-healing

  • In-service upgrade of software with no disturbance to operation

  • Hot replacement of faulty processors
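The automatic restart of processes from a faulty processor can be illustrated as a placement update. This sketch is not TelORB's algorithm: the function name, the process and processor names, and the least-loaded placement rule are assumptions chosen only for illustration.

```python
from typing import Dict, List


def reassign(placement: Dict[str, str], processors: List[str],
             failed: str) -> Dict[str, str]:
    """Return a new process -> processor placement after `failed` goes down."""
    working = [p for p in processors if p != failed]
    if not working:
        raise RuntimeError("no working processor left")
    # Current load of each working processor.
    load = {p: 0 for p in working}
    for cpu in placement.values():
        if cpu != failed:
            load[cpu] += 1
    new_placement: Dict[str, str] = {}
    for proc, cpu in sorted(placement.items()):
        if cpu == failed:
            # Restart the process on the least-loaded working processor
            # (ties broken by name); this rule is an assumption.
            target = min(working, key=lambda p: (load[p], p))
            load[target] += 1
            new_placement[proc] = target
        else:
            new_placement[proc] = cpu  # unaffected processes stay put
    return new_placement
```

Processes on healthy processors are left untouched, so only the work of the faulty processor is disturbed.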


Automatic Reconfiguration

[Figure: diagram not recoverable from the transcript]


Software upgrade

  • Smooth software upgrade when the old and new versions of the same process can coexist

  • The application can arrange for state transfer between the old and new versions of a static process (needed only when the important state is not already stored in the database)
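One way an application might arrange such a state transfer is sketched below. The classes `ProcessV1`, `ProcessV2`, and `Dispatcher` are hypothetical and do not reflect TelORB's actual upgrade interfaces; the sketch only shows the two versions coexisting while in-memory state is handed over before requests are switched to the new version.

```python
from typing import Dict


class ProcessV1:
    """Old version of a hypothetical static process with in-memory state."""

    def __init__(self) -> None:
        self.active_calls: Dict[str, str] = {}

    def handle(self, call_id: str, event: str) -> str:
        self.active_calls[call_id] = event
        return f"v1:{call_id}:{event}"

    def export_state(self) -> Dict[str, str]:
        return dict(self.active_calls)


class ProcessV2(ProcessV1):
    """New version: same state layout, new behaviour."""

    def handle(self, call_id: str, event: str) -> str:
        self.active_calls[call_id] = event
        return f"v2:{call_id}:{event}"

    def import_state(self, state: Dict[str, str]) -> None:
        self.active_calls.update(state)


class Dispatcher:
    """Routes requests to whichever version is currently active."""

    def __init__(self, process: ProcessV1) -> None:
        self.active = process

    def handle(self, call_id: str, event: str) -> str:
        return self.active.handle(call_id, event)

    def upgrade(self, new: ProcessV2) -> None:
        # The new version is started while the old one still serves.
        # In-memory state is transferred before the switch; state already
        # kept in the database would not need this step.
        new.import_state(self.active.export_state())
        self.active = new
```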


Partitioning: Types and Data

[Figure: partitions numbered 17 to 22, labelled A and B; diagram not recoverable from the transcript]


Advantages

  • Standard interfaces through CORBA

  • Standard languages: C++, Java

  • Based on commercial hardware

  • (Soft) Real-time OS

  • Fault tolerance implemented in software

  • Fully scalable architecture

  • Includes powerful middleware: A database management system and functions for software management

  • Fully compatible simulated environment for development on Unix/Linux/NT workstations


CASE 3: RODAIN

  • Real-Time Object-Oriented Database Architecture for Intelligent Networks

  • Real-Time Main-Memory Database System

  • Runs on a real-time OS: Chorus/ClassiX (and on Linux)



RODAIN Database Node

[Figure: a RODAIN database node, consisting of a Database Primary Unit and a Database Mirror Unit connected to a shared disk. Each unit contains a User Request Interpreter Subsystem, an Object-Oriented Database Management Subsystem, a Watchdog Subsystem, a Distributed Database Subsystem, and a Fault-Tolerance and Recovery Subsystem.]

RODAIN Database Node II

[Figure: second view of the RODAIN database node (Database Primary Unit, Database Mirror Unit, shared disk); diagram not recoverable from the transcript]


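ORD Architecture

[Figure: components of the ORD architecture: Index, OCC, Data, TRP, ORD, DDS (Distributed Database Subsystem), and FTRS (Fault-Tolerance and Recovery Subsystem)]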


Fault-Tolerance

  • Based on logs and mirroring

  • Logs are sent to the Mirror

  • Mirror stores the logs on disk in SSS

  • Mirror maintains a copy of the main-memory database

  • Mirror makes disk copies of its database image
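The logging and mirroring scheme can be modelled with two cooperating objects. This is a minimal sketch, not RODAIN code: the classes, the `(lsn, key, value)` log-record format, the in-memory lists standing in for SSS, and the checkpoint interval are all assumptions, and here the primary hands each record to the mirror before the commit completes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, str]  # hypothetical format: (lsn, key, value)


@dataclass
class Mirror:
    mmdb: Dict[str, str] = field(default_factory=dict)       # main-memory copy
    sss_log: List[LogRecord] = field(default_factory=list)   # stands in for the log in SSS
    sss_image: Dict[str, str] = field(default_factory=dict)  # last database image on disk
    image_lsn: int = 0
    checkpoint_every: int = 3  # assumed interval between disk images

    def receive(self, record: LogRecord) -> None:
        lsn, key, value = record
        self.sss_log.append(record)  # store the log on disk
        self.mmdb[key] = value       # keep the main-memory copy current
        if lsn % self.checkpoint_every == 0:
            self.sss_image = dict(self.mmdb)  # make a disk copy of the image
            self.image_lsn = lsn


@dataclass
class Primary:
    mirror: Mirror
    mmdb: Dict[str, str] = field(default_factory=dict)
    next_lsn: int = 1

    def commit(self, key: str, value: str) -> int:
        record = (self.next_lsn, key, value)
        self.mirror.receive(record)  # send the log record to the mirror
        self.mmdb[key] = value
        self.next_lsn += 1
        return record[0]
```

Because the mirror keeps both the log and periodic images on disk, the primary itself never has to write to disk while the mirror is alive.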


Recovery

  • Based on role switching

  • When Primary fails

    • Mirror brings its MMDB up to date

    • Mirror starts acting as new Primary

    • Active transactions are restarted or lost

  • When Mirror fails

    • Primary stores logs directly to SSS
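Role switching can be sketched as follows. The `Node` and `Pair` classes are hypothetical; the sketch assumes that the mirror may hold received log records it has not yet applied (`pending`), which it applies before taking over, and it returns the identifiers of active transactions so that the caller can restart them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LogRecord = Tuple[str, str]  # hypothetical format: (key, value)


@dataclass
class Node:
    name: str
    role: str  # "primary" or "mirror"
    mmdb: Dict[str, str] = field(default_factory=dict)
    pending: List[LogRecord] = field(default_factory=list)  # received, not applied


@dataclass
class Pair:
    primary: Node
    mirror: Optional[Node]
    sss_log: List[LogRecord] = field(default_factory=list)  # stands in for SSS

    def log(self, key: str, value: str) -> None:
        self.primary.mmdb[key] = value
        if self.mirror is not None:
            self.mirror.pending.append((key, value))  # normal case: ship to mirror
        else:
            self.sss_log.append((key, value))  # no mirror: write to SSS directly

    def primary_failed(self, active_txns: List[str]) -> List[str]:
        new = self.mirror
        if new is None:
            raise RuntimeError("no mirror available to take over")
        for key, value in new.pending:  # bring the MMDB up to date
            new.mmdb[key] = value
        new.pending.clear()
        new.role = "primary"  # role switch
        self.primary, self.mirror = new, None
        return list(active_txns)  # to be restarted (or reported as lost)

    def mirror_failed(self) -> None:
        self.mirror = None  # the primary now logs straight to SSS
```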


Recovery II

  • During recovery, the failed node

    • always starts as a mirror node

    • loads the most recent database image from the disks in SSS

    • applies the log tail to the loaded image

    • receives the logs from the primary node

    • continues as a normal mirror node
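The recovery steps above can be sketched as a single function. This is an illustration under stated assumptions, not RODAIN's recovery code: `rejoin_as_mirror` is hypothetical, log records are `(lsn, key, value)` tuples with consecutive LSNs, the database image records the LSN it reflects, and the check for gaps in the log is an addition of this sketch.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, str]  # hypothetical format: (lsn, key, value)


def rejoin_as_mirror(image: Dict[str, str], image_lsn: int,
                     sss_log: List[LogRecord],
                     primary_log: List[LogRecord]) -> Tuple[Dict[str, str], int]:
    """Rebuild the main-memory database of a recovering node.

    Returns the rebuilt MMDB and the last LSN applied, from which the node
    continues as a normal mirror.
    """
    mmdb = dict(image)  # 1. load the most recent database image
    last = image_lsn
    # 2. apply the log tail from SSS, then 3. the logs sent by the primary.
    for source in (sss_log, primary_log):
        for lsn, key, value in sorted(source):
            if lsn <= last:
                continue  # already covered by the image or an earlier record
            if lsn != last + 1:
                raise ValueError(f"log gap: expected {last + 1}, got {lsn}")
            mmdb[key] = value
            last = lsn
    return mmdb, last
```

Records that appear both in SSS and in the primary's stream are applied only once, because anything at or below the last applied LSN is skipped.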


Further reading

  • Bratsberg, Humborstad: Online Scaling in a Highly Available Database. Proceedings of the 27th VLDB Conference, Rome, Italy, pp. 451-460, 2001.

  • Clustra Database: Technical Overview. http://www.clustra.com

  • Björnerstedt, Ketoja, Sintorn, Sköld: Replication between Geographically Separated Clusters - An Asynchronous Scalable Replication Mechanism for Very High Availability. Proceedings of the International Workshop on Databases in Telecommunications II, LNCS vol. 2209, pp. 102-115, 2001.

  • Lindström, Niklander, Porkka, Raatikainen: A Distributed Real-Time Main-Memory Database for Telecommunications. Proceedings of the International Workshop on Databases in Telecommunications, LNCS vol. 1819, pp. 158-173, 2000.