A transactional model for data warehouse maintenance
This presentation is the property of its rightful owner.
Sponsored Links
1 / 22

A Transactional Model for Data Warehouse Maintenance PowerPoint PPT Presentation


  • 55 Views
  • Uploaded on
  • Presentation posted in: General

A Transactional Model for Data Warehouse Maintenance. Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland D atabase S ystems R esearch G roup Worcester Polytechnic Institute. DWMS. Wrapper. Wrapper. Wrapper. Base. Base. Base.

Download Presentation

A Transactional Model for Data Warehouse Maintenance

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


A transactional model for data warehouse maintenance

A Transactional Model for Data Warehouse Maintenance

Authored by:

Jun Chen, Songting Chen, Elke A. Rundensteiner

Published in ER’2002, Finland

Database Systems Research Group

Worcester Polytechnic Institute


Data warehousing

DWMS

Wrapper

Wrapper

Wrapper

. . .

Base

Base

Base

Data Warehousing

  • Data Integration from Remote Base Sources

    • Difficult and Labor-Intensive

    • Better Do it only ONCE and Materialize the Results

    • Share Materialized Data by Many Applications

Data Warehouse


A transactional model for data warehouse maintenance

Data Warehouse Maintenance

  • Motivation: Keep Data Warehouse (DW) Update-to-Date

    • Base Changes over Time

      • Source Data Updates

        • insert, delete, update

      • Source Schema Changes

        • add, drop, rename

    • Basic Idea: Incremental instead of Re-computation

      • Re-computation may take weeks


A transactional model for data warehouse maintenance

General Maintenance Algorithms

  • View Maintenance (VM)

    • Incrementally incorporate source data updates

    • [BLT86], [GMS93], [ZGH+95], [SBC+00]

  • View Synchronization (VS)

    • Rewrite data warehouse view definition after the schema

      of the source changed

    • [NLR98], [LNR02]

  • View Adaptation (VA)

    • Adapt view extent after the view definition changed

    • [NR99], [GMR+01]


A transactional model for data warehouse maintenance

Insert ( ‘Steve’, ‘Boston’)

DW Maintenance Example

CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer C, FlightRes F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;

View: Asia_Traveller

Name

Address

FlightNo

Dave

WPI

AA8384

Select FlightNo from FlightRes

where Name=‘Steve’

Customer

FlightRes

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe


A transactional model for data warehouse maintenance

View: Asia_Traveller

Broken Query!

Name

Address

FlightNo

Dave

WPI

AA8384

3. Select FlightNo from FlightRes

where Name=‘Steve’

Customer

FlightRes

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe

2. Rename (FlightRes, FlightReservation)

1. Insert ( ‘Steve’, ‘Boston’)

Maintenance Anomaly Problem


Inside broken query

Inside Broken Query

  • Two Transactions

    • Base Update Transaction

      • w(Bi)c(Bi)

    • DW Maintenance Transaction

      • r(B1)r(B2)…r(Bn)w(DW)c(DW)

  • Read-write conflicts between two transactions

    • Two Independent Transactions

    • w(Bi) / r(Bi)

      • Data Update w(Bi): Incorrect Query Results [ZGH+95]

      • Schema Change w(Bi): Broken Query


A transactional approach

A Transactional Approach

  • A Global Transaction Model

    • DWMS_Transaction

      • Integrates both base update transaction and its corresponding DW maintenance transaction

      • w(Bi)c(Bi)r(B1)r(B2)…r(Bn)w(DW)c(DW)

  • Maintenance Anomaly

    • Rephrased to read-write conflicts of DWMS_Transactions

      • w(Bi)c(Bi)r(B1)r(B2)…r(Bj)…r(Bn)w(DW)c(DW)

      • w(Bj)c(Bj)r(B1)r(B2)…r(Bn)w(DW)c(DW)


Serializability of dwms transaction

Serializability of DWMS_Transaction

  • Theorem

    • A history of DWMS_Transactions S is serializable

      iff it is equivalent to some serial schedule S’ of the

      same DWMS_Transactions.

  • Basis for Solving Anomaly Problems

    • To solve the anomaly problem, we need all DWMS_Transactions serializable.


Traditional serializability algorithms

Traditional Serializability Algorithms

  • Lock-based

    • Reads / writes acquire locks for access to shared resources

    • Transactions block each other

  • Multiversion-based

    • Write on a version, read on another version

    • Transactions do not block each other


Traditional serializability algorithms1

Traditional Serializability Algorithms

  • Lock-based

    • Read / write would need to lock data in sources?

    • Not desirable in DW environment

      • Data sources are autonomous

      • Not realistic to impose locking on them

  • Multiversion-based

    • Do not block each other

    • Desirable in DW environment

      • DW and data sources do not block each other

    • Need to maintain versions somewhere


Txnwrap a multiversion algorithm

CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer’ C,FlightRes’ F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;

View: Asia_Traveller

Name

Address

FlightNo

Customer’

Dave

WPI

AA8384

FlightRes’

Wrapper

Meta Relation

Wrapper

Meta Relation

Name

Address

Rel

Attr

N.

A.

F.

D.

Rel

Attr

Dave

WPI

Cust’

Name

Fli’

Name

Customer

FlightRes

Ellen

MA

Cust’

Address

Name

Address

Name

Age

FlightNo

Dest

Dave

WPI

Dave

22

AA8384

Asia

Ellen

MA

Steve

22

UA77788

Europe

TxnWrap: A Multiversion Algorithm

CREATE VIEW Asia_Traveller AS

SELECT C.Name, C.Address,

F.FlightNo

FROM Customer C, FlightRes F

WHERE C.Name = F.Name

AND F.Dest = ‘Asia’;


Versioned wrapper

Versioned Wrapper

Wrapper for Customer

Relation Customer’

Meta Relation

  • Semantics: life time of a tuple is

    • #born <= time < #dead


Source updates on versioned wrapper

Name

Address

#born

#dead

Dave

WPI

0

Relation Customer’ (state 2 )

Ellen

MA

0

Name

Address

#born

#dead

Dave

WPI

0

1

Ellen

MA

0

Relation Customer’ (state 1 )

Stove

Boston

1

Name

Address

#born

#dead

Meta Relation (state 2 )

Rel

Attr

Rel’

Attr’

#born

#dead

Dave

WPI

0

1

C’

Name

-

-

0

Ellen

MA

0

C’

Addr.

-

-

0

2

Steve

Boston

1

Source Updates on Versioned Wrapper

Relation Customer’ (Init)

Transaction1:

1. DELETE FROM Customer C

WHERE C.Name = ‘Dave’;

2. INSERT (‘Steve’, ‘Boston’);

Transcation 2:

Drop Customer.Address;


Dw maintenance query rewritten for versioned wrapper

Relation Customer’ (State 1 )

Name

Address

#born

#dead

Dave

WPI

0

1

Ellen

MA

0

Stove

Boston

1

DW Maintenance Query Rewritten for Versioned Wrapper

The maintenance query issued in Transaction2:

SELECT Name, Address

FROM Customer

WHERE condition;

Rewritten versioned maintenance query:

SELECT Name, Address

FROM Customer’

WHERE condition and

#born <= 2 and #dead > 2;


Performance evaluation

Performance Evaluation

  • Implementation

    • In Java

    • Platform: Oracle, JDBC on Windows NT

    • Embedded in DyDa [CCZ+01] System at WPI

  • Testbed

    • 6 data sources with one relation each

    • Each relation has 4 attributes and 100,000 tuples

    • One materialized joined view over these data sources

    • TxnWrap VS. compensation (SWEEP [AAS+97] & DyDa)


Data update processing

Data Update Processing


Schema change processing

Schema Change Processing


Related work

Related Work

  • View Maintenance

    • View Maintenance / Synchronization / Adaptation

  • Maintenance Anomaly

    • ECA [ZGH+95], SWEEP [AAS+97] handles only

      concurrent data updates

      • Compensation-based

      • Performance degrades at a high load

  • Multi-version Algorithms

    • 2-version, n-version, unlimited-version algorithms [MPL92]


Conclusions

Conclusions

  • Identify the Maintenance Anomaly Problem in mixed model environment

  • Design a global Transaction DWMS_Transaction model that integrates both source update transaction and maintenance transaction.

  • Rephrase the maintenance anomaly in terms of

    serializability of DWMS_Transactions

  • Propose multiversion algorithm to achieve serializability

  • Implemented the maintenance solution in Dyda

  • Achieve stable performance under various workloads


Other activities and future work

Other Activities and Future Work

  • Batching of updates into more complex maintenance plans

  • Parallelism of maintenance processes

  • Support more complex views, e.g., aggregation

  • Generalize to more change types

  • Provide alternate view synchronization algorithms

  • Discovery of changes by non-cooperating sources

  • Discovery of meta data in terms of source relationships of distributed sources

  • Move beyond relational middle-layer model


Questions

Questions?


  • Login