1 / 24

View Materialization & Maintenance Strategies

View Materialization & Maintenance Strategies. By Ashkan Bayati & Ali Reza Vazifehdoost. Motivation. Complex Queries -Decision support queries -OLAP -Statistical Analysis -Business Intelligence -Aggregation Large data sets collected from Heterogeneous remote sources.

neil-mercer
Download Presentation

View Materialization & Maintenance Strategies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost

  2. Motivation Complex Queries -Decision support queries -OLAP -Statistical Analysis -Business Intelligence -Aggregation Large data sets collected from Heterogeneous remote sources

  3. View Materialization & Maintenance • View materialization is the process of pre-computing views (summarized information) in order to gain performance • Drawback is to keep the view consistent when the underlying data sources change • View Maintenance is the process of keeping the view consistent with the underlying source tables

  4. Incremental View Maintenance • Relevant Updates only affect the view • The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place instead of re-calculating the view from scratch.

  5. Selection • V = Ө C(y) (r) any tuple that satisfies C(y) will be in the view • After inserts and deletes we get V’= V + Ө C(y) (i) - Ө C(y) (d) • The view can be incrementally maintained by: Inserting Ө C(y) (i) into V (Insert( V, Ө C(y) (i))) Deleting Ө C(y) (d) from V ( Delete( V, Ө C(y) (d))

  6. Projection • Problem with Projections: • Imagine if you delete (1,10) from the base table • Solution is to keep the key in the view or use a counter

  7. Joins • Inserts: Let V = r s and r’ = r i then: V’= r’ s = (r i) s = (r s) (i s) = V (i s) Deletes are similar

  8. View Maintenance in Dynamic Environments • Dynamic environment is specified here as one that covers both data updates and schema changes • Interleaving data updates and schema changes can cause problems • The following steps need to be taken: - Optimize updates based on their source relations and update types. - For schema changes that effect the view definition perform a view evolution process. - Perform view adaptation to make the view consistent.

  9. Optimize Updates • DU’ = п (attr (R)) ∩ (attr(R’)) <DU> • Its obvious to see п(attr (R)) ∩ (attr(R’)) contains all the attributes related to the view redefinition. This is essentially because neither dropped nor added attributes will appear in the view definition. • The relationship between SC and DU are: • If SCi’ contains drop relation Ri, then DUi ={} and SCi’ = drop relation Ri. • If SCi’ contains drop attribute operation both SCi’ and DUi’ might not be empty • If SCi’ contains no drop operation, then DUi’=DUi.

  10. Example • Assume a view V(A,B,C,D) is defined as R1(A,B) R2(A,C) R3(C,D). Suppose R1 has the following sequence of updates { +(3,2),(1,4)} and relation R2 has the update sequence { + (3,4), add field E, +(4,5,6), drop field C, -(5,7)}. • Hence we get DU2= {+(3,4),+(4,5,6),-(5,7)} and R2=(A,C) and R2’=(A,E). From this information you can see that attr((R)) ∩ (attr(R’))={A} ;hence DU2’={+3,+4,-5}.

  11. Evolving View Definition • Applying view synchronization:

  12. Making the view consistent • Now that the schema is consistent we need the view to become synchronized with the underlying base table updates. Many mechanisms have been defined I will explain more on this issue later.

  13. Efficient VM over distributed data sources • Materialized view integrate and store data from distributed data sources to ensure better access, higher performance and better availability. • Since the data sources are distributed the network cost involved in transferring the net changes can also be dramatic. • State of the art view maintenance requires 0(n^2) maintenance queries to remote data sources with n being the number of data sources in the view definition.

  14. Goal • The aim is to restructure the view maintenance queries in order to reduce costs. • HOW?? • Assume the Materialized view R1► R 2 ► R 3 ► R 4. (►= join)

  15. Restructuring Batch View Maintenance • State of ART: • Ri’=Ri + Ri • Hence O(n^2)

  16. Adjacent Grouping • Adjacent Grouping (share common access to the maintenance Queries): For the previous example divide it up into two groups. • It becomes • (R1 ►R2 +R1’ ►R2) ►R3►R4 + (R3►R4+ R 3’ ►R4)►R1’ + R2’ hence 12 queries have been reduced to 8 hence O(n^1.5)

  17. Conditional Grouping • A more aggressive method is called conditional grouping whose execution is 2*(n-1) maintenance queries. • Scroll up phase

  18. Conditional Grouping Cont • Scroll Down phase

  19. Self Maintenance of Multiple SPJ Views • The view V at level 0 can be described in terms of nodes at level as tmp1 tmp3. • Some tuples of tmp1 and tmp3 do not join into the view V; hence, we store these tuples in their respective AV’s for tmp1 and tmp3 at level 1.

  20. Update takes place in Relation R • There are two possible paths that U (update) can take to find its way to the root node: • ∆V = (((U AV(S)) AV(T)) AV(tmp1))) • ∆V = V U

  21. Sub-trees • With this approach, a change in any sub-tree can be propagated to the root node without re-computing any of the other sub-trees. • Since we only store tuples at level i if they do not join into the node at level i+1, the tuples are not duplicated in the tree.

  22. Benefits of this approach • The benefits of this procedure can be summarized as follows: • Changes to the view of a sub-tree only effectively change the root of that sub-tree • The view updates can effectively be computed by joining only subsets of base relations rather than the entire base relation. As an example ∆V = (((U AV(S)) AV(T)) AV(tmp1))) rather than the traditional method ∆V = (((U S) T) AV(tmp1)))

  23. Multiple View Maintenance • Essentially the same as single view maintenance however the AV of the shared node in the tree will be different.

  24. Auxiliary View Structure • AV(temp3) stores tuples that do not join in V and do not join into V’ into two separate AV’s. The problem with this scheme is that it stores the set AV(temp3) (V) ∩ AV(temp3) (V’).The sub-tree represented by intermediate node temp3 will be recomputed twice and the views V and V’ will be updated separately. • AV(temp3) stores tuples that do not join in view V and tuples that do not join in view V’ in three AV’s: AV(temp3)(V) , AV(temp3)(V’) and AV(temp3)(V∩V’). This eliminates duplicates, this will cut down the computational cost but incurs additional overhead of placing tuples in the correct AV.

More Related