220 likes | 332 Views
Chapter 17: Additional Slides. June 10, 2014. Outline. Physical Data Management Fragments Distributed Query Processing Transactions Logical Data Management Transparency Conceptual Data Management. Physical Data Management Fragments. What is a Fragment?
E N D
Chapter 17: Additional Slides June 10, 2014
Outline • Physical Data Management • Fragments • Distributed Query Processing • Transactions • Logical Data Management • Transparency • Conceptual Data Management
Physical Data ManagementFragments • What is a Fragment? • Vertical subset (project operation) • Horizontal subset (restrict operation) • Mixed Fragment (combination of project and restrict) • A fragment may be allocated to a single or multiple sites • Fragments may be replicated where • there is a primary fragment as a single site • Copies of the fragment are placed at multiple sites (secondary)
Physical Data ManagementDistributed Query Processing • Involves both local (intra site) and global (inter site) optimization. • Multiple optimization objectives • The weighting of communication costs versus local processing costs depends on network characteristics. • There are many more possible access plans for a distributed query.
Physical Data ManagementDistributed Query Processing cont’d • Local vs. Global query processing • In Local, • Queries are performed at a central server (single site) • In Global, • Must decide which sites to access for the fragments • May need to move fragments from site to site • Multiple optimization is needed for Global • Due to the multiple sites and access plans • Many possible access plans for Global • Choosing the best one may be difficult
Physical Data ManagementDistributed Query Processing cont’d • Communication Costs • Communication Time (CT) • Fixed Message Delay (MD) • Variable Transmission Time (TT) • CT = MD + TT • MD = Number of Messages * Delay per message • TT = Number of bits/Data rate
Physical Data ManagementDistributed Query Processing cont’d • Global Query Example (p. 632) • List the order number, order date, product number, product name, product price, and order quantity for eastern orders with a specified customer number, date range, and product color. • Four possible access plans
Physical Data ManagementDistributed Query Processing cont’d • Access Plan 1 • Move the Product table to the Tulsa site where the query is processed
Physical Data ManagementDistributed Query Processing cont’d • Access Plan 2 • Restrict the Product table at the Denver Site • Then move result to the Tulsa site to execute the remainder of the query
Physical Data ManagementDistributed Query Processing cont’d • Access Plan 3 • Perform join and restrictions of Eastern-Orders and Eastern Order-lines fragments at the Tulsa site • Then move result to Denver site to join with Product Table
Physical Data ManagementDistributed Query Processing cont’d • Access Plan 4 • Restrict the Product table at the Denver site • Move product numbers to Tulsa and do restrict/join • Then move result back to Denver to combine with Product table to get product names
Physical Data ManagementDistributed Query Processing cont’d • Obviously many different access plans can be used to answer the same query • Need to investigate • actual network costs • the local processing costs at each site to determine which access plan is the best
Physical Data ManagementTransactions – 2 Phase Commit Protocol • 2 Phase Commit (2PC) • Ensures that all transactions are Atomic • One site is selected as a Coordinator while other sites are Participants • Each Participant site execute a different part of the transaction • Two phases: Voting Phase and Decision Phase • Figure 17.18, page 634
Physical Data ManagementTransactions – 2 Phase Commit Protocol • Several Complications IF • Failures during recovery and Timeouts • Log records are lost • Coordinator fails • Etc. • Several methods to resolve these, but out of scope for this class
Logical Data ManagementTransparency • Transparency is related to data independence. • With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions. • Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries.
Logical Data ManagementFragmentation Transparency • Fragmentation transparency provides the highest level of data independence. • Users formulate queries and transactions without knowledge of fragments (locations, or local formats). • If fragments change, queries and transactions are not affected. • Table 17.6, p 626
Logical Data ManagementLocation Transparency • Location transparency provides a lesser level of data independence than fragmentation transparency. • Users need to reference fragments in formulating queries and transactions. • However, knowledge of locations and local formats is not necessary. • Table 17.7, p 627
Conceptual Data ManagementSchema Integration • Multiple types of schemas may exist to describe the same dataset • Integrate multiple schemas into a single schema • Best explained using an Exercise
Conceptual Data ManagementSchema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title First, find the common entities and relationships between schemas Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) • Database III – E/R Model • Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) • Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date)
Conceptual Data ManagementSchema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title Second, draw the conceptual diagram for the common entities and relationships Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) • Database III – E/R Model • Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) • Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date)
Conceptual Data ManagementSchema Integration • No single correct solution to this exercise • It may be seen that there are multiple solutions to this problem • Considered a very HARD problem • Often hard to find the best synonyms especially from a large set of schemas
Weekly Exercise • Questions 2, 4, and 5