HIGHER SCHOOL OF DIGITAL ECONOMY

This presentation explores the concepts of distributed databases, including fragmentation, allocation, replication, and query processing.

Presentation Transcript


  1. HIGHER SCHOOL OF DIGITAL ECONOMY, 2014-2015. Advanced Databases presentation: DISTRIBUTED DATABASES. Web Intelligence Research Master. Prepared by: Nouha Souid, Sameh Jabbari, Emna Bennour. 19-02-2015

  2. Plan: Introduction; DDB; Fragmentation; Allocation and Replication; Query Processing and Optimization in Distributed Databases; Conclusion

  3. Motivation: the need to decentralize information; the increase in information volume; the increase in transaction volume. How can input/output throughput be improved?

  4. There is a need for databases that provide good response times on large data volumes.

  5. Distributed databases (BDRs, from the French "bases de données réparties") developed thanks to technological progress in network infrastructure.

  6. We can define a "distributed database (DDB) as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as a software system that manages a distributed database while making the distribution transparent to the user"*. * This definition and the discussions in this section are based largely on Ozsu and Valduriez (1999).

  7. Related kinds of databases. Federated database: multiple heterogeneous databases are accessed as a single database via a common model. Multibase: several databases interoperate with an application through a common language and a common model.

  8. DISTRIBUTED DATABASE MANAGEMENT SYSTEM

  9. • A DDBMS (distributed database management system) is a centralized application that manages a distributed database as if it were all stored on the same computer. • The DDBMS synchronizes all the data periodically and, in cases where multiple users must access the same data, ensures that updates and deletes performed on the data at one location are automatically reflected in the data stored elsewhere.

  10. • In a DDBMS, the distribution of applications involves – Distribution of the DDBMS software – Distribution of the applications that run on the database • The distribution of applications is not considered in what follows; instead, the distribution of data is studied.

  11. Advantages of Distributed Databases

  12. Additional Functions of Distributed Databases

  13. Design of DDB • Bottom-up design • The databases already exist at a number of sites • The databases should be connected to solve common tasks [Diagram: a DDB built on top of three local DBs]

  14. Design of DDB • Top-down design • Designing systems from scratch • Homogeneous systems [Diagram: a DDB decomposed into three local DBs]

  15. DISTRIBUTED DATABASE ARCHITECTURES

  16. External level: the views are distributed across the user sites.

  17. Abstract (conceptual) level: the conceptual schema of the data is mapped, through the distribution schema (itself decomposed into a fragmentation schema and an allocation schema), to the local schemas, which are distributed over several physical sites.

  18. Internal level: the global internal schema has no real existence; it gives way to local internal schemas distributed over the various sites.

  19. Fragmentation is the process of decomposing a database into a set of sub-databases. => This decomposition must be lossless: no information may be lost.

  20. Types of Fragmentation • Horizontal: partitions a relation along its tuples • Vertical: partitions a relation along its attributes • Mixed/hybrid: a combination of horizontal and vertical fragmentation [Figures: Horizontal Fragmentation, Vertical Fragmentation, Mixed Fragmentation]

  21. • Horizontal fragmentation: the occurrences (tuples) of the same class can be distributed across different fragments • The partitioning operator is the selection (σ). The reconstruction operator is the union (∪).

  22. R1 = σ(A=a1)(R); R2 = σ(A=a2)(R). Reconstruction: R = R1 ∪ R2

  23. Example on the relation Client: Client1 = Client WHERE ville = 'Paris'; Client2 = Client WHERE ville <> 'Paris'. Reconstruction: Client = Client1 ∪ Client2
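
The Client example can be sketched in a few lines of Python. This is only an illustration: the rows and the `nclient` and `nom` attributes are invented, and relations are modelled as plain lists of dictionaries rather than real tables.

```python
# Toy Client relation; the rows and the nclient/nom attributes are invented.
client = [
    {"nclient": 1, "nom": "Durand", "ville": "Paris"},
    {"nclient": 2, "nom": "Martin", "ville": "Lyon"},
    {"nclient": 3, "nom": "Petit", "ville": "Paris"},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Set union of fragments: concatenate and drop duplicate tuples."""
    result = []
    for fragment in fragments:
        for row in fragment:
            if row not in result:
                result.append(row)
    return result


# Partitioning: two complementary selections on ville.
client1 = select(client, lambda row: row["ville"] == "Paris")
client2 = select(client, lambda row: row["ville"] != "Paris")

# Reconstruction: Client = Client1 U Client2.
rebuilt = union(client1, client2)


def by_key(row):
    return row["nclient"]


# The fragmentation is lossless (union gives back Client) and disjoint.
is_lossless = sorted(rebuilt, key=by_key) == sorted(client, key=by_key)
is_disjoint = not any(row in client2 for row in client1)
print(is_lossless, is_disjoint)
```

Because the two predicates are complementary, every tuple lands in exactly one fragment, which is what makes the union a faithful reconstruction.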

  24. Vertical Fragmentation. • All the values of the occurrences for the same attribute are in the same fragment. • Vertical fragmentation is useful for placing each part of the data at the site where that part is used. The partitioning operator is the projection (π). The reconstruction operator is the join.

  25. Example on the relation Cde: Cde1 = Cde (ncde, nclient); Cde2 = Cde (ncde, produit, qté). Reconstruction: Cde = [ncde, nclient, produit, qté] where Cde1.ncde = Cde2.ncde
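
A minimal Python sketch of the Cde example follows. The rows are invented, and `project` and `join` are hypothetical helpers written for this illustration; the point is that both fragments keep the key `ncde`, so a join on it rebuilds the original relation.

```python
# Toy Cde (order) relation; the rows are invented for illustration.
cde = [
    {"ncde": 100, "nclient": 1, "produit": "vis", "qté": 20},
    {"ncde": 101, "nclient": 2, "produit": "clou", "qté": 5},
    {"ncde": 102, "nclient": 1, "produit": "écrou", "qté": 12},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicates."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


def join(left, right, key):
    """Equi-join of two relations on a shared key attribute."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Partitioning: each fragment keeps the key ncde.
cde1 = project(cde, ["ncde", "nclient"])
cde2 = project(cde, ["ncde", "produit", "qté"])

# Reconstruction: join the fragments on Cde1.ncde = Cde2.ncde.
rebuilt = join(cde1, cde2, "ncde")
print(rebuilt == cde)
```

If a fragment dropped `ncde`, there would be no way to pair its tuples with those of the other fragment, and the decomposition would no longer be lossless.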

  26. • Mixed (Hybrid) Fragmentation. The partitioning operation is a combination of projections and selections. The reconstruction operation is a combination of joins and unions.

  27. Example • The Customer (Client) table is reconstructed as: (Cli3 ∪ Cli5) * Cli4 * Cli6, where * denotes the natural join • Relation Cli3 = π[NoClient, NomClient] (σ[Age < 38] Client) • Relation Cli5 = π[NoClient, NomClient] (σ[Age ≥ 38] Client) • Relation Cli4 = π[NoClient, Prénom] Client • Relation Cli6 = π[NoClient, Age] Client
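
The reconstruction expression can be checked on a toy instance. The sketch below assumes an invented Client relation and hypothetical `select`, `project`, and `join` helpers; it reads the `*` operator as a natural join on `NoClient`.

```python
# Toy Client relation; the rows are invented for illustration.
client = [
    {"NoClient": 1, "NomClient": "Durand", "Prénom": "Anne", "Age": 25},
    {"NoClient": 2, "NomClient": "Martin", "Prénom": "Luc", "Age": 38},
    {"NoClient": 3, "NomClient": "Petit", "Prénom": "Zoé", "Age": 52},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicates."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


def join(left, right, key):
    """Join two relations on a shared key attribute."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Mixed fragments: a selection followed by a projection, or a projection alone.
cli3 = project(select(client, lambda r: r["Age"] < 38), ["NoClient", "NomClient"])
cli5 = project(select(client, lambda r: r["Age"] >= 38), ["NoClient", "NomClient"])
cli4 = project(client, ["NoClient", "Prénom"])
cli6 = project(client, ["NoClient", "Age"])

# Reconstruction: (Cli3 U Cli5) * Cli4 * Cli6.
# Cli3 and Cli5 are disjoint, so concatenation is a valid union here.
names = cli3 + cli5
rebuilt = join(join(names, cli4, "NoClient"), cli6, "NoClient")
rebuilt.sort(key=lambda r: r["NoClient"])
print(rebuilt == client)
```

Cli6 has to keep `Age` in its own fragment because Cli3 and Cli5 project it away after using it in the selection.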

  28. Exercise on Compte • Objective: fragment the relation Compte(NoClient, Agence, TypeCompte, Somme). • Propose a horizontal fragmentation scheme, then a vertical one, taking the following queries into account: • R1 = π[NoClient, Agence] (σ[(TypeCompte = 'courant') ∧ (Somme > 100 000)] Compte) • R2 = π[NoClient, Somme] (σ[(Agence = 'Genève') ∧ (TypeCompte = 'courant')] Compte) • R3 = σ[Agence = 'Lausanne'] Compte

  29. R1 = π[NoClient, Agence] (σ[(TypeCompte = 'courant') ∧ (Somme > 100 000)] Compte) R2 = π[NoClient, Somme] (σ[(Agence = 'Genève') ∧ (TypeCompte = 'courant')] Compte) R3 = σ[Agence = 'Lausanne'] Compte

  30. • Replication: which fragments should be stored as multiple copies? – Complete Replication ∗ A complete copy of the database is maintained at each site – Selective Replication ∗ Selected fragments are replicated at some sites

  31. • Allocation: at which sites should the various fragments be stored? – Centralized ∗ Consists of a single DB and DBMS stored at one site, with users distributed across the network – Partitioned ∗ The database is partitioned into disjoint fragments, each fragment assigned to one site

  32. Example • Client1 = σ[Ville = 'Paris'] (Client) • Client2 = σ[Ville ≠ 'Paris'] (Client) • Commande1 = Commande ⋉ Client1 • Commande2 = Commande ⋉ Client2 • Allocation: @Site1: Client1, Commande1; @Site2: Client2, Commande2
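
The example can be sketched in Python under one explicit assumption: the operator between Commande and Client1 is read as a semi-join, so that each order follows its client to the same site. The rows, the `nclient` and `ncde` attributes, and the `semijoin` helper are invented for illustration.

```python
# Toy relations; rows and attribute names are invented for illustration.
client = [
    {"nclient": 1, "Ville": "Paris"},
    {"nclient": 2, "Ville": "Lyon"},
    {"nclient": 3, "Ville": "Paris"},
]
commande = [
    {"ncde": 100, "nclient": 1},
    {"ncde": 101, "nclient": 2},
    {"ncde": 102, "nclient": 3},
    {"ncde": 103, "nclient": 1},
]


def semijoin(left, right, key):
    """Semi-join: tuples of left that have a matching key in right."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]


# Primary horizontal fragmentation of Client on Ville.
client1 = [row for row in client if row["Ville"] == "Paris"]
client2 = [row for row in client if row["Ville"] != "Paris"]

# Derived fragmentation of Commande (assumed to be a semi-join with Client).
commande1 = semijoin(commande, client1, "nclient")
commande2 = semijoin(commande, client2, "nclient")

# Allocation: each site stores a client fragment and the matching orders.
allocation = {
    "Site1": {"Client1": client1, "Commande1": commande1},
    "Site2": {"Client2": client2, "Commande2": commande2},
}
print({site: sorted(fragments) for site, fragments in allocation.items()})
```

With this placement, a join between a client and its orders never needs data from the other site.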

  33. Update of a distributed database

  34. Queries on distributed databases • We first produce the algebraic tree of the query. Every leaf of the tree represents a relation, and every node represents an algebraic operation. We then annotate the tree by specifying the site at which each operation must be executed.
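
One possible way to represent such an annotated tree is sketched below. The `Node` class, the `transfers` helper, the site names S1 and S2, and the example plan are all assumptions made for this illustration, not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the algebraic tree: a relation (leaf) or an operation."""
    op: str
    site: str
    children: list = field(default_factory=list)


def transfers(node):
    """List (from_site, to_site, op) for every tree edge that crosses sites."""
    found = []
    for child in node.children:
        found.extend(transfers(child))
        if child.site != node.site:
            found.append((child.site, node.site, child.op))
    return found


# Hypothetical plan: Client is filtered at S1, Commande is stored at S2,
# and the join is executed at S1.
plan = Node("join", "S1", [
    Node("select Ville = 'Paris'", "S1", [Node("Client", "S1")]),
    Node("Commande", "S2"),
])

print(transfers(plan))
```

Each edge whose two ends sit on different sites corresponds to a network transfer, which is the part of the cost that a centralized database does not have.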

  35. • The complexity (cost) of a query in a distributed database is determined by the following factors: • Disk input/output. • CPU cost. • Network communication. => In a centralized database, only the I/O and CPU factors determine the complexity of a query.

  36. • Data transfer:

  37. • Data transfer: Example: P (NP, NOMP, MADE IN, COULEUR, POIDS). F (NF, NOMF, VILLE, ADRESSE, PAYS, COEF). The transmission speed is 1000 bytes of useful information per second.

  38. • Data transfer: Solution to the example: -> Size of P: 10 000 * 90 = 900 000 bytes. Transmission time (P): 900 000 / 1000 = 900 s -> Size of F: 100 * 120 = 12 000 bytes. Transmission time (F): 12 000 / 1000 = 12 s => Communication time: 12 + 900 = 912 s
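
The same arithmetic, written as a small Python function. It assumes that 10 000 and 100 are the tuple counts of P and F and that 90 and 120 are their tuple sizes in bytes; the slide gives the numbers without naming them.

```python
BYTES_PER_SECOND = 1000  # transmission speed given in the example


def transfer_time(tuple_count, tuple_size_bytes, rate=BYTES_PER_SECOND):
    """Seconds needed to ship a whole relation over the network."""
    return tuple_count * tuple_size_bytes / rate


time_p = transfer_time(10_000, 90)   # relation P
time_f = transfer_time(100, 120)     # relation F
total = time_p + time_f

print(time_p, time_f, total)
```

Shipping P dominates the total, so a plan that moves only F (12 s) to the site of P would be far cheaper than one that moves both relations.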

  39. • Processing of distributed queries:

  40. • Dynamic optimization of queries: once the query tree has been generated, it is executed bottom-up.

  41. • Semi-join: this is in fact a double join: the principle is to perform two small joins rather than one large one. The semi-join reduces the size of the operand relations, and therefore the amount of data to transfer.

  42. • Semi-join: Example: let R1 and R2 be two relations located at sites S1 and S2 respectively. Goal: evaluate R1 ⋈ R2 at site S1. The semi-join algorithm works as follows: • S1> temp1 ← π[R1 ∩ R2](R1), the projection of R1 on the attributes it shares with R2 • S1> send temp1 to S2 • S2> temp2 ← R2 ⋈ temp1 • S2> send temp2 to S1 • S1> R1 ⋈ temp2 (equal to R1 ⋈ R2)
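
The five steps can be simulated in Python on two toy relations. The data, the attribute names `a`, `b`, `c`, and the helper functions are invented for this sketch, and the missing operators are read as natural joins; the two "send" steps are only comments, since no real network is involved.

```python
# R1 lives at S1 and R2 at S2; the rows are invented for illustration.
r1 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}]
r2 = [{"a": 2, "c": 10}, {"a": 3, "c": 20}, {"a": 4, "c": 30},
      {"a": 5, "c": 40}, {"a": 6, "c": 50}, {"a": 7, "c": 60}]


def common_attributes(left, right):
    """Attributes shared by two non-empty relations."""
    return sorted(set(left[0]) & set(right[0]))


def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicates."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join on all attributes the two relations share."""
    if not left or not right:
        return []
    shared = common_attributes(left, right)
    return [{**l, **r} for l in left for r in right
            if all(l[name] == r[name] for name in shared)]


# S1> temp1 <- projection of R1 on the attributes shared with R2
temp1 = project(r1, common_attributes(r1, r2))
# S1> send temp1 to S2
# S2> temp2 <- R2 joined with temp1 (only the useful tuples of R2 remain)
temp2 = natural_join(r2, temp1)
# S2> send temp2 to S1
# S1> final result
result = natural_join(r1, temp2)

print(result == natural_join(r1, r2), len(temp2), len(r2))
```

In this toy instance only 2 of the 6 tuples of R2 travel back to S1, at the price of first sending the narrow relation temp1 to S2.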

  43. EXERCISE • We consider the following relational database: • Product(Pnum,libelle,annee-fabrication,Fnum,Categorie,Px-revient) • Supplier(Fnum,Nom,Prenom,Adresse) • Customer(Cnum,Nom,Prenom,Specialite,contract) • Command(Ncde,Cnum,Fnum,Quantité,Px-unitaire) The Categorie attribute of the Product relation gives the category of each product. We distinguish essentially 4 categories: alimentary, pharmaceutical, cleaning, and other.

  44. 1/ Express the following question in SQL: give the label (libelle) and manufacturing year (annee-fabrication) of the alimentary products with a cost price above 200. We assume the database is distributed over 4 sites (alimentary, pharmaceutical, cleaning, other) that match the different values of the Categorie attribute of the Product relation. • 2/ Suggest and justify a decomposition of the database over the 4 sites, knowing that: - every product belongs to a category - all customers are managed at the site "other" ("autre") - every supplier can supply products of all categories • 3/ Specify, for each of the relations Product, Supplier, Customer, and Command, the type of fragmentation used. 4/ Give the definition of each site in the form of a relational algebra query and propose an execution plan for the SQL query of question 1, if the query is launched from site

  45. 1/ SELECT libelle, annee-fabrication FROM Product WHERE Categorie = 'alimentaire' AND Px-revient > 200; • 2/ Site Alimentary: Product-Alimentary, Supplier. Site Pharmaceutical: Product-Pharmaceutical, Supplier. Site Cleaning: Product-Cleaning, Supplier. Site Other: Product-Other, Supplier, Customer, Command.

  46. 3/ Product: horizontal fragmentation. Supplier: complete replication. Customer: centralized. Command: centralized. • 4/ S1: Product-Alimentary = σ[Categorie = 'alimentary'] Product. S2: Product-Pharmaceutical = σ[Categorie = 'Pharmaceutical'] Product; Supplier_S2 = Supplier (full copy). S3: Product-Cleaning = σ[Categorie = 'Cleaning'] Product; Supplier_S3 = Supplier (full copy).

  47. S4: Product-Other = σ[Categorie ≠ 'Cleaning' ∧ Categorie ≠ 'Pharmaceutical' ∧ Categorie ≠ 'alimentary'] Product; Supplier_S4 = Supplier (full copy); Customer and Command are stored whole at this site.

  48. 4/ SELECT libelle, annee-fabrication FROM Product WHERE Categorie = 'alimentaire' AND Px-revient > 200; Execution plan: site Alimentary computes π[libelle, annee-fabrication] (σ[Px-revient > 200] Product-Alimentary) and sends the result to site other. Rewritten query: SELECT libelle, annee-fabrication FROM Product-alimentary@S1 WHERE Px-revient > 200

  49. Conclusion: Distributing data requires revisiting the notions of data storage, cataloguing techniques, query processing, concurrency control, and recovery. • Managing and synchronizing the data and the transactions becomes more complex. • Processing can incur significant delays when some of the data it needs is located on a remote node.

  50. Reference: Tin Myint Naing, Aung Win (UT(YCC), Pyin Oo Lwin, Myanmar), "Effective Refinement Heuristic For Distributed Database Partitioning Using Weka", International Journal of Computer & Communication Engineering Research (IJCCER), Volume 2, Issue 4, July 2014, pages 140-144.
