HIGHER SCHOOL OF DIGITAL ECONOMY

Advanced Databases presentation: DISTRIBUTED DATABASES
Web Intelligence Research Master, 2014-2015
Prepared by: Nouha Souid, Sameh Jabbari, Emna Bennour
19-02-2015

Plan
• Introduction
• DDB Fragmentation
• Allocation and Replication
• Query Processing and Optimization in Distributed Databases
• Conclusion

• Decentralization of information
• Increase in the volume of information
• How to improve input/output throughput
• Increase in the volume of transactions

Need for databases that provide good response times on large data volumes.

Distributed databases (DDBs) developed thanks to technological progress in network infrastructure.

We can define a "distributed database (DDB) as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as a software system that manages a distributed database while making the distribution transparent to the user".*
* This definition and the discussions in this section are based largely on Ozsu and Valduriez (1999).

• Federated database: multiple heterogeneous databases are accessed as a single DB via a common model
• Multibase: several databases interoperate with an application through a common language and a common model

DISTRIBUTED DATABASE MANAGEMENT SYSTEM

• A DDBMS (distributed database management system) is a centralized application that manages a distributed database as if it were all stored on the same computer.
• The DDBMS synchronizes all the data periodically and, in cases where multiple users must access the same data, ensures that updates and deletes performed on the data at one location are automatically reflected in the data stored elsewhere.

• In a DDBMS, the distribution of applications involves
  – Distribution of the DDBMS software
  – Distribution of the applications that run on the database
• Distribution of applications will not be considered in the following; instead, the distribution of data is studied.

Advantages of Distributed Databases

Additional Functions of Distributed Databases

Design of DDB
• Bottom-up design
  – The databases already exist at a number of sites
  – The databases should be connected to solve common tasks
[Figure: DDB and three local DBs]

• Top-down design
  – Designing systems from scratch
  – Homogeneous systems
[Figure: DDB and three local DBs]

DISTRIBUTED DATABASE ARCHITECTURES

External level: the views are distributed across the user sites.

Abstract (conceptual) level: the abstract schema of the data is mapped, through the distribution schema (itself decomposed into a fragmentation schema and an allocation schema), to the local schemas, which are distributed over several physical sites.

Internal level: the global internal schema has no real existence; it gives way to local internal schemas distributed over the various sites.

Fragmentation is the process of decomposing a database into a set of sub-databases.
=> This decomposition must be lossless: no information may be lost.

Types of Fragmentation
• Horizontal: partitions a relation along its tuples
• Vertical: partitions a relation along its attributes
• Mixed/hybrid: a combination of horizontal and vertical fragmentation
[Figure: horizontal, vertical and mixed fragmentation]

• Horizontal fragmentation: the tuples (occurrences) of the same relation can be distributed across different fragments
• The partitioning operator is the selection (σ)
• The reconstruction operator is the union (∪)

R1 = σ[A = a1] R
R2 = σ[A = a2] R
Reconstruction: R = R1 ∪ R2

Example: the Client relation
Client1 = Client where ville = "Paris"
Client2 = Client where ville <> "Paris"
Reconstruction: Client = Client1 ∪ Client2
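The Client example can be sketched in a few lines of Python. This is only an illustration: the sample rows and the attributes `nclient` and `nom` are hypothetical (the slide only names `ville`), and the helpers `select` and `union` are written here for the example.

```python
# Hypothetical sample rows; only the attribute "ville" comes from the slide.
client = [
    {"nclient": 1, "nom": "Durand", "ville": "Paris"},
    {"nclient": 2, "nom": "Martin", "ville": "Lyon"},
    {"nclient": 3, "nom": "Petit", "ville": "Paris"},
]


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(r1, r2):
    """Union of two fragments, dropping duplicate tuples."""
    seen, result = set(), []
    for row in r1 + r2:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Horizontal fragments defined by complementary predicates on "ville".
client1 = select(client, lambda r: r["ville"] == "Paris")
client2 = select(client, lambda r: r["ville"] != "Paris")

# Reconstruction: Client = Client1 U Client2
reconstructed = union(client1, client2)
```

Because the two predicates are complementary, every tuple lands in exactly one fragment, which is what makes the union a lossless reconstruction.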

Vertical Fragmentation
• All the values of the tuples (occurrences) for the same attribute are in the same fragment.
• Vertical fragmentation is useful for placing each part of the data on the site where that part is used.
• The partitioning operator is the projection (π)
• The reconstruction operator is the join (⋈)

Example: the Cde relation
Cde1 = Cde (ncde, nclient)
Cde2 = Cde (ncde, produit, qté)
Reconstruction: Cde = [ncde, nclient, produit, qté] where Cde1.ncde = Cde2.ncde
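A minimal Python sketch of the Cde example follows. The sample rows are hypothetical, and the sketch assumes that `ncde` identifies each tuple uniquely, which the slide does not state; `project` and `join_on` are helper names chosen for the illustration.

```python
# Hypothetical sample rows; attribute names are those of the slide.
cde = [
    {"ncde": 10, "nclient": 1, "produit": "P1", "qté": 5},
    {"ncde": 11, "nclient": 2, "produit": "P2", "qté": 3},
]


def project(relation, attributes):
    """Projection (pi): keep only the listed attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(r1, r2, key):
    """Equi-join on `key`, assuming `key` is unique in r2."""
    index = {row[key]: row for row in r2}
    return [{**row, **index[row[key]]} for row in r1 if row[key] in index]


# Both vertical fragments keep the key ncde so that they can be rejoined.
cde1 = project(cde, ["ncde", "nclient"])
cde2 = project(cde, ["ncde", "produit", "qté"])

# Reconstruction: join the fragments on Cde1.ncde = Cde2.ncde
rebuilt = join_on(cde1, cde2, "ncde")
```

Repeating the key in every vertical fragment is what allows the join to rebuild the original tuples.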

• Mixed (Hybrid) Fragmentation
  – The partitioning operation is a combination of projections and selections.
  – The reconstruction operation is a combination of joins and unions.

Example
• The Client (Customer) table is obtained as: (Cli3 ∪ Cli5) ⋈ Cli4 ⋈ Cli6
• Relation Cli3 = π[NoClient, NomClient] (σ[Age < 38] Client)
• Relation Cli5 = π[NoClient, NomClient] (σ[Age ≥ 38] Client)
• Relation Cli4 = π[NoClient, Prénom] Client
• Relation Cli6 = π[NoClient, Age] Client
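The four fragments and the reconstruction expression can be checked with a short Python sketch. The sample rows are hypothetical, and the sketch assumes `NoClient` is the key of Client; the helper functions are written for this illustration only.

```python
# Hypothetical sample rows for Client(NoClient, NomClient, Prénom, Age).
client = [
    {"NoClient": 1, "NomClient": "Durand", "Prénom": "Anne", "Age": 30},
    {"NoClient": 2, "NomClient": "Martin", "Prénom": "Luc", "Age": 45},
]


def select(rel, pred):
    return [r for r in rel if pred(r)]


def project(rel, attrs):
    return [{a: r[a] for a in attrs} for r in rel]


def join_on_key(r1, r2, key="NoClient"):
    """Join on the shared key, assumed unique in r2."""
    index = {r[key]: r for r in r2}
    return [{**r, **index[r[key]]} for r in r1 if r[key] in index]


# Mixed fragments: selection then projection (Cli3, Cli5), projection only (Cli4, Cli6).
cli3 = project(select(client, lambda r: r["Age"] < 38), ["NoClient", "NomClient"])
cli5 = project(select(client, lambda r: r["Age"] >= 38), ["NoClient", "NomClient"])
cli4 = project(client, ["NoClient", "Prénom"])
cli6 = project(client, ["NoClient", "Age"])

# Cli3 and Cli5 are disjoint, so their union is a plain concatenation here.
rebuilt = join_on_key(join_on_key(cli3 + cli5, cli4), cli6)
```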

Exercise: the Compte (account) relation
• Objective: fragment Compte(NoClient, Agence, TypeCompte, Somme).
• Propose a horizontal fragmentation scheme, then a vertical one, taking the following queries into account:
  – R1 = π[NoClient, Agence] (σ[(TypeCompte = 'courant') ∧ (Somme > 100 000)] Compte)
  – R2 = π[NoClient, Somme] (σ[(Agence = 'Genève') ∧ (TypeCompte = 'courant')] Compte)
  – R3 = σ[Agence = 'Lausanne'] Compte


• Replication: which fragments should be stored as multiple copies?
  – Complete replication: a complete copy of the database is maintained at each site
  – Selective replication: selected fragments are replicated at some sites

• Allocation: on which sites should the various fragments be stored?
  – Centralized: a single DB and DBMS stored at one site, with users distributed across the network
  – Partitioned: the database is partitioned into disjoint fragments, each fragment assigned to one site

Example
• Client1 = σ[Ville = 'Paris'] (Client)
• Client2 = σ[Ville ≠ 'Paris'] (Client)
• Commande1 = Commande ⋉ Client1
• Commande2 = Commande ⋉ Client2
• Allocation
  – @Site1: Client1, Commande1
  – @Site2: Client2, Commande2
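The example fragments Commande by following the fragmentation of Client, which can be read as a semi-join of Commande with each Client fragment. The Python sketch below illustrates that reading; the sample rows, the join attribute `nclient` and the order number `ncde` are assumptions, since the slide does not list the attributes of either relation.

```python
# Hypothetical sample data; "nclient" is assumed to link Commande to Client.
client = [{"nclient": 1, "ville": "Paris"}, {"nclient": 2, "ville": "Lyon"}]
commande = [
    {"ncde": 10, "nclient": 1},
    {"ncde": 11, "nclient": 2},
    {"ncde": 12, "nclient": 1},
]


def semi_join(left, right, key):
    """Keep the tuples of `left` that have a matching `key` value in `right`."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]


client1 = [c for c in client if c["ville"] == "Paris"]
client2 = [c for c in client if c["ville"] != "Paris"]

# Each Commande fragment follows the Client fragment it refers to.
commande1 = semi_join(commande, client1, "nclient")
commande2 = semi_join(commande, client2, "nclient")

# Allocation: each site stores a Client fragment with its own orders.
allocation = {
    "Site1": {"Client1": client1, "Commande1": commande1},
    "Site2": {"Client2": client2, "Commande2": commande2},
}
```

Placing each order on the same site as its customer means a join between a customer and that customer's orders never has to cross the network.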

Updating a distributed database

Queries on distributed DBs
• First, we build the algebraic tree of the query. Every leaf of the tree represents a relation, and every internal node represents an algebraic operation. We then annotate the tree by specifying the site on which each operation must be executed.
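One possible way to represent such an annotated tree is sketched below in Python. The `Node` class, the `render` function, the relation names R1 and R2 and the sites S1 and S2 are all hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                   # relation name (leaf) or algebraic operation (internal node)
    site: Optional[str] = None   # site on which the node is stored or executed
    children: List["Node"] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented lines of the form 'label @ site'."""
    lines = ["  " * depth + f"{node.label} @ {node.site}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical query: join R1 (on S1) with a selection of R2 (on S2), join done on S1.
tree = Node("join", "S1", [
    Node("R1", "S1"),
    Node("select", "S2", [Node("R2", "S2")]),
])

lines = render(tree)
```

Here the selection is annotated with S2, the site that holds R2, so only the filtered tuples would need to travel to S1 for the join.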

• The cost of a query in a distributed database is determined by the following factors:
  – Disk input/output
  – CPU cost
  – Communication over the network
=> In a centralized database, only the I/O and CPU factors determine the cost of a query.

• Data transfer

• Data transfer, example:
  – P (NP, NOMP, MADE IN, COULEUR, POIDS)
  – F (NF, NOMF, VILLE, ADRESSE, PAYS, COEF)
  – The transmission speed is 1000 bytes of useful information per second

• Data transfer, solution of the example:
  – Size of P: 10 000 × 90 = 900 000 bytes; transmission time (P): 900 000 / 1000 = 900 s
  – Size of F: 100 × 120 = 12 000 bytes; transmission time (F): 12 000 / 1000 = 12 s
=> Communication time: 12 + 900 = 912 s
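The arithmetic of the solution can be reproduced with a small Python sketch. It assumes that the first factor in each product is the number of tuples and the second the tuple size in bytes, which the slide does not state explicitly; `transfer_time` is a helper name chosen for the illustration.

```python
def transfer_time(tuples, tuple_size_bytes, rate_bytes_per_s):
    """Seconds needed to ship a whole relation over the network."""
    return tuples * tuple_size_bytes / rate_bytes_per_s


RATE = 1000  # bytes of useful information per second, as on the slide

# Assumed reading: P has 10 000 tuples of 90 bytes, F has 100 tuples of 120 bytes.
time_p = transfer_time(10_000, 90, RATE)
time_f = transfer_time(100, 120, RATE)
total = time_p + time_f
```

With these figures, shipping P dominates the communication time, which is why a distributed plan tries to avoid moving the larger relation.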

• Processing of distributed queries

• Dynamic optimization of queries: once the query tree has been generated, it is executed bottom-up.

• Semi-join: it is in fact a double join. The principle is to perform two small joins rather than one big one. The semi-join reduces the size of the operand relations and therefore the amount of data to transfer.

• Semi-join, example: let R1 and R2 be two relations located on sites S1 and S2 respectively.
Goal: evaluate R1 ⋈ R2 on site S1.
The semi-join algorithm works as follows:
• S1> temp1 ← π[R1 ∩ R2] (R1)
• S1> send temp1 to S2
• S2> temp2 ← R2 ⋈ temp1
• S2> send temp2 to S1
• S1> R1 ⋈ temp2 (equal to R1 ⋈ R2)
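The five steps can be simulated locally in Python to check that the result equals the direct join. This is a sketch under stated assumptions: the schemas R1(a, b) and R2(b, c), the sample tuples and the helper `natural_join` are hypothetical, and the two "send" steps are only marked by comments.

```python
# Hypothetical relations: R1(a, b) on site S1, R2(b, c) on site S2; common attribute b.
r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
r2 = [{"b": 10, "c": "x"}, {"b": 40, "c": "y"}, {"b": 30, "c": "z"}]
common = ["b"]


def natural_join(left, right, attrs):
    """Join two relations on equality of the attributes in `attrs`."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in attrs)]


# S1: temp1 <- projection of R1 on the common attributes, duplicates removed.
temp1 = [dict(t) for t in {tuple((k, row[k]) for k in common) for row in r1}]
# S1 -> S2: send temp1 (only the join column travels).

# S2: temp2 <- R2 join temp1, i.e. the tuples of R2 that will find a partner in R1.
temp2 = natural_join(r2, temp1, common)
# S2 -> S1: send temp2 (a reduced version of R2).

# S1: final join, equal to R1 join R2.
result = natural_join(r1, temp2, common)
direct = natural_join(r1, r2, common)
```

In this sample only two of the three tuples of R2 are shipped back to S1; the saving grows with the number of R2 tuples that have no match in R1.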

EXERCISE
We consider the following relational DB:
• Product(Pnum, libelle, annee-fabrication, Fnum, Categorie, Px-revient)
• Supplier(Fnum, Nom, Prenom, Adresse)
• Customer(Cnum, Nom, Prenom, Specialite, contract)
• Command(Ncde, Cnum, Fnum, Quantité, Px-unitaire)
The Categorie attribute of the Product relation designates the category of each product. We distinguish essentially 4 categories: alimentary, pharmaceutical, cleaning and other.

1/ Express the following query in SQL: give the label (libelle) and the year of manufacture (annee-fabrication) of the alimentary products whose cost price is above 200.
We assume the DB is distributed over 4 sites (alimentary, pharmaceutical, cleaning, other) that match the different values of the Categorie attribute in the Product relation.
2/ Suggest and justify a decomposition of the database over the 4 sites, knowing that:
  – Every product belongs to a category
  – All customers are managed on the site "autre" (other)
  – Every supplier can supply products of all categories
3/ Specify the type of fragmentation used for each of the relations Product, Supplier, Customer and Command.
4/ Define the content of each site as a relational algebra expression, and propose an execution plan for the SQL query of question 1, if the query is launched from site …

1/ SELECT libelle, "annee-fabrication" FROM Product WHERE Categorie = 'alimentary' AND "Px-revient" > 200;
2/ Contents of the sites:
• Site Alimentary: Product-Alimentary, Supplier
• Site Pharmaceutical: Product-Pharmaceutical, Supplier
• Site Cleaning: Product-Cleaning, Supplier
• Site other: Product-other, Supplier, Customer, Command

3/ Type of fragmentation:
• Product: horizontal fragmentation
• Supplier: complete replication
• Customer: centralized
• Command: centralized
4/ Definition of the sites:
• S1: Product-Alimentary = σ[Categorie = 'alimentary'] Product; Supplier_S1 = Supplier (full copy)
• S2: Product-Pharmaceutical = σ[Categorie = 'pharmaceutical'] Product; Supplier_S2 = Supplier (full copy)
• S3: Product-Cleaning = σ[Categorie = 'cleaning'] Product; Supplier_S3 = Supplier (full copy)
• S4: Product-other = σ[(Categorie ≠ 'cleaning') ∧ (Categorie ≠ 'pharmaceutical') ∧ (Categorie ≠ 'alimentary')] Product; Supplier_S4 = Supplier (full copy); Customer (whole relation); Command (whole relation)

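4/ Execution plan for the query
SELECT libelle, "annee-fabrication" FROM Product WHERE Categorie = 'alimentary' AND "Px-revient" > 200;
• Site Alimentary (S1): π[libelle, annee-fabrication] (σ[Px-revient > 200] Product-Alimentary)
• Site other: receives the result of the rewritten query
SELECT libelle, "annee-fabrication" FROM "Product-Alimentary"@S1 WHERE "Px-revient" > 200;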
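The local part of this plan can be tried out with Python's built-in `sqlite3` module. In this sketch an in-memory database stands in for site S1, the sample rows are hypothetical, and the `@S1` site suffix is omitted because it is not SQLite syntax. Identifiers that contain a hyphen are written in double quotes so that they are valid SQL.

```python
import sqlite3

# An in-memory database stands in for site S1, which holds the alimentary fragment.
site_s1 = sqlite3.connect(":memory:")
site_s1.execute(
    'CREATE TABLE "Product-Alimentary" ('
    'Pnum INTEGER, libelle TEXT, "annee-fabrication" INTEGER, "Px-revient" REAL)'
)
# Hypothetical sample rows.
site_s1.executemany(
    'INSERT INTO "Product-Alimentary" VALUES (?, ?, ?, ?)',
    [(1, "rice", 2013, 250.0), (2, "salt", 2014, 50.0)],
)

# The fragment already satisfies Categorie = 'alimentary',
# so only the cost-price filter and the projection remain.
rows = site_s1.execute(
    'SELECT libelle, "annee-fabrication" '
    'FROM "Product-Alimentary" WHERE "Px-revient" > 200'
).fetchall()
site_s1.close()
```

Only the rows returned by this local query would have to be shipped to the site that issued the request.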

Conclusion
Distributing the data requires revisiting the notions of data storage, cataloguing techniques, query processing, concurrency control and recovery.
• Managing and synchronizing the data and the transactions becomes more complex.
• Processing can incur significant delays when some of the required data are located on a remote node.

Reference
Tin Myint Naing and Aung Win (UT(YCC), Pyin Oo Lwin, Myanmar), "Effective Refinement Heuristic For Distributed Database Partitioning Using Weka", International Journal of Computer & Communication Engineering Research (IJCCER), Volume 2, Issue 4, pages 140-144, July 2014.
