1 / 41

Architecture of Cloud Computing and Distributed Database Systems

Architecture of Cloud Computing and Distributed Database Systems. Didem Gündoğdu 23.12.2011. CMPE 422. 1. Why everybody wants the Cloud?. What is Cloud Computing?. • Internet computing – Computation done through the Internet – No concern about any maintenance or management

sheera
Download Presentation

Architecture of Cloud Computing and Distributed Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Architecture of Cloud Computing and Distributed Database Systems Didem Gündoğdu 23.12.2011 CMPE 422 1

  2. Why everybody wants the Cloud?

  3. What is Cloud Computing? • Internet computing – Computation done through the Internet – No concern about any maintenance or management of actual resources • Shared computing resources – As opposed to local servers and devices • Comparable to Grid Infrastructure • Web applications • Specialized raw computing services

  4. Why Cloud Computing? • Costs • Independency • Scalability • Redundancy and Disaster Recovery

  5. Cloud Service Taxonomy? • Layer – Software-as-a-Service (SaaS) – Platform-as-a-Service (PaaS) – Infrastructure-as-a-Service (IaaS) – Data Storage-as-a-Service (DaaS) – Communication-as-a-Service (CaaS) – Hardware-as-a-Service(HaaS) • Type – Public cloud – Private cloud – Inter-cloud

  6. The Software Stack and the Cloud SaaS:Softwareas a Service PaaS:Platformas a Service IaaS:Infrastructureas a Service Application ApplicationPlatform Infrastructure „old-style“ applications: everything local IaaS: infrastructure in the cloud, everything else „local“ PaaS: application development & runtime in the cloud SaaS: software „as is“ from the cloud

  7. Service Value to Users Value visibility to end users Salesforce.com,Google docs, Emailservices SaaS users Microsoft Azure,Google App Engine PaaS Application developers IaaS Amazon Elastic Compute Cloud(EC2), Amazon Simple StorageService (S3) Network admins Vision: Use applications just as we use electrical power

  8. SaaS: Software as a Service Service Provider hosts application that multiple customers access over the Internet Sales, marketing, support, payroll, … Goal: reduce total cost of ownership (TCO) Hardware and software („capital expenditures“) Bandwidth, personnel, electricity („operational exp.“) Most useful for small/medium businesses without own data center (and without knowledge in this area)

  9. Variants of Scalability Handle large data sets and many data sets Scale up: big iron Scale out: commodityhardware Scale in: multiple apps/tenants/VMs on singlemachine

  10. Scale Out: Pros and Cons Pros: more performance for the same cost (hardware, energy) Simple failure handling (replace broken machines, impact of failure is small) Easy to incrementally adjust capacity Easy to incrementally upgrade Cons: Frequent load balancing and capacity adjustment necessary (over-provisioning, move distributed data) Large data sets distributed over many machines (reads require good distribution of work or many network requests; writes require transactions)

  11. Scale In: Multi-Tenancy Run application for multiple clients („tenants“) on the same machine/in the same process Only makes sense if enough tenants can be served from one machine Reduces CapEx: resource utilization increased, fewer/smaller hardware required Reduces OpEx: Fewer processes/machines to manage Where is multi-tenancy feasible?

  12. Shared Machine One dedicated database process for each tenant Does not scale beyond a few tenants per server (overhead!) Applications with small number of tenants

  13. Shared Process Single database process manages several tenants (in separate table spaces) Scales to many tenants (regarding overhead) Migration of a tenant to a different process is simple (moving all data files)

  14. Shared Table Data from many tenants in the same table Add tenant_id column, must be set for each access (by application or by dbms) But: sometimes tenants require additional columns Extend schema with generic columns; database needs to efficiently support rows with many NULL values Advantage: everything is pooled Processes, memory, connections, prepared statements New tenants can be created by DML (not DDL) operations Disadvantage: isolation is very weak Query optimization, statistics, cost estimation more difficult (data from other tenants can influence estimates) Table scans more expensive, caching etc not as effective Tenant migration by extracting data from the operational system

  15. Schema Flexibility in SaaS Each tenant uses common base schema plus optional (private or shared) extensions All mapped into one (multi-tenant) physical schema by a dedicated Mapper in the database Schema evolution must be possible during operation, without intervention by DBA

  16. Schema Management Two alternatives: Database manages the schema (evolution through DDL operations): Private tables Extension tables Sparse columns Application manages the schema (evolution through DML operations) XML Columns Universal tables Pivot tables Running example:simple account table Account

  17. Private Tables Account_3 Account_2 Account_1 • Each tenant uses private set of tables • Works well for small number of tables and tenants • Constant overhead per table • each table is small, storage space not used effectively Automotive Domain Healthcare Domain

  18. Private Tables – Query Transformation • Tenant 1: SELECT Beds FROM AccountWHERE Hospital=‘State‘ • SELECT Beds FROM Account_1 WHERE Hospital=‘State‘ • Tenant 2: SELECT Name FROM AccountWHERE Dealers>50 • SELECT Name FROM Account_2WHERE Dealers>50 Account_1 Account_2

  19. Extension Tables • Keep common data in base table with explicit Tenant-ID and row-ID • Each tenant‘s extension gets its own table, join on tenant-ID and row-ID at query time • Number of table still proportional to number of tenants (but additional tables are rather small) Account_Healthcare Account Account_Automotive

  20. Extension Tables – Query Transformation Account • Tenant 1: SELECT Name,BedsFROM AccountWHERE Hospital=‘State‘ • SELECT A.Name,H.BedsFROM Account A,Account_Healthcare H WHERE A.Tenant=1 AND H.Tenant=1 AND A.Row=H.Row AND H.Hospital=‘State‘ Account_Healthcare

  21. Sparse Columns • Each tuple has only a few out of many possible attributes (e.g., catalogues)  many NULL values • Store only attributes with values to avoid NULLs • store values with their attribute (identifier) • not widely supported in systems (Microsoft SQL Server) • Add extensions as „sparse column“ to tables • Schema evolution requires DDL Account

  22. Sparse Columns – Query Transformation System retrieves attribute ID andextracts values from SPARSE column • CREATE TABLE Account (Tenant INT, Account INT, Name VARCHAR(100),Hospital VARCHAR(100) SPARSE,Beds INT SPARSE, Dealer INT SPARSE) • Tenant 1: SELECT Name,BedsFROM AccountWHERE Tenant=1 AND Hospital=‘State‘ Account

  23. Universal Table • Use wide table with generic VARCHAR columns • Application-specific mapping • Requires casting of non-textual values • Many NULL values, usually no indexes possible • Used with huge number of extensions and many tenants Universe

  24. Universal Table – Query Transformation • Tenant 1: SELECT Name,BedsFROM AccountWHERE Hospital=‘State‘ • SELECT Col2, TO_INTEGER(Col4)FROM UniverseWHERE Tenant=1 AND Table=1 AND Col3=‘State‘ Universe

  25. Pivot Tables • Store data in 3-ary tables with column_ids and values • One tuple for each non-NULL attribute of original table • One pivot table for each type (int, string, …) • Eliminates the problem of many NULL values • No casts necessary, indexing possible • Google BigTable Pivot_Int Pivot_String

  26. Pivot Tables – Example Account_3 Account_2 Account_1 Automotive Domain Healthcare Domain Pivot_Int Pivot_String

  27. Pivot Tables – Query Transformation Pivot_Int • Tenant 1: SELECT BedsFROM AccountWHERE Hospital=‘State‘ • SELECT I.IntFROM Pivot_Int I,Pivot_String S WHERE I.Tenant=1 AND S.Tenant=1 AND S.Table=0 AND S.Col=3 AND I.Table=0 AND I.Col=4 AND I.Row=S.Row AND S.String=‘State‘ Pivot_String

  28. Pivot Tables – Query Transformation Pivot_Int • Tenant 1: SELECT Name,BedsFROM AccountWHERE Hospital=‘State‘ • SELECT S1.String,I.IntFROM Pivot_Int I,Pivot_String S1, Pivot_String S2 WHERE I.Tenant=1 AND S1.Tenant=1AND S2.Tenant=1ANDS1.Table=0 AND S1.Col=2 AND S2.Table=0 AND S2.Col=3 AND I.Table=0 AND I.Col=4 AND I.Row=S2.RowANDS1.Row=S2.Row AND S2.String=‘State‘ Pivot_String

  29. Google BigTable Group columns into column families Explicitly defined Not too many (hundreds), should be stable Expected type, but internally stored as strings Roughly correspond to pivot table Data in a family compressed and stored together Columns Created on the fly, unbounded number

  30. Google BigTable BigTable One column family for logical table „Account“

  31. Potential Problems with Multi-Tenancy Resource contention among tenants Main problem: malicious and bad requests Impose limit on resource consumption of requests SLAs Access control among tenants Dangerous to rely only on application (bugs!) Carefully implement Mappers or use tuple-level ACLs (expensive and not always available) Moving data for individual tenants Load balancing, archiving (, backup) Requires expensive queries at runtime

  32. Different Cloud Computing Layers MS Live/ExchangeLabs, IBM, Google Apps; Salesforce.com Quicken Online, Zoho, Cisco Application Service (SaaS)‏ Google App Engine, Mosso, Force.com, Engine Yard, Facebook, Heroku, AWS Application Platform 3Tera, EC2, SliceHost, GoGrid, RightScale, Linode Server Platform Storage Platform Amazon S3, Dell, Apple, ...

  33. Technical Issues • Virtualization Security • Reliability • Monitoring • Manageability

  34. Virtualization Security (1/2) • Virtualization – Abstracting the underlying resources so that multiple operating systems can be run on a single physical environment simultaneously – Improving resource utilization by sharing available resources to multiple on demand needs

  35. Virtualization Security (2/2) • Virtualization security – Including the standard enterprise security policies on access control, activity monitoring and patch management • Many IT people still believe that the hypervisor and virtual Machines are safe – Becoming one of the important factors when virtualization technologies move into the cloud • Access control and monitoring of the virtual infrastructure will be on top of providers’ mind

  36. Reliability One of the biggest factors in enterprise adoption – Almost no SLAs provided by the cloud providers today • Enterprises cannot reasonably rely on the cloud infrastructures/platforms to run their business • Amazon said that “AWS (Amazon Web Service) only provides SLA for their S3 service” – Hard to imagine enterprises signing up cloud computing contracts without SLAs clearly defined • Some startup coming up with clever idea to provide SLA as a third party vendor • Cloud providers to grow/wake up and actually do something to encourage the enterprise adoption

  37. Monitoring • Critical to any IT for performance, availability and payment • Monitoring in Cloud Computing – How much CPU or memory the machines are using • CPU and memory usage are misleading most of the time in virtual environments – Only real measurements: how long your transactions are taking and how much latencies there are • High Availability’s article on latency – Amazon finding every 100ms of latency cost them 1% in sales – Google finding an extra .5 seconds in search page generation time dropped traffic by 20%

  38. Managebility • Most of IaaS/PaaS providers being raw infrastructures and platforms that do not have great management capabilities • Auto-scaling: example of missing management capabilities for cloud infrastructures – Amazon EC2 claiming to be elastic; however, it really means that it has the potential to be elastic – Amazon EC2 not automatically scaling your application as your server becomes heavily loaded

  39. Summary: Cloud Computing New paradigm to provide services (S,P,I) SaaS important part of cloud computing Different forms of scaling: up, out, in Re-use processes: multiple tenants on single database instance (but: isolation?) Map many logical schemas to a single physical schema, many mapping techniques(but: query transformation?)

  40. References References 1.Architecture and Design of Distributed Database Systems, WAMDM Cloud Computing Group Haiping Wang 2.Eom, Hyeonsang ,School of Computer Science & Engineering Seoul National University 3. Distributed Database Systems , Grid and Cloud, Katja Hose, Ralf Schenkel 4. Introduction to Cloud Computing and Technical Issues , Eom, Hyeonsang School of Computer Science & Engineering Seoul National University 5. Cloud Computing Databases: Latest Trends and Architectural Concepts , Tarandeep Singh, Parvinder S. Sandhu 6. Cloud Computing & Databases , Mike Hogan, Scale DB INC.

  41. Thank you!!!

More Related