
DOLLY: Virtualization-Driven Database Provisioning for the Cloud

VEE 2011 paper available at: http://lass.cs.umass.edu/papers.html. Emmanuel Cecchet. Joint work with Rahul Singh, Upendra Sharma and Prashant Shenoy.


Presentation Transcript


  1. DOLLY: Virtualization-Driven Database Provisioning for the Cloud. Emmanuel Cecchet. Joint work with Rahul Singh, Upendra Sharma and Prashant Shenoy. VEE 2011 paper available at: http://lass.cs.umass.edu/papers.html

  2. PROVISIONING IN THE CLOUD • Virtualization • No shared disk • Provisioning based on volume and machine load [Diagram labels: Internet, Frontend/Load balancer, App. servers, Provisioning logic, Databases]

  3. WHY IS IT HARD TO ADD A DB REPLICA?
     • Administration issues: install/set up the new database; back up and restore the database content; configure the new replica; synchronize live content
     • How much time does it take? It depends on the database size and on the backup/restore technique: from minutes to hours…
     • Dolly: VM cloning to blackbox the database

  4. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  5. SPAWNING A REPLICA
     • Multi-master middleware-based replication
     • Three main phases: backup, restore, replay
     [Diagram labels: add replica, backup, restore snapshot, resynchronize; Client SQL requests, Management console, Replication middleware, Transactional log, Load balancer, DB1, DB2, new replica]

  6. SPAWNING A REPLICA WITH CLONING
     • Backup & restore are replaced by VM cloning
     [Diagram labels: add replica, clone, resynchronize; VM1, VM2, VM3 (each with its own OS); Client SQL requests, Management console, Replication middleware, Transactional log, Load balancer, DB1, DB2]

  7. Cloning: Backup/Restore in constant time • Filesystem snapshot/copy is DB agnostic • Only depends on VM size

  8. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  9. MODELING SPAWNING TIME
     • Predictable backup and restore times are required
     • Replay time can be estimated from the current write throughput (wt)
     [Figure: timeline of replica spawning time, made of backup (b_i), restore (r_i) and replay time for the updates]
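
The model on this slide can be illustrated with a small sketch. The slide only says that replay time is estimated from the current write throughput; the constant `replay_throughput` (how fast a new replica applies logged updates) and the catch-up formula below are assumptions made for illustration, not the paper's exact model. All function and parameter names are hypothetical.

```python
def estimate_replay_time(backlog_updates, write_throughput, replay_throughput):
    """Seconds needed to replay a backlog while new writes keep arriving.

    Assumption: the replica applies updates at a constant replay_throughput
    (updates/s) and clients keep writing at write_throughput (updates/s).
    """
    if replay_throughput <= write_throughput:
        raise ValueError("replica can never catch up with the write load")
    return backlog_updates / (replay_throughput - write_throughput)


def estimate_spawn_time(backup_s, restore_s, write_throughput,
                        replay_throughput, snapshot_age_s=0.0):
    """Spawning time = backup + restore + replay, as on the slide.

    Updates accumulate while the snapshot ages and during backup/restore.
    Use backup_s=0 when spawning from an existing snapshot.
    """
    backlog = write_throughput * (snapshot_age_s + backup_s + restore_s)
    replay_s = estimate_replay_time(backlog, write_throughput, replay_throughput)
    return backup_s + restore_s + replay_s
```

For example, with a 60 s backup, a 120 s restore, 10 writes/s and a replay rate of 100 updates/s, the backlog is 1800 updates, the replay takes 20 s, and the whole spawn takes 200 s.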

  10. WHEN TO SNAPSHOT?
     • Time to spawn from a live replica
     • Time to spawn from an existing snapshot
     • It is faster to take a new snapshot j to spawn a new replica than to use an old snapshot i if: backup_j + restore_j < restore_i + replay_i
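
The inequality above translates directly into a predicate. This is a minimal sketch; the function name and the choice of seconds as the unit are assumptions.

```python
def should_take_new_snapshot(backup_j, restore_j, restore_i, replay_i):
    """True when taking a new snapshot j beats reusing old snapshot i.

    Implements the slide's condition:
        backup_j + restore_j < restore_i + replay_i
    All arguments are durations in the same unit (seconds here).
    """
    return backup_j + restore_j < restore_i + replay_i


# Old snapshot with a long replay (300 s): a fresh snapshot wins.
print(should_take_new_snapshot(60, 120, 120, 300))
# Old snapshot with a short replay (30 s): keep using it.
print(should_take_new_snapshot(60, 120, 120, 30))
```

With constant-time cloning, the backup and restore terms stay roughly fixed, so the decision is driven mostly by how much replay the old snapshot has accumulated.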

  11. DOLLY OVERVIEW
     • Input: capacity prediction, write prediction
     • Output: schedule of snapshots, schedule of replica spawning, admission control if needed
     [Architecture diagram labels: Predictors (write predictions, capacity predictions); Dolly (Snapshot scheduler, Capacity provisioning, HA adjuster, Spawning options, Write throttling, Paused pool cleaner); Scheduler (start/stop, clone/snapshot, write throttling/read throttling, delete VM/snapshot); Admission control; Management API; Monitoring; Free pool manager (reclaim)]

  12. PROVISIONING REPLICAS • Dolly does not provide predictors • Dolly can work with any predictor (see [Eurosys09]) [Diagram labels: Workload prediction, Capacity prediction, Write prediction]

  13. CLOUD COST FUNCTIONS • Adapt the provisioning decisions to the cloud platform specifics • Cost can be dollars on a public cloud or time on a private cloud

  14. PROVISIONING REPLICAS
     • Parse capacity provisioning predictions
     • Decrease capacity by pausing VMs
     • Increase capacity:
       - Check if we can reuse a paused VM
       - Check if we can spawn from an existing snapshot
       - Choose the cheapest option according to the spawn_cost function
     • Perform admission control if all replicas cannot be provisioned in time
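
The capacity-increase steps above can be sketched as a selection over candidate spawning options. The `SpawnOption` record, its fields, and the convention of returning `None` to signal that admission control is needed are all assumptions for illustration; only the idea of a pluggable `spawn_cost` function comes from the slide.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SpawnOption:
    kind: str          # hypothetical: "resume_paused_vm" or "spawn_from_snapshot"
    source_id: str     # paused VM or snapshot the replica would come from
    ready_in_s: float  # estimated time until the replica can serve requests


def choose_spawn_option(options: List[SpawnOption],
                        deadline_s: float,
                        spawn_cost: Callable[[SpawnOption], float]
                        ) -> Optional[SpawnOption]:
    """Pick the cheapest option that is ready before the deadline.

    Returns None when no option can be provisioned in time, in which
    case the caller would fall back to admission control.
    """
    feasible = [o for o in options if o.ready_in_s <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=spawn_cost)
```

Passing `spawn_cost` as a parameter keeps the selection logic independent of the platform, so the same routine can rank options by time on a private cloud or by dollars on a public one.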

  15. SNAPSHOT SCHEDULING
     • How to snapshot? Clone a paused VM, or pause an active VM to clone it
     • When to snapshot? At time j when backup_j + restore_j < restore_i + replay_i
     • If a new snapshot is scheduled, re-run capacity provisioning
     • The prediction window must have a minimum size

  16. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  17. IMPLEMENTATION
     • C-JDBC/Sequoia replication middleware
     • OpenNebula cloud management middleware
     • Cost functions: private cloud, minimize resource utilization time; Amazon EC2, minimize cost
     [Architecture diagram labels: TPC-W load injector, Sequoia driver (SQL requests, write throttling, admission control), Sequoia controller (Load balancer, Recovery Log with Dump table and Log table, Backupers, JMX Management API), Dolly (predictions, Scheduler, add/remove replica, snapshot/pause/…), OpenNebula (start/stop/clone/…), Private cloud and EC2, VM1 to VM5 and VMclone each with an OS, DB1, DB2, DB3, DB3 snapshot, new replicas, Backup server or NAS]

  18. IMPLEMENTATION – COST FUNCTIONS Private cloud: minimize resource utilization Amazon EC2: minimize cost
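
A minimal sketch of what two such cost functions could look like. The slides do not give the formulas: the per-started-hour billing rule and the price used below are assumptions chosen only to illustrate why the two platforms can lead to different decisions.

```python
import math


def private_cloud_cost(busy_seconds):
    """Private cloud: cost is resource utilization time, in seconds."""
    return busy_seconds


def ec2_cost(busy_seconds, hourly_price_usd):
    """Public cloud: cost in dollars.

    Assumption: every started instance-hour is billed in full,
    so usage is rounded up to whole hours.
    """
    return math.ceil(busy_seconds / 3600) * hourly_price_usd


# Ten minutes of use: cheap in time, but a full hour in dollars.
print(private_cloud_cost(600))   # 600
print(ec2_cost(600, 0.5))        # 0.5
```

Under this assumed billing rule, stopping an instance after ten minutes saves time-based cost but no money, so a dollar-based cost function may prefer to leave it running.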

  19. WORKLOAD DESCRIPTION • TPC-W online bookstore e-commerce benchmark • Snapshot s0 available at t0

  20. Overprovisioning with 6 replicas – 1h snapshot

  21. Reactive provisioning – 2h snapshot [Figure annotations: replicas available; replica spawning triggered here]

  22. Dolly – 30m Prediction Window [Figure panels: Private Cloud, Amazon EC2; snapshots s1 and s2 marked; annotation: cheaper to leave instances online]

  23. CONCLUSION
     • VM cloning: solves administration issues by blackboxing the database; constant-time backup/restore is needed to predict replica spawning time
     • New provisioning algorithm: decouples capacity provisioning from snapshot scheduling; cost functions optimize for cloud platform specifics
     • VEE 2011 paper available at: http://lass.cs.umass.edu/papers.html

  24. Bonus Slides

  25. Reactive provisioning – 15m snapshot

  26. Reactive provisioning – 1h snapshot

  27. Dolly – 10m Prediction Window [Figure panels: Private Cloud, Amazon EC2; snapshots s1 and s2 marked in each panel]

  28. SPAWNING IN A PUBLIC CLOUD
     • Storage is decoupled from the computing resource
     • Starting a new instance clones the volume
     [Diagram labels: snapshot, stop, register, start, restart; volumes Vol1 to Vol4; OS, DB1, DB2]

  29. BACKUP/RESTORE TECHNIQUES
     • Database native tools: vendor specific or 3rd-party ETL; understand database semantics
     • Filesystem copy: low-level data copy; need to know what to copy
     • VM cloning: copies database content + configuration + OS; unused space can be compressed

  30. DATABASE SIZES

  31. BACKUP/RESTORE PERFORMANCE (1/3) Performance depends on database content

  32. BACKUP/RESTORE PERFORMANCE (2/3) • File copy is the most effective for small databases

  33. BACKUP/RESTORE PERFORMANCE (3/3) • VM cloning most effective on large databases

  34. BACKUP/RESTORE SUMMARY

  35. DOLLY MAIN ALGORITHM
     • Capacity provisioning depends on available snapshots
     • Snapshots are scheduled according to capacity demand
     • Decouple capacity provisioning from snapshot scheduling

     if (predictor.capacity_changes || predictor.write_workload_changes) {
       do {
         capacity_schedule = capacity_provisioning(predictions)
         snapshot_schedule = snapshot_scheduling(predictions)
       } while (snapshot_schedule schedules new snapshots)
       scheduler.schedule(snapshot_schedule)
       scheduler.schedule(capacity_schedule)
     }
     if (time since last operation > threshold) {
       paused_pool_cleaner.release_old_paused_vms();
       paused_pool_cleaner.delete_old_snapshots();
     }
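
The do/while loop in the pseudocode is a fixed-point iteration: the two phases alternate until snapshot scheduling proposes nothing new. Below is a minimal runnable sketch of that loop. The function signatures, the list of known snapshots passed between phases, and the `max_rounds` safety bound are assumptions added for illustration and are not taken from the slides.

```python
def run_provisioning_rounds(predictions, capacity_provisioning,
                            snapshot_scheduling, max_rounds=10):
    """Alternate the two phases until no new snapshot is scheduled.

    capacity_provisioning(predictions, snapshots) -> capacity schedule
    snapshot_scheduling(predictions, capacity_schedule, snapshots)
        -> list of newly scheduled snapshots (empty when stable)
    Both callables are hypothetical stand-ins for Dolly's phases.
    """
    known_snapshots = []
    for _ in range(max_rounds):
        capacity_schedule = capacity_provisioning(predictions, known_snapshots)
        new_snapshots = snapshot_scheduling(
            predictions, capacity_schedule, known_snapshots)
        if not new_snapshots:
            # Stable: the capacity schedule matches the snapshot set.
            return capacity_schedule, known_snapshots
        # New snapshots change the spawning options, so re-run provisioning.
        known_snapshots = known_snapshots + new_snapshots
    raise RuntimeError("schedules did not stabilize within max_rounds")
```

A toy run: if capacity provisioning always spawns from the newest snapshot and snapshot scheduling asks for one snapshot `s1` when none exists, the loop stabilizes in two rounds with replicas spawned from `s1`.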

  36. RELEASING RESOURCES
     • Paused VMs: a VM is never re-used if its cost to resume > cost to spawn from the last snapshot
     • Snapshots: old snapshots can be released based on the cost to keep them around
     • Free server pool: can reclaim servers with paused VMs when the pool is empty
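
The paused-VM rule can be sketched as a simple filter. The dictionary of per-VM resume costs and the function name are hypothetical; the comparison itself is the one stated on the slide.

```python
def paused_vms_to_release(resume_costs, spawn_from_snapshot_cost):
    """Return the paused VMs that will never be re-used.

    resume_costs maps a VM id to its estimated cost to resume
    (for example, resume time plus replay of missed updates).
    A VM is released when resuming it costs more than spawning
    a fresh replica from the last snapshot.
    """
    return sorted(vm for vm, cost in resume_costs.items()
                  if cost > spawn_from_snapshot_cost)


print(paused_vms_to_release({"vm1": 50, "vm2": 200}, 100))  # ['vm2']
```

Both costs must be expressed by the same cost function (time or dollars) for the comparison to be meaningful.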

  37. TPC-W EVALUATION • Multi-tier online bookstore benchmark • 4GB VM for the database • Large EC2 instances from EBS volumes, with CloudWatch

  38. PHASE 1 – CAPACITY PROVISIONING
     • Pause VMs when capacity decreases
     • Resume VMs when capacity increases
     • Spawn VMs from a snapshot when additional capacity is required
     • With EC2, it is not cost effective to pause VMs for less than 1 hour

  39. PHASE 2 – SNAPSHOT SCHEDULING
     • To spawn 3 replicas at d3, it is faster (but more costly) to spawn a new replica and take a new snapshot from it
     • For d5, it is faster to snapshot a paused VM and spawn the replica from it
     • On EC2, volume storage is more expensive than I/O

  40. PHASE 3 – CAPACITY PROVISIONING • Schedule new replica spawning using the new snapshots • The next snapshot scheduling does not generate new snapshots, so the algorithm terminates

  41. FINAL SCHEDULING • Different scheduling for private and public clouds • Minimize resource (energy) usage for the private cloud • Minimize cost for the public cloud
