
SMILE: A Data Sharing Platform for Mobile Apps in the Cloud

SMILE: A Data Sharing Platform for Mobile Apps in the Cloud. Mohamed Sarwat (UMN), Haopeng Zhang (UMass, Amherst), Jagan Sankaranarayanan and Hakan Hacıgümüs (NEC Labs America).


Presentation Transcript


  1. SMILE: A Data Sharing Platform for Mobile Apps in the Cloud • Mohamed Sarwat (UMN) • Haopeng Zhang (UMass, Amherst) • Jagan Sankaranarayanan and Hakan Hacıgümüs (NEC Labs America)

  2. Motivation for Sharing in the Cloud • Mobile apps run their databases in the cloud • These databases are often small • They are often hosted in the same cloud infrastructure • Apps often need “fresh” data from other apps, e.g., a calendar app wants the itinerary from an airline booking app • Apps need a declarative way to share data [Figure: Sharing MiddLewarE (SMILE); Database as a Service; multitenant database holding App 1 DB, App 2 DB, ..., App n DB]

  3. Declarative Sharing • Sharing (S1): datasets drawn from CloudDB (D1, D2, D3), a transformation (SPJ, i.e., select-project-join), and a staleness SLA • Sharing (S2): … • Sharing (Sn): … [Figure: datasets D1, D2, D3 feeding a Transform step]

  4. Three Ways of Enabling Sharing • Sharing via API (web service) • Sharing using a materialized shared space (i.e., a view) • Direct sharing • What is the service provider’s cost in keeping the shared space consistent? • What requirements does materialization satisfy? [Figure: three panels, each showing App Alice Data and App Bob Data; labels: Web Service, API, SMILE, Materialized Shared Space, SQL]

  5. Sharing Example: Simple Sharing Scenario • Sharing (S1): Sources: SP, UP; Transformation: πσ(SP ⋈ UP); Staleness: <= 5 seconds • SP = Stock Price, UP = User Portfolio [Figure: SP and UP feed SP ⋈ UP, then πσ(SP ⋈ UP), kept under 5 seconds of staleness]
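
To make the declarative specification on slides 3 and 5 concrete, here is a minimal Python sketch of a sharing as a plain record. The class name `Sharing`, its field names, and the choice to store the transformation as text are all assumptions made for illustration; the slides do not show SMILE's actual input syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sharing:
    """Hypothetical record for one sharing: sources, SPJ transformation, SLA."""
    name: str
    sources: tuple           # names of the source datasets
    transformation: str      # SPJ expression, kept as text in this sketch
    staleness_sla_s: float   # maximum tolerated staleness, in seconds

    def validate(self):
        # Reject specifications that cannot describe a meaningful sharing.
        if not self.sources:
            raise ValueError("a sharing needs at least one source dataset")
        if self.staleness_sla_s <= 0:
            raise ValueError("the staleness SLA must be positive")
        return self


# The stock-price example from slide 5, written with the assumed field names.
s1 = Sharing(
    name="S1",
    sources=("SP", "UP"),
    transformation="project-select(SP JOIN UP)",
    staleness_sla_s=5.0,
).validate()
```

The point of the record is that the app states only what it wants (sources, transformation, staleness bound); where operators run and how often data moves is left to the platform.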

  6. Sharing Example (Contd.) [Figure: three alternative plans for SP ⋈ UP built from COPY and JOIN operators, one of them using a DISTRIBUTED JOIN, with different cost and staleness trade-offs: $$$ with 1-second staleness, $ with 10-second staleness, and $$ with 3-second staleness]

  7. Problem Formulation • Given n sharings S = {S1, …, Sn} • Each sharing specifies a staleness requirement in seconds, e.g., 5 seconds • Datasets are relations in an RDBMS • Datasets are updated asynchronously (i.e., independently) • Goal: enable all sharings • using materialized views (MVs) that are always consistent • with all MVs under their staleness SLA • at the cheapest cost for the service provider

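  8. SMILE System Architecture [Figure: input sharings enter the sharing plan optimizer, which emits a sharing plan; a PostgreSQL database gateway fronts the machines Postgres 1 to Postgres 4; updates to a relation R are read from the LOG as a delta ΔR (Capture Delta), and Copy Delta moves ΔR between machines]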

  9. Sharing Plan Optimizer • R*-style optimizer • Varies join ordering and operator placement • Using a dynamic programming formulation • Uses four operators to express SPJ transformations in sharings • DeltaToRel, Join, Union, CopyDelta • Two cost models: • Dollar Cost of a plan • Time Cost of a plan

  10. Sharing Plan [Figure: DELTATOREL maintains A from ΔA and B from ΔB; COPYDELTA ships ΔB and ΔA between machines m1 and m2, where two JOINs compute Δ(A⋈ΔB) and Δ(ΔA⋈B); COPYDELTA moves both results to machine m3, where UNION forms Δ(A⋈B) and DELTATOREL applies it to A⋈B]
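
The plan on slide 10 maintains A⋈B from deltas rather than recomputing the join. The following Python sketch illustrates that idea under simplifying assumptions that are not from the slides: deltas contain inserts only, relations are lists of `(key, value)` rows, and the ΔA branch reads B after ΔB has been applied so that no matching pair is missed. The function names are hypothetical, and the timestamp handling of the real system is not modelled.

```python
def join(left, right):
    """Equi-join two lists of (key, value) rows on the key."""
    return [(k, lv, rv) for (k, lv) in left for (rk, rv) in right if k == rk]


def delta_join(a_old, delta_a, b_old, delta_b):
    """Compute the delta of A JOIN B for insert-only deltas."""
    b_new = b_old + delta_b            # DeltaToRel: apply delta_b to B
    part_1 = join(delta_a, b_new)      # Join branch: delta_a with B
    part_2 = join(a_old, delta_b)      # Join branch: A with delta_b
    return part_1 + part_2             # Union of the two branch results


a_old, delta_a = [(1, "a1")], [(2, "a2")]
b_old, delta_b = [(1, "b1")], [(2, "b2"), (1, "b3")]

mv = join(a_old, b_old)                           # materialized view before the update
mv += delta_join(a_old, delta_a, b_old, delta_b)  # DeltaToRel on the view itself
```

After the update, `mv` holds the same rows as a full recomputation of the join over the updated relations, while each branch touched only one delta.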

  11. Cost Models: Dollar and Time • Dollar cost is the expense to the provider of executing the sharing plan, in $/second; it uses Amazon EC2 pricing • Time cost is the critical data path time, in seconds; it uses a synthetic time model for each operator type [Figure: $ versus staleness]

  12. Time Cost Model • We use a simple linear cost model to estimate the time taken by each operator: CopyDelta, DeltaToRel, Join, Union
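
As an illustration of a linear per-operator time model and of the critical time path it feeds, here is a small Python sketch. The coefficients in `COEFFS` are made-up placeholders, not the values fitted by the authors, and the plan encoding (a dict from node name to operator, tuple count, and inputs) is an assumption of this sketch.

```python
# (fixed seconds, seconds per tuple) for each operator type.
# These numbers are invented for illustration only.
COEFFS = {
    "CopyDelta": (0.05, 2e-5),
    "DeltaToRel": (0.02, 1e-5),
    "Join": (0.10, 5e-5),
    "Union": (0.01, 5e-6),
}


def op_time(op, n_tuples):
    """Linear model: time = fixed overhead + per-tuple cost * tuples."""
    fixed, per_tuple = COEFFS[op]
    return fixed + per_tuple * n_tuples


def critical_path(plan, node):
    """Longest time from any leaf to `node` in a plan DAG.

    `plan` maps node name -> (operator, n_tuples, list of input nodes).
    """
    op, n_tuples, inputs = plan[node]
    upstream = max((critical_path(plan, i) for i in inputs), default=0.0)
    return upstream + op_time(op, n_tuples)


# Two copy-then-join branches running in parallel, merged by a union.
plan = {
    "copy1": ("CopyDelta", 1000, []),
    "join1": ("Join", 1000, ["copy1"]),
    "copy2": ("CopyDelta", 4000, []),
    "join2": ("Join", 4000, ["copy2"]),
    "union": ("Union", 5000, ["join1", "join2"]),
}
path_seconds = critical_path(plan, "union")
```

Because the branches run in parallel, only the slower one contributes to the critical path, which is the quantity compared against a sharing's staleness SLA.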

  13. Generating Global Sharing Plan • Input: set of n sharings • Step 1: For each sharing, generate a sharing plan that is admissible, meaning that its critical time path is less than the staleness SLA • Generate two plans • DPD: cheapest dollar cost plan • DPT: smallest critical time path plan • Discard a plan if it is not admissible, but choose DPD if both are admissible • Step 2: Make the plan cheaper by merging commonalities with other sharing plans, in the style of multi-query optimization • We call this merging operation “plumbing”
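
Step 1 above can be sketched as a small selection rule. The `Plan` record and the function names below are hypothetical; the sketch assumes each candidate plan already carries its dollar cost and critical time path, and it returns `None` when neither candidate meets the SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    dollar_cost: float     # provider cost, in $/second
    critical_time: float   # critical time path, in seconds


def is_admissible(plan, sla_s):
    """A plan is admissible if its critical time path is below the SLA."""
    return plan.critical_time < sla_s


def choose_plan(dpd, dpt, sla_s):
    """Prefer the cheapest plan (DPD); fall back to the fastest (DPT)."""
    if is_admissible(dpd, sla_s):
        return dpd
    if is_admissible(dpt, sla_s):
        return dpt
    return None  # no admissible plan for this sharing


dpd = Plan(dollar_cost=1.0, critical_time=8.0)   # cheap but slow
dpt = Plan(dollar_cost=3.0, critical_time=2.0)   # fast but expensive
chosen = choose_plan(dpd, dpt, sla_s=5.0)
```

With a 5-second SLA the cheap plan is too slow, so the faster plan is chosen; with a looser SLA the cheaper plan would win.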

  14. Plumbing Operation • Plumbing increases the critical time path of the left plan, so it is valid only as long as the left plan stays under its staleness SLA • Perform plumbing greedily, one operation at a time, starting with the one that yields the most cost savings [Figure labels: Remove, SRC(pi), pi, DST(pi), COPY DELTA, JOIN]
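
The greedy rule on slide 14 can be sketched as follows. This is a simplification, not the authors' algorithm: each candidate merge is assumed to come with a fixed dollar saving and a fixed increase in one plan's critical time, whereas a real optimizer would likely recompute savings after every merge. All names are hypothetical.

```python
def greedy_plumbing(candidates, critical_time, sla):
    """Apply merges in order of decreasing saving while SLAs still hold.

    candidates:    list of (name, dollar_saving, affected_plan, added_seconds)
    critical_time: dict plan -> current critical time path in seconds
    sla:           dict plan -> staleness SLA in seconds
    """
    times = dict(critical_time)   # work on a copy, leave the input untouched
    applied = []
    by_saving = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, saving, plan, added in by_saving:
        if saving <= 0:
            continue              # merging must actually reduce cost
        if times[plan] + added < sla[plan]:
            times[plan] += added  # the affected plan gets slower
            applied.append(name)
    return applied, times


candidates = [
    ("p1", 0.5, "S1", 2.0),
    ("p2", 0.9, "S1", 3.0),
    ("p3", 0.2, "S2", 1.0),
]
applied, times = greedy_plumbing(
    candidates,
    critical_time={"S1": 1.0, "S2": 4.5},
    sla={"S1": 5.0, "S2": 5.0},
)
```

In this example only the highest-saving merge is applied: after it, the remaining merges would push their plans past the 5-second SLA and are skipped.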

  15. SMILE System Architecture [Figure: the slide 8 architecture with runtime components added; new labels: Sharing Executor, Pub/Sub, Agent (four instances), Heartbeat, Push; updates to R are still read from the LOG as ΔR (Capture Delta) and moved by Copy Delta across Postgres 1 to Postgres 4]

  16. Sharing Executor • Accounts for runtime variations in the system • Changes in the input update rate • Machine or resource contention or unavailability • Obtains the current timestamp of each vertex and issues “push” operations • A push operation specifies how much to “synchronously” advance the timestamp of each vertex in the sharing plan • Tries to combine work as much as possible • Uses a feedback loop to automatically account for runtime variations

  17. Staleness and Push • Current STALENESS = MAX_TS(SRCS) - TS(DEST) • PUSH: how much to advance TS(DEST)? • Cannot be more than MIN_TS(SRCS) - TS(DEST) • See the paper for a sharing executor that is lazy by design and refreshes MVs just as they are about to miss the staleness SLA
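
The two quantities on slide 17 translate directly into code. In this sketch, timestamps are plain integers in seconds and the function names are hypothetical; clamping the push bound at zero is an added assumption for the case where a source lags behind the destination.

```python
def staleness(src_timestamps, dest_timestamp):
    """Current staleness: newest source timestamp minus the MV timestamp."""
    return max(src_timestamps) - dest_timestamp


def max_push(src_timestamps, dest_timestamp):
    """Upper bound on how far a push may advance the MV timestamp.

    The MV cannot move past the slowest source, so the bound is
    MIN_TS(SRCS) - TS(DEST), clamped at zero in this sketch.
    """
    return max(0, min(src_timestamps) - dest_timestamp)


sources = [105, 103]   # timestamps of the source vertices
dest = 100             # timestamp of the destination MV
current = staleness(sources, dest)
bound = max_push(sources, dest)
```

Staleness is measured against the most advanced source, while the push bound is set by the least advanced one, so a push cannot always remove all of the current staleness.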

  18. Experiments • Twitter GardenHose stream • 6 machines • One machine generates updates and hosts the base relations • 5 machines host sharing plan operators • Rate: 50 to 10k tweets/sec • Sharings: 5 to 50 • SLA: 10 to 60 seconds

  19. Base relations • Unpack incoming Tweets into 9 base relations

  20. Sharing Arrangements • 25 sharing arrangements as SPJ transformations on base relations

  21. Sharing Plan: 25 Sharings, 6 Machines

  22. Staleness: 10k tweets/sec, 25 Sharings

  23. Why Do Some Sharings Have a Large Gap?

  24. Tuples Moved across Sharing Plan

  25. Staleness before vs. after push

  26. For Varying Update Rates • SLA violation is low even for large update rates

  27. Actual Running Cost

  28. Related Work • View Maintenance • View Selection • Cache Placement • Data Quality/Staleness • Data Integration • Distributed Databases • Multi-query Optimization • Other data sharing efforts

  29. View Maintenance • When sources are not always at a consistent snapshot • Need to use compensation [Zhuge et al., SIGMOD 1995] • Rolling join [Salem et al., SIGMOD 2000] • Shows how to compose n-way asynchronous propagation queries • The sharing plan is based on this work • How to reduce maintenance cost? • Merge common sub-expressions in the update mechanisms of different MVs to reduce cost [Ross et al., SIGMOD 1996] [Mistry et al., SIGMOD 2001] • Staleness in a data warehouse setup [Labrinidis et al., UMD CS TR, 1998]

  30. Summary • SMILE is a declarative data sharing platform in the cloud • Sharings can specify a transformation and a staleness SLA • SMILE uses both static and runtime optimizations • Experimental results show that it can handle high update rates and a large number of sharings
