
Megastore: Providing Scalable Highly Available Storage for Interactive Services.


Presentation Transcript


  1. Megastore: Scalable, Highly Available Storage for Interactive Systems. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, Vadim Yushprakh. CIDR 2011. Presented by Ajit Padukone

  2. Megastore - Agenda • Motivation • Design of Megastore • Data Model • Data Storage • Transactions and Concurrency Control • How Megastore achieves Availability and Scalability. • PAXOS. • Megastore’s approach.

  3. Megastore - Motivation • Storage requirements of today’s interactive online applications. • Scalability. • Rapid Development. • Responsiveness (Low Latency). • Durability and Consistency. • Fault Tolerance. • These requirements are in conflict!

  4. Megastore - Motivation • Available systems • Relational DBMS – Rich set of features; an expressive language helps development, but difficult to scale. E.g., MySQL, PostgreSQL, MS SQL Server, Oracle RDB. • NoSQL Systems – Highly scalable, but with a limited API and loose consistency models. E.g., Google’s BigTable, Apache HBase, Apache Cassandra (originally developed at Facebook). • Megastore blends the scalability of NoSQL with the convenience of a traditional RDBMS.

  5. Megastore – Data Model • API Design Requirements • Predictable runtimes rather than expressiveness. • Reads dominate writes. • Storing and querying hierarchical data is easier in BigTable. • Hence • Data is not normalized but stored hierarchically. • Joins are not supported; they have to be implemented in application code.

  6. Megastore – Data Model • Between abstract tuples of RDBMS and concrete row-column storage of NoSQL. • Tables are entity group root tables or child tables. • Entity Group – consists of a root entity along with all child entities. • There can be several root tables – leading to several classes of Entity Groups.

  7. Megastore – Data Model

     CREATE SCHEMA PhotoApp;

     CREATE TABLE User {
       required int64 user_id;
       required string name;
     } PRIMARY KEY(user_id), ENTITY GROUP ROOT;

     CREATE TABLE Photo {
       required int64 user_id;
       required int32 photo_id;
       required int64 time;
       required string full_url;
       optional string thumbnail_url;
       repeated string tag;
     } PRIMARY KEY(user_id, photo_id),
       IN TABLE User,
       ENTITY GROUP KEY(user_id) REFERENCES User;

  8. Megastore – Data Storage How is it stored in BigTable? “A Bigtable is a sparse, distributed, persistent multidimensional sorted map” !!!

  9. Megastore – Data Storage
     A Map { "zzzzz" : "woot", "xyz" : "hello", "aaaab" : "world", "1" : "x", "aaaaa" : "y" }
     A Sorted Map { "1" : "x", "aaaaa" : "y", "aaaab" : "world", "xyz" : "hello", "zzzzz" : "woot" }
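
The practical benefit of a sorted map over a plain one is that all keys in a range sit next to each other, so a range or prefix scan is a single contiguous read. The following is a minimal Python sketch of that idea using the keys from the slide; the `SortedMap` class and its `scan` method are illustrative names, not part of BigTable's API.

```python
import bisect


class SortedMap:
    """Toy sorted map: keys are kept in lexicographic order (illustrative only)."""

    def __init__(self, items=None):
        self._keys = []    # keys, always kept sorted
        self._values = {}  # key -> value
        for key, value in (items or {}).items():
            self.put(key, value)

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


# Same entries as the unsorted map on the slide.
table = SortedMap({"zzzzz": "woot", "xyz": "hello", "aaaab": "world", "1": "x", "aaaaa": "y"})

# All keys starting with "aaaa" form one contiguous slice of the sorted keys.
prefix_rows = table.scan("aaaa", "aaab")
print(prefix_rows)
```

In an unsorted map the same query would have to inspect every key; here it touches only the matching slice.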

  10. Megastore – Data Storage A Sorted Multidimensional Map
     {
       "1" : { "A" : "x", "B" : "z" },
       "aaaaa" : { "A" : "y", "B" : "w" },
       "aaaab" : { "A" : "world", "B" : "ocean" },
       "xyz" : { "A" : "hello", "B" : "there" },
       "zzzzz" : { "A" : "woot", "B" : "1337" }
     }

  11. Megastore – Data Storage A Sorted Multidimensional Map
     {
       "1" : { "A" : "x", "B" : "z" },
       "aaaaa" : { "A" : "y", "B" : "w" },
       "aaaab" : { "A" : "world", "B" : "ocean" },
       "xyz" : { "A" : "hello", "B" : "there" },
       "zzzzz" : { "A" : "woot", "B" : "1337" }
     }

  12. Megastore – Data Storage A BigTable – Column families are static, columns are not.
     "aaaaa" : { "A" : { "foo" : "y", "bar" : "d" }, "B" : { "" : "w" } },
     "aaaab" : { "A" : { "foo" : "world", "bar" : "domination" }, "B" : { "position" : "ocean" } }

  13. Megastore – Data Model Example:
     User { user_id:101, name:'John' }
     Photo { user_id:101, photo_id:500, time:2009, full_url:'john-pic1', tag:'vacation', tag:'holiday', tag:'Paris' }
     Photo { user_id:101, photo_id:501, time:2010, full_url:'john-pic2', tag:'office', tag:'friends', tag:'pub' }
     User { user_id:102, name:'Mary' }
     Photo { user_id:102, photo_id:600, time:2009, full_url:'mary-pic1', tag:'office', tag:'picnic', tag:'Paris' }
     Photo { user_id:102, photo_id:601, time:2011, full_url:'mary-pic2', tag:'birthday', tag:'friends' }

  14. Megastore – Data Storage How is it stored in BigTable?
     "user_id" : { "User" : { "name" : "<name>" } },
     "user_id, photo_id" : { "Photo" : { "time" : "<time>", "full_url" : "<url>", "thumbnail_url" : "<thumbnail_url>", "tag" : "<tag 1>", "tag" : "<tag 2>", "tag" : "<tag 3>", … } }
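
Because child rows are keyed by the root key followed by their own key, sorting by row key places each user's photos directly after that user's row. The sketch below illustrates this with Python tuples standing in for encoded row keys; the helper names `user_key`, `photo_key`, and `entity_group` are hypothetical, and the real system encodes keys as byte strings rather than tuples.

```python
def user_key(user_id):
    """Row key of a root entity (User): just the entity group key."""
    return (user_id,)


def photo_key(user_id, photo_id):
    """Row key of a child entity (Photo): root key first, then the child's own key."""
    return (user_id, photo_id)


# Rows inserted in arbitrary order; column names follow the "Table.property" idea.
rows = {
    photo_key(102, 601): {"Photo.full_url": "mary-pic2"},
    user_key(101): {"User.name": "John"},
    photo_key(102, 600): {"Photo.full_url": "mary-pic1"},
    photo_key(101, 500): {"Photo.full_url": "john-pic1"},
    user_key(102): {"User.name": "Mary"},
}

# A sorted store keeps rows in key order, so each entity group is contiguous.
ordered_keys = sorted(rows)


def entity_group(rows, user_id):
    """Return the row keys belonging to one entity group, in stored order."""
    return [key for key in sorted(rows) if key[0] == user_id]


print(ordered_keys)
print(entity_group(rows, 102))
```

Reading a whole entity group is then a single range scan starting at the root row.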

  15. Megastore – Data Storage How is it stored in BigTable?

  16. Megastore – Data Storage • Indexing • Local Index – find data within Entity Group. CREATE LOCAL INDEX PhotosByTime ON Photo(user_id, time); • Global Index - spans entity groups. CREATE GLOBAL INDEX PhotosByTag ON Photo(tag) STORING (thumbnail_url); • The ‘Storing’ Clause • Faster retrieval of certain properties. • Repeated Index • Efficient alternative to child tables. • Inline Index • Useful for extracting slices of data from child tables in parent tables
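
One way to picture the two index kinds is as extra sorted entries derived from each Photo. The Python sketch below is an assumption-laden illustration, not Megastore's implementation: the function names and the thumbnail values are made up, and index keys are modeled as tuples. It shows that a local index key begins with the entity group key, that a global index key does not, that a repeated property such as `tag` yields one entry per value, and that `STORING` copies a property into the index entry.

```python
photos = [
    {"user_id": 101, "photo_id": 500, "time": 2009,
     "thumbnail_url": "john-thumb1", "tag": ["vacation", "Paris"]},
    {"user_id": 102, "photo_id": 600, "time": 2009,
     "thumbnail_url": "mary-thumb1", "tag": ["office", "Paris"]},
]


def photos_by_time_entries(photo):
    """Local index on (user_id, time): the key starts with the entity group key."""
    key = (photo["user_id"], photo["time"], photo["photo_id"])
    return [(key, None)]


def photos_by_tag_entries(photo):
    """Global index on the repeated property tag, storing thumbnail_url.

    One index entry is produced per tag value.
    """
    return [((tag, photo["user_id"], photo["photo_id"]), photo["thumbnail_url"])
            for tag in photo["tag"]]


by_time = sorted(e for p in photos for e in photos_by_time_entries(p))
by_tag = sorted(e for p in photos for e in photos_by_tag_entries(p))

# A tag lookup spans entity groups and returns thumbnails without
# fetching the Photo rows themselves.
paris = [(key, thumb) for key, thumb in by_tag if key[0] == "Paris"]
print(paris)
```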

  17. Megastore – Data Storage How is it stored in BigTable? [Figures: BigTable layout of the PhotosByTag and PhotosByTime indexes]

  18. Megastore – Data Storage Inline Indexes - How is it stored in BigTable?
     "user_id" : {
       "User" : {
         "name" : "<name>",
         "PhotosByTime" : "<user_id>,<time1>,<user_id>,<photo_id1>",
         "PhotosByTime" : "<user_id>,<time2>,<user_id>,<photo_id2>",
         "PhotosByTime" : "<user_id>,<time3>,<user_id>,<photo_id3>",
         "PhotosByTime" : "<user_id>,<time4>,<user_id>,<photo_id4>"
       }
     }

  19. Megastore – Data Storage • Transactions and Concurrency Control • Each Entity Group acts as a mini-database and provides ACID semantics. • Transaction management using Write-Ahead Logging. • BigTable feature – ability to store multiple values for the same row/column with different timestamps. • Multiversion concurrency control using timestamps – reads and writes do not block each other. Source: http://paprika.umw.edu/~ernie/cpsc321/10312006.html
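
The timestamp idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `VersionedCell` is a hypothetical name, timestamps are plain integers, and nothing here models logging or replication. It only shows why a reader at an older timestamp is unaffected by a later write.

```python
class VersionedCell:
    """One row/column cell holding several timestamped values (illustrative)."""

    def __init__(self):
        self._versions = []  # list of (timestamp, value), kept sorted by timestamp

    def write(self, timestamp, value):
        # A write adds a new version; it never overwrites an older one.
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda version: version[0])

    def read(self, as_of):
        """Return the newest value with timestamp <= as_of, or None if none exists."""
        result = None
        for timestamp, value in self._versions:
            if timestamp > as_of:
                break
            result = value
        return result


cell = VersionedCell()
cell.write(10, "john-pic1")

# A reader fixes its snapshot at timestamp 15 ...
snapshot_ts = 15

# ... and a later write at timestamp 20 does not disturb it.
cell.write(20, "john-pic1-edited")

print(cell.read(snapshot_ts))  # value as of timestamp 15
print(cell.read(25))           # value as of timestamp 25
```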

  20. Megastore – Availability / Scalability • Availability • Fault tolerance achieved by replication. • Fault-tolerant replication of logs, using an adapted PAXOS algorithm. • Scalability • Performance maximized by partitioning based on Entity Groups. • Transactions within an entity group – single phase, with the commit replicated using PAXOS. • Transactions across entity groups – two-phase commit, or more typically asynchronous messaging through queues. • Indexes – ACID within an Entity Group, looser semantics across Entity Groups.

  21. Megastore – Availability / Scalability • Replication Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  22. Megastore – Availability / Scalability • Operations: Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  23. Megastore – Replication • PAXOS Algorithm • A way to reach consensus among a group of replicas on a single value. • Tolerates delayed or reordered messages and replicas that fail by stopping. • Needs a majority of replicas to be reachable: tolerates up to F failures with 2F + 1 replicas (fewer than N/2 of N). • The original PAXOS algorithm is ill-suited for high-latency network links because it demands multiple rounds of communication, so Megastore uses an improved version. • Use? • Databases typically use PAXOS to replicate a transaction log, where a separate instance of PAXOS is used for each position in the log.
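
The majority requirement and the two message rounds can be illustrated with a small, single-process Python sketch of basic single-value Paxos. This is a teaching simplification under stated assumptions, not Megastore's optimized variant: the names `Acceptor`, `propose`, `quorum_size`, and `tolerated_failures` are hypothetical, messages are plain method calls, and failed replicas are modeled simply by leaving them out of the live list.

```python
def quorum_size(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def tolerated_failures(n):
    """Largest number of stopped replicas that still leaves a majority."""
    return (n - 1) // 2


class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) of the last accepted proposal

    def prepare(self, number):
        # Phase 1: promise to ignore lower-numbered proposals.
        if number > self.promised:
            self.promised = number
            return True, self.accepted
        return False, None

    def accept(self, number, value):
        # Phase 2: accept unless a higher-numbered promise was made.
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return True
        return False


def propose(live_acceptors, total_replicas, number, value):
    """Run one proposal; return the chosen value, or None without a majority."""
    quorum = quorum_size(total_replicas)
    replies = [a.prepare(number) for a in live_acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(1 for a in live_acceptors if a.accept(number, value))
    return value if acks >= quorum else None


replicas = [Acceptor() for _ in range(5)]
first = propose(replicas[:3], 5, 1, "log-entry-A")   # 3 of 5 reachable
second = propose(replicas[2:], 5, 2, "log-entry-B")  # overlaps in one replica
third = propose(replicas[:2], 5, 3, "log-entry-C")   # only 2 of 5 reachable
print(first, second, third)
```

The second proposal returns the first value because any two majorities share at least one replica, which is what keeps a chosen log position from being overwritten.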

  24. Megastore – Data Storage • PAXOS Algorithm • A Master-Slave model is generally used, where the Master handles all the replication of writes. • But the Master then becomes a bottleneck. Source: http://en.wikipedia.org/wiki/Paxos_(computer_science)

  25. Megastore – Replication • Megastore Architecture Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  26. Megastore – Replication • Megastore Read Process Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  27. Megastore – Replication • Megastore Write Process Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  28. Megastore Experience: • Megastore has been deployed within Google for several years; more than 100 production applications use it as their storage service • Most of the customers see extremely high levels of availability (at least five nines) despite a steady stream of machine failures, network hiccups, datacenter outages, and other faults. • Average read latencies are tens of milliseconds, depending on the amount of data, showing that most reads are local

  29. Megastore Performance Source: Megastore: Providing Scalable Highly Available Storage for Interactive Services. Jason Baker et al., CIDR 2011

  30. Megastore Questions?
