
The Art of Database Sharding

The Art of Database Sharding. Maxym Kharchenko Amazon.com. April 22-26, 2012 Mandalay Bay Convention Center Las Vegas, Nevada, USA. www.collaborate12.org www.collaborate12.ioug.org. When your data grows …. New System. Problem. Old System. The Big Data problem.


Presentation Transcript


  1. The Art of Database Sharding. Maxym Kharchenko, Amazon.com

  2. April 22-26, 2012, Mandalay Bay Convention Center, Las Vegas, Nevada, USA. www.collaborate12.org, www.collaborate12.ioug.org

  3. When your data grows … [diagram labels: New System, Problem, Old System]

  4. The Big Data problem: one machine is not enough

  5. Vertical Scaling

  6. Scaling Up …

  7. Scaling Up …

  8. Scaled!

  9. What you get when you scale up: 2+2=5

  10. What you get when you scale up: 2+2=3

  11. Scale out, not up

  12. Running on more than one machine [chart; values shown: 10,000,000 and 1] (Courtesy: John Rauser, Amazon.com)

  13. Distributed computing is hard

  14. Distributed System

  15. Sharded System

  16. Sharding is (relatively) easy

  17. Split your data into small, independent chunks, and run each chunk on cheap commodity hardware

  18. How to split your data [diagram: five “Data” blocks]

  19. How to split your data

  20. How to split your data

  21. How to split your data

  22. How to split your data

  23. Step 1: Split off different things

  24. Vertical Partitioning

  25. Vertical Partitioning

  26. Vertical Partitioning
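Vertical partitioning (slides 24-26) moves different kinds of data into different databases rather than splitting one table. A minimal Python sketch, assuming a hypothetical table-to-database map; the table and database names are illustrative only and do not come from the talk:

```python
# Hypothetical map: each kind of data ("thing") lives in its own database.
VERTICAL_MAP = {
    "customers": "db_customers",
    "orders": "db_orders",
    "shipments": "db_shipments",
}

def database_for_table(table: str) -> str:
    """Return the database that owns `table` under vertical partitioning."""
    try:
        return VERTICAL_MAP[table]
    except KeyError:
        raise ValueError(f"no database owns table {table!r}") from None

print(database_for_table("orders"))  # db_orders
```

Each database still holds whole tables, so this step alone does not split a single large table; that is what the sharding key and function in the next step are for.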

  27. Step 2: Choose a sharding key and function
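The choice of sharding key decides which rows end up together. A minimal sketch, assuming a hypothetical orders list of `(order_id, customer_id)` pairs and an MD5-then-MOD shard function (a common choice, used here only for illustration): sharding by `customer_id` keeps each customer's orders in one independent chunk, while sharding by `order_id` may scatter them.

```python
import hashlib

def shard_for(key, num_shards: int) -> int:
    """Hash the sharding key with MD5, then take MOD to pick a shard."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards

# Hypothetical rows: (order_id, customer_id)
ORDERS = [(1001, 7), (1002, 7), (1003, 42), (1004, 7), (1005, 42)]

def shards_per_customer(orders, key_index: int, num_shards: int = 4) -> dict:
    """Map each customer_id to the set of shards holding its orders,
    using column `key_index` of each row as the sharding key."""
    placement = {}
    for row in orders:
        customer_id = row[1]
        placement.setdefault(customer_id, set()).add(
            shard_for(row[key_index], num_shards)
        )
    return placement

# Key = customer_id (column 1): every customer's orders sit on one shard.
print(shards_per_customer(ORDERS, key_index=1))
# Key = order_id (column 0): one customer's orders may land on several shards.
print(shards_per_customer(ORDERS, key_index=0))
```

Queries that stay within one customer can then be answered by a single shard.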

  28. Sharding

  29. Bad sharding: can we partition Collaborate participants by last name?
        CREATE TABLE Collaborate_Participants (last_name varchar2(30) PRIMARY KEY, signup_date date);

  30. Avalanche effect (e.g., MD5): a small change in the key changes the hash completely
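The avalanche effect means near-identical keys (such as similar last names or consecutive ids) get unrelated hash values. A minimal sketch using Python's `hashlib.md5`; the helper names and sample keys are hypothetical:

```python
import hashlib

def md5_int(key: str) -> int:
    """MD5 digest of `key` as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """How many of the 128 MD5 output bits differ between two keys."""
    return bin(md5_int(a) ^ md5_int(b)).count("1")

# Keys that differ by one character: on average about half
# of the 128 output bits flip.
print(differing_bits("Kharchenko", "Kharchenka"))
print(differing_bits("order-1000", "order-1001"))
```

Because similar keys scatter, a hash of the key spreads data far more evenly than the raw key itself.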

  31. Step 3: Make enough shards

  32. Hashes and Buckets [diagram: hash values assigned to buckets with MOD]
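Hashing the key and taking MOD of the bucket count gives each key a bucket (shard). A minimal sketch, assuming MD5 as the hash and a hypothetical `bucket_for` helper; it checks that 100,000 sequential keys spread roughly evenly over 4 buckets:

```python
import hashlib
from collections import Counter

def bucket_for(key, num_buckets: int) -> int:
    """Hash the key with MD5, then take MOD to pick a bucket."""
    h = int.from_bytes(hashlib.md5(str(key).encode("utf-8")).digest(), "big")
    return h % num_buckets

# Sequential keys, which would otherwise fall into one range,
# end up spread across all buckets.
counts = Counter(bucket_for(k, 4) for k in range(100_000))
print(sorted(counts.items()))  # each bucket holds roughly 25,000 keys
```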

  33. Resharding: with 3 shards, adding a 4th shard sends 75% of the data to a different shard ("75% bad")
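The 75% figure follows from the MOD arithmetic. A key stays on its shard only when `k % 3 == k % 4`, which holds exactly when `k % 12` is 0, 1 or 2, so 9 of every 12 keys move. A minimal sketch, assuming the integers below stand in for already-hashed keys:

```python
def moved_fraction(old_shards: int, new_shards: int, keys) -> float:
    """Fraction of keys whose shard changes when `key % n` goes from old to new n."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)

# 3 -> 4 shards: only keys with k % 12 in {0, 1, 2} stay put.
print(moved_fraction(3, 4, range(12_000)))  # 0.75
```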

  34. Logical Shards [diagram: four MOD mappings]
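Logical shards avoid the mass movement of plain MOD resharding: keys map to a fixed number of logical shards, and a directory maps logical shards to physical databases. A minimal sketch under stated assumptions: 12 logical shards (a hypothetical, deliberately small number), MD5 as the hash, and hypothetical database names `db1` to `db4`. Adding a 4th machine reassigns whole logical shards, so only about a quarter of the keys move instead of 75%.

```python
import hashlib
from collections import Counter

NUM_LOGICAL = 12  # hypothetical; fixed for the life of the system

def logical_shard(key) -> int:
    """Key -> logical shard via MD5 and MOD; this mapping never changes."""
    h = int.from_bytes(hashlib.md5(str(key).encode("utf-8")).digest(), "big")
    return h % NUM_LOGICAL

# Directory: logical shard -> physical database (3 hypothetical machines).
directory = {ls: f"db{ls % 3 + 1}" for ls in range(NUM_LOGICAL)}

def physical_shard(key, directory: dict) -> str:
    return directory[logical_shard(key)]

# Add a 4th machine by moving one logical shard from each of db1, db2, db3.
new_directory = dict(directory)
for ls in (0, 1, 2):
    new_directory[ls] = "db4"

print(Counter(new_directory.values()))  # each machine now holds 3 logical shards
keys = range(12_000)
moved = sum(physical_shard(k, directory) != physical_shard(k, new_directory) for k in keys)
print(moved / len(keys))  # expected to be close to 0.25
```

Only the directory changes; the key-to-logical-shard function stays the same, so applications keep computing the same logical shard for every key.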

  35. Implementing Shards: Standbys [diagram labels: Apps, Read Only, Unsharded, Standby, Shard 1, Shard 2]
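One reading of this slide: the standby is a full copy of the unsharded database; with applications in read-only mode, it is opened as a second database, and each copy then deletes the rows it does not own. The Python below only simulates that logic with dictionaries. It is a sketch under that assumption, not an Oracle procedure, and the `owner` and `split_via_copy` helpers are hypothetical.

```python
import hashlib

def owner(key, num_shards: int) -> int:
    """Shard that owns `key` (MD5, then MOD)."""
    h = int.from_bytes(hashlib.md5(str(key).encode("utf-8")).digest(), "big")
    return h % num_shards

def split_via_copy(unsharded: dict, num_shards: int = 2) -> list:
    """Every shard starts as a full copy (like a standby),
    then purges the rows that belong to another shard."""
    shards = []
    for shard_no in range(num_shards):
        copy = dict(unsharded)  # full clone
        for key in list(copy):
            if owner(key, num_shards) != shard_no:
                del copy[key]  # row belongs to another shard
        shards.append(copy)
    return shards

rows = {k: f"row {k}" for k in range(1_000)}  # hypothetical table contents
shard1, shard2 = split_via_copy(rows)
print(len(shard1), len(shard2))
```

The appeal of this approach is that no row is copied over the network row-by-row during the split; the copy already exists, and each side only deletes.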

  36. Implementing Shards: Tables [diagram labels: Apps, Read Only, Shard 1, Shard 2, MV A, TabA]
        Create materialized view … as select … from a@shard1;
        Drop materialized view … preserve table;

  37. Why shards are awesome
        • Small data, small load
        • Better caching, faster queries
        • Smaller load, fewer surprises
        • Faster maintenance, e.g., restores
        • Eggs not in one basket:
            • Availability redefined
            • Safer maintenance
        • Multiple points of view:
            • SQL performance
            • System load

  38. Why shards are NOT so great
        • More systems
        • Power, rack space, etc.
        • Needs automation … bad
        • More likely to fail overall
        • Some operations become impractical:
            • Joins across shards
            • Foreign keys across shards
        • More work:
            • Applications, developers, DBAs
            • High skill, DIY everything

  39. Thank you

  40. Implementing Shards: Moving the “data head” [diagram labels: Apps, Shard 1, Shard 2, Shard 3, Shard 4]
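One way to read "moving the data head": when records are keyed by an increasing id, a new shard can take over all ids from a cut-over point upward, while older ranges stay where they are. A minimal sketch of such a range directory; the `RangeDirectory` class and the cut-over ids are hypothetical:

```python
import bisect

class RangeDirectory:
    """Hypothetical range directory: ids >= lower_bounds[i] go to shards[i]."""

    def __init__(self):
        self.lower_bounds = []
        self.shards = []

    def add_head(self, first_id: int, shard: str) -> None:
        """Send all ids from `first_id` upward to `shard` (the new data head)."""
        if self.lower_bounds and first_id <= self.lower_bounds[-1]:
            raise ValueError("new head must start above the current one")
        self.lower_bounds.append(first_id)
        self.shards.append(shard)

    def shard_for(self, record_id: int) -> str:
        # Index of the last lower bound that is <= record_id.
        i = bisect.bisect_right(self.lower_bounds, record_id) - 1
        if i < 0:
            raise KeyError(record_id)
        return self.shards[i]

d = RangeDirectory()
d.add_head(1, "Shard 1")
d.add_head(1_000_000, "Shard 2")  # hypothetical cut-over points
d.add_head(2_000_000, "Shard 3")
print(d.shard_for(1_500_000))  # Shard 2
```

Moving the head this way requires no data movement for existing ranges, but all new writes go to the newest shard.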

  41. Bad sharding, example 2: can we shard customers by a meaningless sequence?
        CREATE TABLE Orders (order_id number PRIMARY KEY, customer_fname varchar2(30), customer_lname varchar2(30), order_date date);
        [diagram: shards by order_id range: 10000 - 20000; 20001 - 30000; 30001 - 40000; 40001 - 50000]
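A likely problem with this layout: order_id is a sequence, so newly created orders get consecutive ids and all current inserts land on the shard holding the newest range, while one customer's orders are spread across every range. A minimal sketch using the ranges from the slide; the `shard_for_order` helper and the sample id range are hypothetical:

```python
# Ranges from the slide: (low, high) inclusive, shards numbered from 1.
RANGES = [(10_000, 20_000), (20_001, 30_000), (30_001, 40_000), (40_001, 50_000)]

def shard_for_order(order_id: int) -> int:
    """Range-based routing by order_id."""
    for shard_no, (low, high) in enumerate(RANGES, start=1):
        if low <= order_id <= high:
            return shard_no
    raise ValueError(f"order_id {order_id} is outside all ranges")

# Newly generated ids are consecutive, so every new order hits one shard.
new_orders = range(45_000, 46_000)
print({shard_for_order(i) for i in new_orders})  # {4}
```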
