1 / 52

The Art of Database Sharding

The Art of Database Sharding. Maxym Kharchenko Amazon.com. Whoami. Started as a database kernel developer Network database: db_VISTA ORACLE DBA for ~ 10-12 years Starting with ORACLE 8 Last 3 years: Sr. Persistence Engineer @ Amazon.com OCM, ORACLE Ace Associate

Download Presentation

The Art of Database Sharding

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Artof Database Sharding Maxym Kharchenko Amazon.com

  2. Whoami • Started as a database kernel developer • Network database: db_VISTA • ORACLE DBA for ~ 10-12 years • Starting with ORACLE 8 • Last 3 years: Sr. Persistence Engineer @Amazon.com • OCM, ORACLE Ace Associate • Blog: http://intermediatesql.com • Twitter: @maxymkh

  3. Agenda • The “big data” scaling problem • Solving scaling with “sharding” • Practical sharding • Your sharding experience: Good and bad

  4. How to scale a database New System Problem Old System 2013 2014 2015 2016 2017

  5. The Big Data problem

  6. Vertical Scaling

  7. Scaling Up …

  8. Scaling Up …

  9. Scaled!

  10. “Scaling up” math:System capabilities 2+2=3

  11. “Scaling up” math:System cost 2+2=7

  12. Scale out, not up

  13. Use lots of cheap machines Not bigger machines

  14. Commodity hardware = $$$$$ $$

  15. Distributed System

  16. Distributed System

  17. Distributed System

  18. Distributed computing is hard

  19. Shared Nothing (“Sharded”) System

  20. Sharding is (relatively) easy

  21. Split your datainto small independent chunks And run each chunkon cheap commodity hardware

  22. How to split your data Data

  23. How to split your data

  24. How to split your data

  25. How to split your data

  26. How to split your data

  27. Vertical Partitioning

  28. Vertical Partitioning

  29. Vertical Partitioning

  30. Horizontal Partitioning

  31. Sharding

  32. Sharding CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200) );

  33. Sharding CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200) ) SHARD BY <method> (<shard_key>) ( SPLIT SIZE evenly SPLIT LOAD evenly DISCOURAGE CROSS SHARD ACCESS DISCOURAGE DATA MOVE USING 4 DATABASES );

  34. Split size evenly SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly ); H-M A-G N-T U-Z

  35. Split load evenly SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly ); 1-100 101-200 201-300 301-400

  36. Split load evenly SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly ); 0 1 2 3

  37. Discourage cross shard access SHARD BY HASH (id)( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE id = 34567876;

  38. Discourage cross shard access SHARD BY HASH (id)( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;

  39. Discourage cross shard access SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS ); SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title; 0 1 2 3

  40. Discourage data move SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE ); 0 1 2 3

  41. Discourage data move SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE ); 4 5 0 1 2 3

  42. Resharding

  43. Physical and Logical shards SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE ); DB 1 DB 2 DB 3 DB 4

  44. Executing queries defshard_query(sql, binds, shard_key): """ Execute query in the correct db """ shard_hash = hash(shard_key) logical_bucket = mod(shard_hash, TOTAL_BUCKETS) physical_db = memcached_get_db(logical_bucket) execute_query(physical_db, sql, binds) SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;

  45. Implementing Shards: Standbys Apps Read Only Unsharded Shard 1 Standby Shard 2 Drop non-qualifying data Drop non-qualifying data

  46. Implementing Shards: Tables Apps Read Only Create materialized view … as select … from a@shard1 Drop materialized view … preserve table Shard 2 Shard1 TabA MVA TabA

  47. Implementing Shards:Moving “data head” Apps Shard 1 Shard 2 Shard 3 Shard 4

  48. Data protection App App Shard 1 Shard 2 Shard 3 Shard 4 Stb1 Stb 2 Stb 3 Stb 4

  49. Why shards are awesome • (potentially) Unlimited scaling • Local ACID + relational • Better maintenance • Eggs not in one basket • “Apples to apples comparison” with other shards

  50. Why shards are NOT so great • More systems • Power, rack space etc • Needs automation … bad • More likely to fail overall • Some operations become difficult: • Transactions across shards • Foreign keys across shards • More work: • Applications, developers, DBAs • High skill, DIY everything

More Related