1 / 56

Where’s my lookup table ? Modeling relational data in a denormalized world. Rick Houlihan

Where’s my lookup table ? Modeling relational data in a denormalized world. Rick Houlihan Principal Technologist, NoSQL - AWS. What are we talking about?. A Brief History of Data P rocessing – Why NoSQL ? The Great Migration – Amazon’s path to database freedom

leeh
Download Presentation

Where’s my lookup table ? Modeling relational data in a denormalized world. Rick Houlihan

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Where’s my lookup table? Modelingrelational data in a denormalized world. Rick Houlihan Principal Technologist, NoSQL - AWS

  2. What are we talking about? A Brief History of Data Processing – Why NoSQL? The Great Migration – Amazon’s path to database freedom Making it Work - Normalized versus denormalized data modeling Document, Wide Column, Key-Value –Why it’s all the same Why Cloud Native NoSQL is the best choice

  3. History of Data Processing “History repeats itself because nobody was listening the first time” - Anonymous

  4. Timeline of Database Technology

  5. Technology Adoption and the Hype Curve

  6. Amazon’s Great Migration “We all know about certain problems we wish we didn't have.” – Dean Kamen

  7. 20 years of code 3,000 Oracle Instances Over 12,000 services ~25,000 developers 54.5MM TPS Peak (Prime Day 2019)

  8. Why make the move? RDBMS was breaking at scale • Required rollups and denormalized data to avoid complex queries Impact of bad deployments • One bad stored procedure would break a dozen services NoSQL databases are “flexible” • Schemaless design allows the app to evolve over time Cost of RDBMS was too high • CPU is expensive, storage is cheap

  9. Sizing the Workloads • Problems with limited scope are easier to solve • “I need a system to track employee vacation time.” • Unbounded problems are harder to solve • “I need a root cause analysis engine to correlate transaction level events to buying patterns across global markets.”

  10. Sizing the Database

  11. Scaling Relational DBs

  12. Sharded Relational DBs? ? C D A B

  13. Denormalize and shard to provide horizontal scale Near unbounded throughput and storage NoSQL Databases Collection 1 1 TB Shard B Shard A 500 GB 500 GB

  14. Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale Partition Keys in NoSQL Id = 1 Name = Jim Id = 2 Name = Andy Dept = Eng Id = 3 Name = Kim Dept = Ops Hash (1) = 7B Hash (2) = 48 Hash (3) = CD 00 54 55 A9 AA FF Key Space 00 FF

  15. Online Transaction Processing (OLTP) Most common type of app Online Analytics Processing (OLAP) BI and ad-hoc data projections Decision Support Systems (DSS) Long running query aggregations and projections Types of Database Workloads Operations Analytics

  16. Why NoSQL? SQL NoSQL

  17. NoSQLData Modeling “A ship in port is safe, but that’s not what ships were built for.” - Grace Hopper

  18. It’s all about relationships… Social network Document management Process control IT monitoring Data trees

  19. The NoSQL Model • Table/Collection • Items • Attributes • PartitionKey • Sort Key All items for key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values Mandatory Key-value access pattern Determines data distribution Optional Model 1:N and N:N relationships Enables rich query capabilities

  20. SQL vs. NoSQL design pattern SELECT * FROM Products INNER JOIN Books SELECT * FROM PRODUCTS SELECT * FROM Products INNER JOIN Albums INNER JOIN Tracks SELECT * FROM Products INNER JOIN Videos INNER JOIN ActorVideo INNER JOIN Actors

  21. Ad hoc “Joins” in SQL SELECT * FROM PRODUCTS INNER JOIN BOOKS ON productId = productId WHERE name = “Book Title” SELECT * FROM PRODUCTS INNER JOIN ALBUMS ON productId = productId INNER JOIN TRACKS ON albumId= albumId WHERE name = “Album Title” SELECT * FROM PRODUCTS INNER JOIN VIDEOS ON productId = productId INNER JOIN ACTORVIDEO ON videoId = videoId INNER JOIN ACTORS ON actorId = actorId WHERE name = “Movie Title”

  22. Modeled “Joins” in NoSQL SELECT * WHERE PK=“Book Title” SELECT * WHERE PK=“Album Title” SELECT * WHERE PK=“Movie Title”

  23. Modeled “Joins” in NoSQL Swap PK and SK on index SELECT * WHERE PK=“Author Name” SELECT * WHERE PK=“Song Title” SELECT * WHERE PK=“Actor Name” SELECT * WHERE PK=“Director Name” SELECT * WHERE PK=“Musician”

  24. Document vs. Wide Column Data Modeling { Default “_id” Index supports K/V access patterns, e.g “Get employee data by email”, etc. _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, Secondary index on “desk” supports sub-tree aggregations, e.g. StartsWith(“SEA58”) desk: “SEA58-07.650.O1” }

  25. Document vs. Wide Column { _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, desk: “SEA58-07.650.O1” }

  26. Indexing Efficiently in NoSQL

  27. NoSQL Data Modeling - KEY CONCEPTS • Selecting a Partition/Shard Key • Large number of distinct values • Items are uniformly requestedand randomly distributed • Creating Compound Indexes • Model 1:n and n:nrelationships • Efficient/selective patterns • Query multiple entities • Leverage range queries • Examples: • Bad: Status, Gender • Good: CustomerId, DeviceId • Examples: • Orders.OrderItem • Hierarchical relationships

  28. TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

  29. TENETS OF NoSQL DATA MODELING • Nature of the application • OLTP / OLAP / DSS • Define the Entity-Relationship Model • Identify Data Life Cycle • TTL, Backup/Archival, etc. • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

  30. TENETS OF NoSQL DATA MODELING • Identify data sources • Define queries and write patterns • Document all workflows • Understand the use case • Define the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

  31. TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review 1 application service = 1 table • Reduce round trips • Simplify access patterns Identify Primary Keys • How will items be inserted and read? • Overload items into partitions Define indexes for secondary access patterns

  32. TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat-> Review

  33. Complex Queries “Computers are useless. They can only give you answers.” - Pablo Picasso

  34. DynamoDB Streams and AWS Lambda

  35. Triggers Item/table level metrics • Amazon CloudSearch Notify change Kinesis Firehose Lambda function

  36. Advanced Data Modeling “It turns out it's important to build a product and not just a bunch of data models.” - Hilary Mason

  37. Managing Relational Transactions • Configuration Management Service • Resolver Groups • Contacts • Configuration Items • Transactional Workflows • Add Config Items to Resolver Groups • Update Config Item status • Add Contacts to Resolver Groups

  38. DynamoDB Transactions API TransactWriteItems • Synchronous update, put, delete, and check • Atomic • Automated Rollbacks • Up to 10 items within a transaction • Supports multiple tables • Complex conditional checks Good Use Cases • Commit changes across items • Conditional batch inserts/updates • Bad Use Case • Maintaining normalized data

  39. DynamoDB Table Schema De-normalized Contacts Contact Partitions Configuration Items Resolver Partitions Resolver Metadata

  40. Reverse Lookup GSI

  41. Complex Relational Data “Dude, where’s my lookup table?” - Anonymous Amazon SDE

  42. Modeling Complex Relationships

  43. The Table Schema

  44. The Index Schema (GSI1)

  45. The Index Schema (GSI2)

  46. The Index Schema (GSI3)

  47. The Final Result

  48. Optimize for Common Patterns “If you optimize everything, you will always be unhappy.” - Donald Knuth

  49. Access Patterns Matter • Insurance quote service • Store all quote versions • 200+ attributes per quote • 50KB Average record size • 800 quotes per minute peak • 1K WCU provisioned

  50. Optimized for Writes • Version category items as they are updated • Send all versions when queried • Process with client side logic • 20 WCU Provisioned

More Related