
Achieving 100,000 Transactions Per Second with a NoSQL Database

Eric David Bloch • @eedeebee • 19 June 2012


Presentation Transcript


  1. Achieving 100,000 Transactions Per Second with a NoSQL Database Eric David Bloch @eedeebee 19 June 2012

  2. A bit about me • I’ve written software used by millions of people • Apps, libraries, compilers, device drivers, operating systems • This is my second QCon and my first talk • I’ve been the Community Director at MarkLogic for the last 2 years • Born here in NY in 1965; now in CA • I survived having 3 kids in less than 2 years.

  3. Me, 1967

  4. Me, today

  5. Musical Form for the Talk • A: Why? • B: How? • Whirlwind database architecture tour • Melody from part A again • Techniques to get to 100K

  6. It’s about money. • Top 5 bank needed to manage trades • Trades look more like documents than tables • Schemas for trades change all the time • Transactions • Scale and velocity (“Big Data”)

  7. Trades and Positions • 1 million trades per day • Followed by 1 million position reports at end of day • Roll up trades of current date for each “book, instrument” pair • Group-by, with key = “date, book, instrument” [Diagram: timeline of Day 1 … Day 5, each day with 1M trades followed by 1M positions]

  8. Trades and Positions

  <trade>
    <quantity>8540882</quantity>
    <quantity2>1193.71</quantity2>
    <instrument>WASAX</instrument>
    <book>679</book>
    <trade-date>2011-03-13-07:00</trade-date>
    <settle-date>2011-03-17-07:00</settle-date>
  </trade>

  <position>
    <instrument>EAAFX</instrument>
    <book>679</book>
    <quantity>3</quantity>
    <business-date>2011-03-25Z</business-date>
    <position-date>2011-03-24Z</position-date>
  </position>

  9. Now show us • 15K inserts per second • Linear scalability

  10. Requirements • NoSQL flexibility, performance & scale • Enterprise-grade transactional guarantees

  11. What NoSQL, OldSQL, or NewSQL database out there can we use?

  12. Your database has a large effect on how you see the world

  13. What is MarkLogic? • Non-relational, document-oriented, distributed database • Shared-nothing clustering, linear scale-out • Multi-Version Concurrency Control (MVCC) • Transactions (ACID) • Search engine • Web scale (Big Data) • Inverted indexes (term lists) • Real-time updates • Composable queries

  14. Architecture • Data model • Indexing • Clustering • Query execution • Transactions. Question: what does the data model have to do with scalability?

  15. Inverted Index

  Term                  Document References
  “which”               123, 127, 129, 152, 344, 791 . . .
  “uniquely”            122, 125, 126, 129, 130, 167 . . .
  “identify”            123, 126, 130, 142, 143, 167 . . .
  “each”                123, 130, 131, 135, 162, 177 . . .
  “uniquely identify”   126, 130, 167, 212, 219, 377 . . .
  <article>             126, 130, 167, 212, 219, 377 . . .

  Element, attribute, and value terms (e.g. article/abstract/@author, <product>IMS</product>) get term lists too.
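To make the term-list idea concrete, here is a minimal inverted index in Python. This is an illustrative sketch only, not MarkLogic’s implementation; the documents and helper names are invented to echo the slide’s example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every word (and adjacent word pair, for phrase queries)
    to the sorted IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for word in words:
            index[word].add(doc_id)
        for pair in zip(words, words[1:]):          # two-word phrase terms
            index[" ".join(pair)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query_and(index, *terms):
    """AND query: intersect the term lists of all terms."""
    lists = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []

docs = {
    123: "terms which identify each",
    126: "terms which uniquely identify",
    130: "uniquely identify each",
}
index = build_inverted_index(docs)
```

Intersecting sorted term lists is what lets composed queries resolve to candidate documents without touching the documents themselves.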

  16. Range Index

  Rows:
  <trade>
    <trader_id>8</trader_id>
    <time>2012-02-20T14:00:00</time>
    <instrument>IBM</instrument>
    …
  </trade>
  <trade>
    <trader_id>13</trader_id>
    <time>2012-02-20T14:30:00</time>
    <instrument>AAPL</instrument>
    …
  </trade>
  <trade>
    <trader_id>0</trader_id>
    <time>2012-02-20T15:30:00</time>
    <instrument>GOOG</instrument>
    …
  </trade>

  • Column-oriented • Memory-mapped
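The two bullets (column-oriented, memory-mapped) can be sketched as a single sorted column of (value, document) pairs, so a range lookup is just two binary searches. The class and field names here are invented for illustration; a plain Python list stands in for the memory-mapped on-disk column.

```python
import bisect

class RangeIndex:
    """Toy range index: one column of (value, doc_id) pairs kept
    sorted by value, so any range query is two binary searches."""
    def __init__(self, pairs):
        self.pairs = sorted(pairs)                  # sort by value
        self.values = [value for value, _ in self.pairs]

    def lookup(self, lo, hi):
        """Doc IDs whose indexed value satisfies lo <= value <= hi."""
        i = bisect.bisect_left(self.values, lo)
        j = bisect.bisect_right(self.values, hi)
        return [doc for _, doc in self.pairs[i:j]]

# one column per indexed element, e.g. trader_id from the slide
trader_id = RangeIndex([(8, "trade-1"), (13, "trade-2"), (0, "trade-3")])
```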

  17. Shared-Nothing Clustering [Diagram: evaluator host E1 in front of data hosts D1, D2, D3 … Dj]

  18. Query Evaluation – “Map” [Diagram: query Q fans out from evaluator hosts E1 … Ei to data hosts D1 … Dj, where it runs against each forest F1 … Fn]

  19. Query Evaluation – “Reduce” [Diagram: each forest F1 … Fn computes its own top 10; the data hosts and then the evaluator merge these into the final top 10]

  20. Queries/Updates with MVCC • Every query has a timestamp • Documents do not change • Reads are lock-free • Inserts – see next slide • Deletes – mark as deleted • Edits – copy, edit, insert the copy, mark the original as deleted
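A toy model of the MVCC scheme on this slide: every write appends a new timestamped version, a delete appends a tombstone, and a reader sees the newest version at or before its query timestamp, so reads never block on writers. The API below is assumed for illustration, not MarkLogic’s.

```python
class MVCCStore:
    """Toy MVCC document store: versions are never modified in place."""
    def __init__(self):
        self.clock = 0
        self.versions = {}   # uri -> list of (timestamp, doc or None)

    def _commit(self, uri, doc):
        self.clock += 1
        self.versions.setdefault(uri, []).append((self.clock, doc))
        return self.clock

    def insert(self, uri, doc):
        return self._commit(uri, doc)

    def delete(self, uri):
        return self._commit(uri, None)       # mark as deleted (tombstone)

    def edit(self, uri, new_doc):
        return self._commit(uri, new_doc)    # copy+edit = insert new version

    def read(self, uri, at):
        """Lock-free read as of timestamp `at`: latest version <= at."""
        doc = None
        for ts, d in self.versions.get(uri, []):
            if ts <= at:
                doc = d
        return doc

store = MVCCStore()
t1 = store.insert("/trade/1", {"qty": 5})
t2 = store.edit("/trade/1", {"qty": 9})
```

Note that a query pinned at t1 keeps seeing the old version even after the edit commits, which is exactly why reads need no locks.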

  21. Insert Mechanics • New URI+Document arrive at E-node • URI Probe – determine whether URI exists in any forest • URI Lock – write locks taken on D node(s) • Forest Assignment – URI is deterministically placed in Forest • Indexing • Journaling • Commit – transaction complete • Release URI Locks – D node(s) are notified to release lock
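The steps above can be sketched as a toy cluster. The hash choice, class names, and error handling are mine, not the server’s internals; indexing is elided, and the point is the order of operations: probe every forest, lock, assign by hash, journal, commit, release.

```python
import hashlib

class Cluster:
    """Toy model of the insert path (illustrative names)."""
    def __init__(self, n_forests):
        self.forests = [dict() for _ in range(n_forests)]
        self.locks = set()
        self.journal = []

    def _forest_for(self, uri):
        # deterministic placement: hash the URI into a forest bucket
        h = int(hashlib.sha256(uri.encode("utf-8")).hexdigest(), 16)
        return h % len(self.forests)

    def insert(self, uri, doc):
        # 1. URI probe: does the URI already exist in ANY forest?
        if any(uri in forest for forest in self.forests):
            raise KeyError(f"{uri} already exists")
        # 2. URI lock (write lock on the URI)
        self.locks.add(uri)
        # 3. forest assignment
        fi = self._forest_for(uri)
        # 4-5. indexing (elided) + journaling, before commit
        self.journal.append(("insert", uri))
        # 6. commit: the document becomes visible
        self.forests[fi][uri] = doc
        # 7. release the URI lock
        self.locks.discard(uri)
        return fi

cluster = Cluster(4)
fi = cluster.insert("/trades/1.xml", "<trade/>")
```

The probe and lock steps touch every forest in the cluster, which is the overhead the optimized insert later removes.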

  22. Save-and-Merge (Log-Structured Tree Merge) [Diagram: an insert/update into a forest is journaled, written to the in-memory stand S0, saved to disk as stands S1, S2, S3, and small on-disk stands are later merged]
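A rough sketch of that save-and-merge cycle, with a dict standing in for each stand and a made-up in-memory threshold (both assumptions, not MarkLogic’s actual structures):

```python
class Forest:
    """Toy log-structured forest: journal first, buffer in the
    in-memory stand S0, save S0 to disk when full, merge later."""
    def __init__(self, memory_limit=2):
        self.journal = []
        self.s0 = {}                 # in-memory stand S0
        self.stands = []             # on-disk stands S1, S2, ...
        self.memory_limit = memory_limit

    def insert(self, uri, doc):
        self.journal.append((uri, doc))        # journal first, for recovery
        self.s0[uri] = doc
        if len(self.s0) >= self.memory_limit:
            self.save()

    def save(self):
        self.stands.append(dict(self.s0))      # S0 becomes a new on-disk stand
        self.s0 = {}

    def merge(self):
        merged = {}
        for stand in self.stands:              # older stands first, so
            merged.update(stand)               # newer versions win
        self.stands = [merged]

forest = Forest()
for i in range(5):
    forest.insert(f"/doc/{i}", i)
```

Sequential journal writes and batched stand saves keep inserts off the random-I/O path, which is what makes this layout fast for heavy write loads.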

  23. Back to the money

  24. Trades and Positions • 1 million trades per day • Followed by 1 million position reports at end of day • Roll up trades of the current “date” for each “book:instrument” pair • Group-by, with key = “book:date:instrument” [Diagram: timeline of Day 1 … Day 5, each day with 1M trades followed by 1M positions]

  25. Naive Query Pseudocode

  for each book
    for each instrument in that book
      position = position(yesterday, book, instrument)
      for each trade of that instrument in this book
        position += trade(today, book, instrument).quantity
      insert(today, book, instrument, position)
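A runnable version of that naive roll-up, collapsing the nested loops into one pass keyed by (book, instrument); the record shapes and sample numbers are assumed for illustration.

```python
def roll_up(trades, yesterday_positions):
    """Start each (book, instrument) at yesterday's position and
    add today's trade quantities to get today's position."""
    positions = dict(yesterday_positions)
    for trade in trades:
        key = (trade["book"], trade["instrument"])
        positions[key] = positions.get(key, 0) + trade["quantity"]
    return positions

trades = [
    {"book": 679, "instrument": "WASAX", "quantity": 5},
    {"book": 679, "instrument": "WASAX", "quantity": 3},
    {"book": 679, "instrument": "EAAFX", "quantity": 2},
]
today = roll_up(trades, {(679, "WASAX"): 10})
```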

  26. Initial results Single node – 19,000 inserts per second

  27. Initial results with cluster

  28. Techniques to get to 100K • Insert Query for Computing New Positions • Materialized compound key, Co-Occurrence Query and Aggregation • Insert of New Positions • Batching • Optimized insert mechanics

  29. Materializing a compound key

  <trade>
    <quantity2>1193.71</quantity2>
    <instrument>WASAX</instrument>
    <book>679</book>
    <trade-date>2011-03-13-07:00</trade-date>
    <settle-date>2011-03-17-07:00</settle-date>
  </trade>

  30. Materializing a compound key

  <trade>
    <roll-up book-date-instrument="151333445566782303"/>
    <quantity2>1193.71</quantity2>
    <instrument>WASAX</instrument>
    <book>679</book>
    <trade-date>2011-03-13-07:00</trade-date>
    <settle-date>2011-03-17-07:00</settle-date>
  </trade>

  31. Co-Occurrence and Distributed Aggregation

  <trade>
    <roll-up book-date-instrument="151373445566703"/>
    <quantity>8540882</quantity>
    <instrument>WASAX</instrument>
    <book>679</book>
    <trade-date>2011-03-13-07:00</trade-date>
    <settle-date>2011-03-17-07:00</settle-date>
  </trade>

  • Co-occurrences: find pairings of range-indexed values
  • Aggregate on the D nodes (Map/Reduce): sum up the quantities above
  • Similar to a group-by in a column-oriented, in-memory database
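The co-occurrence-plus-aggregation step amounts to a per-forest group-by on the materialized key followed by a merge of partial sums. A sketch under assumed data shapes (readable keys instead of the hashed ones on the slide, and not MarkLogic’s actual API):

```python
from collections import defaultdict

def cooccurrence_sum(forests):
    """Map: each forest sums trade quantities by the materialized
    book:date:instrument key, reading only its range indexes.
    Reduce: the partial sums are merged into final positions."""
    partials = []
    for forest in forests:                    # map, runs per D node
        local = defaultdict(int)
        for key, qty in forest:               # (roll-up key, quantity) pairs
            local[key] += qty
        partials.append(local)
    totals = defaultdict(int)                 # reduce, runs on the E node
    for local in partials:
        for key, qty in local.items():
            totals[key] += qty
    return dict(totals)

forests = [
    [("679:2011-03-13:WASAX", 5), ("679:2011-03-13:EAAFX", 2)],
    [("679:2011-03-13:WASAX", 3)],
]
totals = cooccurrence_sum(forests)
```

Because the per-forest sums read the indexes rather than the documents, the expensive work stays distributed across the D nodes.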

  32. Initial results with new query > 30K inserts/second on a single node

  33. Co-Occurrence + Aggregate versus Naïve approach

  34. Techniques • Computing Positions • Materialized compound key, Co-Occurrence Query and Aggregation • Updates • Batching • Optimized insert mechanics

  35. Transaction Size and Throughput
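The curve behind this slide comes from batching: each transaction pays a fixed cost (locks, journal sync, commit), so putting more inserts into each transaction amortizes it. A toy cost model with made-up constants, purely to show the shape:

```python
import math

def time_to_insert(n_docs, batch_size, per_txn_cost=10.0, per_doc_cost=1.0):
    """Total cost (arbitrary units) of inserting n_docs documents in
    transactions of batch_size each. The constants are invented; only
    the fixed-cost-per-transaction structure matters."""
    n_txns = math.ceil(n_docs / batch_size)
    return n_txns * per_txn_cost + n_docs * per_doc_cost
```

Throughput rises steeply with batch size at first, then flattens once the per-document cost dominates.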

  36. Techniques • Computing Positions • Materialized compound key, Co-Occurrence Query and Aggregation • Updates (Transaction) • Batching • Optimized insert mechanics

  37. Insert Mechanics, Again • New URI+Document arrive at E-node • *URI Probe – determine whether URI exists in any forest • *URI Lock – write locks are created • Forest Assignment – URI is deterministically placed in Forest • Indexing • Journaling • Commit – transaction complete • *Release URI Locks – D nodes are notified to release lock (* overhead of these operations increases with cluster size)

  38. Deterministic Placement (step 4: Forest Assignment) • URI → hash function → 64-bit number → bucketed into a forest Fi • Done in C++ within the server • But… it can also be done in the client • The server allows queries to be evaluated against only one forest…
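The placement pipeline (URI → hash function → 64-bit number → forest bucket) might look like this in client code. The actual hash the server uses is internal, so this sketch only matches itself; what matters is that it is deterministic, letting a client predict which forest an insert will land in.

```python
import hashlib

def forest_for(uri, n_forests):
    """Deterministic placement sketch: hash the URI to a 64-bit
    number and bucket it into one of n_forests."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    h64 = int.from_bytes(digest[:8], "big")    # 64-bit number
    return h64 % n_forests                     # bucketed into forest Fi
```

Since the same URI always maps to the same forest, a client that knows the forest count can target the probe, lock, and insert at exactly one forest, which is the basis of the optimized insert on the next slides.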

  39. Optimized Insert [Diagram: insert query Q goes from E-node E straight to a single forest on one data host, instead of fanning out across D1/D2 and forests F1–F4]

  40. Optimized Insert Mechanics • New URI+Document arrive at E-node • Compute Fi using the URI hash • Ask server to evaluate the insert query against Fi only • URI Probe – Fi only • URI Lock – Fi only • Forest Assignment – Fi only • Indexing • Journaling • Commit – transaction complete • Lock Release – Fi only

  41. Regular Insert vs. Optimized Insert (In-Forest Eval)

  42. Ta-da! Achieving 100K updates (In-Forest Eval)

  43. We weren’t content with 15K, so we showed them: 100,000 INSERTS PER SECOND. A quote from the bank: “We threw everything we had at MarkLogic and it didn’t break a sweat.”
