
ScaleDB: Persistence for Stream Data


errin

Presentation Transcript


  1. ScaleDB: Persistence for Stream Data

  2. ScaleDB: Big Fast Data w/ MariaDB • Chart axes: Data Velocity (Driven by Performance) against Data Volume (Driven by Cost: DRAM vs. Disk) • In-Memory: SAP HANA, BigQuery • High-Velocity / Disk: ScaleDB • Disk: MariaDB, Oracle, SQL Server, etc. • Disk: Hadoop

  3. Demo • Payment Table • PK • FK: Account, Time • Fields: Store, Amount, Coupon • Inserts • Lookup by Primary Key • Lookup by Account (Foreign Key) • Complex queries: BI & analytics
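The demo workload on this slide can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module as a stand-in for MariaDB with ScaleDB; the column names, types, surrogate `id` key and sample rows are assumptions, since the slide lists only the field names.

```python
import sqlite3

# Hypothetical schema: names and types are assumptions based on the slide's field list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY);
CREATE TABLE payment (
    id      INTEGER PRIMARY KEY,                      -- primary key
    account INTEGER NOT NULL REFERENCES account(id),  -- foreign key
    time    TEXT NOT NULL,
    store   TEXT,
    amount  REAL,
    coupon  TEXT
);
CREATE INDEX payment_account_idx ON payment(account);
""")

# Inserts (made-up sample rows).
conn.executemany("INSERT INTO account (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2014-01-01T10:00:00", "store-a", 12.50, None),
        (2, 1, "2014-01-01T10:05:00", "store-b", 7.25, "SAVE5"),
        (3, 2, "2014-01-01T10:06:00", "store-a", 30.00, None),
    ],
)

# Lookup by primary key.
by_pk = conn.execute("SELECT amount FROM payment WHERE id = ?", (2,)).fetchone()

# Lookup by account (foreign key), served by the secondary index.
by_account = conn.execute(
    "SELECT id FROM payment WHERE account = ? ORDER BY id", (1,)
).fetchall()

# A simple analytics-style query: payment count and revenue per store.
per_store = conn.execute(
    "SELECT store, COUNT(*), SUM(amount) FROM payment GROUP BY store ORDER BY store"
).fetchall()
```

The three queries correspond to the slide's lookup by primary key, lookup by account, and BI-style aggregation.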

  4. Demo

  5. ScaleDB’s Solution: 1M Inserts/Second (indexed) with Simultaneous Queries • Commodity “cloud” instances: 6 nodes total, 48 cores, 0.2 TB main memory; ~1M inserts/second; cost is less than $15,000 • SAP HANA (in-memory DBMS): cluster total of 100 nodes, 4,000 cores, 100 TB main memory; “1.5M inserts/second” (Vishal Sikka, SAP TechEd); in memory, DRAM cost alone is ~$2M • More than 2 orders of magnitude cost advantage
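The "2 orders of magnitude" claim can be checked directly from the two cost figures on this slide. A minimal sketch; the variable names are illustrative, and the ScaleDB figure is the slide's upper bound ("less than $15,000").

```python
import math

scaledb_cluster_cost = 15_000   # USD, upper bound quoted on the slide
hana_dram_cost = 2_000_000      # USD, DRAM alone, as quoted on the slide

cost_ratio = hana_dram_cost / scaledb_cluster_cost
orders_of_magnitude = math.log10(cost_ratio)

print(f"cost ratio: {cost_ratio:.0f}x ({orders_of_magnitude:.2f} orders of magnitude)")
```

This compares raw hardware cost only; it does not normalize for the different insert rates quoted for the two systems.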

  6. Data Volumes are Exploding • Tweets per Day • iPhone Downloads • AWS S3 & Dropbox Data Objects • …Driven by new data sources and data types: Devices, Social, Log Files, Analytics, Business

  7. Faster Insights = More Value (Complements Kinesis, Twitter Storm, etc.) • Chart axes: Response Latency (0 ms; milliseconds to minutes; later, possibly much later) against Value of the Data to Users/Advertisers (lower to higher)

  8. Big Data / Fast Data • Fast Data: Real-Time Data; Ad Hoc (SQL) Processing; ScaleDB & Stream Processors • Big Data: Pools of Data at Rest; Batch (programmatic) Processing; Hadoop • Logos shown: Twitter Storm, MillWheel, BigQuery

  9. Hadoop’s Batch Processing • “…MapReduce technologies are good at handling large volumes of data. But they are fundamentally batch-based, and struggle with enabling real-time decisions on a never-ending, and never fully complete, stream of data.” (Terry Hanold, Vice President of New Business Initiatives, Amazon AWS)

  10. Fast Data: The Car Metaphor • Limited View / Real-Time Data, No Historical View • Historical View, “Batch Lag” • Real-Time Data, Historical View, SQL Support

  11. DRAM Too Expensive for Stream Data • Media Costs Based upon Data Volume (DRAM vs. Disk) • DRAM: $20,000; $200,000; $2,000,000; $20,000,000 • Disk: $43; $430; $4,300; $43,000 • This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis • 1M inserts/second (100-byte rows) for 24 hours = >8.5 TB/day • Disk media cost = ~$370 • DRAM media cost = ~$172,800 (>450X more)
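The daily volume and media costs on this slide follow from simple arithmetic. The sketch below assumes decimal terabytes and per-terabyte prices of $43 (disk) and $20,000 (DRAM); those prices are inferred from the first column of the slide's cost table and are consistent with its totals, but the slide does not label them explicitly.

```python
INSERTS_PER_SECOND = 1_000_000
ROW_BYTES = 100
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed unit prices (USD per decimal TB), inferred from the slide's table.
DISK_USD_PER_TB = 43
DRAM_USD_PER_TB = 20_000

bytes_per_day = INSERTS_PER_SECOND * ROW_BYTES * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 1e12

disk_cost = tb_per_day * DISK_USD_PER_TB
dram_cost = tb_per_day * DRAM_USD_PER_TB
dram_vs_disk = dram_cost / disk_cost

print(f"{tb_per_day:.2f} TB/day")
print(f"disk: ${disk_cost:,.0f}  DRAM: ${dram_cost:,.0f}  ratio: {dram_vs_disk:.0f}x")
```

Under these assumptions one day of data is about 8.64 TB, which matches the slide's ">8.5 TB/day", and the DRAM-to-disk ratio is simply the ratio of the two unit prices.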

  12. But Data Volumes Increase 78% CAGR • According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years. • [1] Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. Tech. An IDC White Paper. • [2] Paquet, Raymond. “Technology Trends You Can’t Afford to Ignore.” Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.
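This slide quotes both an annual growth rate and a five-year multiple. A short sketch using only the standard compound-growth formula shows how the two kinds of figure convert into each other; the variable names are illustrative.

```python
cagr = 0.78      # annual growth rate from the slide title
years = 5

# Growth multiple after `years` of compounding at `cagr`.
five_year_multiple = (1 + cagr) ** years

# Annual rate implied by ten-fold growth over the same period.
implied_cagr = 10 ** (1 / years) - 1

print(f"{cagr:.0%} CAGR compounds to {five_year_multiple:.1f}x over {years} years")
print(f"ten-fold growth in {years} years implies {implied_cagr:.1%} CAGR")
```

The two figures work out to different growth rates, so they are best read as separate estimates rather than as one measurement.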

  13. In-Memory & Big Data: Data Volume Growth Dramatically Outpaces DRAM Affordability • Chart axes: Increase Multiplier (Volume/Affordability) against Years

  14. ScaleDB: Big Fast Data w/ MariaDB • 1,000,000 inserts per second • BigQuery cost: $86,400/day • ScaleDB cost*: $46/day • * AWS: $28 for 8.4 TB storage, $18 for 6 instances of heavy usage, EBS optimized • Chart axes: Data Velocity (Driven by Performance) against Data Volume (Driven by Cost: DRAM vs. Disk) • In-Memory: SAP HANA, BigQuery • High-Velocity / Disk: ScaleDB • Disk: MariaDB, Oracle, SQL Server, etc. • Disk: Hadoop
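The daily cost comparison on this slide can be reproduced from its own figures. A minimal sketch; the variable names are illustrative and the dollar amounts are taken from the slide as quoted, not from current AWS or BigQuery pricing.

```python
# ScaleDB on AWS, per day (figures from the slide's footnote).
storage_cost = 28      # USD for 8.4 TB of storage
instance_cost = 18     # USD for 6 heavy-usage, EBS-optimized instances
scaledb_daily = storage_cost + instance_cost

# BigQuery, per day (figure from the slide).
bigquery_daily = 86_400

ratio = bigquery_daily / scaledb_daily
print(f"ScaleDB: ${scaledb_daily}/day, BigQuery: ${bigquery_daily:,}/day, ratio: {ratio:.0f}x")
```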

  15. How it Works

  16. Scaling the Database • Diagram labels: DBMS Instance (MariaDB with MyISAM, InnoDB, ScaleDB) • Storage Instance (ScaleDB) • Data

  17. Scaling the Database Tier • Diagram labels: four DBMS Instances • Cluster Manager • two Storage Instances

  18. Scaling the Storage Tier • Diagram labels: four DBMS Instances • Cluster Manager • five Storage Instances

  19. High-Availability • Diagram labels: four DBMS Instances • Cluster Manager • Mirrored Volumes • five Storage Instances

  20. NoSQL v. MySQL

  21. Push-Down: Distributed Parallel Processing • Push Processing to the Data • Result: High-Performance Parallel Processing, Similar to Map/Reduce • Diagram labels: Query and Response (four each), MariaDB • ScaleDB, three ScaleDB Storage nodes
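The push-down idea on this slide can be illustrated with a small simulation. This is not ScaleDB's implementation: the storage nodes are plain Python lists, a thread pool stands in for network fan-out, and every name in the code is hypothetical. It shows only the general pattern of filtering and pre-aggregating next to the data, then merging small partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows held by one storage node.
# Rows are (account, amount) tuples.
storage_nodes = [
    [(1, 10.0), (2, 5.0), (1, 2.5)],
    [(2, 7.5), (3, 1.0)],
    [(1, 4.0), (3, 6.0), (3, 3.0)],
]


def scan_on_node(rows, account):
    """Runs 'next to the data': filter and pre-aggregate, return a small partial result."""
    matching = [amount for acct, amount in rows if acct == account]
    return len(matching), sum(matching)


def pushed_down_total(account):
    """Database node: send the same sub-query to every storage node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(storage_nodes)) as pool:
        partials = list(pool.map(lambda rows: scan_on_node(rows, account), storage_nodes))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


count, total = pushed_down_total(1)
print(f"account 1: {count} payments, total {total}")
```

Only one `(count, sum)` pair per node crosses back to the database node, which is the same shape as a map step followed by a reduce step.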

  22. Customer Success Story

  23. Customer Success Story: Statricks • Target: 300M-450M listings per day • From: eBay, Craigslist … • Processing: Price trends; Listing Longevity; Spam Detection; Ad Metrics; Price Trend Time Series; Statistical Analysis

  24. Thank You
