The Analytic DBMS Market(s): New opportunities with new technology, by Curt A. Monash, Ph.D., President, Monash Research

Presentation Transcript


  1. The Analytic DBMS Market(s) New opportunities with new technology by Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com

  2. Curt Monash • Analyst since 1981 • Covered DBMS since the pre-relational days • Also analytics, search, etc. • Own firm since 1987 • Publicly available research • Blogs, including DBMS2 (www.DBMS2.com, the source for most of this talk) • Feed at www.monash.com/blogs.html • White papers and more at www.monash.com • User and vendor consulting

  3. Our agenda • Why there are specialty analytic DBMS • It’s not just the analytic area • Hardware issues • Tips for choosing among them • Segments and priorities • The selection process

  4. Database diversity • High-end e-commerce • 100-terabyte analytics • High-volume call center • Media-heavy web startup • Simple departmental application • (and many more)

  5. 11 kinds of data management software • High-end OLTP/general-purpose DBMS • Mid-range OLTP/general-purpose DBMS • Row-based analytic RDBMS • Column- or array-based analytic RDBMS • Text search engines • XML and OO DBMS (but these may merge with search) • RDF and other graphical DBMS (but these may merge with relational) • Event/stream processing engines (aka CEP) • Embedded DBMS for devices • Sub-DBMS file managers (e.g. SimpleDB, some MySQL uses) • Science DBMS

  6. Why are there specialized analytic DBMS? • General-purpose database managers are optimized for updating short rows … • … not for analytic query performance • 10-100X price/performance differences are not uncommon • At issue is the interplay between storage, processors, and RAM

  7. Moore’s Law, Kryder’s Law, and a huge exception • Growth factors: • Transistors/chip: >100,000X since 1971 • Disk density: >100,000,000X since 1956 • Disk speed: 12.5X since 1956 • The disk speed barrier dominates everything!

  8. The “1,000,000:1” disk-speed barrier • RAM access times ~5-7.5 nanoseconds • CPU clock speed <1 nanosecond • Interprocessor communication can be ~1,000X slower than on-chip • Disk seek times ~2.5-3 milliseconds • Limit = ½ rotation, i.e., 1/30,000 minutes, i.e., 1/500 seconds = 2 ms • Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
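The arithmetic on this slide can be checked with a short sketch. The 15,000 RPM spindle speed is an assumption inferred from the "1/30,000 minutes" figure (half of one rotation), and `half_rotation_seconds` is a hypothetical helper name. With the slide's own ranges, the raw disk-to-RAM gap comes out in the hundreds of thousands, which is the order of magnitude the "1,000,000:1" label points at.

```python
def half_rotation_seconds(rpm: float) -> float:
    """Average rotational latency: the time for half of one disk rotation."""
    rotations_per_second = rpm / 60.0
    return 0.5 / rotations_per_second


# Assumption: 15,000 RPM, inferred from the slide's "1/30,000 minutes".
latency_s = half_rotation_seconds(15_000)  # 1/500 s = 2 ms

# Ranges quoted on the slide, converted to seconds.
ram_access_s = (5e-9, 7.5e-9)   # ~5-7.5 nanoseconds
disk_seek_s = (2.5e-3, 3e-3)    # ~2.5-3 milliseconds

# Smallest and largest disk/RAM ratios those ranges allow.
lowest_ratio = disk_seek_s[0] / ram_access_s[1]
highest_ratio = disk_seek_s[1] / ram_access_s[0]

print(f"half rotation: {latency_s * 1e3:.1f} ms")
print(f"disk/RAM ratio: {lowest_ratio:,.0f} to {highest_ratio:,.0f}")
```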

  9. Hardware strategies to optimize analytic I/O • Lots of RAM • Parallel disk access!!! • Lots of networking • Tuned MPP (Massively Parallel Processing) is the key
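As a rough illustration of why parallel disk access matters, the sketch below estimates full-scan time for the 100-terabyte case mentioned earlier. The 100 MB/s per-disk sequential rate and the disk counts are illustrative assumptions, not figures from the talk, and `scan_seconds` is a hypothetical helper. It also assumes perfectly even data distribution and ignores network and CPU costs.

```python
def scan_seconds(data_bytes: float, num_disks: int, bytes_per_sec_per_disk: float) -> float:
    """Idealized full-scan time when data is spread evenly across disks
    and every disk is read in parallel at the same sequential rate."""
    return data_bytes / (num_disks * bytes_per_sec_per_disk)


DATA_BYTES = 100e12      # 100 terabytes
DISK_RATE = 100e6        # assumed 100 MB/s sequential read per disk

for disks in (1, 100, 1000):
    seconds = scan_seconds(DATA_BYTES, disks, DISK_RATE)
    print(f"{disks:>5} disks: {seconds:>12,.0f} s ({seconds / 3600:,.1f} h)")
```

Under these assumptions a single disk needs about a million seconds (more than eleven days) for one pass, while a thousand disks read in parallel need about a thousand seconds.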

  10. Software strategies to optimize analytic I/O • Minimize data returned • Classic query optimization • Minimize index accesses • Page size • Precalculate results • Materialized views • OLAP cubes • Return data sequentially • Store data in columns • Stash data in RAM
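A minimal sketch of why "store data in columns" reduces analytic I/O: a row store must read every column of each row it scans, while a column store reads only the columns a query references. The table shape (one billion rows, 50 columns of 8 bytes each) and the three-column query are hypothetical, and the sketch ignores compression, indexes, and page overhead.

```python
def row_store_bytes(num_rows: int, column_widths: list[int]) -> int:
    """Bytes read by a full scan when rows are stored contiguously:
    every column comes along whether the query needs it or not."""
    return num_rows * sum(column_widths)


def column_store_bytes(num_rows: int, column_widths: list[int], needed: list[int]) -> int:
    """Bytes read by a full scan when each column is stored separately:
    only the referenced columns are touched."""
    return num_rows * sum(column_widths[i] for i in needed)


# Hypothetical table: 1 billion rows, 50 columns of 8 bytes each.
NUM_ROWS = 1_000_000_000
WIDTHS = [8] * 50
NEEDED = [0, 7, 19]  # hypothetical query touching three columns

row_bytes = row_store_bytes(NUM_ROWS, WIDTHS)
col_bytes = column_store_bytes(NUM_ROWS, WIDTHS, NEEDED)

print(f"row store scan:    {row_bytes / 1e9:,.0f} GB")
print(f"column store scan: {col_bytes / 1e9:,.0f} GB")
print(f"reduction:         {row_bytes / col_bytes:.1f}x")
```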

  11. 16 contenders • Aster Data • Dataupia • Exasol • Greenplum • HP Neoview • IBM DB2 BCUs • Infobright • Kickfire • Kognitio • Microsoft Madison • Netezza • Oracle Exadata • ParAccel • Sybase IQ • Teradata • Vertica

  12. Varied approaches • 3 are trying to meld OLTP and analytic processing • 2 have very specialized hardware • 1 is purely RAM-centric • Several use Infiniband; several stress gigE switches • 6 are columnar • 2 stress cloud/DaaS

  13. Segmentation made simple • One database to rule them all • One analytic database to rule them all • Frontline analytic database • Very, very big analytic database • Big analytic database handled very cost-effectively

  14. 7 more precise segmentation issues • What is your tolerance for specialized hardware? • What is your tolerance for set-up effort? • What is your tolerance for ongoing administrative burden? • What are your insert and update requirements? • At what volumes will you run fairly simple queries? • What are your complex queries like? and, most important, • Are you madly in love with your current DBMS?

  15. Specialized hardware • Custom or unusual chips (rare) • Custom or unusual interconnects • Fixed configurations of common parts

  16. Set-up effort • Hardware acquisition and installation • Database and index design • Data cleaning and integration • Porting of existing applications

  17. Ongoing administration • Part of the set-up effort also translates to an ongoing administrative burden • Indexes, materialized views, cubes, etc. … • … unless the DBMS architecture minimizes their use

  18. Inserts and updates • Finally we get to the performance criteria • Batch load • ELT (or ETLT) vs. pure ETL • Mini-batches or trickle feeds • True transactional updates

  19. Concurrent queries • Major use cases • Traditional BI • Customer-facing apps • Product maturity is often key

  20. Complex queries • This is where the glamour is • MPP to speed up I/O • Clever answers to the data redistribution problem • Table scans vs. random access • Columns vs. rows • Aggressive use of RAM • Compression (saving on disk cost isn’t the point) • … and fast analytics even beyond the queries

  21. The analytic DBMS selection process • Figure out what you’re trying to buy • Make a short list • Do free POCs • Evaluate and decide

  22. Figure out what you’re trying to buy • Inventory your use cases • Current • Known future • Wish-list/dream-list future • Set constraints • People and platforms • Money • Establish target SLAs • Must-haves • Nice-to-haves

  23. Short list basics • You might as well consider the incumbent(s) • Cash cost is an easy filter to apply • What is the crux of the deployment effort? • References can be scarce

  24. Free POCs are a great invention • Most of the effort is in the set-up • The better you match your use cases, the more reliable the POC is • You might as well do POCs for several vendors – at (almost) the same time! • Where is the POC being held? Can you plan this yourself, or do you need outside help?

  25. Evaluate and decide • It all comes down to: • Cost • Speed • Risk • and in some cases: • Time to value • Upside

  26. Further information Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com
