
Handling BigData On the Public Cloud

Handling BigData On the Public Cloud. Dealing with datasets that grow so large that they become awkward to work with using on-hand database management tools. Based on InterOp 2011 presentation by Liran Zelkha (Liran.zelkha@scalebase.com), co-founder of ScaleBase


Presentation Transcript


  1. Handling BigData On the Public Cloud • Dealing with datasets that grow so large that they become awkward to work with using on-hand database management tools.

  2. Based on InterOp 2011 presentation by Liran Zelkha (Liran.zelkha@scalebase.com) • Co-founder of ScaleBase • Before that, led Aluna – a database and architecture consulting company • Over 15 years of hands-on technology experience

  3. Agenda • What is Big Data • Big Data On Public Clouds • Some solutions

  4. What is Big Data?

  5. Big Data (from wikipedia) • …are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to "spot business trends, prevent diseases, combat crime." Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data.

  6. Top 3 ways to know you have big data:

  7. Number 3: • ... you get a call from the utility company asking you not to run 'that brownout query' again. (@aristippus303 at Datawatch)

  8. Number 2: • ... it piles up so high that it disappears into the clouds (@evertlammerts - I assume pun was intended?)

  9. Number 1: • ... the SAN undergoes gravitational collapse and you get cited by OSHA for an unlicensed singularity. (@datamartist)

  10. But seriously • It's not a single number • It is a set of parameters

  11. Big Data Parameters • Volume of Data • Velocity of Data • Complexity of Analysis • (Diagram source: http://www2.neilmcgovern.com/main.html)

  12. Where do we see big data? • Everywhere • Data Warehouse • OLTP • Web 2.0 • SaaS • Billing • Fraud detection • CMS • …Family history • …Social networking

  13. Volume of data • How much data do you have? • The more, the merrier • – Better analysis • Used to be measured in hundreds of GB, then TB, now PB • But even a 300GB DB can still have Big Data problems • “If you have over 1TB of data – you have a Big Data problem” (IDC)

  14. Velocity of data • How many users access the data? • How many writes occur on your data? • How many transactions does your database handle? • Measured in TPS (transactions per second), counted by the thousands

  15. Complexity of Analysis • How complex are your queries? • An example:
      SELECT * FROM (
        SELECT w.*, ROWNUM rnum FROM (
          SELECT distinct w.watcher_id from watch w
          left outer join Profile p on p.watcher_id = w.watcher_id
          join atom_feed af on af.resource_id_hash = w.resource_id_hash
          join atom_feed_entry afe on afe.atom_feed_id = af.atom_feed_id
          where (p.LAST_ENTRY_PROCESSED_DATE is null
                 or p.LAST_ENTRY_PROCESSED_DATE < afe.create_date)
            and (p.email_enabled_flag is null or p.email_enabled_flag != 'F')
            and af.resource_id = w.resource_id
            and afe.create_date <= sysdate - ?
          ORDER BY w.watcher_id ASC ) w
        where ROWNUM <= ? ) where rnum > ?;

  16. Big Data on Public Clouds

  17. Again from Wikipedia • – Public cloud or external cloud describes cloud computing in the traditional mainstream sense, whereby resources are dynamically provisioned on a fine-grained, self-service basis over the Internet, via web applications/web services, from an off-site third-party provider who bills on a fine-grained utility computing basis.

  18. Public Cloud Implications • Pros: • Elastic • Unlimited storage • Unlimited capacity • Cons: • Performance • Standard hardware (no appliances...)

  19. Some Solutions

  20. Column Store Database • New databases that internally store the data in columns, and not rows. • Very good for OLAP • Excellent for BigData
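
To illustrate why a column layout suits OLAP-style aggregates, here is a minimal sketch in Python. It is not how any particular column store is implemented; the sample table, its field names, and the helper functions are all hypothetical. The point it shows is that an aggregate over one field has to visit every full record in a row layout, but only one contiguous list in a column layout.

```python
# Hypothetical sample data; table and field names are illustrative only.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def to_columns(row_records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in row_records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(row_records, field):
    """Row layout: every full record is visited to read a single field."""
    return sum(record[field] for record in row_records)


def total_column_store(columns, field):
    """Column layout: only the one column needed is scanned."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both functions return the same total; the difference is how much unrelated data has to be read to get there, which is what matters once the table no longer fits in memory.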

  21. NoSQL Database

  22. Again, from Wikipedia: • – NoSQL is the term used to designate database management systems that differ from classic relational database management systems (RDBMSes) in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term that would include classic relational databases as a subset.

  23. NoSQL Database • Non-relational databases • Usually store data in memory, replicated across multiple machines • Very low latency

  24. Unstructured Schema • Since SQL is not used, the ERD can be dynamic • Some solutions store data as objects of any kind • Some use binary serialization of the object • Others use a Map API (put, get) • Players include: Cassandra, HiveDB, Membase, MongoDB
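
The Map API and binary serialization ideas above, together with in-memory replication, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the API of any of the products named: the class name, the `replica_count` and `replica_index` parameters, and the use of `pickle` as the serializer are all hypothetical choices, and the "machines" are just dictionaries in one process.

```python
import pickle


class ReplicatedKeyValueStore:
    """Toy in-memory key-value store; each dict stands in for one machine."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        # Serialize the object to bytes once, then copy it to every replica.
        blob = pickle.dumps(value)
        for replica in self.replicas:
            replica[key] = blob

    def get(self, key, replica_index=0):
        # Any replica can serve the read; a missing key yields None.
        blob = self.replicas[replica_index].get(key)
        return None if blob is None else pickle.loads(blob)


store = ReplicatedKeyValueStore()
# No fixed schema: the two values below have different fields.
store.put("user:42", {"name": "Ada", "tags": ["admin"]})
store.put("user:43", {"name": "Lin", "age": 31})
```

Because values are opaque blobs, the store never needs to know their structure, which is what lets the schema vary from one object to the next.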

  25. NewSQL • Dubbed by 451 Group analyst Matthew Aslett • "NewSQL" is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as 'ScalableSQL' to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term 'NewSQL' in the new report. And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL.

  26. New Databases • New database engines • Usually scale very well, can store a lot of data, and are targeted at virtual environments • Players include • – NimbusDB • – VoltDB

  27. New MySQL Storage Engines • New databases that look like MySQL from the outside • – MySQL network protocol • – MySQL SQL flavor • Players include • – Akiban • – ScaleDB

  28. ScaleBase • ScaleBase offers Database Load Balancers • Scalability and high availability for your database, totally transparent to your application
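
As a rough illustration of what a database load balancer can do, the sketch below routes reads to replicas in round-robin order and sends everything else to a primary. This is an assumption made for illustration only: the slides do not describe how ScaleBase works internally, and the class name, the server names, and the "first keyword is SELECT" rule are all hypothetical.

```python
import itertools


class DatabaseLoadBalancer:
    """Toy router: reads go to replicas in turn, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Classify the statement by its first keyword (a deliberate simplification).
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._replica_cycle)
        return self.primary


# Hypothetical server names.
balancer = DatabaseLoadBalancer("primary-db", ["replica-1", "replica-2"])
```

The application calls `route` (or, in a real deployment, simply connects to the balancer) without knowing how many servers sit behind it, which is the sense in which such a layer is transparent to the application.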

  29. Summary • There are many ways to handle BigData on cloud environments • Understand your data requirements well – and use the right tool for the job • No one tool fits them all!
