
Big Data, NoSQL . . . So What?

Big Data, NoSQL . . . So What? Iran Hutchinson. I work for InterSystems, which: drives the http://globalsdb.org NoSQL project; has 20+ years of NoSQL production deployments; has 20+ years of Big Data production deployments; built a ~250 million Euro business on the above.


Presentation Transcript


  1. Big Data, NoSQL. . . So What? Iran Hutchinson

  2. Me • I work for InterSystems, which: • Drives the http://globalsdb.org NoSQL project. • Has 20+ years of NoSQL production deployments • Has 20+ years of Big Data production deployments • Built a ~250 million Euro business on the above • Email: iran.hutchinson@intersystems.com • Twitter: #iranic

  3. Big Data is … • Important data in varying formats and volumes that is being generated across all areas affecting your business that is generally not centrally correlated or managed. • Examples include: • Word Files, PowerPoint, PDFs • Emails, Instant Messaging, Texts • Blogs and Social Media • Automated data from machine activities • Stream data from financial stock markets

  4. Some Big Data Numbers • Source: McKinsey Global Institute • 5 Billion mobile phones used in 2010 • 30 Billion pieces of info shared on Facebook each month • 40% projected growth in global data generated • 235 Terabytes collected by US Library of Congress 04/11 • 15 out of 17 sectors in US have more data stored per company than this.

  5. Some Big Data Numbers … • Source: McKinsey Global Institute • $300 Billion in potential value in US Healthcare system • €250 Billion in Europe’s public sector administration • $600 Billion in annual consumer surplus using location data • 60% Potential increase in retail operating margins • 140,000 – 190,000 analytical talent positions in US • 1.5 Million data-savvy managers needed in US

  6. Case Study: Credit Suisse • Key Challenges: • Revamp order routing architecture • Revamp order management architecture • Serve current demand and scale to new levels • Address downtime challenges

  7. Case Study: Credit Suisse … • Big Data in the form of volumes of transactions • Leveraged Caché’s: • In-memory architecture for performance • On-disk resiliency for availability • Distributed architecture for data coherency • Can easily process 1,000,000,000 transactions • During business hours

  8. Case Study: European Space Agency (ESA) • Key Challenges • Make the largest, most precise 3-D map of our Galaxy • Monitor 1,000,000,000 stars over 5 years, precisely charting position, movement, and brightness • Along the way discover hundreds of thousands of new celestial objects

  9. Case Study: ESA Continued … • Challenge Calculation: • Capture data for 1 Billion Celestial Objects • 1,000,000,000 objects × 100 observations per object × 600 bytes per observation = 60,000,000,000,000 bytes (60 TB) • Solution: Caché/XEP, delivering 100,000+ sustained inserts per second per server, stored as real objects with SQL access • Source: http://www.intersystems.com/cache/whitepapers/pdf/Charting_the_Galaxy.pdf
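
The challenge calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic: the four input figures come from the slide, while the single-server load time rests on two assumptions that the slide does not state (one insert per observation, and exactly the quoted lower-bound rate of 100,000 inserts per second).

```python
# Figures quoted on the slide; nothing here is measured.
OBJECTS = 1_000_000_000        # celestial objects
OBS_PER_OBJECT = 100           # observations per object
BYTES_PER_OBS = 600            # bytes per observation
INSERTS_PER_SEC = 100_000      # quoted lower bound, per server

total_observations = OBJECTS * OBS_PER_OBJECT
total_bytes = total_observations * BYTES_PER_OBS
total_terabytes = total_bytes / 10**12   # decimal terabytes

# Assumption: one insert per observation, one server, constant rate.
seconds_single_server = total_observations / INSERTS_PER_SEC
days_single_server = seconds_single_server / 86_400

print(f"{total_bytes:,} bytes = {total_terabytes:.0f} TB")
print(f"{days_single_server:.1f} days of inserts on one server")
```

Under those assumptions the 100 billion observations would take roughly 11.6 days of continuous inserting on a single server, which is small next to the 5-year mission.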

  10. Enabling Technology • Focus on Caché • A quick look at the architecture (diagram labels: Paradigm, Language)

  11. Enabling Technology … • Java + C database kernel run in same process

  12. Enabling Technology … • ECP, Distributed Computing

  13. Enabling Technology … • Multiple, simultaneous data-to-disk writers (diagram labels: Caché Buffer, Journalers, Hard Disk)

  14. Who is this Guy? • Edgar Frank “Ted” Codd • Known for 12 Rules (0 ~ 12) for Relational Data Systems

  15. NoSQL … Breaking the Rules • Rule 1: The Information Rule • All information is represented in one and only one way, namely by values in column positions within rows of tables • Rule 12: The Nonsubversion Rule • If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, e.g. by bypassing relational security or integrity constraints.

  16. Why NoSQL? • No to ACID transactions • No to the impedance mismatch with SQL • Dealing with Big Data and Web Scale • High prices from RDBMS vendors • Use commodity hardware • Flexible data models • It’s a cool movement ….

  17. Is NoSQL a new Concept? • No • Remember MUMPS? • SET ^Car("Door","Color")="BLUE" • Remember Multi-value/PICK? • MATWRITE array.variable ON file.variable,id. …. • Ever heard of the NoSQL RDB? • Carlo Strozzi • http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page

  18. CAP Theorem • Consistency • A service that is consistent operates fully or not at all. • Availability • The service is available to operate fully or not at all. • Partition Tolerance • Managing data on multiple nodes. 1 node is 1 partition, so it either works or does not when it comes to processing data. • Significant because you can get only 2 of these 3 …

  19. CAP Theorem … • Arguments and links • http://www.julianbrowne.com/article/viewer/brewers-cap-theorem • http://ksat.me/a-plain-english-introduction-to-cap-theorem/ • http://voltdb.com/company/blog/clarifications-cap-theorem-and-data-related-errors

  20. CAP Theorem …: Consistency

  21. CAP Theorem …: Consistency

  22. CAP Theorem …: Consistency

  23. Distributed computing • Fallacies (Peter Deutsch) • The network is reliable • Latency is zero • Bandwidth is infinite • The network is secure • Topology doesn’t change • There is one administrator • Transport cost is zero • The network is homogeneous • Remember JINI? (See Apache River project)

  24. NoSQL: Which Model to Use?

  25. NoSQL: Which project? • http://nosql-database.org/ lists 122 today. • Depends on your model selection. • Most likely choose a well-known project. • Don’t forget about shared risk!

  26. NoSQL: Querying • Some solutions have no querying • When available, query languages differ • Lack of general ad hoc querying – “no” SQL • Have you heard of UnQL? • http://www.unqlspec.org/display/UnQL/Home • NOTE: Toad for Cloud

  27. NoSQL: How to Succeed? • Know your application • Don’t forget the past lessons • Consider a hybrid approach • Fight the desire to Roll-Your-Own-DB • Start small but significant

  28. NoSQL: Hybrid Approach 1 • Two Systems • NoSQL System • SQL/RDBMS • Data Mapper / Translator between the two (diagram labels: NoSQL, SQL/RDBMS, Data Mapper / Translator)
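
A minimal sketch of what the data mapper / translator in this two-system approach might look like. Everything in it is an illustrative assumption rather than anything from the talk: a plain Python dict stands in for the NoSQL system, SQLite (from the standard library) stands in for the SQL/RDBMS, the function name `map_to_sql` is hypothetical, and the second product row is made-up sample data.

```python
import sqlite3

# Hypothetical NoSQL side: a dict keyed by subscript tuples.
nosql_store = {
    ("product", "computer", "laptop", "Mac", "i7"): 3,
    ("product", "computer", "desktop", "PC", "i5"): 7,  # made-up sample row
}

def map_to_sql(store, conn):
    """Copy each 'product' node into one relational row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product ("
        "item TEXT, type TEXT, os TEXT, processor TEXT, quantity INTEGER)"
    )
    # Drop the leading global name; the remaining subscripts become columns.
    rows = [key[1:] + (value,) for key, value in store.items()
            if key[0] == "product"]
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
map_to_sql(nosql_store, conn)

# Ad hoc SQL over data that originated in the NoSQL store.
total = conn.execute(
    "SELECT SUM(quantity) FROM product WHERE item = 'computer'"
).fetchone()[0]
print(total)
```

The cost of this design is that the mapping has to be kept in step with both systems, which is part of the motivation for the single-system approach on the next slide.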

  29. NoSQL: Hybrid Approach 2 • One system does both NoSQL and SQL

  30. GlobalsDB.org Project • Name comes from the underlying data structure • Multi-dimensional array • Basis for commercial Caché data system • Free for development and production deployment • NoSQL DB with Java and Node.js APIs • Code base is same as commercial product • APIs are open sourced or being open sourced • Database kernel is not open source

  31. A “Global” Definition • A Global is a persistent, sparse multi-dimensional array, which consists of one or more storage elements or "nodes". Each node is identified by a node reference (which is, essentially, its logical address) • simple == "some data" • complex["subscript-1", "subscript-2"] == "some data" • Example • product[item,type,os,processor] == quantity • product["computer","laptop","Mac","i7"] == 3
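
To make the definition concrete, here is a toy in-memory model of a sparse multi-dimensional array. It is a sketch only: it is not the GlobalsDB Java or Node.js API, it does not persist anything, and the class and method names (`SparseGlobal`, `set`, `get`, `children`) are hypothetical.

```python
class SparseGlobal:
    """Toy stand-in for a global: only nodes that were set exist."""

    def __init__(self, name):
        self.name = name
        self._nodes = {}  # node reference (tuple of subscripts) -> value

    def set(self, value, *subscripts):
        self._nodes[subscripts] = value

    def get(self, *subscripts, default=None):
        return self._nodes.get(subscripts, default)

    def children(self, *prefix):
        """Sorted distinct next-level subscripts under a prefix."""
        n = len(prefix)
        return sorted({key[n] for key in self._nodes
                       if len(key) > n and key[:n] == prefix})


product = SparseGlobal("product")
product.set(3, "computer", "laptop", "Mac", "i7")   # the slide's example
product.set("some data")                            # node with no subscripts

print(product.get("computer", "laptop", "Mac", "i7"))
print(product.get("computer", "tablet"))   # never set, so no node exists
print(product.children("computer"))
```

Sparse here means that no storage is used for subscript combinations that were never set; a lookup on such a combination simply finds no node.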

  32. GlobalsDB Architecture • Current Architecture (diagram labels: Paradigm, Language)

  33. GlobalsDB, NoSQL, Big Data • http://nosql.mypopescu.com/ • http://highscalability.com/ • http://nosqltapes.com/ • http://globalsdb.wordpress.com
