
Infobright Meetup, Host: Avner Algom, May 28, 2012


Presentation Transcript


  1. Infobright Meetup Host Avner Algom May 28, 2012

  2. Agenda • Infobright • Paul Desjardins, VP Business Development, Infobright Inc.: What part of the Big Data problem does Infobright solve? Where does Infobright fit in the database landscape? • Joon Kim, Sr. Sales Engineer: Technical overview and use cases • WebCollage • Lior Haham, WebCollage: Case Study: Using Analytics for Hosted Web Applications • Zaponet • Asaf Birenzvieg, CEO: Introduction / experience with Infobright • Q/A

  3. Growing Customer Base across Use Cases and Verticals • 1000 direct and OEM installations across North America, EMEA and Asia • 8 of Top 10 Global Telecom Carriers using Infobright via OEM/ISVs

  4. The Machine-Generated Data Problem “Machine-generated data is the future of data management.” Curt Monash, DBMS2 • Machine-generated/hybrid data • Weblogs • Computer, network events • CDRs • Financial trades • Sensors, RFID, etc. • Online game data • Human-generated data: input from most conventional kinds of transactions • Purchase/sale • Inventory • Manufacturing • Employment status change (Chart: rate of growth by data type)

  5. The Value in the Data “Analytics drives insights; insights lead to greater understanding of customers and markets; that understanding yields innovative products, better customer targeting, improved pricing, and superior growth in both revenue and profits.” Accenture Technology Vision, 2011

  6. Current Technology: Hitting the Wall • Today’s database technology requires huge effort and massive hardware • Chart: How Performance Issues Are Typically Addressed, by Pace of Data Growth • Source: Keeping Up with Ever-Expanding Enterprise Data, by Joseph McKendrick, Research Analyst, Unisphere Research, October 2010

  7. Infobright Customer Performance Statistics • Fast query response with no tuning or indexes • Mobile Data (15MM events): 43 min with SQL Server vs. 23 seconds with Infobright • Analytic Queries: 2+ hours with MySQL vs. <10 seconds • Oracle Query Set: 10 seconds – 15 minutes vs. 0.43 – 22 seconds • BI Report: 7 hrs in Informix vs. 17 seconds • Data Load: 11 hours in MySQL ISAM vs. 11 minutes

  8. Save Time, Save Cost • Fastest time to value • Download in minutes, install in minutes • No indexes, no partitions, no projections • No complex hardware to install • Minimal administration • Self-tuning • Self-managing • Eliminate or reduce aggregate table creation • Outstanding performance • Fast query response against large data volume • Load speeds over 2TB/hour with DLP • High data compression 10:1 to 40:1+ • Economical • Low subscription cost • Less data storage • Industry-standard servers

  9. Where does Infobright fit in the database landscape? • One Size DOESN’T fit all. • Specialized Databases Deployed • Excellent at what they were designed for • More open source specialized databases than commercial • Cloud / SaaS use for specialty DBMS becomes popular • Database Virtualization • Significantly lowered DBA costs

  10. The Emerging Database Landscape

  11. Why use Infobright to deal with large volumes of machine-generated data? • EASY • TO INSTALL • TO USE • AFFORDABLE • LESS HW • LOW SW COST • FAST • FAST QUERY • FAST LOAD

  12. Technical Overview of Infobright Joon Kim Senior Sales Engineer joon.kim@infobright.com

  13. Key Components of Infobright • Column-Oriented • Smarter architecture • Load data and go • No indices or partitions to build and maintain • Knowledge Grid automatically updated as data packs are created or updated • Super-compact data footprint can leverage off-the-shelf hardware • Knowledge Grid: statistics and metadata “describing” the super-compressed data • Data Packs: data stored in manageably sized, highly compressed data packs • Data compressed using algorithms tailored to data type

  14. Infobright Architecture

  15. 1. Column Orientation • Incoming data is stored in a column-oriented layout: (1,2,3; Moe,Curly,Larry; Howard,Joe,Fine; 10000,12000,9000) • Works well with aggregate results (sum, count, avg.) • Only columns that are relevant need to be touched • Consistent performance with any database design • Allows for very efficient compression
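
  A minimal sketch of why this matters, reusing the employees table from the optimizer example later in the deck (illustrative only, not the product's internals):

    -- Only the salary column's data packs are read; name, job, city, etc. are never touched.
    SELECT AVG(salary), COUNT(*) FROM employees;

    -- Grouping adds just one more column's packs to the scan.
    SELECT job, AVG(salary) FROM employees GROUP BY job;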

  16. 2. Data Packs and Compression • Data Packs • Each data pack contains 65,536 data values • Compression is applied to each individual data pack • The compression algorithm varies depending on data type and distribution (patent-pending compression algorithms) • Compression • Results vary depending on the distribution of data among data packs • A typical overall compression ratio seen in the field is 10:1 • Some customers have seen results of 40:1 and higher • For example, 1TB of raw data compressed 10 to 1 would only require 100GB of disk capacity
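
  A rough way to check the resulting on-disk footprint is the standard MySQL catalog, shown below for the Northwind customers table from the sample script later in the deck; whether the BRIGHTHOUSE engine reports compressed sizes here is an assumption to verify on your installation:

    -- data_length is assumed to reflect the compressed size under BRIGHTHOUSE.
    SELECT table_name,
           ROUND(data_length / 1024 / 1024, 1) AS data_mb
    FROM information_schema.TABLES
    WHERE table_schema = 'Northwind' AND table_name = 'customers';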

  17. 3. The Knowledge Grid • Knowledge Grid: applies to the whole table • Knowledge Nodes: built for each Data Pack • Information about the data (basic statistics, numerical ranges, character maps) is calculated during load • Dynamic knowledge nodes are calculated during query

  18. Knowledge Grid Internals • Data Pack Nodes (DPN): a separate DPN is created for every data pack in the database to store basic statistical information • Character Maps (CMAPs): every Data Pack that contains text gets a matrix recording the occurrence of every possible ASCII character • Histograms: created for every Data Pack that contains numeric data, recording 1,024 MIN-MAX intervals • Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined, so query performance gets better as the database is used • This metadata layer is about 1% of the compressed data volume
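
  As a hedged illustration of what this metadata can do, queries like the following can often be resolved largely from DPNs and CMAPs without decompressing data packs (the employees columns come from the optimizer example on the next slides):

    -- Per-pack MIN/MAX statistics and row counts can answer this mostly from metadata.
    SELECT MIN(salary), MAX(salary), COUNT(*) FROM employees;

    -- A CMAP lets whole packs be ruled in or out for character predicates.
    SELECT COUNT(*) FROM employees WHERE city LIKE 'T%';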

  19. Optimizer / Granular Computing Engine • Query received (e.g., “How are my sales doing this year?”) • Engine iterates on the Knowledge Grid (about 1% of the compressed data volume) • Each pass eliminates Data Packs • If any Data Packs are needed to resolve the query, only those are decompressed

  20. How the Optimizer Works • Example query: SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘TORONTO’; • Find the Data Packs with salary > 50000 • Find the Data Packs that contain age < 65 • Find the Data Packs that have job = ‘Shipping’ • Find the Data Packs that have city = ‘Toronto’ • For each pack of 65,536 rows (rows 1 to 65,536, 65,537 to 131,072, 131,073 to …), the Knowledge Grid classifies it as “all values match,” “completely irrelevant,” or “suspect” • Rows in irrelevant packs are eliminated without being read; in this example only one suspect pack remains, and it is the only pack that needs to be decompressed

  21. Infobright Architected on MySQL • “The world’s most popular open source database”

  22. Sample Script (Create Table, Import, Export)

    USE Northwind;

    -- Create the table on the Infobright engine.
    DROP TABLE IF EXISTS customers;
    CREATE TABLE customers (
      CustomerID varchar(5),
      CompanyName varchar(40),
      ContactName varchar(30),
      ContactTitle varchar(30),
      Address varchar(60),
      City varchar(15),
      Region char(15),
      PostalCode char(10),
      Country char(15),
      Phone char(24),
      Fax varchar(24),
      CreditCard float(17,1),
      FederalTaxes decimal(4,2)
    ) ENGINE=BRIGHTHOUSE;

    -- Import the text file.
    SET AUTOCOMMIT=0;
    SET @bh_dataformat = 'txt_variable';
    LOAD DATA INFILE "/tmp/Input/customers.txt" INTO TABLE customers
      FIELDS TERMINATED BY ';' ENCLOSED BY 'NULL' LINES TERMINATED BY '\r\n';
    COMMIT;

    -- Export the data into BINARY format.
    SET @bh_dataformat = 'binary';
    SELECT * INTO OUTFILE "/tmp/output/customers.dat" FROM customers;

    -- Export the data into TEXT format.
    SET @bh_dataformat = 'txt_variable';
    SELECT * INTO OUTFILE "/tmp/output/customers.text"
      FIELDS TERMINATED BY ';' ENCLOSED BY 'NULL' LINES TERMINATED BY '\r\n'
    FROM customers;
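
  A quick sanity check after the load might look like this (illustrative follow-up, not part of the original script):

    -- Confirm the load and get a feel for the data distribution.
    SELECT Country, COUNT(*) AS customer_count
    FROM customers
    GROUP BY Country
    ORDER BY customer_count DESC
    LIMIT 10;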

  23. Infobright 4.0 – Additional Features Built-in intelligence for machine-generated data: Find ‘Needle in the Haystack’ faster

  24. Work with Data Even Faster • DomainExpert: Breakthrough Analytics • Intelligence to automatically optimize the database • Enables users to add intelligence into the Knowledge Grid directly, with no schema changes • Pre-defined/optimized for web data analysis • IP addresses • Email addresses • URL/URI • Can cut query time in half when using this data definition

  25. DomainExpert: Prebuilt plus DIY Options • Pattern recognition enables faster queries • Patterns are defined and stored • Complex fields are decomposed into more homogeneous parts • The database uses this information when processing a query • Users can also easily add their own data patterns • Identify strings, numerics, or constants • Financial trading example: a ticker feed “AAPL–350,354,347,349” encoded as “%s-%d,%d,%d,%d” • Will enable higher compression

  26. Get Data In Faster: DLP • Distributed Load Processor (DLP): add-on product to IEE that linearly scales load performance • Remote servers compress data and build Knowledge Grid elements on the fly… • …then the results are appended to the data server running the main Infobright database • It’s all about speed: faster load and queries, near-real-time ad-hoc analysis • Linear scalability of data load for very high performance

  27. Get Data In Faster: Hadoop • Big Data: Hadoop support via a DLP Hadoop connector • Extracts data from HDFS and loads it into Infobright at high speed • Load 100s of TBs or petabytes into Hadoop for bulk storage and batch processing • Then load TBs into Infobright for near-real-time analytics using the Hadoop connector and DLP • Use the right tool for the job • Infobright / Hadoop: a perfect complement for analyzing Big Data
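
  The connector itself is a product feature, but a hand-rolled equivalent of the same workflow (pull an extract out of HDFS, then bulk load it) might look like the sketch below; the HDFS path, the events table, and the delimiters are assumptions for illustration:

    -- Shell step (outside SQL), assuming a CSV extract already sits in HDFS:
    --   hadoop fs -cat /data/events/2012-05-28/*.csv > /tmp/events.csv
    -- Then bulk load into Infobright, reusing the load settings from the sample script:
    SET @bh_dataformat = 'txt_variable';
    LOAD DATA INFILE '/tmp/events.csv' INTO TABLE events
      FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';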

  28. Rough Query: Speed Up Data Mining by 20x • Rough Query: another Infobright breakthrough • Enables very fast iterative queries to quickly drill down into large volumes of data • “Select roughly” to instantaneously see the interval range for relevant data, using only the in-memory Knowledge Grid information • Filtering can narrow results • Need more detail? Drill down further with a rough query, or query for the exact answer • Rough Query: data mining “drill down” at RAM speed, for near-real-time ad-hoc analysis
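
  Based on the slide’s “select roughly” wording, an iterative drill-down might look like the sketch below; the exact keyword and the trades table are assumptions, so check the release notes for the supported syntax:

    -- Step 1: rough answer from the in-memory Knowledge Grid only (interval ranges, no decompression).
    SELECT ROUGHLY MIN(amount), MAX(amount)
    FROM trades
    WHERE trade_date = '2012-05-28';

    -- Step 2: once the interesting range is known, run the exact query on the narrowed slice.
    SELECT COUNT(*), AVG(amount)
    FROM trades
    WHERE trade_date = '2012-05-28' AND amount > 1000000;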

  29. The Value Infobright Delivers • High performance with much less work and lower cost

  30. Q & A

  31. Infobright Use Cases

  32. Infobright and Hadoop in Video Advertising: LiveRail • “Infobright and Hadoop are complementary technologies that help us manage large amounts of data while meeting diverse customers’ needs to analyze the performance of video advertising investments.” Andrei Dunca, CTO of LiveRail

  33. Example in Mobile Analytics: Bango

  34. Online Analytics: Yahoo!

  35. Case Study: JDSU • Annual revenues exceeded $1.3B in 2010 • 4700 employees are based in over 80 locations worldwide • Communications sector offers instruments, systems, software, services, and integrated solutions that help communications service providers, equipment manufacturers, and major communications users maintain their competitive advantage

  36. JDSU Service Assurance Solutions • Ensure high quality of experience (QoE) for wireless voice, data, messaging, and billing. • Used by many of the world’s largest network operators

  37. JDSU Project Goals • New version of the Session Trace solution that would: • Support very fast load speeds to keep up with increasing call volume and the need for near-real-time data access • Reduce the amount of storage by 5x, while also keeping a much longer data history • Reduce overall database licensing costs by 3x • Eliminate customers’ “DBA tax,” meaning the solution should require zero maintenance or tuning while enabling flexible analysis • Continue delivering the fast query response needed by Network Operations Center (NOC) personnel when troubleshooting issues, supporting up to 200 simultaneous users

  38. High Level View

  39. Session Trace Application • For deployment at Tier 1 network operators, each site will store between 6 and 45TB of data, and the total data volume will range from 700TB to 1PB.

  40. Infobright Implementation

  41. Save Time, Save Cost • Fastest time to value • Download in minutes, install in minutes • No indexes, no partitions, no projections • No complex hardware to install • Minimal administration • Self-tuning • Self-managing • Eliminate or reduce aggregate table creation • Outstanding performance • Fast query response against large data volume • Load speeds over 2TB/hour with DLP • High data compression 10:1 to 40:1+ • Economical • Low subscription cost • Less data storage • Industry-standard servers

  42. What Our Customers Say • “Using Infobright allows us to do pricing analyses that would not have been possible before.” • “With Infobright, [this customer] has access to data within minutes of transactions occurring, and can run ad-hoc queries with amazing performance.” • “Infobright offered the only solution that could handle our current data load and scale to accommodate a projected growth rate of 70 percent, without incurring prohibitive hardware and licensing costs.” • “Using Infobright allowed JDSU to meet the aggressive goals we set for our new product release: reducing storage and increasing data history retention by 5x, significantly reducing costs, and meeting the fast data load rate and query performance needed by the world’s largest network operators.”

  43. Where does Infobright fit in the database landscape? • One Size DOESN’T fit all. • Specialized Databases Deployed • Excellent at what they were designed for • More open source specialized databases than commercial • Cloud / SaaS use for specialty DBMS becomes popular • Database Virtualization • Significantly lowered DBA costs

  44. NoSQL: Unstructured Data Kings • Schema-less Designs • Extreme Transaction Rates • Massive Horizontal Scaling • Heavy Data Redundancy • Niche Players • Tame the Unstructured • Store Anything • Keep Everything Top NoSQL Offerings

  45. NoSQL: Breakout • 120+ variants: find more at nosql-databases.org

  46. What do we see with NoSQL

  47. Lest We Forget Hadoop • Scalable, fault-tolerant distributed system for data storage and processing • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage • MapReduce: fault-tolerant distributed processing • Value add • Flexible: store schema-less data and add as needed • Affordable: low cost per terabyte • Broadly adopted: Apache project with a large, active ecosystem • Proven at scale: petabyte+ implementations in production today

  48. Hadoop Data Extraction

  49. NewSQL: Operational, Relational Powerhouses • Overclock Relational Performance • Scale-Out • Scale “Smart” • New, Scalable SQL • Extreme Transaction Rates • Diverse Technologies • ACID Compliance
