
Pentaho business analytics & data integration Amjad.akkawi@zaponet

Zaponet is a service integrator and development shop that provides state-of-the-art data products leveraging big data and data science technologies. We architect, design, and build big data solutions including data warehouses, user-profile systems, recommendation engines, and more.


Presentation Transcript


  1. Pentaho business analytics & data integration Amjad.akkawi@zaponet.com

  2. About Us – Zaponet data science solutions • Zaponet is a service integrator and development shop providing solutions and professional services for building state-of-the-art data products that leverage big data and data science technologies. • Zaponet architects, designs, and builds big data solutions: data warehouses, user-profile systems, recommendation engines, complex event processing, and more. • Some of our technology partners: Pentaho, Cloudera, Infobright, Vertica, Kognitio, GigaSpaces • More details: www.zaponet.com • Future meetup: Pentaho Weka for data science

  3. About Me – Amjad Akkawi • Zaponet CTO • Experience in Pentaho

  4. Agenda • Pentaho in business analytics & data integration • Pentaho BI Demo • Pentaho PDI Demo

  5. About Pentaho • Recognized leader in business analytics & data integration • Subscription-based business model • Achieved critical mass: • Over 1,200 commercial customers • Over 10,000 production deployments • Over 185 countries • Stewardship of the most important open source analytics projects • Over 160 partners globally • Industry recognition

  6. Why Customers Love Pentaho • Speed of Deployment: marketing dashboard in less than 1 day; 2 weeks time to market; 8 weeks time to market; fully rolled out within budget in 4 months • Innovation & Scalability: analyzing buying patterns of 5 million members; music files from 20,000 sources; operational reports at all 1000 retail stores; analytics on 500,000 patient records • Superior Customer Service: “… a great partner through every phase of our project”; “Pentaho support is as good as its software”; “… better functionality and more support”; “… top-notch professional support” • Total Value: “…ROI was almost immediate.”; 75% lower acquisition costs; €350K+ cost saving; less than 1 month ROI

  7. Pentaho in the Big Data Fabric Pentaho Business Analytics 3rd Party Tools Big Analytics • R • 3rd Party BI Tools • Applications Data Integration Job Orchestration Workflow Scheduling High Performance Visual IDE Data Integration Hadoop Java MapReduce, Pig Pentaho MapReduce NoSQL Databases Analytic Databases Big Data Mgmt

  8. High Level Feature/Functions (components are independent) • Dashboards: self-service interactive KPIs, metrics, and visualization, for information consumers • Reporting: ad hoc and operational reports, for business users • Analysis: self-service interactive and ad hoc analysis, for knowledge workers and business users • Data: high-performance data integration, big data, cleansing, and presentation, for power users, developers, and DBAs • Data Mining: advanced predictive analysis, for advanced power users and viewers

  9. High Level Feature/Functions (overview repeated from slide 8)

  10. Dashboards

  11. Dashboards & Interactive Dashboards

  12. Dashboards – Geo Location-Based

  13. High Level Feature/Functions (overview repeated from slide 8)

  14. Reports – Interactive, Static, Distributed

  15. Reports – Reporting Pack & House Styles

  16. Reports – Reporting Pack & House Styles

  17. High Level Feature/Functions (overview repeated from slide 8)

  18. Enhanced In-Memory Analytics • Enhanced in-memory caching for speed-of-thought visualization & analysis • More reusability of in-memory data • Fewer trips to the database/disk • Builds on existing unique extreme-scale in-memory analytics • Support for external data grids • Infinispan / JBoss Enterprise Data Grid and Memcached • Scale to caching hundreds of GBs (potentially TBs) of data in memory • Competition • Java heap or C++ memory space (a few GB at most; most BI products), or • Proprietary (hard to manage) in-memory technology (e.g. QlikView, MicroStrategy)
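
The "fewer trips to the database/disk" point on slide 18 can be illustrated with a small least-recently-used cache. This is only a sketch of the general idea, not Pentaho's implementation: the `QueryCache` class, the `fake_database` function, and the query strings are all hypothetical names made up for the example.

```python
from collections import OrderedDict


class QueryCache:
    """Tiny LRU cache keyed by query text (illustrative, not a Pentaho API)."""

    def __init__(self, fetch, capacity=2):
        self.fetch = fetch          # function that performs the slow backend call
        self.capacity = capacity    # maximum number of cached results
        self.entries = OrderedDict()
        self.backend_trips = 0      # how many times the backend was actually hit

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)   # mark as most recently used
            return self.entries[query]
        self.backend_trips += 1
        result = self.fetch(query)
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


def fake_database(query):
    # Hypothetical stand-in for a real database round trip.
    return f"rows for {query}"


cache = QueryCache(fake_database, capacity=2)
for q in ["sales by region", "sales by region", "sales by month", "sales by region"]:
    cache.get(q)

print(cache.backend_trips)  # prints 2: four requests, only two backend trips
```

An external data grid such as the ones named on the slide plays the same role at a much larger scale, holding results outside a single process so that more users can reuse them.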

  19. Analyzer – Table format

  20. Analyzer – Chart format

  21. Analyzer: Geo Location-Based Analysis

  22. High Level Feature/Functions (overview repeated from slide 8)

  23. Scenario 1 Dashboard Operational Database Report

  24. Scenario 2 Dashboard Data Mart(s) / Warehouse Metadata Report Analyzer

  25. Metadata – Schema Workbench Complex calculations and multi-cube requirements may need more modeling

  26. Scenario 3 • Sources: Structured Data, Unstructured Data • Pentaho Data Integration (PDI): source data acquisition, initial consolidation as required • BIG DATA Technology and/or Staging Area & Data Vault • Pentaho Data Integration (PDI): cleansing, transformation, change data capture, data warehouse management • Data Mart(s) / Warehouse • Metadata • Dashboard, Report, Analyzer
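
Slide 26 lists cleansing and change data capture among the PDI tasks between the staging area and the warehouse. One common way to do change data capture is to compare two snapshots keyed by primary key; the sketch below shows that technique in plain Python. It is not how PDI implements it, and the function names (`cleanse`, `capture_changes`) and the sample rows are hypothetical.

```python
def cleanse(row):
    """Trim whitespace in string fields and turn empty strings into None."""
    cleaned = {}
    for field, value in row.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[field] = value
    return cleaned


def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and classify each row."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return {"insert": inserts, "update": updates, "delete": deletes}


# Hypothetical snapshots: yesterday's warehouse rows and today's raw source rows.
yesterday = {
    1: {"name": "Ann", "city": "Paris"},
    2: {"name": "Bob", "city": "Rome"},
}
today_raw = {
    1: {"name": " Ann ", "city": "Paris"},   # only whitespace differs
    2: {"name": "Bob", "city": "Lyon"},      # city changed
    3: {"name": "Dana", "city": "  "},       # new row with a blank city
}

today = {key: cleanse(row) for key, row in today_raw.items()}
changes = capture_changes(yesterday, today)
print(changes)
```

Cleansing before comparing matters here: without it, row 1 would be reported as an update even though only stray whitespace changed.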

  27. Variations on a Theme • Sources: Structured Data, Unstructured Data, Ad-hoc Data • Pentaho Data Integration (PDI): source data acquisition, initial consolidation as required • BIG DATA Technology and/or Staging Area & Data Vault • Pentaho Data Integration (PDI): cleansing, transformation, change data capture, data warehouse management • Data Mart(s) / Warehouse • Metadata • Dashboard, Report, Analyzer • Alerting: SMS, eMail & attachments

  28. PDI Components • Enterprise Edition Data Integration Server • Execution and remote monitoring • Integrated scheduling • Enterprise Security options • Enhanced content management including revision history and locking • Remote distributed cluster based processing

  29. Kettle Conceptual Model

  30. Pentaho Data Integration Step based processing engine with instant visualization of results
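
The idea of a step-based engine, where rows stream from one step to the next and each step can be inspected on its own, can be sketched with Python generators. This is a conceptual illustration only and does not use any Kettle or PDI API; the `counted` helper, the step names, and the sample rows are hypothetical.

```python
def counted(name, rows, stats):
    """Pass rows through unchanged while counting how many each step emits."""
    stats[name] = 0
    for row in rows:
        stats[name] += 1
        yield row


# Hypothetical input rows for the pipeline.
source = [
    {"sku": "A", "qty": 2, "price": 5.0},
    {"sku": "B", "qty": 0, "price": 9.0},
    {"sku": "C", "qty": 1, "price": 4.0},
]

stats = {}

# Step 1: read the input rows.
rows = counted("input", iter(source), stats)
# Step 2: keep only rows with a positive quantity.
rows = counted("filter", (r for r in rows if r["qty"] > 0), stats)
# Step 3: add a calculated field to each remaining row.
rows = counted("calc", ({**r, "total": r["qty"] * r["price"]} for r in rows), stats)

result = list(rows)   # rows are pulled through all steps one at a time
print(result)
print(stats)          # per-step row counts
```

Because every step only sees a stream of rows, steps can be rearranged or swapped without touching their neighbours, and the per-step counts give a simple view of where rows are dropped.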

  31. Pentaho Data Integration Step based performance

  32. Pentaho Data Integration Integrated Metadata Creation

  33. Pentaho and Big Data: Forrester Wave, Enterprise Hadoop Solutions, Q1 2012 • Only vendor in strong performer category: “an impressive Hadoop integration tool” • Only business analytics vendor • Richest functionality • Most extensive integration with open source Apache Hadoop and major Hadoop distributions

  34. Expanded Insight into Big and Diverse Data • Improved support for Hadoop • Simpler deployment across Hadoop clusters • Support for the Hadoop cache • Debian RPM installer • Performance and ease-of-use enhancements for Pentaho MapReduce visual development • Support for Hadoop Security data access • New NoSQL database support • Cassandra • MongoDB • Growing the Pentaho big data community • Open sourced all big data components (Hadoop & NoSQL) • Apache License, the same as used by leading Hadoop and NoSQL distros • New big data developer resources: how-to documents, videos, walk-throughs

  35. Hadoop Data Management & Integration Accessible by any ETL developer or data scientist Pentaho MapReduce

  36. NoSQL Data Management & Integration Visual Job Orchestration Any Data Source Accessible by any ETL developer or data scientist

  37. Visual Job Orchestration Any Data Source Accessible to any ETL developer or data scientist Scheduling

  38. Pentaho Integration Options Pentaho BI Server Other Application Pentaho Custom Stuff My Application Pentaho Components

  39. Integration

  40. Q & A • NEXT … • Pentaho PDI Demo • Pentaho BI Demo

  41. “Traditional” Database Support DATA ANALYSIS DATA INTEGRATION

  42. Broadest Support for Big Data Platforms Hadoop NoSQL Analytic Databases
