Agenda

duer
Presentation Transcript


  1. Agenda
     • Big Data Trends
     • What is Jumbune (jumbune.org)
     • Component Descriptions
     • Future Release Insights

  2. Big Data Trends
     • No more single purpose Hadoop clusters – resource sharing
     • Data Lake: Data ETL-ing from many sources
     • Integrated platforms using variety of analytical engines
     • Serving multiple business applications

  3. Shared Cluster among Execution Engines
     [Timeline figure]
     • 2007: Hadoop MapReduce
     • 2011: MapReduce, Hama, Giraph
     • 2014: Yarn, Mesos - Hama, Giraph, Storm, MapReduce

  4. Big Data based solution life stages (High level view)
     [Diagram labels: Business User, Analyst, MapReduce Dev, Production]

  5. Hadoop based solution life stages (as on ground) – Cyclic execution
     [Diagram labels]
     • Actors: Business User, Data Analyst, MapReduce Dev, Devops
     • Stages: Logic & Data Test, Staging Data, Production
     • Open questions: Bad Logic? Bad Data? Resource Utilization? Monitoring Needs

  6. Challenges in Analytical Solutions
     1. No common platform across actors to detect root causes
     2. Incremental imports may ingest bad data
     3. Cluster resources are shared, so optimal utilization is key
     4. Getting a model right in custom MapReduce on the first attempts is like hitting a bull’s eye
     5. Bad logic or bad data

  7. Jumbune: “A catalyst to accelerate realization of analytical solutions”
     • Data Validation
     • Flow Analyzer
     • Cluster Monitor
     • Job Profiler

  8. Intersecting Solution Lifecycle Stages
     [Diagram labels: Solution Development, Quality Test, Devops, Bulk & Incremental Data]

  9. Niche Features
     • In-depth, code-level flow analysis across the cluster
     • Record-level data violation reports
     • No deployment on workers: ultra-light agent installation on the gateway node only
     • Ability to turn cluster monitoring on/off at will, which lessens resource load
     • Customizable rack-aware cluster monitoring
     • Correlated job profiling analysis of phases, throughput and resource consumption
     • Ability to work across all Hadoop distributions

  10. Components - Recommended Environments

  11. Supported Deployments
      [Diagram: Jumbune]
      • Azure, EC2
      • On Premise
      • All major distributions

  12. MapReduce Flow Debugger
      • Verifies the flow of input records through the user’s MapReduce implementation
      • Drill-down visualization helps the developer quickly identify the problem
      • Only tool to assist developers in figuring out MapReduce implementation faults without any extra coding
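
The idea of verifying record flow can be illustrated with a minimal sketch. This is not Jumbune's implementation (the slides do not describe its internals); it is a toy in-memory MapReduce, with hypothetical names such as `run_with_flow_counters`, that counts records entering and leaving each phase so a developer can see where records disappear.

```python
from collections import Counter, defaultdict


def run_with_flow_counters(records, mapper, reducer):
    """Run a toy in-memory MapReduce and count records at each phase boundary.

    Hypothetical helper for illustration only; not part of Jumbune.
    """
    counters = Counter()
    grouped = defaultdict(list)

    # Map phase: count every record in and every key/value pair out.
    for record in records:
        counters["map_input"] += 1
        for key, value in mapper(record):
            counters["map_output"] += 1
            grouped[key].append(value)

    # Reduce phase: count distinct keys in and results out.
    results = []
    for key in sorted(grouped):
        counters["reduce_input_keys"] += 1
        for out in reducer(key, grouped[key]):
            counters["reduce_output"] += 1
            results.append(out)
    return results, counters


def word_mapper(line):
    # The isalpha() filter silently drops tokens such as "2014";
    # the counters below make that loss visible.
    for word in line.split():
        if word.isalpha():
            yield word.lower(), 1


def sum_reducer(key, values):
    yield key, sum(values)


if __name__ == "__main__":
    results, counters = run_with_flow_counters(
        ["big data", "data 2014", ""], word_mapper, sum_reducer
    )
    print(results)
    print(dict(counters))
```

Comparing `map_input` with `map_output` here shows that one token never left the mapper, which is the kind of fault a flow view is meant to surface.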

  13. Data Validator
      • Detects inconsistencies in data through:
          • Null checks
          • Data type checks
          • Regular expression checks
      • Generic way of specifying validation rules
      • Provides a record-level report of the anomalies found
      • Currently supports HDFS as the lake file system
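
The three kinds of checks and the record-level report can be sketched as follows. This is an illustrative sketch only, assuming delimited text records; the `Rule` and `Violation` types and the `validate` function are hypothetical names and do not reflect Jumbune's actual rule format or API.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Optional


class Rule(NamedTuple):
    """Validation rule for one field position (hypothetical format)."""
    field: int                                        # zero-based field index
    not_null: bool = False                            # null check
    dtype: Optional[Callable[[str], object]] = None   # data type check, e.g. int
    pattern: Optional[str] = None                     # regular expression check


class Violation(NamedTuple):
    line_no: int
    field: int
    check: str
    value: str


def validate(lines: Iterable[str], rules: List[Rule], sep: str = ",") -> List[Violation]:
    """Return one Violation per failed check, identified by record line number."""
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for rule in rules:
            value = fields[rule.field] if rule.field < len(fields) else ""
            if value == "":
                if rule.not_null:
                    violations.append(Violation(line_no, rule.field, "null", value))
                continue  # nothing further to check on an empty field
            if rule.dtype is not None:
                try:
                    rule.dtype(value)
                except ValueError:
                    violations.append(Violation(line_no, rule.field, "data_type", value))
                    continue
            if rule.pattern is not None and not re.fullmatch(rule.pattern, value):
                violations.append(Violation(line_no, rule.field, "regex", value))
    return violations


if __name__ == "__main__":
    sample = ["1,alice,a@x.org", ",bob,b@x.org", "x3,carol,not-an-email"]
    rules = [
        Rule(field=0, not_null=True, dtype=int),
        Rule(field=2, pattern=r"[^@]+@[^@]+\.[a-z]+"),
    ]
    for violation in validate(sample, rules):
        print(violation)
```

Keeping the rules as plain data, separate from the checking loop, is one way to get the "generic way of specifying validation rules" the slide mentions.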

  14. MR Job Profiling
      • Per job, phase-wise:
          • Performance for each JVM
          • Data flow rate
          • Resource usage
      • Per job heap sites for Mapper & Reducer
      • Per job CPU cycles for Mapper & Reducer

  15. Hadoop Cluster Monitoring
      • Data centre & rack aware nodes view
      • Dynamic interval based monitoring
      • Hadoop JMX, node resource statistics
      • Network latency across Hadoop nodes
      • Per file, node-wise replica placement (which nodes have replicas of a given file?)
      • HDFS data placement view (HDFS balanced?)
      • HDFS health statistics (HDFS corrupted?)
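
The "HDFS balanced?" question can be made concrete with a small sketch. It assumes per-node used and capacity figures have already been collected (how Jumbune gathers them is not described in the slides) and applies the same idea as the HDFS balancer's threshold: a node counts as unbalanced when its utilization differs from the cluster-wide utilization by more than a given percentage. The function name and input shape are hypothetical.

```python
def unbalanced_nodes(usage, threshold_pct=10.0):
    """Flag nodes whose disk utilization deviates from the cluster average.

    usage: dict mapping node name -> (used_bytes, capacity_bytes).
    Returns (cluster utilization in percent, {node: deviation in percent}).
    Hypothetical helper for illustration; not Jumbune's implementation.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(capacity for _, capacity in usage.values())
    cluster_pct = 100.0 * total_used / total_capacity

    report = {}
    for node, (used, capacity) in usage.items():
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold_pct:
            report[node] = round(deviation, 1)
    return cluster_pct, report


if __name__ == "__main__":
    sample = {"dn1": (80, 100), "dn2": (50, 100), "dn3": (20, 100)}
    cluster_pct, report = unbalanced_nodes(sample)
    print(cluster_pct, report)
```

A positive deviation marks an over-utilized node and a negative one an under-utilized node, which is enough to drive a simple per-node placement view.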

  16. Immediate next release: 1.3.0
      • Yarn compatible
      • Support for all 3 major Apache Hadoop branches – 0.23.x, 1.2.x, and 2.4.x

  17. Connect to Jumbune
      • Website
          • http://jumbune.org
      • Contribute
          • http://github.com/impetus-opensource/jumbune
          • http://jumbune.org/jira/JUM
      • Social
          • Follow @jumbune, use #jumbune
          • Jumbune Group: http://linkd.in/1mUmcYm
      • Forums
          • Users: users-subscribe@collaborate.jumbune.org
          • Dev: dev-subscribe@collaborate.jumbune.org
          • Issues: issues-subscribe@collaborate.jumbune.org
      • Downloads
          • http://jumbune.org
          • https://bintray.com/jumbune/downloads/jumbune
