Agenda - PowerPoint PPT Presentation

Presentation Transcript

Big Data Trends

What is Jumbune

Component Descriptions

Future Release Insights

Big Data Trends
  • No more single-purpose Hadoop clusters; resources are shared
  • Data Lake: ETL-ing data from many sources
  • Integrated platforms using a variety of analytical engines
  • Serving multiple business applications
Shared Cluster among Execution Engines

YARN, Mesos - Hama, Giraph, Storm, MapReduce

MapReduce, Hama, Giraph

Hadoop MapReduce




Big Data-based solution life stages (high-level view)


Business User


MapReduce Dev

Hadoop-based solution life stages (as on the ground): cyclic execution

Bad Logic?



Data Analyst

MapReduce Dev

Business User

Logic & Data Test

Monitoring Needs

Resource Utilization?



Staging Data

Bad Data?

Challenges in Analytical Solutions

1. No common platform across actors to detect root causes

2. Incremental imports may ingest bad data

3. Cluster resources are shared, so optimal utilization is key

4. Implementing models correctly in custom MapReduce on the first attempts is like hitting a bull's eye

5. Bad logic or bad data


“A catalyst to accelerate realization of analytical solutions”

Data Validation


Cluster Monitor

Job Profiler

Intersecting Solution Lifecycle Stages



Solution Development

Quality Test


Bulk & Incremental Data

Niche Features
  • In-depth, code-level analysis through a cluster-wide flow analyzer
  • Record-level data violation reports
  • No deployment on workers: an ultra-light agent is installed on the gateway node only
  • Ability to turn cluster monitoring on or off at will, which lessens resource load
  • Customizable, rack-aware cluster monitoring
  • Correlated job profiling analysis of phases, throughput and resource consumption
  • Ability to work across all Hadoop distributions
Supported Deployments

Azure, EC2

All major distributions


On Premise

MapReduce Flow Debugger

Verifies the flow of input records through the user's MapReduce implementation

Drill-down visualization helps developers quickly identify the problem

The only tool that helps developers find faults in a MapReduce implementation without any extra coding
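
The idea of verifying record flow can be illustrated with a small, self-contained sketch. This is not Jumbune's implementation: Jumbune is described as needing no extra coding, whereas this toy wraps the functions by hand. The `traced` helper, the counter names and the sample word-count job are all hypothetical, chosen only to show how per-stage in/out counts can point at the stage that drops records.

```python
from collections import Counter, defaultdict

# Hypothetical per-stage counters; the names are illustrative only.
counters = Counter()


def traced(stage, fn):
    """Wrap a generator-style map/reduce function and count calls in and records out."""
    def wrapper(*args):
        counters[f"{stage}.in"] += 1
        for out in fn(*args):
            counters[f"{stage}.out"] += 1
            yield out
    return wrapper


def mapper(line):
    # Emits (word, 1); the length filter silently drops short words.
    for word in line.split():
        if len(word) > 2:
            yield word, 1


def reducer(key, values):
    yield key, sum(values)


def run_job(lines):
    """Run a tiny in-memory map, group and reduce pipeline with tracing enabled."""
    map_fn, reduce_fn = traced("map", mapper), traced("reduce", reducer)
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(kv for key, values in groups.items() for kv in reduce_fn(key, values))


result = run_job(["to be or not to be", "big data"])
print(dict(counters))
```

With this input, eight words enter the mapper across two lines but only three pairs leave it, so the counters localize the loss to the map stage rather than the reduce stage.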

Data Validator
  • Detects inconsistencies in data through:
    • Null checks
    • Data type checks
    • Regular expression checks
  • Generic way of specifying validation rules
  • Provides a record-level report of the anomalies found
  • Currently supports HDFS as the data lake file system
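
The three check types listed above can be sketched in a few lines. This is an illustration under assumptions, not Jumbune's rule format or code: the `RULES` layout, the `validate_records` helper and the sample records are hypothetical, and the input is a plain list of delimited lines rather than a file on HDFS.

```python
import re

# Hypothetical rule set keyed by field index (illustrative, not Jumbune's format).
RULES = {
    0: {"nullable": False, "type": int, "regex": None},
    1: {"nullable": False, "type": str, "regex": r"^[A-Z]{2}\d{3}$"},
    2: {"nullable": True, "type": float, "regex": None},
}


def validate_records(lines, rules, delimiter=","):
    """Return a record-level report as (line_number, field_index, violation) tuples."""
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for idx, rule in rules.items():
            value = fields[idx] if idx < len(fields) else ""
            # Null check: an empty or missing field is only allowed if nullable.
            if value == "":
                if not rule["nullable"]:
                    violations.append((line_no, idx, "null"))
                continue
            # Data type check: the value must parse as the declared type.
            try:
                rule["type"](value)
            except ValueError:
                violations.append((line_no, idx, "data type"))
                continue
            # Regular expression check: the whole value must match the pattern.
            if rule["regex"] and not re.fullmatch(rule["regex"], value):
                violations.append((line_no, idx, "regex"))
    return violations


sample = ["1,AB123,3.5", ",AB123,", "x,ab1,2.0"]
report = validate_records(sample, RULES)
for entry in report:
    print(entry)
```

Keeping the rules as data rather than code is one way to get the generic rule specification the slide mentions: a new field or pattern only changes the rule table.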
MR Job Profiling
  • Per job, phase-wise:
    • Performance of each JVM
    • Data flow rate
    • Resource usage
  • Per job heap sites for Mapper & Reducer
  • Per job CPU cycles for Mapper & Reducer
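
As a rough illustration of the phase-wise data flow rate mentioned above, the sketch below divides the bytes handled in each phase by the phase duration. The `PhaseSample` type, the phase names and every number are invented for the example; how Jumbune actually collects and correlates these measurements is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass
class PhaseSample:
    """Hypothetical measurement for one phase of one job."""
    phase: str
    start_s: float
    end_s: float
    bytes_processed: int


def flow_rates(samples):
    """Return bytes per second for each phase; zero-length phases report 0.0."""
    rates = {}
    for s in samples:
        duration = s.end_s - s.start_s
        rates[s.phase] = s.bytes_processed / duration if duration > 0 else 0.0
    return rates


# Made-up sample values, purely for illustration.
samples = [
    PhaseSample("map", 0.0, 40.0, 400_000_000),
    PhaseSample("shuffle", 40.0, 50.0, 50_000_000),
    PhaseSample("reduce", 50.0, 75.0, 100_000_000),
]
rates = flow_rates(samples)
slowest = min(rates, key=rates.get)
print(rates, slowest)
```

Comparing the rates side by side is what makes the profile useful: the phase with the lowest throughput is the first candidate for tuning.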
Hadoop Cluster Monitoring

Data centre and rack-aware view of nodes

Dynamic, interval-based monitoring

Hadoop JMX and node resource statistics

Network latency across Hadoop nodes

Per-file, node-wise replica placement (which nodes hold replicas of a given file?)

HDFS data placement view (is HDFS balanced?)

HDFS health statistics (is HDFS corrupted?)
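
One way to answer the "is HDFS balanced?" question is to compare each DataNode's utilization with the cluster-wide utilization, which is the criterion the HDFS balancer uses with a configurable threshold. The sketch below assumes a 10 percent threshold and made-up usage figures; the `unbalanced_nodes` helper is hypothetical and is not how Jumbune is known to compute its view.

```python
def unbalanced_nodes(usage, threshold_pct=10.0):
    """Return nodes whose utilization deviates from the cluster's by more than the threshold.

    usage maps a node name to (used_bytes, capacity_bytes).
    The result maps each offending node to its deviation in percentage points.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(cap for _, cap in usage.values())
    cluster_pct = 100.0 * total_used / total_capacity
    deviations = {}
    for node, (used, cap) in usage.items():
        node_pct = 100.0 * used / cap
        if abs(node_pct - cluster_pct) > threshold_pct:
            deviations[node] = round(node_pct - cluster_pct, 1)
    return deviations


# Made-up figures: three DataNodes with equal capacity and uneven usage.
usage = {"dn1": (80, 100), "dn2": (50, 100), "dn3": (20, 100)}
print(unbalanced_nodes(usage))
```

An empty result means every node sits within the threshold of the cluster average; a positive deviation marks an over-utilized node and a negative one an under-utilized node.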

Immediate Next Release
  • 1.3.0
    • YARN compatible
    • Support for all 3 major Apache Hadoop branches: 0.23.x, 1.2.x and 2.4.x
Connect to Jumbune
  • Website
  • Contribute
  • Social
    • Follow @jumbune, use #jumbune
    • Jumbune Group:
  • Forums
    • Users:
    • Dev:
    • Issues:
  • Downloads