

Apache Tez: Accelerating Hadoop Data Processing

Bikas Saha

@bikassaha



Tez – Introduction

  • Distributed execution framework targeted towards data-processing applications.

  • Based on expressing a computation as a dataflow graph.

  • Highly customizable to meet a broad spectrum of use cases.

  • Built on top of YARN – the resource management framework for Hadoop.

  • Open source Apache project and Apache licensed.


Tez – Hadoop 1 → Hadoop 2

Hadoop 1 – Monolithic

  • Resource Management – MR

  • Execution Engine – MR

Hadoop 2 – Layered

  • Resource Management – YARN

  • Engines – Hive, Pig, Cascading, Your App!


Tez – Empowering Applications

  • Tez solves the hard problems of running on a distributed Hadoop environment

  • Apps can focus on solving their domain-specific problems

  • This separation of concerns is what makes Tez a platform for a variety of applications

The application provides:

  • Custom application logic

  • Custom data format

  • Custom data transfer technology

Tez provides:

  • Distributed parallel execution

  • Negotiating resources from the Hadoop framework

  • Fault tolerance and recovery

  • Horizontal scalability

  • Resource elasticity

  • Shared library of ready-to-use components

  • Built-in performance optimizations

  • Security


Tez – End User Benefits

  • Better performance of applications

    • Built-in + application-defined optimizations

  • Better predictability of results

    • Minimization of overheads and queuing delays

  • Better utilization of compute capacity

    • Efficient use of allocated resources

  • Reduced load on the distributed filesystem (HDFS)

    • Reduce unnecessary replicated writes

  • Reduced network usage

    • Better locality and data transfer using new data patterns

  • Higher application developer productivity

    • Focus on application business logic rather than Hadoop internals



Tez – Design considerations

Don’t solve problems that have already been solved. Or else you will have to solve them again!

  • Leverage discrete task based compute model for elasticity, scalability and fault tolerance

  • Leverage several man years of work in Hadoop Map-Reduce data shuffling operations

  • Leverage proven resource sharing and multi-tenancy model for Hadoop and YARN

  • Leverage built-in security mechanisms in Hadoop for privacy and isolation

Look to the Future with an eye on the Past



Tez – Problems that it addresses

  • Expressing the computation

    • Direct and elegant representation of the data processing flow

    • Interfacing with application code and new technologies

  • Performance

    • Late Binding: Make decisions as late as possible, using real data observed at runtime

    • Leverage the resources of the cluster efficiently

    • Just work out of the box!

    • Customizable to let applications tailor the job to meet their specific requirements

  • Operation simplicity

    • Painless to operate, experiment and upgrade



Tez – Simplifying Operations

  • No deployments to do. No side effects. Easy and safe to try it out!

  • Tez is a completely client side application.

  • Simply upload the Tez libraries to any accessible FileSystem and point the local Tez configuration at them.

  • Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.

  • Leverages YARN local resources.

[Diagram: two client machines, each with a TezClient, stage different versions (Tez Lib 1, Tez Lib 2) on HDFS; NodeManagers launch TezTasks with whichever library version the client specified]



Tez – Expressing the computation

Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs).

  • Vertices in the graph represent data transformations

  • Edges represent data movement from producers to consumers

[Diagram: a distributed sort as a DAG — a Preprocessor Stage (Task-1, Task-2) sends Samples to a Sampler, which sends Ranges to a Partition Stage (Task-1, Task-2), which feeds an Aggregate Stage (Task-1, Task-2)]



Tez – Expressing the computation

Tez provides the following APIs to define the processing

  • DAG API

    • Defines the structure of the data processing and the relationship between producers and consumers

    • Enables definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime

    • This is how the connection of tasks in the job gets specified

  • Runtime API

    • Defines the interfaces using which the framework and app code interact with each other

    • App code transforms data and moves it between tasks

    • This is how we specify what actually executes in each task on the cluster nodes



Tez – DAG API

Defines the global processing flow

// Define DAG
DAG dag = DAG.create();

// Define Vertices
Vertex Map1 = Vertex.create(Processor.class);
Vertex Reduce1 = Vertex.create(Processor.class);

// Define Edge
Edge edge = Edge.create(Map1, Reduce1, SCATTER_GATHER, PERSISTED, SEQUENTIAL,
                        Output.class, Input.class);

// Connect them
dag.addVertex(Map1).addVertex(Reduce1).addEdge(edge)…

[Diagram: Map1 and Map2 feed Reduce1 and Reduce2 over Scatter-Gather edges; Reduce1 and Reduce2 feed a Join vertex]



Tez – DAG API

Edge properties define the connection between producer and consumer tasks in the DAG

Data movement – Defines routing of data between tasks

  • One-To-One : Data from the ith producer task routes to the ith consumer task.

  • Broadcast : Data from a producer task routes to all consumer tasks.

  • Scatter-Gather : Producer tasks scatter data into shards and consumer tasks gather the data. The ith shard from all producer tasks routes to the ith consumer task.
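The three movement types can be sketched as routing functions (a conceptual sketch in Python, not the Tez API): given consumer task i and the number of producer tasks, each returns the (producer, shard) pairs that consumer reads.

```python
# Conceptual sketch (not Tez code): which producer outputs reach
# consumer task `i` under each edge data-movement type.

def one_to_one(i, producers):
    # Data from the i-th producer routes to the i-th consumer.
    return [(i, 0)]  # (producer index, shard index)

def broadcast(i, producers):
    # Data from every producer routes to every consumer.
    return [(p, 0) for p in range(producers)]

def scatter_gather(i, producers):
    # Producers scatter output into shards; consumer i gathers
    # the i-th shard from every producer.
    return [(p, i) for p in range(producers)]
```

For example, with 3 producers, consumer 1 of a scatter-gather edge reads shard 1 from each of them.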



Tez – Logical DAG expansion at Runtime

[Diagram: the logical Map1/Map2 → Reduce1/Reduce2 → Join DAG expanded into its parallel tasks at runtime]



Tez – Runtime API

Flexible Inputs-Processor-Outputs Model

  • Thin API layer to wrap around arbitrary application code

  • Compose inputs, processor and outputs to execute arbitrary processing

  • Applications decide logical data format and data transfer technology

  • Customize for performance

  • Built-in implementations for Hadoop 2.0 data services – HDFS and YARN ShuffleService. Built on the same API. Your implementations are as first class as ours!



Tez – Task Composition

[Diagram: a logical DAG where V-A feeds V-B (EdgeAB) and V-C (EdgeAC); each vertex expands into tasks that compose a Processor with Inputs and Outputs — Task A runs Processor-A writing Output-1 and Output-3, Task B runs Processor-B reading Input-2, Task C runs Processor-C reading Input-4]

V-A = { Processor-A.class }
V-B = { Processor-B.class }
V-C = { Processor-C.class }
EdgeAB = { V-A, V-B, Output-1.class, Input-2.class }
EdgeAC = { V-A, V-C, Output-3.class, Input-4.class }



Tez – Library of Inputs and Outputs

[Diagram: a classical 'Map' = HDFS Input → Map Processor → Sorted Output; a classical 'Reduce' = Shuffle Input → Reduce Processor → HDFS Output; an intermediate 'Reduce' for Map-Reduce-Reduce = Shuffle Input → Reduce Processor → Sorted Output]

  • What is built in?

    • Hadoop InputFormat/OutputFormat

    • SortedGroupedPartitioned Key-Value Input/Output

    • UnsortedGroupedPartitioned Key-Value Input/Output

    • Key-Value Input/Output



Tez – Composable Task Model

[Diagram: the composable task model lets applications Adopt, Evolve and Optimize by mixing interchangeable parts — HDFS, Remote File Server, Native DB or RDMA inputs; Hive or custom processors; and HDFS, Amazon S3, Kafka Pub-Sub or Local Disk outputs]



Tez – Performance

  • Benefits of expressing the data processing as a DAG

    • Reducing overheads and queuing effects

    • Gives system the global picture for better planning

  • Efficient use of resources

    • Re-use resources to maximize utilization

    • Pre-launch, pre-warm and cache

    • Locality & resource aware scheduling

  • Support for application defined DAG modifications at runtime for optimized execution

    • Change task concurrency

    • Change task scheduling

    • Change DAG edges

    • Change DAG vertices



Tez – Benefits of DAG execution

Faster Execution and Higher Predictability

  • Eliminate replicated write barrier between successive computations.

  • Eliminate job launch overhead of workflow jobs.

  • Eliminate extra stage of map reads in every workflow job.

  • Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.

  • Better locality because the engine has the global picture
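The eliminations above can be put into a back-of-envelope model (illustrative Python, not a benchmark; the cost accounting is an assumption): a chain of MapReduce jobs pays a launch per job and a replicated HDFS write between jobs, while one Tez DAG pays a single launch and streams intermediate data.

```python
# Toy cost model: an N-stage pipeline as chained MR jobs vs one Tez DAG.
# Assumption: each MR job covers at most two stages (one map, one reduce).

def mr_chain_costs(stages):
    jobs = stages // 2 + stages % 2
    return {"job_launches": jobs, "replicated_writes": jobs - 1}

def tez_dag_costs(stages):
    # One DAG: launch once, no replicated write barriers between stages.
    return {"job_launches": 1, "replicated_writes": 0}
```

For a 4-stage pipeline the MR chain pays two launches and one replicated intermediate write; the Tez DAG pays one launch and none.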

[Chart: Pig/Hive on Tez vs Pig/Hive on MR]



Tez – Container Re-Use

  • Reuse YARN containers/JVMs to launch new tasks

  • Reduce scheduling and launching delays

  • Shared in-memory data across tasks

  • JVM JIT friendly execution

[Diagram: the Tez Application Master starts TezTask1 in a YARN container/JVM; when that task is done, TezTask2 starts in the same JVM and can use Shared Objects left in memory]
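The effect of re-use can be sketched with a toy model (Python; not Tez code — the pool size and least-loaded policy here are illustrative): only the first task per container pays the JVM launch cost.

```python
# Toy model of container re-use: tasks run in already-launched JVMs,
# so only the first task per container pays the launch delay.

def run_tasks(tasks, pool_size):
    launches = 0
    containers = []  # each entry = number of tasks this JVM has run
    for _ in range(tasks):
        if len(containers) < pool_size:
            containers.append(1)   # cold start: launch a new JVM
            launches += 1
        else:
            # re-use the least-loaded warm container
            idx = containers.index(min(containers))
            containers[idx] += 1
    return launches, containers

launches, loads = run_tasks(tasks=10, pool_size=3)
# 3 JVM launches serve all 10 tasks
```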



Tez – Sessions

  • Standard concepts of pre-launch and pre-warm applied

  • Key for interactive queries

  • Represents a connection between the user and the cluster

  • Multiple DAGs executed in the same session

  • Containers re-used across queries

  • Takes care of data locality and releasing resources when idle

[Diagram: the Client starts a session and submits DAGs to the Application Master, whose Task Scheduler draws on a Container Pool of pre-warmed JVMs and a Shared Object Registry]



Tez – Re-Use in Action

[Chart: task execution timeline — containers on one axis, time on the other, with each container re-used for many tasks]



Tez – Customizable Core Engine

  • Vertex Manager

    • Determines task parallelism

    • Determines when tasks in a vertex can start

  • DAG Scheduler

    • Determines the priority of tasks

  • Task Scheduler

    • Allocates containers from YARN and assigns them to tasks

[Diagram: the Vertex Manager starts each vertex and its tasks; each vertex gets its priority from the DAG Scheduler and its containers from the Task Scheduler]



Tez – Event Based Control Plane

  • Events used to communicate between the tasks and between task and framework

  • Data Movement Event used by producer task to inform the consumer task about data location, size etc.

  • Input Error event sent by task to the engine to inform about errors in reading input. The engine then takes action by re-generating the input

  • Other events to send task completion notification, data statistics and other control plane information

[Diagram: map task outputs send Data Events through the AM Router to the consuming reduce task inputs across a Scatter-Gather edge; a reduce task input sends an Error Event back when it fails to read]



Tez – Automatic Reduce Parallelism

Event Model

Map tasks send data statistics events to the Reduce Vertex Manager.

Vertex Manager

Pluggable application logic that understands the data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism

[Diagram: inside the App Master, the Map Vertex sends Data Size Statistics to the Reduce Vertex Manager, which tells the Vertex State Machine to Set Parallelism — cancelling surplus reduce tasks and re-routing their inputs]
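The parallelism decision can be sketched as follows (illustrative Python, not Tez code; `desired_bytes_per_reducer` is a hypothetical knob, not a real Tez configuration name): size the reduce vertex from observed map output, never exceeding the configured count.

```python
import math

# Sketch of a vertex manager right-sizing reduce parallelism
# from map output statistics (numbers are illustrative).

def pick_reduce_parallelism(map_output_bytes, desired_bytes_per_reducer,
                            configured_reducers):
    # Only shrink: cancel surplus reducers, never exceed the configured count.
    needed = max(1, math.ceil(sum(map_output_bytes) / desired_bytes_per_reducer))
    return min(configured_reducers, needed)
```

With 600 units of map output and a target of 256 per reducer, 10 configured reducers shrink to 3.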



Tez – Reduce Slow Start/Pre-launch

Event Model

Map completion events sent to the Reduce Vertex Manager.

Vertex Manager

Pluggable application logic that understands the data size. Advises the vertex controller to launch the reducers before all maps have completed so that shuffle can start.

[Diagram: inside the App Master, Task Completed events from the Map Vertex reach the Reduce Vertex Manager, which tells the Vertex State Machine to start tasks on the Reduce Vertex before all maps have finished]
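The launch decision can be sketched as a completion-fraction threshold (illustrative Python, not Tez code; the 0.8 threshold is an assumed example value): start reducers once enough maps have finished that shuffle can usefully overlap the remaining maps.

```python
# Sketch of reduce slow-start: launch reducers once a fraction of
# maps has completed, so shuffle overlaps the remaining map work.

def should_start_reducers(maps_done, maps_total, threshold=0.8):
    return maps_done / maps_total >= threshold
```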



Tez – Theory to Practice

  • Performance

  • Scalability



Tez – Hive TPC-DS Scale 200GB latency



Tez – Pig performance gains

  • Demonstrates performance gains from a basic translation to a Tez DAG

  • Deeper integration in the works for further improvements



Tez – Observations on Performance

  • Number of stages in the DAG

    • The more stages in the DAG, the better Tez performs relative to MR.

  • Cluster/queue capacity

    • The more congested a queue is, the better Tez performs relative to MR, thanks to container reuse.

  • Size of intermediate output

    • The larger the intermediate output, the better Tez performs relative to MR, due to reduced HDFS usage.

  • Size of data in the job

    • For smaller data and more stages, Tez performs better relative to MR, because launch overhead is a larger fraction of total time for small jobs.

  • Offload work to the cluster

    • Move as much work as possible to the cluster by modelling it via the job DAG. Exploit the parallelism and resources of the cluster, e.g. MR split calculation.

  • Vertex caching

    • The more re-computation can be avoided, the better the performance.



Tez – Data at scale

Hive TPC-DS Scale 10TB



Tez – DAG definition at scale

Hive : TPC-DS Query 64 Logical DAG



Tez – Container Reuse at Scale

78 vertices + 8374 tasks on 50 containers (TPC-DS Query 4)



Tez – Real World Use Cases for the API



Tez – Broadcast Edge

SELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_hand

FROM (select avg(ss_sold_price) as avg_price, ss_item_sk, ss_quantity from store_sales

group by ss_item_sk, ss_quantity) ss

JOIN inventory inv

ON (inv.inv_item_sk = ss.ss_item_sk);

Hive : Broadcast Join

[Diagram: two plans. With the Tez broadcast edge — map tasks scan Store Sales, reducers group by and aggregate to reduce the size of this input, and the result is broadcast to the map tasks that scan Inventory and join, writing to HDFS. Without it — the aggregate goes to HDFS first, then Inventory and the aggregated Store Sales output are scanned and shuffle-joined through a further reduce stage.]



f = LOAD 'foo' AS (x, y, z);

g1 = GROUP f BY y;

g2 = GROUP f BY z;

j = JOIN g1 BY group, g2 BY group;

Tez – Multiple Outputs

Pig : Split & Group-by

[Diagram: two plans. MR — foo is loaded twice and grouped by y and z in separate jobs, with g1 and g2 re-loaded from HDFS for the join. Tez — one load of foo is split and multiplexed/de-multiplexed to both group-bys as multiple outputs, and the join reducer follows in the same DAG.]



Tez – Multiple Outputs

Hive : Multi-insert queries

FROM (SELECT * FROM store_sales, date_dim WHERE ss_sold_date_sk = d_date_sk and d_year = 2000)

INSERT INTO TABLE t1 SELECT distinct ss_item_sk

INSERT INTO TABLE t2 SELECT distinct ss_customer_sk;

[Diagram: two plans. MR — a map join of date_dim with store_sales, the join materialized on HDFS, then two separate MR jobs compute the two distincts. Tez — the broadcast join (scan date_dim, join store sales) feeds the distincts for customers and items as multiple outputs of one DAG.]



l = LOAD 'left' AS (x, y);

r = LOAD 'right' AS (x, z);

j = JOIN l BY x, r BY x USING 'skewed';

Tez – One to One Edge

Pig : Skewed Join

[Diagram: two plans. MR — sample L, aggregate, stage the sample map on the distributed cache, then partition L and R and join. Tez — a Load & Sample vertex feeds an Aggregate vertex; the sample map is broadcast to the partitioners, the sampled input passes through to Partition L via a 1-1 edge, and Partition L and Partition R feed the Join.]



Tez – Custom Edge

Hive : Dynamically Partitioned Hash Join

SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand

FROM store_sales ss

JOIN inventory inv

ON (inv.inv_item_sk = ss.ss_item_sk);

[Diagram: two plans. Baseline — the Inventory scan runs as a single local map task, and the Store Sales scan-and-join map tasks read the Inventory hash table as a side file. With the custom edge — the Inventory scan runs on the cluster, potentially with more than one mapper, and the edge routes its outputs to the correct mappers of the next stage, whose custom vertex reads both inputs with no side-file reads.]



Tez – Bringing it all together

Tez Session populates container pool

Dimension table calculation and HDFS split generation in parallel

Dimension tables broadcasted to Hive MapJoin tasks

Final Reducer pre-launched and fetches completed inputs

TPCDS – Query-27 with Hive on Tez

Architecting the Future of Big Data



Tez – Bridging the Data Spectrum

[Diagram: a typical pattern in a TPC-DS query — a Fact Table is joined to Dimension Tables, using broadcast joins for small data sets and a shuffle join for larger ones, producing successive result tables. Based on data size, the query optimizer can run either plan as a single Tez job.]



Tez – Current status

  • Apache Project

    • Growing community of contributors and users

    • Latest release is 0.5.0

  • Focus on stability

    • Testing and quality are highest priority

    • Code ready and deployed on multi-node environments at scale

  • Support for a vast topology of DAGs

    • Already functionally equivalent to Map Reduce. Existing Map Reduce jobs can be executed on Tez with few or no changes

    • Rapid adoption by important open source projects and by vendors



Tez – Developer Release

  • API stability – Intuitive and clean APIs that will be supported and be backwards compatible with future releases

  • Local Mode Execution – Allows the application to run entirely in the same JVM without any need for a cluster. Enables easy debugging using IDEs on the developer machine

  • Performance Debugging – Adds swim-lanes analysis tool to visualize job execution and help identify performance bottlenecks

  • Documentation and tutorials to learn the APIs



Tez – Adoption

  • Apache Hive

    • Hadoop standard for declarative access via SQL-like interface (Hive 13)

  • Apache Pig

    • Hadoop standard for procedural scripting and pipeline processing (Pig 14)

  • Cascading

    • Developer friendly Java API and SDK

    • Scalding (Scala API on Cascading) (3.0)

  • Commercial Vendors

    • ETL : Use Tez instead of MR or custom pipelines

    • Analytics Vendors : Use Tez as a target platform for scaling parallel analytical tools to large data-sets



Tez – Adoption Path

Pre-requisite : Hadoop 2 with YARN

Tez has zero deployment pain. No side effects or traces left behind on your cluster. Low risk and low effort to try out.

  • Using Hive, Pig, Cascading, Scalding

    • Try them with Tez as execution engine

  • Already have MapReduce based pipeline

    • Use configuration to change MapReduce to run on Tez by setting ‘mapreduce.framework.name’ to ‘yarn-tez’ in mapred-site.xml

    • Consolidate MR workflow into MR-DAG on Tez

    • Change MR-DAG to use more efficient Tez constructs

  • Have custom pipeline

    • Wrap custom code/scripts into Tez inputs-processor-outputs

    • Translate your custom pipeline topology into Tez DAG

    • Change custom topology to use more efficient Tez constructs
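For the MapReduce migration step above, the configuration change can be expressed as a mapred-site.xml fragment (the property name and value are as stated above; the surrounding XML shape is the standard Hadoop configuration format):

```xml
<!-- mapred-site.xml: run existing MapReduce jobs on Tez -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```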



Tez – Roadmap

  • Richer DAG support

    • Addition of vertices at runtime

    • Shared edges for shared outputs

    • Enhance Input/Output library

  • Performance optimizations

    • Improve support for high concurrency

    • Improve locality aware scheduling

    • Add framework level data statistics

    • HDFS memory storage integration

  • Usability

    • Stability and testability

    • UI integration with YARN

    • Tools for performance analysis and debugging



Tez – Community

  • Early adopters and code contributors welcome

    • Adopters to drive more scenarios. Contributors to make them happen.

  • Tez meetup for developers and users

    • http://www.meetup.com/Apache-Tez-User-Group

  • Technical blog series

    • http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing

  • Useful links

    • Work tracking: https://issues.apache.org/jira/browse/TEZ

    • Code: https://github.com/apache/tez

    • Developer list: [email protected]
    • User list: [email protected]
    • Issues list: [email protected]



Tez – Takeaways

  • Distributed execution framework that models processing as dataflow graphs

  • Customizable execution architecture designed for extensibility and user defined performance optimizations

  • Works out of the box with the platform figuring out the hard stuff

  • Zero install – Safe and Easy to Evaluate and Experiment

  • Addresses common needs of a broad spectrum of applications and usage patterns

  • Open source Apache project – your use-cases and code are welcome

  • It works and is already being used by Apache Hive and Pig



Tez

Thanks for your time and attention!

Video with Deep Dive on Tez

http://goo.gl/BL67o7

https://hortonworks.webex.com/hortonworks/lsr.php?RCID=b4382d626867fe57de70b218c74c03e4 – WordCount code walk through with 0.5 APIs. Debugging in local mode. Execution and profiling in cluster mode. (Starts at ~27 min. Ends at ~37 min)

Questions?

@bikassaha

