
Spark Streaming Preview

Fault-Tolerant Stream Processing at Scale

Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica

UC BERKELEY


Motivation

  • Many important applications need to process large data streams arriving in real time

    • User activity statistics (e.g. Facebook’s Puma)

    • Spam detection

    • Traffic estimation

    • Network intrusion detection

  • Our target: large-scale apps that need to run on tens to hundreds of nodes with O(1 sec) latency


System Goals

  • Simple programming interface

  • Automatic fault recovery (including state)

  • Automatic straggler recovery

  • Integration with batch & ad-hoc queries (want one API for all your data analysis)


Traditional Streaming Systems

  • “Record-at-a-time” processing model

    • Each node has mutable state

    • Event-driven API: for each record, update state and send out new records

[Diagram: each node holds mutable state; input records are pushed through nodes 1-3, with each node updating its state and sending new records downstream]


Challenges with Traditional Systems

  • Fault tolerance

    • Either replicate the whole system (costly) or use upstream backup (slow to recover)

  • Stragglers (typically not handled)

  • Consistency (few guarantees across nodes)

  • Hard to unify with batch processing


Our Model: “Discretized Streams”

  • Run each streaming computation as a series of very small, deterministic batch jobs

    • E.g. a MapReduce every second to count tweets

  • Keep state in memory across jobs

    • New Spark operators allow “stateful” processing

  • Recover from faults/stragglers in the same way as MapReduce (by rerunning tasks in parallel)
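The model above can be sketched in plain Scala (no Spark; names and data are illustrative): the stream is treated as a series of small batches, and each interval runs a deterministic batch job that folds new records into an immutable state dataset.

```scala
// Plain-Scala sketch of the discretized-stream model (not Spark's API):
// each interval's batch is folded into an immutable running-count state.
object DStreamSketch {
  // Hypothetical per-interval input batches (e.g. one batch per second)
  val batches: Seq[Seq[String]] = Seq(
    Seq("a", "b", "a"),
    Seq("b", "c"),
    Seq("a")
  )

  // State after each interval. Because each state is a pure function of
  // (previous state, batch), a lost state can be deterministically
  // recomputed by re-running the same small jobs.
  val states: Seq[Map[String, Int]] =
    batches.scanLeft(Map.empty[String, Int]) { (state, batch) =>
      batch.foldLeft(state)((s, r) => s.updated(r, s.getOrElse(r, 0) + 1))
    }.tail
}
```

The key property is determinism: re-running the same jobs over the same batches always rebuilds the same state.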


Discretized Streams in Action

[Diagram: for two input streams, at each interval (t = 1, t = 2, …) the input (an immutable dataset, stored reliably) is transformed by a batch operation into an immutable output/state dataset, stored in memory as a Spark RDD]


Example: View Count

  • Keep a running count of views to each webpage

    views = readStream("http:...", "1s")

    ones = views.map(ev => (ev.url, 1))

    counts = ones.runningReduce(_ + _)

[Diagram: for t = 1, t = 2, …, views → map → ones → reduce → counts; each box is a dataset and each subdivision a partition]


Fault Recovery

  • Checkpoint state datasets periodically

  • If a node fails or straggles, rebuild its data in parallel on other nodes using the dependency graph

[Diagram: a lost partition of the output dataset is rebuilt via its map dependency on the input dataset]

Fast recovery without the cost of full replication


How Fast Can It Go?

  • Currently handles 4 GB/s of data (42 million records/s) on 100 nodes at sub-second latency

  • Recovers from failures/stragglers within 1 sec


Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


D-Streams

  • A discretized stream is a sequence of immutable, partitioned datasets

    • Specifically, each dataset is an RDD (resilient distributed dataset), the storage abstraction in Spark

    • Each RDD remembers how it was created, and can recover if any part of the data is lost
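The "remembers how it was created" idea can be illustrated with a toy lineage graph in plain Scala (this is a sketch of the concept, not Spark's internal API): each dataset records its parent and the transformation that produced it, so lost data is recomputed from the parent rather than restored from a replica.

```scala
// Toy lineage graph: a dataset can always be re-derived from its parent.
object LineageSketch {
  sealed trait Dataset { def compute: Vector[Int] }

  case class Input(data: Vector[Int]) extends Dataset {
    def compute: Vector[Int] = data                   // base input, stored reliably
  }

  case class MapNode(parent: Dataset, f: Int => Int) extends Dataset {
    def compute: Vector[Int] = parent.compute.map(f)  // recomputed via lineage on loss
  }

  // If the derived data is lost, compute rebuilds it from the lineage chain.
  val doubled: Dataset = MapNode(Input(Vector(1, 2, 3)), _ * 2)
}
```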


D-Streams

  • D-Streams can be created…

    • either from live streaming data

    • or by transforming other D-streams

  • Programming with D-Streams is very similar to programming with RDDs in Spark


D-Stream Operators

  • Transformations

    • Build new streams from existing streams

    • Include existing Spark operators, which act on each interval in isolation, plus new “stateful” operators

  • Output operators

    • Send data to outside world (save results to external storage, print to screen, etc)


Example 1

Count the words received every second

words = readStream("http://...", Seconds(1))

counts = words.count()

[Diagram: count is a transformation from the words D-Stream to the counts D-Stream; for each interval (time = 0-1, 1-2, 2-3), a count job turns that interval's words RDD into a counts RDD]


Demo

  • Setup

    • 10 EC2 m1.xlarge instances

    • Each instance receiving a stream of sentences at a rate of 1 MB/s, for a total of 10 MB/s

  • Spark Streaming receives the sentences and processes them


Example 2

Count frequency of words received every second

words = readStream("http://...", Seconds(1))

ones = words.map(w => (w, 1))

freqs = ones.reduceByKey(_ + _)

[Diagram: w => (w, 1) is a Scala function literal; for each interval (time = 0-1, 1-2, 2-3), words → map → ones → reduce → freqs]


Demo


Example 3

Count frequency of words received in last minute

ones = words.map(w => (w, 1))

freqs = ones.reduceByKey(_ + _)

freqs_60s = freqs.window(Seconds(60), Seconds(1))

.reduceByKey(_ + _)

[Diagram: window(Seconds(60), Seconds(1)) is the sliding window operator, with a window length of 60 s that moves every 1 s; for each interval, words → map → ones → reduce → freqs, and each freqs_60s RDD is produced by windowing the last minute of freqs RDDs and reducing again]
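The window-then-reduce pattern of Example 3 can be sketched over plain Scala maps (illustrative only, no Spark): each output interval re-reduces the per-second frequency maps that fall inside the window.

```scala
// Sketch of window(...) followed by reduceByKey(_ + _): for each interval t,
// merge the per-interval frequency maps of the last `length` intervals.
object WindowSketch {
  def windowedFreqs(perInterval: Seq[Map[String, Int]],
                    length: Int): Seq[Map[String, Int]] =
    perInterval.indices.map { t =>
      val window = perInterval.slice((t - length + 1).max(0), t + 1)
      // reduceByKey(_ + _) across all maps in the window
      window.flatMap(_.toSeq).groupMapReduce(_._1)(_._2)(_ + _)
    }
}
```

Note that this naive form re-aggregates the whole window every interval; the "incremental" window operators below avoid that cost.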


Simpler running reduce

freqs = ones.reduceByKey(_ + _)

freqs_60s = freqs.window(Seconds(60), Seconds(1))

.reduceByKey(_ + _)

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))


Demo


“Incremental” window operators

[Diagram: with a plain aggregation function (+), each new window re-aggregates every interval it contains; with an invertible aggregation function, the new window is derived from the previous one by adding the interval that entered and subtracting the one that left]


Smarter running reduce

freqs = ones.reduceByKey(_ + _)

freqs_60s = freqs.window(Seconds(60), Seconds(1))

.reduceByKey(_ + _)

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))

freqs = ones.reduceByKeyAndWindow(
    _ + _, _ - _, Seconds(60), Seconds(1))
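The invertible-aggregation trick behind the two-function reduceByKeyAndWindow can be sketched in plain Scala (names are illustrative, not Spark's API): instead of re-reducing the whole window, update the previous window's result by adding the newest interval and subtracting the one that fell out, which is possible because + has an inverse.

```scala
// Incrementally slide a windowed count: new = prev + entering - leaving.
object IncrementalWindow {
  def slide(prev: Map[String, Int],
            entering: Map[String, Int],
            leaving: Map[String, Int]): Map[String, Int] = {
    // add the interval that just entered the window
    val added = entering.foldLeft(prev) { case (m, (k, v)) =>
      m.updated(k, m.getOrElse(k, 0) + v)
    }
    // subtract the interval that just left the window
    leaving.foldLeft(added) { case (m, (k, v)) =>
      val n = m.getOrElse(k, 0) - v
      if (n == 0) m - k else m.updated(k, n)
    }
  }
}
```

This makes the per-interval cost proportional to one interval's data instead of the whole window's.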


Output Operators

  • save: write results to any Hadoop-compatible storage system (e.g. HDFS, HBase)

  • foreachRDD: run a Spark function on each RDD

freqs.save("hdfs://...")

words.foreachRDD(wordsRDD => {

// any Spark/Scala processing, maybe save to database

})


Live + Batch + Interactive

  • Combining D-streams with historical datasets

    pageViews.join(historicCounts).map(...)

  • Interactive queries on stream state from the Spark interpreter

    pageViews.slice("21:00", "21:05").topK(10)


Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


System Architecture

Built on an optimized version of Spark

[Diagram: the Master holds the D-stream lineage, task scheduler, and block tracker; each Worker runs an input receiver, task execution, and a block manager; input and checkpoint RDDs are replicated across workers; Clients feed data in]


Implementation

Optimizations on current Spark:

  • New block store

    • APIs: Put(key, value, storage level), Get(key)

  • Optimized scheduling for <100ms tasks

    • Bypass Mesos cluster scheduler (tens of ms)

  • Fast NIO communication library

  • Pipelining of jobs from different time intervals
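The Put/Get interface mentioned above can be sketched as follows (a toy in-memory version; the storage-level names are assumptions and the real store is Spark-internal):

```scala
// Toy block store with a Put(key, value, storage level) / Get(key) interface.
object BlockStoreSketch {
  sealed trait StorageLevel
  case object Memory extends StorageLevel
  case object Disk   extends StorageLevel

  class BlockStore {
    private val blocks =
      scala.collection.mutable.Map.empty[String, (Array[Byte], StorageLevel)]

    def put(key: String, value: Array[Byte], level: StorageLevel): Unit =
      blocks(key) = (value, level)   // level would control where data lives

    def get(key: String): Option[Array[Byte]] =
      blocks.get(key).map(_._1)
  }
}
```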


Evaluation

  • Ran on up to 100 “m1.xlarge” machines on EC2

    • 4 cores, 15 GB RAM each

  • Three applications:

    • Grep: count lines matching a pattern

    • Sliding word count

    • Sliding top K words


Scalability

[Graph: maximum throughput possible with 1 s or 2 s latency, on 100-byte records (100K-500K records/s/node)]


Performance vs Storm and S4

  • Storm limited to 10,000 records/s/node

  • Also tried S4: 7000 records/s/node

  • Commercial systems report 100K records/s in aggregate


Fault Recovery

  • Recovers from failures within 1 second

[Graph: Sliding WordCount on 10 nodes with a 30 s checkpoint interval]


Fault Recovery

[Graphs: recovery behavior under failures and under stragglers]


Interactive Ad-Hoc Queries


Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


Future Development

  • An alpha of discretized streams will go into Spark by the end of the summer

  • Engine improvements from Spark Streaming project are already there ("dev" branch)

  • Together, these make Spark a powerful platform for both batch and near-real-time analytics


Future Development

  • Other things we’re working on/thinking of:

    • Easier deployment options (standalone & YARN)

    • Hadoop-based deployment (run as Hadoop job)?

    • Run Hadoop mappers/reducers on Spark?

    • Java API?

  • Need your feedback to prioritize these!


More Details

  • You can find more about Spark Streaming in our paper: http://tinyurl.com/dstreams


Related Work

  • Bulk incremental processing (CBP, Comet)

    • Periodic (~5 min) batch jobs on Hadoop/Dryad

    • On-disk, replicated FS for storage instead of RDDs

  • Hadoop Online

    • Does not recover stateful ops or allow multi-stage jobs

  • Streaming databases

    • Record-at-a-time processing, generally replication for FT

  • Approximate query processing, load shedding

    • Do not support the loss of arbitrary nodes

    • Different math because drop rate is known exactly

  • Parallel recovery (MapReduce, GFS, RAMCloud, etc)


Timing Considerations

  • D-streams group input into intervals based on when records arrive at the system

  • For apps that need to group by an “external” time and tolerate network delays, support:

    • Slack time: delay starting a batch for a short fixed time to give records a chance to arrive

    • Application-level correction: e.g. give a result for time t at time t+1, then use later records to update incrementally at time t+5
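The slack-time idea can be sketched with two small functions (names and field choices are illustrative): records are assigned to intervals by their external timestamp, and an interval is only finalized once the clock has passed its end plus a fixed slack, so slightly late records still land in the right batch.

```scala
// Sketch of slack time for grouping by an "external" timestamp.
object SlackTime {
  // Which interval a record with this timestamp belongs to
  def intervalOf(timestampMs: Long, intervalMs: Long): Long =
    timestampMs / intervalMs

  // An interval may be finalized only after its end time plus the slack,
  // giving delayed records a chance to arrive.
  def canFinalize(interval: Long, intervalMs: Long,
                  slackMs: Long, nowMs: Long): Boolean =
    nowMs >= (interval + 1) * intervalMs + slackMs
}
```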

