Spark Streaming Preview

Fault-Tolerant Stream Processing at Scale

Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica

UC BERKELEY


Motivation

  • Many important applications need to process large data streams arriving in real time

    • User activity statistics (e.g. Facebook’s Puma)

    • Spam detection

    • Traffic estimation

    • Network intrusion detection

  • Our target: large-scale apps that need to run on tens to hundreds of nodes with O(1 second) latency


System Goals

  • Simple programming interface

  • Automatic fault recovery (including state)

  • Automatic straggler recovery

  • Integration with batch & ad-hoc queries (we want one API for all your data analysis)


Traditional Streaming Systems

  • “Record-at-a-time” processing model

    • Each node has mutable state

    • Event-driven API: for each record, update state and send out new records

[Diagram: input records are pushed through a graph of processing nodes (node 1, node 2, node 3), each holding mutable state]


Challenges with Traditional Systems

  • Fault tolerance

    • Either replicate the whole system (costly) or use upstream backup (slow to recover)

  • Stragglers (typically not handled)

  • Consistency (few guarantees across nodes)

  • Hard to unify with batch processing


Our Model: “Discretized Streams”

  • Run each streaming computation as a series of very small, deterministic batch jobs

    • E.g. a MapReduce every second to count tweets

  • Keep state in memory across jobs

    • New Spark operators allow “stateful” processing

  • Recover from faults/stragglers the same way MapReduce does (by rerunning tasks in parallel); a sketch of this model follows
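A minimal sketch of this model in plain Scala (illustrative only; nextBatch is a hypothetical stand-in for the system's input batching):

    case class Record(url: String)

    // Hypothetical: returns the records that arrived during the last interval.
    def nextBatch(): Seq[Record] = ???

    // In-memory state carried across jobs, e.g. running view counts per URL.
    var state = Map.empty[String, Int]

    while (true) {
      val batch = nextBatch()  // a small, immutable input dataset
      // A deterministic batch job over just this interval's data:
      val counts = batch.groupBy(_.url).map { case (u, rs) => (u, rs.size) }
      // Merge the interval's result into the state kept across jobs:
      state = counts.foldLeft(state) {
        case (s, (u, c)) => s.updated(u, s.getOrElse(u, 0) + c)
      }
    }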


Discretized Streams in Action

[Diagram: at t = 1, 2, …, a batch operation turns each interval's immutable input dataset (stored reliably) into an immutable output or state dataset, stored in memory as a Spark RDD; the results form streams 1 and 2]


Example: View Count

  • Keep a running count of views to each webpage

    views = readStream("http:...", "1s")

    ones = views.map(ev => (ev.url, 1))

    counts = ones.runningReduce(_ + _)

[Diagram: for each interval t = 1, 2, …, views is mapped into ones and reduced into counts; each box is a dataset and each cell one of its partitions]
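readStream and runningReduce are the preview API. For reference, a comparable running count in Spark Streaming as later released (a sketch assuming that API, with a socket source standing in for the HTTP one):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("ViewCount").setMaster("local[2]"), Seconds(1))
    ssc.checkpoint("checkpoint")  // stateful operators need a checkpoint directory

    val views = ssc.socketTextStream("localhost", 9999)  // one URL per line
    val ones  = views.map(url => (url, 1))

    // Keep a running count per URL across intervals:
    val counts = ones.updateStateByKey[Int] { (newOnes: Seq[Int], state: Option[Int]) =>
      Some(state.getOrElse(0) + newOnes.sum)
    }
    counts.print()

    ssc.start()
    ssc.awaitTermination()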


Fault Recovery

  • Checkpoint state datasets periodically

  • If a node fails or straggles, rebuild its data in parallel on other nodes using the dependency graph

[Diagram: dependency graph from an input dataset through map to an output dataset]

Fast recovery without the cost of full replication


How Fast Can It Go?

  • Currently handles 4 GB/s of data (42 million records/s) on 100 nodes at sub-second latency

  • Recovers from failures/stragglers within 1 sec


Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


D-Streams

  • A discretized stream is a sequence of immutable, partitioned datasets

    • Specifically, each dataset is an RDD (resilient distributed dataset), the storage abstraction in Spark

    • Each RDD remembers how it was created, and can recover if any part of the data is lost
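A concrete illustration of lineage with plain Spark RDDs (released Spark API; the input path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("lineage").setMaster("local[2]"))

    val base   = sc.textFile("hdfs://...")   // placeholder path
    val mapped = base.map(_.toUpperCase)     // mapped's recorded lineage: a map over base
    // If a partition of `mapped` is lost, Spark recomputes just that partition
    // from the corresponding partition of `base` instead of restoring a replica.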


D-Streams

  • D-Streams can be created…

    • either from live streaming data

    • or by transforming other D-streams

  • Programming with D-Streams is very similar to programming with RDDs in Spark


D-Stream Operators

  • Transformations

    • Build new streams from existing streams

    • Include existing Spark operators, which act on each interval in isolation, plus new “stateful” operators

  • Output operators

    • Send data to the outside world (save results to external storage, print to screen, etc.)


Example 1

Count the words received every second

words = readStream("http://...", Seconds(1))

counts = words.count()

[Diagram: count is a transformation from the words D-stream to the counts D-stream; each interval (time = 0–1, 1–2, 2–3) of a D-stream is one RDD]
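For comparison, the same per-interval count in the released Spark Streaming API (a sketch assuming a socket source, not the preview readStream API shown above):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("Count").setMaster("local[2]"), Seconds(1))

    val words  = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val counts = words.count()  // one element per 1-second batch, as on the slide
    counts.print()

    ssc.start()
    ssc.awaitTermination()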


Demo

  • Setup

    • 10 EC2 m1.xlarge instances

    • Each instance receives a stream of sentences at a rate of 1 MB/s, 10 MB/s in total

  • Spark Streaming receives the sentences and processes them


Example 2

Count frequency of words received every second

words = readStream("http://...", Seconds(1))

ones = words.map(w => (w, 1))

freqs = ones.reduceByKey(_ + _)

(In ones = words.map(w => (w, 1)), the argument w => (w, 1) is a Scala function literal.)

[Diagram: in each interval (time = 0–1, 1–2, 2–3), words is mapped to ones and reduced by key into freqs]
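What one interval's batch computes can be checked in plain Scala, with groupBy standing in for reduceByKey:

    val words = Seq("to", "be", "or", "not", "to", "be")  // one interval's input
    val ones  = words.map(w => (w, 1))
    val freqs = ones.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
    // freqs == Map("to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1)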



Example 3

Count frequency of words received in last minute

ones = words.map(w => (w, 1))

freqs = ones.reduceByKey(_ + _)

freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)

window is the sliding window operator: its arguments give the window length (60 s) and how far the window moves at each step (1 s).

[Diagram: each interval maps words to ones and reduces them into freqs; a window over the last 60 s of freqs is then reduced again into freqs_60s]


Simpler running reduce

freqs = ones.reduceByKey(_ + _)
freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)

can be written as a single operation:

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))



“Incremental” window operators

[Diagram: two ways to build freqs_60s from the per-interval freqs. With a plain aggregation function (+), each output window re-aggregates every interval it covers (t-1 … t+4). With an invertible aggregation function, each new window is the previous window plus the entering interval minus the leaving one]


Smarter running reduce

freqs = ones.reduceByKey(_ + _)
freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))

With an invertible aggregation function, the window can instead be updated incrementally (see the sketch below):

freqs = ones.reduceByKeyAndWindow(
  _ + _, _ - _, Seconds(60), Seconds(1))
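Why the inverse function matters: each slide of the window then costs a constant number of combine steps instead of re-aggregating the whole window. A plain-Scala illustration of the arithmetic (not Spark code):

    val values = Seq(3, 1, 4, 1, 5, 9)
    val w = 3

    // Naive: re-reduce every window from scratch.
    val naive = values.sliding(w).map(_.sum).toList  // List(8, 6, 10, 15)

    // Incremental: newSum = oldSum + entering - leaving (subtraction inverts +).
    val first = values.take(w).sum
    val incremental = values.drop(w).zip(values).scanLeft(first) {
      case (acc, (entering, leaving)) => acc + entering - leaving
    }.toList

    assert(naive == incremental)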


Output Operators

  • save: write results to any Hadoop-compatible storage system (e.g. HDFS, HBase)

  • foreachRDD: run a Spark function on each RDD

freqs.save("hdfs://...")

words.foreachRDD(wordsRDD => {
  // any Spark/Scala processing, maybe save to a database
})


Live + Batch + Interactive

  • Combining D-streams with historical datasets

    pageViews.join(historicCounts).map(...)

  • Interactive queries on stream state from the Spark interpreter

    pageViews.slice("21:00", "21:05").topK(10)
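A sketch of the combination using the released DStream API, where transform applies an arbitrary RDD function to each batch (pageViews and historicCounts are assumed to be keyed by URL):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // Join each live batch against a static, historical RDD of (url, count).
    def withHistory(pageViews: DStream[(String, Long)],
                    historicCounts: RDD[(String, Long)]): DStream[(String, Long)] =
      pageViews.transform { batch =>
        batch.join(historicCounts).map { case (url, (live, hist)) => (url, live + hist) }
      }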


Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


System Architecture

Built on an optimized version of Spark

[Diagram: the Master holds the D-stream lineage, task scheduler, and block tracker; each Worker runs an input receiver, task execution, and a block manager; Clients supply input, and input & checkpoint RDDs are replicated across workers]


Implementation

Optimizations on current Spark:

  • New block store

    • APIs: Put(key, value, storage level), Get(key)

  • Optimized scheduling for <100ms tasks

    • Bypass the Mesos cluster scheduler (which adds tens of ms)

  • Fast NIO communication library

  • Pipelining of jobs from different time intervals
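A sketch of what the put/get block store interface described above might look like (names and types are illustrative, not Spark's actual internals):

    // Hypothetical shape of the API from the slide.
    sealed trait StorageLevel
    case object MemoryOnly    extends StorageLevel
    case object MemoryAndDisk extends StorageLevel

    trait BlockStore {
      def put(key: String, value: Array[Byte], level: StorageLevel): Unit
      def get(key: String): Option[Array[Byte]]
    }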


Evaluation

  • Ran on up to 100 “m1.xlarge” machines on EC2

    • 4 cores, 15 GB RAM each

  • Three applications:

    • Grep: count lines matching a pattern

    • Sliding word count

    • Sliding top K words


Scalability

[Chart: maximum throughput attainable with 1 s or 2 s latency, using 100-byte records (100K–500K records/s/node)]


Performance vs Storm and S4

  • Storm limited to 10,000 records/s/node

  • Also tried S4: 7000 records/s/node

  • Commercial systems report ~100K records/s in aggregate


Fault Recovery

  • Recovers from failures within 1 second

[Chart: Sliding WordCount on 10 nodes with a 30 s checkpoint interval]


Fault Recovery

[Charts: recovery behavior under failures and under stragglers]



Outline

  • Introduction

  • Programming interface

  • Implementation

  • Early results

  • Future development


Future Development

  • An alpha of discretized streams will go into Spark by the end of the summer

  • Engine improvements from the Spark Streaming project are already in Spark (on the “dev” branch)

  • Together, these will make Spark a powerful platform for both batch and near-real-time analytics


Future Development

  • Other things we’re working on/thinking of:

    • Easier deployment options (standalone & YARN)

    • Hadoop-based deployment (run as Hadoop job)?

    • Run Hadoop mappers/reducers on Spark?

    • Java API?

  • Need your feedback to prioritize these!


More Details

  • You can find more about Spark Streaming in our paper: http://tinyurl.com/dstreams


Related Work

  • Bulk incremental processing (CBP, Comet)

    • Periodic (~5 min) batch jobs on Hadoop/Dryad

    • On-disk, replicated FS for storage instead of RDDs

  • Hadoop Online

    • Does not recover stateful ops or allow multi-stage jobs

  • Streaming databases

    • Record-at-a-time processing; generally rely on replication for fault tolerance

  • Approximate query processing, load shedding

    • Do not support the loss of arbitrary nodes

    • Different math because drop rate is known exactly

  • Parallel recovery (MapReduce, GFS, RAMCloud, etc)


Timing Considerations

  • D-streams group input into intervals based on when records arrive at the system

  • For apps that need to group by an “external” time and tolerate network delays, support:

    • Slack time: delay starting a batch for a short fixed time to give records a chance to arrive

    • Application-level correction: e.g. give a result for time t at time t+1, then use later records to update incrementally at time t+5
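An illustration of the slack-time idea in plain Scala (the helpers here are hypothetical, not a Spark API):

    case class Event(externalTimeMs: Long, payload: String)

    val intervalMs = 1000L  // group records by 1-second intervals of external time
    val slackMs    = 200L   // wait an extra 200 ms for late records

    def intervalOf(e: Event): Long = e.externalTimeMs / intervalMs

    // Interval t may be closed (batched and processed) once the system clock
    // has passed the end of the interval plus the slack allowance.
    def canClose(interval: Long, nowMs: Long): Boolean =
      nowMs >= (interval + 1) * intervalMs + slackMs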

