Dempsy - Stream Based “Big Data” Applied to Traffic

Presentation Transcript
Traffic End-to-End

[Slide diagram: data sources (the Traffic.com Sensor Network, DOT Sensor / Flow Data, Incident and Event Data, Historic Data, and Probe Data) flow through Collection, Data Fusion, and Dissemination out to Television, Radio, Internet, Wireless, and In-Vehicle channels.]

Overview of Arterial Model


  • Map Matcher – Matches the probe data to the road network in real time with associated probabilities.
  • Path Analysis – Routes between pairs of probable matches across chains and applies a Hidden Markov Model to the results to determine the most likely path through a set of points.
  • Travel Time Allocation – Assigns the path travel times to the appropriate arterial segments.
  • Arterial Model – Combines expected values with the allocated travel times and previous estimates into the current estimate (sketched below).
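
One way to read the Arterial Model step (purely illustrative; the weights and functional form below are assumptions, not the deck's actual model) is as a weighted blend of the historical expectation, the probe-derived allocation, and the previous estimate for a segment:

    \hat{t}_k = w_h \, t_{\text{expected}} + w_p \, t_{\text{allocated},k} + w_e \, \hat{t}_{k-1}, \qquad w_h + w_p + w_e = 1

where a larger w_p leans more heavily on fresh probe data and a larger w_e smooths the estimate over time.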

[Slide diagram: Probe Data → Map Matcher → Path Analysis → Travel Time Allocation → Arterial Travel Times / Arterial Traffic Data.]

Width of the Road
  • Center the normal distribution over the probe-reported location.
  • Compute the distance from the peak of the distribution to the edges of the road.
    • The road width can be estimated from the number of lanes.
  • The integral of the normal distribution across the road width gives the probability of the probe being on that road (see the formula below).
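
Stated as a formula (restating the bullets above; the symbols are introduced here only for illustration): with the probe's reported position as the mean \mu of a normal error distribution with standard deviation \sigma, and the road occupying the cross-track interval [d_1, d_2] (its width estimated from the number of lanes), the probability that the probe is on that road is

    P(\text{on road}) = \int_{d_1}^{d_2} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = \Phi\!\left(\frac{d_2-\mu}{\sigma}\right) - \Phi\!\left(\frac{d_1-\mu}{\sigma}\right)

where \Phi is the standard normal CDF.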
Technology Survey
  • Streams Processing Engines
  • Hadoop / Map Reduce
  • “Distributed Actors Model”
Technology Survey
  • Streams processing engines
    • Oracle, IBM, SQLStream
    • Not a good fit. More for relational data processing.
  • Hadoop Map Reduce
    • Not a good fit for low latency computations (15 to 30 minutes per batch)
    • HBase co-processors are a possibility, but more of a hack
  • Actors Model
    • S4, Akka, Storm
    • Just what we need
Dempsy – Distributed Elastic Message Processing System
  • POJO-based Actors programming abstraction eliminates synchronization bugs (see the sketch below)
  • Framework handles messaging and distribution
  • Fine grained partitioning of work
  • Elastic
  • Fault tolerant
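
A minimal sketch of what such a POJO message processor might look like (illustrative only: the class and message names are invented, the annotation import path and the class-level @MessageProcessor annotation are assumptions about the Dempsy version in use, and ProbeReport is the probe message sketched in the addressing example later in the deck; @MessageHandler and clone() are the hooks named in the lifecycle slide at the end):

// Sketch of a message processor as a plain POJO. The framework clones this prototype
// per message key, so a clone handles one key at a time and per-key state needs no
// explicit locking or queueing. Names and import path below are assumptions.
import com.nokia.dempsy.annotations.MessageHandler;
import com.nokia.dempsy.annotations.MessageProcessor;

@MessageProcessor
public class MapMatchMp implements Cloneable {

    // Safe per-key state: only one thread touches a given clone at a time.
    private long probesSeen = 0;

    @MessageHandler
    public MatchedProbe handle(ProbeReport probe) {
        probesSeen++;
        // Real code would match the probe against the road network with probabilities;
        // the returned message is dispatched onward (e.g. to the path-analysis cluster).
        return new MatchedProbe(probe.getProbeId(), probe.getTileKey());
    }

    @Override
    public MapMatchMp clone() throws CloneNotSupportedException {
        return (MapMatchMp) super.clone();
    }

    // Hypothetical output message type, defined so the sketch is self-contained.
    public static class MatchedProbe {
        public final String probeId;
        public final String tileKey;
        public MatchedProbe(String probeId, String tileKey) { this.probeId = probeId; this.tileKey = tileKey; }
    }
}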
Dempsy – Distributed Elastic Message Processing System
  • Separation of concerns – scale agnostic apps versus scale aware platform
  • Support code quality goals (guidelines, reuse, design patterns, etc.)
  • Functional programming (-like)
  • Map Reduce (-like)
  • Distributed Actors Model (-like)
Dempsy

[Slide diagram: Dempsy deployment, showing MP Container Clusters made up of individual MP Containers, a Distributor routing messages between them, and ZooKeeper coordinating the nodes.]

System Characteristics - DevOps
  • Manage every node and every process in exactly the same way.  E.g. arterial, path analyzer, map matcher look the same to an operations person.
  • Everything runs on exactly the same hardware
  • Scale elastically.  To increase throughput, just add a machine to the cluster – no extra work required.  The system can even be automatically scaled as load increases.
  • Robust failure handling – no real-time manual intervention required when nodes fail.
  • Development, QA and Integration teams can use a pool of resources rather than dedicated resources.  The pool can grow elastically as required by overlapping project schedules
Map Matching and Path Analysis as an Example
  • Algorithm decomposition
    • Discrete Business Logic Components
      • Map Matching
      • Vehicle Accumulation
      • Path Analysis (currently A* routing)
    • MP Addressing
      • Tile-based addressing (map matching)
      • Addressing by vehicle id (vehicle accumulation)
      • Tile-based addressing (path analysis); see the keying sketch below
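
To make the tile-based addressing concrete, here is a hedged sketch of a probe message that exposes its tile as the message key (the @MessageKey annotation and its import path are assumptions about the Dempsy API; the 0.01-degree grid and class names are invented for illustration):

// Hypothetical probe message keyed by map tile. Dempsy routes each message to the MP
// clone that owns its key, so all probes falling in the same tile reach the same
// MapMatch MP instance. The coarse lat/lon grid below is an arbitrary illustrative choice.
import com.nokia.dempsy.annotations.MessageKey;

public class ProbeReport {
    private final String probeId;
    private final double lat;
    private final double lon;

    public ProbeReport(String probeId, double lat, double lon) {
        this.probeId = probeId;
        this.lat = lat;
        this.lon = lon;
    }

    @MessageKey
    public String getTileKey() {
        // Quantize the position onto a grid; one grid cell == one tile address.
        long row = (long) Math.floor(lat / 0.01);
        long col = (long) Math.floor(lon / 0.01);
        return row + ":" + col;
    }

    public String getProbeId() { return probeId; }
}

Keying the vehicle-accumulation stage by vehicle instead would simply mean putting the key annotation on getProbeId() for that stage's message type.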
Dempsy – Arterial Model Example

[Slide diagram: an Adaptor (x 1) injects probe data into a chain of message processors, MapMatch MP (key: tile, x 40k), Vehicle Accumulator MP (key: probeId, x 10M), PathAnalyzer MP (key: tile, x 40k), TravelTime MP (key: tile, x 40k), and TrafficState MP (key: segment id, x 2M), with results written to OLTP. A Traffic Reporter path shows MapMatcher, PathAnalyzer, TravelTime, and TrafficState singletons (x 9, x 50, every 60 seconds). Supporting elements: Linkset, A* routing Graph, Traffic History, Segment Table, Extract, Analytics, Distributed Log Collection of Quality & Audit Logs and App Logs, and Distributed File Storage.]
Dempsy Testing and Analysis
  • Decomposed Arterial (MegaVM) into Dempsy Message processors
  • Implemented first two stages of Arterial, Map Match and Path Analysis
  • Implemented Message Processors as trivial POJOs around existing mapmatch and path analysis libraries
  • Wrapped into a Dempsy Application
  • Front ended with Dempsy Adaptor to read probe data from files and inject them into Dempsy
  • Deployed to Amazon EC2 to prove out scaling, collect performance data, and analyze behavior of system under load
  • Three main rounds of testing
    • Original HornetQ Transport (Sprint 6.2)
    • Lighter weight TCP/Socket Based Transport (Sprint 6.3)
    • More finely grained Message Keys (Sprint 6.3)
Distributed Map Match /Path Analyzer Testing
  • Ran multiple tests on EC2 with an increasing number of Dempsy nodes
    • Scaled Map Match in Parallel
    • Used a constant number of Probe Readers, empirically set at 3
Development Life Cycle
  • Write Message Processor (MP) prototypes
  • Configuration using the Dependency Injection container of your choice (currently supports Spring).
  • Develop using one node or pseudo distributed mode
  • No messaging code to write
  • No queues
  • No synchronization
  • Narrow scope of concern – each processing element deals with only a limited set of data. There may be millions of processing elements.
  • Simple debugging and unit testing (see the test sketch below)
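
Because an MP is just a POJO, its handler can be exercised directly in a plain JUnit test with no framework, queues, or threads. A minimal sketch (the class under test is a trivial stand-in invented for this example, not the real map matcher):

// Unit test of a message-processor POJO in isolation: construct it, call the handler,
// assert on the returned message. No Dempsy container, transport, or ZooKeeper needed.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class MapMatchMpTest {

    // Trivial stand-in MP: "matches" a probe by echoing its id and tile key.
    static class StubMapMatchMp {
        String handle(String probeId, String tileKey) {
            return probeId + "@" + tileKey;
        }
    }

    @Test
    public void handlerProducesMatchForProbe() {
        StubMapMatchMp mp = new StubMapMatchMp();
        assertEquals("v42@4005:-7540", mp.handle("v42", "4005:-7540"));
    }
}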
Trade-offs
  • There’s no free lunch
    • Sacrifice guaranteed delivery, message ordering, message uniqueness
    • Gain response time
    • Gain simple clustering
    • Gain memory efficiency (no queuing)
    • Gain lower latency under load
  • Where does this work
    • Statistically based analytics
    • Techniques where sacrificing some input data quantity results in little loss of output quality
  • Where doesn’t this work
    • Transaction based systems
    • Techniques where a dropped or duplicated message produces ‘false’ results (e.g. bank transactions)

Dempsy – MP Lifecycle Diagram

[Slide diagram: MP lifecycle. At startup the container constructs the Message Processor Prototype and calls its @Start method. When a message arrives for a new key, the prototype is cloned via clone() and the clone is activated (@Activate) before entering the Ready state; each message is then dispatched to an @MessageHandler method, and @Output methods run on a schedule. A scheduled eviction check (@Evictable) decides whether the instance stays resident or is evicted; eviction or elastic rebalancing triggers @Passivate, after which the instance is released to JVM GC / finalize. Explicit instantiation and a no-activate path are marked as proposed/future additions.]
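
A hedged code skeleton tying the lifecycle hooks in the diagram to an MP class (annotation names follow the diagram; the import path, method signatures, and the TrafficStateMp/TravelTimeUpdate/SegmentEstimate names are assumptions made for illustration):

// Lifecycle skeleton for a message processor, mirroring the diagram above.
import com.nokia.dempsy.annotations.*;   // assumed import path; varies by Dempsy version

public class TrafficStateMp implements Cloneable {

    @Start                        // called once on the prototype at container startup
    public void start() { /* load shared, read-only resources */ }

    @Override
    public TrafficStateMp clone() throws CloneNotSupportedException {
        return (TrafficStateMp) super.clone();   // one clone per message key
    }

    @Activate                     // clone is bound to its key before the first message
    public void activate() { /* initialize or restore per-key state */ }

    @MessageHandler               // invoked for each message routed to this key
    public SegmentEstimate handle(TravelTimeUpdate update) {
        return null;              // real code would fold the update into the current estimate
    }

    @Output                       // scheduled output, independent of inbound messages
    public SegmentEstimate emitCurrentEstimate() { return null; }

    @Evictable                    // scheduled check: return true to allow eviction
    public boolean shouldEvict() { return false; }

    @Passivate                    // called on eviction or elastic rebalancing
    public void passivate() { /* persist per-key state if needed */ }

    // Hypothetical message types, defined only so the sketch is self-contained.
    public static class TravelTimeUpdate { }
    public static class SegmentEstimate { }
}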