
Dempsy - Stream Based “Big Data” Applied to Traffic



Presentation Transcript


  1. Dempsy - Stream Based “Big Data” Applied to Traffic

  2. Traffic End-to-End Data Fusion (diagram): sensor inputs (Traffic.com Sensor Network, DOT Sensor / Flow Data, Incident and Event Data, Historic Data, Probe Data) flow through Collection → Fusion → Dissemination out to the delivery channels (Television, Radio, Internet, Wireless, In-Vehicle).

  3. Sensor Data Collection

  4. Probe Data Collection

  5. Overview of Arterial Model Probe Data
  • Map Matcher – matches the probe data to the road network in real time with associated probabilities.
  • Path Analysis – routes between pairs of probable matches across chains and applies a Hidden Markov Model to the results to determine the most likely path through a set of points.
  • Travel Time Allocation – assigns the path travel times to the appropriate arterial segments.
  • Arterial Model – combines expected values with the allocated travel times and previous estimates into the current estimate.
  (Pipeline diagram: Map Matcher → Path Analysis → Travel Time Allocation → Arterial Travel Times → Arterial Traffic Data)
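For illustration only (not from the deck): the Hidden Markov Model step above is, at its core, a most-likely-path search over candidate road matches. A generic Viterbi sketch of that search follows; the emission probabilities (from the map matcher) and transition probabilities (from routing between candidate pairs) are assumed inputs, and this is not the production path analyzer.

```java
/**
 * Generic Viterbi sketch of the HMM path step: given per-point candidate road
 * matches with emission probabilities and router-derived transition
 * probabilities, recover the most likely sequence of candidates.
 */
public final class MostLikelyPath {
    /**
     * @param emit  emit[t][s]: probability that observation t lies on candidate s
     * @param trans trans[t][s][s2]: probability of moving from candidate s at
     *              observation t to candidate s2 at observation t + 1
     * @return the index of the chosen candidate at each observation
     */
    public static int[] viterbi(double[][] emit, double[][][] trans) {
        int n = emit.length, k = emit[0].length;
        double[][] best = new double[n][k];   // best log-probability ending at (t, s)
        int[][] back = new int[n][k];         // back-pointers for path recovery
        for (int s = 0; s < k; s++)
            best[0][s] = Math.log(emit[0][s]);
        for (int t = 1; t < n; t++)
            for (int s = 0; s < k; s++) {
                best[t][s] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double score = best[t - 1][p] + Math.log(trans[t - 1][p][s])
                            + Math.log(emit[t][s]);
                    if (score > best[t][s]) { best[t][s] = score; back[t][s] = p; }
                }
            }
        // Trace back from the best final state.
        int[] path = new int[n];
        for (int s = 1; s < k; s++)
            if (best[n - 1][s] > best[n - 1][path[n - 1]]) path[n - 1] = s;
        for (int t = n - 1; t > 0; t--)
            path[t - 1] = back[t][path[t]];
        return path;
    }
}
```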

  6. Width of the Road
  • Center the normal distribution over the probe-reported location.
  • Compute the distance from the peak of the distribution to the edges of the road.
  • Road width can be estimated from the number of lanes.
  • The integral of the normal distribution across the road gives the probability of the probe being on that road.
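A minimal sketch of that computation (not from the deck): the Gaussian standard deviation for position error, the 3.5 m lane-width constant, and the erf approximation are all illustrative assumptions.

```java
/**
 * Probability that a probe lies on a road: center a normal distribution on the
 * reported position and integrate it across the road width.
 */
public final class RoadMatchProbability {

    /** Abramowitz & Stegun 7.1.26 approximation of erf(x); max error ~1.5e-7. */
    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    /** Standard normal CDF. */
    static double phi(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }

    /**
     * @param perpDistM signed perpendicular distance (m) from the probe to the
     *                  road centerline
     * @param lanes     lane count, used to estimate road width (3.5 m per lane
     *                  is an assumption)
     * @param sigmaM    standard deviation (m) of the position error
     */
    public static double onRoadProbability(double perpDistM, int lanes, double sigmaM) {
        double halfWidth = lanes * 3.5 / 2.0;
        // By symmetry of the Gaussian, the mass of N(0, sigma^2) centered on the
        // probe that falls between the two road edges equals
        // Phi((d + hw)/sigma) - Phi((d - hw)/sigma), with d the centerline distance.
        return phi((perpDistM + halfWidth) / sigmaM)
                - phi((perpDistM - halfWidth) / sigmaM);
    }
}
```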

  7. Real Life Examples

  8. Technology Survey
  • Streams Processing Engines
  • Hadoop / Map Reduce
  • “Distributed Actors Model”

  9. Technology Survey
  • Streams processing engines (Oracle, IBM, SQLstream) – not a good fit; geared more toward relational data processing.
  • Hadoop Map Reduce – not a good fit for low-latency computations (15 to 30 minutes per batch). HBase co-processors are a possibility, but more of a hack.
  • Actors Model (S4, Akka, Storm) – just what we need.

  10. Dempsy – Distributed Elastic Message Processing System
  • POJO-based Actors programming abstraction eliminates synchronization bugs
  • Framework handles messaging and distribution
  • Fine-grained partitioning of work
  • Elastic
  • Fault tolerant
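To make the POJO abstraction concrete, a minimal sketch follows. The annotation names come from the MP lifecycle slide (slide 25); the package name, the @MessageKey convention, the rule that a handler's return value is dispatched downstream, and the ProbeMessage/MatchedProbe types are assumptions based on the open-source Dempsy project.

```java
import com.nokia.dempsy.annotations.*;  // assumed package, per the open-source project

/** A probe report; the @MessageKey value decides which MP instance receives it. */
class ProbeMessage {
    private final String tileId;
    private final double lat, lon;

    ProbeMessage(String tileId, double lat, double lon) {
        this.tileId = tileId;
        this.lat = lat;
        this.lon = lon;
    }

    @MessageKey
    public String getTileId() { return tileId; }  // one MP instance per tile

    public double getLat() { return lat; }
    public double getLon() { return lon; }
}

/** Hypothetical result type for the sketch. */
class MatchedProbe {
    final ProbeMessage probe;
    MatchedProbe(ProbeMessage probe) { this.probe = probe; }
}

/** Prototype MP: the framework clones one instance per distinct message key. */
@MessageProcessor
public class MapMatchMp implements Cloneable {

    /**
     * Plain single-threaded business logic: no queues, no locks. In the real
     * application this would delegate to the existing map-matching library;
     * the returned message is dispatched downstream by the framework.
     */
    @MessageHandler
    public MatchedProbe handle(ProbeMessage probe) {
        return new MatchedProbe(probe);
    }

    @Override
    public MapMatchMp clone() throws CloneNotSupportedException {
        return (MapMatchMp) super.clone();
    }
}
```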

  11. Dempsy – Distributed Elastic Message Processing System
  • Separation of concerns – scale-agnostic apps versus scale-aware platform
  • Supports code quality goals (guidelines, reuse, design patterns, etc.)
  • Functional programming (-like)
  • Map Reduce (-like)
  • Distributed Actors Model (-like)

  12. Dempsy architecture (diagram): MP Containers grouped into clusters, with ZooKeeper coordinating the clusters and a Distributor routing messages between the MP Containers.

  13. System Characteristics – DevOps
  • Manage every node and every process in exactly the same way. E.g., the arterial, path analyzer, and map matcher processes look the same to an operations person.
  • Everything runs on exactly the same hardware.
  • Scale elastically. To increase throughput, just add a machine to the cluster – no extra work required. The system can even be scaled automatically as load increases.
  • Robust failure handling – no real-time manual intervention required when nodes fail.
  • Development, QA, and Integration teams can use a pool of resources rather than dedicated resources. The pool can grow elastically as required by overlapping project schedules.

  14. Example – Traffic Processing

  15. Map Matching and Path Analysis as an Example
  • Algorithm decomposition into discrete business logic components, each with its own MP addressing scheme:
  • Map Matching – tile-based addressing
  • Vehicle Accumulation – addressing by vehicle id
  • Path Analysis (currently A* routing) – tile-based addressing
  (A sketch of the tile-key computation follows.)
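To make tile-based addressing concrete, here is a hypothetical tile-key computation (the 0.01-degree grid size is an assumption, not from the deck). Addressing by vehicle id is simpler still: the probe's vehicle id is used directly as the message key.

```java
/**
 * Illustrative tile-based message key: quantize a position onto a fixed grid
 * so that all probes falling in the same tile route to the same MP instance.
 */
public final class TileKey {
    static final double TILE_DEG = 0.01;  // assumed tile size, ~1.1 km of latitude

    /** Encode a latitude/longitude into a stable, hashable tile identifier. */
    public static String of(double lat, double lon) {
        long row = (long) Math.floor(lat / TILE_DEG);
        long col = (long) Math.floor(lon / TILE_DEG);
        return row + ":" + col;
    }
}
```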

  16. Dempsy – Arterial Model Example (diagram)
  • Pipeline: Adaptor (OLTP, x 1) → MapMatch MP (key: tile, x 40k) → VehicleAccumulator MP (key: probeId, x 10M) → PathAnalyzer MP (key: tile, x 40k) → TravelTime MP (key: tile, x 40k) → TrafficState MP (key: segmentId, x 2M) → Traffic Reporter (every 60 seconds)
  • Singletons: MapMatcher, PathAnalyzer, TravelTime, TrafficState
  • Supporting data: Linkset, A* Graph, Traffic History, Segment Table Extract
  • Supporting services: Analytics, Distributed Log Collection (Quality & Audit Logs, App Logs), Distributed File Storage
  (Node multipliers in the original diagram: x 9, and x 50 per MP stage.)

  17. Dempsy Proof Of Concept Results

  18. Dempsy Testing and Analysis
  • Decomposed Arterial (MegaVM) into Dempsy Message Processors
  • Implemented the first two stages of Arterial: Map Match and Path Analysis
  • Implemented the Message Processors as trivial POJOs around the existing map match and path analysis libraries
  • Wrapped them into a Dempsy application
  • Front-ended with a Dempsy Adaptor to read probe data from files and inject it into Dempsy (sketched below)
  • Deployed to Amazon EC2 to prove out scaling, collect performance data, and analyze the behavior of the system under load
  • Three main rounds of testing:
  • Original HornetQ transport (Sprint 6.2)
  • Lighter-weight TCP/socket-based transport (Sprint 6.3)
  • More finely grained message keys (Sprint 6.3)
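A sketch of the file-reading adaptor mentioned above, reusing the hypothetical ProbeMessage and TileKey classes from the earlier sketches. The Adaptor/Dispatcher interface names and the one-record-per-line CSV layout are assumptions based on the open-source Dempsy project, not the deck.

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;

import com.nokia.dempsy.Adaptor;      // assumed interface names, per the
import com.nokia.dempsy.Dispatcher;   // open-source Dempsy project

/** Reads probe records from a file and injects them into the application. */
public class FileProbeAdaptor implements Adaptor {
    private final Path probeFile;
    private Dispatcher dispatcher;

    public FileProbeAdaptor(Path probeFile) { this.probeFile = probeFile; }

    @Override
    public void setDispatcher(Dispatcher dispatcher) { this.dispatcher = dispatcher; }

    @Override
    public void start() {
        try (BufferedReader in = Files.newBufferedReader(probeFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");  // assumed layout: vehicleId,lat,lon
                double lat = Double.parseDouble(f[1]);
                double lon = Double.parseDouble(f[2]);
                dispatcher.dispatch(new ProbeMessage(TileKey.of(lat, lon), lat, lon));
            }
        } catch (Exception e) {
            throw new RuntimeException("failed to read probe file", e);
        }
    }

    @Override
    public void stop() { /* nothing to interrupt in this one-pass sketch */ }
}
```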

  19. Distributed Map Match / Path Analyzer Testing
  • Ran multiple tests on EC2 with an increasing number of Dempsy nodes
  • Scaled Map Match in parallel
  • Used a constant number of probe readers, empirically set at 3

  20. Test 1: HornetQ Transport

  21. Test 2: TCP Transport

  22. Test 3: TCP w/ Small Tiles Transport

  23. Development Life Cycle
  • Write Message Processor (MP) prototypes
  • Configure using the Dependency Injection container of your choice (currently supports Spring)
  • Develop using one node or pseudo-distributed mode
  • No messaging code to write: no queues, no synchronization
  • Narrow scope of concern – each processing element deals with only a limited set of data; there may be millions of processing elements
  • Simple debugging and unit testing (see the sketch below)
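Because an MP is a plain POJO, a unit test needs no running framework. A sketch using JUnit 4 and the hypothetical classes from the earlier sketches:

```java
import static org.junit.Assert.assertNotNull;

import org.junit.Test;

/** The message handler is an ordinary method call, so no container is needed. */
public class MapMatchMpTest {

    @Test
    public void matchesAProbeToTheRoadNetwork() {
        MapMatchMp mp = new MapMatchMp();  // use the prototype directly
        ProbeMessage probe = new ProbeMessage(TileKey.of(40.0, -74.0), 40.0, -74.0);

        assertNotNull(mp.handle(probe));   // assert on the returned message
    }
}
```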

  24. Trade-offs – there’s no free lunch
  • Sacrifice: guaranteed delivery, message ordering, message uniqueness
  • Gain: response time, simple clustering, memory efficiency (no queuing), lower latency under load
  • Where this works: statistically based analytics – techniques where sacrificing some input data quantity costs little output quality
  • Where this doesn’t work: transaction-based systems – techniques where a lost message produces incorrect (“false”) results (e.g., bank transactions)

  25. Dempsy – MP Lifecycle (diagram)
  • Startup: the Message Processor prototype is constructed (explicit instantiation) and @Start is called; the prototype is then Ready.
  • Activation: on the first message for a new key, the framework calls clone() on the prototype, then @Activate on the new instance.
  • Ready: @MessageHandler runs for each message; @Output runs on the scheduled output pass.
  • Eviction: a scheduled evict check calls @Evictable; if evictable, @Passivate is called and the instance is released to JVM GC (elasticity can also trigger passivation).
  • The diagram marks some of these transitions as “Proposed Addition” or “Future Addition”.
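To tie the lifecycle hooks together, a sketch of a TrafficState-style MP follows. The annotation names are taken from the diagram above; the method signatures, the TravelTimeUpdate/TrafficStateReport types, the blending weights, and the eviction threshold are all assumptions.

```java
import com.nokia.dempsy.annotations.*;  // assumed package; exact names may differ

/** Hypothetical input and output types for the sketch. */
class TravelTimeUpdate {
    final double seconds;
    TravelTimeUpdate(double seconds) { this.seconds = seconds; }
}

class TrafficStateReport {
    final double estimateSeconds;
    TrafficStateReport(double estimateSeconds) { this.estimateSeconds = estimateSeconds; }
}

@MessageProcessor
public class TrafficStateMp implements Cloneable {
    private double estimateSeconds = Double.NaN;
    private long lastUpdateMillis;

    @Activate
    public void activate() {
        // Called after clone() when this instance is bound to a new key.
        lastUpdateMillis = System.currentTimeMillis();
    }

    @MessageHandler
    public void handle(TravelTimeUpdate update) {
        // Blend the newly allocated travel time into the current estimate.
        estimateSeconds = Double.isNaN(estimateSeconds)
                ? update.seconds
                : 0.8 * estimateSeconds + 0.2 * update.seconds;  // illustrative weights
        lastUpdateMillis = System.currentTimeMillis();
    }

    @Output
    public TrafficStateReport output() {
        // Runs on the scheduled output pass (e.g., every 60 seconds per slide 16).
        return new TrafficStateReport(estimateSeconds);
    }

    @Evictable
    public boolean evictable() {
        // True once this key has gone quiet; the 15-minute threshold is assumed.
        return System.currentTimeMillis() - lastUpdateMillis > 15 * 60 * 1000;
    }

    @Passivate
    public void passivate() {
        // Release per-key resources before the instance is garbage collected.
    }

    @Override
    public TrafficStateMp clone() throws CloneNotSupportedException {
        return (TrafficStateMp) super.clone();
    }
}
```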
