
Transparent and Flexible Network Management for Big Data Processing in the Cloud

Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, Anupam Das, Curtis Yu


Presentation Transcript


  1. Transparent and Flexible Network Management for Big Data Processing in the Cloud. Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, Anupam Das, Curtis Yu

  2. Data processing Network

  3. Schedule computation

  4. Schedule communication: 33% of average job running time

  5. FlowComb: network management framework for Big Data processing. (1) What is the traffic demand? (2) Which path to choose? (3) How to change the path?

  6. Demand prediction: Use application semantics information to effectively and transparently infer network transfers (possibly before they start)

  7. Demand prediction: Agents on Hadoop nodes analyze Hadoop logs, query nodes, and predict data transfers. The agent on each Hadoop node parses JobTracker logs to identify finished mappers, and parses TaskTracker logs to identify reducers and the size of map output.
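
As a rough illustration of the agent described on this slide, the sketch below turns log lines into predicted mapper-to-reducer transfers. The `MAP_DONE` and `REDUCER` line formats are invented for this example and are not real Hadoop log formats; the even split of map output across reducers is also an assumption, not something the slides state.

```python
import re
from dataclasses import dataclass

# Hypothetical log line formats, invented for illustration only.
# Real JobTracker/TaskTracker logs look different.
MAP_DONE = re.compile(
    r"MAP_DONE task=(?P<task>\S+) host=(?P<host>\S+) output_bytes=(?P<size>\d+)"
)
REDUCER = re.compile(r"REDUCER task=(?P<task>\S+) host=(?P<host>\S+)")


@dataclass(frozen=True)
class Transfer:
    src: str         # host that ran the finished mapper
    dst: str         # host that runs a reducer
    size_bytes: int  # estimated bytes to be moved


def predict_transfers(log_lines):
    """Predict shuffle transfers from finished mappers to known reducers."""
    mappers = []   # (host, map output size in bytes)
    reducers = []  # reducer hosts
    for line in log_lines:
        m = MAP_DONE.search(line)
        if m:
            mappers.append((m["host"], int(m["size"])))
            continue
        r = REDUCER.search(line)
        if r:
            reducers.append(r["host"])

    if not reducers:
        return []

    transfers = []
    for src, size in mappers:
        # Assumption: map output is split evenly across all reducers.
        share = size // len(reducers)
        for dst in reducers:
            if dst != src:  # a local fetch does not cross the network
                transfers.append(Transfer(src, dst, share))
    return transfers
```

Because the prediction only needs a finished mapper and a known reducer, a transfer can be reported before the reducer actually starts fetching the data.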

  8. Flow scheduling: Reroute flows on paths with sufficient available bandwidth

  9. Flow scheduling. Where? Centralized decision engine. Which flows? FIFO. Reroute? If congestion on default path. Which path? First with available bandwidth.
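
The policy on this slide (FIFO order, reroute only when the default path is congested, take the first path with available bandwidth) can be sketched as follows. This is a minimal illustration, not the FlowComb implementation: the data layout, the definition of "congested" as "some link lacks free bandwidth for the flow's demand", and the fallback to the default path when no alternative fits are all assumptions.

```python
from collections import deque


def path_has_room(path, demand, free_bw):
    """True if every link on the path has at least `demand` free bandwidth."""
    return all(free_bw[link] >= demand for link in path)


def schedule(flows, free_bw):
    """Assign a path to each flow, processing flows in FIFO order.

    flows:   iterable of (flow_id, demand, candidate_paths), where
             candidate_paths[0] is the default path and each path is a
             tuple of link names (hypothetical representation).
    free_bw: dict mapping link name -> free bandwidth; updated in place.
    """
    queue = deque(flows)
    chosen = {}
    while queue:
        flow_id, demand, paths = queue.popleft()  # FIFO
        path = paths[0]
        # Reroute only if the default path is congested.
        if not path_has_room(path, demand, free_bw):
            for alternative in paths[1:]:
                # First alternative with available bandwidth wins.
                if path_has_room(alternative, demand, free_bw):
                    path = alternative
                    break
            # Assumption: if nothing fits, the flow stays on the default path.
        for link in path:
            free_bw[link] = max(0, free_bw[link] - demand)
        chosen[flow_id] = path
    return chosen
```

With two parallel links of capacity 10 and three flows of demand 8, the first flow keeps the default link, the second is moved to the alternative, and the third stays on the default because no path has room left.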

  10. Flow control: Use OpenFlow to install new forwarding rules in the network and enforce the new paths
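
To make the idea of enforcing a path with forwarding rules concrete, the sketch below expands one chosen path into one rule per switch. It only builds plain dictionaries; it does not call any OpenFlow controller library, and the rule layout, the `rules_for_path` helper, and the choice to match on source and destination IP are assumptions made for illustration.

```python
def rules_for_path(src_ip, dst_ip, hops, priority=100):
    """Build one forwarding rule per switch along a path.

    hops: list of (switch_id, out_port) pairs in path order
          (hypothetical representation of a path).
    Returns a list of rule descriptions that a controller could
    translate into flow-table entries.
    """
    return [
        {
            "switch": switch_id,
            "priority": priority,
            # Match the flow by its endpoints (illustrative choice).
            "match": {"ipv4_src": src_ip, "ipv4_dst": dst_ip},
            # Forward matching packets out of the port toward the next hop.
            "action": {"output": out_port},
        }
        for switch_id, out_port in hops
    ]


# Example: a two-switch path for one predicted transfer.
example_rules = rules_for_path("10.0.0.1", "10.0.0.2", [("s1", 2), ("s2", 3)])
```

Changing a flow's path then amounts to generating the rules for the new path and installing them on the affected switches.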

  11. System Architecture. Components: Hadoop cluster (master, slaves) with FlowComb agents; FlowComb middleware; OpenFlow controller; PFS switches. Steps: (1) Analyze Hadoop logs. (2) Extract flow information. (3) Schedule upcoming flows. (4) Set up flow paths. (5) Install routing rules. NEC Confidential

  12. Experiments

  13. Does the network matter? 4 times slower !!!

  14. Can FlowComb predict transfers? 28% of transfers detected before they start (and 56% before they end)

  15. How quickly can FlowComb change paths? 10% 70% 20% 60% before transfer midpoint

  16. Can FlowComb reduce processing time? 36% faster than Hadoop without FlowComb (and 28% faster than Hadoop with ECMP)

  17. FlowComb: Network management platform for Big Data processing that is transparent to applications, is quick and accurate in detecting their demand, and uses application semantics to detect data transfers (sometimes before they even start)

  18. Testbed

  19. OpenFlow network Controller

  20. Hadoop sort performance (figure): average utilization (MBps) over time (s), baseline vs. FlowComb
