Programming Your Network at Run-time for Big Data Applications

Presenter: 張晏誌 (0056092). Advisor: Prof. 王國禎.

Presentation Transcript


  1. Programming Your Network at Run-time for Big Data Applications • Presenter: 張晏誌 (0056092) • Advisor: Prof. 王國禎

  2. Outline • Introduction • Integrated Network Control Architecture • Network Configuration for Hadoop Jobs • Implementation and Overheads • Discussion and Future Work • Reference

  3. Introduction • Two trends in data center applications and network architecture present a new opportunity to leverage SDN for truly application-aware networking: • the growth of big data applications • network architectures that leverage optical switches, which offer low cabling complexity and low energy consumption

  4. Introduction • Several challenges: • WAN traffic engineering • Cloud network provisioning • Run-time network configuration for big data jobs

  5. Integrated Network Control Architecture • System Architecture

  6. Integrated Network Control Architecture • Traffic Patterns of Big Data Applications • Bulk transfer • Latency-sensitive control messages • Control traffic typically has a low data rate. • In this architecture, control messages are always sent over the packet-switched network, using default routes that direct traffic over the Ethernet. • Data aggregation/partitioning
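
The routing rule on this slide (control messages on the Ethernet, bulk data eligible for optical circuits) can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `FlowKind`, `Flow`, and `choose_network` are hypothetical names, and treating circuits as bidirectional rack pairs is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from enum import Enum


class FlowKind(Enum):
    CONTROL = "control"  # latency-sensitive, low data rate
    BULK = "bulk"        # large transfers such as shuffle or aggregation data


@dataclass(frozen=True)
class Flow:
    src_rack: int
    dst_rack: int
    kind: FlowKind


def choose_network(flow: Flow, circuits: set) -> str:
    """Pick "packet" (Ethernet) or "optical" for a flow (hypothetical policy).

    circuits holds the rack pairs that currently have an optical circuit;
    a circuit is assumed to carry traffic in both directions.
    """
    if flow.kind is FlowKind.CONTROL:
        # Control messages always use the default routes over the Ethernet.
        return "packet"
    if (flow.src_rack, flow.dst_rack) in circuits or (flow.dst_rack, flow.src_rack) in circuits:
        return "optical"
    # Bulk data with no circuit for this rack pair falls back to the Ethernet.
    return "packet"


circuits = {(1, 2)}
print(choose_network(Flow(2, 1, FlowKind.BULK), circuits))     # optical
print(choose_network(Flow(1, 2, FlowKind.CONTROL), circuits))  # packet
```

Keeping control traffic off the circuits means a circuit reconfiguration never delays latency-sensitive messages.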

  7. Integrated Network Control Architecture • The Advantage of Application Awareness • Current approaches for allocating optical circuits in data centers rely on network level statistics to estimate the traffic demand matrix in the data center. • Without a true application-level view of traffic demands and dependencies, circuit utilization and application performance can be poor.

  8. Integrated Network Control Architecture • Without accurate information about application demand, optical circuits may be configured between the wrong locations, or circuit flapping may occur from repeated corrections. • This can cause blocking among interdependent applications and degrade application performance.
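
Instead of estimating demand from network-level counters, an application-aware controller can build the rack-to-rack demand matrix directly from job information. The sketch below assumes, hypothetically, that the application manager knows how many bytes each mapper will send to each reducer and on which rack each task runs; `rack_demand_matrix` and its inputs are illustrative names, not a Hadoop API.

```python
from collections import defaultdict


def rack_demand_matrix(map_output, task_rack):
    """Aggregate task-level transfers into rack-level demand.

    map_output: dict (mapper_id, reducer_id) -> bytes the mapper will send
        to that reducer (hypothetical application-level input).
    task_rack: dict task_id -> rack_id.
    Returns dict (src_rack, dst_rack) -> total bytes.
    """
    demand = defaultdict(int)
    for (mapper, reducer), nbytes in map_output.items():
        src, dst = task_rack[mapper], task_rack[reducer]
        if src != dst:  # intra-rack traffic never leaves the ToR switch
            demand[(src, dst)] += nbytes
    return dict(demand)


map_output = {("m1", "r1"): 100, ("m2", "r1"): 50, ("m3", "r1"): 30, ("m1", "r2"): 10}
task_rack = {"m1": 0, "m2": 0, "m3": 1, "r1": 2, "r2": 0}
print(rack_demand_matrix(map_output, task_rack))  # {(0, 2): 150, (1, 2): 30}
```

Because the matrix comes from the job itself, it is known before the transfers start, so circuits can be placed ahead of the traffic rather than after it is observed.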

  9. Network Configuration for Hadoop Jobs • Topology and Routing for Aggregation Patterns • Single aggregation pattern • To reduce the traffic sent over multi-hop optical paths, we want to place racks with higher traffic demand closer to the aggregator in the tree.
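
The placement idea can be sketched as a greedy tree build. Assumptions: every node may have at most `fanout` children (a hypothetical stand-in for the per-rack optical port budget), and the cost of a rack's traffic is its bytes times its depth in optical hops. Under this simplified model, filling the tree breadth-first with the heaviest racks first minimizes the weighted hop count; it is an illustration, not necessarily the authors' algorithm.

```python
def build_aggregation_tree(aggregator, demand, fanout):
    """Greedily place source racks in a tree rooted at the aggregator.

    demand: dict rack -> bytes that rack sends to the aggregator.
    fanout: max children per node (hypothetical optical port budget).
    Returns (parent, depth) dictionaries.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    parent, depth = {}, {aggregator: 0}
    children = {aggregator: 0}
    open_nodes = [aggregator]  # BFS order; open_nodes[head] receives the next child
    head = 0
    for rack in sorted(demand, key=demand.get, reverse=True):  # heaviest first
        p = open_nodes[head]
        parent[rack] = p
        depth[rack] = depth[p] + 1
        children[p] += 1
        children[rack] = 0
        open_nodes.append(rack)
        if children[p] == fanout:
            head += 1  # p is full; move to the next node in BFS order
    return parent, depth


def weighted_hops(demand, depth):
    """Total bytes x optical hops for all traffic to reach the aggregator."""
    return sum(nbytes * depth[rack] for rack, nbytes in demand.items())


demand = {1: 100, 2: 80, 3: 10, 4: 5}
parent, depth = build_aggregation_tree("A", demand, fanout=2)
print(parent)                         # {1: 'A', 2: 'A', 3: 1, 4: 1}
print(weighted_hops(demand, depth))   # 210
```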

  10. Network Configuration for Hadoop Jobs • Data shuffling pattern • N-to-M shuffling pattern • Recently proposed server-based data center network architectures, such as BCube and CamCube, leverage Hypercube and Torus topologies originally developed in the HPC community to build networks with high path redundancy.
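
A Torus over the racks involved in a shuffle can be sketched as follows. This assumes a 2D torus laid out row-major over the racks, with each rack spending up to four optical ports on its neighbors; `torus_links` and `torus_hops` are hypothetical helper names, and the layout is illustrative only.

```python
def torus_links(racks, rows, cols):
    """Return undirected links for a rows x cols 2D torus over racks (row-major)."""
    if len(racks) != rows * cols:
        raise ValueError("need exactly rows * cols racks")
    links = set()
    for r in range(rows):
        for c in range(cols):
            here = racks[r * cols + c]
            right = racks[r * cols + (c + 1) % cols]   # wrap around within the row
            down = racks[((r + 1) % rows) * cols + c]  # wrap around within the column
            for nbr in (right, down):
                if nbr != here:  # skip self-loops when a dimension has size 1
                    links.add(frozenset((here, nbr)))
    return links


def torus_hops(a, b, rows, cols):
    """Fewest hops between grid positions a=(row, col) and b=(row, col)."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


links = torus_links(list(range(9)), 3, 3)
print(len(links))                       # 18
print(torus_hops((0, 0), (2, 2), 3, 3))  # 2
```

The wraparound links are what give the Torus its path redundancy: every rack pair has several shortest paths, so shuffle traffic can be spread across many circuits.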

  11. Network Configuration for Hadoop Jobs • Partially overlapping aggregations • In the general case, aggregation patterns may have partially overlapping sources and aggregators. For these patterns, the traffic demand among racks can be sparse. If we build a big Torus network among these racks, many optical links may be underutilized.
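
When demand is sparse, one simple alternative to a full Torus is to spend optical ports only on the heaviest rack pairs. The sketch below is a generic greedy, degree-constrained heuristic swapped in for illustration; it is not the method from the slides. `greedy_circuits` and `ports_per_rack` are hypothetical, circuits are assumed bidirectional, and any traffic without a circuit is assumed to stay on the packet network.

```python
def greedy_circuits(demand, ports_per_rack):
    """Assign circuits to the heaviest rack pairs while both ends have free ports.

    demand: dict (rack_a, rack_b) -> bytes. Returns the list of chosen pairs.
    """
    # Merge both directions, since a circuit is assumed to be bidirectional.
    pair_demand = {}
    for (a, b), nbytes in demand.items():
        if a != b:
            key = (min(a, b), max(a, b))
            pair_demand[key] = pair_demand.get(key, 0) + nbytes
    free_ports = {}
    chosen = []
    for (a, b), nbytes in sorted(pair_demand.items(), key=lambda kv: kv[1], reverse=True):
        if nbytes <= 0:
            continue
        if free_ports.get(a, ports_per_rack) > 0 and free_ports.get(b, ports_per_rack) > 0:
            free_ports[a] = free_ports.get(a, ports_per_rack) - 1
            free_ports[b] = free_ports.get(b, ports_per_rack) - 1
            chosen.append((a, b))
    return chosen


demand = {(0, 1): 100, (1, 0): 20, (1, 2): 90, (0, 2): 50, (2, 3): 5}
print(greedy_circuits(demand, ports_per_rack=1))  # [(0, 1), (2, 3)]
```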

  12. Implementation and Overheads • Commercially available 10 Gbps OpenFlow switches can install more than 700 new rules per second, depending on the load on the switch and how many rules are batched together. • Recent analysis of large production data center traces shows that most MapReduce jobs last for tens of seconds or longer, and many data-intensive jobs run for hours.
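
The two figures on this slide can be combined into a back-of-the-envelope overhead estimate. This sketch models only OpenFlow rule installation (not optical switch reconfiguration), and the rule count and job length in the example are hypothetical. Because switches install more than 700 rules per second, using 700 gives an upper bound on install time.

```python
def rule_install_overhead(num_rules, job_seconds, rules_per_sec=700.0):
    """Return (install_seconds, fraction_of_job) for installing num_rules.

    rules_per_sec defaults to the ~700 rules/s quoted for commercial
    10 Gbps OpenFlow switches; real rates vary with load and batching.
    """
    install_seconds = num_rules / rules_per_sec
    return install_seconds, install_seconds / job_seconds


# Hypothetical job: 350 new rules, 30-second run time.
install, fraction = rule_install_overhead(350, 30)
print(f"{install:.2f} s to install, {fraction:.1%} of the job")
# 0.50 s to install, 1.7% of the job
```

For jobs that run for hours, the same half second becomes negligible.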

  13. Discussion and Future Work • Fairness, priority and fault tolerance • In the integrated system, the failure handling mechanisms can remain unchanged, with application managers and the SDN controller handling failures at different levels. • Traffic engineering for big data applications • Accurate traffic demand and structural patterns from applications can allow the SDN controller to split or re-route management and data flows onto different routes.
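
One way a controller could split a known transfer across routes is in proportion to each route's spare capacity, so every part finishes at about the same time. This is a minimal sketch under assumed inputs: `split_demand`, the route names, and the capacity values are hypothetical, and proportional splitting is one simple policy rather than the one proposed in the slides.

```python
def split_demand(total_bytes, spare_capacity):
    """Split total_bytes across routes in proportion to their spare capacity.

    spare_capacity: dict route_name -> spare bandwidth (any consistent unit).
    Returns dict route_name -> bytes assigned to that route.
    """
    total_cap = sum(spare_capacity.values())
    if total_cap <= 0:
        raise ValueError("no spare capacity on any route")
    return {route: total_bytes * cap / total_cap for route, cap in spare_capacity.items()}


print(split_demand(1100, {"optical": 10, "packet": 1}))
# {'optical': 1000.0, 'packet': 100.0}
```

With a proportional split, bytes divided by capacity is the same on every route, so no single route becomes the straggler.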

  14. Reference • Guohui Wang, T. S. Eugene Ng, Anees Shaikh: “Programming Your Network at Run-time for Big Data Applications”, In ACM SIGCOMM, August 2012.