
A Methodology for Evaluating Runtime Support in Network Processors

This presentation describes a methodology for evaluating runtime support systems in network processors. Network processors are multi-core systems-on-chip that combine programmability with a high packet processing rate and comprise control processors, packet processors, co-processors, a memory hierarchy, and an interconnect. The evaluation methodology combines a traffic representation with an analytical system model based on queuing networks. Results are provided for three example runtime support systems: ideal allocation, full processor allocation, and partitioned application allocation.


Presentation Transcript


  1. A Methodology for Evaluating Runtime Support in Network Processors University of Massachusetts, Amherst Xin Huang and Tilman Wolf {xhuang,wolf}@ecs.umass.edu

  2. Runtime Support in Network Processors • Network processor (NP) • Multi-core system-on-chip • Programmability & high packet processing rate • Heterogeneous resources • Control processors • Multiple packet processors • Co-processors • Memory hierarchy • Interconnection • Runtime support • Dynamic task allocation (Figure: IXP 2800)

  3. General Operation of Runtime Support in NP • Input • Hardware resources • Workload • Mapping method • Output • Task allocation • Dynamic adaptation • Different runtime support systems • Difficult to compare (Figure labels: AP1, AP2, AP3)

  4. Contributions • Evaluation methodology • Traffic representation • Analytical system model based on queuing networks • Results • Specific: 3 example runtime support systems • Ideal Allocation • Full Processor Allocation • R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003 • Partitioned Application Allocation • T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005

  5. Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Result • Summary

  6. Workload • NP workload is characterized by applications and traffic • How to represent workload?

  7. Dynamic Workload Model • Workload graph: • Application/Task: T • Traffic: • Processing requirement: • Example: • Processing requirement: • R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
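
The formulas on this slide did not survive in the transcript. As a rough illustration of the idea only, the sketch below assumes each task is described by a processing requirement (instructions per packet) and a traffic rate (packets per second), and derives the processing demand in MIPS. The `Task` class, the task names, and all numbers are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One application/task T in the workload (hypothetical representation)."""
    name: str
    instructions_per_packet: float  # processing requirement, assumed unit
    packets_per_second: float       # traffic offered to this task


def required_mips(task: Task) -> float:
    """Processing demand of one task in millions of instructions per second."""
    return task.instructions_per_packet * task.packets_per_second / 1e6


def total_required_mips(tasks: List[Task]) -> float:
    """Aggregate processing demand of the whole workload."""
    return sum(required_mips(t) for t in tasks)


# Illustrative values only.
workload = [
    Task("ipv4_forwarding", 500, 200_000),
    Task("flow_classification", 2_000, 50_000),
]
```

Comparing `total_required_mips(workload)` with the aggregate capacity of the processing engines gives a first check on whether a workload can be served at all.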

  8. Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Result • Summary

  9. Runtime System Model • Unified approach for all runtime systems • Queuing networks • Specific solution for each runtime system • Runtime mapping: • Graph: • Packet arrival rate: • Service time: • Metrics for all runtime systems • Processor utilization: • Average number of packets in the system:

  10. Three Example Runtime Support Systems • System I: Ideal Allocation • System II: Full Processor Allocation • System III: Partitioned Application Allocation

  11. Example Evaluation Model – System I • Ideal Allocation • All processors can process all packets completely • Unrealistic, but provides a baseline • Model: M/G/m FCFS single station

  12. M/G/m Single Station Queuing System • Cosmetatos approximation • Evaluation metrics • G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operations Research Quarterly, USA, pages 615 – 620, 1976 • G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
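
The slide cites the Cosmetatos approximation, but its formulas are not in the transcript. The sketch below therefore substitutes a different, widely used M/G/m approximation (Allen-Cunneen), which scales the M/M/m waiting time by (1 + c²)/2, where c² is the squared coefficient of variation of the service time. It returns the two metrics named on slide 9: processor utilization and average number of packets in the system. Function and parameter names are assumptions made for illustration.

```python
import math


def erlang_c(m: int, rho: float) -> float:
    """Probability that an arrival has to wait in an M/M/m queue.

    rho is the per-server utilization lambda / (m * mu) and must be < 1.
    """
    a = m * rho  # offered load in Erlangs
    tail = a ** m / (math.factorial(m) * (1.0 - rho))
    head = sum(a ** k / math.factorial(k) for k in range(m))
    return tail / (head + tail)


def mgm_mean_packets(lam: float, mean_service: float, scv: float, m: int):
    """Approximate (utilization, mean packets in system) for M/G/m FCFS.

    lam          packet arrival rate
    mean_service mean service time per packet
    scv          squared coefficient of variation of the service time
    m            number of processors
    Uses the Allen-Cunneen approximation, not the Cosmetatos one.
    """
    rho = lam * mean_service / m
    if rho >= 1.0:
        raise ValueError("system is unstable (utilization >= 1)")
    wq_mmm = erlang_c(m, rho) * mean_service / (m * (1.0 - rho))
    wq = wq_mmm * (1.0 + scv) / 2.0
    mean_packets = lam * (wq + mean_service)  # Little's law
    return rho, mean_packets
```

For m = 1 the approximation reduces to the exact M/G/1 result, which makes it easy to sanity-check.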

  13. Example Evaluation Model – System II • Full Processor Allocation • Allocate entire tasks to subsets of processors • Allocate as few processors as possible to save power • Each processor runs one type of task • Reallocation is triggered by queue length • Model: BCMP M/M/1-FCFS (Jackson network)
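
A minimal sketch of the "as few processors as possible" idea, under stated assumptions: each task gets whole processors, and the count is the smallest number that keeps per-processor utilization under a cap. The slide says reallocation is triggered by queue length; the utilization cap `max_utilization` used here is a stand-in for that trigger, and all names and numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def processors_needed(arrival_rate: float, service_rate: float,
                      max_utilization: float = 0.9) -> int:
    """Smallest whole number of processors for one task.

    arrival_rate    packets per second offered to the task
    service_rate    packets per second one processor can serve for this task
    max_utilization assumed cap on per-processor utilization (0 < cap < 1)
    """
    load = arrival_rate / (service_rate * max_utilization)
    return max(1, math.ceil(load))


def allocate(tasks: Dict[str, Tuple[float, float]],
             max_utilization: float = 0.9) -> Dict[str, int]:
    """Map each task name to its processor count (one task type per processor)."""
    return {
        name: processors_needed(lam, mu, max_utilization)
        for name, (lam, mu) in tasks.items()
    }
```

Because the allocation granularity is an entire processor, a small change in traffic can add or remove a full processor for a task.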

  14. BCMP Network • BCMP: Baskett, Chandy, Muntz, and Palacios • Characteristics: Open, closed, and mixed queuing network; Several job classes; Four types of nodes: M/M/m–FCFS (class-independent service time), M/G/1–PS, M/G/∞–IS, and M/G/1–LCFS PR • Product-form steady-state solution: • Open M/M/1-FCFS BCMP Queuing Network: • Evaluation metrics: • F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248 – 260, April 1975
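
The slide's equations are missing from the transcript. For an open network of M/M/1-FCFS nodes, the product-form solution means each node can be evaluated as an independent M/M/1 queue, using the standard results ρ = λ/μ and K = ρ/(1 − ρ). The sketch below applies those textbook formulas and assumes the per-node arrival rates have already been obtained from the traffic equations. Function names and the input layout are illustrative assumptions.

```python
from typing import List, Tuple


def mm1_node_metrics(lam: float, mu: float) -> Tuple[float, float]:
    """Return (utilization, mean number of packets) for one M/M/1-FCFS node."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("node is unstable (utilization >= 1)")
    return rho, rho / (1.0 - rho)


def network_metrics(nodes: List[Tuple[float, float]]):
    """Evaluate an open product-form network of M/M/1 nodes.

    nodes is a list of (arrival_rate, service_rate) pairs, one per processor.
    Returns the per-node (utilization, mean packets) list and the total
    mean number of packets in the system.
    """
    per_node = [mm1_node_metrics(lam, mu) for lam, mu in nodes]
    total_packets = sum(k for _, k in per_node)
    return per_node, total_packets
```

The nonlinear growth of ρ/(1 − ρ) near ρ = 1 is why a highly utilized processor accumulates queued packets quickly.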

  15. Example Evaluation Model – System III • Partitioned Application Allocation • Tasks are partitioned across multiple processors • Synchronized pipelines • Allocate tasks equally across all processors to maximize throughput • Reallocate at fixed time intervals • Equations for evaluation metrics are the same as for System II • Model: BCMP M/M/1-FCFS (Jackson network)

  16. Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Result • Summary

  17. Setup • System • 16 100MIPS processing engines • Queue lengths are infinite • Workload • Other assumptions • Partition applications into 7-15 subtasks

  18. Processor Allocation Over Time • Ideal: • 16 processors • Full Processor: • Change with traffic • Partitioned Application: • 16 processors (Figure: full processor allocation system)

  19. Processor Utilization Over Time • Ideal: • Lowest processor utilization • Full Processor: • Highest processor utilization because fewer processors are used • Partitioned Application: • Low processor utilization • Not equal to the ideal case due to unbalanced task allocation and pipeline overhead

  20. Packets in System Over Time • Ideal: • Fewest packets • Full Processor: • Packets queue up due to high processor utilization • Partitioned Application: • Most packets due to unbalanced task allocation and pipeline overhead • More stable performance because of finer processor allocation granularity

  21. Performance for Different Data Rates • Ideal: • Smooth increase • Full Processor: • Periodic peaks • Partitioned Application: • Smooth increase • The maximum data rate supported by the systems • Ideal: 100% • Full Processor: 79.6% • Partitioned Application: 75.1%

  22. Implications of the Results • Ideal Allocation • Provides a baseline • Full Processor Allocation • Allocates as few processors as possible to save power • Uses an entire processor as the allocation granularity • Good: High processor utilization • Bad: High performance variance • Partitioned Application Allocation • Distributes tasks equally across all processors • Finer processor allocation granularity • Good: Stable performance • Bad: Difficult to find an optimized solution => pipeline synchronization overhead

  23. Summary • Analytical methodology for evaluating different NP runtime support systems • Dynamic workload model and runtime system model • Results: 3 example runtime support systems • Quantitative metrics • Tradeoffs

  24. Questions ?
