
A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification

Author: Weirong Jiang, Viktor K. Prasanna. Publisher: 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors. Presenter: Chin-Chung Pan. Date: 2009/12/30.


Presentation Transcript


  1. A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification Author: Weirong Jiang, Viktor K. Prasanna Publisher: 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors Presenter: Chin-Chung Pan Date: 2009/12/30

  2. Outline • Introduction • Architecture and Algorithms • Motivations • Architecture Overview • Quadtree Search on Single Fields • Partitioning Algorithm • Performance Evaluation • Algorithm Evaluation • Implementation Results

  3. Introduction • Most existing packet classification algorithms fall into three categories: • decision-tree-based (e.g. HyperCuts) • decomposition-based (e.g. BV, cross-producting) • partitioning-based (partition the original rule set into multiple subsets) • Based on the idea of Independent Sets, we propose a coarse-grained independent sets algorithm to reduce the number of partitions at the cost of more linear search. This extra cost is alleviated by pipelining the search process in hardware.

  4. Coarse-Grained Independent Sets • The original Independent Sets algorithm requires all the rules within an independent set to be mutually disjoint on the same field. • We propose a coarse-grained independent sets algorithm to reduce the number of independent sets effectively. • B is a design-time parameter controlling the granularity of the independent sets.
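The disjointness requirement of the original Independent Sets algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` type and the function names are hypothetical, and reading B as the maximum number of rules allowed to overlap at any value of the field is an assumption, since the slides only state that B controls the granularity.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    lo: int  # inclusive lower bound of the rule's range on the chosen field
    hi: int  # inclusive upper bound of the rule's range on the chosen field


def max_overlap(rules: List[Rule]) -> int:
    """Return the largest number of rules covering any single field value."""
    events = []
    for r in rules:
        events.append((r.lo, 1))       # range opens at lo
        events.append((r.hi + 1, -1))  # range closes just after hi
    depth = best = 0
    # Tuples sort so that a close (-1) at a position precedes an open (+1),
    # which keeps adjacent ranges such as [0, 3] and [4, 7] disjoint.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best


def is_independent_set(rules: List[Rule]) -> bool:
    """Original requirement: all rules mutually disjoint on the field."""
    return max_overlap(rules) <= 1


def is_coarse_grained_set(rules: List[Rule], b: int) -> bool:
    """ASSUMED relaxation: at most b rules overlap at any field value."""
    return max_overlap(rules) <= b


rules = [Rule("R1", 0, 3), Rule("R2", 4, 7), Rule("R3", 2, 5)]
print(is_independent_set(rules))        # False: R3 overlaps R1 and R2
print(is_coarse_grained_set(rules, 2))  # True under the assumed reading of B
```

Under this reading, a larger B lets more rules share one set, which gives fewer sets but more rules to compare linearly within each interval.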

  5. Motivations(1/2)

  6. Motivations(2/2) • B is a design-time parameter controlling the granularity of the independent sets.

  7. Architecture Overview(1/5)

  8. Architecture Overview(2/5) • Each single-field search returns the information associated with the primitive range that matches the value of the corresponding field of the input packet. • The outputs of the first stage include all information needed by the second stage. The search result from each field contains: • the IDs of the tables to look up. • the indices that are used for table lookup.

  9. Architecture Overview(3/5) • For instance, an input packet with SA = 10001000 and DA = 01110111 will match the primitive ranges SA 011 and DA 010 on SA and DA fields, respectively.

  10. Architecture Overview(4/5) • The information associated with SA 011 will include two sets of {table ID, index} tuples: {00, 01} and {10, null}. This is because SA 011 is within the “01”th independent interval of the “00”th coarse-grained independent set, as well as within the only primitive range on the SA field of the cross-product table.

  11. Architecture Overview(5/5) • Similarly, the information associated with DA 010 is: {01, 00} and {10, 01}, since DA 010 is within the “00”th independent interval of the “01”th coarse-grained independent set as well as within the “01”th primitive range on the DA field of the cross-product table.
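The worked example on slides 9 to 11 can be written out as data. The sketch below is illustrative only: the dictionary layout and the `group_by_table` helper are hypothetical, and regrouping the tuples by table ID is an assumption about how the second stage consumes them. The {table ID, index} values themselves are taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One (table ID, index) tuple per table that the matched primitive range
# belongs to. None stands for the "null" index on the slides.
FieldResult = List[Tuple[str, Optional[str]]]

stage1_output: Dict[str, FieldResult] = {
    "SA": [("00", "01"), ("10", None)],  # primitive range SA 011
    "DA": [("01", "00"), ("10", "01")],  # primitive range DA 010
}


def group_by_table(outputs: Dict[str, FieldResult]) -> Dict[str, Dict[str, Optional[str]]]:
    """Regroup first-stage results so each table ID maps to its per-field indices."""
    lookups: Dict[str, Dict[str, Optional[str]]] = defaultdict(dict)
    for field, tuples in outputs.items():
        for table_id, index in tuples:
            lookups[table_id][field] = index
    return dict(lookups)


for table_id, indices in sorted(group_by_table(stage1_output).items()):
    print(table_id, indices)
```

In this example tables 00 and 01 (the two coarse-grained independent sets) each receive an index from a single field, while table 10 (the cross-product table) receives one entry from both SA and DA.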

  12. Quadtree Search on Single Fields

  13. Partitioning Algorithm(1/2) (Figure: rules R1 to R10 laid out over primitive ranges 000 to 101 on the SA and DA fields, with Prev and Curr markers along the SA axis.)

  14. Partitioning Algorithm(2/2) (Figure: rules R1, R2, R3, R4 and R9 laid out over primitive ranges on the SA and DA fields.)

  15. Algorithm Evaluation(1/2)

  16. Algorithm Evaluation(2/2)

  17. Implementation Results(1/2) • We implemented our design (P = 4, B = 2), which supports the large rule set ACL_10K, using Xilinx ISE 10.1 development tools.

  18. Implementation Results(2/2) • Post-place-and-route results show that the design sustains 90 Gbps throughput for minimum-size (40-byte) packets, which is more than twice the current backbone network link rate.
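The throughput figure can be converted into a packet rate with simple arithmetic. The sketch below uses only the 90 Gbps and 40-byte values from the slide. The 40 Gbps link rate used for the comparison is an assumption (the slide does not name the backbone rate it has in mind), and the function name is hypothetical.

```python
THROUGHPUT_BPS = 90e9       # 90 Gbps, from the slide
MIN_PACKET_BYTES = 40       # minimum packet size, from the slide
ASSUMED_LINK_BPS = 40e9     # ASSUMPTION: backbone link rate, not stated on the slide


def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Packets classified per second at a given bit rate and packet size."""
    return throughput_bps / (packet_bytes * 8)


pps = packets_per_second(THROUGHPUT_BPS, MIN_PACKET_BYTES)
print(f"{pps / 1e6:.2f} million packets per second")  # 281.25 million packets per second
print(f"{THROUGHPUT_BPS / ASSUMED_LINK_BPS:.2f}x the assumed link rate")  # 2.25x the assumed link rate
```

Minimum-size packets are the worst case for a classifier, because they maximize the number of lookups needed per second at a given bit rate.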
