
Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit

Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University, Japan); Koji YAMAMOTO (Renesas Design Corporation, Japan); Yasuto KURODA, Kazunari INOUE (Renesas Electronics Corporation, Japan)


Presentation Transcript


  1. Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit. Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University, Japan); Koji YAMAMOTO (Renesas Design Corporation, Japan); Yasuto KURODA, Kazunari INOUE (Renesas Electronics Corporation, Japan)

  2. Outline • Background • Objective • Proposed hardware architecture • Hardware architecture evaluation • FPGA implementation • Hardware evaluation • Conclusion

  3. What is TCAM? • TCAM = Ternary Content Addressable Memory • Features: very high-speed searching; the input is the data to match and the output is the memory address of the matching entry; a third matching state, “don’t care”, in addition to 1s and 0s • Application: looking up the routing table in IP routers • Example: input 192.168.101.1, looked up in the routing table, gives output 3
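The ternary match described on this slide can be sketched in software. This is only a behavioural model under stated assumptions: the `(value, mask)` encoding, the helper names `ip_to_int` and `tcam_search`, and the table contents are all hypothetical and chosen so that the slide's input lands on address 3. A real TCAM compares every entry in parallel; the loop below only imitates the result.

```python
def ip_to_int(dotted: str) -> int:
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def tcam_search(entries, key: int):
    """Return the address (list index) of the first matching entry, or None.

    Each entry is (value, mask): mask bits set to 1 must match the key,
    mask bits set to 0 are "don't care".
    """
    for address, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return address
    return None


# Hypothetical routing table; only the input and the output come from the slide.
entries = [
    (ip_to_int("10.0.0.0"), 0xFF000000),       # address 0
    (ip_to_int("172.16.0.0"), 0xFFF00000),     # address 1
    (ip_to_int("192.168.1.0"), 0xFFFFFF00),    # address 2
    (ip_to_int("192.168.101.0"), 0xFFFFFF00),  # address 3
]

print(tcam_search(entries, ip_to_int("192.168.101.1")))  # prints 3
```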

  4. TCAM problems • Manufacturing cost: $/bit is 4 times that of SRAM • Power consumption: all logic gates must be energized for every search • Capacity: the high price per bit and power-saving efforts make denser TCAM hard to pursue

  5. Objective • Propose a new hardware architecture: focused on address lookup in the routing table of routers; RAM-based design; named “Custom Memory” • Hardware design of the Custom Memory • Verify the effectiveness of the Custom Memory • Effectiveness of our architecture: dramatically reduce cost and power consumption • Implementation on an FPGA

  6. Design concepts • Divide the memory area into equal-sized tables: low power • RAM-based design: low cost, low power, high capacity • Lookup operation by single access: high search performance • Same physical user interface as TCAM: aim to replace the TCAM in the market

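  7. Architectural overview [Figure: block diagram of the Custom Memory. It contains search devices #0 to #N; each search device holds a RAM divided into subtables (Table #0, Table #1, and so on up to the last table) and a comparator. Signal labels: command, IP addr., prefix, address. Annotations: divide into subtables; RAM-based design; same physical user interface as TCAM.]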

  8. Search device partitioning • How do we decide which search device stores a prefix? Partition by prefix length: search device #0 holds prefix length 8, search device #1 holds prefix length 9, and so on up to search device #N for prefix length 32 • Example prefixes: 6.0.0.0/8, 24.128.0.0/9, 62.30.0.0/16, 112.63.240.0/20, 184.128.191.0/24, 232.95.225.1/32
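A minimal sketch of this partitioning step, assuming one search device per prefix length. The function name `partition_by_length` and the dictionary layout are illustrative choices, not taken from the presentation. The prefixes are the slide's examples, with the /20 prefix written with all four octets.

```python
import ipaddress
from collections import defaultdict


def partition_by_length(prefixes):
    """Group prefixes by prefix length; each group models one search device."""
    devices = defaultdict(list)
    for text in prefixes:
        network = ipaddress.ip_network(text)  # raises ValueError on malformed input
        devices[network.prefixlen].append(text)
    return dict(devices)


EXAMPLE_PREFIXES = [
    "6.0.0.0/8",
    "24.128.0.0/9",
    "62.30.0.0/16",
    "112.63.240.0/20",
    "184.128.191.0/24",
    "232.95.225.1/32",
]

devices = partition_by_length(EXAMPLE_PREFIXES)
for length in sorted(devices):
    print(length, devices[length])
```

Each example prefix has a different length, so each one ends up in a search device of its own.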

  9. Table partitioning • How do we decide which table stores a prefix? A fixed number of bits of the prefix are extracted as “index bits”, which select the table; the remainder bits are stored in that table • How do we determine the index bits? Extract the last bits of the prefix • Example (8 index bits), search device for prefix length 16: 154.1.0.0/16 → 10011010.00000001; the index bits 00000001 select table #1, and the remainder bits 10011010 are stored there [Figure: RAM divided into tables, each holding stored remainder bits or empty slots; 10011010 appears in table #1.]
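The slide's example can be worked through in a few lines. This is a sketch, assuming the index bits are the last 8 bits of the prefix, as in the example. The helper names `prefix_bits` and `split_prefix` are hypothetical.

```python
import ipaddress


def prefix_bits(prefix: str) -> str:
    """Return the significant bits of an IPv4 prefix as a bit string."""
    network = ipaddress.ip_network(prefix)
    value = int(network.network_address) >> (32 - network.prefixlen)
    return format(value, "0{}b".format(network.prefixlen))


def split_prefix(bits: str, index_width: int):
    """Split prefix bits into (table index, remainder bits).

    The index is taken from the LAST index_width bits of the prefix;
    whatever precedes them is the remainder stored in the table.
    """
    if len(bits) < index_width:
        raise ValueError("prefix is shorter than the index width")
    return int(bits[-index_width:], 2), bits[:-index_width]


bits = prefix_bits("154.1.0.0/16")
table_index, remainder = split_prefix(bits, 8)
print(bits)         # 1001101000000001
print(table_index)  # 1
print(remainder)    # 10011010
```

Only the remainder needs to be stored, because the table number already encodes the index bits.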

  10. Search operation [Figure: search data flow in the Custom Memory. Blocks shown: input-output controller, index calculator, search devices for prefix lengths 8 through 32 (each with a RAM of tables and a comparator), and an LPM comparator. Signals shown: search command, destination IP address, table number, hit addresses.]
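The search flow can be approximated by a small software model. Everything here is an assumption made for illustration: the class name `CustomMemoryModel`, sequential hit-address assignment, an index width of 8 bits, and prefix lengths of at least 8. The hardware reads one table per search device in a single access and compares in parallel; the loops below only reproduce the logical result of the longest prefix match (LPM).

```python
import ipaddress

INDEX_WIDTH = 8  # assumed number of index bits


def top_bits(address: int, length: int) -> str:
    """Return the top `length` bits of a 32-bit address as a bit string."""
    return format(address >> (32 - length), "0{}b".format(length))


class CustomMemoryModel:
    def __init__(self):
        # devices[prefix_length][table_index] -> list of (remainder, hit_address)
        self.devices = {}
        self.next_address = 0

    def insert(self, prefix: str) -> int:
        network = ipaddress.ip_network(prefix)
        if network.prefixlen < INDEX_WIDTH:
            raise ValueError("model assumes prefix length >= INDEX_WIDTH")
        bits = top_bits(int(network.network_address), network.prefixlen)
        index, remainder = int(bits[-INDEX_WIDTH:], 2), bits[:-INDEX_WIDTH]
        table = self.devices.setdefault(network.prefixlen, {}).setdefault(index, [])
        address = self.next_address
        table.append((remainder, address))
        self.next_address += 1
        return address

    def search(self, destination: str):
        """Return the hit address of the longest matching prefix, or None."""
        key = int(ipaddress.ip_address(destination))
        best = None  # (prefix_length, hit_address)
        for length, tables in self.devices.items():
            bits = top_bits(key, length)
            index, remainder = int(bits[-INDEX_WIDTH:], 2), bits[:-INDEX_WIDTH]
            for stored, address in tables.get(index, []):
                if stored == remainder and (best is None or length > best[0]):
                    best = (length, address)
        return None if best is None else best[1]


memory = CustomMemoryModel()
memory.insert("154.1.0.0/16")  # hit address 0
memory.insert("154.1.2.0/24")  # hit address 1
print(memory.search("154.1.2.3"))  # 1: the /24 entry wins over the /16 entry
print(memory.search("154.1.9.9"))  # 0: only the /16 entry matches
```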

  11. Evaluation of partitioning • Which bits are better to use as index bits? The distribution of prefixes across tables affects the cost • Evaluation metric: maximum number of prefixes in a table • Scheme under evaluation: extract the last bits of the prefix [Figure: RAM of one table with its word lines and comparators, annotated with the number of prefixes in the table.]
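The evaluation metric can be computed with a short helper. This is a sketch on hypothetical data, not the presentation's measurements: four /16 prefixes that share their first octet (154.0.0.0/16 through 154.3.0.0/16). The function name `max_table_load` is an assumption.

```python
from collections import Counter


def max_table_load(prefix_bit_strings, index_width, use_last_bits=True):
    """Return the maximum number of prefixes that fall into a single table."""
    counts = Counter(
        bits[-index_width:] if use_last_bits else bits[:index_width]
        for bits in prefix_bit_strings
    )
    return max(counts.values(), default=0)


# Hypothetical /16 prefixes 154.0.0.0/16 ... 154.3.0.0/16 as 16-bit strings.
sample = [format((154 << 8) | second_octet, "016b") for second_octet in range(4)]

print(max_table_load(sample, 8, use_last_bits=False))  # 4: all share the top 8 bits
print(max_table_load(sample, 8, use_last_bits=True))   # 1: the last 8 bits differ
```

On this toy input the top bits send every prefix to the same table, while the last bits spread them out. Whether that holds for a real routing table is what the slide's evaluation measures.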

  12. Effectiveness of indexing [Plot: maximum number of prefixes in a table versus prefix length. Series: “Top k bits” (the top bits used as index bits), “proposal” (the last bits used as index bits), and “bottom” (ideal value, unrealizable).]

  13. FPGA implementation • Altera Stratix IV GX FPGA Development Kit • Verilog-HDL • Parameters: 4 search devices; 256 tables/device; 128 prefixes/table [Figure: search devices #0 to #3, each with a comparator and a RAM holding Table #0 to Table #255 of 128 prefixes each.]
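The listed parameters imply a total prefix capacity that is easy to check. This sketch assumes that the index bits address the tables directly, so 256 tables correspond to 8 index bits. The constant names are illustrative.

```python
SEARCH_DEVICES = 4
TABLES_PER_DEVICE = 256
PREFIXES_PER_TABLE = 128

# log2 of the table count; valid because 256 is a power of two.
index_width = TABLES_PER_DEVICE.bit_length() - 1

# Upper bound on stored prefixes, reached only if every table is full.
capacity = SEARCH_DEVICES * TABLES_PER_DEVICE * PREFIXES_PER_TABLE

print(index_width)  # 8
print(capacity)     # 131072
```

The capacity is an upper bound: a table that receives more than 128 prefixes overflows even when other tables still have free slots.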

  14. Hardware evaluation (4 kbit array, Vdd = 1.0 V, room temperature, 125 Msps) [Figure: Custom Memory compared with TCAM, showing the RAM cells, word lines, comparators, and operation area of each design.]

  15. FPGA experiment • Examine the hardware operation • Use real data (a BGP routing table)

  16. Conclusion • Designed a RAM-based fast forwarding engine: hardware architecture; FPGA implementation • Reduced cost and power: 62% cost (compared with TCAM); 52% power consumption (compared with TCAM) • Future work: implementation parameter optimization; handling of table overflow
