1 / 35

CprE / ComS 583 Reconfigurable Computing

CprE / ComS 583 Reconfigurable Computing. Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #8 – Applications II. Recap – Splash 1 Architecture. VME Bus. VSB Bus. Interface. Interface. FIFO IN. Control. FIFO OUT. F 3. F 2. F 1.

miyo
Download Presentation

CprE / ComS 583 Reconfigurable Computing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CprE / ComS 583Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #8 – Applications II

  2. Recap – Splash 1 Architecture VME Bus VSB Bus Interface Interface FIFO IN Control FIFO OUT F3 F2 F1 F0 F31 F30 F29 F28 M3 M2 M1 M0 M31 M30 M29 M28 M4 M5 M6 M7 M24 M25 M26 M27 F4 F5 F6 F7 F24 F25 F26 F27 F11 F10 F9 F8 F23 F22 F21 F20 M11 M10 M9 M8 M23 M22 M21 M20 M12 M13 M14 M15 M16 M17 M18 M19 F12 F13 F14 F15 F16 F17 F18 F19 CprE 583 – Reconfigurable Computing

  3. Recap – Splash 2 Architecture From Prev Board M5 M6 M7 M8 M0 M1 M2 M3 M4 F5 F6 F7 F8 F1 F2 F3 F4 Crossbar Switch F0 To Next Board F12 F11 F10 F9 F16 F15 F14 F13 M12 M11 M10 M9 M16 M15 M14 M13 CprE 583 – Reconfigurable Computing

  4. Recap – Splash 2 Architecture External Input Array Boards R-Bus S-Bus Interface Board Sparc-based Host SIMD S-Bus External Output CprE 583 – Reconfigurable Computing

  5. Recap – Dictionary Search Shift amount: 7 bits Hash function: 1100 1000 1010 0011 00 0000 0000 0000 0000 0000 Clear hash register 01 1010 0001 1101 00 Input the letters “th” --------------------------------- 10 1000 0011 0101 1100 0000 Temporary Result 10 0000 0101 0000 0110 1011 Result for “th” 00 0000 0001 1001 01 Input for letters “e_” ----------------------------------------- 01 0010 0110 0001 1110 1011 Temporary result 10 0101 1010 0100 1100 0011 Result for “the_” • XOR two character value with temp result and hash function • Rotate result • Different hash function for each FPGA CprE 583 – Reconfigurable Computing

  6. Outline • Recap • The Field-Programmable Port Extender (FPX) • FPX Architecture • FPX Programming Model • FPX Applications • Pattern Matching • Packet Classification • Rule Processing CprE 583 – Reconfigurable Computing

  7. Application – Network Processing • Networking applications well-suited for reconfigurable hardware • Target signatures change often • Massive quantities of stream-based data • Repetitive operations • Connecting up to a realistic networking environment is hard • Washington University experimental setup one of the best • Shows importance of both memory and processing capability • Numerous experiments performed over the past five years CprE 583 – Reconfigurable Computing

  8. Network Routing with the FPX • FPX Modules distributed across each port of a switch • IP packets (over ATM) enter and depart line card • Packet fragments processed by modules • Advantages: • New protocols implemented directly in silicon • Easy to upgrade in the field IP Packets IP Packets CprE 583 – Reconfigurable Computing

  9. FPX Hardware Device CprE 583 – Reconfigurable Computing

  10. FPX Hardware in a WUGS-20 Switch CprE 583 – Reconfigurable Computing

  11. FPGA-based Router • FPX module contains two FPGAs • NID – network interface device • Performs data queuing • RAD – reprogrammable application device • Specialized control sequences CprE 583 – Reconfigurable Computing

  12. Reprogrammable Application Device • Spatial Re-use of FPGA Resources • Modules implemented using FPGA logic • Module logic can be individually reprogrammed • Shared Access to off-chip resources • Memory Interfaces to SRAM and SDRAM • Common Datapath to send and receive data CprE 583 – Reconfigurable Computing

  13. Architecture of the FPX • RAD • Large Xilinx FPGA • Attaches to SRAM and SDRAM • Reprogrammable over network • Provides two user-defined Module Interfaces • NID • Provides Utopia Interfaces between switch & line card • Forwards cells to RAD • Programs RAD CprE 583 – Reconfigurable Computing

  14. Architecture of the FPX (cont.) CprE 583 – Reconfigurable Computing

  15. FPX SRAM • Provide low latency for fast table-lookups • Zero Bus Turnaround (ZBT) allows back-to-back read / write operations every 10ns • Dual, Independent Memories • 36-bit wide bus CprE 583 – Reconfigurable Computing

  16. FPX SDRAM • Dual, independent SDRAM memories • 64-bit wide, 100 MHz • 64Mb / Module : 128 Mb total [expandable] • Burst-based transactions [1-8 word transfers] • Latency of 14 cycles to Read/Write 8-word burst CprE 583 – Reconfigurable Computing

  17. VC VC VC VC ccp EC EC EC ccp Switch LineCard VC Routing Traffic Flows • Functions • Check packets for errors • Process commands • Control, status, & reprogramming • Implement per-flow forwarding • Traffic flows routed among • Switch • Line Card • RAD.Switch • RAD.Linecard NID CprE 583 – Reconfigurable Computing

  18. RAD RAD RAD RAD RAD RAD Switch LineCard Switch LineCard Switch LineCard VC VC VC VC VC VC VC VC VC VC VC VC ccp ccp ccp EC EC EC EC EC EC Switch LineCard Switch LineCard Switch LineCard Default Flow Action Egress Processing Ingress Processing (Bypass) (Per-flow Output Queueing) (IP Routing) RAD RAD RAD RAD RAD RAD Switch LineCard Switch LineCard Switch LineCard VC VC VC VC VC VC VC VC VC VC VC VC ccp ccp ccp EC EC EC EC EC EC Switch LineCard Switch LineCard Switch LineCard Full RAD Processing Full Loopback Testing Partial Loopback Testing (Packet Routing and Reassembly) (System Test) (Egress Flow Processing Test) Typical Flow Configurations CprE 583 – Reconfigurable Computing

  19. LC IPP Switch Element OPP LC LC IPP OPP LC LC IPP OPP LC LC IPP OPP LC IPP LC OPP LC IPP LC OPP LC LC IPP OPP LC LC IPP OPP LC Reprogramming Logic • NID programs at boot from EPROM • Switch Controller writes RAD configuration memory to NID • Configuration file for RAD arrives transmitted over network via control cells • Switch Controller issues {Full/Partial} reconfigure command • NID reads RAD config memory to program RAD • Performs complete or partial reprogramming of RAD CprE 583 – Reconfigurable Computing

  20. FPX Interfaces Provides • Well defined Interface • Utopia-like 32-bit fast data interface • Flow control allows back-pressure • Flow Routing • Arbitrary permutations of packet flows through ports • Dynamically Reprogrammable • Other modules continue to operate even while new module is being reprogrammed • Memory Access • Shared access to SRAM and SDRAM • Request/Grant protocol CprE 583 – Reconfigurable Computing

  21. Pattern Matching using the FPX • Use Hardware to detect a pattern in data • Modify packet based on match • Pipeline operation to maximize throughput CprE 583 – Reconfigurable Computing

  22. “Hello, World” Module Function CprE 583 – Reconfigurable Computing

  23. Logical Implementation Append “WORLD” to payload VCI Match New Cell CprE 583 – Reconfigurable Computing

  24. The Wrapper Concept App Wrapper Wrapper CprE 583 – Reconfigurable Computing

  25. AAL5 Encapsulation • Payload is packed in cells • Padding may be added • 64 bit Trailer at end of cell • Trailer contains CRC-32 • Last Cell indication bit (last bit of PTI field) CprE 583 – Reconfigurable Computing

  26. HelloBob Module SRAM Interface Input UDP Hello Bob Output Echo UDP Processor IP Processor Frame Processor Cell Processor CprE 583 – Reconfigurable Computing

  27. Results: Performance • Operating Frequency: 119 MHz. • 8.4ns critical path • Well within the 10ns period RAD's clock. • Targeted to RAD’s V1000E-FG680-7 • Maximum packet processing rate: • 7.1 Million packets per second. • (100 MHz)/(14 Clocks/Cell) • Circuit handles back-to-back packets • Slice utilization: • 0.4% (49/12,288 slices) • Less than one half of one percent of chip resources • Search technique can be adapted for other types of data matching and modification • Regular expressions • Parsing image content … CprE 583 – Reconfigurable Computing

  28. CAM-based Packet Matching • Sample Packet: • Source Address = 128.252.5.5 (dotted.decimal) • Destination Address = 141.142.2.2 (dotted.decimal) • Source Port = 4096 (decimal) • Destination Port = 50 (decimal) • Protocol = TCP (6) • Payload = “Consolidate your loans. CALL NOW” • Payload Lists = { General SPAM (0), Save Money SPAM (1) } • Content Vector = “00000011” (binary) = x”03” (hex) 111 104 103 40 72 7 71 39 8 0 Dest IP (hex) = 8D8E0202 Con- tent = 03 Src IP (hex) = 80FC0505 SrcPort = 1000 Dest Port = 0050 Proto = 06 CprE 583 – Reconfigurable Computing

  29. Sample Filter • Source Address = 128.252.0.0 / 16 • Destination Address = 141.142.0.0 / 16 • Source Port = Don’t Care • Destination Port = 50 • Protocol = TCP (6) • Payload includes general SPAM (List 0) Con- ten t= 01 Src IP value = 80FC0000 Dest IP (hex) = 8D8E0000 SrcPort = 0000 Dest Port = 50 Proto = 06 Value Mask: 1=care 0=don’t care Con- ten t= 01 Src IP (hex) = FFFF0000 Dest IP (hex) = FFFF0000 SrcPort = 0000 Dest Port = FFFF Proto = FF 40 103 71 39 7 8 0 72 Con- tent= 03 Src IP (hex) = 80FC0505 Dest IP (hex) = 8D8E0202 SrcPort = 1000 Dest Port = 0050 Proto = 06 IP Packet DROP the packet : It matches the filter CprE 583 – Reconfigurable Computing

  30. Packet Classifier with FlowID 16 bits 112 bits Flow ID [1] CAM MASK [1] CAM VALUE [1] Flow ID [2] CAM MASK [2] CAM VALUE [2] 16 bits - - CAM Table - - Flow ID Flow ID [3] CAM MASK [3] CAM VALUE [3] . . . Resulting Flow Identifier . . . . . . Flow ID [N] CAM MASK [N] CAM VALUE [N] Bits in IP Header Flow List Priority Encoder Source Port Protocol Payload Match Bits Mask Matchers Dest. Port Source Address Destination Address Value Comparators CprE 583 – Reconfigurable Computing

  31. Prefix Next Hop 4 * 01* 7 10* 2 1 0 110* 9 0 1 0 1 0001* 1 1011* 0 0 0 1 0 1 00110* 5 1 1 1 1 01011* 3 0 1 Fast IP Lookup Algorithm • Function • Search for best matching prefix using Trie algorithm CprE 583 – Reconfigurable Computing

  32. Grant Request 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 1 Hardware Implementation in the FPX SRAM1 SRAM1 Interface Remap VCIs for IP packets Extract IP Headers IP Lookup Engine counter On-Chip Cell Store SRAM2 Packet Reassembler Control Cell Processor RAD FPGA NID FPGA SW LC CprE 583 – Reconfigurable Computing

  33. Generate Address Generate Address Latch ADDR into SRAM SRAM D < M[A] Latch Data into FPGA Compute Space (Parallel lookup units on FPGA) Time (cycles) Pipelined FIPL Operations • Throughput : Optimized by interleaving memory accesses • Operate 5 parallel lookups • t_pipelined_lookup = 550ns / 5 = 110 ns • Throughput = 9.1 Million packets / second Time (cycles) CprE 583 – Reconfigurable Computing

  34. IPv4 CAM Filter 104 Bit header matching Fast IP Lookup (FIPL) Longest Prefix Match MAE-West at 10M pkts/second Packet Content Scanner Reg. Expression Search Data Queueing Per-flow queue in SDRAM IPv6 Tunneling Module Tunnels IPv6 over IPv4 Statistics Module Event counter Traffic Generator Per-flow mixing Video Recoder Motion JPEG Embedded Processor KCPSM Other Modules Implemented CprE 583 – Reconfigurable Computing

  35. Summary • Field Programmable Port Extender (FPX) • Network-accessible Hardware • Reprogrammable Application Device • Module Deployment • Modules implement fast processing on data flow • Network allows Arbitrary Topologies of distributed systems • Project Website • http://www.arl.wustl.edu/arl/projects/fpx/ CprE 583 – Reconfigurable Computing

More Related