Deep Packet Inspection: Which Implementation Platform?
Sarang Dharmapurikar, Cisco

Implementation Platform
• Several choices, each with some pros and cons
  • ASICs
  • FPGAs
  • Network processors
  • Graphics processors (nVidia)
  • Multi-core, multi-threaded commodity processors
• Needs evaluation with respect to
  • Cost
  • Speed
  • Overall system performance (DPI is just a small piece of the puzzle)
  • Ease of use and upgrading
• A hardware-software co-design approach
  • Profile a DPI system and push some components into hardware if the overall speedup is worthwhile (Amdahl’s law)

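Amdahl's law makes the co-design criterion concrete: if a component accounts for a fraction p of per-packet time and hardware makes that component s times faster, the overall speedup is 1 / ((1 - p) + p / s). The sketch below is only illustrative; the 30% fraction and the 10x acceleration are assumed numbers, not measurements from any DPI system.

```python
def amdahl_speedup(fraction: float, component_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is accelerated
    by `component_speedup` (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# Hypothetical profile: pattern matching takes 30% of per-packet time
# and a hardware engine makes that part 10x faster.
overall = amdahl_speedup(0.30, 10.0)
# Upper bound if the matcher cost nothing at all: 1 / (1 - 0.30).
ceiling = 1.0 / (1.0 - 0.30)
print(f"overall speedup: {overall:.2f}x (ceiling {ceiling:.2f}x)")
```

Under these assumed numbers the whole system gains well under 2x even though the offloaded component is 10x faster, which is why the profile has to come before the hardware decision.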
ASIC
• Examples: ClassiPi, NetLogic, Tarari, some Cisco ASICs
• Requires too much investment
  • NRE close to a million dollars!
• A long design cycle
  • Most of the time is consumed in verification
• Hard to upgrade
  • Algorithms evolve
  • It is hard to build a flexible enough ASIC
• Applications get locked to a platform
  • To migrate to a new platform requires a lot of software rewriting

FPGA
• Very flexible but expensive and power-consuming
  • Virtex-5 offers 330,000 lookup-table units
  • 4MB of SRAM
• The latest Xilinx FPGAs contain multiple PowerPC cores
  • Possible to design hybrid hw/sw systems
  • The components that assist DPI, such as TCP reassembly, normalization, and flow classification, can be done in hardware
• Several FPGA platforms for networking acceleration are available today
  • NetFPGA
  • FPX
• Need to be careful in the DPI approach
  • Raw signature-matching techniques that use FPGA logic resources for each signature won’t scale

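The scaling concern in the last bullet can be illustrated in software. Dedicating logic to each signature grows with the number of signatures, whereas a shared automaton held in memory reads each payload byte once regardless of how many signatures are loaded. The slides do not name a specific alternative; the sketch below uses the Aho-Corasick algorithm as one well-known example of the shared-automaton idea, and the signatures and payload are made up.

```python
from collections import deque


def build_automaton(signatures):
    """Compile all signatures into one Aho-Corasick automaton.

    Returns (goto, fail, out): per-state transition dicts, failure links,
    and the signatures that end at each state.
    """
    goto, fail, out = [{}], [0], [[]]
    for sig in signatures:
        state = 0
        for byte in sig:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(sig)

    # Breadth-first pass to fill in failure links and inherited outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(payload, automaton):
    """Yield (end_offset, signature) for every match in payload."""
    goto, fail, out = automaton
    state = 0
    for offset, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for sig in out[state]:
            yield offset, sig


# Hypothetical signatures and payload, for illustration only.
signatures = [b"evil", b"vile", b"cmd.exe"]
automaton = build_automaton(signatures)
payload = b"GET /evile/cmd.exe"
hits = list(scan(payload, automaton))
print(hits)
```

Adding a signature here adds states to a table in memory rather than new matching logic, so the per-byte work of `scan` stays roughly constant as the rule set grows.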
Network Processors
• Intel IXP2850
  • 16 micro-engines with
    • 2KB D$, 8KB I$ and a 16-entry CAM
  • An integrated XScale processor for the control path
    • 32KB I$ and 32KB D$
  • 2 crypto units
  • 16KB shared scratch-pad SRAM
• Cisco QuantumFlow processor
  • 40 packet processing engines (PPE), each @ 1.2 GHz
  • 4 threads per PPE
  • Dedicated hardware for queuing, buffering, IP lookup and classification

Commodity processors
• Really powerful server-class processors coming up
  • Intel’s Nehalem
    • 8 cores
    • 2 threads per core
    • 32KB L1, 256KB L2, 10+MB of shared L3 cache
  • Sun’s Niagara2
    • 8 cores
    • 8 threads per core!
    • 16KB I$ and 8KB D$ per core, 4MB shared L2 cache
    • Integrated cryptographic coprocessor units
• Need to think multi-core, multi-threaded
  • Think in terms of a complete system, not just pattern matching
  • Which core should do what?
  • Need to design cache-friendly data structures

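One possible answer to "which core should do what?" is to pin each flow to a single worker, so that a flow's reassembly and matching state stays in one core's cache. This is an assumption for illustration, not a design given in the slides. The sketch hashes the 5-tuple with CRC32; `FlowKey`, `normalize` and `pick_worker` are hypothetical names.

```python
import zlib
from collections import namedtuple

FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")


def normalize(key):
    """Order the endpoints so both directions of a flow hash identically."""
    a = (key.src_ip, key.src_port)
    b = (key.dst_ip, key.dst_port)
    lo, hi = sorted((a, b))
    return FlowKey(lo[0], hi[0], lo[1], hi[1], key.proto)


def pick_worker(key, num_workers):
    """Map a flow to a worker index with a deterministic CRC32 hash."""
    k = normalize(key)
    data = f"{k.src_ip}|{k.dst_ip}|{k.src_port}|{k.dst_port}|{k.proto}".encode()
    return zlib.crc32(data) % num_workers


# Hypothetical flow; 16 workers corresponds to 8 cores x 2 threads.
fwd = FlowKey("10.0.0.1", "192.0.2.7", 40000, 80, "tcp")
rev = FlowKey("192.0.2.7", "10.0.0.1", 80, 40000, "tcp")
worker_fwd = pick_worker(fwd, 16)
worker_rev = pick_worker(rev, 16)
print(worker_fwd == worker_rev)
```

Because both directions of a connection map to the same worker, per-flow state needs no locking across cores; the trade-off is that a few heavy flows can load workers unevenly.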
Conclusion
• While hardware can assist DPI systems, building proprietary hardware is not a good idea
  • Let’s understand the “actual” performance needs
  • Let’s not be misguided by “marketing” needs
• Need to think of hardware-software co-design
  • Requires careful profiling of DPI systems to identify the components that can be pushed to hardware
• Need to design algorithms for multi-core, multi-threaded processors
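The profiling step can be sketched as a per-stage timer that reports what share of total time each component takes. The stage functions below are toy stand-ins with hypothetical names, not real DPI components, and the measured shares will differ from machine to machine.

```python
import time
from collections import defaultdict


class StageProfiler:
    """Accumulate wall-clock time spent in named pipeline stages."""

    def __init__(self):
        self.totals = defaultdict(float)

    def run(self, stage, func, *args):
        start = time.perf_counter()
        result = func(*args)
        self.totals[stage] += time.perf_counter() - start
        return result

    def fractions(self):
        total = sum(self.totals.values())
        if not total:
            return {}
        return {name: t / total for name, t in self.totals.items()}


# Toy stand-ins for real DPI stages (hypothetical, for illustration only).
def reassemble(segments):
    return b"".join(segments)


def normalize(payload):
    return payload.lower()


def match(payload, signatures):
    return [s for s in signatures if s in payload]


profiler = StageProfiler()
signatures = [b"cmd.exe", b"/etc/passwd"]
for _ in range(1000):
    payload = profiler.run("reassembly", reassemble, [b"GET /CMD", b".EXE HTTP/1.1"])
    payload = profiler.run("normalization", normalize, payload)
    hits = profiler.run("matching", match, payload, signatures)

# Print stages from largest to smallest share of measured time.
for name, share in sorted(profiler.fractions().items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {share:6.1%}")
```

The stage with the largest share is the first candidate for offload, and its share bounds the overall gain that pushing it to hardware can deliver.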