
Gnort: High Performance Intrusion Detection Using Graphics Processors






Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis

Institute of Computer Science

Foundation for Research and Technology Hellas



General Idea

  • Speed up the processing throughput of intrusion detection systems by offloading the pattern matching operations to the GPU

Giorgos Vasiliadis ICS-FORTH



Introduction

  • The problem

    • Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks

    • String matching accounts for up to 75% of the total CPU processing time

  • String Matching Algorithms

    • Aho-Corasick

  • Specialized hardware devices (network processors, FPGAs, ASICs)

    • Complex to modify and program

    • Poor flexibility

  • Graphics Cards

    • Easy to program

    • Powerful and ubiquitous

    • Researchers have begun exploring ways to tap their power for non-graphics applications


Why use the GPU?

  • The GPU is specialized for compute-intensive, highly parallel computation


NVIDIA GeForce SIMD Architecture

  • Many Multiprocessors

  • Each multiprocessor contains many Stream Processors

  • Memory model

    • Shared On-Chip Memory

      • 1 cycle

    • Constant Memory

      • 400-600 cycles; 1 cycle if cached

    • Texture Memory

      • 400-600 cycles; 1 cycle if cached

    • Global Device Memory

      • 400-600 cycles


The GPU can be used as a general-purpose processor, capable of executing many threads in parallel


The Aho-Corasick Algorithm

  • Used in most modern NIDSes

    • Scans for multiple patterns simultaneously

  • Preprocess all patterns to build a state machine

  • The state machine scans the input for all patterns simultaneously in linear time

    • Complexity is independent of the number of patterns

Example: P={he, she, his, hers}
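
The pattern set above can be turned into a small working example. The following is a minimal Python sketch of the textbook Aho-Corasick construction (trie, failure links, output sets), not Snort's or Gnort's actual code; the names build_automaton and search are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for a set of string patterns."""
    goto = [{}]      # goto[state][char] -> next state (trie edges)
    fail = [0]       # fail[state] -> state to fall back to on a mismatch
    out = [set()]    # out[state] -> patterns that end at this state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    # Breadth-first pass: failure links of shallow states are ready
    # before deeper states need them.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, automaton):
    """Return (start_index, pattern) for every match in text."""
    goto, fail, out = automaton
    s = 0
    matches = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            matches.append((i - len(pat) + 1, pat))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
hits = sorted(search("ushers", automaton))
```

Each input character is consumed once, and the number of failure-link steps is bounded by the number of characters read, which is why the scan time does not grow with the number of patterns.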


Mapping Aho-Corasick on GPU

  • How to represent the State Machine?

  • Snort represents each state as an array of pointers

    • Pointer-based structures are difficult to map onto GPU memory

  • Transform to a 2D array

    • Can easily bind to Texture Memory

      • Texture fetches are cached

        • Aho-Corasick exhibits strong locality of reference

      • Random-access memory reads

    • Using Texture Memory improves GPU execution time by about 19%
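
To illustrate the 2D-array representation, here is a minimal Python sketch that builds a dense transition table with one row per state and one column per byte value. It is an assumption about the general shape of such a table, not the layout Gnort actually uses; build_table and scan are hypothetical names, and a list of lists stands in for the array that would be bound to Texture Memory.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value

def build_table(patterns):
    """Return (table, matched): table[state][byte] is the next state."""
    table = [[0] * ALPHABET]   # row 0 is the root; 0 also means "no edge yet"
    matched = [[]]             # patterns recognized on entering each state
    for pat in patterns:
        s = 0
        for b in pat:
            if table[s][b] == 0:
                table.append([0] * ALPHABET)
                matched.append([])
                table[s][b] = len(table) - 1
            s = table[s][b]
        matched[s].append(pat)
    # Breadth-first pass: replace every missing edge with the transition
    # the failure state would take, so scanning never follows pointers.
    fail = [0] * len(table)
    queue = deque(t for t in table[0] if t)
    while queue:
        s = queue.popleft()
        for b in range(ALPHABET):
            t = table[s][b]
            if t:
                fail[t] = table[fail[s]][b]
                matched[t] = matched[t] + matched[fail[t]]
                queue.append(t)
            else:
                table[s][b] = table[fail[s]][b]
    return table, matched

def scan(payload, table, matched):
    """One table lookup per input byte; returns (start_index, pattern)."""
    s = 0
    hits = []
    for i, b in enumerate(payload):
        s = table[s][b]
        for pat in matched[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits

table, matched = build_table([b"he", b"she", b"his", b"hers"])
hits = sorted(scan(b"ushers", table, matched))
```

The inner loop reduces to a single indexed read per byte, which is the access pattern that a cached, randomly addressable memory serves well.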


Parallelizing Packet Searching (1/2)

  • Assigning a Single Packet to each Multiprocessor

  • Each packet is copied to the shared memory of the Multiprocessor

  • Stream Processors search different parts of the packet concurrently

  • Overlapping computation

    • Matching patterns may span consecutive chunks of the packet

  • Same amount of work per Stream Processor

    • Stream Processors will be synchronized
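
The need for overlapping computation can be sketched as follows. This is a minimal Python illustration under stated assumptions: each chunk is extended by (longest pattern length - 1) bytes so that a match spanning a chunk boundary is still seen by one worker, and bytes.find stands in for the Aho-Corasick scan. The function names are hypothetical and the loop over chunks runs sequentially here, whereas on the GPU each chunk would go to a different Stream Processor.

```python
def split_with_overlap(payload, n_chunks, max_pat_len):
    """Yield (start, piece) pairs; pieces overlap by max_pat_len - 1 bytes."""
    chunk = max(1, -(-len(payload) // n_chunks))   # ceiling division
    overlap = max_pat_len - 1
    return [(start, payload[start:start + chunk + overlap])
            for start in range(0, len(payload), chunk)]

def find_all(payload, patterns, n_chunks=4):
    """Search every chunk independently and merge the results."""
    max_len = max(len(p) for p in patterns)
    hits = set()   # a set drops matches reported by two overlapping chunks
    for start, piece in split_with_overlap(payload, n_chunks, max_len):
        for pat in patterns:
            pos = piece.find(pat)
            while pos != -1:
                hits.add((start + pos, pat))
                pos = piece.find(pat, pos + 1)
    return sorted(hits)

patterns = [b"he", b"she", b"his", b"hers"]
hits = find_all(b"xxhersxxshexx", patterns, n_chunks=4)
```

In this example "hers" starts in the first 4-byte chunk and ends in the second, so it would be missed without the overlap.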


Parallelizing Packet Searching (2/2)

  • Assigning a Single Packet to each Stream Processor

  • Each packet is processed by a different Stream Processor

  • No overlapping computation

  • Different amount of work per Stream Processor

    • Stream processors of the same Multiprocessor will have to wait until all have finished
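
The cost of waiting can be estimated with a toy model. The Python sketch below assumes that work is proportional to packet length and that every Stream Processor in a group stays occupied until the longest packet in that group is done; both are simplifying assumptions, and lockstep_utilization is a hypothetical name, not part of Gnort.

```python
def lockstep_utilization(packet_lengths, group_size):
    """Fraction of processor time spent on useful work under the toy model."""
    busy = 0
    idle = 0
    for i in range(0, len(packet_lengths), group_size):
        group = packet_lengths[i:i + group_size]
        longest = max(group)
        busy += sum(group)
        # Processors with shorter packets wait for the longest one.
        idle += sum(longest - n for n in group)
    total = busy + idle
    return busy / total if total else 1.0

uniform = lockstep_utilization([100, 100, 100, 100], group_size=4)
mixed = lockstep_utilization([100, 50], group_size=2)
```

Equal-sized packets keep every processor busy, while mixed sizes leave some of them idle for part of each round.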


Software Mapping

  • Packets are transferred to the GPU in batches

    • Performs much better than transferring each packet separately

    • Packets are stored in a buffer that is copied to the GPU when it gets full

  • Use page-locked memory to store the packets

    • Higher transfer throughput from host to device

    • Copies are performed using DMA, without occupying the CPU

      • CPU and GPU execution can overlap
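
The batching scheme can be sketched in a few lines of Python. This is an illustration only: PacketBatcher and its flush callback are hypothetical names, and the callback stands in for the host-to-device copy, which in a real implementation would read from a page-locked buffer.

```python
class PacketBatcher:
    """Collect packets in one contiguous buffer and hand it off when full."""

    def __init__(self, capacity, flush):
        self.capacity = capacity   # buffer size in bytes
        self.flush = flush         # called with (data, offsets) per batch
        self.buf = bytearray()
        self.offsets = []          # (start, length) of each packet in buf

    def add(self, packet):
        # Ship the current batch first if this packet would not fit.
        if self.offsets and len(self.buf) + len(packet) > self.capacity:
            self.drain()
        self.offsets.append((len(self.buf), len(packet)))
        self.buf += packet

    def drain(self):
        """Send whatever is buffered, for example at shutdown or on a timeout."""
        if self.offsets:
            self.flush(bytes(self.buf), list(self.offsets))
            self.buf.clear()
            self.offsets.clear()

batches = []
batcher = PacketBatcher(10, lambda data, offsets: batches.append((data, offsets)))
for pkt in (b"aaaa", b"bbbb", b"cccc"):
    batcher.add(pkt)
batcher.drain()
```

Keeping the per-packet offsets alongside the data lets the receiving side locate each packet inside the batch without any further copying.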


Evaluation (1/2)

  • Scalability as a function of the number of patterns

  • We ran Snort using randomly generated patterns

    • All patterns are matched against every packet

  • Payload trace contained 800-byte UDP packets with random payloads

  • Throughput remains constant as the number of patterns increases

  • 2.4x faster than the CPU


Evaluation (2/2)

  • Throughput as a function of the packet size

  • We ran Snort using 1000 random patterns

    • All patterns are matched against every packet

  • 2.3 Gbit/s for full packets

  • 3.2x faster compared to the CPU

  • The two GPU implementations show no significant difference in performance


Evaluation with real input and rules

  • Experimental setup

    • Two PCs connected via a 1 Gbit/s Ethernet switch

  • To directly compare with prior work [Jacob et al], we re-implemented the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms on the GPU.


Evaluation with real input and rules

  • Snort loaded about 8000 patterns.

  • Preprocessors and PCRE were disabled

  • Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s

  • GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s

    • 2x the loss-free rate of the original Snort

  • The KMP and BM algorithms from [Jacob et al] perform worse in all cases


Conclusion

  • Graphics cards can be used effectively to speed up Network Intrusion Detection Systems.

    • Low-cost

    • Easy to program

  • Future work includes

    • Transferring packets directly from the NIC to the GPU

    • Utilizing multiple GPUs on multi-slot motherboards


Thank you

Any questions?

[email protected]
