shift based pattern matching for compressed web traffic n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Shift-based Pattern Matching for Compressed Web Traffic PowerPoint Presentation
Download Presentation
Shift-based Pattern Matching for Compressed Web Traffic

Loading in 2 Seconds...

play fullscreen
1 / 17

Shift-based Pattern Matching for Compressed Web Traffic - PowerPoint PPT Presentation


  • 110 Views
  • Uploaded on

Shift-based Pattern Matching for Compressed Web Traffic. Author: Anat Bremler-Barr, Yaron Koral ,Victor Zigdon Publisher: IEEE HPSR,2011 Presenter: Kai-Yang, Liu Date: 2011/11/2. INTRODUCTION.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Shift-based Pattern Matching for Compressed Web Traffic' - julio


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
shift based pattern matching for compressed web traffic

Shift-based Pattern Matching for Compressed Web Traffic

Author:

Anat Bremler-Barr, Yaron Koral ,Victor Zigdon

Publisher: IEEE HPSR,2011

Presenter: Kai-Yang, Liu

Date: 2011/11/2

introduction
INTRODUCTION
  • Two-thirds of the top 1000 most popular sites like Yahoo!, Google, MSN, YouTube, Facebook and others use HTTP compression to enhance the speed of their content downloads.
the gzip algorithm
The GZIP Algorithm
  • LZ77 compression

LZ77 compression technique is that we can compress a series of bytes (characters) if we spot that this series of bytes has already appeared in the past. The algorithm replaces each repeated string by (distance,length) pair.

For example:

the text: ‘abcdefgabcde’ can be compressed to: ‘abcdefg(7,5)’; LZ77 refers to the above pair as “pointer” and to uncompressed bytes as “literals”.

  • Huffman Coding- reduce the symbol coding size by encoding frequent symbols with fewer bits.
introduction1
INTRODUCTION
  • Recent work (ACCH algorithm) presents technique for pattern matching on compressed traffic that decompresses the traffic and then uses data from the decompression phase to accelerate the process.
  • We present Shift-based Pattern matching for Compressed traffic algorithm, SPC, that accelerates MWM on compressed traffic.
the modified wu manber algorithm
THE MODIFIED WU-MANBER ALGORITHM
  • MWM trims all patterns to their mbytes prefix, where mis the size of the shortest pattern.
  • MWM chooses predefined group of bytes, namely B, to determine the shift value.
  • MWM starts by precomputing two tables: a skip shift table called ShiftTableand a patterns hash table, called Ptrns.
  • The scan is performed using a virtual scan window of size m. The shift value is determined by indexing the ShiftTable with theB bytes suffix of the scan window.
shift based pattern matching for compressed traffic spc
SHIFT-BASED PATTERN MATCHING FOR COMPRESSED TRAFFIC (SPC)
  • The bytes referred by the pointers were already scanned; hence, if we have a prior knowledge that an area does not contain patterns, we can skip scanning most of it.
  • Observe that even if no patterns were found when the referred area was scanned, patterns may occur in the boundaries of the pointer.
  • The general method of the algorithm is to use a combined technique that scans uncompressed portions of the data using MWM and skips scanning most of the data represented by the LZ77 pointers.
experimental results
EXPERIMENTAL RESULTS
  • Data Set

We collected HTTP pages encoded with GZIP taken from a list constructed from the Alexa website that maintains web traffic metrics and top-site lists.

  • Pattern Set

Our pattern-sets were gathered from two different sources: ModSecurity , an open source web application firewall and Snort, an open source network intrusion prevention system.

spc characteristics analysis
SPC Characteristics Analysis
  • In order to understand the impact of B and m we examined the character of skip ratio, Sr, the percentage of characters the algorithm skips.
  • The Snort pattern set contains many short patterns, specifically 410 distinct patterns of length ≤3, 539 of length 4 and 381 of length 5.
  • To circumvent this problem we inspected the containing rules. We can eliminate most of the short patterns by using longer pattern within the same rule or relying on specific flow parameters.