An Efficient Regular Expressions Compression Algorithm From A New Perspective


An Efficient Regular Expressions Compression Algorithm From A New Perspective. Authors: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo. Publisher: INFOCOM, 2011 Proceedings IEEE. Presenter: 楊皓中. Date: 2014/05/28. Department of Computer Science and Information Engineering


An Efficient Regular Expressions Compression Algorithm From A New Perspective

Authors : Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo

Publisher : INFOCOM, 2011 Proceedings IEEE

Presenter : 楊皓中

Date : 2014/05/28

Department of Computer Science and Information Engineering

National Cheng Kung University, Taiwan R.O.C.

Introduction
  • The paper focuses on reducing the memory usage of composite DFAs by compressing transitions.
  • Previous works
    • are based on the observation that many states share common outgoing transitions for the same character.
  • In this paper
    • we try to obtain memory reduction by exploiting transition redundancy among states and the transition distribution inside states.

National Cheng Kung University CSIE Computer & Internet Architecture Lab

Definition of cluster
  • Regular expression: .*A.{2}CD
  • A DFA is a digraph, so we can get a unique, determinate trie-tree by traversing the DFA level by level (breadth-first traversal), provided we stipulate that the son states are visited in order of label, from small to large.

Definition of cluster
  • In the trie-tree
    • if state r has a transition to state s, we call r the father state of s, and conversely s the son state of r. A set of states is called a cluster if it is composed of all the son states of a certain state.
  • The set of son states of state 0 is {1}, so {1} is a cluster.
  • By the same token, the state sets {2, 3}, {4, 5}, {6, 7}, and {8} through {13} are clusters too.
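
The cluster definition can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `bfs_clusters`, the dictionary-based DFA encoding, and the 5-state toy DFA are assumptions made for this example (the toy DFA is not the one on the slide).

```python
from collections import deque

def bfs_clusters(dfa, start=0):
    """Return {father: [sons]} for the breadth-first trie-tree of a DFA.

    dfa maps state -> {label: next_state}. A state becomes the son of the
    first state that reaches it during the breadth-first traversal.
    """
    seen = {start}
    queue = deque([start])
    clusters = {}
    while queue:
        father = queue.popleft()
        sons = []
        for label in sorted(dfa[father]):   # visit labels from small to large
            nxt = dfa[father][label]
            if nxt not in seen:             # first discovery makes a tree edge
                seen.add(nxt)
                sons.append(nxt)
                queue.append(nxt)
        if sons:
            clusters[father] = sons
    return clusters

# Hypothetical toy DFA over the alphabet {a, b}.
toy_dfa = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 0},
}
clusters = bfs_clusters(toy_dfa)
print(clusters)
```

In this toy DFA, state 1 is the father of states 2 and 3, so {2, 3} is one cluster.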

Transition characteristics
  • For a state, its outgoing transitions lead to many distinct "next-states".
    • This observation is useless for reducing the memory usage of DFAs, because the number of distinct "next-states" is more than 25 on average.
  • We observe that the average number of distinct clusters is usually less than 5 (Column 3), which is much smaller than the number of distinct "next-states".
  • Furthermore, the transitions of each state concentrate on the two clusters that receive the most transitions (Top-2 clusters for short).
    • According to Table II, these transitions account for more than 95% of all transitions.
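
As a rough illustration of how these per-state statistics could be measured, the sketch below counts distinct next-states, distinct clusters, and the share of transitions that fall into the Top-2 clusters for one state. The helper name `transition_stats`, the `cluster_of` mapping, and the 8-character row are hypothetical; they are not data from the paper.

```python
from collections import Counter

def transition_stats(row, cluster_of):
    """Summarise one state's outgoing transitions.

    row: list of next states, one entry per input character.
    cluster_of: dict mapping a state to the id of the cluster it belongs to.
    Returns (distinct next-states, distinct clusters, Top-2 share).
    """
    per_cluster = Counter(cluster_of[s] for s in row)
    top2 = sum(count for _, count in per_cluster.most_common(2))
    return len(set(row)), len(per_cluster), top2 / len(row)

# Hypothetical data: states 2 and 3 share cluster 1.
cluster_of = {1: 0, 2: 1, 3: 1, 4: 2}
row = [2, 3, 2, 3, 1, 2, 3, 4]
stats = transition_stats(row, cluster_of)
print(stats)
```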

Description
  • In DFAs, each state has 2^8 transitions, and each transition leads to one determinate state; by the analysis above, it therefore leads to one determinate cluster.
  • We can divide all the transitions and store them in three different matrices T1, T2, T3, as shown in Fig. 3:
    • the transitions that lead to the cluster receiving the most transitions are put into matrix T1;
    • the transitions that lead to the cluster receiving the second most transitions are put into matrix T2;
    • the remaining transitions are put into matrix T3.
  • For example, all the transitions of state 1 lead to state 2 or 3, both of which belong to cluster {2, 3}; thus they are all put into T1, and T2 and T3 hold no transitions for state 1.
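
The division can be sketched for a single state as follows. This is a minimal sketch under stated assumptions: `split_row`, the `EMPTY` marker, and the sample data are made up for illustration, and the slide does not say how ties between equally popular clusters are broken.

```python
from collections import Counter

EMPTY = None  # stands for "no transition stored in this cell"

def split_row(row, cluster_of):
    """Split one state's transitions into its T1, T2 and T3 rows.

    row: list of next states, one entry per input character.
    cluster_of: dict mapping a state to its (integer) cluster id.
    """
    per_cluster = Counter(cluster_of[s] for s in row)
    ranked = [c for c, _ in per_cluster.most_common(2)]
    first = ranked[0]
    second = ranked[1] if len(ranked) > 1 else None
    t1 = [s if cluster_of[s] == first else EMPTY for s in row]
    t2 = [s if second is not None and cluster_of[s] == second else EMPTY
          for s in row]
    t3 = [s if cluster_of[s] not in ranked else EMPTY for s in row]
    return t1, t2, t3

cluster_of = {1: 0, 2: 1, 3: 1, 4: 2}   # hypothetical cluster ids

# Like state 1 on the slide: every transition goes to cluster {2, 3}.
only_t1 = split_row([2, 3, 2, 3], cluster_of)

# A row that touches three clusters.
mixed = split_row([2, 3, 2, 3, 1, 1, 2, 4], cluster_of)
```

In the first call everything lands in T1 and the T2 and T3 rows stay empty, which matches the slide's example for state 1.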

CLUSTER-BASED SPLITTING COMPRESSION ALGORITHM
  • Cluster-based Splitting Compression Algorithm (CSCA)
  • A. Compression Process
    • 1) Rearranging and Dividing:
    • 2) Matrix Compressing:
  • B. Lookup
  • C. Complexity Analysis

Compression Process: Rearranging and Dividing
  • We need to rearrange the DFA level by level, following the order of the 2^8 characters. After this step, the trie-tree of the DFA has the following beneficial properties:
    • all son states of one state form a continuous sequence;
    • thus the state numbers in the same cluster are continuous;
    • the difference between the maximum and minimum state numbers in a cluster is less than 256.
  • Then divide the DFA matrix.
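
One way to read the rearranging step is as a renumbering of states in breadth-first discovery order, so that the sons of each state receive consecutive numbers. The sketch below follows that reading; `renumber_by_level`, `apply_renumbering`, and the scrambled toy DFA are assumptions for illustration rather than the paper's procedure.

```python
from collections import deque

def renumber_by_level(dfa, start=0):
    """Return {old_state: new_state}, numbering states in BFS discovery order."""
    new_id = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for label in sorted(dfa[state]):        # character order
            nxt = dfa[state][label]
            if nxt not in new_id:
                new_id[nxt] = len(new_id)       # next free number
                queue.append(nxt)
    return new_id

def apply_renumbering(dfa, new_id):
    """Rewrite the DFA using the new state numbers."""
    return {new_id[s]: {c: new_id[t] for c, t in row.items()}
            for s, row in dfa.items()}

# Hypothetical DFA with scrambled state names.
scrambled = {
    0: {"a": 7, "b": 0},
    7: {"a": 3, "b": 9},
    3: {"a": 3, "b": 5},
    9: {"a": 7, "b": 0},
    5: {"a": 7, "b": 0},
}
new_id = renumber_by_level(scrambled)
rearranged = apply_renumbering(scrambled, new_id)
```

Because a state has at most 2^8 outgoing transitions, a cluster has at most 256 members, and consecutive numbering keeps the gap between its largest and smallest state number below 256.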

Compression Process: Matrix Compressing
  • Matrices T1 and T2 have the same structure:
    • the transitions of each state lead into the same cluster, so we use the same algorithm to compress both.
  • A different algorithm is used to compress T3.
    • Matrix T3 is a sparse matrix; the paper does not discuss how to compress it but uses a classical sparse matrix compression algorithm.
  • Combinative row.

Compression Process: Matrix Compressing
  • If rows r and s are combinative rows, we process them as follows:
  • (1) for each character c in the alphabet, if M[s,c] = 'X', reset M[s,c] = M[r,c]; if M[s,c] = M[r,c] or M[r,c] = 'X', keep M[s,c] unchanged;
  • (2) add a new index array equal, and set equal[r] = s, meaning that row r is now equivalent to row s, so the values of row r can be read from row s;
  • (3) delete row r from matrix M.
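
The three steps can be sketched as a greedy merge. The transcript does not preserve the definition of combinative rows, so the test used here (two rows agree wherever both cells are used, with 'X' marking an unused cell) is an assumption inferred from step (1). The names `combinative` and `merge_rows` are hypothetical, and `equal` here maps every original row index to the index of the row kept for it, which is a slight variation on the slide's equal[r] = s.

```python
X = "X"  # marker for an unused cell

def combinative(row_r, row_s):
    """Assumed test: the rows agree wherever both cells are used."""
    return all(a == b or a == X or b == X for a, b in zip(row_r, row_s))

def merge_rows(matrix):
    """Greedily merge combinative rows.

    Returns (kept, equal): kept is the list of surviving rows and
    equal[i] is the index in kept that now serves original row i.
    """
    kept, equal = [], []
    for row in matrix:
        for idx, target in enumerate(kept):
            if combinative(row, target):
                # Step (1): fill the unused cells of the kept row.
                kept[idx] = [b if b != X else a for a, b in zip(row, target)]
                equal.append(idx)       # step (2): record the redirection
                break                   # step (3): the row itself is dropped
        else:
            kept.append(list(row))
            equal.append(len(kept) - 1)
    return kept, equal

matrix = [
    [1, X, 2],
    [1, 0, X],
    [3, X, X],
    [X, 0, 2],
]
kept, equal = merge_rows(matrix)
```

Four rows shrink to two here; every original cell that was in use can still be read from `kept[equal[i]]`.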

Compression Process: Matrix Compressing
  • The main idea of compressing matrices T1 and T2 is:
  • convert the matrix into an offset matrix, which can generate many combinative rows, and then merge them in order to reduce memory usage.
  • As mentioned above, after rearranging the DFA level by level, the state numbers in the same cluster form a continuous sequence.
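
A small sketch of the offset conversion, assuming each row's base is the smallest state number of the cluster that row points into. The function name `to_offsets` and the sample rows are hypothetical; the slide does not spell out how the base is chosen.

```python
X = "X"  # marker for an unused cell

def to_offsets(matrix, cluster_base):
    """Replace each used cell by its distance from the row's base.

    cluster_base[i] is assumed to be the smallest state number of the
    cluster that row i points into.
    """
    return [
        [cell if cell == X else cell - cluster_base[i] for cell in row]
        for i, row in enumerate(matrix)
    ]

# Two rows that point into different clusters, {2, 3} and {8, 9}.
t1 = [
    [2, 3, 2, X],
    [8, 9, 8, 9],
]
base1 = [2, 8]
r1 = to_offsets(t1, base1)
```

The raw rows share no values, but their offset rows agree in every used cell, so they become candidates for merging. The original next state is recovered as base plus offset, and an offset below 256 fits in one byte.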

Lookup
  • The lookup function first needs to decide which part (T1, T2, or T3) the next state is in.
  • Matrix T3 is compressed by a sparse matrix compression algorithm, which can determine by itself whether an element in T3 is effective or not; thus we can save one bitmap by accessing T3 before T2, as shown in Algorithm 2.
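
Algorithm 2 is not reproduced in this transcript, so the following is only a guess at the lookup order it describes: one bitmap decides whether a transition lives in T1, otherwise T3 is probed, and T2 is the fallback. All names (`next_state`, `in_t1`, `part1`, `part2`) and the tiny tables are hypothetical, and a Python dict stands in for the sparse structure that holds T3.

```python
def next_state(state, char, in_t1, t3, part1, part2):
    """Look up the next state from the split, compressed tables.

    in_t1[state][char] is a bitmap telling whether the cell lives in T1.
    t3 is a dict {(state, char): next_state}, standing in for a sparse matrix.
    part1 and part2 are (rows, base, equal) triples for T1 and T2.
    """
    if in_t1[state][char]:
        rows, base, equal = part1
        return base[state] + rows[equal[state]][char]
    hit = t3.get((state, char))     # the sparse structure knows what it stores
    if hit is not None:
        return hit
    rows, base, equal = part2       # not in T1 or T3, so it must be in T2
    return base[state] + rows[equal[state]][char]

# Two states, three input characters (0, 1, 2).
# State 0 goes to 2, 3, 5 and state 1 goes to 2, 7, 3.
in_t1 = [[True, True, False], [True, False, True]]
part1 = ([[0, 1, 1]], [2, 2], [0, 0])    # merged offset row, bases, equal
part2 = ([[0, 0, 0]], [5, 0], [0, 0])
t3 = {(1, 1): 7}

results = [[next_state(s, c, in_t1, t3, part1, part2) for c in range(3)]
           for s in range(2)]
```

Only one bitmap is consulted; the choice between T2 and T3 is settled by whether the sparse probe finds an entry.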

Complexity Analysis
  • SCR (Spatial Compression Ratio) is used to evaluate the compression effect.
  • R1 and R2 are offset matrices, so each element can be stored in one byte.
  • base1 (base2) is an int-type array with n rows.
  • The array equal1 (equal2) has n rows; each of its elements can be stored in log2 n1 (log2 n2) bits.
  • For T3, we only measure the ratio of effective elements, denoted by r.

EXPERIMENT RESULTS

ORTHOGONAL TO PREVIOUS SCHEMES
