An Efficient Regular Expressions Compression Algorithm From A New Perspective

An Efficient Regular Expressions Compression Algorithm From A New Perspective. Authors: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo. Publisher: INFOCOM, 2011 Proceedings IEEE. Presenter: 楊皓中. Date: 2014/05/28. Department of Computer Science and Information Engineering


An Efficient Regular Expressions Compression Algorithm From A New Perspective

Authors: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo

Publisher: INFOCOM, 2011 Proceedings IEEE

Presenter: 楊皓中

Date: 2014/05/28

Department of Computer Science and Information Engineering

National Cheng Kung University, Taiwan R.O.C.

Introduction
  • The paper focuses on reducing the memory usage of composite DFAs by compressing transitions.
  • Previous works
    • are based on the observation that many states share common outgoing transitions for the same character.
  • In this paper
    • the authors try to reduce memory by exploiting the redundancy of transitions among states and the distribution of transitions inside each state.

National Cheng Kung University CSIE Computer & Internet Architecture Lab

Definition of cluster
  • Regular expression: .*A.{2}CD
  • A DFA is a digraph, so traversing it level by level (breadth-first traversal) yields a unique, determinate trie-tree, provided we stipulate that the son states are visited in increasing order of their labels.

Definition of cluster
  • In the trie-tree
    • if state r has a transition to state s, we call r the father state of s and, conversely, s a son state of r. A set of states is called a cluster if it consists of all the son states of a certain state.
  • The son state set of state 0 is {1}, so {1} is a cluster.
  • By the same token, the state sets {2, 3}, {4, 5}, {6, 7} and {8} through {13} are clusters too.
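
The cluster definition above can be sketched in Python. This is a minimal illustration, not code from the paper: the function name bfs_clusters, the dictionary layout of delta and the five-state toy DFA are assumptions made for the example, and a "son state" is taken to mean a state first reached from its father during the breadth-first traversal.

```python
from collections import deque

def bfs_clusters(delta, start=0):
    """Return {father: [son states]} for the BFS trie-tree of a DFA.

    delta[r][label] is the next state of r on label (assumed layout).
    """
    father_of = {start: None}   # states already placed in the trie-tree
    clusters = {}
    queue = deque([start])
    while queue:
        r = queue.popleft()
        # Visit son states by label from small to large, as stipulated.
        for label in sorted(delta[r]):
            s = delta[r][label]
            if s not in father_of:          # first discovery: r is the father
                father_of[s] = r
                clusters.setdefault(r, []).append(s)
                queue.append(s)
    return clusters

# Hypothetical toy DFA over the alphabet {a, b}; not the DFA of .*A.{2}CD.
toy_delta = {
    0: {"a": 1, "b": 2},
    1: {"a": 3, "b": 2},
    2: {"a": 1, "b": 4},
    3: {"a": 3, "b": 3},
    4: {"a": 4, "b": 4},
}
toy_clusters = bfs_clusters(toy_delta)
print(toy_clusters)
```

In the toy DFA the son states of state 0 form the cluster {1, 2}, while states 1 and 2 each father a single-state cluster.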

Transition characteristics
  • For a state, its outgoing transitions lead to many distinct "next-states".
    • This observation is of no use for reducing the memory usage of DFAs, because the number of distinct "next-states" is more than 25 on average.
  • We observe that the average number of distinct clusters is usually less than 5 (Column 3), which is much smaller than the number of distinct "next-states".
  • Furthermore, the transitions of each state concentrate on the two clusters that receive the most transitions (Top-2 clusters for short).
    • According to Table II, these transitions account for more than 95% of all transitions.
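
The three per-state quantities discussed above (distinct next-states, distinct clusters, share of the Top-2 clusters) could be measured roughly as follows. The function name row_statistics, the cluster labels and the eight-character toy row are assumptions for illustration only; they are not the data behind Table II.

```python
from collections import Counter

def row_statistics(row, cluster_of):
    """Return (distinct next-states, distinct clusters, Top-2 share) for one state.

    row[c] is the next state on character c; cluster_of maps a state to
    an identifier of the cluster it belongs to (assumed representation).
    """
    counts = Counter(cluster_of[s] for s in row)
    top2 = sum(n for _, n in counts.most_common(2))
    return len(set(row)), len(counts), top2 / len(row)

# Hypothetical row over an 8-character alphabet.
toy_cluster_of = {1: "A", 2: "B", 3: "B", 8: "C"}
toy_row = [2, 3, 2, 3, 1, 1, 1, 8]
toy_stats = row_statistics(toy_row, toy_cluster_of)
print(toy_stats)
```

The start state has no father, so how it is assigned to a cluster is left to the caller here.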

Description
  • In a DFA, each state has 2^8 transitions and each transition leads to one determinate state; by the analysis above, it therefore leads to one determinate cluster.
  • We can divide all the transitions and store them in three different matrices T1, T2 and T3, as shown in Fig. 3:
    • for each state, the transitions to the cluster that receives the most transitions are put into matrix T1;
    • the transitions to the cluster that receives the second most are put into matrix T2;
    • the remaining transitions are put into matrix T3.
  • For example, all the transitions of state 1 lead to state 2 or 3, both of which belong to cluster {2, 3}; thus they are all put into T1, and T2 and T3 hold no transitions for state 1.
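
The division of one state's transitions into T1, T2 and T3 might look like the sketch below. The names split_row and X, the use of None as the empty marker, and the six-character toy alphabet are assumptions; ties between clusters are broken by first occurrence here, which the slides do not specify.

```python
from collections import Counter

X = None  # assumed marker for "no transition stored in this matrix"

def split_row(row, cluster_of):
    """Split one DFA row into its T1, T2 and T3 parts."""
    counts = Counter(cluster_of[s] for s in row)
    ranked = [c for c, _ in counts.most_common(2)]   # Top-2 clusters
    t1 = [s if cluster_of[s] == ranked[0] else X for s in row]
    t2 = [s if cluster_of[s] in ranked[1:2] else X for s in row]
    t3 = [s if cluster_of[s] not in ranked else X for s in row]
    return t1, t2, t3

toy_cluster_of = {1: "c1", 2: "c23", 3: "c23", 8: "c8"}

# A state whose transitions all go to cluster {2, 3}, like state 1 above.
only_t1 = split_row([2, 3, 2, 3, 2, 3], toy_cluster_of)

# A state whose transitions are spread over three clusters.
mixed = split_row([2, 3, 2, 1, 1, 8], toy_cluster_of)
print(only_t1)
print(mixed)
```

Every character position is filled in exactly one of the three parts, so the original row can always be rebuilt from them.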

CLUSTER-BASED SPLITTING COMPRESSION ALGORITHM
  • Cluster-based Splitting Compression Algorithm (CSCA)
  • A. Compression Process
    • 1) Rearranging and Dividing:
    • 2) Matrix Compressing:
  • B. Lookup
  • C. Complexity Analysis

Compression Process: Rearranging and Dividing
  • First, rearrange (renumber) the DFA level by level, following the order of the 2^8 characters. After this step, the trie-tree of the DFA has the following beneficial properties:
    • all son states of one state form a continuous sequence;
    • thus the state numbers in the same cluster are continuous;
    • the difference between the maximum and minimum state numbers in a cluster is less than 256.
  • Then divide the DFA matrix.
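
The rearranging step can be sketched as a breadth-first renumbering. This is an illustrative reading of the slide, not the paper's code: the function name renumber_by_level, the list-of-lists table layout and the two-character toy DFA are assumptions, and every state is assumed to be reachable from the start state.

```python
from collections import deque

def renumber_by_level(delta, start=0):
    """Renumber states in BFS order; delta[r][c] is the next state on character c.

    Returns (new transition table, mapping old state -> new state).
    """
    new_id = {start: 0}
    queue = deque([start])
    while queue:
        r = queue.popleft()
        for s in delta[r]:              # characters in increasing order
            if s not in new_id:
                new_id[s] = len(new_id) # sons of r get consecutive numbers
                queue.append(s)
    ordered = sorted(new_id, key=new_id.get)
    table = [[new_id[s] for s in delta[old]] for old in ordered]
    return table, new_id

# Hypothetical five-state DFA over a two-character alphabet.
toy_delta = [[3, 1], [4, 1], [2, 2], [0, 2], [4, 4]]
toy_table, toy_map = renumber_by_level(toy_delta)
print(toy_table)
print(toy_map)
```

Since a state has at most 2^8 son states and they receive consecutive numbers, the largest and smallest state numbers in a cluster differ by at most 255.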

Compression Process: Matrix Compressing
  • Matrices T1 and T2 have the same structure:
    • the transitions of each state lead into the same cluster, so the same algorithm is used to compress both.
  • A different algorithm is used to compress T3.
    • Matrix T3 is a sparse matrix; the paper does not discuss how to compress it but uses a classical sparse matrix compression algorithm.
  • Combinative row.

Compression Process: Matrix Compressing
  • If rows r and s are combinative rows, we process them as follows:
  • (1) for each character c in the alphabet Σ, if M[s,c] = 'X', set M[s,c] = M[r,c]; if M[s,c] = M[r,c] or M[r,c] = 'X', keep M[s,c] unchanged;
  • (2) add a new index array equal and set equal[r] = s, meaning that row r is now represented by row s, so the values of row r can be read from row s;
  • (3) delete row r from matrix M.
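
The three steps above can be sketched as follows. Several points are assumptions rather than statements from the slides: two rows are treated as combinative when, in every column, their entries are equal or at least one of them is 'X' (inferred from step (1)); rows are merged greedily into the first compatible kept row; and equal stores indices into the compacted matrix rather than original row numbers.

```python
X = None  # assumed stand-in for the 'X' (empty) entry

def combinative(row_r, row_s):
    """Assumed test: entries agree or one of them is X in every column."""
    return all(a == b or a is X or b is X for a, b in zip(row_r, row_s))

def merge_rows(M):
    """Greedy first-fit merge; returns (kept rows, equal index array)."""
    kept, equal = [], []
    for row in M:
        for k, rep in enumerate(kept):
            if combinative(row, rep):
                # Step (1): fill the X entries of the kept row from the new row.
                kept[k] = [b if b is not X else a for a, b in zip(row, rep)]
                equal.append(k)          # step (2): row is represented by kept[k]
                break                    # step (3): the row itself is dropped
        else:
            kept.append(list(row))
            equal.append(len(kept) - 1)
    return kept, equal

toy_M = [[1, X, 2], [1, 0, X], [3, 0, 2]]
toy_kept, toy_equal = merge_rows(toy_M)
print(toy_kept, toy_equal)
```

After merging, an entry that was 'X' in the original row may hold another row's value, so the merged matrix alone no longer tells whether a given entry belonged to that row.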

Compression Process: Matrix Compressing
  • The main idea of compressing matrices T1 and T2 is:
  • convert the matrix into an offset matrix, which produces many combinative rows, and then merge them to reduce memory usage.
  • As mentioned above, after rearranging the DFA level by level, the state numbers in the same cluster form a continuous sequence whose span is less than 256.
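
A minimal sketch of the conversion to an offset matrix is given below. It assumes that the base of a row is the smallest state number stored in that row and that empty entries are kept as None; the paper may define the base differently (for example as the smallest state number of the cluster), and the names to_offsets, base and R are chosen for the example.

```python
X = None  # assumed marker for an empty entry

def to_offsets(T):
    """Convert rows of absolute state numbers into (base array, offset matrix)."""
    base, R = [], []
    for row in T:
        present = [s for s in row if s is not X]
        b = min(present) if present else 0   # assumed choice of base
        base.append(b)
        R.append([X if s is X else s - b for s in row])
    return base, R

# Two hypothetical T1 rows pointing into the clusters {2, 3} and {4, 5}.
toy_T1 = [[2, 3, 2, X], [4, 5, 4, X]]
toy_base, toy_R = to_offsets(toy_T1)
print(toy_base, toy_R)
```

The two rows differ as absolute state numbers but become identical as offsets, which is why the conversion produces many combinative rows.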

Lookup
  • The lookup function first needs to decide which part (T1, T2 or T3) the next state is stored in.
  • Matrix T3 is compressed by a sparse matrix compression algorithm, which can determine by itself whether an element of T3 is effective or not; thus we can save a bitmap by accessing T3 before T2, as shown in Algorithm 2.
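
Algorithm 2 is not reproduced in this transcript, so the following is only one plausible reading of the lookup order: a bitmap says whether a transition lives in T1, T3 is probed next, and T2 is used when both fail. The Part tuple, the dictionary standing in for the compressed sparse matrix T3, and all toy values are assumptions made for illustration.

```python
from collections import namedtuple

# Assumed container for a compressed T1 or T2: base array, equal array, offset rows.
Part = namedtuple("Part", "base equal R")

def next_state(state, c, in_t1, t1, t2, t3):
    """Return the next state of `state` on character index c."""
    if in_t1[state][c]:                 # bitmap lookup for T1
        part = t1
    elif (state, c) in t3:              # the sparse structure answers for itself
        return t3[(state, c)]
    else:                               # neither T1 nor T3, so it must be T2
        part = t2
    return part.base[state] + part.R[part.equal[state]][c]

# Hypothetical two-state example over a three-character alphabet.
toy_in_t1 = [[True, True, False], [True, True, False]]
toy_t1 = Part(base=[1, 3], equal=[0, 0], R=[[0, 1, 0]])
toy_t2 = Part(base=[0, 0], equal=[0, 0], R=[[0, 0, 0]])
toy_t3 = {(0, 2): 5}
toy_table = [[next_state(q, c, toy_in_t1, toy_t1, toy_t2, toy_t3)
              for c in range(3)] for q in range(2)]
print(toy_table)
```

Because T2 is reached only by elimination, no second bitmap is needed for it in this sketch.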

Complexity Analysis
  • SCR (Spatial Compression Ratio) is used to evaluate the compression effect.
  • R1 and R2 are offset matrices, so each element can be stored in one byte.
  • base1 (base2) is an int-type array with n rows.
  • Array equal1 (equal2) has n rows, and each element can be stored in log2 n1 (log2 n2) bits.
  • For T3, only the ratio of effective elements is counted, denoted by r.
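
The SCR formula itself did not survive in this transcript. The sketch below is therefore not the paper's formula but a rough back-of-the-envelope estimate that counts only the items listed above. It assumes that n1 and n2 are the row counts of R1 and R2 after merging, that the uncompressed DFA stores one 4-byte integer per transition, and that each effective element of T3 costs a fixed number of bytes (the hypothetical parameter t3_bytes_per_elem).

```python
import math

def estimate_ratio(n, n1, n2, r, sigma=256, int_bytes=4, t3_bytes_per_elem=8):
    """Estimated compressed size divided by original size (all assumptions)."""
    original = n * sigma * int_bytes
    offsets = (n1 + n2) * sigma                   # R1 and R2, one byte per element
    bases = 2 * n * int_bytes                     # base1 and base2
    equal_bits = n * (math.ceil(math.log2(n1)) + math.ceil(math.log2(n2)))
    t3 = r * n * sigma * t3_bytes_per_elem        # effective elements of T3
    compressed = offsets + bases + equal_bits / 8 + t3
    return compressed / original

# Hypothetical sizes, not measurements from the paper.
toy_ratio = estimate_ratio(n=1024, n1=16, n2=8, r=0.0)
print(toy_ratio)
```

Any bitmap used during lookup is left out of this estimate.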

EXPERIMENT RESULTS

ORTHOGONAL TO PREVIOUS SCHEMES
