
Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection. Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz. Publisher: ANCS'06, December 3–5, 2006. Presenter: Yu-Tso Chen. Date: November 6, 2007.


Presentation Transcript


  1. Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection • Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz • Publisher: ANCS'06, December 3–5, 2006 • Presenter: Yu-Tso Chen • Date: November 6, 2007 • Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

  2. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  3. Introduction • Regular expressions for packet inspection have three unique complex features • 1) Large numbers of wildcards can cause the DFA to grow exponentially • 2) Wildcards used with length restrictions (‘?’, ‘+’) increase the resource requirements • 3) Groups of characters are also commonly used, and their interaction can result in a highly complex state machine (e.g. “^220[\x09-]*ftp”)

  4. Introduction (cont.) • The paper makes the following contributions • 1) Analyzes the computational and storage cost of building individual DFAs • 2) Proposes two rewrite rules for specific regular expressions • 3) Combines multiple DFAs into a small number of groups

  5. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  6. Regular Expression Patterns • Compares the regular expressions used in two kinds of networking applications: packet scanning (Snort, Linux L7-filter) and XML filtering • 1) Both types of applications use wildcards (‘.’, ‘?’, ‘+’, ‘*’), but packet-scanning patterns contain larger numbers of them • 2) Classes of characters (“[ ]”) are used only in packet-scanning applications • 3) A high percentage of packet-scanning patterns have length restrictions on some of the classes or wildcards

  7. Regular Expression Patterns

  8. Solution Space for Regular Expression Matching • A single regular expression of length n can be expressed as an NFA with O(n) states • When the NFA is converted into a DFA, it may generate O(Σ^n) states • The processing complexity for each input character is O(1) in a DFA, but O(n^2) for an NFA when all n states are active at the same time

  9. Solution Space for Regular Expression Matching (cont.) • To handle m regular expressions, two choices are possible: • Processing them individually in m automata • Compiling them into a single automaton

  10. Problem Statement • This paper uses DFA-based approaches • Our goal is to achieve O(1) computation cost per input character • The focus of the study is to reduce the memory overhead of DFAs • There are two sources of memory usage in DFAs: states and transitions • We consider the number of states the primary factor

  11. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  12. Design Considerations • Defining completeness of matching results: • Exhaustive Matching: M(P,S) = {substring S’ of S | S’ is accepted by the DFA of P} • It is expensive and often unnecessary to report all matching substrings • We propose a new concept, Non-overlapping Matching, that relaxes the requirements of exhaustive matching • Non-overlapping Matching: • Ex: for the pattern ab* and the input abbb, non-overlapping matching reports one match instead of three • Exhaustive matching reports ab, abb and abbb
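
The difference between the two result sets on slide 12 can be sketched with Python's `re` module. This is only an illustration under assumptions: the helper names are hypothetical, the exhaustive version is a brute-force enumeration rather than a DFA, and `re.finditer` is used as a stand-in for non-overlapping matching, so the substring it reports may differ from the one the authors' DFA would report. Note that the brute-force enumeration also counts the bare `a`, since `b*` can match zero characters.

```python
import re

def exhaustive_matches(pattern, text):
    """Brute force: every substring of text that the whole pattern accepts."""
    compiled = re.compile(pattern)
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if compiled.fullmatch(text[i:j])
    ]

def non_overlapping_matches(pattern, text):
    """Stand-in for non-overlapping matching: finditer never reports
    two matches that share a character."""
    return [m.group() for m in re.finditer(pattern, text) if m.group()]

print(exhaustive_matches("ab*", "abbb"))
print(non_overlapping_matches("ab*", "abbb"))
```

The exhaustive list grows with every extra `b` in the input, while the non-overlapping list stays at one entry, which is the saving the slide points at.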

  13. Design Considerations (cont.) • Defining the DFA execution model for substring matching: we focus on patterns without ‘^’ attached at the beginning • Repeated searches • One-pass search – this approach can truly achieve O(1) computation cost per character
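
The two execution models on slide 13 can be contrasted with a minimal sketch. Everything here is an assumption made for illustration: the pattern `ab`, the two hand-written transition tables, and the function names do not come from the slides. Repeated search restarts an anchored DFA at every input offset; one-pass search runs a single DFA that behaves as if `.*` were prepended to the pattern, so each input character costs exactly one table lookup.

```python
def repeated_search(text):
    """Restart the anchored DFA for ^ab at every offset of the input."""
    anchored = {(0, "a"): 1, (1, "b"): 2}  # a missing entry is the dead state
    ends, steps = [], 0
    for start in range(len(text)):
        state = 0
        for pos in range(start, len(text)):
            steps += 1
            state = anchored.get((state, text[pos]))
            if state is None:          # dead state: give up on this offset
                break
            if state == 2:             # accepting state: record where the match ends
                ends.append(pos + 1)
                break
    return ends, steps

def one_pass_search(text):
    """Run one DFA for .*ab over the input, one lookup per character."""
    unanchored = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}
    ends, state = [], 0
    for pos, ch in enumerate(text):
        state = unanchored.get((state, ch), 0)  # a missing entry falls back to state 0
        if state == 2:
            ends.append(pos + 1)
    return ends, len(text)

print(repeated_search("aabab"))
print(one_pass_search("aabab"))
```

Both functions report the same match end positions; only the number of transitions taken differs, and the gap widens as the pattern and the input get longer.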

  14. DFA Analysis for Individual Regular Expressions • The study is based on the use of exhaustive matching & one-pass search

  15. Case 4: DFA of Quadratic Size • The DFA needs to remember the number of Bs it has seen and their locations

  16. Case 5: DFA of Exponential Size • An exponential number of states (2^(2+1)) is needed to represent these two wildcard characters • The state for AAB is different from the state for ABA because a subsequent input BCD completes a match after AAB (AABBCD) but not after ABA (ABABCD)
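
The state count on slide 16 can be checked with a small subset construction. The slide does not show the pattern itself; the sketch below assumes it is `.*A.{2}CD` (an `A`, two wildcard characters, then `CD`), generalised to `k` wildcards. The NFA numbering, the four-letter alphabet and the function name are hypothetical choices made for this example.

```python
import re

def count_dfa_states(k, alphabet="ABCD"):
    """Count reachable DFA states for .*A.{k}CD via subset construction.

    NFA states: 0 loops on every character, 1 = just saw A,
    2..k+1 = wildcards consumed, k+2 = saw C, k+3 = saw D (accepting).
    """
    def step(states, ch):
        nxt = {0}                              # the leading .* keeps state 0 alive
        for s in states:
            if s == 0 and ch == "A":
                nxt.add(1)
            elif 1 <= s <= k:                  # a wildcard accepts any character
                nxt.add(s + 1)
            elif s == k + 1 and ch == "C":
                nxt.add(k + 2)
            elif s == k + 2 and ch == "D":
                nxt.add(k + 3)
        return frozenset(nxt)

    start = frozenset({0})
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for ch in alphabet:
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)

for k in (1, 2, 3, 4):
    print(k, count_dfa_states(k))

# The AAB / ABA distinction from the slide, checked with Python's re engine.
print(bool(re.search(r"A.{2}CD", "AABBCD")))  # True
print(bool(re.search(r"A.{2}CD", "ABABCD")))  # False
```

Each DFA state has to record which of the last k+1 characters were an `A`, which is where the 2^(k+1) lower bound comes from.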

  17. Regular Expression Rewrites • Rewrite Rule (1) • “^SEARCH\s+[^\n]{1024}” to “^SEARCH\s[^\n]{1024}” • “^A+[A-Z]{j}” to “^A[A-Z]{j}” • It can be shown that any input that matches “^A+[A-Z]{j}” also matches “^A[A-Z]{j}”
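
The claim behind Rewrite Rule (1) can be spot-checked by brute force for small j. This is a sanity check under stated assumptions, not the proof the slide refers to: the three-character alphabet, the length bound and the function name are arbitrary choices for the example, and only the match/no-match verdict is compared, not the matched substring.

```python
import itertools
import re

def same_verdict(j, alphabet="AB1", max_len=6):
    """Check that ^A+[A-Z]{j} and ^A[A-Z]{j} accept exactly the same inputs
    among all strings over `alphabet` up to `max_len` characters."""
    original = re.compile(r"^A+[A-Z]{%d}" % j)
    rewritten = re.compile(r"^A[A-Z]{%d}" % j)
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = "".join(chars)
            if bool(original.search(text)) != bool(rewritten.search(text)):
                return False
    return True

print(same_verdict(2), same_verdict(3))
```

The verdicts agree because every extra `A` consumed by `A+` is itself in `[A-Z]`, so the first `A` followed by the next j characters already satisfies the rewritten pattern.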

  18. Regular Expression Rewrites (cont.) • Rewrite Rule (2) • We don’t need to keep track of the second AUTH\s • If there is a ‘\n’ within the next 100 bytes, that ‘\n’ must also be within 100 bytes of the second AUTH\s • If there is no ‘\n’ within the next 100 bytes, the first AUTH\s has already matched the pattern • Rewritten pattern: “([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}”
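
The rewritten pattern on slide 18 can be tried on a few inputs. The slide does not show the pattern being rewritten; the sketch assumes it is the unanchored `AUTH\s[^\n]{100}`. The length is shrunk to a parameter `n` (5 here) to keep the inputs readable, and the function names are hypothetical. The inputs are hand-picked, so this illustrates the rule rather than proving the two patterns equivalent.

```python
import re

def auth_patterns(n):
    """Build the assumed original pattern and the slide's rewrite for length n."""
    original = re.compile(r"AUTH\s[^\n]{%d}" % n)
    rewritten = re.compile(
        r"([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,%d}\n)*"
        r"AUTH\s[^\n]{%d}" % (n - 1, n)
    )
    return original, rewritten

def verdicts(text, n=5):
    """Return (original matches anywhere, rewrite matches from the start)."""
    original, rewritten = auth_patterns(n)
    return bool(original.search(text)), bool(rewritten.match(text))

samples = [
    "AUTH xxxxx",              # long enough argument right away
    "AUTH abc\nAUTH xxxxx",    # first AUTH cut short by a newline, second one matches
    "AUTH abc\n",              # newline arrives before n characters
    "zzAUTH xxxxx",            # leading bytes that are not part of AUTH
]
for text in samples:
    print(repr(text), verdicts(text))
```

The last alternative in the prefix group is the one that skips over an `AUTH\s` whose argument is ended by a newline too early, which is the case the slide's second bullet describes.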

  19. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  20. Selective Grouping of Multiple Patterns • The composite DFA may experience exponential growth in size, although none of the individual DFAs has an exponential component

  21. Regular Expressions Grouping Algorithm • Definition of interaction: two patterns interact with each other if their composite DFA contains more states than the sum of the two individual ones
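
The interaction test from slide 21 reduces to a comparison of state counts, and it can drive a simple grouping heuristic. The sketch below is not the algorithm from the presentation, whose details did not survive in this transcript; it is a first-fit heuristic, assumed here for illustration, that never places two interacting patterns in the same group. The function names and the pattern labels `p1`..`p3` are hypothetical, and the state counts would have to come from a real DFA builder.

```python
def interacts(states_a, states_b, states_composite):
    """Slide 21's definition: the composite DFA is larger than the two
    individual DFAs put together."""
    return states_composite > states_a + states_b

def first_fit_groups(patterns, interacting_pairs):
    """Place each pattern in the first group where it interacts with nobody.

    interacting_pairs is a set of frozenset({p, q}) for every pair that
    was found to interact.
    """
    groups = []
    for pattern in patterns:
        for group in groups:
            if all(frozenset({pattern, member}) not in interacting_pairs
                   for member in group):
                group.append(pattern)
                break
        else:
            groups.append([pattern])   # no compatible group: open a new one
    return groups

# Hypothetical state counts: p1 and p2 blow up when compiled together.
pairs = set()
if interacts(states_a=10, states_b=12, states_composite=80):
    pairs.add(frozenset({"p1", "p2"}))
print(first_fit_groups(["p1", "p2", "p3"], pairs))
```

Each resulting group would then be compiled into one composite DFA, so fewer groups mean fewer automata to run per packet.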

  22. Grouping Algorithm

  23. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  24. Evaluation Result • Effect of Rule Rewriting

  25. Evaluation Result (cont.)

  26. Outline • 1. Introduction • 2. Definitions and problem description • 3. Matching of Individual Patterns • 4. Selective Grouping of Multiple Patterns • 5. Evaluation Result • 6. Conclusion

  27. Conclusion • Rewriting techniques – make memory-efficient DFA-based approaches possible • Selective grouping of patterns – speeds up the matching process
