Polygraph: Automatically Generating Signatures for Polymorphic Worms

Authors: James Newsome (CMU), Brad Karp (Intel Research), Dawn Song (CMU). Presenter: Abhishek Karnik.

Presentation Transcript


  1. Polygraph: Automatically Generating Signatures for Polymorphic Worms • Authors: James Newsome (CMU), Brad Karp (Intel Research), Dawn Song (CMU) • Presenter: Abhishek Karnik

  2. Background • IDSes block Internet worm flows using signatures derived from a worm's payload, with strings matched on: • Fixed payload offsets • Arbitrary payload offsets • Regular expressions • Signatures are generated manually by experts after hours or days of observation • Recently, researchers have turned their attention to automating this slow process [Honeycomb, Autograph, EarlyBird] • Automated signatures are produced by extracting common byte patterns across different suspicious flows

  3. Previous Automated Methods • Signatures are based on a single contiguous substring of sufficient length from a worm's payload • Assumptions: • There exists a single payload substring that remains invariant across all connections of a given worm • The invariant string is long enough to be specific to the worm and does not occur in any non-worm payloads

  4. Motivation • Future worms may be polymorphic and may thus evade signatures based on single substrings • Polymorphic obfuscators are available that can leave nearly no multi-byte regions in common across their outputs

  5. Goal of Polygraph • Present algorithms and methods that automatically generate signatures suited to matching polymorphic worm payloads • Evaluate these algorithms to demonstrate that Polygraph produces signatures with low false negative and false positive rates

  6. Assumptions • A worm must exploit one or more specific server software vulnerabilities • Real-world exploits contain multiple disjoint invariant substrings in all variant payloads • Invariant bytes include protocol framing bytes, which allow the server to branch down the code path where a vulnerability exists, and possibly the value that overwrites a jump address

  7. Approach – Exploits • Within a worm there are three classes of bytes: • Invariant bytes • Wildcard bytes • Code bytes • Over 15 software vulnerabilities spanning various operating systems and applications were surveyed; nearly all require invariant content in any exploit • Two sources of invariant content: • Invariant exploit framing • Invariant overwrite values

  8. Approach - Examples • Apache-Knacker exploit

  9. Approach - Examples • Lion Exploit

  10. Architecture • The flow classifier reassembles flows and sorts those with the same IP address and port number into innocuous and suspicious flows

  11. Architecture • Anomalous or suspicious traffic is identified using honeypots or port-scan activity • Assumptions for the flow classifier: • Classification may introduce noise • The flow classifier does not distinguish between different worms, so the suspicious pool may contain a mixture of worms, which may or may not be polymorphic

  12. Signature Generator Goals • Signature quality: low false positives on innocuous traffic and low false negatives on worm instances • Efficient signature generation • Efficient signature matching • Small signature sets (a small number of signatures) • Robustness against noise and multiple worms • Robustness against evasion and subversion

  13. Signature Algorithms • All signatures are built from substrings called tokens • Each signature is made of one or more tokens • The following algorithms extract and analyze tokens, which are then used to create signatures • Token extraction eliminates irrelevant parts of suspicious flows • Preprocessing: • Extract distinct substrings of minimum length α that occur in at least K of the n samples in the suspicious pool (using longest-substring algorithms) • Represent each suspicious flow as a sequence of tokens, and remove the rest of the payload
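
The preprocessing step above can be sketched in Python. This is a brute-force illustration rather than the suffix-tree style approach a real system would need for speed; the function names `extract_tokens` and `tokenize` are hypothetical, and the rule for discarding a token that is contained in a longer one (drop it when both cover exactly the same samples) is a simplifying assumption.

```python
from collections import defaultdict


def extract_tokens(samples, min_len=2, k=3):
    """Return distinct substrings of length >= min_len found in >= k samples.

    Simplifying assumption: a substring is dropped when a longer frequent
    substring contains it and occurs in exactly the same samples.
    """
    coverage = defaultdict(set)  # substring -> indices of samples containing it
    for idx, payload in enumerate(samples):
        for start in range(len(payload)):
            for end in range(start + min_len, len(payload) + 1):
                coverage[payload[start:end]].add(idx)

    frequent = {s: c for s, c in coverage.items() if len(c) >= k}
    tokens = [
        s for s in frequent
        if not any(s != t and s in t and frequent[s] == frequent[t]
                   for t in frequent)
    ]
    return sorted(tokens, key=len, reverse=True)


def tokenize(payload, tokens):
    """Reduce a payload to the ordered list of tokens it contains."""
    ordered = sorted(tokens, key=len, reverse=True)  # prefer longer tokens
    found, pos = [], 0
    while pos < len(payload):
        match = next((t for t in ordered if payload.startswith(t, pos)), None)
        if match is None:
            pos += 1  # byte is not part of any token: discard it
        else:
            found.append(match)
            pos += len(match)
    return found
```

Calling `extract_tokens` on a few payloads that share only the substrings `one` and `two` yields those two tokens, and `tokenize` then reduces each payload to the sequence of tokens it contains.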

  14. Signature Algorithms • Conjunction signatures • A signature that matches when all tokens in its set are found, in any order • Matching multiple invariant tokens is more specific than matching a single token • The signature is the set of tokens • Token-subsequence signatures • A signature that consists of an ordered set of tokens • Can be expressed as a regular expression • A signature is generated from the ordered subsequence of tokens that is present in every sample in the suspicious pool
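
A minimal sketch of how the two signature classes could be matched against a payload, assuming tokens are byte strings. The helper names are hypothetical, and the tokens in the usage note are illustrative rather than taken from a real exploit.

```python
import re


def matches_conjunction(payload, tokens):
    """True if every token occurs somewhere in the payload, in any order."""
    return all(tok in payload for tok in tokens)


def subsequence_regex(tokens):
    """Compile an ordered token list into a regex of the form .*t1.*t2.*"""
    body = b".*".join(re.escape(tok) for tok in tokens)
    # DOTALL lets ".*" span newline bytes, which are common in payloads.
    return re.compile(b".*" + body + b".*", re.DOTALL)


def matches_subsequence(payload, tokens):
    """True if the tokens occur in the payload in the given order."""
    return subsequence_regex(tokens).match(payload) is not None
```

With tokens `[b"one", b"two"]`, the payload `b"twoxxone"` satisfies the conjunction signature but not the token-subsequence signature, because the tokens appear in the wrong order. This is why the ordered form is the more specific of the two.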

  15. Signature Algorithms • String 1: xxonexxxtwox • String 2: oneyyyyytwoyyy • The longest common subsequence is onetwo • String alignment ("-" marks a gap):
      x x o n e x x x - - t w o x - -
      - - o n e y y y y y t w o y y y
      • Resulting regular expression: “.*one.*two.*” • An alignment is scored by adding 1 for each matched character and subtracting a gap penalty Wg for each gap between matched characters • With Wg = 0.8, “.*o.*n.*e.*z.*” has a value of 4 - 3 × 0.8 = 1.6 • “.*two.*” has a value of 3 - 0 × 0.8 = 3
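
The scoring arithmetic on this slide can be reproduced with a short sketch. It assumes a signature is given as its ordered list of tokens and that each ".*" between two consecutive tokens counts as one gap. The function names are hypothetical, and the plain dynamic-programming subsequence search stands in for the gap-penalised alignment, which would maximise the score rather than the length.

```python
def longest_common_subsequence(a, b):
    """Classic dynamic-programming LCS; returns one longest common subsequence."""
    dp = [[""] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + ca
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return dp[len(a)][len(b)]


def alignment_score(tokens, gap_penalty=0.8):
    """+1 per matched character, minus gap_penalty per gap between tokens."""
    matched = sum(len(tok) for tok in tokens)
    gaps = max(len(tokens) - 1, 0)
    return matched - gap_penalty * gaps
```

Under these assumptions `alignment_score(["o", "n", "e", "z"])` is about 1.6 and `alignment_score(["two"])` is 3.0, so the penalty steers the generator toward the contiguous token even though the scattered match covers more characters.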

  16. Signature Algorithms • Bayes signatures • A probabilistic matching method • A signature consists of a set of tokens, each associated with a score, and an overall threshold • Matching and construction are less rigid than with the conjunction and token-subsequence methods • Allows signatures to be learned from suspicious pools that contain samples of unrelated worms and innocuous flows • Classify a flow by the distribution from which its token set is more likely to have been generated

  17. Signature Algorithms • Classification is based on the ratio Pr[worm|x] / Pr[~worm|x], where x is the flow's token set • Set a threshold so that the classifier reports a positive only if the sample is sufficiently far from the decision boundary, which helps in handling noise • Each token is assigned a score based on its probability of being from a certain pool • Scores are added together, and if the total is greater than the threshold the sample is classified as a worm
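
A minimal naive Bayes sketch of the scoring and thresholding just described. Several details are assumptions made for illustration and are not stated on the slides: tokens are treated as independent, each score is a log-likelihood ratio with add-one smoothing, and the threshold is set just above the highest score of any innocuous training flow. All function names are hypothetical.

```python
import math


def learn_bayes_signature(suspicious, innocuous, tokens, smoothing=1.0):
    """Score each token by log(Pr[token | worm] / Pr[token | not worm])."""
    scores = {}
    for tok in tokens:
        p_worm = ((sum(tok in s for s in suspicious) + smoothing)
                  / (len(suspicious) + 2 * smoothing))
        p_benign = ((sum(tok in s for s in innocuous) + smoothing)
                    / (len(innocuous) + 2 * smoothing))
        scores[tok] = math.log(p_worm / p_benign)
    return scores


def bayes_score(payload, scores):
    """Sum the scores of all tokens present in the payload."""
    return sum(score for tok, score in scores.items() if tok in payload)


def choose_threshold(scores, innocuous, margin=1e-9):
    """Assumed policy: no innocuous training flow may exceed the threshold."""
    worst = max((bayes_score(p, scores) for p in innocuous), default=0.0)
    return max(worst, 0.0) + margin


def is_worm(payload, scores, threshold):
    return bayes_score(payload, scores) > threshold
```

A flow containing only one of two tokens can stay below the threshold while a flow containing both exceeds it, which illustrates how additive scores tolerate tokens that occasionally appear in innocuous traffic.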

  18. Generating Multiple Signatures • Suspicious flows could contain more than one type of worm • The suspicious pool is divided into clusters, each containing similar flows • One signature is output per cluster • Quality of clusters: • Clusters should not be too general • Clusters should not be too specific

  19. Hierarchical Clustering • Used for the token-subsequence and conjunction algorithms • Given s clusters initially, s signatures are generated • Iteratively merge clusters, each merge producing a more sensitive signature • Determine what the merged signature would be and use innocuous flows to estimate its false positive rate • The lower the false positive rate, the more specific the signature and the more similar the two clusters • Stop clustering when merging any two clusters would give a high false positive rate, or when there is only one cluster
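
The merge loop above can be sketched for conjunction signatures. Assumptions made for illustration: each suspicious flow has already been reduced to a set of tokens, the signature of a merged cluster is the intersection of the two token sets, and `max_fp` is a hypothetical cut-off for what counts as a high false positive rate.

```python
from itertools import combinations


def false_positive_rate(signature, innocuous):
    """Fraction of innocuous flows that contain every token of the signature."""
    if not innocuous:
        return 0.0
    hits = sum(all(tok in flow for tok in signature) for flow in innocuous)
    return hits / len(innocuous)


def cluster_signatures(sample_tokens, innocuous, max_fp=0.01, min_cluster_size=1):
    """Greedily merge the pair of clusters whose merged signature is most specific."""
    # Each cluster is (signature, number of samples); start with one per sample.
    clusters = [(frozenset(toks), 1) for toks in sample_tokens]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i][0] & clusters[j][0]
            fp = false_positive_rate(merged, innocuous)
            if best is None or fp < best[0]:
                best = (fp, i, j, merged)
        fp, i, j, merged = best
        if fp > max_fp:
            break  # every remaining merge would be too general
        size = clusters[i][1] + clusters[j][1]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append((merged, size))
    return [set(sig) for sig, size in clusters if size >= min_cluster_size]
```

An empty merged signature matches every innocuous flow, so two unrelated worms are never merged as long as `max_fp` is below 1. The `min_cluster_size` filter discards signatures supported by too few samples.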

  20. Experiments • K = 3 • α = 2 • Minimum cluster size = 3 • Network traces: Intel Research Pittsburgh in October 2004 • DNS traces from a major academic institution • Intel Pentium III running Linux 2.4.20

  21. Results – Apache-Knacker

  22. Results – BIND Lion Exploit

  23. Polymorphic with Noise

  24. Polymorphic with Noise

  25. Conclusions • Polygraph works for polymorphic worms • Content variability is limited by nature of the software vulnerability • Use multiple, disjoint strings that are invariant across copies of a worm • Accurate signatures can be automatically generated for polymorphic worms • Demonstrated low false positives with real exploits, on real traffic traces

  26. Strengths • A new concept in the area of Intrusion Detection which must be explored further • Well written paper covering almost all possible aspects and providing 3 algorithms

  27. Weaknesses • Vulnerable to overtraining attacks • Long-tail attacks

  28. Potential Extensions • Applying Polygraph to a distributed IDS • Adapting to IPv6
