Introduction:
In recent years, the amount of malicious code (viruses, worms, Trojans, etc.) sent over the Internet has increased sharply. As a result, malware is renewed and improved much faster than the anti-virus software sold today is updated. Our solution focuses on the signature generation process. We have developed an automatic system whose goal is to extract simple, unique and optimal signatures for malicious files. This way, any IDS/IPS will be able to neutralize hostile code in real time.

Extraction methods:
• The Interactive Disassembler (IDA): IDA is a commercial disassembler widely used for reverse engineering; that is, it takes a binary file and translates it back into assembly code. Using a dedicated plug-in, IDA can identify, extract and normalize all the functions in the file.
• Data mining: A classifier is trained on a set of byte segments, each labeled as a function start, a function end, or neither. It then classifies byte segments from a suspicious file as start, end, or neither. That way we are able to extract the functions from a given file.

Selection methods:
• Random Selector: Chooses a signature at random from the candidates.
• Minimum Entropy Selector: Calculates the entropy of each candidate and selects the one with the minimum entropy.
• Cluster Selector: Groups the candidates by their distance from each other and scores each cluster by the chance that it contains the best signature. Each cluster receives a score, computed by a formula, that reflects this chance.
• Probability Selector: Key idea: estimate the probability that each candidate signature will match a randomly chosen block of bytes in the code of a randomly chosen program. Select one or more signatures whose estimated false-positive probability is the lowest of all the candidates and is below a predefined threshold.
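
The Minimum Entropy Selector and the Probability Selector described above can be illustrated with a minimal sketch. Several points here are assumptions rather than the system's actual implementation: candidates are treated as raw byte strings, the entropy is read as Shannon entropy (base 2) over byte frequencies, the false-positive probability is estimated as the plain ratio of matching windows to all same-length windows in a benign corpus, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(signature: bytes) -> float:
    """Entropy in bits per byte, from the relative frequency of each byte value.

    Assumption: Shannon entropy with log base 2.
    """
    if not signature:
        return 0.0
    length = len(signature)
    counts = Counter(signature)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def min_entropy_select(candidates: Sequence[bytes]) -> bytes:
    """Minimum Entropy Selector: pick the candidate with the lowest entropy."""
    return min(candidates, key=shannon_entropy)


def estimate_probability(block: bytes, corpus: Sequence[bytes]) -> float:
    """Estimate how likely `block` is to occur in benign code.

    Assumption: the estimate is occurrences / total, taken over every
    window of len(block) bytes in the benign corpus.
    """
    if not block:
        raise ValueError("block must not be empty")
    size = len(block)
    total = 0        # number of windows of `size` bytes in the corpus
    occurrences = 0  # number of those windows equal to `block`
    for program in corpus:
        windows = max(len(program) - size + 1, 0)
        total += windows
        occurrences += sum(
            1 for i in range(windows) if program[i:i + size] == block
        )
    return occurrences / total if total else 0.0


def probability_select(candidates: Sequence[bytes],
                       corpus: Sequence[bytes],
                       threshold: float) -> List[bytes]:
    """Probability Selector: lowest-probability candidates below the threshold."""
    scored = [(estimate_probability(c, corpus), c) for c in candidates]
    below = [(p, c) for p, c in scored if p < threshold]
    if not below:
        return []
    best = min(p for p, _ in below)
    return [c for p, c in below if p == best]
```

For long signatures an exact match in any realistic corpus is rare, so a production estimator would probably need smoothing or shorter sub-sequences; the sketch keeps the plain ratio for clarity.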

In addition to the signature builder, we have developed an evaluation environment whose objective is to determine the best configuration for generating an optimal signature for malicious files.

Project description:
In general, the Signature Builder system operates as follows:
• Build a common functions library (CFL).
• Given a malicious file, extract its functions and filter out the common ones using the CFL.
• Generate signatures and, finally, choose from the remaining functions (the candidates) the best one to act as the malicious file's signature.
The system extracts functions from malware files using several algorithms and provides a signature for each malware file.

How does it work?
Server Engine (flow chart steps): Initialize the system; Initialize Configuration; CFL Handling; Receive File from Client; Extracting Functions; Filter Common Functions; Generate Candidates; Select Best Candidate; Return Signature.

Architecture (diagram components): Malicious Files; CFL; Signature Builder (Server); Evaluation Environment; Statistics.

Server Algorithms:
Minimum Entropy Selector, definitions:
• Let S be a string (signature).
• Sc denotes a character in S.
• |Sc| denotes the number of times Sc appears in S.
• The entropy of S is computed from these character counts.
Cluster Selector, quantities used in the cluster score:
• Cs denotes the cluster size in bytes.
• Fs denotes the file's size.
• Fc denotes the number of functions in the cluster.
• T denotes the total number of functions in the file.
• Fl denotes the sum of the function lengths in the cluster.
Probability Selector:
• For a given sequence of S bytes B = B1B2…BS, estimate the probability p(B) that B occurs in a large body of normal, uninfected code.
• TS is the number of S-byte sequences in a large corpus of uninfected programs.
• f(B) is the number of occurrences of B among these TS sequences.

Technology:
• Language:
• IDE:
• Operating System:

Signature Builder Team: Ido Levin, Ofir Nissel, Yotam Katzman
Academic Advisor: Dr. Yuval Elovici
Professional Advisor: Mr. Asaf Shabtai

Evaluation Environment:
The evaluation environment evaluates the different configurations of the signature builder in order to judge the quality of the signatures they produce. The main idea is to check whether the signature of a malicious file appears in a control group of benign files.
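
The main idea above, checking whether a malicious file's signature appears in the benign control group, can be sketched as a plain byte-substring search. This is a minimal sketch, assuming the control group is held in memory as a mapping from file name to raw bytes; `find_false_alarms` and the sample file names are hypothetical and not part of the Signature Builder system.

```python
from typing import Dict, List


def find_false_alarms(signature: bytes,
                      control_group: Dict[str, bytes]) -> List[str]:
    """Return the names of benign files whose content contains the signature."""
    if not signature:
        raise ValueError("signature must not be empty")
    return sorted(
        name for name, content in control_group.items()
        if signature in content
    )


# Hypothetical in-memory control group: file name -> raw bytes.
control_group = {
    "benign_a.bin": b"\x55\x8b\xec\x83\xec\x10",
    "benign_b.bin": b"\x90\x90\xc3",
}

good = b"\xde\xad\xbe\xef"  # appears in no benign file
bad = b"\x55\x8b\xec"       # common x86 prologue, present in benign_a.bin

print(find_false_alarms(good, control_group))  # []
print(find_false_alarms(bad, control_group))   # ['benign_a.bin']
```

An empty result means the signature raised no false alarm against this control group; it says nothing about benign files outside it.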
A good signature for a malicious file should, of course, not appear in benign files.

The output consists of the following:
• Processed: the number of malware files for which the system managed to generate a signature.
• Processed (%): Processed / Total Malware Files.
• Signature Hits: the number of unique malware files whose signature produced at least one false alarm (FA).
• Signature Hits (%): Signature Hits / Processed.
• Unique Signature: the number of unique signatures that did not produce a false alarm.
• Different Files: the number of distinct files in the control group that have at least one hit.
• Different Files (%): Different Files / Total Control Group Files.

Evaluation Environment configuration:
Each configuration consists of the following input:
• CFL size, in MB
• Maximum signature length, in bytes
• Function similarity threshold
• Offset size, in bytes
• Function extractor
• Function selection
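
The output statistics listed above can be aggregated as in the following sketch. It assumes one record per malware file holding the generated signature (or `None` when generation failed) and the set of control-group files in which that signature was found; the `Result` and `Report` types and the `summarize` function are hypothetical names, and the percentage fields are the stated ratios multiplied by 100.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Result:
    malware: str                        # malware file name
    signature: Optional[bytes]          # None if no signature was generated
    hits: FrozenSet[str] = frozenset()  # benign files containing the signature


@dataclass
class Report:
    processed: int
    processed_pct: float
    signature_hits: int
    signature_hits_pct: float
    unique_signatures: int
    different_files: int
    different_files_pct: float


def _pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`; 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(results: Iterable[Result], control_group_size: int) -> Report:
    results = list(results)
    processed = [r for r in results if r.signature is not None]
    with_hits = [r for r in processed if r.hits]

    # Distinct signatures that produced no false alarm.
    clean: Set[bytes] = {r.signature for r in processed if not r.hits}
    # Distinct benign files matched by at least one signature.
    benign_hit: Set[str] = set()
    for r in with_hits:
        benign_hit |= r.hits

    return Report(
        processed=len(processed),
        processed_pct=_pct(len(processed), len(results)),
        signature_hits=len(with_hits),
        signature_hits_pct=_pct(len(with_hits), len(processed)),
        unique_signatures=len(clean),
        different_files=len(benign_hit),
        different_files_pct=_pct(len(benign_hit), control_group_size),
    )
```

Running `summarize` once per configuration would give one row of statistics per configuration, which is what a comparison between configurations needs.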