Suspicious Behavior-based Malware Detection Using Artificial Neural Network
Advisor: Dr. 王國禎
Student: 蔡薰儀
Institute of Network Engineering, National Chiao Tung University
Mobile Computing and Broadband Networking Laboratory
Outline
• Introduction
• Related Work
• Problem Statement
• Background
  • Sandboxes
• Design Approach
  • Suspicious Behaviors
  • Proposed ANN-MD System
  • Weight Adjusting
  • Malicious Degree
• Evaluation
  • Training Phase
  • Testing Phase
• Conclusion and Future Work
• References
Introduction
• In recent years, malware has become a severe threat to cyber security
  • Viruses, worms, Trojan horses, botnets, …
• Drawbacks of traditional signature-based malware detection algorithms [1] [2]
  • Require human effort and time to verify signatures
  • Require frequent updates to the malware signatures
  • Are easily bypassed by obfuscation methods
  • Cannot detect zero-day malware
  • Result in a higher false negative rate
Introduction (Cont.)
• To overcome the shortcomings of signature-based malware detection algorithms, behavior-based malware detection algorithms were proposed
• Behavior-based malware detection algorithms [3] [4]
  • Detect unknown malware or variants of known malware
  • Decrease the false negative rate (FNR)
• However, existing behavior-based malware detection algorithms may have a higher false positive rate (FPR)
  • Benign software may exhibit some behaviors similar to those of malware
Introduction (Cont.)
• We propose a behavioral artificial neural network (ANN)-based malware detection (ANN-MD) algorithm
  • Detects unknown malware and variants of known malware
  • Decreases both FNR and FPR
Related Work
• MBF [3]
  • File, process, network, and registry actions
  • Malicious Behavior Feature (MBF): MBF = <Feature_id, Mal_level, Bool_expression>
  • Three malicious levels: high, warning, and low
• RADUX [4]
  • Reverse Analysis for Detecting Unsafe eXecution (RADUX)
  • API function call sequences, e.g. load register: RegOpenKey, RegCreateKey, RegSetValue, RegCloseKey
  • Collected 9 common suspicious behaviors
  • Uses Bayes’ theorem to compute the suspicious degree (malicious degree), combining the appearance probabilities of all behaviors together rather than individually
Problem Statement
• Given
  • Several sandboxes
  • i known malware samples M = {M1, M2, …, Mi} for training
  • j known malware samples N = {N1, N2, …, Nj} for testing
  • k benign software samples O = {O1, O2, …, Ok} for training
  • l benign software samples P = {P1, P2, …, Pl} for testing
• Objective
  • m behaviors B = {B1, B2, …, Bm}
  • m weights W = {ω1, ω2, …, ωm}
  • Malicious Degree (MD) expression
Problem Statement (Cont.)
[Figure: number of samples plotted against MD, with benign, ambiguous, and malicious regions separated by the MD threshold]
• Find the optimal MD threshold that makes FPR and FNR as small as possible
Background – Sandboxes
• A sandbox is a testing environment that isolates an unknown sample so that it cannot make changes to the operating system
• It can interact with samples and record all of their runtime behaviors
• Web-based sandboxes
  • GFI Sandbox [5]
  • Norman Sandbox [6]
  • Anubis Sandbox [7]
Design Approach – Suspicious Behaviors
• Choose the behaviors in the intersection of the behaviors that these sandboxes investigate
• Choose the behaviors that are not in the intersection but appear with high frequency
• Selected behaviors:
  • Creates Mutex
  • Creates Hidden File
  • Starts EXE in System
  • Checks for Debugger
  • Starts EXE in Documents
  • Windows/Run Registry Key Set
  • Hooks Keyboard
  • Modifies Files in System
  • Deletes Original Sample
  • More than 5 Processes
  • Opens Physical Memory
  • Deletes Files in System
  • Auto Start
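One way to feed these behaviors to a neural network is to encode each sandbox report as a fixed-order vector of 0/1 flags. The sketch below assumes a binary encoding and a report given as a plain list of behavior names. Both are illustrative assumptions, and `encode_report` is a hypothetical helper, not part of any sandbox API.

```python
# The 13 behavior names are taken from the slide; their order is fixed so that
# every sample maps to the same input positions.
BEHAVIORS = [
    "Creates Mutex",
    "Creates Hidden File",
    "Starts EXE in System",
    "Checks for Debugger",
    "Starts EXE in Documents",
    "Windows/Run Registry Key Set",
    "Hooks Keyboard",
    "Modifies Files in System",
    "Deletes Original Sample",
    "More than 5 Processes",
    "Opens Physical Memory",
    "Deletes Files in System",
    "Auto Start",
]


def encode_report(observed):
    """Return a 13-element 0/1 vector: 1 if the behavior was observed.

    `observed` is a hypothetical report format: an iterable of behavior names.
    """
    observed = set(observed)
    return [1 if name in observed else 0 for name in BEHAVIORS]


# Illustrative sample, not taken from the evaluation data.
vector = encode_report(["Creates Mutex", "Hooks Keyboard", "Auto Start"])
print(vector)
```

A fixed ordering matters because each trained weight is tied to one input position.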
Design Approach – Suspicious Behaviors (Cont.)
[Table: behaviors reported by Ulrich Bayer et al. [8], the behaviors we chose, and the behaviors that may cause false positives; contents not preserved]
Design Approach – Weight Adjusting Using ANN to train weights
Design Approach – Weight Adjusting (Cont.) • Neuron for ANN hidden layer: the first neuron
Design Approach – Weight Adjusting (Cont.) • Neuron for ANN output layer
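The neuron diagrams for the hidden and output layers are not preserved here, so the following is only a generic sketch of how such neurons are usually composed: each neuron takes a weighted sum of its inputs plus a bias and passes it through the tangent-sigmoid function, which is mathematically `tanh`. The layer sizes and all numeric values are hypothetical.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum plus bias, passed through the tangent-sigmoid (tanh)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(net)


def forward(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer followed by a single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return neuron(hidden, out_weights, out_bias)


# Hypothetical toy network: 3 inputs, 2 hidden neurons, 1 output.
x = [1, 0, 1]
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [0.7, -0.6]
out_b = 0.05

output = forward(x, hidden_w, hidden_b, out_w, out_b)
print(output)  # always lies strictly between -1 and 1
```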
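Design Approach – Weight Adjusting (Cont.)
• Mean square error: [equation not preserved]
• Expected target value: [symbol not preserved]; output value: O (MD)
• Weight set: [symbol not preserved]
• η: learning factor; x: set of input values
• Delta learning process: [equation not preserved]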
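The slide's own formulas are lost, so the sketch below shows the standard textbook delta rule for a single tanh neuron rather than the authors' exact derivation. It assumes the per-sample error E = ½(t − O)², which gives the update Δw_i = η (t − O)(1 − O²) x_i. The symbol `t` for the target and the toy data are assumptions; η = 0.5 is the learning factor reported in the Evaluation slides.

```python
import math


def delta_step(weights, bias, x, target, eta=0.5):
    """One delta-rule update for a single tanh neuron.

    Returns the updated weights, updated bias, and the squared error
    measured before the update.
    """
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    out = math.tanh(net)
    err = target - out
    # Derivative of tanh(net) with respect to net is 1 - out^2.
    grad = err * (1.0 - out * out)
    new_weights = [w + eta * grad * xi for w, xi in zip(weights, x)]
    new_bias = bias + eta * grad
    return new_weights, new_bias, 0.5 * err * err


# Hypothetical single training sample: inputs x with target output 1.0.
weights, bias = [0.0, 0.0, 0.0], 0.0
x, target = [1, 0, 1], 1.0
errors = []
for _ in range(20):
    weights, bias, e = delta_step(weights, bias, x, target)
    errors.append(e)

print(errors[0], errors[-1])  # the error shrinks as training proceeds
```

The weight attached to an input that is always 0 never changes, which is why behaviors that never appear in training contribute nothing to the learned weights.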
Design Approach – Malicious Degree
• Malicious Degree Expression: [equation not preserved]
  • Suspicious behaviors: B = {B1, B2, …, Bm}
  • Weights: W = {ω1, ω2, …, ωm}
  • Bias: [symbol not preserved]
  • Transfer function: tangent-sigmoid function
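Since the MD expression itself is not preserved, the sketch below shows only the simplest form consistent with the listed ingredients: one weight per behavior, a bias, and the tangent-sigmoid transfer function. Whether the authors' expression has exactly this shape is an assumption, and the weights and bias used here are placeholders, not trained values.

```python
import math


def malicious_degree(behaviors, weights, bias):
    """MD = tanh(sum of weight_i * behavior_i + bias), in the range (-1, 1)."""
    net = sum(w * b for w, b in zip(weights, behaviors)) + bias
    return math.tanh(net)


# Placeholder parameters for m = 13 behaviors (not the trained values).
weights = [0.3] * 13
bias = -1.0

md_none = malicious_degree([0] * 13, weights, bias)
md_some = malicious_degree([1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1], weights, bias)
md_all = malicious_degree([1] * 13, weights, bias)
print(md_none, md_some, md_all)
```

With positive weights, every additional observed behavior pushes MD toward 1, and a sample showing none of the behaviors scores tanh(bias).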
Evaluation (Cont.)
• The ANN in our system is implemented in MATLAB 7.11.0
• Initial weights and bias: chosen by the function initnw, which distributes the weights of each neuron in the layer evenly [9] (according to the Nguyen-Widrow initialization algorithm)
• Transfer function: tangent-sigmoid function
• Learning factor η: 0.5
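For readers unfamiliar with Nguyen-Widrow initialization, here is a sketch of its common textbook form: draw random weights, rescale each neuron's weight vector to length β = 0.7 · H^(1/n) (H hidden neurons, n inputs), and draw biases uniformly from [−β, β]. This is not a reimplementation of MATLAB's initnw, which also accounts for input ranges. The hidden-layer size of 5 is a hypothetical choice.

```python
import math
import random


def nguyen_widrow(n_inputs, n_hidden, rng):
    """Textbook Nguyen-Widrow initialization for one hidden layer."""
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    weights, biases = [], []
    for _ in range(n_hidden):
        row = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(w * w for w in row))
        # Rescale so that each neuron's weight vector has length beta.
        weights.append([beta * w / norm for w in row])
        biases.append(rng.uniform(-beta, beta))
    return weights, biases, beta


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, biases, beta = nguyen_widrow(n_inputs=13, n_hidden=5, rng=rng)
print(beta)
```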
Evaluation (Cont.) Architecture of the ANN (from MATLAB):
Evaluation (Cont.)
• Malicious sample sources: Blast’s Security [10] and VX Heaven [11] websites
• Benign sample sources: Portable Executable (PE) files from Windows XP SP2
• Training samples and testing samples
Evaluation – Training Phase Execution time: 3 seconds (training and testing phase) MD threshold (according to training samples)
Evaluation – Training Phase (Cont.) Choose MD threshold
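The slide does not spell out the selection procedure, so the following is one plausible sketch: try each observed MD value as a candidate threshold and keep the one that minimizes FPR + FNR on the training samples. The convention that a sample is flagged as malicious when MD ≥ threshold, the sum as the objective, and all MD scores below are assumptions for illustration.

```python
def rates(threshold, malware_md, benign_md):
    """Return (FPR, FNR) when samples with MD >= threshold are flagged."""
    fn = sum(1 for md in malware_md if md < threshold)
    fp = sum(1 for md in benign_md if md >= threshold)
    return fp / len(benign_md), fn / len(malware_md)


def best_threshold(malware_md, benign_md):
    """Pick the candidate threshold with the smallest FPR + FNR."""
    candidates = sorted(set(malware_md) | set(benign_md))
    return min(candidates, key=lambda t: sum(rates(t, malware_md, benign_md)))


# Made-up MD scores for training samples (not the reported data).
malware_md = [0.9, 0.8, 0.75, 0.7, 0.4]
benign_md = [-0.9, -0.5, 0.1, 0.6]

threshold = best_threshold(malware_md, benign_md)
fpr, fnr = rates(threshold, malware_md, benign_md)
print(threshold, fpr, fnr)
```

When the two score distributions overlap, no threshold drives both rates to zero, which is the ambiguous region the problem statement refers to.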
Evaluation – Testing Phase
• Experiment results
  • TP: true positive
  • FN: false negative
  • FP: false positive
  • TN: true negative
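The four counts relate to the rates discussed throughout the deck by the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below applies them to hypothetical counts, since the experiment's actual numbers are not preserved here.

```python
def detection_rates(tp, fn, fp, tn):
    """Compute FPR, FNR, and accuracy from confusion-matrix counts."""
    return {
        "FPR": fp / (fp + tn),  # benign samples wrongly flagged as malicious
        "FNR": fn / (fn + tp),  # malware samples that were missed
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts, not the reported results.
result = detection_rates(tp=95, fn=5, fp=2, tn=98)
print(result)
```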
Evaluation – Testing Phase (Cont.) Distribution of testing samples
Conclusion and Future Work
• Conclusion
  • Collected 13 common malware behaviors
  • Constructed the Malicious Degree (MD) expression
  • Kept FPR and FNR as small as possible
  • Detects unknown malware more effectively than the related work [14] [19]
• Future work
  • Automate the proposed ANN-MD system
  • Implement PC-based sandboxes
  • Add more suspicious network behaviors
  • Classify malware according to its typical behaviors
References
[1] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,” Proceedings of the 12th USENIX Security Symposium, Vol. 12, pp. 169-186, Aug. 2003.
[2] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham, “Detection of injected, dynamically generated, and obfuscated malicious code,” Proceedings of the 2003 ACM Workshop on Rapid Malcode, pp. 76-82, Oct. 27-30, 2003.
[3] W. Liu, P. Ren, K. Liu, and H. X. Duan, “Behavior-based malware analysis and detection,” Proceedings of Complexity and Data Mining (IWCDM), pp. 39-42, Sep. 24-28, 2011.
[4] C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, “Malware detection based on suspicious behavior identification,” Proceedings of Education Technology and Computer Science, Vol. 2, pp. 198-202, Mar. 7-8, 2009.
[5] GFI Sandbox. http://www.gfi.com/malware-analysis-tool
[6] Norman Sandbox. http://www.norman.com/security_center/security_tools
[7] Anubis Sandbox. http://anubis.iseclab.org/
References (Cont.)
[8] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A view on current malware behaviors,” Proceedings of the 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats: botnets, spyware, worms, and more, pp. 1-11, Apr. 22-24, 2009.
[9] Neural Network Toolbox. http://dali.feld.cvut.cz/ucebna/matlab/toolbox/nnet/initnw.html
[10] Blast's Security. http://www.sacour.cn
[11] VX Heaven. http://vx.netlux.org/vl.php