
CAMP: Content-Agnostic Malware Protection

20th Annual Network & Distributed System Security Symposium (NDSS 2013). CAMP: Content-Agnostic Malware Protection. Niels Provos, Moheeb Abu Rajab, Lucas Ballard, Noe Lutz and Panayiotis Mavrommatis, Google Inc. 左昌國, 2013/04/01, Seminar @ ADLab, NCU-CSIE.


Presentation Transcript


  1. 20th Annual Network & Distributed System Security Symposium (NDSS 2013) CAMP: Content-Agnostic Malware Protection Niels Provos, Moheeb Abu Rajab, Lucas Ballard, Noe Lutz and Panayiotis Mavrommatis Google Inc. 左昌國 2013/04/01 Seminar @ ADLab, NCU-CSIE

  2. X-agnostic • Without knowledge of X • Content-agnostic malware protection • The protection operates without knowledge of the malware's content

  3. Outline • Introduction • Related Work • System Architecture • Reputation System • Evaluation • Conclusion

  4. Introduction • Malware distribution through web browsers • Drive-by Downloads • Not the focus of this paper • Social Engineering • Fake Anti-Virus • Existing defenses? • Blacklists / Whitelists • Signature-based solutions • CAMP • Reputation system • Low false positive rate

  5. Related Work • Content-based Detection • Anti-virus software • CloudAV • Blacklist-based Protection • Google Safe Browsing API • McAfee Site Advisor • Symantec Safe Web • Whitelist-based Schemes • Bit9 • CoreTrace • Reputation-based Detection • SNARE • Notos and EXPOSURE • Microsoft SmartScreen

  6. System Architecture Client Server

  7. System Architecture – Binary Analysis

  8. System Architecture – Binary Analysis • Produces labels (benign or malicious) for training purposes • Classifies binaries based on static and dynamic analysis • The labels are also used to decide thresholds • Goal: low false positive rate

  9. System Architecture – Client

  10. System Architecture – Client • Performs local checks before asking the server for a decision • In blacklists? Google Safe Browsing API • Potentially harmful? e.g. DMG files in Mac OS X • In whitelists? Trusted domains and trusted signing certificates • If the local checks are inconclusive • Extract features from the downloaded binary • Final download URL / IP address • Referrer URL / (corresponding) IP address • Size / hash • Signature • Send the features to the server
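The client-side flow above can be sketched as follows. This is a minimal illustration, not CAMP's actual code: all names (`in_safe_browsing_blacklist`, `WHITELISTED_DOMAINS`, the feature dictionary keys) are hypothetical, and the Safe Browsing check is stubbed out.

```python
import hashlib

WHITELISTED_DOMAINS = {"example-trusted.com"}  # assumed trusted-domain list
HARMFUL_EXTENSIONS = {".exe", ".msi", ".dmg"}  # potentially harmful file types

def in_safe_browsing_blacklist(url):
    # Stub: a real client would consult the Google Safe Browsing API here.
    return False

def local_decision(url, domain, filename):
    """Return "benign"/"malicious" if decidable locally, else None."""
    if in_safe_browsing_blacklist(url):
        return "malicious"
    if not any(filename.endswith(ext) for ext in HARMFUL_EXTENSIONS):
        return "benign"  # not a potentially harmful file type
    if domain in WHITELISTED_DOMAINS:
        return "benign"  # trusted domain (trusted signers handled similarly)
    return None          # inconclusive: ask the reputation server

def build_request(data, url, referrer, ip, referrer_ip, signature):
    # Features sent to the server when no local verdict exists.
    return {
        "url": url, "ip": ip,
        "referrer": referrer, "referrer_ip": referrer_ip,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "signature": signature,
    }
```

Only downloads that survive all local checks generate a server round trip, which keeps most decisions on the client.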

  11. System Architecture – Client • The returned decision

  12. System Architecture – Client • ~70% of all downloads are considered benign due to policy or matching client-side whitelists • (on the server side) Binaries hosted on trusted domains or signed by trusted signers are regularly re-analyzed

  13. System Architecture – Client

  14. System Architecture – Server

  15. System Architecture – Server • The server receives the client request and renders a reputation verdict • The server uses the request information to update its reputation data • Built on BigTable and MapReduce

  16. System Architecture – Frontend and Data Storage

  17. System Architecture – Frontend and Data Storage • Frontend • RPC to the reputation system • URL as index? • Popular URLs → index by the timestamp of the request to the URL, stored as a reverse-ordered hexadecimal string
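The row-key idea above can be sketched as follows. This is an assumed illustration of the general technique (not CAMP's exact key format): encoding the timestamp as "maximum minus timestamp" in fixed-width hexadecimal makes newer requests sort lexicographically first, so recent entries for a popular URL are read without scanning its whole history.

```python
MAX_TS = 2**64 - 1  # assumed maximum timestamp value

def row_key(url, timestamp):
    # Fixed-width hex of (MAX_TS - timestamp): larger (newer) timestamps
    # produce lexicographically smaller suffixes, i.e. sort first.
    return "%s#%016x" % (url, MAX_TS - timestamp)

older = row_key("http://foo.com/a.exe", 1000)
newer = row_key("http://foo.com/a.exe", 2000)
# The newer request's key sorts before the older one's.
```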

  18. System Architecture – Spam Filtering

  19. System Architecture – Spam Filtering • Velocity controls on the user IP address • The spam filter also selects binaries to fetch from the web that have not yet been analyzed by the binary classifier • Filter: only binaries that exhibit sufficient diversity of context • The analysis may complete long after the reputation decision was made
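A minimal sketch of a velocity control of the kind mentioned above, assuming a simple sliding-window limit per client IP (the class name, limits, and window are hypothetical, not CAMP's parameters):

```python
from collections import defaultdict, deque

class VelocityLimiter:
    """Cap how many requests a single IP may contribute per time window."""

    def __init__(self, max_requests, window_s):
        self.max_requests = max_requests
        self.window_s = window_s
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        q = self.history[ip]
        while q and now - q[0] > self.window_s:
            q.popleft()  # drop requests that fell outside the window
        if len(q) >= self.max_requests:
            return False  # over the velocity limit: ignore this contribution
        q.append(now)
        return True
```

Requests that exceed the limit would simply not be counted into the reputation data, blunting attempts to poison it from a single source.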

  20. System Architecture – Aggregator

  21. System Architecture – Aggregator • Aggregates form the reputation data • 3-dimensional index • From where • Features • Categories: reputation / urls / hash • client | site:foo.com | reputation → (6, 10) • analysis | ip:1.2.3.4/24 | urls → (0, 3) • Value: (a, b) • a: the number of interesting observations • b: the total number of observations
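The aggregate structure above can be sketched as a map from the 3-dimensional index (source, feature, category) to a pair (interesting, total). A minimal illustration, with hypothetical names:

```python
from collections import defaultdict

class Aggregates:
    """Reputation counters keyed by (source, feature, category)."""

    def __init__(self):
        # Each value is a mutable pair [interesting, total].
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, source, feature, category, interesting):
        pair = self.counts[(source, feature, category)]
        pair[0] += int(interesting)  # "interesting" (e.g. malicious) observations
        pair[1] += 1                 # total observations

agg = Aggregates()
# Reproduce the slide's example: client | site:foo.com | reputation → (6, 10)
for malicious in [True] * 6 + [False] * 4:
    agg.observe("client", "site:foo.com", "reputation", malicious)
```

The ratio a/b for a given key is then a direct input to the reputation decision.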

  22. Reputation System • Feature Extraction • IP address: single IP or netblock • URL: direct download URL or host/domain/site • Signature / hash
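Feature extraction at several granularities can be sketched as follows. The helpers are assumptions for illustration; in particular, the "site" heuristic below (last two host labels) is naive, and a real system would use the public-suffix list to find the registered domain.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url):
    """Derive host- and site-level features from a download URL."""
    host = urlparse(url).hostname
    site = ".".join(host.split(".")[-2:])  # naive registered-domain guess
    return {"url": url, "host": host, "site": "site:" + site}

def ip_features(ip):
    """Derive the /24 netblock feature from a single IP address."""
    block = ipaddress.ip_network(ip + "/24", strict=False)
    return {"ip": ip, "netblock": str(block)}
```

Coarser features (site, netblock) let the system accumulate reputation even when individual URLs or IPs are rarely seen.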

  23. Reputation System – Decision

  24. Reputation System – Decision • Thresholds • Thresholds are chosen according to the precision and recall of each AND gate • Precision and recall are determined from a labeled training set • Training set: matching hashes from requests with hashes from binary analysis • Binary analysis provides the label (benign or malicious) • The request provides the features • 4,000 benign requests / 1,000 malicious requests • Precision and recall • http://en.wikipedia.org/wiki/Precision_and_recall
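Precision and recall here are the standard definitions, computed over the labeled training set described above (the example counts below are illustrative, not the paper's numbers):

```python
def precision_recall(tp, fp, fn):
    """tp/fp/fn: true positives, false positives, false negatives."""
    precision = tp / (tp + fp)  # fraction of flagged requests truly malicious
    recall = tp / (tp + fn)     # fraction of malicious requests flagged
    return precision, recall

# Illustrative example: 900 of 1,000 malicious requests caught,
# at the cost of 50 benign requests flagged.
p, r = precision_recall(tp=900, fp=50, fn=100)
```

Choosing thresholds to maximize precision first reflects the stated goal of a low false positive rate.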

  25. Reputation System – Decision

  26. Evaluation • Google Chrome • Targeting Windows executables • Accuracy of Binary Analysis • Compared against VirusTotal • 2,200 samples selected • 1,100 labeled clean by the binary analysis component • 1,100 labeled malicious • Submitted to VirusTotal and re-examined after 10 days • 99% of the malicious-labeled binaries were flagged by 20%+ of the AV engines on VirusTotal • 12% of the clean-labeled binaries were flagged by 20%+ of the AV engines on VirusTotal
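The agreement criterion above reduces to a simple ratio check: a binary counts as "flagged" when at least 20% of the AV engines detect it. A trivial sketch (engine counts below are made up for illustration):

```python
def flagged(detections, engines, threshold=0.20):
    """True if at least `threshold` of AV engines detected the binary."""
    return detections / engines >= threshold

# e.g. 13 of 46 engines (~28%) -> flagged; 5 of 46 (~11%) -> not flagged
```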

  27. Evaluation – Accuracy of CAMP • Feb. 2012 ~ July 2012 • 200 million users in total • 8–10 million requests per day • 200–300 thousand labeled as malicious per day • 3.2 billion aggregates in total • Overall accuracy

  28. Evaluation – Accuracy of CAMP

  29. Evaluation – Accuracy of CAMP

  30. Evaluation – Accuracy of CAMP

  31. Evaluation – Accuracy of CAMP

  32. Evaluation – Comparison to other systems • A random sample of 10,000 binaries labeled as benign • 8,400 binaries labeled as malicious

  33. Evaluation – Comparison to other systems

  34. Evaluation – Comparison to other systems

  35. Evaluation – Case Study

  36. Conclusion • This paper presents CAMP, a content-agnostic malware protection system • The paper performs a large-scale evaluation, showing that the detection approach is both accurate and fast (processing requests in less than 130 ms)
