
Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006 N Provos, J McClain, K Wang


Presentation Transcript


  1. Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006. N. Provos, J. McClain, K. Wang. Presented by Dhruv Sharma, dhruvs@usc.edu

  2. A worm is malicious code that propagates over a network, with or without human assistance • worm authors are looking for new ways to acquire vulnerable targets • search worms propagate automatically by copying themselves to target systems located through search engines • search worms can severely harm search engines • such worms send carefully crafted queries to search engines, evading identification mechanisms that assume random scanning

  3. Search worms generate search queries, analyze the search results, and infect the identified targets • return as many unique targets as possible using a list of prepared queries • search for popular domains to extract email addresses • prune search results: remove duplicates and ignore URLs that belong to the search engine itself • exploit identified targets by reformatting URLs to include the exploit and bootstrapping code

  4. MyDoom.O, a search worm that requires human intervention to spread • spreads via email containing an executable file as an attachment • searches the local hard drive for email addresses • the accompanying figure shows the number of infected hosts and the number of MyDoom.O queries that Google received per second • the peak scan rate exceeded 30,000 queries per second

  5. Santy is the first search worm to propagate automatically, without any human intervention • written in Perl, it exploits a bug in the phpBB bulletin-board system • after injecting arbitrary code into a Web server running phpBB, it uses Google to search for more targets and connects the infected machine to an IRC botnet • the accompanying graph shows a timeline of infected IP addresses for three different Santy variants in December 2004; each variant manages to infect about four thousand different IP addresses

  6. Graphical description of the dependencies between Santy variants, derived from honeypot captures • shows the dependencies between Santy variants from August 2005 to May 2006 • each node is labelled with the filename downloaded to the infected host; two nodes are connected by an edge if their line difference, computed via diff, is minimal with respect to all other variants (a sketch of this construction follows below) • the graph shows that some variants of Santy have been continuously modified for over six months
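
  A minimal sketch of the graph construction described above, assuming each captured variant is available as plain text keyed by its downloaded filename; difflib stands in here for the Unix diff the authors used, and all names are illustrative:

    import difflib

    # Connect each variant to its closest relative: the sample with the
    # smallest line-level diff, mirroring the honeypot analysis above.
    def diff_size(a: str, b: str) -> int:
        # count lines added or removed between two samples
        return sum(1 for line in difflib.ndiff(a.splitlines(), b.splitlines())
                   if line.startswith(("+ ", "- ")))

    def nearest_neighbor_edges(samples: dict[str, str]) -> list[tuple[str, str]]:
        edges = []
        for name, body in samples.items():
            others = [n for n in samples if n != name]
            if not others:
                continue
            closest = min(others, key=lambda n: diff_size(body, samples[n]))
            edges.append((name, closest))  # node labels are downloaded filenames
        return edges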

  7. The architecture of the worm-mitigation system is split into three phases: • anomaly identification • signature generation • index-based filtering

  8. The anomaly-identification step automatically blocks part of the worm traffic by observing the IP addresses it comes from • classifies the IP addresses responsible for abnormal traffic • maintains a map of frequent words, which is used to compute the compound probability of a query • flags an IP address as abnormal if it sends too many low-probability queries (sketched below)
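
  A minimal sketch of this step, assuming a word-frequency model trained on benign queries; the probability cutoff, the per-IP tolerance, and the smoothing are illustrative assumptions, not parameters from the paper:

    import math
    from collections import Counter, defaultdict

    class QueryAnomalyDetector:
        def __init__(self, benign_queries, log_prob_cutoff=-40.0, max_bad_queries=5):
            # map of frequent words, built from benign traffic
            counts = Counter(w for q in benign_queries for w in q.lower().split())
            total = sum(counts.values()) or 1
            self.log_prob = {w: math.log(c / total) for w, c in counts.items()}
            self.unseen = math.log(1.0 / (total + 1))  # smoothing for unseen words
            self.log_prob_cutoff = log_prob_cutoff     # "low probability" boundary
            self.max_bad_queries = max_bad_queries     # per-IP tolerance
            self.bad_counts = defaultdict(int)

        def compound_log_prob(self, query: str) -> float:
            # compound probability of a query = product of its word probabilities
            return sum(self.log_prob.get(w, self.unseen) for w in query.lower().split())

        def observe(self, ip: str, query: str) -> bool:
            # True once the IP has sent too many low-probability queries
            if self.compound_log_prob(query) < self.log_prob_cutoff:
                self.bad_counts[ip] += 1
            return self.bad_counts[ip] > self.max_bad_queries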

  9. The signature-generation step produces signatures based on Polygraph • extracts tokens from bad queries to create signatures matching the bad traffic • hierarchical clustering is used to merge signatures until a predefined false-positive threshold is reached • false positives are computed by matching signatures against a set of known-good queries • token extraction ran on a cluster of 85 2.4 GHz Intel Xeon machines; one experiment generated the following signature: • GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=
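
  To illustrate, the generated signature can be applied directly as a regular expression over incoming HTTP request lines; the two sample requests are made up for this example:

    import re

    # the signature generated in the experiment above
    SIGNATURE = re.compile(r"GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=")

    def matches_signature(request_line: str) -> bool:
        return SIGNATURE.search(request_line) is not None

    # a Santy-style query trips the signature; an ordinary search does not
    assert matches_signature("GET /search?q=viewtopic+-modules&num=20&start=0 HTTP/1.1")
    assert not matches_signature("GET /search?q=holiday+photos&num=10&start=0 HTTP/1.1")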

  10. Index-based filtering modifies the search index to handle multiple search queries mapping to similar result pages • a search worm relies on a search engine to obtain a list of potentially vulnerable targets; if the search engine does not provide any vulnerable targets in the search results, the worm fails to spread • while crawling, tag all pages that seem to contain vulnerable information • query results are not returned if they contain pages from many hosts and the majority of those pages are tagged as vulnerable (see the sketch below)
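
  A minimal sketch of index-based filtering under the description above; the crawl-time heuristic, the ten-host minimum, and the simple-majority cutoff are illustrative choices, not values from the paper:

    from dataclasses import dataclass

    @dataclass
    class IndexedPage:
        url: str
        host: str
        vulnerable: bool  # tagged once, at crawl time

    def tag_vulnerable(html: str) -> bool:
        # crude crawl-time heuristic for a potentially exploitable phpBB install
        return "Powered by phpBB" in html

    def filter_results(results: list[IndexedPage], min_hosts: int = 10) -> list[IndexedPage]:
        # withhold result sets that span many hosts and are mostly tagged
        # vulnerable: benign queries rarely look like this, worm queries do
        hosts = {p.host for p in results}
        tagged = sum(p.vulnerable for p in results)
        if len(hosts) >= min_hosts and tagged > len(results) / 2:
            return []
        return results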

  11. Conclusion • search worms spread by querying a search engine for new targets to infect, reusing the information search engines have already collected • signature generation combined with anomaly identification is not effective in preventing a worm from spreading • the proposed solution is CPU-efficient and query-independent, and it classifies web pages as vulnerable if they belong to an exploitable server or contain potential infection targets

  12. Pros and Cons • Pros • query-independent, index-based filtering • word-based features (tokenization) work well, since phishing URLs contain several suggestive word tokens • Cons • the signature-based approach is a good option only if given good seed queries • cannot find new attacks for which there is no prior knowledge • lacks a module that could analyze malicious pages to automatically extract the searches they use, which in turn could help in finding vulnerable targets
