Host-based, Run-time Win32 Bot Detection

Liz Stinson, John Mitchell (Stanford University)

Exploit botnet characteristic: ongoing command-and-control
• Network-based approaches:
  • Filtering (protocol, port, host, content-based)
  • Look for traffic patterns (e.g. DynDNS – Dagon)
  • Hard (bots can encrypt traffic, permute it to look like ‘normal’ traffic, …); bot writers control the arena.
• Host-based approaches:
  • Ours: more information is available at the host level. Since the bot is controlled externally, use this meta-level behavioral signature as the basis of detection.

Our approach
• Look at the syscalls made by a program
  • In particular at certain of their args: our sinks
• Possible sources for these sinks:
  • local: { mouse, keyboard, file I/O, … }
  • remote: { network I/O }
• An instance of external control occurs when data from a remote source reaches a sink
• Works well in practice: for all bots tested (agobot, dsnxbot, evilbot, g-sysbot, sdbot, spybot), every command that exhibited external control was detected

Big picture
[Figure: sources on one side, sinks on the other, with question marks on the paths between them. Remote sources: network I/O. Local sources: mouse, keyboard, file I/O. Sinks: CreateProcessA(…), NtCreateFile(…), bind(…), ...]

Design
[Figure: state diagram. Init leads to CLEAN (S_0); taint instantiators and taint propagators lead to TAINTED (S_1); a taint checker runs on gate G, with outcomes "execute G" and "ERROR".]
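The state diagram above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: byte-granularity tracking over a flat integer address space, with the "ERROR" outcome modeled as recording a flag. The names `TaintTracker` and `call_gate` are hypothetical and are not taken from the authors' implementation.

```python
class TaintTracker:
    """Byte-granularity taint state. S_1 is the tainted set; S_0 is its complement."""

    def __init__(self):
        self.tainted = set()  # S_1 (assumption: addresses are plain integers)

    def instantiate(self, addr, length):
        """Taint instantiator: remote data was received into [addr, addr + length)."""
        self.tainted.update(range(addr, addr + length))

    def propagate(self, src, dst, length):
        """Taint propagator: a copy of `length` bytes carries (or clears) taint per byte."""
        # Snapshot the source taint first so overlapping copies behave sensibly.
        src_taint = [(src + i) in self.tainted for i in range(length)]
        for i, is_tainted in enumerate(src_taint):
            if is_tainted:
                self.tainted.add(dst + i)
            else:
                self.tainted.discard(dst + i)

    def check(self, addr, length):
        """Taint checker: True if any byte of the gate argument is in S_1."""
        return any(a in self.tainted for a in range(addr, addr + length))


def call_gate(tracker, gate_name, arg_addr, arg_len, flagged):
    """Check the argument, record a flag if it is tainted, then 'execute' the gate."""
    if tracker.check(arg_addr, arg_len):
        flagged.append(gate_name)
    return gate_name  # stand-in for actually executing G
```

A short usage example: after `instantiate(0x1000, 8)` and `propagate(0x1000, 0x2000, 8)`, a gate call whose argument lives at `0x2000` is flagged, while one whose argument lives in untouched memory is not.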

2 modes
• Cause-and-effect semantics: tight relationship between receipt of some data over the network and subsequent use of some portion of that data in a sink
• Correlative semantics: looser relationship; use of some data that is the same as some data received over the network
  • Why necessary?

Behaviors: ideally disjoint; at lowest level in call stack
• MoveFile{Ex}{A,W}, MoveFileWithProgress{Ex}{A,W}, DeleteFile{A,W}, ReplaceFile{A,W}, Win32DeleteFile, ...
• CreateFile{A,W}, OpenFile, CopyFile{Ex}{A,W}, fopen, _open, _lopen, _lcreat, ...
• ShellExecute{Ex}{A,W}, CreateProcess{A,W}, WinExec
• send, sendto, WSASend, WSASendTo

Results
• Looked at 6 bots: agobot, dsnxbot, evilbot, g-sysbot, sdbot, spybot
  • At least 4 have totally independent code bases
  • g-sys non-trivially extends sd
  • Spybot borrows only the SYN flood implementation from sd
• Wide variation in implementation
• Every command that exhibited external control was detected; almost every instance of external control was flagged (3 false negatives)

[Figure: results chart. Labels: create proc, kill proc, file create, file open, connect port, connect IP, sendto port, sendto IP, bind IP, tainted send]

Correlative semantics
• Why necessary
  • Bots with C library functions statically linked in are roughly equivalent to unconstrained out-of-band (OOB) copies
• In general almost as good as cause-and-effect semantics (static vs. dynamic linking)
  • Exceptions: commands that format received params (e.g. via sprintf)

[Figure: results chart. Labels: create process, kill process, file create, file open, connect port, connect IP, sendto port, sendto IP, tainted send, derived send]

agobot: cause-and-effect (dynamic) vs. correlative semantics (static), continued
[Figure: results chart. Labels: tainted send, derived send, connect port, connect IP, sendto IP, bind port, bind IP]

Benign program testing
• Tested against some benign programs that interact with the network
  • Firefox, mIRC, Unreal IRCd
• 3 contextual false positives
  • IRCd: sent on X heard on Y
  • Firefox: dereferencing embedded links
• Artificial false positives: quite a few
  • mIRC: DCC capabilities
  • Firefox: saving contents to a file, …

False positives
• Contextual false positives (not present in bots)
  • external control correctly detected by the heuristic, but these actions under these circumstances are widely accepted as non-malicious
• Artificial false positives (not present in bots)
  • definition of external control implies no user input agreeing to the particular behavior
  • but we don’t track “explicitly clean” data (that received via keyboard, mouse)
• Spurious false positives
  • any other incorrect flagging of external control

Evading detection
• Don’t do anything: dormant bots are not detected
• Convert parameterized bot commands to take no params (at the cost of granularity of control)
• Encrypt using private crypto functions
  • Maybe we can do better here.
• Get out of the detours sandbox (next slide)
• Write commands to file; read back into memory in a way that defeats our OOB-resilience mechanisms
  • A different implementation would render this obsolete

Future directions
• Minor enhancements
  • Add resilience to near-match use of params
• Build directly on current implementation
  • Identify high-level commands (from component behaviors)
  • Identify executable as a variant of some family
  • Apply to other bots or to other malware
  • Broaden definition of untrusted sources (files, USB drives)
• Big changes:
  • Move tainting to lower level (e.g. via x86 emulator)
  • Improve resilience to OOB copies via static analysis
  • Track explicitly-clean data; sanitize uses of it

Our mechanism: review
• Single behavioral meta-signature detects a wide variety of behaviors on the majority of Win32 bots
• Resilient to differences in implementation
• Resilient in the face of unconstrained OOB copies
• Resilient to encryption, with some constraints
• Resilient to changes in command-and-control protocol (e.g. from IRC to HTTP) and parameters (e.g. for rendezvous point)

Additional sections
• Motivation, general issues and observations
• Implementation details
• More results + false negatives/positives

Motivation, general issues
• Bots as a research problem
• Exploiting reliance on ongoing C&C
• Network- vs. host-based approaches
• HIPS issues

Bots as a research problem
1. At point of infection
  • by other bots which scan for vulnerabilities then spread (like a worm)
  • as stage N of a virus’s/worm’s payload/script
  • unwittingly by users
2. During lifetime on host
3. @ …
Observation: (1) and (3) well explored already; let’s look at (2).

Host-based approaches: anti-{virus,worm} vs. anti-bot
• Much anti-malware research: how to tell when a good program has been coerced into behaving badly/out-of-spec (viruses, worms)
  • executes impossible sequence of syscalls
  • contains sequence of bytes (virus signature)
  • is transferring control to a remotely-supplied address
• Bots pose a slightly different problem: how to tell a good program from a bad one in a way that is useful and robust?

General issues for a host-based IPS
• How can a program evade detection while operating within the confines of the sandbox?
• How can a program escape the sandbox?
  • Not hard. But it wouldn’t be much harder with a kernel-space interposition mechanism either.
• How can a program attack the sandbox?
  • State-holding attacks
• Solutions(?): implement detection mechanism as part of a VMM
  • Sophisticated versions of the bots we looked at ship with anti-emulator, anti-VM, anti-debugger mechanisms.

Implementation: additional info
• Basic idea
• Interposition mechanism
• Types of tainted things
• Tainting functions:
  • Taint instantiators
  • Taint propagators
• Deciding whether a memory region is tainted
• Taint checkers
• Gates

Basic idea
• Taint data from untrusted sources (e.g. network sockets)
• Transitively taint memory regions to which tainted data propagates
• Also keep track of tainted strings (the contents of a tainted memory region)
• At gates (selected syscalls), check whether certain arguments are tainted
  • In some cases, whether a numeric value is a known tainted value

User-space interposition
• API interposition via detours library
  • Hunt/Brubacher, MSR
  • http://research.microsoft.com/sn/detours/
• Overwrite first 5 bytes of memory image of target function with an unconditional jump to replacement function
• Don’t need source; no changes to target binary
• Applies to calls made by calls
• Link time irrelevant
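The 5-byte overwrite mentioned above corresponds to the x86 `JMP rel32` instruction: opcode `0xE9` followed by a 32-bit little-endian displacement measured from the end of the jump. The sketch below only illustrates that encoding; it assumes 32-bit x86 and uses a `bytearray` as a stand-in for process memory. The function names are hypothetical, and a real interposition library also has to preserve the overwritten instructions in a trampoline, which is omitted here.

```python
import struct

JMP_REL32 = 0xE9   # x86 opcode for a near relative jump
PATCH_LEN = 5      # 1 opcode byte + 4 displacement bytes


def encode_jmp(patch_addr, dest_addr):
    """Return the 5 bytes of `jmp dest_addr` when placed at patch_addr."""
    # The displacement is relative to the address of the NEXT instruction.
    rel = dest_addr - (patch_addr + PATCH_LEN)
    if not -2**31 <= rel < 2**31:
        raise ValueError("destination out of rel32 range")
    return struct.pack("<Bi", JMP_REL32, rel)


def install_detour(image, target_off, detour_off):
    """Patch `image` (a bytearray standing in for memory) and return the saved bytes."""
    saved = bytes(image[target_off:target_off + PATCH_LEN])
    image[target_off:target_off + PATCH_LEN] = encode_jmp(target_off, detour_off)
    return saved
```

The saved bytes are what a trampoline would need in order to run the original function prologue before jumping back into the target.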

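Pictorially (courtesy of the detours authors)
[Figure: Before: Start calls Target (1. Call); Target returns (2. Return). After: Start calls Target (1. Call); Target jumps to Detour (2. Jump); Detour calls Trampoline (3. Call); Trampoline jumps to Target (4. Jump); then 5. Return and 6. Return back up the chain.]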

Two types of tainted things
• Memory regions: [addy, addy + len) tuples
  • Plus info for false positive mitigation (to handle the case where some tainted memory region is written to OOB with untainted data)
• Values: strings, numbers
  • Why strings? Resilience against OOB copies
  • Why numbers? atoi (port #s, PIDs), ntohs/htons (IPs, port #s); numeric values used in pass-by-value manner
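One way to represent the half-open `[addy, addy + len)` tuples is sketched below. The `Region` class and its method names are hypothetical, chosen for illustration; the only thing taken from the slide is the half-open interval convention, which makes adjacent regions non-overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Half-open tainted memory region [base, base + length)."""
    base: int
    length: int

    @property
    def end(self) -> int:
        return self.base + self.length

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.end

    def overlaps(self, other: "Region") -> bool:
        # Empty regions overlap nothing; otherwise two half-open intervals
        # overlap iff each one starts before the other ends.
        if self.length <= 0 or other.length <= 0:
            return False
        return self.base < other.end and other.base < self.end
```

With this convention, `Region(100, 10)` covers addresses 100 through 109, so it overlaps `Region(109, 5)` but not `Region(110, 5)`.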

Taint instantiation: network receive functions
recv(addy) → len
for every $w_i \in [addy, addy + len)$: $S_0 \leftarrow S_0 \setminus \{w_i\}$, $S_1 \leftarrow S_1 \cup \{w_i\}$

Taint instantiators: network receive functions
• NtDeviceIoControlFile: asynchronous
• So: recv(…), recvfrom(…), WSARecv(…), WSARecvFrom(…), WSARecvDisconnect(…), …
• Do two things here:
  • Create cached copy of the received buffer (used in false positive mitigation, described later)
  • Add this [base, base + bounds) pair to tainted addresses

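Taint propagation
Copy X → Y. Is $X \in S_1$?
• YES: $S_0 \leftarrow S_0 \setminus \{Y\}$, $S_1 \leftarrow S_1 \cup \{Y\}$
• NO: $S_0 \leftarrow S_0 \cup \{Y\}$, $S_1 \leftarrow S_1 \setminus \{Y\}$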

Taint propagators: 3 types
• Take character buffer input, output character buffer that is sub- or super-set of input
  • buffer-formatting functions, mem-copying functions, realloc, …
• Take character buffer, output numeric value
  • atoi, gethostbyname, …
• Take character buffer
  • Improves resilience to OOB copies
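The first two propagator types can be illustrated with hooked stand-ins for a memory-copying function and for `atoi`. This is a simplified sketch under several assumptions: memory is a Python `bytearray`, tainted regions are `(base, length)` pairs, and the `Taint` class and `hooked_*` names are hypothetical. The numeric conversion uses Python's `int()`, which is stricter than C's `atoi`.

```python
class Taint:
    """Hypothetical taint store: tainted regions plus tainted numeric values."""

    def __init__(self):
        self.regions = []     # list of (base, length), half-open
        self.numbers = set()  # numeric values derived from tainted buffers

    def region_tainted(self, base, length):
        return any(base < b + n and b < base + length for b, n in self.regions)


def hooked_memcpy(mem, taint, dst, src, n):
    """Type 1: buffer in, buffer out. Destination inherits the source's taint."""
    mem[dst:dst + n] = mem[src:src + n]
    if taint.region_tainted(src, n):
        taint.regions.append((dst, n))


def hooked_atoi(mem, taint, src, n):
    """Type 2: buffer in, number out. The resulting value is remembered as tainted."""
    value = int(bytes(mem[src:src + n]).decode("ascii"))
    if taint.region_tainted(src, n):
        taint.numbers.add(value)
    return value
```

Remembering the numeric value itself is what lets a later gate check recognize, say, a port number that was parsed from received data and then passed by value.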

Deciding whether a memory region is tainted
1. Overlaps with a tainted memory region?
  • Yes: perform false positive mitigation; if it succeeds, add src to tainted strings
  • No: go to 2
2. Is a tainted string?
  • Yes: transitively taint source memory region
3. For correlative semantics only: are the contents of the memory region a subset of some data received over the network?
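The three-step decision above might look roughly like the following. Several details are assumptions made for the sake of a runnable sketch: the slides do not spell out how false positive mitigation works, so here it is modeled as "the overlapping bytes still match the cached copy of what was received"; a failed mitigation falls through to step 2; and "subset" is read as "contiguous substring". All names (`TaintState`, `mitigation_passes`, `is_tainted`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaintState:
    regions: dict = field(default_factory=dict)   # (base, length) -> cached bytes
    strings: set = field(default_factory=set)     # tainted string values
    received: list = field(default_factory=list)  # every buffer received remotely


def mitigation_passes(mem, base, length, tbase, cached):
    """Assumed mitigation: overlapping bytes must still equal the cached copy."""
    lo = max(base, tbase)
    hi = min(base + length, tbase + len(cached))
    return bytes(mem[lo:hi]) == cached[lo - tbase:hi - tbase]


def is_tainted(mem, base, length, state, correlative=False):
    contents = bytes(mem[base:base + length])
    # Step 1: overlap with a known tainted region, then false positive mitigation.
    for (tbase, tlen), cached in list(state.regions.items()):
        overlaps = base < tbase + tlen and tbase < base + length
        if overlaps and mitigation_passes(mem, base, length, tbase, cached):
            state.strings.add(contents)
            return True
    # Step 2: contents equal a known tainted string (catches out-of-band copies).
    if contents in state.strings:
        state.regions[(base, length)] = contents  # transitively taint this region
        return True
    # Step 3 (correlative semantics only): contents appear in some received data.
    if correlative and contents:
        return any(contents in buf for buf in state.received)
    return False
```

Step 2 is where the tainted-string set pays off: a copy made by code the interposition layer never saw is still recognized by value.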

Taint checking on arg X to gate G
Is $X \in S_1$?
• YES: FLAG
• NO: nop
Then execute G.

Gate functions
• Process management: tainted filenames, proc names, PIDs
  • tainted program execution
  • tainted process termination
• File management: tainted filenames
  • tainted file open
  • tainted file create
• Network interaction:
  • binding tainted IP or port
  • connecting to tainted IP or port
  • sendto/WSASendTo tainted IP or port
  • tainted HttpSendRequest
  • tainted IcmpSendEcho
  • tainted send
  • derived send
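A gate table can be thought of as a mapping from intercepted API names to the behavior reported when one of the checked arguments is tainted. The mapping below is illustrative only: the behavior labels come from the list above, but which API maps to which label, and the idea of checking arguments against a set of tainted values, are assumptions for this sketch. `GATES` and `check_gate` are hypothetical names.

```python
# Illustrative (assumed) mapping from API name to reported behavior.
GATES = {
    "CreateProcessA": "tainted program execution",
    "WinExec": "tainted program execution",
    "bind": "binding tainted IP or port",
    "connect": "connecting to tainted IP or port",
    "sendto": "sendto/WSASendTo tainted IP or port",
    "WSASendTo": "sendto/WSASendTo tainted IP or port",
    "HttpSendRequest": "tainted HttpSendRequest",
    "IcmpSendEcho": "tainted IcmpSendEcho",
}


def check_gate(api, args, tainted_values):
    """Return the behavior label if `api` is a gate and any argument is tainted."""
    behavior = GATES.get(api)
    if behavior is None:
        return None  # not a gate: nothing to check
    if any(arg in tainted_values for arg in args):
        return behavior
    return None
```

For example, with `tainted_values = {"10.0.0.5", 6667}`, a `connect` call to `"10.0.0.5"` is reported as "connecting to tainted IP or port", while a `connect` to an address the bot did not receive remotely is not.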

Results: additional info
• Results for ago, dsnx, evil, g-sys, sd, spy
• False negatives
• False positives

agobot – spamming, lots of DDoS, redirect options

Spybot 1.3 (4/5/03) – richest file mgmt command set

False negatives
• In all cases, still detected the associated command
  • Many commands match more than one behavior
• HttpSendRequest
  • Private StringBuffer implementation in wininet.dll used to craft the actual HTTP message
• Spybot downloading file
  • Missed because there is no proactive tokenizing of tainted strings (e.g. creating a tainted hostname and tainted filepath string upon addition of a tainted URL)
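The proactive tokenizing mentioned in the last bullet could be sketched as follows: when a URL is added to the tainted strings, its hostname and file path are added as well, so that a later gate call using only one of those pieces is still recognized. This is a hypothetical illustration of the idea, not something the described system does; `add_tainted_url` is an invented name, and the URL in the usage note is a made-up example.

```python
from urllib.parse import urlsplit


def add_tainted_url(tainted_strings, url):
    """Add a URL and its hostname and path components to the tainted-string set."""
    tainted_strings.add(url)
    parts = urlsplit(url)
    if parts.hostname:
        # Note: urlsplit reports the hostname in lowercase.
        tainted_strings.add(parts.hostname)
    if parts.path:
        tainted_strings.add(parts.path)
```

After `add_tainted_url(s, "http://files.example.com/pub/update.exe")`, the set `s` also contains `"files.example.com"` and `"/pub/update.exe"`.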

False positives
• Definition: external control heuristic incorrectly flagged
• Doesn’t necessarily mean the flagged action was non-malicious
  • In fact in all cases, flagged behavior was part of the implementation of some command (i.e. was malicious)
• Bounded by the number of commands that make calls to gate functions without taking params for those functions

“the most sophisticated piece of mal-ware we’ve seen…” (June ’06, APWG)
“there are more than 3,000 variations of the Spybot worm” (Jan ’05, Kevin Hogan, Symantec)
