1 / 40

Host-based, Run-time Win32 Bot Detection

Host-based, Run-time Win32 Bot Detection. Liz Stinson John Mitchell Stanford University. Exploit botnet characteristic: ongoing command-and-control. Network-based approaches: Filtering (protocol, port, host, content-based) Look for traffic patterns (e.g. DynDNS – Dagon)

clarice
Download Presentation

Host-based, Run-time Win32 Bot Detection

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Host-based, Run-time Win32 Bot Detection Liz Stinson John Mitchell Stanford University

  2. Exploit botnet characteristic: ongoing command-and-control • Network-based approaches: • Filtering (protocol, port, host, content-based) • Look for traffic patterns (e.g. DynDNS – Dagon) • Hard (encrypt traffic, permute to look like ‘normal’ traffic, …); botwriters control the arena. • Host-based approaches: • Ours: Have more info at host level. Since the bot is controlled externally, use this meta-level behavioral signature as basis of detection

  3. Our approach • Look at the syscalls made by a program • In particular at certain of their args – our sinks • Possible sources for these sinks: • local: { mouse, keyboard, file I/O, … } • remote: { network I/O } • An instance of external control occurs when data from a remote source reaches a sink • Surprisingly works really well: for all bots tested (ago, dsnx, evil, g-sys, sd, spy), every command that exhibited external control was detected

  4. Big picture S O U R C E S Remote sources Local sources mouse file I/O network I/O keyboard ? ? ? ? S I N K S CreateProcessA(…) NtCreateFile(…) bind(…) ...

  5. Design Init CLEAN (S_0) Taint instantiators Taint propagators TAINTED (S_1) Taint checker on gate G execute G ERROR

  6. 2 modes • Cause-and-effect semantics: • Tight relationship between receipt of some data over network and subsequent use of some portion of that data in a sink • Correlative semantics: looser relationship • Use of some data that is the same as some data received over the network • Why necessary?

  7. Behaviors: ideally disjoint; @ lowest level in call stack MoveFile{Ex}{A,W}, MoveFileWithProgress{Ex}{A,W}, DeleteFile{A,W}, ReplaceFile{A,W}, Win32DeleteFile, ... CreateFile{A,W}, OpenFile,CopyFile{Ex}{A,W},fopen, _open, _lopen, _lcreat, ... ShellExecute{Ex}{A,W}, CreateProcess{A,W}, WinExec send, sendto, WSASend, WSASendTo

  8. Results • Looked at 6 bots: agobot, dsnxbot, evilbot, g-sysbot, sdbot, spybot • At least 4 have totally indep code bases • g-sys non-trivially extends sd • Spybot borrows only syn flood implem from sd • Wide variation in implementation • Every cmd that exhibited external control detected; almost every instance external control flagged (3 false negatives)

  9. create proc connect port kill proc connect IP file create sendto port sendto IP file open tainted send bind IP

  10. Correlative semantics • Why necessary • Why bots with C library functions statically linked in ~= unconstrained OOB copies • In general almost as good as cause-and-effect semantics (stat vs. dyn link) • Exceptions: cmds that format recv’d params (e.g. via sprintf)

  11. connect port create process file create connect IP sendto port kill process tainted send sendto IP file open derived send

  12. agobot cause-and-effect (dyn) vs. correlative semantics (stat) – continued tainted send derived send connect port connect IP sendto IP bind port bind IP

  13. Benign program testing • Tested against some benign programs that interact with the network • Firefox, mIRC, Unreal IRCd • 3 contextual false positives • IRCd: sent on X heard on Y • Firefox: dereferencing embedded links • Artificial false positives: quite a few • mIRC: DCC capabilities • Firefox: saving contents to a file, …

  14. False positives • contextual false positives – not present in bots • external control heuristic correctly detected but these actions under these circumstances widely accepted as non-malicious • artificial false positives – not present in bots • def of external control implies no user input agreeing to particular behavior • but we don’t track “explicitly clean” data (that received via kb, mouse) • spurious false positives • any other incorrect flagging of external control

  15. Evading detection • Don’t do anything: dormant bots not detected • Convert parameterized bot commands to take no params (occurs at cost of granularity of control) • Encrypt using private crypto fxns • Maybe we can do better here. • Get out of the detours sandbox (next slide) • Write commands to file; read back into memory in way that defeats our OOB-resilience mechs • Different implem would render this obsolete

  16. Future directions • Minor enhancements • Add resilience to near-match use of params • Build directly on current implementation • Identify high-level cmds (from component behavs) • Identify executable as a variant of some family • Apply to other bots or to other malware • Broaden def of untrusted sources (files, USB drives) • Big changes: • Move tainting to lower level (e.g. via x86 emulator) • Improve resilience to OOB copies via stat anal • Track explicitly-clean data; sanitize uses of it

  17. Our mechanism – review • Single behavioral meta-signature detects wide variety of behaviors on majority of Win32 bots • Resilient to differences in implementation • Resilient in face of unconstrained OOB copies • Resilient to encryption – w/some constraints • Resilient to changes in command-and-control protocol (e.g. from IRC to HTTP) and parameters (e.g. for rendezvous point)

  18. Addt’l sections • Motivation, general issues and observations • Implementation details • More results + false negatives/positives

  19. Motivation, general issues • Bots as a research problem • Exploiting reliance on ongoing C&C • Network- vs. Host-based approaches • HIPS issues

  20. Bots as a research problem • @point of infection • by other bots which scan for vulns then spread (like a worm) • as stage N of a virus’s/worm’s payload/script • unwittingly by users • @lifetime on host • @ Observation: (1) and (3) well explored already; let’s look at (2).

  21. Host-based approaches: anti-{virus,worm} vs. anti-bot • Much anti-malware research: how to tell when a good program has been coerced into behaving badly/out-of-spec (viruses, worms) • executes impossible sequence of syscalls • contains sequence of bytes (virus sig) • is xferring control to a remotely-supplied addr • Bots pose a slightly different problem: how to tell a good program from a bad one in a way that is useful and robust?

  22. General issues for a host-based IPS • How can program evade detection while operating within confines of sandbox? • How can program escape the sandbox? • Not hard. But wouldn’t be so much harder with a kernel-space interposition mechanism either. • How can program attack the sandbox? • State-holding attacks Solutions(?): implement detect mech as part of VMM Sophisticated versions of the bots we looked at ship with anti-emulator, anti-VM, anti-debugger mechs.

  23. Implementation – addt’l info • Basic idea • Interposition mechanism • Types of tainted things • Tainting functions: • Taint instantiators • Taint propagators • Deciding whether a memory region is tainted • Taint checkers • Gates

  24. Basic idea • Taint data from untrusted sources (e.g. network sockets) • Transitively taint memory regions to which tainted data propagates • Also keep track of tainted strings (the contents of a tainted memory region) • At gates (selected syscalls), check whether certain arguments are tainted • In some cases, whether a numeric value is a known tainted value

  25. User-space interposition • API interposition via detours library • Hunt/Brubacher, MSR • http://research.microsoft.com/sn/detours/ • Overwrite first 5 bytes of memory image of target function with an unconditional jump to replacement function • Don’t need src; no changes to target binary • Applies to calls made by calls • Link time irrelevant

  26. Pictorally – c/o detours folks Before: 1. Call Start Target 2. Return After: 1. Call 2. Jump 3. Call 4. Jump Start Target Detour Trampoline Target 6. Return 5. Return

  27. Two types of tainted things • Memory regions:[addy, addy + len) tuples • Plus info for false positive mitigation (to handle case where some tainted memory region written to OOB with untainted data) • Values: strings, numbers • Why strings? Resilience against OOB copies • Why numbers?atoi (port #s, PIDs), ntohs/htons (IPs, port #s); numeric values used in pass-by-value manner

  28. Taint instantiation: network receive functions recv( addy )  len for every w_i ε [addy, addy + len) S_0 −= w_i S_1 U= w_i

  29. Taint instantiators: network receive functions • NtDeviceIoControlFile : asynch • So: recv(…), recvfrom(…), WSARecv(…), WSARecvFrom(…), WSARecvDisconnect(…), … • Do two things here: • Create cached copy of the received buffer (used in false positive mitigation – later) • Add this [base, base + bounds) pair to tainted addrs

  30. Taint propagation Copy X  Y X ε S_1 ? NO YES S_0 −= Y S_1 U= Y S_0 U= Y S_1 −= Y

  31. Taint propagators: 3 types • Take character buffer input, output character buffer that is sub- or super-set of input • buffer-formatting fxns, mem-copying fxns, realloc, … • Take character buffer, output numeric value • atoi, gethostbyname, … • Take character buffer • Improves resilience to OOB copies

  32. Deciding whether amemory region is tainted • Overlaps with a tainted memory region? • Yes: perform false positive mitigation • if succeeds, add src to tainted strings • No: goto 2 • Is a tainted string? • Yes: transitively taint source mem region • For correlative semantics only: are the contents of the memory region a subset of some received over network?

  33. Taint checking on arg X to gate G X ε S_1 ? NO YES FLAG nop Execute G

  34. Gate functions • Process management: tainted filenames, proc names, PIDs • tainted program execution • tainted process termination • File management: tainted filenames • tainted file open • tainted file create • Network interaction: • binding tainted IP or port • connecting to tainted IP or port • sendto/WSASendTo tainted IP or port • tainted HttpSendRequest • tainted IcmpSendEcho • tainted send • derived send

  35. Results – addt’l info • Results for ago, dsnx, evil, g-sys, sd, spy • False negatives • False positives

  36. agobot – spamming, lots of DDoS, redirect options

  37. Spybot 1.3 (4/5/03) – richest file mgmt command set

  38. False negatives • In all cases, still detected associated cmd • Many commands match more than one behavior • HttpSendRequest • Private StringBuffer implementation in wininet.dll used to craft actual HTTP message • Spybot downloading file • Since no proactive tokenizing of tainted strings (e.g. creating a tainted hostname and tainted filepath string upon addition of a tainted URL)

  39. False positives Def: external control heuristic incorrectly flagged • Doesn’t necessarily mean flagged action non-malicious • In fact in all cases, flagged behavior was part of implementation of some command (i.e. was malicious) • Bounded by # of cmds that make calls to gate functions without taking params for those fxns

  40. “the most sophisticated piece of mal-ware we’ve seen…” (June ’06, APWG) “there are more than 3,000 variations of the Spybot worm” (Jan ’05, Kevin Hogan, Symantec)

More Related