Virus vs Anti-Virus: The Arms Race

Virus vs Anti-Virus: The Arms Race • Patrick Graydon • Qiuhua Cao

Outline • Viruses • Anti-Viruses • Discussion

Viruses • A virus is “a program that can ‘infect’ other programs by modifying them to include a possibly evolved copy of itself.” - Fred Cohen • Fred Cohen seems to have been the first to define the term virus, but the concept had been discussed earlier and there were some viruses out in the wild before he began his research. • Link to virus history

Example of a virus • In his 1984 Turing award acceptance speech to the ACM, Ken Thompson related the story of how he modified the C compiler to insert a backdoor into the UNIX login program and to insert his modifications into any C compiler compiled using his modified compiler. • Slick—no trace of the backdoor remains in any source code!

Viruses example • The WM.Nuclear Microsoft Word macro virus infects Word documents during opening, saving, and printing by adding a set of macros to them. On April 5th it attempts to overwrite critical system files, and it occasionally adds the text "STOP ALL FRENCH NUCLEAR TESTING IN THE PACIFIC!" to the current document. (Information from Symantec’s security bulletin.)

Worms are not viruses • The VBS.SST@mm “Anna Kournikova” malware is a worm, not a virus, because it e-mails copies of itself but does not infect any other documents. (Information about VBS.SST@mm from Symantec’s security bulletin.)

Malware terminology • We found a web site listing 56 different terms related to viruses and malware, including: • Backdoor • Boot sector virus • Encrypted virus • Hoax • Macro virus • …

Virus statistics • Here are some statistics from 2000 we found on the web: • Over 85% of all the known viruses are for Microsoft platforms (nearly all the self-propagating worms are as well) • Slightly less than 52,000 are viruses for DOS/Windows/NT platforms - about 6000 of these are Word macro viruses - about 150-200 of these are known to be widespread "in the wild" - in 1999, approximately 650 new viruses were reported each month (more than 20 a day)

Virus statistics • More statistics from the same site • A few hundred are for JavaScript, HyperCard, Perl, and other scripting languages. Few of these can spread beyond a few machines without active support of the users • 150 are for the Atari • 31 are native to the Macintosh, and only two of them are still known to exist • 2 or 3 are viruses native to OS/2

Virus statistics (cont’d) • More statistics from the same site • About 5 are for Linux/Unix/etc, but none have been found in quantity "in the wild", nor would they be likely to spread very far if they were "loose" • None are for BeOS, ErOS, or other small-population systems. • Question: can we reduce the risk of getting a virus infection by not using Microsoft products?

Example virus • Fred Cohen’s example virus:

    program virus :=
    { 1234567;
      subroutine infect-executable :=
      { loop: file = get-random-executable-file;
        if first-line-of-file = 1234567 then goto loop;
        prepend virus to file;
      }
      subroutine do-damage :=
      { whatever damage is to be done }
      subroutine trigger-pulled :=
      { return true if some condition holds }
      main-program :=
      { infect-executable;
        if trigger-pulled then do-damage;
        goto next;
      }
      next:
    }

More about viruses • Viruses aren’t necessarily hard to write • Cohen reports that his first virus took only 8 hours for an experienced programmer to write. • Viruses aren’t necessarily big • Cohen reports on a UNIX shell script virus that was only 7 lines long

Viruses aren’t necessarily malware • Cohen describes a hypothetical virus that compresses executables to conserve disk space.

Viruses can be malicious in many ways • Virus payloads could: • Carry out a denial of service attack • Crash the machine • Randomly destroy data • Install a trojan horse program • Perform password cracking • … and basically any other nasty thing you can think of.

Making matters worse… • Virus payloads may not trigger immediately. If a virus has few detectable side effects, it could spread without notice and become widespread before the payload is triggered. • Question: is it possible that there are viruses in the wild today that have infected large numbers of systems but have gone unnoticed because they have few if any side effects and have not yet triggered their destructive payloads?

Isolation • One way to protect against infection is to isolate systems, users, and/or information to make it difficult or impossible for a virus to spread widely. • Total isolation is a sure cure. • Total isolation probably isn't practical for most users… • Imagine life without Google … without BitTorrent … without Amazon.com …

Partitioning • If we can’t isolate systems and users from each other completely, maybe we can erect partitions to limit the spread of malware. • It was thought that the Bell-LaPadula model might help limit the spread of viruses, but Cohen reports that “viruses demonstrated the ability to cross user boundaries and move from a given security level to a higher security level.”

Partitioning (continued) • According to Cohen, the Biba and Bell-LaPadula models, if combined, would tend to create partitions. • Unfortunately: “When we mix the Biba and Bell-LaPadula models, we find that the resulting isolationism secures us from viruses, but doesn’t permit any user to write programs that can be used throughout the system.” – Cohen

Bad news about partitioning • Transitivity is a problem: • “If there is a path from user A to user B, and there is a path from user B to user C, then there is a path from user A to user C with the witting or unwitting cooperation of user B.” – Cohen • The military uses a category system in which users can only access information needed for their current duties. But, some users have simultaneous access to multiple categories…
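The transitivity problem can be illustrated as plain graph reachability. The sketch below is only an illustration, not Cohen's formal model: the `shares` map and the user names are made up, and an edge from one user to another simply means "information can flow this way."

```python
from collections import deque

def reachable(shares, start):
    """Return every user that information starting at `start` can reach.

    `shares` maps a user to the users they pass information to
    (a hypothetical representation chosen for this sketch).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for peer in shares.get(user, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# A shares with B, B shares with C; A never shares with C directly.
shares = {"A": ["B"], "B": ["C"]}
reached_from_a = reachable(shares, "A")
```

Even though A and C never interact, C ends up in `reached_from_a` because B sits on the path between them.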

More bad news… • According to Cohen, “a precise system for integrity is NP-complete” and “any non-NP complete solution must tend toward isolationism.” • If a system restricts users’ actions unnecessarily, it will be unpopular…

And the hits just keep on coming… • Cohen notes that flow distance and flow list models may limit virus spread. • Flow distance restrictions limit how far information can travel. • Flow lists allow more arbitrary expressions for accessibility based on the list of users who have had an effect on an object. • BUT: “tracing exact information flow requires NP-complete time, and maintaining markings requires large amounts of space.”
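A rough sketch of the two ideas, under simplifying assumptions that are ours rather than Cohen's: every object carries a distance (how many derivation steps separate it from original data) and a flow list (every user who has affected it). The names `Obj`, `create`, `derive`, and `may_read` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    distance: int          # derivation steps away from original data
    flow_list: frozenset   # users who have had an effect on this object

def create(user):
    """Original data: distance 0, affected only by its author."""
    return Obj(0, frozenset({user}))

def derive(user, inputs):
    """Object written by `user` after reading `inputs` (at least one)."""
    distance = max(o.distance for o in inputs) + 1
    flow_list = frozenset({user}).union(*(o.flow_list for o in inputs))
    return Obj(distance, flow_list)

def may_read(obj, max_distance, trusted):
    """Allow access only within the distance limit and the trusted set."""
    return obj.distance <= max_distance and obj.flow_list <= trusted

a = create("alice")
b = derive("bob", [a])
c = derive("carol", [b])
trusted = frozenset({"alice", "bob"})
```

With a distance limit of 1 and this trusted set, `may_read(b, 1, trusted)` is allowed while `may_read(c, 1, trusted)` is refused. Note that every object has to carry its markings along, which hints at the space cost Cohen mentions.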

Prevention by law • Couldn’t we just make it against the law? “By simply telling users not to launch attacks, little is accomplished; users who can be trusted will not launch attacks; but users who would do damage cannot be trusted, so only legitimate work is blocked.” - Cohen

Limited interpretation • If a given document is only ever interpreted, and the interpreter lacks commands like “write file,” it may be impossible for the document to carry a virus • Graphics files are probably immune • Except AnnaKournikova.jpg.vbs (a script, not an image) • Documents that can hold scripts probably aren’t immune • Word documents can contain macro viruses such as WM.Nuclear

Detection • If we can’t limit the spread of a virus, maybe we can find it and quarantine infected files… • Unfortunately, no general algorithm for detecting virus behavior is possible. • Cohen argues this by proposing a virus that infects only when the detection algorithm thinks it isn’t a virus. • Anti-virus programs must make do with more limited solutions, such as scanning for a virus signature.
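The shape of Cohen's argument can be sketched in a few lines. This is an abstraction, not his proof: a "program" here is just a function that returns True if it would behave as a virus (nothing is actually infected), and `make_contradictory` and `detector_is_wrong` are hypothetical names.

```python
def make_contradictory(detector):
    """Build a program that does the opposite of what `detector` predicts."""
    def program():
        # Behave as a virus only if the detector says "not a virus".
        return not detector(program)
    return program

def detector_is_wrong(detector):
    """Check the detector's verdict against the program's actual behavior."""
    program = make_contradictory(detector)
    verdict = detector(program)   # what the detector claims
    actual = program()            # what the program really does
    return verdict != actual

# Two trivial candidate detectors (assumed here for illustration).
flags_everything = lambda program: True
flags_nothing = lambda program: False

results = [detector_is_wrong(flags_everything), detector_is_wrong(flags_nothing)]
```

Any detector that returns a consistent verdict for the same program is wrong about the program built against it, which is why practical scanners settle for narrower techniques.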

Virus detection problems • According to Cohen, the following are undecidable: • Detection of a virus by its appearance • Detection of a virus by its behavior • Detection of an evolution of a known virus • Detection of a triggering mechanism by its appearance • Detection of a triggering mechanism by its behavior • Detection of an evolution of a known triggering mechanism • Detection of a virus detector by its appearance • Detection of a virus detector by its behavior • Detection of an evolution of a known viral detector

Detection by signature • Rather than implement a general solution, virus scanners look for virus signatures. • These signatures could be as small as a few bytes or as large as the entire virus code. • If a virus scanner uses the whole virus code as a signature, it may not be able to find simple variants of a virus. • However, if a virus scanner uses a very small signature, it may incorrectly report infections that aren’t there.
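The trade-off in signature length can be seen with a plain byte-substring search. This is a minimal sketch, not how any particular product works; the byte strings are harmless placeholders invented for the example.

```python
def scan(data: bytes, signature: bytes) -> bool:
    """Report an infection if the signature appears anywhere in the data."""
    return signature in data

# Harmless stand-ins for a known sample, a simple variant, and a clean file.
original = b"\x90\x90HELLO-MARKER-1234\xc3"
variant = b"\x90\x90HELLO-MARKER-9999\xc3"
clean = b"this clean file mentions HELLO by coincidence"

full_sig = original          # the whole sample as the signature
short_sig = b"HELLO"         # a very short signature
mid_sig = b"HELLO-MARKER-"   # something in between

missed_variant = not scan(variant, full_sig)   # whole-code signature misses the variant
false_positive = scan(clean, short_sig)        # tiny signature flags a clean file
```

The middle-length signature catches both the original and the variant without flagging the clean file, but only because this example was constructed that way; choosing such a signature in practice is the hard part.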

Updated signatures • Anti-virus companies must release new signatures each time a new virus is discovered • A virus’s spread is unimpeded for a while… • According to Andreas Marx of AV-Test.org, it took Symantec 25h 5m to release an updated signature file in response to the W32/Sober.C worm attack.

The arms race • In order to make it hard for virus scanners to detect their viruses, virus writers can add morphing behavior to their creations: • “A polymorphic virus ‘morphs’ itself in order to evade detection. … Metamorphic viruses attempt to evade heuristic detection techniques by using more complex obfuscations.” – Christodorescu and Jha

More bad news… • Cohen argues that no general solution for proving the equivalence of two programs is possible. • His argument follows the same form as his argument against a general algorithm for virus detection: he proposes a virus in which two different infection instances will behave differently when a watching antivirus program believes they are the same.

Morphing • A virus may morph itself by: • Encrypting part of itself using a different key for each infection • Changing variable names (in a script virus) • Binary obfuscation techniques (more on this later) • Polymorphic virus examples: • Chameleon -- first polymorphic virus, 90’s • A partial list of the viruses that can be called 100 percent polymorphic (late 1993): Bootache, CivilWar (four versions), Crusher, Dudley, Fly, Freddy, Ginger, Grog, Haifa, Moctezuma (two versions), MVF, Necros, Nukehard, PcFly (three versions), Predator, Satanbug, Sandra, Shoker, Todor, Tremor, Trigger, Uruguay (eight versions). – at link Virus-Scan-Software
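Why per-infection encryption defeats a fixed signature can be shown on harmless bytes. The sketch assumes a single-byte XOR as a stand-in for "encryption"; the body, keys, and signature are invented for the example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy 'encryption': XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)

body = b"EXAMPLE-BODY"       # harmless placeholder for the encrypted part
signature = b"EXAMPLE"       # signature taken from the plain body

copy_a = xor_bytes(body, 0x21)   # one "infection"
copy_b = xor_bytes(body, 0x5A)   # another, with a different key

matches = [signature in copy_a, signature in copy_b]
```

Neither copy contains the signature, and the two copies differ from each other, yet both decode back to the same body. A scanner therefore has to key on whatever stays constant, or decode the body first.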

Arming the virus writers • If a virus author knew what the anti-virus programs look for, he or she could design a virus that they wouldn’t find… • Example: in the early 90s there were a few MS-DOS 'stealth' viruses that could interrupt a virus-scanning program's attempt to read the boot record and show it a clean version rather than what was really there. • See Symantec’s description of the Stealth_boot virus. • "Frodo.4096" virus, first Stealth virus • "Beast.512" Stealth virus, less than a year after Frodo.4096 • More on this at Virus-Scan-Software

Extracting signatures • Christodorescu and Jha report on a technique for extracting the signature used by a given antivirus program. • Basically they obfuscate parts of the program and determine what has to remain unobfuscated for the antivirus program to find the virus. • FYI there is a typo in the paper: the conditions on the loop in the SignatureExtraction function cause it to never execute… • They say it “was successful in many cases.”
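The idea can be sketched with a black-box detector and a byte-by-byte probe. This is a much-simplified stand-in for the paper's procedure, which obfuscates program regions rather than flipping single bytes; `extract_signature` and `toy_detector` are hypothetical.

```python
def extract_signature(sample: bytes, detect) -> list:
    """Return the offsets the detector depends on to flag `sample`.

    Each byte is altered in turn; if detection stops, that byte is
    part of whatever the detector is matching.
    """
    if not detect(sample):
        raise ValueError("sample is not detected to begin with")
    needed = []
    for i in range(len(sample)):
        altered = sample[:i] + bytes([sample[i] ^ 0xFF]) + sample[i + 1:]
        if not detect(altered):
            needed.append(i)
    return needed

# Assumed toy detector: flags anything containing the bytes b"MARK".
toy_detector = lambda data: b"MARK" in data

offsets = extract_signature(b"xxMARKyy", toy_detector)
```

Here `offsets` covers exactly the four bytes of the marker. Real scanners may use several signatures or heuristics, so a one-byte-at-a-time probe like this would be far less reliable against them.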

Binary obfuscation techniques • The goal of binary obfuscation is to make it difficult to obtain an assembly-language description of a program from its raw bytes • You need to turn raw bytes back into assembly code before you can decompile • You can obfuscate by: • Garbage insertion (more in a minute) • Variable renaming • Code reordering • Encapsulating/encrypting code or data • …

x86 binary obfuscation • If you create unused regions in the executable and fill them with garbage bytes, the variable-length nature of the x86 instruction set can cause disassemblers to think that the legitimate instructions following the garbage are in fact operands. • You can use a conditional branch instruction to do an unconditional jump—disassemblers assume no garbage bytes at the target address or following the branch instruction.

Better obfuscation • Linn and Debray describe obfuscation using a branch function • This function in turn branches to another target depending on where it is called from. • This makes determining which parts of the program are real by following the branch instructions difficult. • The function can return to an instruction one or more bytes after the usual return point, opening up a region to insert more garbage bytes into.

Advances in disassembly • Kruegel, Robertson, Valeur and Vigna describe a disassembler that is able to correctly disassemble most instructions from a program obfuscated by the obfuscator Linn and Debray describe.

Disassembly in detail • Static analysis techniques • Linear sweep • GNU's objdump uses linear sweep • Gets confused by garbage bytes in unreachable areas • Recursive traversal following control flow • Drawback: indirect jumps • Doesn’t always “see” the whole binary • Speculative disassembly • Hybrid approach
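The difference between linear sweep and recursive traversal shows up even on a toy variable-length instruction set. The ISA below is invented for this sketch (it is not x86), and its JMP takes an absolute one-byte target.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, total length in bytes)
ISA = {0x01: ("NOP", 1), 0x02: ("JMP", 2), 0x03: ("PUSH", 2),
       0x04: ("MOV", 3), 0xFF: ("HALT", 1)}

def decode(code, addr):
    """Decode one instruction at addr, or return None if it is not valid."""
    entry = ISA.get(code[addr])
    if entry is None or addr + entry[1] > len(code):
        return None
    name, length = entry
    return name, length, code[addr + 1:addr + length]

def linear_sweep(code):
    """Decode from the first byte to the last, one instruction after another."""
    out, addr = [], 0
    while addr < len(code):
        ins = decode(code, addr)
        if ins is None:
            out.append((addr, "DB"))   # undecodable byte, emitted as data
            addr += 1
        else:
            out.append((addr, ins[0]))
            addr += ins[1]
    return out

def recursive_traversal(code, entry=0):
    """Decode only the addresses that control flow can actually reach."""
    seen, work = {}, [entry]
    while work:
        addr = work.pop()
        if addr in seen or not 0 <= addr < len(code):
            continue
        ins = decode(code, addr)
        if ins is None:
            continue
        name, length, operands = ins
        seen[addr] = name
        if name == "JMP":
            work.append(operands[0])    # follow the direct jump
        elif name != "HALT":
            work.append(addr + length)  # fall through to the next instruction
    return sorted(seen.items())

# JMP 3; one garbage byte (0x04, which looks like a MOV); PUSH 0x2A; HALT
code = bytes([0x02, 0x03, 0x04, 0x03, 0x2A, 0xFF])
sweep = linear_sweep(code)
traversal = recursive_traversal(code)
```

The sweep decodes the garbage byte as a MOV that swallows the real PUSH, while the traversal skips the garbage by following the jump. The traversal only works here because the jump target is stored in the instruction; an indirect jump would leave it with nowhere to go.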

Now for some good news • “This arms race is usually in favor of the de-obfuscator. The obfuscator has to devise techniques that transform the program without seriously impacting the run-time performance or increasing the binary's size or memory footprint while there are no such constraints for the de-obfuscator.” - Kruegel et al.

AV tool resistance to obfuscation • Christodorescu and Jha claim “the state of the art for malware detectors is dismal!” • They propose a testing technique and then use it to show that the tested virus scanners were not generally able to identify the sampled viruses when they were obfuscated by code reordering or encapsulation.

AV tool resistance to obfuscation (cont’d) • This doesn’t mean that these products aren’t capable of detecting morphing viruses—the viruses in the sample set did not perform these morphs in the wild. • This does mean that in order to protect against a new virus that is just a simple modification of one of these existing viruses the AV companies would have to release a new signature file.

Known clean system • Some virus detection techniques require you to start from a clean system. • DOS users used clean boot disks to defeat stealth viruses… • But is it always possible to get to a known clean state? • What if every UNIX vendor had been infected with Ken Thompson’s C compiler virus? Even their “clean” distribution media would be infected…

Discussion • Obfuscation vs. deobfuscation: who can win?

Discussion (cont’d) • Can anti-virus win in the future?

Questions? Thanks
