
Virus vs Anti-Virus: The Arms Race

Patrick Graydon Qiuhua Cao

Outline

Viruses Anti-Viruses Discussion

Viruses

"A virus is a program that can infect other programs by modifying them to include a possibly evolved copy of itself." - Fred Cohen

Fred Cohen seems to have been the first to define the term "virus", but the concept had been discussed earlier, and some viruses were already in the wild before he began his research.

Link to virus history

Example of a virus

In his 1984 Turing award acceptance speech to the ACM, Ken Thompson related the story of how he modified the C compiler to insert a backdoor into the UNIX login program and to insert his modifications into any C compiler compiled using his modified compiler.

Slick: no trace of the backdoor remains in any source code!

Viruses example

The WM.Nuclear Microsoft Word macro virus infects Word documents during opening, saving, and printing by adding a set of macros to them. On April 5th it attempts to overwrite critical system files, and it occasionally adds the text "STOP ALL FRENCH NUCLEAR TESTING IN THE PACIFIC!" to the current document. (Information from Symantec's security bulletin.)

Worms are not viruses

The VBS.SST@mm "Anna Kournikova" malware is a worm, not a virus, because it emails copies of itself but does not infect any other documents. (Information about VBS.SST@mm from Symantec's security bulletin.)

Malware terminology

We found a web site listing 56 different terms related to viruses and malware, including:

- Backdoor
- Boot sector viruses
- Encrypted virus
- Hoax
- Micro virus

Virus statistics

Here are some statistics from 2000 we found on the web:

- Over 85% of all the known viruses are for Microsoft platforms (nearly all the self-propagating worms are as well)
- Slightly fewer than 52,000 are viruses for DOS/Windows/NT platforms
  - about 6000 of these are Word macro viruses
  - about 150-200 of these are known to be widespread "in the wild"
  - in 1999, approximately 650 new viruses were reported each month (more than 20 a day)

Virus statistics

More statistics from the same site

- A few hundred are for JavaScript, HyperCard, Perl, and other scripting languages. Few of these can spread beyond a few machines without the active support of the users.
- 150 are for the Atari
- 31 are native to the Macintosh, and only two of them are known to exist anymore
- 2 or 3 are viruses native to OS/2

Virus statistics (cont'd)

More statistics from the same site

- About 5 are for Linux/Unix/etc., but none have been found in quantity "in the wild", nor would they be likely to spread very far if they were "loose"
- None are for BeOS, ErOS, or other small-population systems

Question: can we reduce the risk of getting a virus infection by not using Microsoft products?

Example virus

Fred Cohen's example virus:

program virus :=
{ 1234567;

  subroutine infect-executable :=
  { loop: file = get-random-executable-file;
    if first-line-of-file = 1234567 then goto loop;
    prepend virus to file;
  }

  subroutine do-damage :=
  { whatever damage is to be done }

  subroutine trigger-pulled :=
  { return true if some condition holds }

  main-program :=
  { infect-executable;
    if trigger-pulled then do-damage;
    goto next;
  }

  next:
}

More about viruses

Viruses aren't necessarily hard to write

- Cohen reports that his first virus took an experienced programmer only 8 hours to write.

Viruses aren't necessarily big

- Cohen reports on a UNIX shell script virus that was only 7 lines long.

Viruses aren't necessarily malware

- Cohen describes a hypothetical virus that compresses executables to conserve disk space.

Viruses can be malicious in many ways

Virus payloads could:

- Carry out a denial of service attack
- Crash the machine
- Randomly destroy data
- Install a trojan horse program
- Perform password cracking
- ... and basically any other nasty thing you can think of.

Making matters worse

Virus payloads may not trigger immediately. If a virus has few detectable side effects, it could spread without notice and become widespread before the payload is triggered.

Question: is it possible that there are viruses in the wild today that have infected large numbers of systems but have gone unnoticed because they have few if any side effects and have not yet triggered their destructive payloads?

Isolation

One way to protect against infection is to isolate systems, users, and/or information to make it difficult or impossible for a virus to spread widely. Total isolation is a sure cure.

Total isolation probably isn't practical for most users. Imagine life:

- without Google
- without BitTorrent
- without Amazon.com

Partitioning

If we can't isolate systems and users from each other completely, maybe we can erect partitions to limit the spread of malware. It was thought that the Bell-LaPadula model might help limit the spread of viruses, but Cohen reports that viruses demonstrated the ability to cross user boundaries and to move from a given security level to a higher security level.

Partitioning (continued)

According to Cohen, the Biba and Bell-LaPadula models, if combined, would tend to create partitions.

Unfortunately: "When we mix the Biba and Bell-LaPadula models, we find that the resulting isolationism secures us from viruses, but doesn't permit any user to write programs that can be used throughout the system." - Cohen
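A minimal sketch of why the combination partitions the system. It assumes, as a simplification not stated on the slide, that a single integer level serves as both the confidentiality level and the integrity level; the function names are illustrative only.

```python
def blp_allows(subject_level, object_level, action):
    """Bell-LaPadula (confidentiality): no read up, no write down."""
    if action == "read":
        return subject_level >= object_level
    if action == "write":
        return subject_level <= object_level
    raise ValueError(action)


def biba_allows(subject_level, object_level, action):
    """Biba (integrity): no read down, no write up."""
    if action == "read":
        return subject_level <= object_level
    if action == "write":
        return subject_level >= object_level
    raise ValueError(action)


def combined_allows(subject_level, object_level, action):
    """An access is permitted only if both models permit it."""
    return (blp_allows(subject_level, object_level, action)
            and biba_allows(subject_level, object_level, action))


LEVELS = range(3)  # hypothetical levels 0 (low) to 2 (high)
allowed = [
    (s, o, a)
    for s in LEVELS
    for o in LEVELS
    for a in ("read", "write")
    if combined_allows(s, o, a)
]
```

Under these assumptions every surviving entry in `allowed` has the subject and object at the same level, so each level becomes its own partition: nothing written at one level can be read or run at another.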

Bad news about partitioning

Transitivity is a problem:

"If there is a path from user A to user B, and there is a path from user B to user C, then there is a path from user A to user C with the witting or unwitting cooperation of user B." - Cohen

The military uses a category system in which users can only access information needed for their current duties. But some users have simultaneous access to multiple categories, and each such user is a potential path between them.
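The transitivity problem can be pictured as reachability in a sharing graph. The sketch below is only an illustration: the `shares` mapping and the user names are hypothetical, and an edge means "this user passes files or programs to that user".

```python
from collections import deque


def reachable_from(shares, start):
    """Return every user that information originating at `start` can reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for recipient in shares.get(user, ()):
            if recipient not in seen:
                seen.add(recipient)
                queue.append(recipient)
    seen.discard(start)  # report only the other users
    return seen


# A shares with B, and B shares with C; A never shares with C directly.
shares = {"A": ["B"], "B": ["C"], "C": []}
exposed = reachable_from(shares, "A")
```

Here `exposed` contains both B and C even though A and C never interact, which is the point of Cohen's remark: one user with access to two categories joins them.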

More bad news

According to Cohen, a precise system for integrity is NP-complete, and any non-NP-complete solution must tend toward isolationism.

If a system restricts users' actions unnecessarily, it will be unpopular.

And the hits just keep on coming

Cohen notes that flow distance and flow list models may limit virus spread.

- Flow distance restrictions limit how far information can travel.
- Flow lists allow more arbitrary expressions for accessibility based on the list of users who have had an effect on an object.
- BUT: tracing exact information flow requires NP-complete time, and maintaining markings requires large amounts of space.
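A rough sketch of a flow distance restriction, under a simplified reading of the idea: each holder of some information records how many sharing hops it is from the origin, and a flow is refused if it would exceed a fixed limit. The `try_flow` helper, the `limit` parameter, and the user names are assumptions made for illustration, not Cohen's formal definition.

```python
def try_flow(distance, src, dst, limit):
    """Pass information from src to dst unless it would travel too far.

    `distance` maps each holder to its hop count from the origin.
    Returns True if the flow was allowed and recorded.
    """
    new_distance = distance[src] + 1
    if new_distance > limit:
        return False  # refused: information would exceed the flow distance
    distance[dst] = max(distance.get(dst, 0), new_distance)
    return True


distance = {"alice": 0}  # alice is the origin
log = [
    try_flow(distance, "alice", "bob", limit=1),  # one hop: allowed
    try_flow(distance, "bob", "carol", limit=1),  # two hops: refused
]
```

Even this toy version has to keep a marking for every holder, which hints at the space cost the slide mentions.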

Prevention by law

Couldn't we just make it against the law? "By simply telling users not to launch attacks, little is accomplished; users who can be trusted will not launch attacks; but users who would do damage cannot be trusted, so only legitimate work is blocked." - Cohen

Limited interpretation

If a given document is interpreted, and the interpreter lacks commands like "write file", it may be impossible for the document to carry a virus.

Graphics files are probably immune

- Except AnnaKournikova.jpg.vbs

Documents that can hold scripts probably aren't

- Word documents can contain macro viruses such as WM.Nuclear

Detection

If we can't limit the spread of a virus, maybe we can find it and quarantine infected files.

Unfortunately, no general algorithm for detecting virus behavior is possible.

Cohen argues this by proposing a virus that infects only when the detection algorithm thinks it isn't a virus. Anti-virus programs must make do with more limited solutions, such as scanning for a virus signature.
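The shape of Cohen's argument can be mimicked with a harmless toy. In the sketch below a "program" is just a function that returns the string "spreads" or "idle"; nothing is infected. The names `make_contrary` and `is_correct` are hypothetical, and the detectors are assumed to give a verdict without running the program.

```python
def make_contrary(detector):
    """Build a toy program that does the opposite of the detector's verdict."""
    def contrary():
        # Consults the detector about itself, then contradicts it.
        return "idle" if detector(contrary) else "spreads"
    return contrary


def is_correct(detector):
    """Check the detector's verdict on its own contrary program."""
    program = make_contrary(detector)
    verdict = detector(program)               # True means "this is a virus"
    actually_viral = program() == "spreads"
    return verdict == actually_viral


always_yes = lambda program: True
always_no = lambda program: False
results = [is_correct(always_yes), is_correct(always_no)]
```

Whichever answer a detector gives, the contrary program makes that answer wrong, so `results` is false for both. This illustrates only the form of the contradiction, not a proof.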

Virus detection problems

According to Cohen, the following are undecidable:

- Detection of a virus by its appearance
- Detection of a virus by its behavior
- Detection of an evolution of a known virus
- Detection of a triggering mechanism by its appearance
- Detection of a triggering mechanism by its behavior
- Detection of an evolution of a known triggering mechanism
- Detection of a virus detector by its appearance
- Detection of a virus detector by its behavior
- Detection of an evolution of a known viral detector

Detection by signature

Rather than implement a general solution, virus scanners look for virus signatures.

These signatures could be as small as a few bytes or as large as the entire virus code. If a virus scanner uses the whole virus code as a signature, it may not be able to find simple variants of a virus. However, if a virus scanner uses a very small signature, it may incorrectly report infections that aren't there.
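The trade-off in signature length can be seen with a toy scanner that simply searches for a byte string. All byte strings below are harmless, made-up stand-ins, and `scan` is an illustrative helper rather than how any real product works.

```python
def scan(data: bytes, signature: bytes) -> bool:
    """Report a match if the signature occurs anywhere in the data."""
    return signature in data


known_sample = b"\x90\x90HELLO-MARKER-v1\xcc"  # stand-in for a known virus body
variant = b"\x90\x90HELLO-MARKER-v2\xcc"       # a trivially modified variant
clean_file = b"user notes: HE said hi"         # an innocent file

whole_body_sig = known_sample   # signature = the entire sample
tiny_sig = b"HE"                # signature = only two bytes
medium_sig = b"HELLO-MARKER"    # signature = a distinctive fragment

missed_variant = not scan(variant, whole_body_sig)
false_positive = scan(clean_file, tiny_sig)
balanced = scan(variant, medium_sig) and not scan(clean_file, medium_sig)
```

In this toy setup the whole-body signature misses the variant, the two-byte signature flags the clean file, and the mid-sized fragment handles both cases.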

Updated signatures

Anti-virus companies must release new signatures each time a new virus is discovered.

- A virus's spread is unimpeded for a while.
- According to Andreas Marx of AV-Test.org, it took Symantec 25 hours 5 minutes to release an updated signature file in response to the W32/Sober.C worm attack.

The arms race

In order to make it hard for virus scanners to detect their viruses, virus writers can add morphing behavior to their creations:

- A polymorphic virus morphs itself in order to evade detection.
- Metamorphic viruses attempt to evade heuristic detection techniques by using more complex obfuscations.

(Christodorescu and Jha)

More bad news

Cohen argues that no general solution for proving the equivalence of two programs is possible.

His argument follows the same form as his argument against a general algorithm for virus detection: he proposes a virus in which two different infection instances will behave differently when a watching antivirus program believes they are the same.

Morphing

A virus may morph itself by:

- Encrypting part of itself using a different key for each infection
- Changing variable names (in a script virus)
- Binary obfuscation techniques (more on this later)

Polymorphic virus examples:

- Chameleon: first polymorphic virus, 90s
- A partial list of the viruses that can be called 100 percent polymorphic (late 1993): Bootache, CivilWar (four versions), Crusher, Dudley, Fly, Freddy, Ginger, Grog, Haifa, Moctezuma (two versions), MVF, Necros, Nukehard, PcFly (three versions), Predator, Satanbug, Sandra, Shoker, Todor, Tremor, Trigger, Uruguay (eight versions). (Source: Virus-Scan-Software)


Arming the virus writers

If a virus author knew what the anti-virus programs look for, he or she could design a virus that they wouldn't find.

Example: in the early 90s there were a few MS-DOS "stealth" viruses that could intercept a virus-scanning program's attempt to read the boot record and show it a clean version rather than what was really there.

- See Symantec's description of the Stealth_boot virus.
- "Frodo.4096" virus: first stealth virus
- "Beast.512" stealth virus: appeared less than a year after Frodo.4096
- More on this at Virus-Scan-Software

Extracting signatures

Christodorescu and Jha report on a technique for extracting the signature used by a given antivirus program.

Basically they obfuscate parts of the program and determine what has to remain unobfuscated for the antivirus program to find the virus.

FYI there is a typo in the paper: the conditions on the loop in the SignatureExtraction function cause it to never execute

They say it was successful in many cases.
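The probing idea can be sketched against a toy scanner: mask one byte at a time and note which positions must stay intact for detection to succeed. This is a simplified stand-in, not the algorithm from the paper. It assumes a black-box `detects` callable, a sample containing no zero bytes, and a signature that occurs exactly once. All names and byte strings are made up.

```python
def extract_signature_positions(sample: bytes, detects) -> list:
    """Return indices whose masking makes the black-box scanner miss."""
    needed = []
    for i in range(len(sample)):
        masked = sample[:i] + b"\x00" + sample[i + 1:]
        if not detects(masked):
            needed.append(i)  # this byte is part of what the scanner keys on
    return needed


def toy_scanner(data: bytes) -> bool:
    """Stand-in scanner that looks for one fixed byte string."""
    return b"MARK" in data


sample = b"xxMARKyy"  # harmless stand-in for a detected file
positions = extract_signature_positions(sample, toy_scanner)
recovered = bytes(sample[i] for i in positions)
```

For this toy scanner, `recovered` is exactly the fragment the scanner searches for, which is why such probing is useful for testing how narrowly a detector's signatures are drawn.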

Binary obfuscation techniques

The goal of binary obfuscation is to make it difficult to obtain an assembly-language description of a program from its raw bytes

You need to turn raw bytes back into assembly code before you can decompile.

You can obfuscate by:

- Garbage insertion (more in a minute)
- Variable renaming
- Code reordering
- Encapsulating/encrypting code or data

x86 binary obfuscation

- If you create unused regions in the executable and fill them with garbage bytes, the variable-length nature of the x86 instruction set can cause disassemblers to think that the legitimate instructions following the garbage are in fact operands.
- You can use a conditional branch instruction to do an unconditional jump: disassemblers assume there are no garbage bytes at the target address or following the branch instruction.

Better obfuscation

Linn and Debray describe obfuscation using a branch function

This function in turn branches to another target depending on where it is called from.

This makes determining which parts of the program are real by following the branch instructions difficult. The function can return to an instruction one or more bytes after the usual return point, opening up a region to insert more garbage bytes into.

Advances in disassembly

Kruegel, Robertson, Valeur and Vigna describe a disassembler that is able to correctly disassemble most instructions from a program obfuscated by the obfuscator Linn and Debray describe.

Disassembly in detail

Static analysis techniques

- Linear sweep
  - GNU's objdump uses linear sweep
  - Gets confused by garbage bytes in unreachable areas
- Recursive traversal following control flow
  - Drawback: indirect jumps
  - Doesn't always see the whole binary
- Hybrid approach
- Speculative disassembly
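The difference between linear sweep and recursive traversal shows up even on a made-up instruction set. The sketch below is not x86: the four opcodes, their sizes, and the absolute-address `JMP` are assumptions chosen to keep the example short.

```python
# Hypothetical toy instruction set: opcode -> (name, size in bytes)
OPCODES = {0x01: ("NOP", 1), 0x02: ("JMP", 2), 0x03: ("LOAD", 3), 0xFF: ("HALT", 1)}


def decode(code, pc):
    """Decode one instruction; unknown bytes are one-byte '(bad)' entries."""
    return OPCODES.get(code[pc], ("(bad)", 1))


def linear_sweep(code):
    """Decode from the first byte to the last, one instruction after another."""
    out, pc = [], 0
    while pc < len(code):
        name, size = decode(code, pc)
        out.append((pc, name))
        pc += size
    return out


def recursive_traversal(code, entry=0):
    """Decode only bytes reachable by following control flow from entry."""
    seen, work = {}, [entry]
    while work:
        pc = work.pop()
        while 0 <= pc < len(code) and pc not in seen:
            name, size = decode(code, pc)
            seen[pc] = name
            if name in ("HALT", "(bad)"):
                break
            if name == "JMP":
                if pc + 1 < len(code):
                    work.append(code[pc + 1])  # operand is the jump target
                break
            pc += size
    return sorted(seen.items())


# JMP 4, then two garbage bytes (0x03 looks like a LOAD), then NOP, HALT.
code = bytes([0x02, 0x04, 0x03, 0xAA, 0x01, 0xFF])
sweep = linear_sweep(code)
traversal = recursive_traversal(code)
```

The sweep decodes the garbage byte `0x03` as a `LOAD` and swallows the real `NOP` at address 4 as an operand, while the traversal follows the jump and recovers it. The traversal's own weakness, indirect jumps whose targets are not visible in the bytes, is not modeled here.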

Now for some good news

"This arms race is usually in favor of the deobfuscator. The obfuscator has to devise techniques that transform the program without seriously impacting the run-time performance or increasing the binary's size or memory footprint while there are no such constraints for the de-obfuscator." - Kruegel et al.

AV tool resistance to obfuscation

Christodorescu and Jha claim the state of the art for malware detectors is dismal!

They propose a testing technique and then use it to show that the tested virus scanners were not generally able to identify the sampled viruses when they were obfuscated by code reordering or encapsulation.

AV tool resistance to obfuscation (cont'd)

- This doesn't mean that these products aren't capable of detecting morphing viruses: the viruses in the sample set did not perform these morphs in the wild.
- This does mean that, in order to protect against a new virus that is just a simple modification of one of these existing viruses, the AV companies would have to release a new signature file.

Known clean system

Some virus detection techniques require you to start from a clean system.

DOS users used clean boot disks to defeat stealth viruses.

But is it always possible to get to a known clean state?

- What if every UNIX vendor had been infected with Ken Thompson's C compiler virus?
- Even their "clean" distribution media would be infected.

Discussion

Obfuscation vs deobfuscation, who can win?

Discussion (cont'd)

Can anti-virus win in the future?

Questions?

Thanks
