Computer Virus - Parts and Classification

A computer virus has three parts:

  • Infection mechanism - How a virus spreads, by modifying other code to contain a (possibly altered) copy of the virus. The exact means through which a virus spreads is referred to as its infection vector. A virus need not have a single infection vector; a virus that infects in multiple ways is called multipartite.
  • Trigger - The means of deciding whether to deliver the payload or not.
  • Payload - What the virus does, besides spreading. The payload may involve damage, either intentional or accidental. Accidental damage may result from bugs in the virus, encountering an unknown type of system, or perhaps unanticipated multiple viral infections.

The trigger and payload are optional, but the infection mechanism is not, because infection is one of the key defining characteristics of a virus. Without infection, only the trigger and payload remain, and that combination is a logic bomb.

In pseudocode, a virus would have the structure below. The trigger function would return a boolean, whose value would indicate whether or not the trigger conditions were met. The payload could be anything, of course.

    def virus():
        infect()
        if trigger() is true:
            payload()
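The pseudocode can be turned into a small runnable model. The sketch below is an illustration only. It assumes the three parts are ordinary Python functions passed in by the caller; the name make_virus_model is hypothetical. Nothing is infected: the "parts" only append labels to a list. This makes the control flow visible, along with the optional nature of the trigger and payload.

```python
from typing import Callable, Optional


def make_virus_model(
    infect: Callable[[], None],
    trigger: Optional[Callable[[], bool]] = None,
    payload: Optional[Callable[[], None]] = None,
) -> Callable[[], None]:
    """Compose the three parts into one callable, mirroring the pseudocode.

    Toy model only: the parts are whatever functions the caller supplies.
    """
    def virus() -> None:
        infect()  # mandatory: infection is what makes it a virus
        # Trigger and payload are optional; the payload runs only when both
        # exist and the trigger reports that its conditions are met.
        if trigger is not None and payload is not None and trigger():
            payload()
    return virus


# Usage: each "part" just records that it ran.
log = []
quiet = make_virus_model(lambda: log.append("infect"),
                         trigger=lambda: False,
                         payload=lambda: log.append("payload"))
quiet()  # trigger not met: only infection happens
armed = make_virus_model(lambda: log.append("infect"),
                         trigger=lambda: True,
                         payload=lambda: log.append("payload"))
armed()
print(log)  # ['infect', 'infect', 'payload']
```

Dropping the infect part would leave only a trigger and payload. In this model, that is the logic bomb described above.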

Infection is done by selecting some target code and infecting it, as shown below. By the definition of a virus, the target code is locally accessible to the machine where the virus runs.

Locally accessible targets may include code in shared network directories, though, because these directories are made to appear locally accessible. In general, up to k targets may be infected each time the infection code below is run.

The exact method used to select targets varies, and may be trivial, as in the case of the boot-sector infectors. The tricky part of select_target is that the virus doesn't want to repeatedly re-infect the same code; that would be a waste of effort, and may reveal the presence of the virus.

select_target has to have some way to detect whether or not some potential target code is already infected, which is a double-edged sword: if the virus can detect itself, then so can anti-virus software. The infect_code routine performs the actual infection by placing some version of the virus's code in the target.

    def infect():
        repeat k times:
            target = select_target()
            if no target:
                return
            infect_code(target)
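The selection logic can likewise be modeled in memory. In this sketch, the Target class, the MARKER string and the scan helper are hypothetical stand-ins. A "target" is just a list of strings, and "infection" only inserts a marker into that list. The model shows two points made above. First, at most k targets are tagged per run. Second, the already_infected check that prevents re-infection is exactly what a scanner can reuse.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MARKER = "<marker>"  # hypothetical self-identification string


@dataclass
class Target:
    """Stand-in for a piece of locally accessible code (just a list of lines)."""
    name: str
    code: List[str] = field(default_factory=list)


def already_infected(target: Target) -> bool:
    # The check that lets the virus skip already-tagged code...
    return MARKER in target.code


def select_target(targets: List[Target]) -> Optional[Target]:
    # Trivial selection: the first target not yet carrying the marker.
    for target in targets:
        if not already_infected(target):
            return target
    return None


def infect_code(target: Target) -> None:
    target.code.insert(0, MARKER)  # only modifies the in-memory list


def infect(targets: List[Target], k: int) -> List[str]:
    """Tag up to k targets; return the names tagged on this run."""
    tagged: List[str] = []
    for _ in range(k):
        target = select_target(targets)
        if target is None:  # "if no target: return"
            break
        infect_code(target)
        tagged.append(target.name)
    return tagged


def scan(targets: List[Target]) -> List[str]:
    # ...is equally usable by a detector: the double-edged sword.
    return [t.name for t in targets if already_infected(t)]


targets = [Target("a"), Target("b"), Target("c")]
print(infect(targets, k=2))  # ['a', 'b']
print(infect(targets, k=2))  # ['c'] (nothing is tagged twice)
print(scan(targets))         # ['a', 'b', 'c']
```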

Viruses can be classified in a variety of ways. The next two sections classify them along orthogonal axes: the type of target the virus tries to infect, and the method the virus uses to conceal itself from detection by users and anti-virus software. Virus creation need not be difficult, either; the virus classification is followed by a look at do-it-yourself virus kits for the programming-challenged.

Classification by Target

One way of classifying viruses is by what they try to infect. This section looks at three: boot-sector infectors, executable file infectors, and data file infectors (a.k.a. macro viruses).

Boot-Sector Infectors

Although the exact details vary, the basic boot sequence on most machines goes through these steps:

  1. Power on.
  2. ROM-based instructions run, performing a self-test, device detection, and initialization. The boot device is identified, and the boot block is read from it; typically the boot block consists of the initial block(s) on the device. Once the boot block is read, control is transferred to the loaded code. This step is referred to as the primary boot.
  3. The code loaded during the primary boot step loads a larger, more sophisticated program that understands the boot device's filesystem structure, and transfers control to it. This is the secondary boot.
  4. The secondary boot code loads and runs the operating system kernel.

A boot-sector infector, or BSI, is a virus that infects by copying itself to the boot block. It may copy the contents of the former boot block elsewhere on the disk first, so that the virus can transfer control to it later to complete the booting process.

One potential problem with preserving the boot block contents is that block allocation on disk is filesystem-specific. Properly allocating space to save the boot block requires a lot of code, a luxury not available to BSIs.

An alternate method is to always copy the original boot block to some fixed, "safe" location on disk. This alternate method can cause problems when a machine is infected multiple times by different viruses that happen to use that same safe location.

This is an example of unintentional damage being done by a virus, and has actually occurred: Stoned and Michelangelo were BSIs that both picked the same disk block as their safe location.
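The shared-safe-location problem can be reproduced with a toy disk. This sketch makes several assumptions. A disk is a Python list of labels. Each toy BSI stores a "chain to block N" link in place of real boot code. Both viruses use block 7, which is arbitrary and is not the block Stoned or Michelangelo actually used. The names infect_boot and boot are hypothetical.

```python
from typing import List

SAFE_BLOCK = 7  # hypothetical fixed "safe" block chosen by both toy viruses
CHAIN = " -> chain to block "


def infect_boot(disk: List[str], virus_name: str) -> None:
    """Toy BSI: stash whatever is in block 0 at SAFE_BLOCK, then take over block 0."""
    disk[SAFE_BLOCK] = disk[0]                    # clobbers anything already stored there
    disk[0] = f"{virus_name}{CHAIN}{SAFE_BLOCK}"  # run virus, then chain to saved block


def boot(disk: List[str], max_steps: int = 10) -> List[str]:
    """Primary boot: run block 0, following chain links; report what ran."""
    ran: List[str] = []
    block = 0
    for _ in range(max_steps):
        code = disk[block]
        if CHAIN not in code:
            ran.append(code)  # reached real boot code: booting continues
            return ran
        name, target = code.split(CHAIN)
        ran.append(name)
        block = int(target)
    ran.append("<hang: boot never completes>")
    return ran


one = ["original boot code"] + ["unused"] * 15
infect_boot(one, "A")
print(boot(one))  # ['A', 'original boot code']

two = ["original boot code"] + ["unused"] * 15
infect_boot(two, "A")
infect_boot(two, "B")                # second virus picks the same safe block
print("original boot code" in two)  # False: the original was overwritten
print(boot(two)[-1])                 # '<hang: boot never completes>'
```

After one infection, the disk still boots normally. After a second infection using the same safe block, the original boot block is gone. In this toy model, the chain then loops and booting never completes.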

In general, infecting the boot sector is strategically sound: the virus may be in a known location, but it establishes itself before any anti-virus software starts or operating system security is enabled.

But BSIs are rare now. Machines are rebooted less often, and there is very little use of bootable media like floppy disks. From a defensive point of view, most operating systems prevent writing to the disk's boot block without proper authorization, and many a BIOS has boot block protection that can be enabled.

File Infectors

Operating systems have a notion of files that are executable. In a broader sense, executable files may also include files that can be run by a command-line user "shell."

A file infector is a virus that infects files which the operating system or shell considers to be executable; this could include batch files and shell scripts, but binary executables are the most common target.

There are two main issues for file infectors:

  1. Where is the virus placed?
  2. How is the virus executed when the infected file is run?

For BSIs, the answers to these questions were apparent: a BSI places itself in the boot block and gets executed through a machine's normal boot sequence. File infectors have a few more options at their disposal, though, and often the answers to these questions are interdependent.

The remainder of this section is organized around the answer to the first question: where is the virus placed?

  • Beginning of File - Older, very simple executable file formats like the .COM MS-DOS format would treat the entire file as a combination of code and data. When executed, the entire file would be loaded into memory, and execution would start by jumping to the beginning of the loaded file.

In this case, a virus that places itself at the start of the file gets control first when the infected file is run. This is called a prepending virus. Inserting itself at the start of a file involves some copying, which isn't difficult, but isn't the absolute easiest way to infect a file.

  • End of File - In contrast, appending code onto the end of a file is extremely easy. A virus that places itself at the end of a file is called an appending virus. How does the virus get control?

There are two basic possibilities:

  • The original instruction(s) in the code can be saved, and replaced by a jump to the viral code. Later, the virus will transfer control back to the code it infected. The virus may try to run the original instructions directly in their saved location, or the virus may restore the infected code back to its original state and run it.
  • Many executable file formats specify the start location in a file header. The virus can change this start location to point to its own code, then jump to the original start location when done.
  • Overwritten into File - An overwriting virus places itself atop part of the original code. This avoids an obvious change in file size that would occur with a prepending or appending virus, and the virus' code can be placed in a location where it will get control.

Obviously, overwriting code blindly is almost certain to break the original code and lead to rapid discovery of the virus. There are several options, with varying degrees of complexity and risk.

  • The virus can look for, and overwrite, sections of repeated values in the hopes of avoiding damage to the original code. Such values would tend to appear in a program's data rather than in the code, so a mechanism for gaining control during execution would have to be used as well. Ideally, the virus could restore the repeated value once it has finished running.
  • The virus can overwrite an arbitrary part of a file if it can somehow preserve the original contents elsewhere, similar to the BSI approach. An innocent-looking data file of some kind, like a JPEG file, could be used to stash the original contents.

A less-portable approach might take low-level details into account: many filesystems overallocate space for files, and an overwriting virus could quietly use this extra disk space without it showing up in normal filesystem operations.

  • Space may be overallocated inside a file too. Parts of an executable file may be padded so that they are aligned to page boundaries, which lets the operating system kernel efficiently map the executable into memory. The net result is unused space inside executable files where a virus may be located.
  • Conceivably, a virus could compress a part of the original code to make space for itself, and decompress the original code when the virus has completed execution. However, room would have to be made for both the virus and the decompression code.

None of these options is likely to yield a large amount of space, so overwriting viruses must be small.
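The claim that overallocated space is small can be made concrete with a little arithmetic. Assume a filesystem that allocates whole clusters and an executable format that pads sections to a page boundary. Under those assumptions, the unused space is at most one allocation unit minus one byte. The 4096-byte cluster and page sizes below are illustrative assumptions only; real values vary by filesystem and platform. The function names are hypothetical.

```python
def file_slack(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes allocated to a file beyond its logical size.

    Assumes the filesystem allocates whole clusters of cluster_size bytes.
    """
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


def page_padding(section_size: int, page_size: int = 4096) -> int:
    """Padding needed to round a section up to the next page boundary."""
    return -section_size % page_size


print(file_slack(10_000))   # 2288
print(page_padding(5_000))  # 3192
print(page_padding(8_192))  # 0 (already aligned: no room at all)
```

Both results are always below one cluster or page. This bound is why an overwriting virus relying on such space has to stay small.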

  • Inserted into File - Another possibility is that a virus can insert itself into the target code, moving the target code out of the way, and even interspersing small pieces of virus code with target code. This is no easy feat: branch targets in the code have to be changed, data locations must be updated, and linker relocation information needs modification. Needless to say, this file infection technique is rarely seen.
  • Not in File - A companion virus is one which installs itself in such a way that it is naturally executed before the original code. The virus never modifies the infected code, and gains control by taking advantage of the process by which the operating system or shell searches for executable files. Although this bears the hallmarks of a Trojan horse, a companion virus is a "real" virus by virtue of self-replication.

The easiest way to explain companion viruses is by example.

  • The companion virus can place itself earlier in the search path, with the same name as the target file, so that the virus will be executed first when an attempt is made to execute the target file.
  • MS-DOS searches for an executable named foo by looking for foo.com, foo.exe, and foo.bat, in that order. If the target file is a .EXE file, then the companion virus can be a .COM file with the same name.
  • The target file can be renamed, and the companion virus can be given the target file's original name.
  • Windows associates file types (as determined by the filename's extension) with applications in the Registry. With strategic Registry changes, the association for .EXE files can be made to run the companion virus instead of the original executable. Effectively, all executable files are infected at once.
  • The ELF file format commonly used on recent Unix systems lets a dynamically linked executable name an "interpreter" in its program headers; this invariably points to the system's run-time linker. A companion virus can replace the run-time linker, again causing all executables to be infected at once.
  • Companion viruses are possible even in GUI-based environments. A target application's icon can be overlaid with the icon for the companion virus. When a user clicks on what they think is the application's icon, the companion virus runs instead.
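The MS-DOS lookup rules that companion viruses exploit can be expressed as a short function. This is a model, not COMMAND.COM's actual code, and resolve is a hypothetical name. It assumes a search path whose first entry is the current directory, followed by the PATH directories. Each directory is represented as a set of lowercase filenames. Within each directory, the function applies the .COM, .EXE, .BAT order described above.

```python
from typing import List, Optional, Set, Tuple

# Extension order from the text: .COM, then .EXE, then .BAT.
EXTENSIONS = (".com", ".exe", ".bat")


def resolve(command: str, search_path: List[Set[str]]) -> Optional[Tuple[int, str]]:
    """Return (directory index, filename) of the file that would run, or None.

    Toy model: each search_path entry is the set of lowercase filenames in one
    directory, listed in search order (current directory first, then PATH).
    """
    base = command.lower()  # MS-DOS filenames are case-insensitive
    for index, directory in enumerate(search_path):
        for ext in EXTENSIONS:
            if base + ext in directory:
                return index, base + ext
    return None


print(resolve("foo", [{"foo.exe"}]))               # (0, 'foo.exe')
print(resolve("FOO", [{"foo.exe", "foo.com"}]))    # (0, 'foo.com'): .COM companion wins
print(resolve("foo", [{"foo.bat"}, {"foo.com"}]))  # (0, 'foo.bat'): earlier directory wins
```

The second and third calls correspond to the first two companion techniques above. In both, the companion never touches the target; it only wins the lookup.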

Macro Viruses

Some applications allow data files, like word processor documents, to have "macros" embedded in them. Macros are short snippets of code written in a language which is typically interpreted by the application, a language which provides enough functionality to write a virus.

Thus, macro viruses are better thought of as data file infectors, but since their predominant form has been macros, the name has stuck. When a macro-containing document is loaded by the application, the macros can be made to run automatically, which gives control to the macro virus.

Some applications warn the user about the presence of macros in a document, but these warnings may be easily ignored. A proof-of-concept of macro viruses was published in 1989, in response to rumors of their existence.

Macro viruses didn't hit the mainstream until 1995, when the Concept virus was distributed, targeting Microsoft Word documents across multiple platforms.

Word has a persistent, global set of macros which apply to all edited documents, and this is Concept's target: once installed in the global macros, it can infect all documents edited in the future.

A document infected by Concept includes two macros that have special properties in Word.

  • AutoOpen - Any code in the AutoOpen macro is run automatically when the file is opened. This is how an infected document gains control.
  • FileSaveAs - The code in the FileSaveAs macro is run when its namesake menu item (File... Save As...) is selected. In other words, this code can be used to infect any as-yet-uninfected document that is being saved by the user.
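The special role of these macro names can be shown with a small lookup. This sketch assumes a document's macros are available as a Python dictionary mapping macro names to source text, a stand-in for a real document format. It flags only the two names described above. event_driven_macros is a hypothetical helper, loosely in the spirit of the macro warnings some applications give.

```python
from typing import Dict

# The two special macro names from the text and the event that runs each one.
SPECIAL_MACROS: Dict[str, str] = {
    "AutoOpen": "document is opened",
    "FileSaveAs": "File... Save As... is selected",
}


def event_driven_macros(document_macros: Dict[str, str]) -> Dict[str, str]:
    """Map each special macro in a document to the event that would run it.

    document_macros is a toy stand-in: macro name -> macro source text.
    Names are compared exactly as written (an assumption of this sketch).
    """
    return {
        name: SPECIAL_MACROS[name]
        for name in document_macros
        if name in SPECIAL_MACROS
    }


doc = {"AutoOpen": "...", "FileSaveAs": "...", "FormatTable": "..."}
print(event_driven_macros(doc))
# {'AutoOpen': 'document is opened', 'FileSaveAs': 'File... Save As... is selected'}
```

An ordinary helper macro such as FormatTable is not reported, because it runs only when invoked. The two special names are reported because they run in response to routine user actions.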

From a technical standpoint, macro languages are easier to use than lower-level programming languages, so macro viruses drastically lower the barrier to virus creation.