Virus Verification, Quarantine, and Disinfection

Once a virus is detected, few people will want to have it remain on their computer. The tasks for anti-virus software that lie beyond detection are verification, quarantine, and disinfection. Compared to detection, these three tasks are performed rarely, and can be much slower and more resource-intensive if necessary.

Verification

Virus detection usually doesn't provide the last word as to whether or not code is infected. Anti-virus software will often perform a secondary verification after the initial detection of a virus occurs.

Verification is performed for two reasons. First, it is used to reduce false positives that might happen by coincidence, or by the use of short or overly general signatures. Second, verification is used to positively identify the virus.

Identification is normally necessary for disinfection, and to prevent being led astray; virus writers will sometimes deliberately make their virus look like another one. In the absence of verification, anti-virus software can misidentify the virus and do unintentional damage to the system when cleaning up after the wrong virus.

Verification may begin by transforming the virus so as to make more information available. One way to accomplish this, when an encrypted virus is suspected, is for the anti-virus software to try decrypting the virus body to reveal a larger signature. This process is called X-raying.

For emulation-based anti-virus software, X-raying is a natural side effect of operation. X-raying may be automated in easier ways than emulation, if some simplifying assumptions are allowed.

A virus using simple encryption or a static encryption key (with or without random encryption keys) does not hide the frequency with which encrypted bytes occur; these encryption algorithms preserve the frequency of values that was present in the unencrypted version.

Cryptanalysts were taking advantage of frequency analysis to crack codes as early as the 9th century CE, and the same principle applies to virus decryption. Normal, uninfected executables (i.e., the plaintext) tend to have frequently-repeated values, like zeroes.

Under the assumptions above, if the most frequently-occurring plaintext value is known, then the most frequently-occurring value in an encrypted version of the code (the ciphertext) should correspond to it.
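The frequency argument above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a single-byte XOR key and assuming that zero is the most common plaintext byte; the names `guess_xor_key` and `xor_bytes` are hypothetical and are not taken from any anti-virus product.

```python
from collections import Counter


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def guess_xor_key(ciphertext: bytes, common_plain_byte: int = 0x00) -> int:
    """Guess a single-byte XOR key from byte frequencies.

    Assumes the most frequent ciphertext byte is the encrypted form of
    common_plain_byte (zero by default, an assumption about the plaintext).
    """
    if not ciphertext:
        raise ValueError("cannot analyze an empty ciphertext")
    most_common_cipher_byte, _count = Counter(ciphertext).most_common(1)[0]
    return most_common_cipher_byte ^ common_plain_byte


# Toy data: a zero-heavy "plaintext" encrypted with a key unknown to the analyst.
plaintext = b"\x00" * 40 + b"VIRUS BODY" + b"\x00" * 20
ciphertext = xor_bytes(plaintext, 0x5A)

recovered_key = guess_xor_key(ciphertext)
decrypted = xor_bytes(ciphertext, recovered_key)
```

The guess is only as good as the assumption about the plaintext; a real X-raying routine would presumably confirm the result by searching the decrypted bytes for a known signature.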

For example, say that 99 is the most frequent value in plaintext, and 27 is most frequent in the ciphertext. For XOR-based encryption, the key must be 120 (99 xor 27).

Returning to verification: once all information has been made available, verification may be done in a number of ways:

  • Comparing the found virus to a known copy of the virus. Shipping viruses with anti-virus software would be rather unwise, making this option only suitable for use in anti-virus labs.
  • Using a virus-specific signature, for detection methods that aren't signature-based to begin with. If the initial detection was signature-based, then a longer signature can be used for verification.
  • Checksumming all or part of the suspected virus, and comparing the computed checksum to the known checksum of that virus.
  • Calling special-purpose code to do the verification, which can be written in a general-purpose or domain-specific programming language.

Except for special-purpose code, these are not viable solutions for metamorphic viruses, because they rely on the (unencrypted) virus body being the same for each infection.
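As a rough illustration of the checksum option listed above, the following Python sketch compares the checksum of a suspected region against a stored value. CRC-32 is used here purely as an assumption for brevity, and the function name, offsets, and sample bytes are all hypothetical.

```python
import zlib


def verify_by_checksum(suspect: bytes, start: int, length: int, known_crc: int) -> bool:
    """Return True if the region suspect[start:start+length] has the known checksum."""
    region = suspect[start:start + length]
    if len(region) != length:
        return False  # the file is too short to contain the whole virus body
    return zlib.crc32(region) == known_crc


# Hypothetical database entry: checksum of a known (unencrypted) virus body.
known_body = b"hypothetical virus body"
known_crc = zlib.crc32(known_body)

# A suspect file containing that body, and a variant with one byte changed.
exact_copy = b"HOST" + known_body + b"TAIL"
lookalike = b"HOST" + known_body[:-1] + b"X" + b"TAIL"

exact_match = verify_by_checksum(exact_copy, 4, len(known_body), known_crc)
lookalike_match = verify_by_checksum(lookalike, 4, len(known_body), known_crc)
```

Because the comparison depends on the body being byte-for-byte identical, the sketch also shows why this approach fails for metamorphic viruses.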

Quarantine

When a virus is detected in a file, anti-virus software may need to quarantine the infected file, isolating it from the rest of the system. Quarantine is only a temporary measure, and may last only until the user decides how to handle the file (e.g., giving approval to disinfect it).

In other cases, the anti-virus software may have generically detected a virus, but have no idea how to clean it. Here, quarantine may be done until an anti-virus update is available that can deal with the virus that was discovered.

Quarantine can simply be a matter of copying the infected file into a distinct "quarantine" directory, removing the original infected file, and disabling all permission to access the infected file. The problem is that the file permissions may be easily changed by a user, and files may be copied out of a quarantine directory in a virulent form.

A good solution limits further spread by accident, or casual copying, but shouldn't be elaborate, as accessing the infected file for disinfection will still be necessary. One solution is to encrypt quarantined files by some trivial means, like an XOR with a constant.

The virus is thereby rendered inert, because an executable file encrypted this way will no longer be runnable, and copying the file does no harm. Also, an encrypted, quarantined file is readily accessible for disinfection. Another solution is to render the files in the quarantine directory invisible - what can't be seen can't be copied.

Anti-virus software can accomplish this feat using the same file-hiding techniques that stealth viruses and rootkits use. However, this may not be the best idea, as viruses may then try to hide in the quarantine directory, letting the anti-virus software cloak their presence. There could also be issues with false positives produced by virus-like behavior from anti-virus software.
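The simpler XOR-based quarantine described earlier in this section might look something like the following Python sketch. The key value, the `.quarantined` suffix, and the function names are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

QUARANTINE_KEY = 0xA5  # hypothetical constant; any nonzero byte would do


def xor_transform(data: bytes, key: int = QUARANTINE_KEY) -> bytes:
    """XOR with a constant; applying it twice restores the original bytes."""
    return bytes(b ^ key for b in data)


def quarantine(path: Path, quarantine_dir: Path) -> Path:
    """Store an encrypted copy in the quarantine directory and remove the original."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    target = quarantine_dir / (path.name + ".quarantined")
    target.write_bytes(xor_transform(path.read_bytes()))
    path.unlink()
    return target


def release(quarantined: Path) -> bytes:
    """Recover the original bytes, e.g. to hand them to a disinfector."""
    return xor_transform(quarantined.read_bytes())


# Demonstration in a temporary directory, using a harmless stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    infected = Path(tmp) / "infected.exe"
    original_bytes = b"MZ\x90\x00 pretend this is an infected executable"
    infected.write_bytes(original_bytes)

    stored = quarantine(infected, Path(tmp) / "quarantine")
    original_removed = not infected.exists()
    stored_bytes = stored.read_bytes()
    recovered = release(stored)
```

The stored copy no longer begins with a valid executable header, so casually copying it out of the quarantine directory does no harm, yet the original bytes remain one XOR pass away when disinfection is attempted.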

Disinfection

Disinfection does not mean that an infected system has been restored to its original state, even if the disinfection was successful. In some cases, like overwriting viruses that don't preserve the original contents, disinfection is just not possible. As with everything else anti-virus, there are different ways to do disinfection:

  • Restore infected files from backups. Because everyone meticulously keeps backups of their files, the affected files can be restored to their backed-up state. Some files are meant to change, like data files, and consequently restoring these files may result in data loss.

There are also viruses called data diddlers, which are viruses whose payload slowly changes files. By the time a data diddler has been detected, it can have made many subtle changes, and those changed files - not the original ones - would have been caught on the backups.

  • Virus-specific. Anti-virus software can encode in its database the information necessary to disinfect each known virus. Many viruses share characteristics, like relocating an executable's start address, so in many cases disinfection is a matter of invoking generic disinfection subroutines with the correct parameters.

Virus-specific information needed for disinfection can be derived automatically by anti-virus researchers, at least for relatively simple viruses. Goat files with different properties can be deliberately infected, and the resulting corpus of infected files can be compared to the originals.

This comparison can reveal where a virus puts itself in an infected file, how the virus gets control, and where any relocated bytes from the original file may be found. This can be likened to a chosen-plaintext attack in cryptography.
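A very simplified version of the goat-file comparison can be sketched in Python. It assumes the virus leaves the goat's contents intact and contiguous, which will not hold for viruses that patch the header or split the host; the function name `compare_goat` and the report fields are hypothetical.

```python
from typing import Optional


def compare_goat(original: bytes, infected: bytes) -> Optional[dict]:
    """Report where the goat's original contents sit inside the infected copy."""
    added = len(infected) - len(original)
    if added <= 0:
        return None  # nothing was added, so this simple comparison does not apply
    host_offset = infected.find(original)
    if host_offset < 0:
        return None  # contents were modified or split; a finer-grained diff is needed
    if host_offset == 0:
        placement = "appended"      # virus bytes follow the host
    elif host_offset == added:
        placement = "prepended"     # virus bytes precede the host
    else:
        placement = "surrounding"   # virus bytes on both sides of the host
    return {"host_offset": host_offset, "added_length": added, "placement": placement}


# Toy goat file and a stand-in for the virus body.
goat = bytes(range(64))
virus = b"<hypothetical virus body>"

prepended_report = compare_goat(goat, virus + goat)
appended_report = compare_goat(goat, goat + virus)
```

Repeating the comparison over goat files of different sizes and layouts would show whether the added length and placement are constant, which is the kind of information a virus-specific disinfection entry needs.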

  • Virus-behavior-specific. Rather than customize disinfection to individual viruses, disinfection can be attempted based on assumptions about viral behavior. For prepending viruses, or appenders that gain control by modifying the program header, disinfection is a matter of two steps: first, restoring the original program header; second, moving the original file contents back to their original location.

Anti-virus software can store some information in advance for each executable file on an uninfected system which can be used later for disinfection. The necessary information to store is the program header, the file length, and a checksum of the executable file's contents sans header.

This disinfection technique integrates well with integrity checkers, since integrity checkers store roughly the same information anyway. For an infected file, the saved program header can be immediately restored.

The tricky part is determining where the original file contents reside, because a prepending virus may have shifted them from their original location in the file.

The disinfector knows the checksum of the original file contents, however - it can iterate over the infected file, checksumming the same number of bytes as were used for the original checksum (the uninfected file length minus the header length).

If the new checksum matches the stored checksum, then the original file contents have been located and can be restored. The number of checksum iterations needed in the worst case is equivalent to the added length of the virus, the difference between the lengths of the infected and uninfected files.

This method naturally enjoys several built-in safety checks which guard against situations where this disinfection method is inapplicable. The computed virus length can be checked for too-small, or even negative, values. Failure to match the stored checksum in the prescribed number of iterations also flags inapplicability.
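The checksum search and its safety checks can be sketched as follows. This is a minimal Python illustration under stated assumptions: SHA-256 stands in for the checksum, the header has a fixed known length, and the names `SavedInfo`, `record`, and `disinfect`, as well as the `min_virus_len` threshold, are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedInfo:
    header: bytes         # program header saved from the uninfected file
    file_length: int      # length of the uninfected file
    body_checksum: bytes  # checksum of the contents after the header


def record(clean: bytes, header_len: int) -> SavedInfo:
    """Save the information needed for later disinfection."""
    body = clean[header_len:]
    return SavedInfo(clean[:header_len], len(clean), hashlib.sha256(body).digest())


def disinfect(infected: bytes, saved: SavedInfo, min_virus_len: int = 1) -> Optional[bytes]:
    """Return the restored file, or None if this method is inapplicable."""
    header_len = len(saved.header)
    body_len = saved.file_length - header_len
    virus_len = len(infected) - saved.file_length
    if virus_len < min_virus_len:
        return None  # too-small or negative virus length
    # Try every shift from 0 through virus_len past the original body position.
    for offset in range(header_len, header_len + virus_len + 1):
        candidate = infected[offset:offset + body_len]
        if hashlib.sha256(candidate).digest() == saved.body_checksum:
            return saved.header + candidate
    return None  # no match within the prescribed number of iterations


# Toy files: a clean program, a prepender, and an appender that patched the header.
header = b"HDR:entry=0010;"
body = b"original program code and data"
clean = header + body
saved = record(clean, len(header))
virus = b"<hypothetical virus>"

from_prepender = disinfect(virus + clean, saved)
from_appender = disinfect(b"HDR:entry=9999;" + body + virus, saved)
not_applicable = disinfect(b"X" * (len(clean) + 10), saved)
```

Recomputing a full checksum at every offset is wasteful for large files; a rolling checksum would avoid that, at the cost of a slightly more involved implementation.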

  • Using the virus' code:
    • Stealth viruses happily supply the uninfected contents of a file. Anti-virus software can exploit this to disinfect a stealth virus by simply asking the virus for the file's contents.
    • Generic disinfection methods assume that the virus will eventually restore and jump to the code it infected. A generic disinfector executes the virus under controlled conditions, watching for the original code to be restored by the virus on the disinfector's behalf.

Cruder disinfection can be done by zeroing out the virus, or simply deleting the infected file. This will eradicate the virus, but won't restore the system at all.