Vista's Stability Improvements

Few things in this life are as frustrating as an operating system that won't operate, either because Windows itself has given up the ghost or because some program has locked up solid and taken Windows down with it. Unfortunately, computer problems, like the proverbial death and taxes, seem to be one of those constants in life.

Fortunately, each new version of Windows seems to be a little more stable and a little better at handling misbehaving programs than its predecessor, so at least we're heading in the right direction. It's still early, but it looks as though Windows Vista is continuing this positive trend. Vista will ship with a passel of new tools and technologies designed to prevent crashes and to recover from them gracefully if they do occur. The next few sections take you through the most important of these stability improvements.

I/O Cancellation

If you've used Windows for a while, you've probably come across a Windows Error Reporting dialog box similar to the one shown below. This error message is generated by the Windows Dr. Watson debugging tool, and it includes not only a description of the error, but also the option to send an error report to Microsoft.

This program continues with Vista's new Windows Feedback services. This is an opt-in error-reporting service designed to provide Microsoft and program developers with much more detailed information about program crashes.

That can only be a good thing because it's clear that these kinds of reports are useful. Microsoft has received and studied many such reports over the years, and we're starting to see the fruits of this labor in Windows Vista, which comes with built-in fixes for many of the most common causes of program crashes.

The most common of these by far is when a program has made an input/output (I/O) request to a service, resource, or another program, but that process is busy or otherwise incommunicado. In the past, the requesting program would often simply wait forever for the I/O data, thus resulting in a hung program and requiring a reboot to get the system running again.

To prevent this all-too-common scenario, Windows Vista implements an improved version of a technology called I/O cancellation, which can detect when a program is stuck waiting for an I/O request and then can cancel that request to help the program recover from the problem. Microsoft is also making I/O cancellation available to developers via an API, so programs, too, can cancel their own unresponsive requests and automagically recover themselves.
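To make the idea concrete, here is a minimal sketch in Python. It is an analogy only, not Vista's implementation: it uses the standard asyncio library, and the function names and the 0.1-second deadline are assumptions chosen for illustration. A request to a resource that never answers is cancelled once the deadline passes, so the caller gets control back instead of hanging. (On Windows itself, the Win32 functions CancelIoEx and CancelSynchronousIo, which arrived with Vista, are examples of the kind of API mentioned above.)

```python
import asyncio


async def read_from_busy_resource():
    # Hypothetical stand-in for an I/O request whose target never responds.
    await asyncio.Event().wait()
    return "data"


async def request_with_cancellation(timeout_seconds):
    try:
        # wait_for cancels the pending request once the deadline passes.
        return await asyncio.wait_for(read_from_busy_resource(), timeout_seconds)
    except asyncio.TimeoutError:
        # The caller regains control and can retry or report the failure.
        return "cancelled"


def run_demo():
    return asyncio.run(request_with_cancellation(0.1))
```

Calling run_demo() returns "cancelled" after roughly a tenth of a second, where a plain wait on the same resource would never return.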

Reliability Monitor

In previous versions of Windows, the only way you could tell whether your system was stable was to think about how often in the recent past you were forced to reboot. If you couldn't remember the last time your system required a restart, you could assume that your system was stable. Not exactly a scientific assessment!

Windows Vista changes all that by introducing the Reliability Monitor. This new feature is part of the Windows Performance Diagnostic Console, which I discuss in more detail later. You load this Microsoft Management Console snap-in by pressing Windows Logo+R, typing perfmon.msc, and clicking OK.

In the console window that appears, click Reliability Monitor. Reliability Monitor keeps track of the overall stability of your system, as well as reliability events, which are either changes to your system that could affect stability or occurrences that might indicate instability. Reliability events include the following:

  • Windows updates
  • Software installs and uninstalls
  • Device driver installs, updates, rollbacks, and uninstalls
  • Application hangs and crashes
  • Device drivers that fail to load or unload
  • Disk and memory failures
  • Windows failures, including boot failures, system crashes, and sleep failures

Reliability Monitor graphs these changes and generates a measure of system stability over time so that you can see at a glance whether any changes affected system stability. The System Stability Chart shows the overall stability index. A score of 10 indicates a perfectly reliable system, and lower scores indicate decreasing reliability.
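How Vista actually calculates the stability index isn't covered here, so treat the following Python sketch purely as a toy model of the idea. The event names, the penalty weights, and the floor of 0 are all assumptions made up for illustration. The score starts at a perfect 10 and drops each time a failure-type event is recorded, while change-type events such as updates and installs are tracked but not penalized.

```python
# Hypothetical penalty weights; the real index calculation is not described here.
PENALTIES = {
    "application_crash": 0.5,
    "driver_load_failure": 0.5,
    "disk_or_memory_failure": 1.0,
    "windows_failure": 2.0,
}


def stability_index(events):
    """Return a toy score that starts at 10 and falls with each failure event."""
    score = 10.0
    for event in events:
        # Events without a penalty (updates, installs) leave the score unchanged.
        score -= PENALTIES.get(event, 0.0)
    # The floor of 0 is an assumption for this sketch.
    return max(score, 0.0)
```

With these made-up weights, a history containing one application crash, one Windows failure, and one Windows update scores 7.5.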

Service Recovery

A service is a program or process that works in the background to perform a specific, low-level support function for the operating system. You can see all the services on your system by opening Computer Management (right-click Computer or My Computer, and click Manage) and then selecting Services and Applications, Services.

On most systems you'll see more than 125 different services listed. Many services are mission-critical, and if any of these crucial services fail, it almost always means that the only way to recover your system is to shut down and restart your computer. With Windows Vista, however, every service has a recovery policy that enables Vista to restart not only the service, but also any other service or process that is dependent on the failed service.
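The dependency part of this policy is worth a closer look. The Python sketch below is not Vista's recovery code. It only illustrates, under assumed names, how a restart list might be worked out: the function restart_order and the service labels are hypothetical, and the dependency map is supplied by hand. It gathers the failed service plus everything that depends on it, directly or indirectly, and then orders them so that each service restarts only after the services it requires.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def restart_order(failed, depends_on):
    """Return the failed service and its dependents in a safe restart order."""
    # Collect the failed service and everything that depends on it.
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for service, requirements in depends_on.items():
            if service not in affected and affected & set(requirements):
                affected.add(service)
                changed = True
    # Keep only dependencies inside the affected set, then sort so that
    # every service comes after the services it requires.
    graph = {
        service: [r for r in depends_on.get(service, []) if r in affected]
        for service in sorted(affected)
    }
    return list(TopologicalSorter(graph).static_order())


# Illustrative labels only: "fax" needs "spooler", which needs "rpc".
example = {"spooler": ["rpc"], "fax": ["spooler"], "audio": []}
```

Here restart_order("rpc", example) yields rpc, spooler, and fax in that order, and leaves audio alone because it doesn't depend on the failed service.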

Startup Repair Tool

When your computer won't start, it's bad enough that you can't get to your programs and data and that your productivity nosedives. What's even worse is that you can't get to your normal troubleshooting and diagnostics tools to see what the problem might be.

Yes, there are startup troubleshooting techniques, but they can often be time-consuming, hit-or-miss affairs. If Windows is in its own partition, or if there's a solid backup ready, many people would prefer to simply reinstall Windows rather than spend an entire day tracking down a startup problem.

Such drastic solutions could be a thing of the past, thanks to Vista's new Startup Repair Tool (SRT), which is designed to fix many common startup problems automatically. When a startup failure occurs, Vista starts the SRT immediately. The program then analyzes the startup logs and performs a series of diagnostic tests to determine the cause of the startup failure. The SRT looks for a number of possible problems, but three are the most common:

  • Incompatible or corrupted device drivers
  • Missing or corrupted startup configuration files
  • Corrupted disk metadata

If the SRT determines that the startup failure is being caused by one of these problems or some other common snag, the SRT attempts to fix the problem automatically. If it's successful, it lets you know what repairs it made and writes all changes to a log file so you can see exactly what transpired.

If the SRT can't fix the problem, it tries the system's Last Known Good Configuration. If that doesn't work, it writes all of its diagnostic data to a log and offers you support options to try to fix the problem yourself.
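The escalation described above (diagnose, try an automatic repair, fall back to Last Known Good Configuration, and finally hand the problem to you) can be sketched as a simple fallback chain. This Python outline is an illustration only, not the SRT's actual logic: the function name, the callback parameters, and the cause labels are all hypothetical.

```python
def run_startup_repair(diagnose, repairs, try_last_known_good, log):
    """Walk the fallback chain and return which stage resolved the failure."""
    cause = diagnose()
    log.append(f"diagnosed: {cause}")

    # Stage 1: attempt an automatic fix if one exists for this cause.
    fix = repairs.get(cause)
    if fix is not None and fix():
        log.append(f"repaired: {cause}")
        return "repaired"

    # Stage 2: fall back to the Last Known Good Configuration.
    if try_last_known_good():
        log.append("restored Last Known Good Configuration")
        return "last_known_good"

    # Stage 3: record the diagnostics and leave the fix to the user.
    log.append("automatic repair failed; offering support options")
    return "manual_support"
```

Each stage writes to the log, which mirrors the way the SRT records what it found and what it changed so that you can review it afterward.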