SMART Hard Disk Technology
Self-Monitoring, Analysis, and Reporting Technology (SMART) is an industry standard providing failure prediction for disk drives. When SMART is enabled for a given drive, the drive monitors predetermined attributes that are susceptible to or indicative of drive degradation.
Based on changes in the monitored attributes, a failure prediction can be made. If a failure is deemed likely to occur, SMART makes a status report available so the system BIOS or driver software can notify the user of the impending problems, perhaps enabling the user to back up the data on the drive before any real problems occur.
Predictable failures are the types of failures SMART attempts to detect. These failures result from the gradual degradation of the drive's performance. According to Seagate, 60% of drive failures are mechanical, which is exactly the type of failure SMART is designed to predict.
Of course, not all failures are predictable, and SMART can't help with unpredictable failures that occur without any advance warning. These can be caused by static electricity; improper handling or sudden shock; or circuit failure, such as thermal-related solder problems or component failure.
SMART was originally created by IBM in 1992. That year IBM began shipping 3 1/2-inch hard disk drives equipped with Predictive Failure Analysis (PFA), an IBM-developed technology that periodically measures selected drive attributes and sends a warning message when a predefined threshold is exceeded.
IBM turned this technology over to the ANSI organization, and it subsequently became the ANSI-standard SMART protocol for SCSI drives, as defined in the ANSI-SCSI Informational Exception Control (IEC) document X3T10/94-190. Interest in extending this technology to ATA drives led to the creation of the SMART Working Group in 1995.
Besides IBM, other companies represented in the original group were Seagate Technology, Conner Peripherals (now a part of Seagate), Fujitsu, Hewlett-Packard, Maxtor, Quantum, and Western Digital.
The SMART specification produced by this group and placed in the public domain covers both ATA and SCSI hard disk drives and can be found in most of the more recently produced drives on the market. The SMART design of attributes and thresholds is similar in ATA and SCSI environments, but the reporting of information differs.
In an ATA environment, driver software on the system interprets the alarm signal from the drive generated by the SMART "report status" command. The driver polls the drive on a regular basis to check the status of this command and, if it signals imminent failure, sends an alarm to the operating system where it is passed on via an error message to the end user.
This structure also enables future enhancements, which might allow reporting of information other than drive failure conditions. The system can read and evaluate the attributes and alarms reported in addition to the basic "report status" command.
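The ATA polling arrangement described above can be sketched in a few lines of Python. This is only an illustration of the flow (poll, check, alert), not real driver code: `report_status` and `notify` are hypothetical callables standing in for the drive's "report status" command and the operating system's alert path, and the polling interval and poll count are arbitrary assumptions.

```python
import time
from typing import Callable

# Warning text as shown to the end user.
WARNING = ("Immediately back up your data and replace your hard disk drive. "
           "A failure may be imminent.")

def poll_smart_status(
    report_status: Callable[[], bool],   # hypothetical: True if the drive signals imminent failure
    notify: Callable[[str], None],       # hypothetical: passes a message on to the user
    interval_seconds: float = 60.0,      # assumed polling interval
    max_polls: int = 3,                  # assumed limit so the sketch terminates
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the drive on a regular basis; return True if a failure was signaled."""
    for poll in range(max_polls):
        if report_status():
            # The driver raises the alarm, which ends up as a message to the user.
            notify(WARNING)
            return True
        if poll < max_polls - 1:
            sleep(interval_seconds)
    return False
```

A real driver would poll indefinitely and issue an actual ATA command; the callables here simply make the control flow easy to follow and to try out with fake data.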
SCSI drives with SMART communicate a reliability condition only as either good or failing. In a SCSI environment, the failure decision occurs at the disk drive and the host notifies the user for action. The SCSI specification provides for a sense bit to be flagged if the drive determines that a reliability issue exists. The system then alerts the end user via a message.
Note that traditional disk diagnostics such as Scandisk work only on the data sectors of the disk surface and do not monitor all the drive functions that are monitored by SMART. Most modern disk drives keep spare sectors available to use as substitutes for sectors that have errors.
When one of these spares is reallocated, the drive reports the activity to the SMART counter but still looks completely defect-free to a surface analysis utility, such as Scandisk. Drives with SMART monitor a variety of attributes that vary from one manufacturer to another.
Attributes are selected by the device manufacturer based on their capability to contribute to the prediction of degrading or fault conditions for that particular drive. Most drive manufacturers consider the specific set of attributes being used and the identity of those attributes as vendor specific and proprietary.
Some drives monitor the floating height of the head above the magnetic media. If this height changes from a nominal figure, the drive could fail. Other drives can monitor different attributes, such as ECC circuitry that indicates whether soft errors are occurring when reading or writing data.
Some of the attributes monitored on various drives include the following:
- Head floating height
- Data throughput performance
- Spin-up time
- Reallocated (spared) sector count
- Seek error rate
- Seek time performance
- Drive spin-up retry count
- Drive calibration retry count
Each attribute has a threshold limit that is used to determine the existence of a degrading or fault condition. These thresholds are set by the drive manufacturer, can vary among manufacturers and models, and can't be changed.
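The comparison of an attribute against its threshold can be illustrated with a minimal Python sketch. It assumes a convention the text above does not spell out: each attribute is reported as a normalized value in which lower means worse, and an attribute is considered failing once its value is at or below the threshold. The attribute values and thresholds below are invented for illustration and do not come from any real drive.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)  # frozen: a threshold cannot be modified once set
class Attribute:
    name: str
    value: int      # current normalized value reported by the drive (assumed: lower is worse)
    threshold: int  # limit set by the drive manufacturer

    def is_failing(self) -> bool:
        # Assumed convention: at or below the threshold indicates a
        # degrading or fault condition.
        return self.value <= self.threshold

def failing_attributes(attributes: Iterable[Attribute]) -> List[str]:
    """Return the names of all attributes that have crossed their threshold."""
    return [a.name for a in attributes if a.is_failing()]

# Made-up sample readings, for illustration only.
sample = [
    Attribute("Spin-up time", value=97, threshold=21),
    Attribute("Reallocated (spared) sector count", value=30, threshold=36),
    Attribute("Seek error rate", value=80, threshold=30),
]

print(failing_attributes(sample))  # ['Reallocated (spared) sector count']
```

In this sketch a single failing attribute is enough to flag the drive; how a real drive combines its attributes into an overall status is up to the manufacturer.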
The basic requirements for SMART to function in a system are simple: You just need a SMART-capable hard disk drive and a SMART-aware BIOS or hard disk driver for your particular operating system. If your BIOS does not support SMART, utility programs are available that can support SMART on a given system.
These include Norton Utilities from Symantec, EZ Drive from StorageSoft, and Data Advisor from Ontrack. An excellent free utility called SMARTDefender can be downloaded from Hitachi Global Storage (formerly IBM). This program monitors the SMART status of drives in the background and can be manually run to check the SMART status of a drive.
It includes a SMART Monitor system tray application that performs SMART tests and capacity monitoring based on SMARTDefender settings. A SMARTDefender icon appears in the system tray when the monitor is running.
You can disable the background monitoring (SMART Monitor) and run SMARTDefender manually to check the current health of a hard disk drive as well as run these tests:
- SMART Status Check. Performs a quick check of the SMART status of a hard disk.
- SMART Short Self-Test. Performs a short (about 90-second) self-test of a hard disk.
- SMART Extended Self-Test. Performs a comprehensive self-test of a hard disk to identify impending failures. This can take a long time to complete.
Any drives reporting a SMART failure should be considered likely to fail at any time. Of course, you should back up the data on such a drive immediately, and you might consider replacing the drive before any actual data loss occurs.
When sufficient changes occur in the monitored attributes to trigger a SMART alert, the drive sends an alert message via an IDE/ATA or a SCSI command (depending on the type of hard disk drive you have) to the hard disk driver in the system BIOS, which then forwards the message to the operating system.
The operating system then displays a warning message as follows:
Immediately back up your data and replace your hard disk drive. A failure may be imminent.
The message might contain additional information, such as which physical device initiated the alert; a list of the logical drives (partitions) that correspond to the physical device; and even the type, manufacturer, and serial number of the device.
The first thing to do when you receive such an alert is to heed the warning and back up all the data on the drive. You also should back up to new media and not overwrite any previous good backups you might have, just in case the drive fails before the backup is complete.
After backing up your data, what should you do? SMART warnings can be caused by an external source and might not actually indicate that the drive itself is going to fail. For example, environmental changes such as high or low ambient temperatures can trigger a SMART alert, as can excessive vibration in the drive caused by an external source.
Additionally, electrical interference from motors or other devices on the same circuit as your PC can induce these alerts. If the alert was not caused by an external source, a drive replacement might be indicated. If the drive is under warranty, contact the vendor and ask whether they will replace it.
If no further alerts occur, the problem might have been an anomaly, and you might not need to replace the drive. If you receive further alerts, replacing the drive is recommended.
If you can connect both the new and existing (failing) drive to the same system, you might be able to copy the entire contents of the existing drive to the new one, saving you from having to install or reload all the applications and data from your backup.