CYCLES FOR SCIENCE

Pentium: Pride and Prejudice

Can you trust your computer? That is the question many of us are left asking after the Intel Pentium affair.

A few readers may not have followed the Intel saga in the news. In brief, a subtle design error in the Pentium processor chip caused incorrect results when dividing real (floating-point) numbers. Only numbers containing certain bit patterns are affected, and those patterns occur very rarely in randomly distributed data, so most users will never notice a problem.
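To make "certain bit patterns" concrete, here is a minimal sketch in Python of the sanity check that circulated widely on the Internet once the bug became public. It divides 4195835 by 3145727 and multiplies the quotient back. A flawed Pentium returned a quotient that was wrong from the fifth significant digit on (1.333739... instead of 1.333820...), so the residual came out as 256; correct hardware gives a residual of zero or very nearly so. The function name `fdiv_residual` is our own, chosen for illustration, and the code assumes Python floats are IEEE 754 doubles, as they are on virtually all current machines.

```python
# Sketch of the widely circulated FDIV sanity check (names are illustrative).
# Assumes Python floats are IEEE 754 double-precision numbers.

X = 4195835.0  # numerator from the published test case
Y = 3145727.0  # divisor whose bit pattern triggered the Pentium flaw


def fdiv_residual(x: float, y: float) -> float:
    """Return x - (x / y) * y: about 0 with correct division, 256 on a flawed Pentium."""
    return x - (x / y) * y


if __name__ == "__main__":
    q = X / Y
    r = fdiv_residual(X, Y)
    print(f"quotient = {q:.15f}")  # the correct value begins 1.333820449136...
    print(f"residual = {r}")
    if abs(r) > 1e-6:
        print("Division looks wrong: suspect the floating-point hardware.")
    else:
        print("Division looks correct for this test case.")
```

One passing test case proves little, of course: the flaw affected only certain divisors, which is exactly why random testing rarely exposes it.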

At least that was Intel's thinking. The company allegedly detected the problem last summer but sat on the information until an academic researcher discovered the bug and brought it to world attention. Intel initially stonewalled, estimating that most users would never see an error in thousands of years of computer use. It would provide a corrected chip only to users who could "prove" that the error would affect their work, typically scientific users running floating-point-intensive calculations.

The market thought otherwise. Independent evaluations showed that some users would encounter errors unacceptably often. Consumers and equipment vendors who had paid premium prices for millions of "state of the art" Pentium computers demanded corrected chips. Eventually, after much Internet rancor and bad press, Intel relented and offered to replace any defective processor on request.

To keep the Pentium affair in perspective, remember that modern computer chips are incredibly complicated, and logic errors have been discovered in past designs despite careful testing. What was most galling was that, after uncovering the flaw, Intel continued to ship the product in volume, did nothing to notify its customers, and, once the fault became public, tried to put the burden of proof on those same customers.

Some of the fallout has been useful. Competition in the PC world may improve as the public realizes that there are alternatives to "Intel inside," in particular the Apple-IBM-Motorola PowerPC chip. In the future, we can expect Intel to pay more attention to customer relations, beyond producing TV ads with flying chips.

More interestingly, the Pentium affair highlights how we deal with risk. The subject has been discussed at length in the comp.risks newsgroup, which I commend to readers. I am also indebted to EE Prof. Richard Barker for conversations on these points. Several questions are worth thinking about.

For vendors, what is the right strategy for managing design and fabrication risk? If a complex chip is to have any chance of operating correctly, its design must be exhaustively simulated and tested. A judgment must be made, however, about how much analysis is enough. Simulation is expensive, and there is an enormous opportunity cost when a product is late to market. On the other hand, a defect discovered after the product is shipping in volume is vastly more expensive to resolve than one found at the design stage. A failure in the field also costs important intangibles: prestige and good will. The psychological damage is multiplied if, as happened with Intel, the vendor does not deal with the issue promptly and decisively.

As users, we do have to ask whether we can trust our computer's results. A rational list of risks must include whether data was entered correctly, whether files might be lost if a disk crashes, whether the software might be flawed, and so on. We would like to assume that, deep down, the CPU (central processing unit) really does do arithmetic correctly. But we should understand, especially in scientific work, that computer numbers are only finite-precision approximations of the real quantities they represent, and that we must always guard against loss of significance.
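To make "loss of significance" concrete: when two nearly equal numbers are subtracted, most of their significant digits cancel and rounding error dominates what is left. The textbook case is the quadratic formula. The minimal sketch below, in Python with function names of our own choosing, solves x^2 + 10^8 x + 1 = 0, whose small root is very close to -10^-8. The naive formula subtracts two nearly equal quantities; the rearranged version avoids that subtraction by using the fact that the product of the two roots equals c/a. It assumes a nonzero leading coefficient and real roots.

```python
import math
from typing import Tuple


def small_root_naive(a: float, b: float, c: float) -> float:
    """Textbook (-b + sqrt(b^2 - 4ac)) / 2a; cancels badly when b > 0 and b*b >> |4ac|."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def roots_stable(a: float, b: float, c: float) -> Tuple[float, float]:
    """Both real roots, computed without subtracting nearly equal numbers.

    Assumes a != 0 and a non-negative discriminant.
    """
    d = math.sqrt(b * b - 4.0 * a * c)
    # b and copysign(d, b) have the same sign, so this sum cannot cancel.
    q = -0.5 * (b + math.copysign(d, b))
    # The product of the roots is c/a, so the second root comes from a division.
    return q / a, c / q


if __name__ == "__main__":
    a, b, c = 1.0, 1e8, 1.0
    naive = small_root_naive(a, b, c)
    big, small = roots_stable(a, b, c)
    print(f"naive small root : {naive:.15e}")
    print(f"stable small root: {small:.15e}")
    print(f"large root       : {big:.15e}")
```

On IEEE 754 hardware the naive formula returns about -7.45e-9, an error of roughly 25 percent, while the rearranged version agrees with -1e-8 to nearly full double precision. Neither the CPU nor the software is faulty here; the damage comes entirely from subtracting nearly equal numbers.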

In critical numerical work we need to satisfy ourselves that there are no errors by comparing test cases with analytic solutions, by applying reasonableness and sensitivity checks, and perhaps by comparing the results of independent algorithms. Running the same program on different machine types (say, a Pentium PC and a Power Macintosh or Sun workstation) can catch many subtle problems.
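As a small illustration of these checks, the sketch below integrates e^x over [0, 1], a test case whose analytic answer is e - 1, with two independent methods: the trapezoidal rule and Simpson's rule. The integrand, interval, panel counts, and tolerances are arbitrary choices for the example, not a prescribed procedure. Agreement with the exact answer, and between the two methods, builds confidence; a refinement test (halving the step should cut the trapezoidal error by about a factor of 4) serves as a simple sensitivity check.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule with n equal panels (error shrinks like 1/n^2)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def simpson(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite Simpson's rule; n must be even (error shrinks like 1/n^4)."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of panels")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, ..., 4
    return total * h / 3.0


if __name__ == "__main__":
    exact = math.e - 1.0  # analytic solution for the test case
    t = trapezoid(math.exp, 0.0, 1.0, 1000)
    s = simpson(math.exp, 0.0, 1.0, 1000)
    print(f"exact     {exact:.15f}")
    print(f"trapezoid {t:.15f}  error {t - exact:.2e}")
    print(f"simpson   {s:.15f}  error {s - exact:.2e}")

    # Independent algorithms should agree to within the cruder method's error.
    assert abs(t - s) < 1e-6, "independent algorithms disagree"

    # Sensitivity check: halving the step should cut the trapezoid error by ~4.
    ratio = (trapezoid(math.exp, 0.0, 1.0, 500) - exact) / (t - exact)
    print(f"error ratio for halved step: {ratio:.3f}")
```

Running the same script on a second machine type adds the cross-platform check: results that agree closely on one machine but differ beyond rounding level on another deserve a closer look.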

The Pentium episode will have been useful if the public and the computing community have a healthier skepticism of computer data and if it helps to bring more competition into the personal computer chip market.

No, you cannot trust your computer very far. Let "caveat calculator" be your rule.

Martin Ewing is director of the Science and Engineering Computing Facility. Send comments and questions to martin.ewing@yale.edu, or call 432-4243.
