Understanding the Role
Precision Plays in HALT
by Ted Kalal, Flextronics, and Mark Levin, Teradyne
For successful HALT testing, repeatability is as important as accuracy.
Generally, one of the more important objectives when implementing an experiment
is test repeatability. An experiment must be repeatable to validate the test
results. In fact, a well-written test report contains all the information needed
for anyone to repeat a given test and verify the results. When performing a
highly accelerated life test (HALT) experiment, the same is true, but test
repeatability is defined differently.

HALT is a destructive test performed on only a small sample, typically
three to five devices. Test repeatability is defined as the capability to
reproduce a failure mode, but the exact stress level required to reproduce the
failure will vary. Reproducibility of a failure mode at a precise stress level
is neither expected nor required.

Highly accelerated stress screening (HASS) is a nondestructive screen where
screened production hardware is delivered to the end user. For that reason, when
performing HASS, the distribution of stress levels that induce failures is
important and is characterized before production implementation.

Repeatability in HALT has a different meaning than repeatability in HASS. In
both cases, we do not expect a failure mode to be reproduced at
a precise stress level. Instead, we expect it to be reproducible over a stress
range. As a result, tight control of the environmental chamber is not required
and represents an unnecessary constraint.

What do we mean by test repeatability? The two most common terms used to
describe test repeatability are accuracy and precision. The familiar bull’s-eye
chart is used to describe the difference between the two. Accuracy is described
as the capability to be within a particular measure from an expected value
(Figure 1). Here we have good accuracy if the requirement is ±3% but the
precision is not as good.

Figure 1. Accuracy Bull’s-Eye
Precision is the capability to get repeatable results, regardless of whether
those results are accurate (Figure 2).

Figure 2. Tightly-Grouped Results Indicating Higher Precision
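The distinction can be made concrete with a little arithmetic. The sketch below is an illustration only: it treats accuracy as the mean offset of a set of readings from a target value and precision as their sample standard deviation. The function name, the 70°C target, and the readings are hypothetical and are not taken from Figures 1 or 2.

```python
from statistics import mean, stdev


def accuracy_and_precision(readings, target):
    """Return (bias, spread) for a set of readings against a target value.

    bias   = mean offset from the target (smaller magnitude = more accurate)
    spread = sample standard deviation   (smaller = more precise)
    """
    bias = mean(readings) - target
    spread = stdev(readings)
    return bias, spread


# Hypothetical readings against an assumed 70.0 degC target.
centered_but_scattered = [68.5, 71.8, 69.2, 70.9, 69.6]  # on target, loose grouping
tight_but_offset = [72.9, 73.1, 73.0, 72.8, 73.2]        # off target, tight grouping

for name, data in [("scattered", centered_but_scattered),
                   ("tight", tight_but_offset)]:
    bias, spread = accuracy_and_precision(data, target=70.0)
    print(f"{name}: bias={bias:+.2f} degC, spread={spread:.2f} degC")
```

With these made-up readings, the first set averages on target but scatters, while the second is tightly grouped around a value about 3°C away from the target.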
This brings us back to test repeatability and the role it plays first in HALT.
During design validation, HALT is performed to identify the weak points in the
design. It is a form of Elephant test where the product is increasingly stressed
beyond the product specification limits to identify its operational soft and
hard failure limits.

The soft failures are points where the product fails to function properly when
under stress but returns to operating normally when the stress is reduced or
removed. Hard failures are observed when the product fails to function properly
when under stress and does not return to operating normally when the stress is
reduced or removed.

The hard and soft failure points identify the initial precipitation and
detection limits for HASS. Knowing the hard and soft failure limits allows you
to optimize the environmental stress that can be imposed in HASS early in
manufacturing to identify reliability escapes, manufacturing issues, and
supplier component problems.

However, there is a problem with this theory. The soft and hard limits are not
points but distributions. The distribution is the result of variability in
components, manufacturing, design sensitivities, and stress.

The hard and soft failures identified in HALT must be reproducible if you expect
to make design improvements. If a failure mode can be reproduced, then design
improvements intended to remove the failure mode can be evaluated. This is one
of the golden rules in experimental design. Like any important rule, it can be
misinterpreted.

Some practitioners of HALT try to apply the golden rule of test repeatability to
failures found under combinational stress, such as combined temperature and
vibration. They consider it important that a device failure be repeatable
at a particular stress level. They want to be able to expose subsequent
devices to the same stress levels and get the same results.

Take, for example, a device that has a failure mode at 70°C. For simplicity,
we’ll consider the failure to be a design issue. Since it is a design issue,
we might expect other devices to fail at 70°C as well.

However, a second device will likely behave differently due to tolerance
stack-up, component variability, variability in the assembly process, and
environmental stress variability. It would be remarkable for several like
devices to exhibit the same design failure at precisely 70°C. The result is
shown in Figure 3, where five devices are tested under temperature step stress
only.

Figure 3. Distribution of HALT Failures
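A small simulation illustrates why like devices with the same design weakness fail at different steps. This is a sketch under stated assumptions, not a model of any real product: each device's failure threshold is drawn from a normal distribution around a nominal 70°C, and the spread, step size, start temperature, and function name are all hypothetical.

```python
import random


def step_stress_failures(n_devices, nominal_c=70.0, sigma_c=4.0,
                         start_c=20.0, step_c=5.0, max_c=150.0, seed=1):
    """Simulate a temperature step-stress run on n_devices like units.

    Each device gets its own failure threshold drawn around nominal_c
    (an assumed normal spread standing in for tolerance stack-up and
    component and assembly variability). A device is recorded as failing
    at the first step whose temperature reaches or exceeds its threshold.
    Returns a list of (threshold, failed_at) pairs; failed_at is None if
    the device survives up to max_c.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_devices):
        threshold = rng.gauss(nominal_c, sigma_c)
        temp = start_c
        while temp < threshold and temp < max_c:
            temp += step_c
        failed_at = temp if temp >= threshold else None
        results.append((threshold, failed_at))
    return results


for threshold, failed_at in step_stress_failures(5):
    step = "no failure" if failed_at is None else f"{failed_at:.0f} degC step"
    print(f"threshold {threshold:5.1f} degC -> {step}")
```

Every simulated device shares the same failure mode, yet the step at which it shows up depends on where that device's threshold happens to fall.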
We expect subsequent devices to exhibit the same failure mode, assuming it is a
design issue, but at different stress levels. The more components the device has
in its design, the greater the expected variability. We also might find a couple
of different failure modes that interact and cause even greater variability.

This example was for only one stress. What about vibration levels? The same
arguments hold for vibration stress levels. In addition, consider that when
using combined stress such as vibration and temperature, the effect of one stress
relative to the other is more complex.

For some failure mechanisms, the combined stress accelerates failures, but the
reverse also is possible. HALT typically is a combinational stress test. If we
considered the combinational stress of temperature and voltage, the results
would look like the familiar bull’s-eye chart in Figure 4. This is what really
happens in HALT.

Figure 4. Varying Stress Sensitivity Among Failure Modes
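One way to record results in this spirit is to summarize each failure mode by the range of stress over which it was observed, rather than by a single point. The sketch below assumes failures are logged as (mode, temperature, voltage) records. The failure-mode names and all of the values are invented for illustration and are not data from Figure 4.

```python
from collections import defaultdict

# Hypothetical HALT observations: (failure mode, temperature degC, supply voltage V).
observations = [
    ("regulator dropout", 68.0, 4.6),
    ("regulator dropout", 74.0, 4.7),
    ("regulator dropout", 71.0, 4.5),
    ("clock drift", 82.0, 5.4),
    ("clock drift", 77.0, 5.2),
]


def stress_ranges(records):
    """Group failures by mode and return the (min, max) seen on each stress axis."""
    by_mode = defaultdict(list)
    for mode, temp_c, volts in records:
        by_mode[mode].append((temp_c, volts))

    summary = {}
    for mode, points in by_mode.items():
        temps = [t for t, _ in points]
        volts = [v for _, v in points]
        summary[mode] = {
            "temp_c": (min(temps), max(temps)),
            "volts": (min(volts), max(volts)),
        }
    return summary


for mode, ranges in stress_ranges(observations).items():
    print(mode, ranges)
```

A failure mode that reappears anywhere inside its range counts as reproduced, which matches the way repeatability is defined for HALT.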
From another point of view, place two like devices in the HALT chamber at the
same time and run the HALT temperature stress profile. If there is a design
issue, they both eventually will fail.

The more slowly the temperature is increased, the greater the time between
the first and second failure. So what does this mean when it comes to precision
control of temperature of the HALT chamber? As we have shown, it is not
necessary to buy an expensive HALT chamber capable of very tight temperature
control. An environmental chamber that controls temperature to within a degree
or two will work well for HALT.
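The ramp-rate point can be checked with simple arithmetic. Assuming a linear temperature ramp and two devices whose failure thresholds differ by a fixed amount, the time between failures is the threshold difference divided by the ramp rate. The function name, the thresholds, and the ramp rates below are hypothetical.

```python
def time_between_failures(threshold_a_c, threshold_b_c, ramp_c_per_min):
    """Minutes between two device failures on a linear temperature ramp."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(threshold_b_c - threshold_a_c) / ramp_c_per_min


# Hypothetical pair of like devices whose thresholds differ by 6 degC.
for ramp in (10.0, 2.0, 0.5):
    gap = time_between_failures(67.0, 73.0, ramp)
    print(f"ramp {ramp:>4} degC/min -> {gap:.1f} min between failures")
```

In this example the assumed 6°C device-to-device difference is already larger than a chamber control band of a degree or two, so tighter chamber control would not make the two failures coincide.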
Remember that the important result of a HALT test is what failed, not the stress
level required to precipitate the failure, assuming the failure is not the
result of a material changing phase states. The intent of HALT is to increase
the stress to a device until it fails. The process identifies the weak links in
the design. Root-cause analysis is performed to establish the failure mechanism
and understand the physics of the failure.

Based on this knowledge, a design change can be made to remove the failure
mechanism. A subsequent HALT test is performed to verify that the design change
removed the failure mechanism and that no new failure modes were introduced.

In selecting a HALT chamber, there are many other things to consider [1]. Most HALT
chambers are relatively similar in performance parameters. So other factors like
reliability, service support, flexibility, ease of use, turnkey capability, and
supplier longevity are more important. Be sure to do your due diligence when
selecting the right chamber for your HALT needs.

Reference
1. Levin, M. and Kalal, T., Product Reliability, 2003.

About the Authors
Ted Kalal is the director of product assurance and reliability engineering at
Flextronics. The University of Wisconsin graduate has held many positions as a
contract engineer and consultant. He has also authored several papers on
electronic circuitry and holds a patent in the field of power electronics.
Flextronics, 640 Shiloh Rd., Plano, TX 75074, 469-229-2404, e-mail:
ted.kalal@flextronics.com
Mark Levin is Teradyne’s reliability manager for product development at
Semiconductor Test. He has a B.S. in electrical engineering from the University
of Arizona and an M.S. in technology management from Pepperdine University and
is a graduate student at the University of Maryland’s reliability engineering
program. Mr. Levin has held several management and research positions at Hughes
Aircraft’s Missiles Systems Group, Hughes Aircraft’s Microwave Products
Division, General Medical Company, and Medical Data Electronics.
Teradyne, 30801 Agoura Rd., Agoura Hills, CA 91301, 818-874-7155, e-mail:
Mark.levin@teradyne.com