This article discusses the Weibull distribution and how it is used in the field of reliability engineering.
Reliability engineering uses statistics to plan maintenance, determine life cycle cost, predict failures, and determine warranty periods for products.
Reliability comes up in every field of engineering and is especially common in power electronics. If you design products for space, medical, or other specialized applications, where a subsystem failure can cause mission failure or loss of life, you should study the New Weibull Handbook, on which this article is based.
If you spend some time on reliability engineering, you are sure to come across the Weibull distribution. The Swedish engineer Waloddi Weibull introduced this probability distribution to the world in 1951 and it is still widely used today.
Before you start, you can read my first article introducing the concept of reliability engineering to get some basic information.
Plotting failures: the road to Weibull
Product families used in a similar way will fail on predictable timelines. This excludes failures due to external factors (electrostatic discharge, improper handling, intentional abuse, etc.).
Weibull plots record the percentage of products that have failed over an arbitrary time scale, which can be measured in startup cycles, hours of run time, miles driven, and so on. The time scale should be chosen to suit the product. For example, an oscilloscope could use “run time hours”, while a vehicle instrument cluster could be measured in “highway miles” and a spring programmer in “number of times used”.
Data is plotted on a log-log graph.
Figure 1. This time-to-failure graph shows the percentage of widgets that have failed over time.
The plotted data is not perfectly linear, but a straight best-fit line provides a decent approximation.
The slope of this best-fit line, β, describes the Weibull failure distribution.
- β < 1 indicates infant mortality
- β = 1 indicates random failures
- β > 1 indicates wear-out failures
(See Chapter 2 of the New Weibull Handbook for more details.)
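The best-fit line described above can be sketched in a few lines of Python. This is a minimal illustration, not the handbook's exact procedure: it assumes Benard's median-rank approximation for the plotting positions and an ordinary least-squares fit of the vertical axis on the horizontal axis. The function names and the failure times are hypothetical.

```python
import math

def fit_weibull(failure_times):
    """Estimate (beta, eta) from a straight-line fit on Weibull-plot axes."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        # Benard's median-rank approximation (an assumption of this sketch).
        fraction_failed = (i - 0.3) / (n + 0.4)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - fraction_failed)))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the best-fit line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta

def describe_slope(beta, tolerance=0.05):
    """Map beta to a failure regime; the tolerance band is arbitrary."""
    if beta < 1.0 - tolerance:
        return "infant mortality"
    if beta > 1.0 + tolerance:
        return "wear-out failures"
    return "random failures"

# Hypothetical run-time hours at failure for ten widgets.
hours = [420, 610, 790, 950, 1120, 1300, 1510, 1780, 2150, 2900]
beta, eta = fit_weibull(hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, {describe_slope(beta)}")
```

With real data, the quality of the fit depends heavily on the sample size and on whether suspended (unfailed) units are handled, which this sketch ignores.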
The time to failure of a particular percentage of a product is historically described as the B1, B10, B20 time, etc., where the number describes the percentage of products that have failed. For example, B10 is when 10% of the products have failed.
Some manufacturers use L-times (L1, L10, L20, etc.), where L stands for “service life”. Although Weibull distributions now describe a wide range of products, the B is thought to stand for “bearing life”.
Figure 3. A Weibull CDF fitted to the sample data from the graph above. In this case, β = 1 and η = 2000.
The Weibull distribution is a continuous statistical distribution described by the constant parameters β and η, where β determines the shape and η determines the scale of the distribution. In this notation, the cumulative distribution function (CDF) is F(t) = 1 − exp(−(t/η)^β).
The CDF shows the relationship between the percentage of failures and time.
In Figure 3 (above), the shape β = 1 and the scale η = 2000. The following graphs illustrate how changing one of these parameters at a time affects the shape of the graph.
As η changes, the Weibull graph shifts and spreads along the horizontal axis.
Figure 4. Weibull CDF plot showing the effect of changing η with β = 1
As β changes, the slope and shape of the graph change, as shown below in Figure 5.
Figure 5. Weibull CDF plot showing the effect of changing β with η = 2000
Also, some sources introduce the variable μ, which shifts the graph along the horizontal time axis (t-μ).
Unfortunately, different sources write the equation with different symbols: α, β, η, λ, κ, etc. This article follows the convention of the New Weibull Handbook.
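To make the roles of β and η concrete, here is a small sketch of the two-parameter CDF in the β, η notation, together with the B-life values described earlier. It assumes no time shift (μ = 0), and the function names are illustrative only.

```python
import math

def weibull_cdf(t, beta, eta):
    """Fraction of units expected to have failed by time t (t >= 0)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def b_life(percent, beta, eta):
    """Time by which `percent` of units have failed (percent=10 gives B10)."""
    fraction = percent / 100.0
    return eta * (-math.log(1.0 - fraction)) ** (1.0 / beta)

# Parameter values taken from Figure 3.
beta, eta = 1.0, 2000.0
for t in (500, 1000, 2000, 4000):
    print(f"t = {t:>4}: {100 * weibull_cdf(t, beta, eta):5.1f}% failed")
print(f"B10 life: {b_life(10, beta, eta):.0f}")
```

One property worth noticing: at t = η the CDF equals 1 − 1/e, about 63.2%, whatever the value of β. That is why η is often called the characteristic life.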
Probability density function
Differentiating the cumulative failure curve shown above with respect to time gives a probability density function (PDF). This new equation shows what fraction of products fail per unit of time at any given moment.
If you ran a data center, this chart would provide useful information for determining how many spare parts to have on hand or for scheduling preventive maintenance.
This probability density function describes the frequency of failures over time. In the β, η notation used here, it is f(t) = (β/η) (t/η)^(β−1) exp(−(t/η)^β).
Two cool things to keep in mind about the above equation:
- First, when β = 1, the equation reduces to a simple exponential distribution.
- Second, when β ≈ 3.4, the graph looks like a normal distribution, although there is some deviation.
The scale parameter η is equal to the mean time to failure (MTTF) when the slope β = 1. The discussion of what happens when β ≠ 1 is outside the scope of this article. Interested readers should consult the New Weibull Handbook or other online resources.
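The β = 1 case can be checked numerically. The sketch below assumes the standard two-parameter Weibull PDF and compares it with an exponential PDF whose mean is η. The helper `weibull_mttf` uses the textbook result MTTF = η·Γ(1 + 1/β), which this article does not derive; it is included only to show that the mean collapses to η when β = 1.

```python
import math

def weibull_pdf(t, beta, eta):
    """Weibull probability density at time t (t > 0)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0) * math.exp(-((t / eta) ** beta))

def exponential_pdf(t, mean):
    """Exponential probability density with the given mean."""
    return math.exp(-t / mean) / mean

def weibull_mttf(beta, eta):
    """Mean time to failure, using the standard gamma-function result."""
    return eta * math.gamma(1.0 + 1.0 / beta)

eta = 2000.0
for t in (100.0, 1000.0, 3000.0):
    # With beta = 1 the two densities should agree at every t.
    print(t, weibull_pdf(t, 1.0, eta), exponential_pdf(t, eta))
print("MTTF at beta = 1:", weibull_mttf(1.0, eta))
```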
How are failure rates determined?
If you look at failure data, you will occasionally come across MTTF figures that are, well, ridiculous. For example, Linear Devices’ GaN HEMT wafer process technology reliability data provides an MTTF of 15,948,452,200 hours. I assure you that Linear did not begin testing their wafers 1.8 million years ago, when our early ancestors were first taming fire.
So how was this number calculated?
Manufacturers accelerate the breakdown of their products by exposing them to excessive heat and excessive voltage. These accelerated failure tests can be used with specific equations to calculate how long a device will last.
Imagine placing a chocolate bar directly on top of a campfire. The closer the chocolate is to the fire, the more heat energy is transferred to it and the faster it melts. But if the chocolate bar is kept at a proper distance, it will never melt and will last virtually forever.
Temperature acceleration exposes devices to high temperatures (125 °C, 150 °C and above) and relates the MTTF at the use temperature to the MTTF at the test temperature using the Arrhenius equation. In its usual form, MTTF_use / MTTF_test = exp[(Ea/k) (1/Tuse − 1/Ttest)],
where Ttest and Tuse are the test and use temperatures in kelvin, k is Boltzmann’s constant (approximately 8.617 × 10^-5 eV/K),
and Ea is the activation energy for a specific failure mechanism. The Linear Technology Reliability Manual provides the value of 0.8 eV for failures due to silicon bonding and oxidation defects, and 1.4 eV due to contamination.
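As a rough illustration of how large the temperature acceleration can be, the following sketch evaluates the usual Arrhenius acceleration factor for the two activation energies quoted above. The 150 °C test temperature and 55 °C use temperature are assumptions chosen for the example, not values from any datasheet.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K

def arrhenius_acceleration(ea_ev, t_use_c, t_test_c):
    """Ratio MTTF_use / MTTF_test for activation energy ea_ev (in eV)."""
    t_use_k = t_use_c + 273.15    # convert Celsius to kelvin
    t_test_k = t_test_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k)
    return math.exp(exponent)

# Hypothetical conditions: tested at 150 C, used at 55 C.
for ea in (0.8, 1.4):
    factor = arrhenius_acceleration(ea, t_use_c=55.0, t_test_c=150.0)
    print(f"Ea = {ea} eV: one test hour counts as about {factor:,.0f} use hours")
```

Because the activation energy sits in the exponent, a small change in Ea or in the assumed use temperature changes the result by orders of magnitude, which is one reason published MTTF figures can look so large.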
Manufacturers sometimes expose their devices to excessive voltage. In that case, an acceleration factor is calculated with a different equation, commonly written as AF = exp[γ (Vt − Vu)],
where γ is the voltage acceleration constant derived from time-dependent dielectric breakdown testing, and Vt and Vu are the test and use voltages.
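The voltage case can be sketched the same way. This assumes the common exponential model AF = exp[γ (Vt − Vu)]; the value of γ and both voltages below are placeholders, since the real constant comes from a manufacturer's dielectric breakdown data.

```python
import math

def voltage_acceleration(gamma_per_volt, v_test, v_use):
    """Acceleration factor for a test at v_test relative to use at v_use."""
    return math.exp(gamma_per_volt * (v_test - v_use))

# Placeholder values for illustration only.
gamma = 1.0      # 1/V, hypothetical voltage acceleration constant
v_test = 5.5     # volts applied during the stress test
v_use = 5.0      # volts in normal operation
print(f"Voltage acceleration factor: {voltage_acceleration(gamma, v_test, v_use):.2f}")
```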
Highly Accelerated Stress Testing
When manufacturers are in a real rush to find faults, they can subject their devices to high-pressure, high-humidity, and high-temperature environments for prescribed periods of time. They can also cycle the temperature rapidly between extremes and expose the devices to electromagnetic energy, vibration, shock, and other stresses.
All of these tests can be interpreted mathematically to provide real-world MTTF figures that reliability engineers can use in their calculations.
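One way such an interpretation can work is sketched below. It is not taken from any manufacturer's procedure. It assumes a constant failure rate (β = 1), a test that ends with zero failures, and a one-sided confidence bound. Under those assumptions the lower bound on MTTF is the accumulated equivalent device-hours divided by −ln(1 − confidence). The sample size, test duration, and acceleration factor are hypothetical.

```python
import math

def equivalent_device_hours(units, test_hours, acceleration_factor):
    """Test time converted to use-condition hours across all units."""
    return units * test_hours * acceleration_factor

def mttf_lower_bound_zero_failures(device_hours, confidence=0.60):
    """Lower confidence bound on MTTF when no failures were observed.

    Assumes exponentially distributed lifetimes (beta = 1).
    """
    return device_hours / -math.log(1.0 - confidence)

# Hypothetical test: 77 units for 1000 hours at an acceleration factor of 500.
hours = equivalent_device_hours(units=77, test_hours=1000.0, acceleration_factor=500.0)
print(f"Equivalent device-hours: {hours:,.0f}")
print(f"MTTF lower bound: {mttf_lower_bound_zero_failures(hours):,.0f} hours")
```

The point of the sketch is that a modest test, multiplied by a large acceleration factor and many units, can yield an MTTF far longer than the test itself ever ran.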
Reliability engineers use statistics and mathematical analysis to predict how long your devices will run. By knowing how long a device should operate, they can predict warranty periods, plan for preventive maintenance, and order replacement parts before they’re needed.
This is just a brief introduction to the field. If you are a reliability engineer and know of other sources of information, please let us know in the comments below.