
Presentation Transcript


    1. ACCELERATED LIFE TESTING (ALT), PREDICTIVE MODELING AND THE PROBABILISTIC DESIGN-FOR-RELIABILITY APPROACH Short Course “You can see a lot by observing” Yogi Berra “It is easy to see, it is hard to foresee” Benjamin Franklin E. Suhir, University of California, Santa Cruz, CA, University of Maryland, College Park, MD, and ERS Co., 727 Alvina Ct. Los Altos, CA, 94024 Tel. 650-969-1530, cell. 408-410-0886, e-mail: suhire@aol.com, IEEE CPMT ASTR’07, College Park, MD, October 31, 2007

    3. Failure modes and mechanisms, “root” causes, corrective actions “Vision without action is a daydream. Action without vision is a nightmare” Japanese saying Failure mode identifies what happened, i.e., the way the product failed: what has been detected, observed and/or reported as a functional, mechanical or environmental failure (shorts and opens; low or distorted output signal; materials failure; loss of structural integrity; loss of coupling efficiency; elevated attenuation in a light-guide, etc.). Failure mechanism identifies what physical, chemical, mechanical, thermal or manufacturing processes and phenomena resulted in a failure (voltage breakdown, corrosion, fatigue, electro-migration, excessive heat, elevated stress or displacement, insufficient fracture toughness, division by zero, etc.). Failure site identifies the location at which the product failure occurred. Information about a failure can be obtained during qualification tests, during screening tests carried out in the process of manufacturing, during burn-in, at the customer site upon the product’s arrival, or in the field. “Root” cause identifies why a particular failure happened: is it because of poor design, defects, selection of an inappropriate material, overstress, inadequate technology, misuse or abuse of the equipment, manufacturing deficiencies, human error, or something else? After a root cause is identified, a corrective action should be taken, so that further failures are prevented.

    4. Reliability is a complex property “If you bet on a horse, that’s gambling. If you bet you can make three spades, that’s entertainment. If you bet the device will survive for twenty years, that’s engineering. See the difference?” Unknown Reliability Engineer Reliability is, in effect, part of applied probability and includes the item's (system's) dependability, durability, maintainability, reparability, availability, testability, etc., i.e., probabilities of the corresponding events or characteristics Each of these characteristics is measured as a certain probability and could be of a greater or lesser importance depending on the particular function and operation conditions of the item or the system, and consequences of failure

    5. Three classes of engineering products “In the long run we are all dead” John Maynard Keynes Class I. The product has to be made as reliable as possible. Failure should not be permitted (some military or space objects). Class II. The product has to be made as reliable as possible, but only for a certain level of demand. Failure is a catastrophe (civil engineering structures, bridges, ships, aircraft, cars). Class III. The reliability does not have to be very high. Failures are permitted (consumer products, commercial electronics, etc.)

    6. Reliability, cost, and time-to-market “The probability of anything happening is in inverse ratio to its desirability” John W. Hazard Reliability, cost and time-to-market considerations play an important role in design, materials selection and manufacturing decisions. They are key issues in competing in the global marketplace. A company cannot be successful if its products are not cost effective, or do not have a worthwhile lifetime and service reliability that match the expectations of the customer. Too low a reliability can lead to a total loss of business. Product failures have an immediate, and often dramatic, effect on the profitability and even the very existence of a company. Profits decrease as the failure rate increases. This is due not only to the increase in the cost of replacing or repairing parts, but, more importantly, to the losses due to the interruption in service, not to mention the “moral losses”. These make obvious dents in the company’s reputation and, as a consequence, affect its future sales.

    7. Adequate (required) reliability level “Be thankful for problems. If they were less difficult, someone with less ability might have your job” Reliability Engineer There is a permanent “struggle” between the recognition by the industry that a high and well-predicted product reliability is a “must” and strong business pressure that tends to compromise the product’s reliability in order to shorten the time-to-market and to reduce manufacturing costs. It is always a challenge to establish, for each particular product and each particular situation, the most reasonable balance between the level of reliability and the market demands, in terms of schedule and cost. It used to be said that of quality, schedule and price, the customer could have any two. Today, none of these items can be compromised for the other two: the best engineering and business decisions should be based on the best trade-off among them. Businesses should establish a minimum level of product testing or inspection that will provide a level of reliability which they feel is adequate for the market they serve.

    8. Overall approach to reliability should be optimized “It is tough to make predictions, especially for the future” Yogi Berra Each business, whether small or large, should try to optimize its overall approach to quality and reliability. The time to develop and produce products is rapidly decreasing. This circumstance places significant pressure on reliability engineers: they are expected to come up with a reliable product and to confirm its long-term reliability in a short period of time, so as to make their device a product and to make this product successful in the marketplace. A business must understand the cost of reliability, both its “direct” cost, i.e., the cost of its own operations, and the “indirect” cost, i.e., the cost to its customers and their willingness to make future purchases and to pay more for more reliable products.

    9. Reliability should be taken care of on a permanent basis “Nothing is impossible. It is often merely for an excuse that we say things are impossible” François de La Rochefoucauld Reliability evaluation and assurance cannot be delayed until the device is made (although this is often the case in many actual industries). Reliability should be “conceived” at the early stages of the product’s design (reliability and optical engineers should start working together from the very beginning of the optical device engineering); reliability should be implemented during manufacturing (quality control is certainly an important part of a manufacturing process); reliability should be qualified and evaluated by electrical, optical, environmental and mechanical testing (both the customer requirements and the general qualification requirements are to be considered); reliability should be thoroughly checked (screened) during production; and, if necessary, reliability should be maintained in the field during the product’s operation, especially at the early stages of the product’s use.

    10. New products, qualification tests and accelerated life tests “All life is an experiment. The more experiments you make the better” Ralph Waldo Emerson New products present natural reliability concerns, as well as significant challenges, at all the stages of their design, manufacture and use. These concerns and challenges have to do with the evaluation and assurance of the functional (electrical and optical) performance, the structural (mechanical) reliability and the environmental durability of the product. One of the major challenges associated with new product development and reliability is the design and implementation of adequate accelerated qualification tests (QTs) and accelerated life tests (ALTs). It is primarily the QTs that make a device into a product. But it is the ALTs that enable a reliability specialist to understand the engineering and science behind the product, and it is the ALTs that enable a reliability engineer to create, on the basis of the developed understanding, a viable and reliable product with a predicted and sufficiently low probability of failure.

    11. What one should/could do to prevent failures-1 “Well done is better than well said” Benjamin Franklin Develop an in-depth understanding of the possible modes and mechanisms of failure in the design. Understand that neither failure statistics nor the most effective ways to accommodate failures can replace a good understanding of the physics of failure and a good (robust) physical design. Assess the likelihood (the probability) that the anticipated modes and mechanisms might occur in service conditions, and minimize the likelihood of a failure by selecting the best materials and the best physical design for your product. Understand and distinguish between different aspects of reliability: operational (functional) performance, structural/mechanical reliability (affected by mechanical loading) and environmental durability (affected by harsh environmental conditions).

    12. What one should/could do to prevent failures-2 “Well done is better than well said” Benjamin Franklin Distinguish between materials reliability and structural reliability, and assess the effect of the mechanical and environmental behavior of the materials and structures in the design on the functional performance of the product. Understand the difference between the requirements of the qualification specifications and standards and the actual operation conditions. Understand well the qualification test conditions, and design the product so that it is able not only to withstand the operation conditions on a short- and long-term basis, but also to pass the qualification tests. Understand the role and importance of ALTs and HALTs.

    13. Accelerated tests “Golden rule of an experiment: the duration of an experiment should not exceed the lifetime of the experimentalist” Unknown Physicist Shortening of product design and product development time does not allow for time-consuming reliability evaluations. To get maximum information and maximum reliability-and-quality in minimum time and at minimum cost is the major goal of a manufacturer. In order to accelerate the material’s (device’s) degradation and/or failure, one has to deliberately “distort” one or more parameters (temperature, humidity, load, current, voltage, etc.) affecting the device’s functional and/or mechanical performance. Accelerated tests use elevated stress levels and/or higher stress-cycle frequency to precipitate failures over a much shorter time frame. The “stress” does not necessarily have to be a mechanical or a thermo-mechanical one: it can be electrical current or voltage, high (or low) temperature, high humidity, high frequency, high pressure or vacuum, cycling rate, or any other factor responsible for the reliability of the device or the equipment. In accelerated tests, one applies a high level of stress over a short period of time to a device/product, presuming/assuming that there will be no “shift” in the failure modes and mechanisms.

    14. Accelerated test levels “If you do not raise your eyes, you will think that you are at the highest point” Antonio Porchia Accelerated tests can be performed at the part level, at the component level, at the module level, at the equipment level and even at the system level. Accelerated testing is usually conducted at the part (assembly) or at the component (device) level. In each particular case, the decision should be made on how to break down the equipment of interest, so that the number of failure modes of the object under testing would not be very large. If the reliability characteristics of all the components are established, then the reliability characteristics of the equipment or the system can be evaluated using theoretical methods of probabilistic (statistical) analysis (predictive modeling). Different reliability criteria are (and should be) used depending on whether the object is an assembly, a component, a subsystem, a piece of equipment or a large system.

    15. Dependability and availability “A pinch of probability is worth a pound of perhaps” James Thurber While the probability of non-failure (dependability) might be the right criterion for a non-repairable component, a piece of equipment should be characterized by its availability, i.e., the probability that this piece of equipment will be available to the user when it is needed. As to a large and complex system (say, a switching system or a highly complex communication/transmission system, in which the “end-to-end reliability”, including the “reliability” of software, is important), it is the “operational availability” that is of importance.

    16. Repairability and availability “A road to success is always under repair” Common knowledge To achieve high availability one does not necessarily have to keep the dependability of a particular component, or even of a subsystem, at a very high level. One can run a highly available system by achieving high repairability, reasonable redundancy, a high level of troubleshooting, etc. Because of that, there is a rather wide spectrum of reliability requirements, ranging from very high requirements for large and complex systems down to simple consumer products, for which the consequences of failure are not as catastrophic as they are for large systems. The reliability (availability) of contemporary communication networks is as high as 0.999. For consumer products, however, it is the cost and time-to-market that are the major driving forces, and their reliability (typically, dependability) should only be adequate for customer acceptance and reasonable satisfaction. No wonder that in reliability communities one can find a variety of opinions, attitudes and approaches to, and actual practices in, reliability assurance. It depends on the driving market forces and on the particular business, whether it is component/device making, equipment manufacturing, or service provision.
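
The trade-off described on this slide can be made concrete with the classical steady-state availability relation A = MTBF/(MTBF + MTTR). The slide does not give this formula explicitly, so the sketch below illustrates the standard result rather than the course material; the numbers are invented.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Classical steady-state availability A = MTBF / (MTBF + MTTR).
    Shows why better repairability (smaller MTTR) can substitute for
    higher component dependability (larger MTBF)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# 0.999 availability is reached either by a very dependable unit
# or by a much less dependable unit with a fast repair loop:
print(steady_state_availability(mtbf_hours=10_000, mttr_hours=10))  # ~0.999
print(steady_state_availability(mtbf_hours=1_000, mttr_hours=1))    # ~0.999
```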

    17. Accelerated tests

    18. Qualification tests-1 “By asking the impossible you obtain the best possible” Italian Saying Today’s qualification programs, standards and specifications (such as, say, the Telcordia requirements) enable one to “reduce to a common denominator” different products, as well as similar products produced by different manufacturers. These standards reflect, to a great extent, the state-of-the-art in a particular field of engineering, as well as more or less typical requirements for the performance of a product intended for a particular application. Industry cannot do without accelerated qualification tests and qualification standards. However, qualification standards and requirements are only good for what they are intended for: to confirm that the given product (provided that it passed the tests) is indeed “qualified” to serve in a particular capacity. In some cases, especially for new products and new technologies, for which no experience has yet been accumulated, the general qualification standards, based on the previous generations of the device or on other, “similar”, devices and components, might be too stringent.

    19. Qualification tests-2 “By asking the impossible you obtain the best possible” Italian Saying If a product passed the standardized qualification tests, it is not always clear why this product was good, and if the product failed, it is equally unclear what could be done to improve its reliability. Since qualification tests are not supposed to be destructive, they are unable to provide the most important, ultimate information about the reliability of the product: the information about the probability of its failure after the given time in service under the given conditions of operation. If a product passed the qualification tests, it does not mean that there will be no failures in the field, and it is unclear how likely or unlikely these failures might be.

    20. Accelerated life tests (ALTs) “Plus usus sine doctrina, quam citra usum doctrina valet” (“Practice without theory is more valuable than theory without practice”) Latin Proverb ALTs are aimed at revealing and understanding the physics of the expected or actually occurring failures. Unlike QTs, ALTs are able to detect the possible failure modes and mechanisms. Another objective of the ALTs is to accumulate sufficiently representative reliability/failure statistics. Thus, ALTs deal with the two major areas of Reliability Engineering: the physics and the statistics of failure. ALTs should be planned, designed and conducted depending on the projected lifetime of the product, the expected operational and non-operational loading conditions and environment, the frequency and duration of such loading and environmental conditions, the consequences of failure, etc. Adequately planned, carefully conducted, and properly interpreted ALTs provide a consistent basis for the prediction of the probability of failure after the given time in service.

    21. Accelerated life tests (ALTs) “Plus usus sine doctrina, quam citra usum doctrina valet” (“Practice without theory is more valuable than theory without practice”) Latin Proverb This information can be extremely helpful in understanding what should be changed, and how, in order to design a viable and reliable product. Indeed, any structural, materials and/or technological improvement can be “translated”, using the ALT data, into a low enough probability of failure for the given duration of operation under the given service (environmental) conditions. Well-designed and thoroughly implemented ALTs can dramatically facilitate the solution of many business-related problems associated with cost effectiveness and time-to-market. ALTs should be conducted in addition to, and, preferably, long before (or, at least, concurrently with), the qualification tests. There might also be situations when accelerated testing can be used as an effective substitute for the qualification tests and standards, especially for very new products, for which “reliable” (widely acceptable) qualification standards do not yet exist.

    22. Predictive modeling: ALT cannot do without it “The degree of understanding a phenomenon is inversely proportional to the number of variables used for its description” Unknown Physicist “Everything should be made as simple as possible, but not one bit simpler” Albert Einstein “Any equation longer than three inches is most likely wrong” Unknown Physicist ALTs cannot do without simple and meaningful predictive models. It is on the basis of such models that a reliability engineer decides which parameter should be accelerated, how to process the experimental data and, most importantly, how to bridge the gap between what one “sees” as a result of the accelerated testing and what he/she will possibly “get” in the actual operation conditions. For a manufacturer, the existing qualification standards for Class I and Class II products are “the bible”, and, when implementing these standards, he/she can make the product qualified without even knowing the actual modes and mechanisms of failure. However, for an engineer who is developing qualification standards, predictive modeling is as important as the actual experimental data. ALT models take inputs from various theoretical analyses, test data, field data, customer requirements, qualification spec requirements, the state-of-the-art in the given field, the consequences of failure for the given failure mode, etc.

    23. Predictive modeling: what one expects from it “A theory without an experiment is dead. An experiment without a theory is blind” Unknown Reliability Engineer These models are expected to yield brief and meaningful relationships that clearly indicate “what affects what and what is responsible for what” and that are able to describe these effects quantitatively. These relationships may or may not include time. Computation of the expected reliability at conditions other than the ALT ones can provide important information about the device performance after a certain time in service, or during accelerated testing at the given conditions. By considering the fundamental physics that might constrain the final design, predictive modeling can result in significant savings of time and expense. Modeling can be very helpful in optimizing the performance and lifetime of the device. Although in some situations a particular model might not be 100% adequate for the given application or a new situation, it is important that it be amenable to updates and revisions, if necessary, and that it “reduce to the common denominator” the accumulated knowledge, so as to provide continuity. A good predictive reliability model does not need to reflect all the possible situations; rather, it should be simple, should clearly indicate what affects what in the given phenomenon or structure, and should be suitable/flexible for new applications, with new environmental conditions and technology developments.

    24. Predictive modeling: requirements for a good model “It is always better to be approximately right than precisely wrong” Unknown Reliability Engineer A predictive model does not have to be comprehensive, but it has to be sufficiently generic and should include all the major variables affecting the phenomenon (failure mode) of interest. A good model should contain all the most important parameters that are needed to describe and characterize the phenomenon of interest, while parameters of the second order of importance should not be included in the model. A good life test model should be suitable for accumulating, on its basis, reliability statistics, and should be flexible enough to account for the role of materials, structures, loading (environmental) conditions, new designs, etc. The scope of the model depends on the type and the amount of information available.

    25. Power law For some failure mechanisms the analytical models that are used to predict reliability (as represented by the time-to-failure, or cycles-to-failure) have a power law structure: T = C s^n, where s is the applied stress, and C and n are material parameters. The power law is used, for instance, to describe degradation in lasers, when the injection current or the light output power is used as the acceleration parameter.
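
A minimal sketch of how the power law is used in practice: the acceleration factor between test and use stress levels follows directly from T = C s^n, with the constant C cancelling in the ratio. The numeric values below are illustrative assumptions, not values from the course.

```python
def power_law_af(s_use: float, s_test: float, n: float) -> float:
    """Ratio of use life to test life: T_use / T_test = (s_use / s_test)**n.
    The material constant C of T = C * s**n cancels in the ratio."""
    return (s_use / s_test) ** n

# Example (assumed exponent n = -2.0, e.g. laser aging driven by injection
# current): doubling the current shortens the life fourfold.
print(power_law_af(s_use=1.0, s_test=2.0, n=-2.0))  # -> 4.0
```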

    26. Boltzmann-Arrhenius equation-1 If the Boltzmann-Arrhenius equation is used, the mean time-to-failure, τ, is proportional to an exponential function whose argument is a fraction with the activation energy, Ua (in eV), in the numerator and the product of Boltzmann’s constant, k = 8.6174×10^-5 eV/K, and the absolute temperature, T, in the denominator: τ = τ0 exp[Ua/(kT)]. The equation was first obtained by the Austrian physicist L. Boltzmann in the statistical theory of gases, and then applied by the Swedish chemist S. Arrhenius to describe the inversion of sucrose. The Boltzmann-Arrhenius equation is applicable when the failure mechanisms are attributed to a combination of physical and chemical processes. Since the rates of many physical processes (such as, say, solid-state diffusion and many semiconductor degradation mechanisms) and chemical reactions (such as, say, those that determine battery life) are temperature dependent, it is the temperature that is used as the acceleration parameter.
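
Since the pre-exponential time constant τ0 cancels in the ratio of the use-condition life to the test-condition life, the Arrhenius acceleration factor depends only on the activation energy and the two absolute temperatures. A short sketch, with illustrative parameter values (the 0.7 eV and the temperatures are assumptions, not course data):

```python
import math

BOLTZMANN_EV = 8.6174e-5  # eV/K, as quoted on the slide

def arrhenius_af(ua_ev: float, t_use_k: float, t_test_k: float) -> float:
    """Acceleration factor tau_use / tau_test = exp[Ua/k * (1/T_use - 1/T_test)];
    the time constant tau0 cancels in the ratio."""
    return math.exp(ua_ev / BOLTZMANN_EV * (1.0 / t_use_k - 1.0 / t_test_k))

# Example: Ua = 0.7 eV (assumed), use at 55 C, test at 125 C.
print(arrhenius_af(0.7, 55 + 273.15, 125 + 273.15))  # roughly 78
```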

    27. Boltzmann-Arrhenius equation-2 The Boltzmann-Arrhenius equation can be used to model temperature-induced degradation in many electronic and photonic products, including lasers. It is presumed that the rate of degradation in lasers is due to diffusion, precipitation, oxidation and other temperature-dependent phenomena, so that the degradation rate can be described by an equation of the Arrhenius type, dD/dt ∝ exp[-Ua/(kT)]. Solid-state diffusion can form brittle intermetallic compounds, weaken local areas, and cause high electrical impedance. The effect of the relative humidity (RH) can be accounted for, if the relationship representing the Boltzmann-Arrhenius law is used, by multiplying its right-hand side by the factor 1/(RH)^n, where n is an empirical parameter. This relationship can be used, for instance, to describe the results of ALTs for planar lightwave circuit (PLC) devices. Typically, the activation energy and the temperature sensitivity threshold are not known in advance and should be established experimentally for the particular application. As to the time constant, it does not have to be determined if it is only the acceleration factor that is of interest: it cancels in the ratio of the use-condition life to the test-condition life.

    28. Coffin-Manson equation (inverse power law)-1 The Coffin-Manson equation (inverse power law) is applicable when the lifetime of the material or a structure is inversely proportional to a power of the applied stress. In accordance with this equation, the median number-of-cycles-to-failure under low-cycle fatigue conditions can be found as N = C (Δs)^(-m), where Δs = s_max - s_min is the cyclic mechanical stress range, and C and m are material constants. This formula was applied by many investigators to evaluate the lifetime of solder joints in micro- and opto-electronics. W. Engelmaier suggested the following formula to predict the lifetime of solder joint interconnections: N_f = (1/2) (Δε_r/(2 ε_f))^(1/c), where Δε_r is the plastic strain range, ε_f = 0.325 is the fatigue ductility coefficient, c = -0.442 - 6×10^-4 T_s + 1.74×10^-2 ln(1 + f) is the fatigue ductility exponent, f is the cyclic frequency (1 < f < 1000 cycles per day), and T_s is the mean cyclic temperature (in °C).
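
A sketch of the Engelmaier relation as reconstructed above; the parameter values in the example call (strain range, frequency, mean cyclic temperature) are assumptions chosen only to exercise the formula.

```python
import math

def engelmaier_cycles(strain_range: float, f_cycles_per_day: float,
                      ts_mean_c: float) -> float:
    """Median cycles to failure per the Engelmaier model as given above:
    Nf = 0.5 * (de / (2*ef))**(1/c), with ef = 0.325 and
    c = -0.442 - 6e-4*Ts + 1.74e-2*ln(1 + f)."""
    ef = 0.325  # fatigue ductility coefficient from the slide
    c = -0.442 - 6e-4 * ts_mean_c + 1.74e-2 * math.log(1.0 + f_cycles_per_day)
    return 0.5 * (strain_range / (2.0 * ef)) ** (1.0 / c)

# Illustrative call: 1% strain range, one cycle per day, 40 C mean temperature.
print(engelmaier_cycles(0.01, 1.0, 40.0))  # on the order of a few thousand cycles
```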

    29. Coffin-Manson equation (inverse power law)-2 In random vibration tests, the mean time-to-failure can be found in accordance with the Steinberg formula, where s is the stress at the resonant frequency, and C and m are material constants; this formula relates the mean time-to-failure to the stress induced at the resonant frequency. The inverse power law is also used to model aging in lasers in the cases of current or power acceleration, with the life assumed to be of the form T = C I^(-n) (or T = C P^(-n)), where I is the injection current and P is the output power. The inverse power law is also used to describe the “static fatigue” (delayed fracture) of the silica material used in optical light-guides. In this case, the time-to-failure T is a power function of the applied stress with a large negative exponent: T ∝ s^(-n), where n = m/2 = 18-20.

    30. Paris-Erdogan equation-1 This equation establishes a relationship between the fatigue crack growth rate and the range of the cyclic stress intensity factor: da/dN = A (ΔK)^m, where da/dN is the crack growth rate, A and m are material constants, K = G s √(πa) is the stress intensity factor, a is the crack length, s is the nominal stress, and the factor G is a function of the geometry. The stress intensity factor range, ΔK, in the above formula is ΔK = G s_r √(πa), where s_r is the nominal stress range.
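
The number of cycles needed to grow a crack from an initial size a0 to a critical size ac follows by integrating da/dN. A minimal numerical sketch (midpoint rule); all parameter values are assumed for illustration, and units must simply be kept consistent:

```python
import math

def paris_cycles(a0: float, ac: float, dsigma: float, A: float, m: float,
                 G: float = 1.0, steps: int = 10_000) -> float:
    """Cycles to grow a crack from a0 to ac by numerically integrating
    dN = da / (A * dK**m), with dK = G * dsigma * sqrt(pi * a)."""
    n_cycles = 0.0
    da = (ac - a0) / steps
    a = a0 + 0.5 * da  # midpoint of the first slice
    for _ in range(steps):
        dk = G * dsigma * math.sqrt(math.pi * a)
        n_cycles += da / (A * dk ** m)
        a += da
    return n_cycles

# Illustrative values only (e.g. stresses in MPa, crack lengths in metres):
print(paris_cycles(a0=1e-4, ac=1e-2, dsigma=100.0, A=1e-11, m=3.0))
```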

    31. Paris-Erdogan equation-2 The Paris-Erdogan formula is applicable when the stress intensity factor range is larger than a certain threshold for the given material, below which no crack growth can occur, or below which the crack growth rate is very low. Generally, in most electronic and optoelectronic devices under normal use conditions, the initial cracks are very small, and so is the nominal stress range. It could be expected that in normal operating conditions the stress intensity factor range is smaller than, but not far below, the threshold value. In such a case, the fatigue life is dominated by crack initiation only. However, if the stress range increases, as it does in an accelerated test, then the stress intensity factor range may increase beyond the threshold value, and the failure mechanism might shift from crack initiation to crack propagation.

    32. Bueche-Zhurkov equation The Bueche-Zhurkov equation contains not only the absolute temperature, but also the applied stress as an acceleration factor: τ = τ0 exp[(U0 - γσ)/(kT)], where γ is the stress sensitivity factor, which depends on the structure of the material and on the degree of the accumulated damage. The experimentally found stress sensitivity factor for a non-oriented condition of a polyamide is about γ = 1.3×10^-27 m³; for an oriented condition it can be significantly lower. The Bueche-Zhurkov formula underlies the kinetic approach to the evaluation of the strength of materials. In accordance with this approach, it is the random thermal fluctuations of particles (atoms) that are primarily responsible for the material’s strength (failure), while the role of the external stress is reduced simply to lowering the activation energy. In many practical applications, it is only the governing relationship of the Bueche-Zhurkov type that is considered, while the numerical values of the parameters are evaluated experimentally for the particular application.
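
A sketch of the acceleration factor implied by the Bueche-Zhurkov relation as written above; the time constant τ0 cancels in the ratio. The unit convention for γ (eV per MPa) and every numeric value are assumptions made purely for the example:

```python
import math

K_EV = 8.6174e-5  # Boltzmann constant, eV/K

def zhurkov_af(u0_ev: float, gamma_ev_per_mpa: float,
               s_use_mpa: float, t_use_k: float,
               s_test_mpa: float, t_test_k: float) -> float:
    """tau_use / tau_test for tau = tau0 * exp((U0 - gamma*s) / (k*T));
    tau0 cancels. gamma is taken here in eV/MPa (an assumed unit choice)."""
    def exponent(s, t):
        return (u0_ev - gamma_ev_per_mpa * s) / (K_EV * t)
    return math.exp(exponent(s_use_mpa, t_use_k) - exponent(s_test_mpa, t_test_k))

# Illustrative: same temperature, higher test stress lowers the energy barrier.
print(zhurkov_af(1.0, 2e-3, s_use_mpa=20.0, t_use_k=300.0,
                 s_test_mpa=100.0, t_test_k=300.0))  # a few hundred
```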

    33. Eyring equation In the Bueche-Zhurkov formula the effect of the external stress is considered indirectly, by reducing the level of the activation energy. This effect is considered directly in the Eyring equation, in which the stress enters through a separate factor (in a simplified single-stress form, τ = τ0 exp[Ua/(kT)] exp(-B s), where B is a stress sensitivity constant). Unlike in the Bueche-Zhurkov formula, the stress in the Eyring equation does not necessarily have to be a mechanical stress: it could be voltage, humidity, etc.

    34. Peck and Black equations Peck’s equation is, in effect, the Eyring equation expanded and modified for modeling the time-to-failure under temperature-humidity-bias conditions: TTF = A (RH)^(-n) exp[Ua/(kT)]. Here RH is the percent relative humidity. In Black’s equation, MTTF = A J^(-n) exp[Ua/(kT)], the RH is substituted with the current density, J; the pre-exponential constant A is related to the geometry of the conductor; and n is a parameter related to the current density, which accounts for the effects of current flow other than Joule heating of the conductor.
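
A sketch of both models in the well-known forms written above, used here to compute a combined humidity-plus-temperature acceleration factor; the values of n and Ua are assumptions, not course data:

```python
import math

K_EV = 8.6174e-5  # eV/K

def peck_ttf(a: float, rh_percent: float, n: float, ea_ev: float, t_k: float) -> float:
    """Peck's model: TTF = A * RH**(-n) * exp(Ea / (k*T))."""
    return a * rh_percent ** (-n) * math.exp(ea_ev / (K_EV * t_k))

def black_mttf(a: float, j_current_density: float, n: float,
               ea_ev: float, t_k: float) -> float:
    """Black's equation for electromigration: MTTF = A * J**(-n) * exp(Ea/(k*T))."""
    return a * j_current_density ** (-n) * math.exp(ea_ev / (K_EV * t_k))

# Acceleration factor of an 85C/85%RH test vs. 40C/60%RH use conditions,
# with assumed n = 3 and Ea = 0.8 eV (the pre-factor A cancels):
af = peck_ttf(1.0, 60.0, 3.0, 0.8, 313.15) / peck_ttf(1.0, 85.0, 3.0, 0.8, 358.15)
print(af)  # roughly 120
```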

    35. Linear fatigue damage model (Miner-Palmgren’s rule) The fatigue damage model (the law of linear accumulation of damage) can be formulated as the following Miner-Palmgren “linear damage” rule (LDR): D = Σ n_i/N_i, with the sum taken over all m stress levels. Here D is the cumulative damage (failure is expected when D reaches unity), n_i is the actual number of cycles applied at the i-th stress level, N_i is the number of cycles to failure at this stress level, and m is the total number of different stress levels. Miner’s rule is usually assumed to hold in accelerated fatigue tests. The rule implies that the order in which the loads are applied is not significant, whether the stress sequence is step-up, step-down, or random. The law of linear accumulation of damage is applicable for stresses not exceeding the yield stress. Although this linear law is, strictly speaking, not applicable for the assessment of the low-cycle-fatigue lifetime, it is nonetheless often used to estimate the number-of-cycles-to-failure when a wide range of applied stresses, both below and above the yield point, is likely, as well as in random fatigue assessments. Nonlinear models predict fatigue lives that are always shorter than predictions based on the linear damage rule.
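
A minimal sketch of the linear damage rule; the duty-cycle numbers are invented for illustration:

```python
def miner_damage(blocks):
    """Miner-Palmgren linear damage: D = sum(n_i / N_i) over the m stress levels.
    Each block is (applied cycles n_i, cycles-to-failure N_i at that level);
    failure is predicted when D reaches 1."""
    return sum(n_i / N_i for n_i, N_i in blocks)

# Illustrative duty cycle with three stress levels:
print(miner_damage([(1_000, 50_000), (200, 5_000), (10, 400)]))  # 0.085
```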

    36. Creep rate equations-1 Assuming that the (static) creep rate is constant throughout the test, and that the phenomenon is dominated by the secondary stage, one can evaluate the strain rate due to creep as dε/dt = A s^n (Norton creep law), where s is the applied stress, and A and n are material parameters. Norton’s formula is capable of representing (more or less) the entire creep curve. If the creep phenomenon is heavily dominated by the tertiary stage, Norton’s formula might not be adequate. Another widely used relationship for the creep rate is Prandtl’s law (also known as the Garofalo-Arrhenius law), in which the stress enters through a hyperbolic sine rather than through a power function.
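
A sketch contrasting the two creep-rate laws named above. The hyperbolic-sine (Garofalo-Arrhenius) form shown, dε/dt = C sinh(αs)^n exp(-Q/(kT)), is a commonly used version assumed here, and every parameter value is an illustrative placeholder:

```python
import math

K_EV = 8.6174e-5  # eV/K

def norton_rate(stress: float, a: float, n: float) -> float:
    """Secondary-stage creep strain rate per Norton's law: d(eps)/dt = A * s**n."""
    return a * stress ** n

def garofalo_rate(stress: float, c: float, alpha: float, n: float,
                  q_ev: float, t_k: float) -> float:
    """Assumed hyperbolic-sine (Garofalo-Arrhenius) form:
    d(eps)/dt = C * sinh(alpha * s)**n * exp(-Q / (k*T))."""
    return c * math.sinh(alpha * stress) ** n * math.exp(-q_ev / (K_EV * t_k))

# Illustrative comparison at 20 MPa (all parameter values are assumptions):
print(norton_rate(20.0, a=1e-10, n=4.0))
print(garofalo_rate(20.0, c=1e3, alpha=0.05, n=3.0, q_ev=0.6, t_k=350.0))
```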

    37. Creep rate equations-2 The Graham-Walles equation was suggested to represent all the creep stages; in it, A, n, a, b and c are experimentally determined constants. If the stress is due to the thermal expansion mismatch of the dissimilar materials in the structure, it is not an independent variable, but is a function of the temperature T. Creep tests are much easier to conduct than stress relaxation tests. On the other hand, phenomena associated with stress relaxation (time-dependent stress for the given deformation) can be predicted, with sufficient accuracy, theoretically, if creep (time-dependent deformation for the given stress) test data are available.

    38. Weakest link models The weakest link model assumes that the material (device) failure originates from the weakest point. This model is applicable, when the physics of the failure phenomenon confirms that this is indeed the case. Failures due to crack generation and propagation, and dielectric breakdown are examples of weakest link failures.

    39. Stress-strength models These models, at least in their steady-state approximations, are widely used in various problems of structural (physical) design. In these models the interaction of the probability density functions for the strength and stress distributions is considered. In aerospace, civil, ocean and other structures, the probability density functions are steady-state, i.e., they do not change with time. In lasers, however, one can assume that the stress distribution function is indeed time-independent, but the strength distribution function becomes broader and shifts towards the stress distribution function as time progresses. At the initial moment of time the two functions are well separated, and the distance between their end points provides an appreciable margin of safety. At a certain moment of the lifetime, the left end of the strength distribution “touches” the right end of the stress distribution (the marginal state). When the time of operation exceeds the moment of time that corresponds to the marginal state, the two curves start to overlap, and the probability of failure is no longer zero. It does not mean, however, that the device cannot be operated beyond the marginal point of time, provided that the probability of failure can be predicted with sufficient accuracy and can be made low enough for the required (specified) time of operation.
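
The interference picture described above can be sketched with a small Monte Carlo experiment in which the strength distribution drifts toward the stress distribution and broadens with time. The choice of normal distributions, the drift rates and all the numbers are assumptions made purely for illustration:

```python
import random

def interference_pof(mu_c: float, sd_c: float, mu_d: float, sd_d: float,
                     trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P{C <= D} for independent normal
    capacity C ("strength") and demand D ("stress")."""
    rng = random.Random(seed)
    fails = sum(rng.gauss(mu_c, sd_c) <= rng.gauss(mu_d, sd_d)
                for _ in range(trials))
    return fails / trials

# Strength degrades and spreads with time (illustrative laser-like scenario):
for years in (0, 5, 10, 15):
    mu_c = 100.0 - 3.0 * years  # mean strength drifts toward the stress
    sd_c = 5.0 + 0.5 * years    # and its distribution broadens
    print(years, interference_pof(mu_c, sd_c, mu_d=40.0, sd_d=5.0))
```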

    40. Probability of failure-1 “Probable is what usually happens” Aristotle, Greek Philosopher “Probability is the very guide of life” Cicero, Roman Orator “In every big cause one should always leave something to chance” Napoleon, French Emperor “To err is human, to forgive is divine, but to include errors in your design is statistical” Leslie Kisch “Probability is too important to be left to the mathematicians” Unknown Reliability Engineer Based on the accelerated test data, one can predict the probability of failure at the end of the given time of the device operation. Different approaches can be used to evaluate such a probability. The most typical (“parametric”) approach used in engineering practice is based on the assumption that not only do the relationships of the previous section hold for both the accelerated and the use conditions, but that the laws of the probability distributions for the parameter of interest do not change either.

    41. Probability of failure-2 Recently, several (“non-parametric”) approaches, based on the extreme value distributions, have been suggested that enable one to successfully process the ALT data even if the lifetime distribution is stress-level dependent. If, for instance, the Bueche-Zhurkov formula is used, the long-term probability of failure can be evaluated on its basis; if Engelmaier’s formula is applied, the corresponding expression should be used to evaluate the probability of failure.

    42. PROBABILISTIC DESIGN-for-RELIABILITY APPROACH

    44. The probability of nonfailure-1 The “reliability” (actually, the “dependability”) of a non-repairable item is defined as the probability of non-failure, P = P{C > D}, i.e., as the probability that the item’s bearing capacity (“strength”), C, during the time, t, of operation under the given loading/environmental conditions, will always be greater than the demand (“loading”), D. Although the probability of failure is never zero, it can be made, if a probabilistic approach is used, as low as necessary.

    45. The probability of nonfailure-2 If the probability density functions f(C) and g(D) for the random variables C and D are known, then the probability, P, of non-failure (reliability, dependability) can be evaluated as P = P{ψ > 0} = ∫₀^∞ f(ψ) dψ, where f(ψ) is the probability density function of the margin of safety ψ = C - D, which is also a random variable. The probability density function f(ψ) can be found as the convolution of the laws of the probability distributions of the random variables C (“capacity”) and D (“demand”).

    46. Safety factor (Safety index, reliability index) Direct use of the probability of non-failure is often inconvenient, since, for highly reliable items, this probability is expressed by a number which is very close to one, and, for this reason, even significant changes in the item’s (system’s) design, which have an appreciable impact on the item’s reliability, may have a minor effect on the probability of non-failure. In those cases when both the mean value, <ψ>, and the standard deviation, s, of the margin of safety (or of any other suitable characteristic of the item’s reliability, such as stress, temperature, displacement, affected area, etc.) are available, the safety factor (safety index, reliability index) SF = <ψ>/s can be used as a suitable reliability criterion.

    47. Time-to-failure Usually the capacity (strength), C, and/or the demand (loading), D, change in time. Failure occurs when the bearing capacity (strength), C, of the item becomes equal to or smaller than the demand (loading), D. The moment at which this random event takes place is the time-at-failure, and the duration of operation until this moment is the random variable known as the time-to-failure. The corresponding safety factor, SF, can be determined as the ratio of the mean time-to-failure (MTTF) to the standard deviation, STD, of the time-to-failure: SF = MTTF/STD.

    48. Physical meaning and significance of the safety factor The safety factor, SF, reflects both the mean value, <ψ>, of the reliability characteristic of interest (the stress-at-failure, the highest possible temperature, the ultimate displacement, the electric current, the voltage, the light output, etc.), i.e., the numerator of the fraction SF = <ψ>/s that defines the safety factor, and the accuracy, s, with which this mean value is established, i.e., the denominator of this fraction. Both the numerator (a significant difference between the capacity and the demand) and the denominator (the accuracy/reliability/confidence with which this difference is established) are important from the standpoint of the reliability (non-failure) of the item.

    49. Safety factor (SF) and coefficient of variability (COV) The safety factor (SF) is the reciprocal of the coefficient of variability (COV), which is defined as the ratio of the standard deviation to the mean value of the random variable of interest. While the COV is a characteristic of the uncertainty of the random variable of interest, the SF is a characteristic of the certainty of the random parameter (stress-at-failure, the highest possible temperature, the ultimate displacement, the affected area, etc.) that is responsible for the non-failure of the item. If the reliability characteristic of interest (for a non-repairable item) is a random variable that is determined by just two independent non-random quantities (say, the mean value and the standard deviation), then the safety factor, SF, completely determines the probability of non-failure (reliability). The larger the safety factor, the higher the probability of non-failure. The normal law is a “typical representative” of such probability distributions.

    50. Normal law If the reliability characteristic of interest (e.g., the safety margin, ψ) is distributed in accordance with the normal law, f(ψ) = [1/(s √(2π))] exp[-(ψ - <ψ>)^2/(2s^2)], then the probability of non-failure is related to the safety factor SF as P = ½[1 + Φ(SF)], where Φ(x) = √(2/π) ∫₀^x exp(-t^2/2) dt is the probability integral (Laplace function).
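
Since the Laplace function satisfies Φ(x) = erf(x/√2), the probability of non-failure can be computed directly from the safety factor. A minimal sketch:

```python
import math

def p_nonfailure(sf: float) -> float:
    """P = 0.5 * (1 + Phi(SF)) with the Laplace function Phi(x) = erf(x/sqrt(2)),
    i.e. the probability that a normally distributed safety margin stays positive."""
    return 0.5 * (1.0 + math.erf(sf / math.sqrt(2.0)))

for sf in (1.0, 2.0, 3.0, 4.0):
    print(sf, p_nonfailure(sf))
# SF = 3 gives P ~ 0.99865; SF = 4 gives P ~ 0.99997
```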

    51. Physical design for reliability: Example (space shuttle tile structure) Example: Mechanical and thermal characteristics (in the take-off and landing conditions) that determine the strength (structural integrity) of a space shuttle tile structure. 1) Mechanical stresses in the tile, in the tile adhesive (both shearing and peeling) and in the underlying Orbiter structure due to the external aerodynamic loading caused by the boundary layer transitional behavior. 2) Thermal stresses in the tile, in the tile adhesive (both shearing and peeling) and in the underlying Orbiter structure due to the thermal loading caused by the thermal expansion mismatch of the tile material with the material of the underlying structure. 3) Out-of-plane displacements in the tile with respect to the underlying structure due to the external aerodynamic loading caused by the boundary layer transitional behavior. 4) Out-of-plane displacements in the tile with respect to the underlying structure due to the thermal loading caused by the thermal expansion mismatch of the tile material with the material of the underlying structure. 5) The maximum temperature in the adhesive layer and in the underlying structure. 6) The maximum possible difference between the above temperatures and the corresponding maximum thermal mismatch strain. 7) Areas affected by high temperatures.

    52. Factors (conditions) that affect the reliability characteristics-1 Environmental and other factors (conditions) that affect the characteristics which determine the strength (integrity) of the tile structure: For each of the mechanical and thermal characteristics that determine the short- and long-term strength (integrity, reliability) of the tile structure (mechanical stresses, thermal stresses, out-of-plane displacements, maximum temperatures, size of the affected areas, etc.), develop a spreadsheet that establishes (separately for the take-off and landing processes, and for each moment of time during these processes) the environmental factors (conditions) that affect these characteristics. Note: The environmental factors include, but might not be limited to, the initial conditions, the entry trajectory, the boundary layer transition conditions, the vehicle behavior during entry, and the induced accelerations (in the Orbiter treated as a solid body, and/or as a result of its aero-elastic vibrations at high Mach numbers, etc.)

    53. Factors (conditions) that affect the reliability characteristics-2 Based on the physical nature of the given environmental factor and on the available information about it, decide whether this factor should be treated as a non-random (deterministic) value or as a random variable. Notes: At this stage of the reliability assessments we do not intend to consider random (stochastic) processes (such as, for instance, atmospheric turbulence), in which the characteristic of interest is a random function of time. In our evaluations, we treat time as an independent parameter. We consider random variables that may or may not be time-dependent. We treat the random characteristics of interest as non-random functions of random arguments, and establish these functions (relationships) using experimental data, finite-element analyses (FEA), or analytical evaluations.

    54. Probability distributions-1 Probability distributions of the environmental factors (conditions) that affect the characteristics which determine the reliability of the tile structure: For those environmental factors (conditions) that should be treated as random variables, establish (accept) the probability distribution laws for these factors. When actual experimental information is not available (such a situation seems to be typical), assume, based on general physical considerations, the most suitable (or the most conservative) laws of probability distribution (e.g., uniform, exponential, normal, Weibull, Rayleigh, etc.)

    55. Probability distributions-2: notes 1) Since the exponential distribution has the largest entropy (the largest uncertainty) of all the distributions with the same mean, this distribution should be accepted if no information except the expected (mean) value is available. An exponentially distributed random variable is always positive. If the random process of failures can be treated as a simple Poisson flow with a constant intensity, then the time interval between two adjacent consecutive failures has an exponential distribution. The SF for an exponentially distributed random variable is always “one”. The most likely value of an exponentially distributed random variable, t, is at t = 0. 2) If the physical nature of a random environmental factor is such that it can be only positive (e.g., acceleration during take-off) or only negative (e.g., deceleration during landing), its most likely value is certainly non-zero. If only this value (or the mean) is available, then the Rayleigh law should be employed. If a normally distributed random variable has a finite variance and zero mean, and changes periodically with a constant or next-to-constant frequency, but with a random amplitude and a random phase angle, then these amplitudes and the corresponding energies obey the Rayleigh law of distribution.

    56. Probability distributions-2: notes (continued) 3) If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is symmetric with respect to the mean value (which coincides with the median and the most likely value), then the normal distribution should be accepted, especially (but not necessarily) if the random variable can be either positive or negative. 4) If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is highly asymmetric, then the Weibull distribution or distributions associated with the normal law can be used. Examples are: the distribution of the absolute value of a normal random variable, the truncated normal distribution, and the log-normal distribution.

    57. Probability distributions-3 For each mechanical or thermal characteristic of interest (mechanical or thermal stress, out-of-plane displacement, etc.), perform (deterministic) transient mechanical (structural) and thermal analyses of the response of this characteristic to the particular environmental parameter (consider experimental, FEA and analytical evaluations). This should be done separately for the take-off and landing processes and for each sequential moment of time in these processes. Treating each reliability-related characteristic of interest (stress, temperature, etc.) as a non-random function (output) of a random argument (input) due to a particular environmental factor, evaluate the probability density function of this characteristic for the assumed (accepted, determined) law of the probability distribution of the environmental factor. Time could enter the computed response as an independent parameter. For some environmental factors, the input could be considered a non-random (deterministic) value. The relevant procedures, with examples, can be found, e.g., in E. Suhir, “Applied Probability for Engineers and Scientists”, McGraw-Hill, New York, 1997. Determine the cumulative probability distribution functions for all the probability density functions that affect the given mechanical or thermal characteristic of interest. Such a convolution of the constituent laws of distribution considers, in the most accurate and non-conservative way, the probabilistic input of each of the environmental parameters (environmental temperature, Orbiter acceleration, etc.) that affect the particular mechanical or thermal characteristic. In other words, it considers the likelihood that the maxima of different environmental factors might not occur simultaneously. The relevant procedure can be found in the same reference.

    58. Probability distributions-4 Notes: 1) If the number of random variables does not exceed two, the convolution can be carried out analytically (see, e.g., E. Suhir, “Applied Probability for Engineers and Scientists”, McGraw-Hill, New York, 1997). If the number of random variables is three or more, the process of obtaining the cumulative law of distribution should be computerized, and the result will be obtained numerically rather than analytically. 2) Since the above distributions are based on the transient responses of the mechanical (thermal) characteristics of interest to the time-dependent environmental excitations (parameters), these distributions determine the probability that, at the given moment of time, the given characteristic is below/above the given value of this characteristic.

    59. Reliability criteria Determine the safety factors and other reliability criteria for the characteristics that determine the mechanical and thermal performance of the tile structure: the safety factor (SF) for each reliability characteristic of interest at each point of time, after the given duration of flight during take-off and landing; the probability of non-failure, P(t), for the established (accepted) safety factor, at each point of time, after the given duration of flight during take-off and landing; and the mean time-to-failure, MTTF, for the established (accepted) safety factor, the standard deviation, STD, of the time-to-failure, and the safety factor SF = MTTF/STD for the time-to-failure, at each point of time, after the given duration of flight during take-off and landing.
