Reliability

RELIABILITY MANAGEMENT AN INTEGRATED APPROACH Wg Cdr Jayesh C S Pai CSDO, AF

The Arrow [figure: Government, Regulators, Company, Management, Operational Staff, Work Actions, Accident]

Memory and learning
3/10/2010 CSDO 4
- 10% of what we read
- 20% of what we hear
“Conventional Learning”
- 30% of what we see
- 50% of what we see & hear
- 70% of what we discuss
“Active Learning”
- 80% of what we experience
- 95% of what we teach
[...] than “conventional ones”

DEMING Quotes
"There is no substitute for knowledge."
"The most important things cannot be measured."
"The most important things are unknown or unknowable."
"Experience by itself teaches nothing."
"I think that people here expect miracles. American management thinks that they can just copy from Japan - but they don't know what to copy!"

Defect Rate Analysis GOAL: GET AN OVERVIEW OF THE NUMBER AND STATE OF THE REPORTED DEFECTS, AND OF THE PATTERN OF DEFECTS ARISING, WITH A VIEW TO HIGHLIGHTING POSSIBLE TROUBLE-PRONE AGGREGATES FOR THE FUTURE AND SUGGESTING REMEDIES FOR IMPROVEMENT.

DATA COLLECTION
1. DR/PWR DATA (RECEIVED FROM UNITS)
- WING
- AIRCRAFT SQUADRON
- AIRCRAFT No.
- DR/PWR No. & DATE
- DIR/PWIR No. & DATE
- NOMENCLATURE OF COMPONENT
- PART NUMBER
- SYSTEM
- TRADE
- TBO
- LIFE COMPLETED SINCE LAST OVERHAUL
- LIFE COMPLETED SINCE NEW
- REPORTED DEFECT
- DI AGENCY
2. DIR/PWIR DATA (RECEIVED FROM DI AGENCY)
- CONFIRMATION AND FINDINGS
- RECOMMENDATIONS

PIE CHART: DEFECT DATA, AIRFRAME SYSTEMS (TRADE: AIRFRAME)
SYSTEM                      DEFECTS   SHARE
OTHERS                           18     10%
AIR CONDITIONING SYSTEM          28     15%
TAKE OFF/LANDING SYSTEM          13      7%
FUEL SYSTEM                      31     17%
PNEUMATIC SYSTEM                 35     19%
HYDRAULIC SYSTEM                 58     32%
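The same counts can be ranked Pareto-style to show which systems account for most of the reported defects. The sketch below uses the figures from the chart; the function name `pareto_table` and the output layout are illustrative assumptions, not part of the original analysis.

```python
# Defect counts per airframe system, copied from the pie-chart slide.
defects = {
    "Others": 18,
    "Air conditioning system": 28,
    "Take off/landing system": 13,
    "Fuel system": 31,
    "Pneumatic system": 35,
    "Hydraulic system": 58,
}


def pareto_table(counts):
    """Return (name, count, share_pct, cumulative_pct) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((name, n, 100 * n / total, 100 * running / total))
    return rows


table = pareto_table(defects)
for name, n, share, cumulative in table:
    print(f"{name:<26}{n:>4}{share:>8.1f}%{cumulative:>8.1f}%")
```

Read from the top, the cumulative column shows how few systems cover the bulk of the defects, which is the "trouble-prone aggregates" view the analysis is after.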

Recent NASA Accidents
- Genesis vehicle slammed into the Utah desert, probably because Lockheed engineers installed four small switches backward
- Climate orbiter crashed into Mars because Lockheed used English measurement while NASA used metric
- Mars polar vehicle crashed when descent rockets shut off prematurely
- Television infrared observation satellite (TIROS) fell off its transport stand because an adapter plate was not properly secured

The Elephant is like …

Industrial Paradigm [figure: production paradigms (“Interchangeable Parts”, “Mass”, “Lean”, “Flexible”, “Reconfigurable”) along a timeline marked 1913, 1960, 1980, 2000; other labels: Responsiveness, Objective, Variety, “Knowledge Science”, Quality, Cost, Computerization, Production Management, Approach]

EIGHT TQM TOOLS
- CHECK SHEET
- HISTOGRAM
- PARETO DIAGRAM
- CAUSE AND EFFECT DIAGRAM
- SCATTER DIAGRAM
- CLUSTERING
- CONTROL CHART
- QUALITY FUNCTION DEPLOYMENT (QFD)

Histogram [figure: frequency (f) histogram with bar frequencies 17, 11, 10, 5, 2, 2, 2, 1; horizontal axis from 8.25 to 20.25; mean = 12.78, SD = 2.31]

Pareto Diagram [figure: defective items by type (Hd, Bd, Ld, Md, Cd); number of inspections N = 2160; count axis from 0 to 200 and percentage axis from 0 to 100%]

Scatter Diagram [figure: Yield plotted against Reaction Temperature]

Process Control Chart [figure: samples plotted against sample number (1 to 10), with upper control limit, process average, and lower control limit]

Cause-and-Effect Diagram (effect: Quality Problem)
- Measurement: faulty testing equipment; incorrect specifications; improper methods
- Human: poor supervision; lack of concentration; inadequate training
- Machines: out of adjustment; tooling problems; old / worn
- Environment: inaccurate temperature control; dust and dirt
- Materials: defective from vendor; not to specifications; material-handling problems
- Process: poor process design; ineffective quality management; deficiencies in product design

Engineers, wake up! Most engineers need to understand that they are a dying breed if they plan to maintain equipment by the seat of their pants, using only qualitative data to make decisions. Engineers must use reliability data fluently so they can reduce costs and avoid failures by making wise, data-based decisions. Without the numbers, we engineers will soon be viewed as technical amateurs, with declining pay scales and unemployment as the byproduct. Use the data in your maintenance systems to solve technical problems and make improvements!

RELIABILITY vs MAINTENANCE ENGINEER The task of reliability engineers is to avoid failures, which requires solving problems with data. The task of maintenance engineers is to quickly restore equipment to operating condition, which requires understanding failure modes and the failure data. Both reliability engineers and maintenance engineers can use a common set of data as an excellent communication tool to solve the vital few problems in the shortest time, using facts from the data to reduce costs.

Reliability Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time [IEEE 90]. In other words, it is the likelihood that the system or component will succeed within its identified mission time, with no failures. An aircraft mission is the perfect example to illustrate this concept. When an aircraft takes off for its mission, there is one goal in mind: complete the flight, as intended, safely (with no catastrophic failures).

Reliability Reliability data requires a definition of failure. Failures can be catastrophic failures or slow degradation; you decide which by defining the failures. The data must be measured in the units of the degradation: hours, miles, cycles, and so forth; in short, whatever drives the failure.

What is a Failure? Reliability is quantified as MTBF (Mean Time Between Failure) for repairable products and MTTF (Mean Time To Failure) for non-repairable products. MTBF is often quoted without providing a definition of failure. This practice is not only misleading but makes the figure completely useless. MTBF impacts both reliability and availability.

What is a Failure? One could argue there are two basic definitions of a failure: 1) The termination of the ability of the product as a whole to perform its required function. 2) The termination of the ability of any individual component to perform its required function but not the termination of the ability of the product as a whole to perform.

Example If the inverter of a UPS fails and the UPS switches to static bypass, the failure does not prevent the UPS from performing its required function which is supplying power to the critical load. However, the inverter failure does prevent a component of the UPS from performing its required function of supplying conditioned power. According to definition 1, this is not a failure, but according to definition 2, it is a failure.

What is a Failure? In reality there are more than two definitions of failure; in fact, the possibilities are endless. Is customer misapplication considered a failure? Are shipping damages considered failures? Is the expected wear-out of a consumable item such as a battery considered a failure if it failed prematurely? If an LED (Light Emitting Diode) on a computer fails, is it considered a failure even though it hasn’t impacted the operation of the computer?

Definition of Failure Failure itself must also be thoroughly defined at system and module levels. It may be necessary to define more than one type of failure (for example, total system failure or degradation failure) or failures for different operating modes (for example, in flight or on ground) in order to describe all the requirements.

Definition of Failure MTBFs might then be ascribed to the different failure types. MTBFs and failure rates often require clarification as to the meaning of ‘failure’ and ‘time’. The latter may refer to operating time, revenue time, clock time, etc. Types of failure which do not count for the purpose of proving the reliability (for example, maintenance-induced failures or operation in an environment outside limits) must also be defined.

MAINTAINABILITY ESSENTIALLY, MAINTAINABILITY IS THE EASE AND SPEED WITH WHICH FAILED EQUIPMENT CAN BE BROUGHT BACK INTO OPERATING CONDITION. IT IS ALSO EXPRESSED AS THE MEAN TIME TO REPAIR (MTTR) OF AN ITEM.

MTTR Mean Time to Repair (or Recover) is the expected time to recover a system from a failure. MTTR impacts availability, not reliability. The longer the MTTR, the worse off a system is. As MTBF goes up, availability goes up. As MTTR goes up, availability goes down.
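The slide gives only the direction of the dependence. A commonly used steady-state form, assumed here rather than taken from the deck, is Availability = MTBF / (MTBF + MTTR). The sketch below applies it; the function name and the sample figures are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability, assuming A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only.
baseline = availability(mtbf_hours=990, mttr_hours=10)
slower_repair = availability(mtbf_hours=990, mttr_hours=20)
fewer_failures = availability(mtbf_hours=1980, mttr_hours=10)

print(f"baseline:       {baseline:.4f}")
print(f"slower repair:  {slower_repair:.4f}")
print(f"fewer failures: {fewer_failures:.4f}")
```

Doubling MTTR lowers availability and doubling MTBF raises it, matching the two statements on the slide.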

MEAN TIME TO REPAIR
- DETECTION OF FAULT
- ALLOCATION OF MAINTENANCE TEAM
- DIAGNOSE FAULT
- OBTAIN SPARE PARTS (LOGISTIC DELAY)
- REPAIR TIME (MTTR; THIS IS THE MANUFACTURER'S INFORMATION)
- TEST AND ACCEPT REPAIR
- CLOSING UP THE SYSTEM AND RETURNING TO NORMAL OPERATION
- SKILL OF THE MAINTENANCE ENGINEERS AND THE MAINTENANCE STAFF AVAILABLE AT THE BASES

MEAN TIME TO REPAIR If a Mean Time To Repair (MTTR) or Mean Down Time (MDT) is specified, then the meaning of repair time must be defined in detail. "Mean time to repair" is often used when mean down time is what is actually intended.
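To make the difference between repair time and down time concrete, the sketch below adds up the phases of a single outage. The phase names follow the list on the previous slide; every duration is a hypothetical value chosen for illustration.

```python
# Hypothetical durations (hours) for one outage; phase names follow the slide.
outage_phases_hours = {
    "detect fault": 0.5,
    "allocate maintenance team": 1.0,
    "diagnose fault": 2.0,
    "obtain spare parts (logistic delay)": 24.0,
    "repair": 3.0,
    "test and accept repair": 1.0,
    "close up and return to normal operation": 0.5,
}

# Down time is the whole outage; repair time is only one phase of it.
down_time_hours = sum(outage_phases_hours.values())
repair_time_hours = outage_phases_hours["repair"]

print(f"repair time: {repair_time_hours:.1f} h")
print(f"down time:   {down_time_hours:.1f} h")
```

With these assumed figures the hands-on repair is a small fraction of the outage, which is why a quoted MTTR needs a stated scope.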

MTBF Mean Time Between Failure is a basic measure of a system’s reliability. It is typically expressed in hours. The higher the MTBF number, the higher the reliability of the product. MTBF: Misquoted and Misunderstood

MTBF A common misconception about MTBF is that it is equivalent to the expected number of operating hours before a system fails, or the “service life”. It is not uncommon, however, to see an MTBF number on the order of 1 million hours, and it would be unrealistic to think the system could actually operate continuously for over 100 years without a failure.

MTBF Consider a sample population of 500,000 25-year-old humans. Over the course of a year, data is collected on failures (deaths) for this population. The operational life of the population is 500,000 x 1 year = 500,000 people years. Throughout the year, 625 people failed (died). The failure rate is 625 failures / 500,000 people years = 0.125% / year. The MTBF is the inverse of the failure rate, or 1 / 0.00125 = 800 years. So, even though 25-year-old humans have high MTBF values, their life expectancy (service life) is much shorter, and the two do not correlate.
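The arithmetic of this example can be checked in a few lines. The numbers are the ones on the slide; the function name `field_mtbf` is an illustrative assumption.

```python
def field_mtbf(failures, unit_time):
    """Return (failure_rate, mtbf) from a failure count and accumulated unit-time."""
    if failures <= 0 or unit_time <= 0:
        raise ValueError("failures and unit_time must be positive")
    failure_rate = failures / unit_time  # failures per unit of time
    mtbf = unit_time / failures          # inverse of the failure rate
    return failure_rate, mtbf


population = 500_000        # 25-year-old humans in the sample
observation_years = 1       # data collected over one year
deaths = 625                # failures observed in that year

rate, mtbf_years = field_mtbf(deaths, population * observation_years)
print(f"failure rate: {rate:.3%} per year")
print(f"MTBF:         {mtbf_years:.0f} years")
```

The result depends only on the failures seen during the observation window, so it describes the failure rate at age 25, not how long a person lives.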

It’s all about Assumptions! So, what is the MTBF of 25-year-old humans, 80 or 800? It’s both! But, how can the same population end up with two such drastically different MTBF values? If the MTBF of 80 years more accurately reflects the life of the product (humans in this case), is this the better method?

MTBF The biggest limitation is time. To calculate MTBF this way, the entire sample population would have to fail, and for many products this takes on the order of 10-15 years. In addition, even if it were sensible to wait this long before calculating the MTBF, problems would be encountered in tracking products. For example, how would a manufacturer know whether products were still in service if they were taken out of service and never reported? And who would want the MTBF value of a product that has been superseded by several generations of technology updates?

Availability Availability, on the other hand, is the degree to which a system or component is operational and accessible when required for use [IEEE 90]. Availability is often looked at because, when a failure does occur, the critical variable now becomes how quickly the system can be recovered.

For Equation 1 and Equation 2 above to be valid, a basic assumption must be made when analyzing the MTBF of a system.

Unlike mechanical systems, most electronic systems don’t have moving parts.

As a result, it is generally accepted that electronic systems or components exhibit constant failure rates during the useful operating life.
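Under a constant failure rate, reliability over a mission time t is commonly modelled as R(t) = exp(-t / MTBF). That exponential form is the standard textbook model and is an assumption here, not a formula recovered from the deck. The sketch below applies it to the 1 million hour MTBF mentioned earlier; the function name is hypothetical.

```python
import math


def reliability(mission_hours, mtbf_hours):
    """Probability of no failure over mission_hours, assuming a constant failure rate."""
    if mission_hours < 0 or mtbf_hours <= 0:
        raise ValueError("mission_hours must be >= 0 and mtbf_hours > 0")
    return math.exp(-mission_hours / mtbf_hours)


MTBF_HOURS = 1_000_000   # figure quoted on the MTBF slide
HOURS_PER_YEAR = 8760

one_year = reliability(HOURS_PER_YEAR, MTBF_HOURS)
at_mtbf = reliability(MTBF_HOURS, MTBF_HOURS)

print(f"R(1 year) = {one_year:.4f}")
print(f"R(MTBF)   = {at_mtbf:.4f}")
```

Under this model only about 37% of units would survive a mission as long as the MTBF itself, which is one way to see that MTBF is not a service life.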

METHODS OF PREDICTING AND ESTIMATING MTBF
Reliability Prediction Methods:
- MIL-HDBK 217, Telcordia, HRD5
- RBD (Reliability Block Diagram)
- Markov Model
- FMEA / FMECA
- Fault Tree
- Highly Accelerated Life Testing (HALT)

METHODS OF PREDICTING AND ESTIMATING MTBF
Reliability Estimation Methods:
- Similar Item Prediction Method
- Field Data Measurement Method
The FIDES Prediction Model (FIDES Guide 2004 Issue, A Reliability Methodology for Electronic Systems) was designed by the FIDES Group, a consortium of European companies from the aeronautics and defense fields, to apply to all domains using electronics, including the military, commercial industry, and telecommunications.

Failure Reporting Analysis and Corrective Action System (FRACAS) An effective FRACAS process provides for gathering and tracking failure data in a central database so that this data can be analyzed to determine underlying causes. Yet, in many organisations, the individuals participating in the process are distributed across multiple groups or locations, and are recording information in different sources or databases. As a result, the ability to quickly isolate failure trends is compromised. FRACAS Standard helps organisations to overcome this fragmented approach to reliability.

ASL (Average Service Life)

$$\text{ASL} = \frac{\text{Sum of the hours logged after last overhaul by the components received as Cat D / DI during the period}}{\text{Total number of components received as Cat D / DI during the period}}$$

(Engines / rotables withdrawn for FOD / bird hit / defect not confirmed are excluded.)
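A minimal sketch of the ASL calculation follows. It assumes each record is a component already received as Cat D / DI during the period; the record layout, the reason labels, and the sample hours are hypothetical and only illustrate the exclusion rule.

```python
from dataclasses import dataclass

# Withdrawal reasons excluded from ASL, as stated on the slide.
EXCLUDED_REASONS = {"FOD", "BIRD HIT", "DEFECT NOT CONFIRMED"}


@dataclass
class Withdrawal:
    hours_since_overhaul: float  # hours logged after the last overhaul
    reason: str                  # hypothetical label for why the item was withdrawn


def average_service_life(records):
    """Mean hours since last overhaul over the records that count toward ASL."""
    counted = [r for r in records if r.reason.upper() not in EXCLUDED_REASONS]
    if not counted:
        raise ValueError("no records left after exclusions")
    return sum(r.hours_since_overhaul for r in counted) / len(counted)


# Hypothetical sample data.
records = [
    Withdrawal(400.0, "defect"),
    Withdrawal(600.0, "defect"),
    Withdrawal(50.0, "FOD"),        # excluded from the average
    Withdrawal(500.0, "defect"),
]

asl_hours = average_service_life(records)
print(f"ASL = {asl_hours:.1f} hours")
```

Excluding the FOD case keeps an externally caused early withdrawal from pulling the average down.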
