IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 2, MAY 2004

Failure Rate Modeling Using Equipment Inspection Data

Richard E. Brown, Senior Member, IEEE, George Frimpong, Senior Member, IEEE, and H. Lee Willis, Fellow, IEEE

Abstract—System reliability models typically use average equipment failure rates. Even if these models are calibrated based on historical reliability indices, all like components within a calibrated region remain homogeneous. This paper presents a new method of customizing failure rates using equipment inspection data. This allows available inspection information to be reflected in system models, and allows for calibration based on interruption distributions rather than mean values. The paper begins by presenting a method to map equipment inspection data to a normalized condition score, and suggests a formula to convert this score into failure probability. The paper concludes by applying this methodology to a test system based on an actual distribution system, and shows that the incorporation of condition data leads to richer reliability models.

Index Terms—Equipment failure rate modeling, inspection-based ranking, predictive reliability assessment.

I. INTRODUCTION

POWER delivery companies are under increasing pressure to provide higher levels of reliability for lower cost. The best way to pursue these goals is to plan, engineer, and operate power delivery systems based on quantitative models that are able to predict expected levels of reliability for potential capital and operational strategies. Doing so requires both system reliability models and component reliability models.

Predictive reliability models are able to compute system reliability based on system topology, operational strategy, and component reliability data. The first distribution reliability model, developed by EPRI in 1978 [1], was not widely used due to conservative design and maintenance standards and, to a lesser extent, a lack of component reliability data. Eventually, certain utilities became interested in predictive reliability modeling and started developing in-house tools [2]–[5]. Presently, most major commercial circuit analysis packages offer an integrated reliability module capable of predicting the interruption frequency and duration characteristics of equipment and customers. Advanced tools have extended this basic functionality to include momentary interruptions [6], [7] and risk assessment [8], [9].

The application of predictive reliability models has traditionally assigned average failure rate values to all components. Although simplistic, this approach produces useful results and can substantially reduce capital requirements while providing the same levels of predicted reliability [10]. Advanced tools have

Manuscript received August 12, 2003. R. Brown is with KEMA, Raleigh, NC 27607 USA (e-mail: rebrown@kema.com). G. Frimpong and H. L. Willis are with ABB, Raleigh, NC 27606 USA. Digital Object Identifier 10.1109/TPWRS.2004.825824

attempted to move beyond average failure rates by either calibrating failure rates based on historical system performance [11], or by using multistate weather models [12], [13]. A few attempts have been made to compute failure rates as a function of parameters such as age [14], maintenance [15], or combinations of features [16], but these models tend to be system specific and are not practical for a majority of utilities at this time. The use of average component failure rates in system reliability models is always limiting and is potentially misleading [17]. Although generally acceptable for capital planning, the use of average values has two major drawbacks. First, average values cannot reflect the impact of relatively unreliable equipment and may overestimate the reliability of customers experiencing the worst levels of service. Second, average values cannot reflect the impact of maintenance activities and, therefore, preclude the use of predictive models for maintenance planning and overall cost optimization. Most utilities perform regular equipment inspections and have tacit knowledge that relates inspection data to the risk of equipment failure. Integration of this information into component reliability models can improve the accuracy of system reliability models and extend their ability to reflect equipment maintenance in results. Ideally, each class of equipment could be characterized by an equation that computes failure rate as a function of critical parameters. For example, power transformers might be characterized as a function of age, manufacturer, voltage, size, through-fault history, maintenance history, and inspection results. Unfortunately, in most cases, the sample size of failed units is far too small to generate an accurate model, and other approaches must be pursued. This paper presents a practical method that uses equipment inspection data to assign relative condition rankings. 
These rankings are then mapped to a failure rate function based on worst-case units, average units, and best-case units. The paper then presents recommended failure rate models for a broad range of equipment, presents a method of calibration based on historical customer interruptions, and concludes by examining the impact of these techniques on a test system based on an actual distribution system.

II. INSPECTION-BASED CONDITION RANKING

Typical power delivery companies perform periodic inspection on a majority of their electricity infrastructure. Utilities have various processes for collecting and recording inspection results. Paper forms stored in a multitude of departments make obtaining comprehensive system inspection results problematic. Many utilities, however, have migrated their inspection

0885-8950/04$20.00 © 2004 IEEE

BROWN et al.: FAILURE RATE MODELING USING EQUIPMENT INSPECTION DATA

and maintenance programs to computerized maintenance management systems (CMMS) and data management systems that can be used as central warehouses for equipment inspection results.

After a population of similar equipment has been inspected, it is desirable to rank their relative condition. Consider a piece of equipment with $n$ inspection item results, $r_i$. Further, suppose that each inspection item result is normalized so that values correspond to the following: 0 = best inspection outcome; 1/2 = average inspection outcome; 1 = worst inspection outcome.

Each inspection item result is assigned a weight, $w_i$, based on its relative importance to overall equipment condition. These weights are typically determined by the combined opinion of equipment designers and field service personnel, and are sometimes modified based on the particular experience of each utility. The final condition of a component is then calculated by taking the weighted average of inspection item results. By definition, a weighted average of 0 corresponds to the best possible condition, a weighted average of 1/2 corresponds to average condition, and a weighted average of 1 corresponds to the worst possible condition:

$$\text{Condition Score} = \frac{\sum_{i=1}^{n} w_i r_i}{\sum_{i=1}^{n} w_i} \qquad (1)$$

TABLE I INSPECTION FORM FOR POWER TRANSFORMERS
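The weighted average behind the condition score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `condition_score`, the input checks, and the example inspection results and weights are hypothetical and are not taken from the inspection forms discussed in the paper.

```python
from typing import Sequence


def condition_score(results: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of normalized inspection item results.

    Each result is assumed to be normalized to [0, 1]
    (0 = best, 0.5 = average, 1 = worst inspection outcome).
    """
    if len(results) != len(weights) or len(results) == 0:
        raise ValueError("results and weights must be non-empty and equal in length")
    if any(not 0.0 <= r <= 1.0 for r in results):
        raise ValueError("inspection results must be normalized to [0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")
    weighted_sum = sum(w * r for w, r in zip(weights, results))
    return weighted_sum / sum(weights)


# Hypothetical unit with three inspection items (values made up for illustration).
example_results = [0.0, 0.5, 1.0]   # best, average, and worst outcomes
example_weights = [1.0, 2.0, 1.0]   # relative importance of each item
example_score = condition_score(example_results, example_weights)
```

Because the score is a weighted average of values in [0, 1], it also lies in [0, 1], so units scored with the same weights can be sorted directly by this number.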

After each piece of equipment is assigned a condition score between 0 and 1, equipment using the same inspection item weights can be ranked and prioritized for maintenance (typically considering cost and criticality as well as condition). This approach has been successfully applied to several utilities by the authors, and inspection forms and weights for most major pieces of power delivery equipment have been developed. In addition, inspection items have guidelines that suggest scores for various inspection outcomes. To illustrate, an inspection form for power transformers is shown in Table I and the scoring guideline for “Age” is shown in Table II.

Inspection items can also be related to external factors. For example, overhead lines can include inspection items related to vegetation, animals, and lightning. Scores for these items will reflect both the external condition (e.g., lightning flash density) and system mitigation efforts (e.g., arrestors, shield wire, and grounding).

Although useful for prioritizing maintenance activities, relative equipment condition ranking is less useful for rigorous reliability analysis. Since reliability assessment models require equipment failure rates, inspection results would ideally be mapped into a failure rate through a closed-form equation derived from regression models. As mentioned earlier, this is not presently feasible for most classes of equipment due to limited historical data.

III. FAILURE RATE MODEL

Although there is not enough historical data to map inspection results to failure rates through regression-based equations, interpolation is capable of providing approximate results. At a minimum, interpolation requires failure rates corresponding to

TABLE II GUIDELINE FOR POWER TRANSFORMER “AGE”

the worst and best condition scores. Practically, it requires one or more interior points so that nonlinear relationships can be determined. After exploring a variety of mapping functions, the authors have empirically found that an exponential model best describes the relationship between the normalized equipment condition of (1) and equipment failure rates. The specific formula chosen is

$$\lambda(x) = A e^{Bx} + C \qquad (2)$$

where $\lambda$ is the failure rate and $x$ is the condition score.

Three data pairs are required to solve for the parameters A, B, and C. The previous section has developed a condition ranking methodology that, by definition, results in best, average, and worst condition scores of 0, 1/2, and 1, respectively. Therefore, three natural data pairs correspond to $\lambda(0)$, $\lambda(1/2)$, and $\lambda(1)$.
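As a rough illustration of how three data pairs pin down the parameters, the sketch below assumes the exponential model has the three-parameter form λ(x) = A·exp(B·x) + C and solves it in closed form through the points x = 0, 1/2, and 1. The function names, the restriction to the convex increasing case, and the numerical failure rates are hypothetical choices made for this example; they are not values from Table III.

```python
import math
from typing import Tuple


def fit_failure_rate_model(lam_best: float, lam_avg: float, lam_worst: float) -> Tuple[float, float, float]:
    """Return (A, B, C) so that A*exp(B*x) + C passes through
    (0, lam_best), (0.5, lam_avg), and (1, lam_worst).

    Assumption for this sketch: rates increase with condition score and the
    curve is convex, i.e. lam_worst - lam_avg > lam_avg - lam_best > 0.
    """
    lower_gap = lam_avg - lam_best
    upper_gap = lam_worst - lam_avg
    if lower_gap <= 0.0 or upper_gap <= lower_gap:
        raise ValueError("need lam_worst - lam_avg > lam_avg - lam_best > 0")
    # Substituting x = 0, 1/2, 1 and eliminating C gives A directly.
    a = lower_gap ** 2 / (lam_worst - 2.0 * lam_avg + lam_best)
    # exp(B/2) = (lam_avg + A - lam_best) / A
    b = 2.0 * math.log((lam_avg + a - lam_best) / a)
    # lambda(0) = A + C
    c = lam_best - a
    return a, b, c


def failure_rate(x: float, a: float, b: float, c: float) -> float:
    """Evaluate the assumed exponential model at condition score x."""
    return a * math.exp(b * x) + c


# Hypothetical failure rates in failures per year (made up for illustration).
A, B, C = fit_failure_rate_model(lam_best=0.01, lam_avg=0.02, lam_worst=0.04)
mid_condition_rate = failure_rate(0.5, A, B, C)
```

Evaluating `failure_rate` at 0, 0.5, and 1 should reproduce the three inputs, which is a quick check that a fitted parameter set is consistent with its data pairs.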


TABLE III REPRESENTATIVE FAILURE RATE MODEL PARAMETERS ($\lambda$ VALUES IN FAILURES PER YEAR)

$\lambda(1/2)$, the failure rate at average condition, can be approximated by taking the average failure rate across many components or by using average failure rates documented in relevant literature. $\lambda(0)$ and $\lambda(1)$, the failure rates at best and worst condition, are more difficult to determine, but can be derived through benchmarking, statistical analysis, or heuristics. Given these three values, the function parameters of (2) are determined as follows:

$$A = \frac{[\lambda(1/2) - \lambda(0)]^2}{\lambda(1) - 2\lambda(1/2) + \lambda(0)}, \qquad B = 2\ln\!\left(\frac{\lambda(1/2) + A - \lambda(0)}{A}\right), \qquad C = \lambda(0) - A \qquad (3)$$

A detailed benchmarking of equipment failure rates is found in [18]. These results document low, typical, and high failure rates corresponding to system averages across a variety of systems. Assuming that 1) best-condition equipment have failure rates that are half that of best system averages, 2) average-condition equipment have failure rates of typical system averages, and 3) worst-condition equipment have failure rates twice that of worst system averages, parameters for a variety of equipment are shown in Table III. These parameters, based on historical failure studies such as [14], are useful in the absence of system-specific data, but should be viewed as initial conditions for calibration, which is discussed in the next section.

Failure rate graphs for some of the equipment in Table III are shown in Fig. 1. These are simply plots of (2) using the stated A, B, and C parameters of the displayed equipment. It is interesting to see that the range of failure rates of certain types of equipment is large, while other types have a more moderate range. This reflects the ranges found in a broad literature search which forms the basis of Table III.

IV. MODEL CALIBRATION

After creating a system reliability model, it is desirable to adjust component reliability data so that predicted system reliability is equal to historical system reliability [11]. This process is called model calibration, and can be generalized as the identification of a set of parameters that minimize an error function. Traditionally, reliability parameters (such as equipment failure rates) either remained uncalibrated or were adjusted based on average system reliability. For example, it may be known that an analysis area has an average of 1.2 interruptions per customer per year. Based on this number, failure rates can be adjusted until the predicted average number of customer interruptions is equal to this historical value. After failure rates are calibrated, switching and repair times can be adjusted until predicted average interruption duration is also equal to historical values.


Fig. 1. Selected equipment failure rate functions.

Fig. 2. Historical versus predicted customer interruptions.

TABLE IV CALIBRATION RESULTS FOR OVERHEAD LINE PARAMETERS

Calibrating based on system averages is useful, but does not ensure that the predicted distribution of customer interruptions is equal to the historical distribution. That is, it does not ensure that either the most or least reliable customers are accurately represented—only that the average across all customers reflects history. This is a subtle but important point; since customer satisfaction is largely determined by customers receiving below-average reliability, calibration of reliability distribution is arguably more important than calibration of average reliability. A system model with homogeneous failure rates will produce a distribution of expected customer reliability levels (e.g., customers close to the substation will generally have better reliability than those at the end of the feeder). If components on this same system are assigned random failure rates such that average system reliability remains the same, the variance of expected customer reliability will tend to increase. That is, the best customers will tend to get better, the worst customers will tend to get worse, and fewer customers can expect average reliability. The distribution of expected customer reliability is critical to customer satisfaction and should, if possible, be calibrated to historical data. A practical way to accomplish this objective is to calibrate condition-mapping parameters so that a distribution-based error function is minimized. Such an error function can be based on one of three levels of granularity: 1) individual customer reliability, 2) histograms of customer reliability, or 3) statistical measures of customer reliability. An error function can be defined based on the difference between each customer’s historical versus predicted reliability. This approach calibrates reliability to the customer level and utilizes historical data at the finest possible granularity. 
However, historical customer reliability is stochastic in nature and will vary naturally from year to year. This is especially problematic with frequency measures. Although customers on average may experience one interruption

per year, a large number of customers will not experience any interruptions in a given year. Calibrating these customers to historical data is misleading, making about ten years of historical data for each customer desirable. Unfortunately, most feeders change enough over ten years to make this method impractical.

An error function can also use a histogram of customer interruptions as its basis. The historical histogram could be compared to the predicted histogram and parameters adjusted to minimize the chi-squared error

$$\chi^2 = \sum_{i=1}^{n} \frac{(h_i - p_i)^2}{p_i} \qquad (4)$$

where $n$ is the number of bins, $h_i$ is the historical bin value, and $p_i$ is the predicted bin value. Using the chi-squared error is attractive since it emphasizes the distribution of expected customer reliability, which is strongly correlated to customer satisfaction. Histograms will vary stochastically from year to year, but the large number of customers in typical calibration areas prevents this from becoming a major concern.

Last, an error function can be based on statistical measures such as mean value ($\mu$) and standard deviation ($\sigma$). The error function will typically consist of a weighted sum similar to the following:

$$\text{Error} = \alpha(\mu_h - \mu_p)^2 + \beta(\sigma_h - \sigma_p)^2 \qquad (5)$$

where the subscripts $h$ and $p$ denote historical and predicted values. Unlike the $\chi^2$ error, this function allows relative weights to be assigned to mean and variance discrepancies ($\alpha$ and $\beta$). For example, a relatively large $\alpha$ value will ensure that predicted average customer reliability reflects historical average customer reliability while allowing relatively large mismatches in standard deviation.
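The two distribution-based error measures can be sketched as follows. The sketch makes several assumptions that are not confirmed by the text: the chi-squared error is taken in the usual Pearson form with the predicted bin value in the denominator, bins with a zero predicted value are skipped, and the weighted error uses squared differences of the mean and the population standard deviation. Function names and all numbers are hypothetical.

```python
import statistics
from typing import Sequence


def chi_squared_error(historical_bins: Sequence[float], predicted_bins: Sequence[float]) -> float:
    """Pearson-style chi-squared error between two histograms with equal bins.

    Assumption: bins with a zero predicted value are skipped to avoid
    division by zero.
    """
    if len(historical_bins) != len(predicted_bins):
        raise ValueError("histograms must have the same number of bins")
    return sum((h - p) ** 2 / p for h, p in zip(historical_bins, predicted_bins) if p > 0)


def weighted_moment_error(historical: Sequence[float], predicted: Sequence[float],
                          alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of squared mean and standard deviation mismatches.

    Inputs are per-customer reliability values (e.g., interruptions per year).
    """
    mean_gap = statistics.fmean(historical) - statistics.fmean(predicted)
    std_gap = statistics.pstdev(historical) - statistics.pstdev(predicted)
    return alpha * mean_gap ** 2 + beta * std_gap ** 2


# Hypothetical customer counts per interruption bin (0, 1, 2, 3+ per year).
hist_bins = [40, 30, 20, 10]
pred_bins = [50, 25, 15, 10]
chi2 = chi_squared_error(hist_bins, pred_bins)

# Hypothetical per-customer interruption frequencies with equal means.
moment_err = weighted_moment_error([0, 1, 2], [1, 1, 1], alpha=10.0, beta=1.0)
```

In the second example the means agree, so the large `alpha` contributes nothing and the whole error comes from the spread mismatch, which mirrors the trade-off between the two weights described above.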


Fig. 3. Visualization of calibrated results.

Once an error measure is defined, failure rate model parameters can be adjusted so that error is minimized. Since this process is overdetermined, the authors suggest using Table III for initial parameter values and using gradient descent or hill-climbing techniques for parameter adjustment. Calibration is computationally intensive since error sensitivity to parameters must be computed by actual parameter perturbation, but calibration need only be performed once.

V. APPLICATION TO TEST SYSTEM

The methodologies described in the previous two sections have been applied to a test system derived from an actual overhead distribution system in the Southern U.S. This model consists of three substations, 13 feeders, 130 mi of exposure, and a peak load of 100 MVA serving 13 000 customers. The analytical model consists of 4100 components. Customer historical failures are computed from four-year historical averages. Equipment condition for this system was not available, and was therefore assigned randomly to individual components based on a normal distribution with a 0.5 mean and a 0.2 standard deviation.

Calibration for this test system is performed based on the chi-squared error of customer interruptions. Initial failure rates for all components are assigned based on $\lambda(1/2)$ values in Table III, and initial failure rates are computed based on condition and the parameters in Table III. Calibration is performed by a variable-step local search that guarantees local optimality.

A summary of calibration results for overhead lines is shown in Fig. 2, and a visualization of calibrated results is shown in Fig. 3. The shape of the uncalibrated histogram is similar to the historical histogram, but with a mean and mode worse than historical values. After calibration, the modes align, but the predicted histogram retains a slightly smaller variance. In fact, the historical histogram is subject to stochastic variance, and the inability of the expected value calibration to match this variance is immaterial and perhaps beneficial.

Fig. 4. Overhead line failure rate as a function of condition.

Uncalibrated and calibrated failure rate parameters are shown in Table IV, and corresponding failure rate functions are shown in Fig. 4. In effect, the calibration for this system did not change the failure rates for lines with good condition (less than 0.2), but drastically reduced the failure rates for lines with worse-than-average condition (greater than 0.5). These results are not unexpected, since this particular service territory is relatively homogeneous in both terrain and maintenance practice, and extremely wide variations in overhead line failure rates have not been historically observed. It is important to note that, in this case, equipment conditions were assigned randomly, and some were very high. Even though actual equipment for this system may never reach this poor condition state, the calibration process compensated by ratcheting down the failure rates assigned to equipment with the highest condition scores.
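A variable-step local search of the kind mentioned for calibration can be sketched as a coordinate hill-climb that perturbs one parameter at a time and halves the step when no perturbation reduces the error. This is not the authors' implementation: the function `variable_step_search`, its default step sizes, and the `toy_error` function (a stand-in for running a full system reliability model and comparing it with history) are all hypothetical, and the result is locally optimal only to within the final step size.

```python
from typing import Callable, List, Sequence, Tuple


def variable_step_search(
    error: Callable[[Sequence[float]], float],
    start: Sequence[float],
    step: float = 0.25,
    min_step: float = 1e-6,
    max_evals: int = 100_000,
) -> Tuple[List[float], float]:
    """Minimize error(params) by perturbing one parameter at a time.

    Each parameter is scaled up or down by the current relative step; a move
    is kept only if it lowers the error. When no move helps, the step is
    halved, and the search stops once the step falls below min_step.
    """
    params = list(start)
    best = error(params)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(params)):
            for direction in (1.0, -1.0):
                trial = list(params)
                trial[i] = params[i] * (1.0 + direction * step)
                trial_error = error(trial)
                evals += 1
                if trial_error < best:
                    params, best, improved = trial, trial_error, True
        if not improved:
            step *= 0.5  # refine the search around the current point
    return params, best


def toy_error(p: Sequence[float]) -> float:
    """Hypothetical error surface with its minimum at (0.02, 0.10)."""
    return (p[0] - 0.02) ** 2 + (p[1] - 0.10) ** 2


# Start from made-up initial parameter values and calibrate.
calibrated, residual = variable_step_search(toy_error, [0.05, 0.30])
```

In an actual calibration, `error` would wrap a reliability simulation, so each evaluation is expensive; that is why the number of perturbations per pass matters more than the bookkeeping shown here.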


Once a system has been modeled and calibrated, it can be used as a base case to explore the effect of factors that change equipment condition, such as equipment maintenance. Once the expected impact that a maintenance action will have on inspection items is determined, the system impact of maintenance can be quantified based on the new failure rate. This allows the cost-effectiveness of maintenance to be determined and directly compared to the cost-effectiveness of system approaches such as new construction, added switching and protection, and system reconfiguration.

VI. CONCLUSION

Equipment failure rate models are required for electric utilities to plan, engineer, and operate their system at the highest levels of reliability for the lowest possible cost. Detailed models based on historical data and statistical regression are not feasible at the present time, but this paper presents an interpolation method based on normalized condition scores and best/average/worst condition assumptions.

The equipment failure rate model developed in this paper allows condition heterogeneity to be reflected in equipment failure rates. Doing so more accurately reflects component criticality in system models, and allows the distribution of customer reliability to be more accurately reflected. Further, a calibration method has been presented that allows condition-mapping parameters to be tuned so that the predicted distribution of reliability matches the historical distribution of reliability. Finally, the use of this condition-based approach allows the impact of maintenance activities on condition to be anticipated and reflected in system models, enabling the efficacy of maintenance budgets to be compared with capital and operational budgets.

This model is heuristic by nature, but adds a fundamental level of richness and usefulness to reliability modeling, especially when parameters are calibrated to historical data.
In the short run, gathering more detailed information on equipment failure rates and condition will strengthen this approach. In the long run, this same information can ultimately be used to develop explicit failure rate models that eliminate the normalized condition assessment requirement.

REFERENCES

[1] Development of Distribution Reliability and Risk Analysis Models, Aug. 1981.
[2] S. R. Gilligan, “A method for estimating the reliability of distribution circuits,” IEEE Trans. Power Delivery, vol. 7, pp. 694–698, Apr. 1992.
[3] G. Kjolle and K. Sand, “RELRAD—An analytical approach for distribution system reliability assessment,” IEEE Trans. Power Delivery, vol. 7, pp. 809–814, Apr. 1992.
[4] R. E. Brown, S. Gupta, S. S. Venkata, R. D. Christie, and R. Fletcher, “Distribution system reliability assessment using hierarchical Markov modeling,” IEEE Trans. Power Delivery, vol. 11, pp. 1929–1934, Oct. 1996.
[5] Y.-Y. Hsu et al., “Application of a microcomputer-based database management system to distribution system reliability evaluation,” IEEE Trans. Power Delivery, vol. 5, pp. 343–350, Jan. 1990.
[6] C. M. Warren, “The effect of reducing momentary outages on distribution reliability indices,” IEEE Trans. Power Delivery, vol. 7, pp. 1610–1615, July 1992.
[7] R. Brown, S. Gupta, S. S. Venkata, R. D. Christie, and R. Fletcher, “Distribution system reliability assessment: Momentary interruptions and storms,” in Proc. IEEE Power Eng. Soc. Summer Meeting, Denver, CO, June 1996.
[8] R. E. Brown and J. J. Burke, “Managing the risk of performance based rates,” IEEE Trans. Power Syst., vol. 15, pp. 893–898, May 2000.
[9] L. V. Trussell, “Engineering analysis in GIS,” in Proc. DistribuTECH Conf., Miami, FL, Feb. 2002.
[10] R. E. Brown and M. M. Marshall, “Budget-constrained planning to optimize power system reliability,” IEEE Trans. Power Syst., vol. 15, pp. 887–892, May 2000.
[11] R. E. Brown and J. R. Ochoa, “Distribution system reliability: Default data and model validation,” IEEE Trans. Power Syst., vol. 13, pp. 704–709, May 1998.
[12] M. A. Rios, D. S. Kirschen, D. Jayaweera, D. P. Nedic, and R. N. Allan, “Value of security: Modeling time-dependent phenomena and weather conditions,” IEEE Trans. Power Syst., vol. 17, pp. 543–548, Aug. 2002.
[13] R. N. Allan, R. Billinton, I. Sjarief, L. Goel, and K. S. So, “A reliability test system for educational purposes—Basic distribution system data and results,” IEEE Trans. Power Syst., vol. 6, pp. 813–820, May 1991.
[14] R. M. Bucci, R. V. Rebbapragada, A. J. McElroy, E. A. Chebli, and S. Driller, “Failure prediction of underground distribution feeder cables,” IEEE Trans. Power Delivery, vol. 9, pp. 1943–1955, Oct. 1994.
[15] D. T. Radmer, P. A. Kuntz, R. D. Christie, S. S. Venkata, and R. H. Fletcher, “Predicting vegetation-related failure rates for overhead distribution feeders,” IEEE Trans. Power Delivery, vol. 17, pp. 1170–1175, Oct. 2002.
[16] S. Gupta, A. Pahwa, R. E. Brown, and S. Das, “A fuzzy model for overhead distribution feeders failure rates,” in Proc. 34th Annu. North Amer. Power Symp., Tempe, AZ, Oct. 2002.
[17] J. B. Bowles, “Commentary-caution: Constant failure-rate models may be hazardous to your design,” IEEE Trans. Rel., vol. 51, pp. 375–377, Sept. 2002.
[18] R. E. Brown, Electric Power Distribution Reliability. New York: Marcel Dekker, 2002.

Richard E. Brown (SM’00) received the B.S.E.E., M.S.E.E., and Ph.D. degrees from the University of Washington, Seattle, and the M.B.A. degree from the University of North Carolina at Chapel Hill. Currently, he is a Principal Consultant with KEMA, Raleigh, NC, and specializes in distribution reliability and asset management. He is the author or co-author of many technical papers and the book Electric Power Distribution Reliability.

George K. Frimpong (SM’02) received the B.S.E.E. degree from the Massachusetts Institute of Technology, Cambridge, and the M.S.E.E. and Ph.D. degrees from the Georgia Institute of Technology, Atlanta. Currently, he is a Principal Consultant with ABB, Raleigh, NC, and specializes in diagnostic services for power systems and the implementation of reliability-centered maintenance programs for the utility industry.

H. Lee Willis (F’92) received the M.S.E.E. degree from Rice University, Houston, TX, in 1971. Currently, he is the Vice President of Technology and Strategy for ABB Consulting, Raleigh, NC. He is the author of many papers and several books including Power Distribution Planning Reference Book.