
Proceedings of the 2012 9th International Pipeline Conference IPC2012 September 24-28, 2012, Calgary, Alberta, Canada

IPC2012-90440

IMPROVED COMPARISON OF ILI DATA AND FIELD EXCAVATIONS


William V. Harper, PhD, PE, Otterbein University, Mathematical Sciences, Westerville, Ohio, USA
Thomas A. Bubenik, PhD, PE, Det Norske Veritas (USA), Inc., Dublin, Ohio, USA
David J. Stucki, Otterbein University, Mathematical Sciences, Westerville, Ohio, USA
David A. R. Shanks, P. Eng., Det Norske Veritas (Canada) Ltd., Calgary, Alberta, Canada
Clifford J. Maier, Det Norske Veritas (USA), Inc., Dublin, Ohio, USA
Neil A. Bates, P. Eng., Det Norske Veritas (Canada) Ltd., Calgary, Alberta, Canada

ABSTRACT

The importance of comparing in-line inspection (ILI) calls to excavation data should not be underestimated. Neither should the comparison be undertaken without a solid understanding of the methodologies being employed. Such a comparison is a key part not only of assessing how well the tool performed, but also of an API 1163 evaluation and any subsequent use of the ILI data. Unity (1-1) plots and the associated regression analysis are commonly used to provide the basis for predicting the likelihood of leaks or failures from unexcavated ILI calls. Combining such analysis with statistically active corrosion methods, perhaps in a probability of exceedance (POE) study, helps develop an integrity maintenance plan for the years ahead.

The theoretical underpinnings of standard regression analysis are based on the assumption that the independent variable (often thought of as x) is measured without error as a design variable. The dependent variable (often labeled y) is modeled as having uncertainty or error. Pipeline companies may run their regressions differently, but ILI to field excavation regressions often use the ILI depth as the x variable and the field depth as the y variable. This is especially the case when a probability of exceedance analysis is desired, which involves transforming ILI calls to predicted depths for comparison to a threshold of interest such as 80% wall thickness. However, in ILI to field depth regressions, both measured depths can have error. Thus, the underlying least squares regression assumptions are violated. One common result is a regression line that has a slope much less than the ideal 1-1 relationship.

Reduced Major Axis (RMA) regression is specifically formulated to handle errors in both the x and y variables. It is not commonly found in the standard literature but has a long pedigree, including the 1995 textbook Biometry by Sokal and Rohlf, in which it appears under the title of Model II regression. In this paper we demonstrate the potential improvements brought about by RMA regression. Building on a solid comparison between ILI data and excavations provides the foundation for more accurate predictions and for management plans that reliably support longer range planning. This may also result in cost savings, as the time between ILI runs might be lengthened due to a better analysis of such important data.

INTRODUCTION

Linear regression is an important tool in many disciplines. In pipeline integrity it is used in various ways, one of the most common being an assessment of how well excavation dig measurements compare to ILI values. Excavations are expensive, and such data should be given proper treatment. This paper presents a methodology that can improve regression modeling, both in terms of the underlying theoretical statistical assumptions and in terms of results that are more useful and better match reality.

Almost all linear regression analysis performed is based on a least squares methodology that has much to offer. However, the assumptions underlying such applications are commonly ignored. As a consequence, the resulting regression fits are typically disappointing and may not be appropriate for the intended modeling. This paper considers one particular data set to demonstrate the potential advantage of reduced major axis (RMA) regression over least squares regression.

Copyright 2012 by ASME

REGRESSION APPROACHES

Least Squares Regression

In many commercial spreadsheet programs and major statistical packages, least squares is the default method for performing a linear regression. The term linear is not limited to straight-line fits between a single independent variable x and a single dependent variable y, even though the example presented here is limited to such an application. Linear refers to the model being linear in the parameters to be estimated, not to a straight-line relationship between the x and y variables. This is a minor issue as far as this paper is concerned.

Least squares regression minimizes the sum of squared deviations (errors), measured as the vertical distance between the actual y values and their corresponding predictions, typically termed yhat, where the hat implies an estimate. A key assumption in such a design is that the independent variable x is measured without error. Often in pipeline integrity the horizontal or x predictor variable is the ILI call and the vertical or y variable is the matched field measurement. Both the ILI call (x) and the field measurement (y) in this case are subject to error. Thus, from a theoretical statistics perspective, there are problems using least squares regression for such modeling efforts.

Reduced Major Axis Regression

Reduced Major Axis (RMA) regression has its roots in various fields, including biological applications. For example, biologists may want to develop a model that predicts fish weight for a given breed based on its length, or vice versa. However, in this case both weight and length are subject to error when collecting data from live fish.
Similarly, metal loss correlation data will have error in both the tool-reported depth and the field-measured depth [1]. The RMA approach can be found in the textbook by Sokal & Rohlf (1995) [2]. Other names that may appear in the literature for RMA are geometric mean regression, least products regression, diagonal regression, line of organic correlation, and the least areas line (Wikipedia, January 3, 2012) [3].

A common experience when fitting a least squares regression that predicts field measurements as a function of ILI calls is that the resulting model under-predicts deeper calls and over-predicts shallow calls. This is reminiscent of regression to the mean (Galton, F., 1886) [4], although this paper will not pursue the impact of this interesting but different issue with respect to pipeline integrity. It is an open question whether this over- and under-prediction is a reasonable match for reality or an artifact of the methodology employed. While one desires an accurate predictive model so that ILI calls for corrosion pits that have not been excavated can be reasonably quantified, a common desire among many integrity engineers is to be conservative. For predicting pit depths, a model that generally under-predicts deeper pits is nonconservative.

RMA software automatically minimizes the sum of the areas (thus using both the vertical and horizontal distances of the data points from the resulting line) rather than the least squares sum of squared vertical distances. One of the issues with standard least squares regression is the inability to treat the least squares regression equation y = a + bx as an ordinary equation and back-solve to obtain an equation that predicts x from y. With least squares, when one interchanges the x and y variables, the resulting regression equation is not the equivalent of x = (y - a)/b. Additionally, doing so with least squares results in the paradox of similar over- and under-prediction when the variables are interchanged. With an RMA equation, one can perform this simple algebraic feat, as it will match the RMA equation one would obtain with the variables interchanged; that is, the resulting RMA regression is the equivalent of x = (y - a)/b.

EXAMPLE - LEAST SQUARES VERSUS RMA

As mentioned earlier, this paper presents one example to illustrate the differences in various aspects of least squares versus RMA. Some analysts have been using RMA for many years and have concluded that it is an important tool with the potential to improve modeling results, though not a panacea.

The example consists of a relatively large data set of matched excavation pit depths and ILI calls. The data set has 1,812 ILI external pit depths from a single ILI run that have been matched with excavation field data. The authors' routine or other software [5] can be used to compute both least squares and RMA estimates. Figure 1 shows the y-intercept and the slope for both approaches.

Coefficient    Traditional Least Squares    Reduced Major Axis
Intercept      0.096220                     -0.00225
Slope          0.501700                     1.070149

Figure 1. Least Squares, RMA Regression Coefficients.

This example, which is fairly typical of least squares, results in the regression equation y = 0.09622 + 0.5017 * x for the expected field-measured depth y (called Pit Depth (%) for this application), where x is the ILI % Depth (% of wall thickness). At this point, it is worthwhile to examine the issues associated with the regression equation. In many applications, pig calls are not reported (or are filtered out) if they are less than some threshold such as 10%. Assuming this is a reasonable lower bound, the least squares regression equation y = 0.09622 + 0.5017x will predict no values less than approximately 14.6% wall thickness. If there were a pig call of 100%, the least squares equation would predict only 59.8%. This is a concern that is too often overlooked. The following figures will better illustrate some aspects of this issue.
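The endpoint arithmetic can be checked directly from the Figure 1 coefficients. The following is a minimal sketch; the names `FIGURE_1_FITS` and `predict_depth` are illustrative and are not part of the authors' routine. Depths are expressed as fractions of wall thickness.

```python
# Coefficients as reported in Figure 1, stored as (intercept, slope).
FIGURE_1_FITS = {
    "least_squares": (0.096220, 0.501700),
    "rma": (-0.00225, 1.070149),
}


def predict_depth(method, ili_depth):
    """Predicted field depth (fraction of wall thickness) for one ILI call."""
    intercept, slope = FIGURE_1_FITS[method]
    return intercept + slope * ili_depth


if __name__ == "__main__":
    # A 10% call (a typical reporting threshold) and a 100% call.
    for call in (0.10, 1.00):
        for method in FIGURE_1_FITS:
            depth = predict_depth(method, call)
            print(f"ILI call {call:.0%}, {method}: predicted depth {depth:.1%}")
```

For the least squares line this reproduces the 14.6% and 59.8% bounds, which is the whole interval the fitted equation can reach for reportable calls.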



One may question the possible range of values for the RMA equation y = -0.00225 + 1.070149 * x. For a 10% ILI call, the RMA predicted value is 10.5%, and for a 100% ILI call the predicted value is 106.8%. While a depth greater than 100% of wall thickness is not possible, one starts to see that, at least in this example, the RMA covers a predictive range of importance and is not limited to such a tight interval, as will be shown more explicitly in the plots that follow. Instead of showing both axes over the full possible range from 0% to 100%, Figure 2 focuses on the actual range of the values in the data to provide a more detailed view of the region in which the 1,812 pairings reside. YHat_RMA is the predicted RMA field depth, while YHat_Trad is the traditional least-squares prediction for the field depth.
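The difference in predictive range follows from how the two slopes are related. The relation below assumes the standard textbook definitions of the two estimators (the paper does not list the formulas used by its software), with r the sample correlation between ILI depth and field depth and s_x, s_y their sample standard deviations:

```latex
\begin{align*}
b_{\mathrm{LS}}  &= r \, \frac{s_y}{s_x}, &
b_{\mathrm{RMA}} &= \operatorname{sign}(r) \, \frac{s_y}{s_x}, \\[4pt]
\frac{b_{\mathrm{LS}}}{b_{\mathrm{RMA}}} &= |r|, &
\frac{0.501700}{1.070149} &\approx 0.47 .
\end{align*}
```

If the Figure 1 estimates follow these definitions, they imply a correlation of roughly 0.47 between the ILI calls and the field depths. Because both lines are evaluated on the same ILI calls, the spread of the least squares predictions would then be only about 47% as wide as the spread of the RMA predictions.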
[Figure 2: plot of Field Depth (%), YHat_Trad, and YHat_RMA against ILI % Depth; horizontal axis: ILI % Depth; vertical axis: predicted field or actual field depth; both axes 10% to 45%.]

Figure 2. Predicted Least-Squares and RMA Regression Over the Range of the ILI Calls. YHat_RMA is the Predicted RMA Field Depth While YHat_Trad is the Traditional Least-Squares Prediction for the Field Depth.

Figure 3 shows box plots for the following:
1. Y variable: Field Depth (%), which is the field measurement
2. X variable: ILI % Depth
3. YHat_RMA: predicted y using RMA
4. YHat_Trad: traditional predicted y using least squares

Figure 3 shows much more clearly the concern raised in the prior paragraph. The range of predictions for the least squares regression is much too narrow to adequately model the field-measured pit depths. Each of the four items is shown with a box plot. The bottom of the box is the 25th percentile, the middle line is the median or 50th percentile, and the top of the box is the 75th percentile. The lines (known as whiskers) extending out of the box go to the most extreme values that are not potential outliers. Any potential outliers (per the box plot methodology) are represented by asterisks (*). Note the unrealistically small range of the distribution for the traditional least squares regression (YHat_Trad) versus the other three plots. Note also that the predicted RMA regression distribution (YHat_RMA) is fairly similar to the distribution of Field Depth (%).

[Figure 3: box plots on a vertical axis from 10% to 45%; categories: Field Depth (%), ILI % Depth, YHat_RMA, YHat_Trad.]

Figure 3. Box Plots of Dependent Variable Field Depth (%), Independent Variable ILI % Depth, and the Two Predictions: YHat_RMA and the Least Squares YHat_Trad.

Figures 4 and 5 also illustrate the difference between the RMA and least squares regression predictions with a superimposed 1-1 line. The primary observation from Figures 4 and 5 is the limited range of the predicted yhat, which is shown here on the horizontal axis with the true dependent (y) variable Field Depth (%) on the vertical axis. The coverage shown in the plots illustrates a previous point: while the RMA predictions range from about 10% to 45% wall thickness, matching the range of the field measurements, the least squares predictions range from approximately 15% to 30% wall thickness. Such comparisons should be considered before using a traditional least squares regression when comparing ILI to field measurements. Indeed, this potential for concern may be generalized to cover a wider domain of pipeline integrity modeling issues.

[Figure 4: scatter plot with superimposed 1-1 line; horizontal axis: YHat_RMA; vertical axis: Field Depth (%); both axes 10% to 45%.]

Figure 4. One to One Plot of RMA Regression Predictions to Actual Field Depth (%).


[Figure 5: scatter plot with superimposed 1-1 line; horizontal axis: YHat_Trad; vertical axis: Field Depth (%); both axes 10% to 45%.]

Figure 5. One to One Plot of Least Squares Regression Predictions to Actual Field Depth (%).
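The range comparison in Figures 3 through 5 can be reproduced in outline on synthetic data. The sketch below is not the authors' routine or the software of reference [5]. It assumes the standard RMA definition (slope equal to sign(r) times s_y/s_x), and its data are hypothetical: simulated true depths between 10% and 45% wall thickness, read twice with independent error. The names `fit_lines`, `simulate`, and `prediction_span` are illustrative only.

```python
import math
import random
import statistics


def fit_lines(x, y):
    """Return (ols, rma, r); each fit is an (intercept, slope) pair for y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    # Sample correlation from the sample covariance.
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    r = cov / (s_x * s_y)
    b_ols = r * s_y / s_x                # least squares: vertical distances only
    b_rma = math.copysign(s_y / s_x, r)  # RMA (assumed standard definition)
    return (y_bar - b_ols * x_bar, b_ols), (y_bar - b_rma * x_bar, b_rma), r


def simulate(n=500, noise_sd=5.0, seed=1):
    """Hypothetical matched readings (% wall thickness), both measured with error."""
    rng = random.Random(seed)
    true_depth = [rng.uniform(10.0, 45.0) for _ in range(n)]
    ili = [d + rng.gauss(0.0, noise_sd) for d in true_depth]
    field = [d + rng.gauss(0.0, noise_sd) for d in true_depth]
    return ili, field


def prediction_span(fit, x):
    """Width of the interval covered by the fitted predictions over x."""
    intercept, slope = fit
    predictions = [intercept + slope * xi for xi in x]
    return max(predictions) - min(predictions)


if __name__ == "__main__":
    ili, field = simulate()
    ols, rma, r = fit_lines(ili, field)
    print(f"r = {r:.3f}, least squares slope = {ols[1]:.3f}, RMA slope = {rma[1]:.3f}")
    print(f"prediction span: least squares {prediction_span(ols, ili):.1f}, "
          f"RMA {prediction_span(rma, ili):.1f}, "
          f"field data {max(field) - min(field):.1f}")

    # Interchanging the variables: RMA agrees with back-solving x = (y - a)/b.
    _, rma_swapped, _ = fit_lines(field, ili)
    print(f"RMA swapped slope {rma_swapped[1]:.4f} vs 1/b {1 / rma[1]:.4f}")
```

Under these assumptions the least squares slope equals r times the RMA slope, so its predictions cover a narrower interval whenever the correlation is below 1, while the RMA fit with the variables interchanged matches the algebraic inverse of the original RMA line.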

CONCLUSIONS

Typical analytical work such as modeling pipeline data generally costs little in comparison to the associated field investigation. The authors suggest the use of modeling methods that are theoretically sound and provide a reasonable approximation of reality. For regression oriented tasks, reduced major axis regression is worthy of consideration.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Martin Phillips for drawing our attention to RMA.

REFERENCES

[1] HAINES, H., McNEALY, R., and ROSENFELD, M. J., IPC2010-31483: Is the 80% Leak Criterion Always Appropriate?, International Pipeline Conference 2010, ASME, Calgary, Alberta.
[2] SOKAL, R. R., and ROHLF, F. J., Biometry, 3rd edition, section 14.3 titled Model II Regression, pp. 541-549, W. H. Freeman, New York, 1995.
[3] http://en.wikipedia.org/wiki/Total_least_squares, January 3, 2012.
[4] GALTON, F., Regression toward Mediocrity in Hereditary Stature, Journal of the Anthropological Institute of Great Britain and Ireland, 15, pp. 246-263, 1886.
[5] BOHONAK, A. J., RMA: software for Reduced Major Axis regression, http://www.bio.sdsu.edu/pub/andy/RMAmanual.pdf, 2004.
