Journal of Research of the National Bureau of Standards

Volume 91, Number 1, January-February 1986

Ruggedness Testing-Part I: Ignoring Interactions

Robert C. Paule, George Marinenko, Melissa Knoerdel, and William F. Koch
National Bureau of Standards, Gaithersburg, MD 20899

Accepted: August 29, 1985

A straightforward explanation of the statistical technique of ruggedness testing is presented. Efficient
Plackett-Burman designs are used in ruggedness tests. These designs involve the simultaneous change of levels
of a number of variables. The designs allow the ruggedness test user to determine the effect of the separated
variables on the measurement process. This paper (Part I) deals with the common situation where two-factor
and higher order interactions can be safely ignored. A method is presented for evaluating the experimental
uncertainties. A detailed example of glass electrode measurements of pH of dilute HCl solutions is used to
illustrate ruggedness testing procedures.

Key words: interactions; main effects; orthogonal designs; pH measurements; Plackett-Burman designs;
ruggedness tests.

Introduction

The purpose of a ruggedness test is to find the factors that strongly influence measurement results, and to determine how closely one needs to control these factors. Ruggedness tests do not determine optimum conditions for a test method.

In the testing of a protocol, it is a frequent occurrence that the coordinating scientist is dismayed by the large variabilities observed between different laboratory results. The scientist may have developed the protocol being tested and taken great care and pride in that development. His laboratory has documented "proof" of high precision and accuracy for the method. What has gone wrong? How can the other laboratories get such wild results?

About the Authors: Robert C. Paule is a physical scientist assigned to the NBS National Measurement Laboratory (NML). George Marinenko and William F. Koch are chemists in NML's Inorganic Analytical Research Division in which Melissa Knoerdel, a student, serves the Division during summer vacations.

A large part of the answer may be that the coordinating scientist has been unrealistically consistent in his own laboratory work. He may have always used fixed equipment such as a furnace that was set at 60.0 °C and that did not vary by more than ±0.5 °C. Even though the furnace dial read 60.0 °C, the furnace temperature may in reality have been 64.2±0.5 °C. The constant bias of 4.2 °C did not affect his precision, but it may have affected his accuracy. Other constant errors will, likewise, not affect his precision. In regard to accuracy, these additional errors may partially cancel each other. It is the nature of protocol development that work will continue until the errors do cancel, and the "right" answer is obtained. Thus, the laboratory that has developed the protocol will eventually show both good precision and accuracy. In an interlaboratory experiment, however, conditions are different. The other (individual) laboratories do not have the same biases, and the rather complete cancelling of systematic errors does not occur. Differences in laboratory conditions can result in large variabilities between different laboratory results. In frustration, the coordinating scientist may tighten the protocol specifications. One can see that if temperature is important, then even a tightened protocol specification of 60.0±0.1 °C will not be effective unless the biases between laboratories are eliminated. A true temperature of 60.0±0.5 °C may be quite satisfactory, but large biases cannot be tolerated.

To work towards perfecting a test method one must first determine if a factor such as temperature is important, and then decide if a true ±0.5 °C tolerance is acceptable. Such matters are best investigated in a single laboratory rather than in multiple laboratories since, here, we are interested in the effect of changes in temperature. A constant bias within a single laboratory will not interfere in the investigation of changes of temperature. Other factors associated with the protocol must also be evaluated. How do we proceed?

The coordinating scientist may believe that the protocol contains seven factors (variables) that could influence the measurement results. Suppose it is decided to investigate the effect of each factor at only two levels: at a high level and at a low level. A full factorial investigation of the seven factors at each of the two levels would require 2^7 = 128 measurements, and this does not include replicate measurements. Fortunately, one does not have to make this many measurements. One can use a class of experimental designs called Plackett-Burman designs [1]. It is possible, by using these designs, to study up to N-1 factors using only N measurements.

There are some restrictions on the main effects and interaction terms in the model. The restrictions will not be given here since they only have to do with the "centering of the data" for the evaluation of the terms. In ruggedness testing we do not center the data about some midpoint, but rather redefine the effects as differences between the results at the high and at the low levels. We will also do away with the subscripts of the above model. We simply recognize that measurement results are affected by various main effects and interactions.

From the general mathematical model one can infer that experiments with a larger number of factors will have a very large number of higher-order interactions. It is generally believed that main effects tend to be most important in describing (or controlling) the measurement results, that two-factor interactions are less important, and that higher order interactions are even less important. Plackett-Burman designs are well suited for measurement processes that have negligible interactions.

Use of Plackett-Burman Designs

The most common use of Plackett-Burman (PB) designs with N measurements allows one to get the most important (main effects) information. With N measurements, however, the N-1 main effects are confounded with the two-factor and with higher order interactions. If the interactions are relatively small, then we may be satisfied in making only N measurements and obtaining slightly contaminated estimates for the N-1 main effects. Experience has tended to show that one gains more useful information by examining additional factors than by evaluating the interactions.

Numerous PB-designs are available [1]. A PB-design for seven factors and eight measurements is given in table 1. A (+) for a given factor indicates that the measurement is made with that factor set at the high level, and a (-) indicates the factor is to be at the low level. All seven factors are set for each measurement and a single result is obtained from each of the eight measurements. The measurements should be made in a random order. Typical measurement results are shown at the far right of the design. Scanning down each column of the design one sees that there are equal numbers of (+) and (−) factor settings.
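As an illustration of how such a design can be generated, the following minimal sketch builds an N=8 design by the standard cyclic construction. It assumes the commonly tabulated generator row (+ + + − + − −); the resulting rows and columns may be ordered differently from those of table 1, and the names `GENERATOR` and `pb8_design` are introduced here only for illustration.

```python
# Sketch: cyclic construction of an N=8 Plackett-Burman design.
# Assumption: the commonly tabulated generator row (+ + + - + - -).
# Row and column order may differ from the paper's table 1.
GENERATOR = [+1, +1, +1, -1, +1, -1, -1]


def pb8_design():
    """Return 8 runs x 7 factors of +1 (high level) / -1 (low level)."""
    n = len(GENERATOR)
    # Runs 1-7: the generator row shifted cyclically one place per run.
    rows = [[GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    # Run 8: every factor at the low level.
    rows.append([-1] * n)
    return rows


design = pb8_design()

print("run  A B C D E F G")
for run, row in enumerate(design, start=1):
    print(f"{run:>3}  " + " ".join("+" if x > 0 else "-" for x in row))

# Each column holds four (+) and four (-) settings, so each column sums to 0.
column_sums = [sum(row[j] for row in design) for j in range(7)]
print("column sums:", column_sums)

# Compare with the full factorial: 2**7 runs versus 8 runs here.
print("full factorial runs:", 2 ** 7, "| PB runs:", len(design))
```

The run order printed here is the construction order; in practice the measurements would still be made in a random order, as noted above.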

[Table 1. Plackett-Burman design for seven factors (A-G) and eight measurements (N=8), with the measurement results at the far right. Table contents not recovered.]

The effect of a factor such as A is the average of the results obtained with A at the high level minus the average of the results obtained with A at the low level:

Effect of A = ΣA(+)/(N/2) − ΣA(−)/(N/2)     (1)

The PB-design (see table 1) is constructed such that the ΣA(+) and the ΣA(−) terms will each contain an equal number of B(+) and B(−) terms. Thus, the A effect is orthogonal, i.e., is not affected by the B effect. In the PB-designs all main effects (columns) are orthogonal to all other main effects (columns). This orthogonality, however, does not extend to the interactions. The orthogonality of the main effects and the acceptance of a slight contamination of estimates for the main effects (by the interactions) are the major characteristics of ruggedness testing. For many practical problems this is all that is needed.
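The orthogonality of the main effects, and its failure for the interactions, can be checked numerically. The sketch below is illustrative only: it assumes the standard cyclic N=8 construction (which may be ordered differently from table 1), and the simulated results use hypothetical effect sizes, not data from this paper.

```python
from itertools import combinations

# Assumption: standard cyclic N=8 generator row; order may differ from table 1.
GENERATOR = [+1, +1, +1, -1, +1, -1, -1]
LABELS = "ABCDEFG"


def pb8_design():
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def column(design, j):
    return [row[j] for row in design]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def effect(design, results, j):
    """Average result at the high level minus average result at the low level."""
    high = [y for row, y in zip(design, results) if row[j] > 0]
    low = [y for row, y in zip(design, results) if row[j] < 0]
    return sum(high) / len(high) - sum(low) / len(low)


design = pb8_design()

# 1. Every pair of main-effect columns has a zero dot product.
main_orthogonal = all(
    dot(column(design, j), column(design, k)) == 0
    for j, k in combinations(range(7), 2)
)

# 2. Two-factor interaction columns (elementwise products) equal to +A or -A.
col_a = column(design, 0)
aliases_of_a = [
    LABELS[j] + LABELS[k]
    for j, k in combinations(range(1, 7), 2)
    if abs(dot(col_a, [x * y for x, y in zip(column(design, j), column(design, k))])) == 8
]


# 3. HYPOTHETICAL results: baseline 3000, main effects A = 40 and B = -10,
#    plus an optional C x D interaction of size cd (high-minus-low convention).
def simulate(cd):
    return [
        3000 + 20 * row[0] - 5 * row[1] + (cd / 2) * row[2] * row[3]
        for row in design
    ]


clean = simulate(cd=0)
contaminated = simulate(cd=6)

print("main effects orthogonal:", main_orthogonal)
print("interactions aliased with A:", aliases_of_a)
print("A, B without interaction:", effect(design, clean, 0), effect(design, clean, 1))
print("A, B with C x D present: ", effect(design, contaminated, 0), effect(design, contaminated, 1))
```

In this construction, A is fully confounded with three two-factor interactions. A hypothetical C×D interaction of 6 units shifts the estimate of A from 40 to 34, while the estimate of B is unchanged.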

For the PB-design, the standard deviation for an effect, such as A, is obtained by using eq (1) and the standard deviation of a single measurement σ.
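The relation can be sketched as follows, assuming that the N measurements are independent with a common standard deviation σ and that each effect is the difference between two averages of N/2 results. The symbol σ_A for the standard deviation of the A effect is notation introduced here for illustration.

```latex
\begin{align}
\sigma_A^2 &= \frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2} = \frac{4\sigma^2}{N}, \\
\sigma_A   &= \frac{2\sigma}{\sqrt{N}} .
\end{align}
```

For N=8 this gives σ_A = σ/√2, or about 0.71σ.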

Two methods for determining a sample estimate of the standard deviation of a single measurement, s, will be presented.

PB-Design Considerations

Equation 2b shows that the standard deviation of an effect is inversely proportional to √N, where N is the number of measurements made. One is therefore tempted to use large PB-designs. Practical experience, however, favors moderate size designs. Overly large designs require the correct setting of too many factors, and this increases the chance for blunders. In addition, large designs require more time to complete and one becomes concerned that other factors not being considered in the design can change and distort the results. The effects of incorrect factor settings and of shifting experimental conditions are propagated into all of the calculated results (see eq 1). The above listed (N=8) PB-design is a suitable size for most experiments. If more factors need to be studied, they can be handled by using a second (N=8) PB-design. This latter procedure may even involve the repeated testing of some of the more important factors from the first design. The (N=8) PB-design can also be conveniently used to study two-factor interactions (see Ruggedness Testing-Part II: Recognizing Interactions).

In general, the size of all effects in a PB-design will increase with increased separation of the high and low factor settings. We have implicitly assumed that the main effects are linear. It seems prudent to only use moderate separations of the high and low settings so that the measured effects will be relatively linear and, at the same time, large relative to the measurement error. For the high and low settings of the factors it is suggested that one use the extreme limits that one may expect to observe between different qualified laboratories.

Judging the Effects

How can one judge if any of the estimated main effects are too large? Since the main effects are expressed in the units of the measurement, one can simply make a direct judgment whether the change associated with a factor shift from a high level to a low level is too large, or not. Other, more quantitative methods of judgment which analyze the variance of measurements are given below. We should recognize that these quantitative methods still only give tentative answers and that follow-up or confirmatory experiments are frequently needed.


Action should be taken if the effect of a factor is statistically significant, and if the size of the effect is of practical importance; we should then tighten the protocol specification for that factor. This will help reduce the interlaboratory variability.

One may wish to repeat the complete PB-experiment so as to obtain better estimates of the factors and to get a current estimate of the within-laboratory measurement variability, s. In estimating the measurement variability one needs to guard against the occurrence of a possible measurement shift between the running of the two designs. This can be handled mathematically. Let us now work through a real example.

This ruggedness testing example deals with factors that may influence the determination of the pH in dilute acid solutions when measurements are made by use of a glass electrode. Table 2 gives the seven factor (N=8) PB-design which was used. This convenient design was first suggested by F. Yates [2]. It was frequently used by W. J. Youden [3] who did much of the pioneering work in ruggedness testing.

[Table 2. Seven-factor (N=8) Yates-Youden design used for the pH ruggedness test. Table contents not recovered.]

The above Yates-Youden design can be obtained from the seven-factor PB-design of table 1 by relabelling the PB-columns A-G to read C, F, G, D, E, B, A, and the PB-rows 1-8 to read 2, 3, 5, 4, 7, 8, 6, and 1. One then rearranges the columns and rows to be in the usual alphabetic and numeric order. The above operations are perfectly acceptable since the assignment of column and row labels is arbitrary and the rearrangement of the columns and rows has no effect on the overall arithmetic operations. Such rearrangements are, in fact, one means of randomizing the assignment of variables.

A number of pH measurement experiments were run using six different dilute acid solutions. For simplicity of presentation, Part I discusses only the results from one of the solutions, an HCl solution with a known pH of 2.985. Subjects of more involved PB-testing and comparisons between the different acid solutions are described in Part II. The seven factors that were studied are listed below. The first listed level for each factor has been arbitrarily assigned the positive sign in the above table.

The above is only a partial list of factors that will change the observed value of the pH. Obviously, all other factors that are not listed above need to be kept constant. The particular, constant levels of these other factors will result in some specific offset in the pH measurements. In the ruggedness test, however, this fixed offset need not concern us since we are only interested in the measurement changes (the effects) that occur when the above seven factors (A-G) are changed.

Results from the ruggedness test are given in table 3. The complete experiment was also repeated on a second day. A different random order of measurement was used for each day. The two sets of measurement results are given at the far right of the design.
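With the complete design run on two days, one standard way to estimate the single-measurement standard deviation s while guarding against a constant shift between days is to work with the paired day-to-day differences. The sketch below is a generic calculation under that assumption, not necessarily the one used by the authors. The numerical values are hypothetical, and the function names are introduced here for illustration.

```python
import math


def sd_from_duplicates(first_run, second_run):
    """Return (s, mean_shift, degrees_of_freedom) from paired duplicate results."""
    diffs = [b - a for a, b in zip(first_run, second_run)]
    n = len(diffs)
    mean_shift = sum(diffs) / n  # constant shift between the two days
    # Each difference has variance 2*sigma^2; removing the mean shift
    # uses one degree of freedom, leaving n - 1.
    ss = sum((d - mean_shift) ** 2 for d in diffs)
    return math.sqrt(ss / (2 * (n - 1))), mean_shift, n - 1


def effect_sd(s, n_measurements):
    """Standard deviation of an effect estimated from n_measurements results."""
    return 2 * s / math.sqrt(n_measurements)


# HYPOTHETICAL duplicate results (not the paper's data), in design-row order.
day1 = [101, 97, 104, 99, 102, 98, 100, 103]
day2 = [104, 99, 106, 103, 104, 102, 103, 105]

s, shift, dof = sd_from_duplicates(day1, day2)
print(f"day shift = {shift:.2f}, s = {s:.2f} with {dof} degrees of freedom")

# Effects averaged over both days rest on 16 results in total.
print(f"sd of an effect = {effect_sd(s, 16):.2f}")
```

Because each day's design is balanced, a constant day-to-day shift cancels out of every effect estimate and is removed from s by subtracting the mean difference.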

For the first set of the above reported measurements, the effect of factor A is calculated from eq 1 as the difference of the average value when 25 °C is used and the average value when 30 °C is used, i.e., (2999+3055+3049+2949)/4 − (2904+3015+3006+2964)/4 = 3013 − 2972 = +41. The averages and differ
