Validation and Assessment Issues of Energy Models: Proceedings of a Workshop ...

VALIDITY AS A COMPOSITE MEASURE OF GOODNESS

Harvey J. Greenberg
Frederic Murphy

Office of Analysis Oversight and Access
Assistant Administrator for Applied Analysis
Energy Information Administration

HIERARCHY OF VALIDATION APPROACHES

Let us suppose validity is to be used as an evaluation criterion, expressing preferential selection among alternative models. We would like to say, "Model A is more valid than model B." This ranking may be confined to one application or to a spectrum of related applications. Using validity as a measure of goodness is consistent, for example, with the notion that validity, relative or absolute, pertains to a measure of confidence in the model, its output or its use.

The goal of this paper is to establish a minimum requirement for an objective evaluation of models. The evaluation is presumed to be based on the model's attributes. For example, an equilibrium model may have as attributes a supply function, a demand function, and a rule that defines a market equilibrium. Other attributes can be how these components interact to produce a forecast, or the data bases used.

Today's economic models have many components, both structural and numerical. Such models are defined to be modular, although this term describes a form of model management rather than structure. While it is not necessary that an attribute be synonymous with a module or component, it is desirable to postulate a rule that relates a model's validity to the validities of its attributes. The reasons for this are twofold. First, the model validation process could be partitioned by component, easing the management task. Second, a "supermodel" could possibly be developed using the best design for each component of all models considered, eliminating the need for overall model comparisons.

The first question examined is whether it is sufficient to define the attributes for model evaluation as the validities of the submodels. We answer this question by showing that desirable properties of a concept of validity become internally inconsistent. Given the inability to analyze models piece-by-piece, even simple models become complex to evaluate, and the problem becomes one of multiattribute utility theory.

We then ask whether it is possible to retain some degree of simplicity in the validation process by using ordinal measures of utility. Given that we must consider both the submodels and their interactions, we shall demonstrate that Arrow's Possibility Theorem [1] can be applied to prove the logical inconsistency of properties that seem reasonable individually (see Fishburn's survey [2]). As a consequence, if we postulate properties of an ordinal validity measure, such as transitivity and completeness, and if we use a "reasonable" rule relating the validity of models to their attributes, then we encounter an inconsistency and must forsake one of the properties we posited. Consequently, there seems to be no reasonable simplification of the model validation process that would allow us to judge models. We are effectively left with the problem of comparing all aspects of models, components, and interactions using cardinal measures to express the degree of merit for every attribute. Since the cardinal measures are dependent upon the scenarios of interest and upon the evaluator, systematic model validation for model selection remains a controversial task and may be unattainable.
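The kind of inconsistency at issue can be illustrated with a small sketch. It assumes, purely for illustration, that models are compared by majority rule over three attribute-level rankings; this rule, the rankings, and the function names are hypothetical and are not the aggregation rule analyzed in this paper. Each attribute ranking is transitive, yet the aggregate ordering cycles.

```python
from itertools import permutations

# Hypothetical attribute-level rankings, listed from most valid to least valid.
attribute_rankings = {
    "supply": ["X", "Y", "Z"],
    "demand": ["Y", "Z", "X"],
    "equilibration": ["Z", "X", "Y"],
}


def majority_prefers(a, b, rankings):
    """True if a strict majority of attributes rank model a above model b."""
    wins = sum(1 for order in rankings.values() if order.index(a) < order.index(b))
    return wins > len(rankings) / 2


def find_cycle(models, rankings):
    """Return (a, b, c) with a > b, b > c and c > a under majority rule, else None."""
    for a, b, c in permutations(models, 3):
        if (majority_prefers(a, b, rankings)
                and majority_prefers(b, c, rankings)
                and majority_prefers(c, a, rankings)):
            return (a, b, c)
    return None


cycle = find_cycle(["X", "Y", "Z"], attribute_rankings)
print(cycle)
```

Here X is judged more valid than Y, Y more valid than Z, and Z more valid than X, so transitivity fails for the aggregate even though it holds for every attribute.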

INITIAL PROPERTIES FOR VALIDITY

To address the issue of component evaluation, we need only one of the four properties presented below. The four are presented together to maintain the parallels with Arrow [1]. Let X, Y, and Z denote three models, and let a subscript denote an attribute; for example, X_i is attribute i of model X.

The relation '<' is defined to be a validity relation, where X < Y means "X is less valid than Y." We also define X > Y to mean "X is more valid than Y," and X = Y to mean "X and Y are equally valid." We use X ≤ Y (X ≥ Y) to mean X < Y (X > Y) or X = Y.

The first two properties we shall assume are:

Property 1 (completeness): For any two models, X and Y, one of the following relations is true:

X < Y, X = Y, or X > Y.

Property 2 (transitivity):

If X, Y, and Z are three models such that X < (=, >) Y and Y < (=, >) Z, then X < (=, >) Z.

The validity relations (<, =, >), and associated properties, also apply to attributes. When we do not wish to specify the relation between X_i and Y_i, we write "X_i R_i Y_i."

The Completeness Property (1) is axiomatic if validity is to serve as a criterion for model selection. If, by contrast, the validities of two models are incomparable, then validity cannot be the basis for choosing one of them.
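For a finite set of candidate models, Properties 1 and 2 can be checked mechanically. The sketch below is only an illustration: it represents the weak relation "at most as valid as" by a hypothetical predicate `at_most(a, b)`, and the scores and the partial relation are made-up inputs, not data from any model assessment.

```python
from itertools import product


def is_complete(models, at_most):
    """Property 1: every pair of models is comparable in at least one direction."""
    return all(at_most(a, b) or at_most(b, a) for a, b in product(models, repeat=2))


def is_transitive(models, at_most):
    """Property 2: a <= b and b <= c together imply a <= c."""
    return all(
        at_most(a, c)
        for a, b, c in product(models, repeat=3)
        if at_most(a, b) and at_most(b, c)
    )


models = ["X", "Y", "Z"]

# Hypothetical relation induced by a single numeric score per model.
scores = {"X": 1, "Y": 2, "Z": 2}


def by_score(a, b):
    return scores[a] <= scores[b]


# Hypothetical relation in which Z cannot be compared with X or Y.
known_pairs = {("X", "X"), ("Y", "Y"), ("Z", "Z"), ("X", "Y")}


def partial(a, b):
    return (a, b) in known_pairs


print(is_complete(models, by_score), is_transitive(models, by_score))
print(is_complete(models, partial), is_transitive(models, partial))
```

The second relation is transitive but not complete, so it could not by itself serve to select between X and Z.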

The next property characterizes a relation between model validity and attribute validity. It embodies a form of coordinate-wise monotonicity. In particular, it says that if one of the attributes is improved (in the sense that X'_i > X_i) while others are left unchanged (i.e., X'_j = X_j for j ≠ i), then the model's validity cannot worsen (i.e., X' ≥ X).

Property 3 (monotonicity): If X and Y are two models such that X_i ≥ Y_i for all attributes (i), then

X ≥ Y.
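A minimal sketch of what Property 3 does and does not settle follows. It assumes, hypothetically, that each attribute carries an integer score and that model validity is a weighted sum with non-negative weights; neither assumption comes from this paper, and the additive rule is chosen only because it satisfies Property 3 by construction.

```python
def dominates(x, y):
    """Attribute-wise comparison: x_i >= y_i for every attribute i."""
    return all(x[name] >= y[name] for name in x)


def weighted_validity(x, weights):
    """Hypothetical additive rule; non-negative weights make it monotone."""
    return sum(weights[name] * x[name] for name in x)


# Made-up weights and attribute scores, for illustration only.
weights = {"supply": 2, "demand": 1, "equilibration": 3}
X = {"supply": 3, "demand": 2, "equilibration": 1}
X_improved = {"supply": 3, "demand": 4, "equilibration": 1}  # only demand is improved
Y = {"supply": 1, "demand": 5, "equilibration": 1}

print(dominates(X_improved, X))
print(weighted_validity(X_improved, weights) >= weighted_validity(X, weights))
print(dominates(X, Y), dominates(Y, X))
```

Property 3 ranks X_improved against X, but it says nothing about X against Y, since neither dominates the other. Completeness therefore has to come from some further rule, and that rule is where the other properties come into play.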

The next property is Arrow's independence of irrelevant alternatives. An example of what this property requires is that it excludes the situation where the ordering of validities between two models, X and Y, depends upon whether a third model, Z, is in the set of candidate models. Thus, if X<Z<Y and model Z is deleted, it remains true that X<Y.

More generally, Property 4 requires the following to hold. Assume X<Y<Z. Assume, also, the opinions on the validity of the attributes of Z change relative to the attributes of X and Y. (This change in attribute rankings may change the ranking of Z relative to X and Y.)

If we do not change our opinions about the attributes of X, relative to those of Y, then Property 4 says it must still hold that

X<Y.

Property 4 (independence): Let (R_i) and (R'_i) be two equivalent attribute orderings for X and Y, e.g., X_i < Y_i with R_i if and only if X_i < Y_i with R'_i. If X_1 R_1 Y_1, ..., X_n R_n Y_n implies X < Y, then X_1 R'_1 Y_1, ..., X_n R'_n Y_n implies X < Y, irrespective of how (R_i) and (R'_i) rank X_i and Y_i relative to Z_i, the i-th attribute of model Z.
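To see that Property 4 is a real restriction, consider a sketch in which attribute rankings are aggregated by a Borda count (a model earns points for each model it outranks on each attribute). The Borda rule, the five attribute rankings, and the function names are assumptions made for illustration; they are not proposed in this paper. Only the position of Z changes between the two profiles, yet the aggregate ordering of X and Y reverses.

```python
def borda_scores(rankings):
    """Sum Borda points over attributes; each ranking lists models best first."""
    scores = {}
    for order in rankings:
        top = len(order) - 1
        for position, model in enumerate(order):
            scores[model] = scores.get(model, 0) + (top - position)
    return scores


def same_pairwise_order(a, b, first, second):
    """True if every attribute orders a and b the same way in both profiles."""
    return all(
        (r1.index(a) < r1.index(b)) == (r2.index(a) < r2.index(b))
        for r1, r2 in zip(first, second)
    )


# Hypothetical profiles over five attributes, most valid model first.
before = [["X", "Y", "Z"]] * 3 + [["Y", "Z", "X"]] * 2
after = [["X", "Y", "Z"]] * 3 + [["Y", "X", "Z"]] * 2  # only Z has moved

scores_before = borda_scores(before)
scores_after = borda_scores(after)

print(same_pairwise_order("X", "Y", before, after))
print(scores_before["X"] < scores_before["Y"])
print(scores_after["X"] < scores_after["Y"])
```

Under the first profile the count places X below Y; under the second it places X above Y, although no attribute changed its opinion of X relative to Y. A rule of this kind therefore violates Property 4.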

There are two other properties that are reasonable to add. Before presenting them, however, we shall illustrate how the first four properties may be inconsistent if we presume a component-by-component evaluation.

INADEQUACY OF COMPONENT EVALUATION

In this section we give an example where applying the monotonicity property exclusively to components may lead to a wrong conclusion. Although the first four properties were stated in ordinal terms, the definition of monotonicity would be unaltered with a cardinal scheme. The examples presented, therefore, represent the intrinsic inadequacy of component-wise measurements. First, to help affix ideas, consider an analogy in numerical error analysis.

Let B represent a computed inverse of a nonsingular matrix, A. Define the error matrix, E = A^-1 - B. When we use B to solve a linear system, Ax = b, we compute Bb. The error is Eb + e, where e is the additional error from the computation of Bb. To keep notation simple, let e be negligible, so the error in computing Bb, versus the actual solution, x = A^-1 b, is e(b) = ||x - Bb|| = ||Eb||. Obviously, it is desirable to make ||E|| as small as possible.

There is a technique whereby one column of E can be reduced to zero, while leaving all other columns of E unchanged. That is, one component factor of the inverse can be improved while all others remain unchanged. This suggests the modified, computed inverse, B', induces less error than B. For some right-hand sides (b), however, the previous cancellation of error is gone, and the solution error, e'(b), is larger than the original error, e(b). That is, we have ||E'|| ≤ ||E||, yet e'(b) > e(b) for some b.
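A small numerical instance makes the point concrete. The 2-by-2 error matrix and right-hand side below are made up for illustration, and the Frobenius norm is assumed for ||E||, since no particular norm is specified here.

```python
import math


def mat_vec(E, b):
    """Product of matrix E (a list of rows) with vector b."""
    return [sum(e_ij * b_j for e_ij, b_j in zip(row, b)) for row in E]


def vec_norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def frobenius(E):
    """Frobenius norm of a matrix (assumed here as the matrix norm)."""
    return math.sqrt(sum(x * x for row in E for x in row))


def zero_column(E, j):
    """Copy of E with column j set to zero and all other columns unchanged."""
    return [[0.0 if k == j else x for k, x in enumerate(row)] for row in E]


E = [[1.0, -1.0],
     [0.0, 0.1]]               # hypothetical error matrix
E_prime = zero_column(E, 1)    # the "improved" error matrix
b = [1.0, 1.0]                 # a right-hand side on which the columns of E cancel

err = vec_norm(mat_vec(E, b))
err_prime = vec_norm(mat_vec(E_prime, b))

print(frobenius(E_prime) < frobenius(E))   # the norm of the error matrix shrinks
print(err_prime > err)                     # the solution error for this b grows
```

With b = (1, 1) the two columns of E nearly cancel, so e(b) is small. Zeroing the second column removes the cancellation, and e'(b) grows even though the error matrix itself has a smaller norm.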

By analogy, we may have a model composed of three attributes: supply, demand, and equilibration. The equilibration component represents a model of market rules to arrive at a balanced forecast, thus embodying interaction between supply and demand. In its present state, we may know about imperfections in all three components. If we discover a way to improve one of the components, but leave the others unchanged, then for some scenarios (not predictable), the model results may have systematic biases not present before the "improvement."

For example, suppose we are interested in forecasting petroleum product prices. Assume we have an equilibrium solution from a supply model that underestimates supply for the given prices, a demand model that overestimates demand for the given prices, and a relatively inflexible refinery process model. These biases are not necessarily due to flawed design, but may be the consequence of concerns, such as model
