
ASSESSING THE ICF COAL AND ELECTRIC UTILITIES MODEL

Neil L. Goldman and James Gruhl

M.I.T. Energy Laboratory, Model Assessment Group

I. Objectives and Types of Model Assessment Activities

Because there is such a variety of ways and means for evaluating a model, the first step in an assessment process should be the development of a strategy. In fact, such a strategy must be very carefully chosen and orchestrated to make proper use of the inevitable limitations in time, funds, and manpower available for the assessment. In order to make choices between alternative assessment paths, the objectives or goals of the assessment must first be clearly understood. Some possible objectives include:

(1) validation of the model's input data and structural form relative to specific applications,

(2) evaluation of the model's appropriateness for contributing information to policy decisions in generic application areas,

(3) suggestions for model improvements, and

(4) establishment of model credibility.

The most obvious, and perhaps best defined, of these objectives is the validation of the input data and structural form of the model relative to specific applications. It would probably be more useful to make statements about the appropriateness of a model for contributing information to future policy decisions in generic application areas. Two objectives that would be difficult to achieve simultaneously would be (1) suggestions for model improvements, and (2) establishing model credibility. The first of these suggests a series of model versions, while the second suggests a single model version established at the beginning of the assessment.

Once the objectives have been decided, there are a number of alternative settings and depths for the assessment process. Some of these alternatives result from the possibility of different assessor identities. For example, the assessors could be any of the following:

(1) the model builders themselves,

(2) the model sponsors, or

(3) independent third parties.

In addition, the assessors could address either a single model or several comparable models. There could be very different expectations from the assessment process depending upon this choice of setting. For instance, the model builder could obviously provide a very cost-effective assessment, but credibility would be difficult to establish under such circumstances.

For each possible assessment setting outlined above there are four potential depths:

(1) literature review: survey and critical analysis of the published literature on the structure, implementation, and applications of one or more models;

(2) overview: literature review plus analysis of computer codes and other unpublished materials;

(3) independent audit: overview plus the conduct of model exercises designed by the assessor and executed by the modeler with the assessor "looking over the shoulder;"

(4) in-depth: independent detailed assessment of model formulation, structure, and implementation with the assessor in control of the model and associated data base.

The most cost-effective of these depths will depend upon a number of model characteristics, most particularly, model maturity. If a model is very mature, it is probably worthwhile to go to the expense of an in-depth assessment. If it is immature, then an audit or overview might be sufficient for reaching major conclusions. Size, structure, complexity, execution costs, previous applications, and previous assessments are all aspects that should contribute to the decision on the most cost-effective depth. It might be noted that the classical validation process has consisted of in-depth assessment by model builders, audit roles for model sponsors, and literature review or peer review by independent parties.

As has been pointed out by Saul Gass in several of his papers, an important way to limit the assessment process is to limit its scope (see Table 1). First, decisions must be made concerning the version(s) of the model that is to be assessed, and the types of model applications at which the assessment process is to be aimed. Point 2 of Table 1 defines different aspects of the model that can be evaluated in an assessment: documentation, validity, verification, or operational characteristics.

The ability to assess model documentation adequately will depend to a large degree upon the content and amount of written material that has been produced by the model builders. There are a number of different items that must be included in the documentation:

(1) ...

(2) Data - description of data that have been used, preparation of new inputs and parametric data,

...

(6) Range of Applicability of Model, and

(7) Descriptions of All Validations Performed - by model builder or by independent parties.

Table 1

DIFFERENT CATEGORIES REPRESENTING VARIOUS SCOPES OF ASSESSMENTS

1. Specific Applications of Interest

1.1 Validation in context of specific applications, ranges of variables, degree of aggregation required, absolute values versus policy perturbation studies

1.2 No specific cases, just an assessment that provides the foundation for generally evaluating model accuracy

2. Aspects to be Assessed

2.1 Documentation - of structure, validations performed, parameter estimation, past uses, applicability, computer code and use

2.2 Validity - informed judgment about model logic and ...

2.3 Verification - accuracy of algorithmic and computational implementation

2.4 Operational characteristics - flexibility, extensibility, transferability (training required, ease of use, modeler independence from model and model knowledge), time and cost efficiencies.

Before discussing validation and verification techniques it is necessary to define some terminology. The model is considered to be built from historical or other data observations. The inputs are defined as any values that change for different applications of the model. The parameters and structural elements are those aspects of the model that are meant to stay the same for different sets of model runs. With these definitions in mind, Table 2 illustrates different types of validation and verification techniques that we have found described in the literature or have postulated ourselves. These validation techniques are essentially two-part processes. The first part involves examinations or actions that are performed on parts of the model. The second part of the process involves an assessment of the validity of the effects of those actions as measured by any of the seven bases for comparison listed at the end of Table 2.
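To make these definitions concrete, the short Python sketch below (all names and numbers are invented for illustration and are not taken from the ICF model) separates the inputs, which change from one application to the next, from the parameters and structural form, which are estimated once and then held fixed across a set of runs.

    from dataclasses import dataclass

    # Illustrative only: "parameters" are estimated from observed data and
    # held fixed for a given model version; "inputs" vary across applications.
    @dataclass
    class Parameters:
        demand_elasticity: float  # hypothetical estimated parameter
        capacity_factor: float    # hypothetical estimated parameter

    @dataclass
    class Inputs:
        coal_price: float            # changes from one application to the next
        electricity_demand: float    # changes from one application to the next

    def model(inputs: Inputs, params: Parameters) -> float:
        # Structural element: the fixed functional form relating inputs and
        # parameters to an output (a single made-up quantity here).
        return (inputs.electricity_demand * params.capacity_factor
                - params.demand_elasticity * inputs.coal_price)

    # A new application of the model changes only the Inputs object.
    base_params = Parameters(demand_elasticity=0.4, capacity_factor=0.7)
    print(model(Inputs(coal_price=30.0, electricity_demand=500.0), base_params))
    print(model(Inputs(coal_price=45.0, electricity_demand=520.0), base_params))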

In previous documents we have discussed several of these validation techniques, so a lengthy discussion would not be appropriate here. We have chosen point 5.4 in Table 2 as a means of summing up our discussion of validation techniques. In many ways this point represents the ideal final result of an assessment, that is, a probabilistic measure of the output validity of the model. The class of models for which this ideal measure can be developed has not been clearly established. It is likely that this ideal probabilistic measure can only be bounded from above and below on the basis of simplified assumptions and techniques, possibly including linear or nonlinear analytic representations of the model's input-output response surface. For simple enough representations of the model it might be possible, either analytically or through the use of Monte Carlo techniques, to propagate input uncertainties through structural uncertainties to create measures of output uncertainties (a hypothetical sketch of such a propagation follows the list below). Besides being difficult conceptually, the process of developing quantitative measures of predictive quality is likely to be hampered by the following factors:

(1) it may be as time-consuming a process as the whole model building procedure,

(2) it will be application specific,

(3) funding requirements are generally not appreciated by sponsors, and

(4) decision makers are not now insisting on such displays of predictive quality.
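The sketch referred to above illustrates what such a propagation might look like in the simplest case: assumed error distributions for one input and one estimated parameter are pushed through a stand-in response surface by Monte Carlo sampling, and the spread of the resulting outputs serves as the confidence measure of point 5.4. The response function, the distributions, and all numbers are hypothetical assumptions, not properties of the ICF model.

    import random
    import statistics

    # Hypothetical stand-in for the model's input-output response surface;
    # a real assessment would use the model itself or a simplified
    # (e.g. linearized) representation of it.
    def response(coal_price: float, elasticity: float) -> float:
        return 1000.0 - elasticity * coal_price

    random.seed(1)
    outputs = []
    for _ in range(10000):
        coal_price = random.gauss(30.0, 5.0)   # assumed input error distribution
        elasticity = random.gauss(8.0, 1.0)    # assumed parameter (structural) error
        outputs.append(response(coal_price, elasticity))

    outputs.sort()
    mean = statistics.mean(outputs)
    stdev = statistics.stdev(outputs)
    low, high = outputs[250], outputs[9749]    # approximate 95 percent interval
    print(f"mean {mean:.1f}, standard deviation {stdev:.1f}, "
          f"95 percent interval ({low:.1f}, {high:.1f})")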

The final aspect of the assessment scope is an evaluation of the model's operational characteristics. These characteristics can generally be categorized as:

(1) Ease of Updating Data - different types of applications, changes of levels of aggregation,

(2) Flexibility through Input and Parameter Changes - different applications made possible through changes only in inputs and parameters,

Table 2

VALIDATION AND VERIFICATION TECHNIQUES

ACTIONS: EXAMINATIONS OR CHANGES

OBSERVED DATA:

1.1 Examinations of the observed, historical, or estimation data

OBSERVATIONS-TO-STRUCTURAL:

2.1 Observed data perturbation effects on structure and parameters
2.2 Propagation of estimation error on structure and parameters.
2.3 Measure of fit of structure and parameters to observed data
2.4 Effect of correlated or irrelevant observed or estimation
data on structure and parameters

2.5 Sensitivity analysis: quality of fit of structure and
parameters to observed data for altered structure and
parameters (includes ridge regression)

OBSERVATION-TO-INPUT:

3.1 Effects of correlated or irrelevant observed data on outputs

INPUT:

4.1 Base case or recommended input data examinations

INPUT-TO-OUTPUT:

5.1 Examine outputs with respect to base case input data

5.2 Simplify, e.g. linearize, this relationship to provide understanding (see the sketch following this table)

5.3 Simplify (e.g. linearize) structural form analytically, or group parameters to provide better understanding, elimination of small effects, grouping of equations, grouping of parallel tasks

5.4 Develop confidence measure on outputs by propagating input
error distributions through structural error distributions

STRUCTURE:

6.1 Structural form and parameter examinations

6.2 Respecification, that is, making some of the structural components more sophisticated

6.3 Decompose structure physically or graphically

6.4 Provide new model components to check effects of assumed
data or relationships
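As an illustration of points 5.2 and 5.3, the Python sketch below builds a first-order (linear) approximation of a stand-in input-output relationship around a base-case input by central finite differences; the model and all values are invented, and in practice the slope would be computed for each input of the model under assessment.

    # Hypothetical illustration of points 5.2/5.3: linearize the input-output
    # relationship around the base case to aid understanding and to screen
    # out inputs with small effects.
    def model(coal_price: float) -> float:
        # Stand-in nonlinear relationship, not the ICF model.
        return 1200.0 / (1.0 + 0.02 * coal_price)

    base_input = 30.0
    step = 0.01  # finite-difference step

    base_output = model(base_input)
    slope = (model(base_input + step) - model(base_input - step)) / (2.0 * step)

    def linearized(coal_price: float) -> float:
        # First-order approximation of the model around the base case.
        return base_output + slope * (coal_price - base_input)

    for price in (25.0, 30.0, 35.0):
        print(price, round(model(price), 1), round(linearized(price), 1))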
