
A NEW APPROACH TO ANALYZE INFORMATION CONTAINED IN A MODEL

Harvey J. Greenberg

Office of Analysis Oversight and Access

Energy Information Administration

INTRODUCTION

The purpose of this paper is to summarize the development of a new approach to address the general question: What information is contained in a model? For example, the equation E = mc^2 is a model that relates two variables, energy (E) and mass (m), through a numerical constant (c^2).

The Energy Information Administration (EIA) is required to provide not only numerical data, but relations among data; not only historical measurements, but forecast estimates; not only basic projections, but impacts of proposed policies. Since energy information is complex, analysis is imperfect, and decisions are difficult, instructive use of energy information depends upon the accuracy, reliability, and credibility with which the information is recorded and interpreted.

Furthermore, since the scope of energy analysis affects every person, industry, and environment, it is vital to apply engineering and economic skills not only artfully, but scientifically. The new approach proposes to account for relational and numerical information with a unified structure to record and analyze the information contained in a model. Questions of information contents may pertain not only to the explicit data that was recorded, but to implied relations. For example, suppose a model relates three processes: production, transportation, and consumption. Their amounts may be related to associated prices at points of supply and demand. Figure 1 illustrates such a structure, where the constants, 1 and -1, and the parameters, C1, C2, C3, U1, U2, and U3, comprise the numerical data.1/

[Figure 1. Physical Flows Model (figure not reproduced)]

1/ One may think of the Physical Flows Model as a linear program. The Supply and Demand rows then represent "material balances," and the columns represent three activities. The Cost and Capacity rows contain objective and bound values, respectively.

The goal of the new approach is to be able to answer questions pertaining to a model's implicit, as well as explicit, information contents, for three forms of analysis: validation, verification, and assessment.

A validation exercise may be concerned with comparing the accuracy of the model's information contents with evidence obtained from other sources, such as judgments from experts or indications provided by historical trends. Verification, on the other hand, deals with whether the model's information contents agree with the documentation. Assessment may be relative to other models that are designed to represent the same numerical information but with different relations. All three forms of analysis--validation, verification, and assessment--require answers to questions pertaining not only to the explicit information, but to the model's implied relations--that is, the implicit information contents. The new approach, which is described in a series of technical memoranda (see references), proposes a unified structure in two dimensions: the modeling framework and the form of analysis. To indicate the extent of the unification, the next section outlines the scope of the proposed approach. Then, an overview of the constructs that comprise the new approach is presented. The focus is on three related questions: How are relations defined? How are they determined? How are they measured?

The conceptual approach, however, is only one of the prerequisites for success. A second issue is whether the proposal possesses sufficient rigor that it can be automated--that is, "Is it feasible to implement the approach?" We are especially interested in large, complex models, where the information is not readily apparent.

The concluding section summarizes the proposed approach and its implementation. The central conclusion is that a variety of modeling frameworks, including most used by EIA, can be unified into a new form that organizes the information into a useful structure. By applying current computer technology, it is feasible to implement a system capable of answering questions and retrieving information to validate, verify, and assess a model during its development, application, and audit.

SCOPE

The new approach has two dimensions: the modeling framework and the form of information analysis. Currently, there is no taxonomy for model structures; nevertheless, different modeling techniques generally use different accounting principles. A linear programming model, for example, is oriented towards deterministic representation of "activities" which must satisfy "constraints" as they comprise a "process." An econometric model, however, is oriented towards "exogenous" or "explanatory" variables to statistically estimate "endogenous" or "dependent" variables. The proposal unifies these two apparently opposite forms of representation into one accounting structure: a "matricial form."

The anatomy of a matricial form consists of constructs that embody both relational and numerical information. First, there is a set of variables that is divided into two parts: rows and columns. Generally, a matricial form has a specified number of variables (n), of which a specified number (m) must be rows. We refer to their difference (n-m) as the "degrees of freedom." Each possible assignment of variables to the row set versus the column set constitutes a configuration of the matricial form. The reason for considering different configurations is to examine implied relations.
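The counting implied by this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are hypothetical (loosely following the Physical Flows Model), and every choice of m row variables is treated as a candidate configuration, although the approach may admit only some of them.

```python
from itertools import combinations
from math import comb


def configurations(variables, m):
    """Yield every (rows, columns) split with exactly m row variables.

    Assumption: any m-subset is a candidate configuration; admissibility
    conditions, if any, are not modeled here.
    """
    for rows in combinations(variables, m):
        columns = tuple(v for v in variables if v not in rows)
        yield rows, columns


# Hypothetical variable names for illustration only.
variables = ("Supply", "Demand", "Production", "Transportation", "Consumption")
m = 2                           # number of variables that must be rows
n = len(variables)              # total number of variables
degrees_of_freedom = n - m      # the difference n - m named in the text

all_configurations = list(configurations(variables, m))

print("degrees of freedom:", degrees_of_freedom)
print("candidate configurations:", len(all_configurations), "=", comb(n, m))
print("first configuration:", all_configurations[0])
```

The number of candidates grows combinatorially with n, which is one reason a question such as whether relations can be measured without computing each configuration explicitly is of practical interest.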

The first division represents relations between the two sets of variables: rows and columns. For example, in a linear programming model, the original matrix configuration uses columns to represent activities and rows to represent either constraints, objectives, or accumulations for report-writing. A basis, such as at optimality, corresponds to one of the configurations. By contrast, an econometric model represents the explanatory variables as columns and the dependent variables as rows. An alternative configuration describes implied relations, for example, between two dependent variables.

The second division pertains to the meaning of elemental values--that is, the location, sign, and magnitude of a value at a row-column intersection. (In some cases, only the locations of nonzeroes are known; in other cases, only their locations and signs are known.) The location of an elemental value generally relates the associated row and column variables. However, some rows represent variable-specific (or unitary) information--for example, a bound on the capacity of an activity or a range of fixed values for an explanatory variable. Furthermore, some columns contain information associated with the row variables--for example, nonzero entries in a system of equations. In general, the information represented by an elemental value may be unitary, or it may represent an interaction between two variables. The second division, therefore, defines two parts: body and rim. The body "embodies" relational information between row and column variables, and the rim contains unitary elemental values. The body of a matricial form is a matrix. A question addressed by the proposed approach is: If the body contains only the locations and signs of elemental values, but not numerical data, is it still possible to determine implied relations? This question belongs to a class of problems, called "qualitative determinacy," which was posed by Paul Samuelson [7], and is analyzed with the new approach in reference [1].
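One standard instance of reasoning from signs alone, offered here as an illustration and not as the method of reference [1], is to ask whether the sign of a determinant is fixed by the sign pattern of the matrix. In the determinant expansion, if every nonzero term has the same sign, the determinant has that sign for any magnitudes. The sketch below assumes a small square pattern with entries -1, 0, or +1; the function names are hypothetical, and the brute-force enumeration is practical only for small matrices.

```python
from itertools import permutations


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk one cycle; a cycle of even length flips the parity.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign


def qualitative_determinant_sign(pattern):
    """Sign of det for every matrix with this sign pattern, if determined.

    Returns +1 or -1 when all nonzero expansion terms agree in sign,
    0 when every term vanishes, and None when the signs alone do not
    settle the answer.
    """
    n = len(pattern)
    term_signs = set()
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for row, col in enumerate(perm):
            term *= pattern[row][col]
            if term == 0:
                break
        if term != 0:
            term_signs.add(term)
    if not term_signs:
        return 0
    if len(term_signs) == 1:
        return next(iter(term_signs))
    return None


determined = qualitative_determinant_sign([[1, -1], [1, 1]])
ambiguous = qualitative_determinant_sign([[1, 1], [1, 1]])
print(determined, ambiguous)
```

In the first pattern both expansion terms are positive, so the sign is settled without any numerical data; in the second the two terms oppose each other, so magnitudes would be needed.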

A second question of interest is illustrated as follows. Suppose a model represents regional production, conversion, and consumption of petroleum products, as well as interregional transportation by pipeline or tanker. The model may have thousands of variables and may use many different databases, thus making its information contents impossible to comprehend, and perhaps prone to error, without some automated aid. Validation and verification exercises must examine the flow relations. For example, a query may be: Does the model account for flow of gasoline from Texas to New York? A question of causality is: Would an increase in Texas' refining capacity affect New York's gasoline supply? These two questions illustrate the need to organize the modeling framework into a structure capable of answering queries about the model's relations.
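The first query can be pictured as a reachability question on a directed graph of flow relations. The sketch below is a simplification under stated assumptions: the node names and arcs are hypothetical, and a breadth-first search stands in for whatever retrieval structure the approach actually uses. It answers only whether a flow path is represented, not whether a capacity change would alter supply.

```python
from collections import deque


def flow_is_represented(arcs, origin, destination):
    """Return True if a directed path of arcs leads from origin to destination."""
    successors = {}
    for source, target in arcs:
        successors.setdefault(source, []).append(target)

    visited = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node == destination:
            return True
        for following in successors.get(node, []):
            if following not in visited:
                visited.add(following)
                queue.append(following)
    return False


# Hypothetical flow arcs; a real model would derive these from its matrix.
arcs = [
    ("Texas refinery", "Gulf pipeline"),
    ("Gulf pipeline", "New York terminal"),
    ("New York terminal", "New York demand"),
    ("California refinery", "California demand"),
]

texas_to_new_york = flow_is_represented(arcs, "Texas refinery", "New York demand")
california_to_new_york = flow_is_represented(arcs, "California refinery", "New York demand")
print(texas_to_new_york, california_to_new_york)
```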

In summary, one measure of scope is the extent to which the model's information can be revealed for direct reporting. A second dimension is the extent to which a "diagnostic aid" can be developed for certain applications. For example, a model may produce a fallacious result because it contains a faulty element, such as incorrectly entered data. The analyst must trace the result to its cause, often under severe time pressure. The proposed approach, once implemented, offers aid to the diagnostic analysis by automating the determination of causality.

UNIFYING PRINCIPLES

The purpose of this section is to summarize some of the developments obtained thus far.

First, a cardinal measure of economic correlation has been proposed and studied [1]. It is defined, relative to the choice of row (vs. column) variables, to be the inner product of the associated column vectors. The sign of this correlation determines an ordinal relation: two column variables are complements, substitutes, or independent, according to whether their economic correlation is negative, positive, or zero, respectively. Several classes of models were examined to test how well this measure captures intended relationships.
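The definition above can be sketched directly. The matrix body below is an assumption in the spirit of the Physical Flows Model (a Supply row and a Demand row, with entries 1 and -1), since the figure's actual layout is not reproduced in the text; the function names are hypothetical.

```python
def economic_correlation(body, col_a, col_b):
    """Inner product of two column vectors of the matrix body."""
    return sum(row[col_a] * row[col_b] for row in body)


def ordinal_relation(correlation):
    """Classify by sign: negative, positive, or zero correlation."""
    if correlation < 0:
        return "complements"
    if correlation > 0:
        return "substitutes"
    return "independent"


# Assumed body: rows are Supply and Demand; columns are the three activities.
PRODUCTION, TRANSPORTATION, CONSUMPTION = 0, 1, 2
body = [
    [1, -1, 0],   # Supply: production enters, transportation leaves
    [0, 1, -1],   # Demand: transportation enters, consumption leaves
]

prod_trans = economic_correlation(body, PRODUCTION, TRANSPORTATION)
prod_cons = economic_correlation(body, PRODUCTION, CONSUMPTION)

print(prod_trans, ordinal_relation(prod_trans))
print(prod_cons, ordinal_relation(prod_cons))
```

With these assumed entries, production and transportation share the Supply row with opposite signs and come out as complements, while production and consumption share no row and come out as independent for this choice of row variables.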

For example, Figure 1 illustrates a matricial form that represents physical flows from supply to demand. The columns fall into three classes: production, transportation, and consumption. Since the production and transportation columns intersect a common supply row with opposite signs, their economic correlation is negative, so they are complements. This means an increase in a region's production must be accompanied by an increase in outbound transportation. Furthermore, production and consumption columns are independent, relative to the choice of row variables shown, because they do not intersect a common row; however, under another choice of row variables, shown in Figure 2, Production and Consumption appear as complements. This raises two related questions:

1. Can the proposed measure of economic correlation
be extended to measure relationships for many
configurations without explicitly computing
each one?

2. Are there interesting classes of models for which
the sign of the economic correlation does not
change over all configurations?

The answer to both questions is yes, and reference [1] develops the associated concepts and specific results.

Another form of implied information pertains to a problem posed by Koopmans and Bausch [6, p. 138]. The problem is to determine an "embedded hierarchy" that traces causation. For example, Figure 3 illustrates hierarchies of the three column variables for two configurations of the physical flows model. Extensions and solutions of this form of inferential analysis are given in reference [2]; algorithms to solve the associated search problem are described in [3,4].
