
41. Joslin, E. O., Cost-Value Technique for Evaluation of Computer System Proposals, in AFIPS, Proceedings of the Spring Joint Computer Conference, 1964, pp. 367-381, 17 refs. (6430174)

The cost-value technique proposed here involves two major changes to existing evaluation techniques which could bring about a more understandable and realistic selection of computer equipment.

The first change is in methodology. The cost-value technique attempts to consider all items of value to a computer system, but to consider them only once and in the environment in which they belong. The categories scored are total system cost and "extras," which are defined as features like expansion potential, vendor support, or similar characteristics which are part of total system cost but which differentiate between vendors. The second change is in scoring technique. The cost-value technique uses dollars rather than weighted points as the basis of comparison. This provides a more natural basis for comparison. It eliminates the need for "tradeoffs" and gives management deeper understanding of the total selection process.
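The abstract describes the dollar-based comparison only in prose. As a purely illustrative aid, the following minimal sketch shows the kind of arithmetic such a comparison implies; the vendor figures and the dollar values placed on extras are invented, not drawn from Joslin's paper.

```python
# Hypothetical illustration of dollar-based (cost-value) scoring: each vendor's
# bid is compared on total system cost, adjusted by dollar values assigned to
# differentiating "extras" such as expansion potential or vendor support.
# All figures are invented for illustration only.

bids = {
    "Vendor A": {"total_system_cost": 1_200_000,
                 "extras_value": {"expansion potential": 60_000, "vendor support": 25_000}},
    "Vendor B": {"total_system_cost": 1_150_000,
                 "extras_value": {"expansion potential": 20_000, "vendor support": 40_000}},
}

def effective_cost(bid):
    """Total system cost less the dollar value credited for extras."""
    return bid["total_system_cost"] - sum(bid["extras_value"].values())

for vendor, bid in sorted(bids.items(), key=lambda kv: effective_cost(kv[1])):
    print(f"{vendor}: effective cost ${effective_cost(bid):,}")
```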

Since the technique is a dynamic one, a number of improvements might be made. The use of debits, as well as credits, could be adopted immediately and might make the technique a little more natural. Two other improvements, however, would require more work and understanding. These are quality determination and time-dependent cost-value assignments, both of which are discussed briefly. (Modified author)

Category: 1.3

Key words: Cost-value; proposal evaluation; scoring.

42. Joslin, Edward O., Describing Workload for Acquiring ADP Equipment and Software, Computers and Automation, 18:6 (June 1, 1969) pp. 36-40. (6430225)

The author presents a detailed discussion on how to obtain the mix of representative programs to be used for benchmarking purposes. This mix of representative benchmark programs is the first necessary requirement for describing workload. The second requirement is a description of system growth in terms of a series of expected workload levels.

In selecting programs for the representative mix several considerations are important:

(1) the programs should be written in a standard, higher level language;

(2) the mix should be small enough to be processed during a single half-day benchmark demonstration;

(3) the programs should be selected not to prove the worst case situation, but rather to test and demonstrate timing and capability for the normal situation.

Should it be necessary to assume and demonstrate capability to handle worst case situations, benchmark programs selected for that purpose should not be included in the representative mix. They should be treated separately as capability benchmarks.

A method for deriving a representative set of programs is discussed and illustrated in detail. Calculation of extension factors and growth projections is also discussed and illustrated. (JLW)

Category: 2.2

Key words: Application benchmarks; growth projections; job mix; system life projections; task mix; workload description.

43. Joslin, Edward O., Techniques of Selecting EDP Equipment, Data Management, 8:2 (February 1970) pp. 28-30. (6430143)

A brief discussion of the importance of using proven procedures in four areas of EDP equipment selection: preparation of specifications for competitive selection; workload representation; evaluation; and costing.

Workload description is part of the specifications, and important enough to merit its own place in the procurement process. Description of the benchmark should consist of benchmark programs to be run on the bid system(s) for a determination of total time. For the benchmarks to be truly representative of the workload, it is essential that the programs selected be representative of the types of tasks to be processed, the time requirements, the equipment and storage used, and the language used, all in the required order or sequence. The extension factor must be determined; this can be described as the "total monthly time to perform the task set divided by the throughput time to run the representative program."
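As an illustration of the extension-factor definition quoted above (the hours used here are assumed, not taken from the article), a worked example might look as follows:

```python
# Hypothetical numbers illustrating the extension factor: total monthly time
# to perform the task set divided by the throughput time of the
# representative benchmark program.

total_monthly_task_hours = 120.0   # monthly time to perform the full task set (assumed)
benchmark_throughput_hours = 1.5   # throughput time of the representative program (assumed)

extension_factor = total_monthly_task_hours / benchmark_throughput_hours

# A vendor's measured benchmark time can then be extended to an estimated
# monthly load on the proposed system.
vendor_benchmark_hours = 1.2
estimated_monthly_hours = vendor_benchmark_hours * extension_factor

print(f"extension factor = {extension_factor:.1f}")
print(f"estimated monthly hours on bid system = {estimated_monthly_hours:.1f}")
```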

Other considerations relevant to selection of representative programs include:

(1) They must represent current and past workload data.

(2) The growth factor must consider current size and planned growth of the facility.

(3) They must be selected with an eye to future fiscal policies.

(4) They must reflect current and future manpower.

(5) They should cover some fixed system life period, normally four to six years.

This involves a large expenditure of labor which, however, pays large dividends in verification of vendors' timing claims. A "single click-click with a stop watch" signals the end of a run and obviates any discussion, for: "It's put up or shut-up time, and everyone knows it and there is little point in arguing with the stop watch." (JLW)

Category: 2.2

Key words: Application benchmarks; computer selection procedures; system life projections; workload description; workload representation.

44. Joslin, Edward O. and John J. Aiken, The Validity of Basing Computer Selection on Benchmark Results, Computers and Automation, 15:1 (January 1966) pp. 22-23. (6430196)

This elementary discussion of the validity of basing computer selections on the results of benchmark runs presents a "commonly accepted definition of a benchmark [as] a routine used to determine the speed performance of a computer system." If the routine used for the benchmark is truly representative of a workload, then the results of the benchmark run provide an excellent basis for selection.

An Air Force procurement in which actual problems were used as benchmarks is described, along with some of the findings uncovered during the procurement. Compilation and execution times for four benchmark problems on four computer systems are tabulated. The relative ranking of each system varies from benchmark to benchmark, and this finding is used to make "clear that the measuring device (benchmark) used to determine the capability of a system does make a difference."

Examples from the tabular data are then used to illustrate the critical necessity of selecting benchmark problems to properly reflect the total workload which is to be processed. (JLW)

Category: 9

Key words: Application benchmarks; workload representation.

45. Karush, Arnold D., Benchmark Analysis of Time-Sharing Systems, System Development Corp., Santa Monica, Calif., Rept. No. SDC SP-3347, June 1969, 40 pp. (AD-689 781)

The paper discusses the use of benchmarks to measure and understand the behavior of a general purpose time-shared system. This application of benchmarks is unusual in that: (1) there is no published literature on the application of benchmarks to time-shared systems; and (2) the benchmarks measure system functions rather than job tasks. After discussing the benchmark design concept, the paper describes the benchmark programs that were produced to measure System Development Corporation's ADEPT time-sharing system. Three types of calibrations were performed on ADEPT, together with numerous experiments. These are described together with some of the results. The paper concludes with a discussion of the problems and potential uses of the benchmark technique. (JLW)

Category: 3.2

Key words: ADEPT-50; benchmarks; computer performance measurement; computer systems: experimentation; multiprogrammed computer systems.

46. Karush, Arnold D., Evaluating Time-sharing Systems Using the Benchmark Method, Data Processing Magazine, 12:5 (May 1970) pp. 42-44, 2 refs. (6430104)

A summary article which discusses the benchmark programs which were used to measure the behavior of System Development Corporation's ADEPT-50 time-sharing system.

The techniques used focused on measuring the effects of functions basic to system operation, rather than on measuring system performance of predefined tasks. The functional variables included are: swap activity; compute activity; interactive activity; I/O activity; page activity; and resource allocation. Throughput (amount of work processed by the computer) and response time (delay between a user's request and system reply) served as metrics of behavior for these functional variables. Each of the seven benchmark programs developed provided one or more of the stimuli; thus all programs affected all of the functional variables, both individually and when run in reasonable combinations.

System performance was measured by the benchmarks in three different environments: (1) stand-alone; (2) benchmark; and (3) real-world. In the stand-alone environment only one program was run at a given time, thus testing system throughput and response time in a batch mode. The benchmark environment provided a measure of a "typical" set of demands upon the system. This was achieved by considering each benchmark program as a user and by running all seven programs simultaneously. Measures obtained from this experiment roughly paralleled those from the real-world environment. This environment simulated the behavior of a time-sharing system operating near its rated capacity. By considering one of the benchmark programs as a user with a constant and known demand for service, and running the program when the system has an almost full complement of real users, various metrics can be developed. Some of these include a measure for degradation in response time and throughput under varying user loads; load variances; and what scheduling algorithms actually schedule. The technique may also be used to tune a system to the needs of its user population. (JLW)
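A minimal sketch of the degradation measure described above follows; the probe benchmark, the user loads, and all timings are invented assumptions, offered only to make the calculation concrete.

```python
# Hypothetical sketch of load-degradation measurement: a probe benchmark with
# a constant, known demand is timed under increasing user loads, and its
# response time and throughput are compared with a stand-alone baseline.
# All figures are invented.

baseline = {"response_time_s": 0.8, "throughput_jobs_per_min": 12.0}  # stand-alone run

loaded_runs = [  # (concurrent users, response time in s, throughput in jobs/min)
    (5, 1.1, 10.5),
    (15, 2.4, 7.2),
    (25, 4.9, 3.9),
]

for users, resp, thru in loaded_runs:
    resp_degradation = resp / baseline["response_time_s"]
    thru_degradation = baseline["throughput_jobs_per_min"] / thru
    print(f"{users:>2} users: response time x{resp_degradation:.1f}, "
          f"throughput degraded x{thru_degradation:.1f}")
```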

Category: 3.2

Key words: ADEPT-50; benchmarks; compute activity; computer performance measurement; computer systems; interactive activity; I/O activity; multiprogrammed computer systems; page activity; resource allocation; response time; swap activity; throughput.

47. Karush, Arnold D., Two Approaches for Measuring the Performance of Time-Sharing Systems, Software Age, 4:3 (March 1970) pp. 10-13, 18 refs; Part Two. Stimulus Approach to Time-Sharing Measurement, ibid., 4:4 (April 1970) pp. 26, 27, 40, 8 refs; Conclusion. A Comparison of Analytic and Stimulus Approach to Time-Sharing System Measurement, ibid., 4:5 (May 1970) pp. 13, 14, 3 refs. (6430103)

The two approaches for performance measurement which the author describes are: (1) the "stimulus" approach, in which the system is considered as a black box to which a controlled set of stimuli is applied in order to activate the system's functions, and the results are then observed; and (2) the "analytic" approach, in which probes are inserted into the black box in order to record any level of the system's behavior. Both approaches have been used to measure SDC's ADEPT time-sharing system. The author places benchmarks into the first category, which is less costly to develop and generally requires less sophistication in the implementor. "The programming of benchmark programs is also less costly than the programming of instrumentation, measurement and recording routines. . . . Personnel with little experience can produce the benchmark programs. Testing can be done under time-sharing. Errors affect no one else."

In benchmarking for the ADEPT system, six functional variables were selected: compute, interactive, high speed I/O, swapping, paging, and resource allocation. Seven benchmark programs, each incorporating stimuli of selected functional variables, were written and run simultaneously in a time-sharing mode, thus simulating a "typical user population." The technique was also used to measure the effects of variable sizes of quantum-time upon different user demands as represented by the benchmarks.

In discussing areas for further development, the author suggests the following relevant to benchmarks:

Stimulus Approach:

"(1) The conditions under which this ap

proach is cost-effective must be defined. Although the analytic approach provides much more information, benchmark programs can still fill an important role due to their lower cost and the immediate utility of the information.

(2) Standardized measures for describing and ranking the performance of time-sharing systems should be developed. If these measures could be expressed in terms of throughput and response time, perhaps standardized benchmark programs could be specified for inter-system comparison.

(3)

(4) The design of the benchmark programs should be refined so that a minimum system load and terminal time need be required to extract a maximum amount of information." (JLW)

Categories: 3.0; 1.2

Key words: ADEPT-50; benchmarks; compute activity; computer performance measurement; computer systems: experimentation; interactive activity; I/O activity; multiprogrammed computer systems; page activity; resource allocation; response time; swap activity; throughput; user simulation.

48. Kerner, H. and K. Kuemmerle, A Workload Model and Measures for Computer Performance Evaluation, George C. Marshall Space Flight Center, Alabama, NASA TN D-6873, October 1972, 26 pp., 8 refs. (6430151)

A generalized workload definition is presented which constructs measurable workloads of unit size from workload elements, called elementary processes. An elementary process makes almost exclusive use of one of the processors, CPU, I/O controller, etc., and is measured by the cost of its execution. Various kinds of user programs can be simulated by quantitative composition of elementary processes into a type. The character of the type is defined by the weights of its elementary processes, and its structure by the number and sequence of transitions between its elementary processes. A set of types is batched to a mix. Mixes of identical cost are considered as equivalent amounts of workload. These formalized descriptions of workloads allow investigators to compare the results of different studies quantitatively. Since workloads of different composition are assigned a unit of cost, these descriptions enable determination of cost effectiveness of different workloads on a machine. Subsequently, performance parameters such as throughput rate, gain factor, and internal and external delay factors are defined and used to demonstrate the effects of various workload attributes on the performance of a selected large scale computer system. (IBM)
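The following minimal data-structure sketch, using invented names and cost figures rather than anything from the report, illustrates how elementary processes, types, and mixes of identical cost might be represented:

```python
# Sketch of the workload model summarized above: elementary processes (each
# tied to one processor and measured by execution cost) are composed, with
# weights, into "types"; types are batched into a "mix", and mixes of equal
# total cost are treated as equivalent amounts of workload.
# Names and costs are illustrative, not from the report.

elementary_processes = {           # cost per execution, in arbitrary cost units
    "cpu_burst": 1.0,
    "io_transfer": 0.4,
}

types = {                          # weights of elementary processes per type
    "compute_bound": {"cpu_burst": 8, "io_transfer": 1},
    "io_bound":      {"cpu_burst": 2, "io_transfer": 10},
}

def type_cost(name):
    return sum(count * elementary_processes[ep] for ep, count in types[name].items())

def mix_cost(mix):
    """A mix is a batch of types; mixes of identical cost are equivalent workloads."""
    return sum(type_cost(t) for t in mix)

mix_a = ["compute_bound", "compute_bound", "io_bound"]
mix_b = ["io_bound", "io_bound", "io_bound", "compute_bound"]
print(round(mix_cost(mix_a), 2), round(mix_cost(mix_b), 2))
```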

Category: 2.2

Key words: Cost effectiveness; computer performance measurement; workload definition; workload specification.

49. Kernighan, B. W. and P. A. Hamilton, Synthetically Generated Performance Test Loads, in Association for Computing Machinery, Proceedings of the SIGME Symposium, February 1973, pp. 121-126, 6 refs. (6430148)

The paper describes the design of and experience with an automated benchmark-generation facility that involves two components. The first component is a simple, highly parameterized synthetic job or "program which uses precisely specified amounts of computing resources, but which does no 'useful' work." Any set of resource utilization parameters for the synthetic jobs specifies an executable program of known characteristics which can be used to model the behavior of another program. The second component of the facility is a generator program that converts a job stream specification into a complete, ready-to-run set of synthetic jobs that can then be used to exercise the system in a controlled and reproducible manner.

Many advantages are offered by this approach: (1) there is no need to find real jobs with the right properties; (2) creation of large test data bases is not necessary; (3) tests can be easily scaled up to test large systems; (4) incremental changes are easily made, as necessary; (5) transferability is simple, since only two programs are involved; (6) test generation and execution is entirely self-contained; and (7) all the benefits accruing from an automated procedure, not the least of which is the relative freedom from human error.

The environment for the experiments reported is a dual-processor Honeywell 6070 GECOS III with 256K of 36-bit words of storage, 15 million words of fixed-head storage, 75 million words of moving-head disk storage, and two DN-355 communications processors interfacing with "about half a dozen remote computers." The batch computing aspect, however, is most important in resource utilization; therefore a batch environment is implicit in the paper. The approach described is, however, also valid for time-sharing and for mixes of time-sharing and batch: the system runs a large time-sharing system as a "permanent" batch job.

The simplest model of a synthetic job for such an environment has only 3 resource utilization parameters: core storage; CPU time; and I/O time. The authors have expanded on these basic parameters with sufficient detail to produce synthetic job streams that drive their system "essentially identically to specified real streams (agreement on most parameters within 10%)."
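As an illustration of the basic three-parameter model only (the refinements discussed next are omitted, and the parameter names and calibration loop are assumptions, not the authors' generator), a synthetic job might be sketched as:

```python
# Sketch of a parameterized synthetic job: it holds a block of memory, burns
# roughly a specified amount of CPU time, and generates roughly a specified
# I/O volume, so a job stream can be described purely by resource parameters.
import tempfile
import time

def synthetic_job(core_bytes: int, cpu_seconds: float, io_bytes: int) -> None:
    buffer = bytearray(core_bytes)          # hold the requested amount of memory

    deadline = time.process_time() + cpu_seconds
    x = 0
    while time.process_time() < deadline:   # consume roughly the requested CPU time
        x = (x * 31 + 7) % 1_000_003

    with tempfile.TemporaryFile() as f:     # generate roughly the requested I/O volume
        chunk = b"\0" * 65_536
        written = 0
        while written < io_bytes:
            f.write(chunk)
            written += len(chunk)

# A "job stream" is then just a list of parameter sets to run in order.
job_stream = [
    {"core_bytes": 4 << 20, "cpu_seconds": 0.2, "io_bytes": 2 << 20},
    {"core_bytes": 1 << 20, "cpu_seconds": 0.5, "io_bytes": 8 << 20},
]
for params in job_stream:
    synthetic_job(**params)
```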

These refinements, as well as others that could be added, involve a trivial increment to the basic structure of the synthetic programs and require only the addition of extra control cards. Tabular data are presented on four experiments conducted: (1) matching a standard benchmark against a synthetic batch stream; (2) matching compiler job steps with the synthetic job steps; (3) comparison of two synthetic streams in a full-load test of the overall system; and (4) simulation of a "normal" user load.

In the experimentation there was much reliance on the detailed accounting information kept by the system for each step and on the metering information kept by the operating system.

A number of experiments are planned for the future: (1) measurement of major changes in system hardware; (2) fine tuning of system and measurement of effects of such changes as variations in scheduling and dispatching modules, memory management, location of system files, etc.; and (3) error-checking and catching of performance bugs. A profitable by-product of the experiments run to date was the detection of several system bugs and solution of a hardware problem as a result of running the benchmark job stream.

In conclusion, the authors state that:

(1) synthetic jobs are easily constructed to match most real jobs; (2) the basic measures of memory, CPU, and I/O appear to be sufficient to represent real jobs; (3) system performance measures are relatively insensitive to the internal structure of the synthetic jobs comprising a test; (4) job streams representing special demands require relatively little refinement; (5) CPU and I/O time measurements are appropriate for matching job streams, since these are parameters of the jobs being matched; and (6) for experiments in transferability, synthetic jobs creating CPU and I/O transactions are to be preferred, since these measures will be independent of system hardware and software. (JLW)

Category: 3.2

Key words: Computer systems: experimentation; Honeywell-6070; synthetic benchmarks; synthetic job parameters; synthetic jobs; synthetic job stream; synthetic job stream generator; transferability.

50. Kolence, Kenneth W., Experiments & Measurements in Computing, in Association for Computing Machinery, Proceedings of the SIGME Symposium, February 1973, pp. 69-72, 2 refs. (6430165)

The paper is intended as an initial step toward establishing an experimental discipline within the field of software physics, which "is concerned with understanding the laws governing the behavior of software units." Software units consist of arbitrarily large (or small) groupings of code whose variable observable properties constitute their behavior, in contrast to their functional properties. In this context, several classes of computer performance measurement activity are reviewed, to provide perspective in the development of an experimental discipline for testing behavior of software.

Building of an experimental discipline requires some general agreement on how one proves or disproves a theoretical hypothesis. One is proposed from the physical sciences: Experiments performed to verify some hypothesis concerning the real-world behavior of software units can only be accepted as valid if direct measurement is used to obtain all data.

From this it is obvious that an experimental discipline must be based on measurement and not on other techniques such as simulation. Simulation offers prediction which must then be verified or countered by measurement.

The second step in establishing an experimental discipline is also suggested by the physical sciences: that the results of important experiments be readily published. The author proposes that SIGME encourage the publication of measurement experiments and data in the computer field, for general constructive criticism and comparison against theoretical predictions. In the absence of theory applicable to problems of interest, the author suggests encouraging the membership to propose procedures for developing empirical curves of interest. (JLW)

Category: 1.3

Key words: Measurement experiments; program behavior; software performance measurement; software physics; software testing; software units.

51. Kolence, Kenneth W., The Software Empiricist, Performance Evaluation Review, 2:2 (June 1973) pp. 31-36. (6430193)

The article both announces and constitutes the first installment of "The Software Empiricist," a new feature of Performance Evaluation Review. Software engineering has as its goal "the design of systems to perform as we wish them to." This requires an understanding of the meanings of the variables that are measured, or "the rationale of what has been called software physics."

The new feature will be devoted to the empirical development of such an understanding by providing a publication vehicle for empirical and experimental data and thus developing a solid body of knowledge.

Correspondence is invited initially on "characterization of what types of empirical data are of value, what is meant by the term experimental data, and what constitutes an experiment in software physics." Empiricism, "the search for quantitative expressions of the behavior of some object of study prior to the existence of a formal predictive theory," in reference to software cannot be a "random search for any quantitative expression of some system's behavior. We have that situation today. Software empiricism is the search for invariant relationships between the variables of computer measurement."

...

The author characterizes the people involved in performance improvement as "all empiricists in one way or another, [and] software empiricism can and does proceed without formal theory to guide it." This must necessarily be so until good evidence is obtained of the existence of invariant relationships between measurable variables.

The author invites submission for publication of evidence of such invariance, and conversely, evidence of relationships clearly not invariant. Comment is also invited on the form of data presentation for the new Section. The author takes the initiative by suggesting circular graphs, recommending that they be called "Kiviat plots" or "Kiviat graphs," and inviting experimentation with two samples appended to the article. One graph form is for the presentation of job step time usage, the other for workload characterization.
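For readers unfamiliar with the circular-graph form the author recommends, the following sketch shows how a Kiviat-style plot for workload characterization might be drawn; it assumes matplotlib is available, and the metric names and values are invented, not taken from the article's samples.

```python
# Kiviat-style (circular) graph sketch for workload characterization,
# assuming matplotlib; metric names and values are illustrative only.
import math
import matplotlib.pyplot as plt

metrics = {"CPU busy": 0.72, "Channel busy": 0.40, "CPU-only": 0.35,
           "Channel-only": 0.10, "Overlap": 0.30, "Idle": 0.18}

labels = list(metrics)
values = list(metrics.values())
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]

# close the polygon so the outline joins back to the first axis
angles += angles[:1]
values += values[:1]

ax = plt.subplot(111, polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.2)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 1)
plt.title("Kiviat-style workload profile (illustrative data)")
plt.show()
```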

Admitting that chances of any response are quite low, the author ends with a promise to "dig up examples" for the next few issues of the Review. (JLW)

Categories: 11; 1.3

Key words: Measurement experiments; software engineering; Software Empiricist; software performance measurement; software physics.

52. Kolence, Kenneth W., A Software View of Measurement Tools, Datamation, 17:1 (January 1, 1971) pp. 32-38. (6430176)

In discussing the characteristics and role of software performance measurement tools, the author states that a usable set of measurements requires a combination of descriptive and quantitative variables which must be extracted from memory without significantly altering the run characteristics of the system. Three parameters are involved: CPU cycle requirements, I/O activity, and core usage, all of which must be measured for each program module or segment. Sampling techniques can be used to obtain data with an acceptably low CPU overhead rate. Separation of the function of data execution
