preliminary set of application tasks and task parameters for benchmark purposes.

2. Department of Agriculture

Has constructed a comprehensive set of benchmark programs which include such functions as transaction processing and data base management. This package should be studied in any effort geared toward designing a library of standard benchmark programs.

3. Department of Labor

Has under development a job simulation model with actual-use statistics as control parameters. Standard benchmarks are not the goal of this effort, but there is potential spinoff.

4. U.S. Marine Corps

Is involved in a project similar to that of the DoL, except that hardware monitors are used to provide data for creation of synthetic jobs.

5. ADPESO

The Navy's ADP Equipment Selection Office has several activities under way. These include: (1) the development of a small (5-7 program) library of synthetic programs in joint support of the DoD Steering Committee and their own in-house effort. The following assumptions underlie this effort: (a) relatively few parameters control the behavior of synthetic programs; (b) behavior of the programs relative to changes in parameters is predictable; (c) a workload can be specified based on parameters implicitly defined by the synthetic programs; and (d) these parameters can be set so as to reflect the workload (a sketch of such a parameter-driven program appears at the end of this entry). Such a set of programs could be used to enhance existing natural benchmarks, and for relatively simple systems such a mix could be the entire benchmark. The programs are currently in the testing phase of development.

(2) "Sanitization" of natural benchmarks through: (a) use of correct programs which run on at least their native machines; (b) standard code or identification of non-standard in cases where standard is impossible; (c) translation of routines for conversion of machineindependent modules to machine-dependent form, as needed. Some indication of the merit of this activity is expected by the end of calendar 1973.

(3) Investigation of machine-independent basic procedures for benchmarks: duration; file volumes; code standardization; and allowable configurations.

The library of synthetic programs consists of the following: (1) sequential file processing; (2) indexed sequential file processing; (3) relative I/O processing; (4) COBOL sort; and (5) computation, a program to exercise arithmetic processing capabilities. The Army's Selection Office is writing Program Edit and Report Extract modules to add to the library. As each program is completed, it will be exercised in order to determine its execution behavior under varying parameter settings. This phase is also scheduled for completion in calendar 1973.

A final effort is planned to relate program parameters to installation workload parameters and to institute the "acceptance" phase of the effort. By the second quarter of 1974, some indication should be apparent as to the usefulness of this synthetic programs approach. (JLW)
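As an illustration of the parameter-driven behavior assumed in (a)-(d) above, the following minimal sketch shows one way such a synthetic program might be organized. The ADPESO programs themselves are COBOL routines whose code and parameter names are not given here; the Python structure, parameter names, and workload kernel below are assumptions made purely for illustration.

```python
# Minimal sketch of a parameter-driven synthetic program, in the spirit of the
# ADPESO library described above.  The parameter names (cpu_loops, records,
# record_length, passes) are illustrative assumptions; the actual programs and
# their parameters are not specified in this entry.
import os
import tempfile

def synthetic_job(cpu_loops: int, records: int, record_length: int, passes: int) -> None:
    """Exercise the CPU and sequential file I/O in proportions set by the caller."""
    path = os.path.join(tempfile.gettempdir(), "synthetic_job.dat")
    payload = b"X" * record_length

    for _ in range(passes):
        # Compute phase: a fixed arithmetic kernel repeated cpu_loops times.
        acc = 0
        for i in range(cpu_loops):
            acc = (acc * 31 + i) % 1_000_003

        # Sequential file-processing phase: write, then re-read, `records` records.
        with open(path, "wb") as f:
            for _ in range(records):
                f.write(payload)
        with open(path, "rb") as f:
            while f.read(record_length):
                pass

    os.remove(path)

# Changing the parameters shifts the job between CPU-bound and I/O-bound behavior,
# which is the sense in which "relatively few parameters control the behavior."
synthetic_job(cpu_loops=200_000, records=5_000, record_length=80, passes=3)
```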

Category: 3.0

Key words: Department of Defense; Department of the Navy; Standard Benchmark Study; standard benchmarks; survey; synthetic benchmark program library; synthetic benchmarks; synthetic program modules; transferable benchmarks.

63. Parupudi, Murty and Joseph Winograd, Interactive Task Behavior in a Time-Sharing Environment, in Association for Computing Machinery, Proceedings of the National Conference, 1972, pp. 680-692, 12 refs. (6430182)

Continuous Software Monitoring System (COSMOS) is a measurement tool developed for observing and analyzing user behavior and operating system performance under the UNIVAC Series 70 Virtual Memory Operating System (VMOS), an interrupt-driven, time-shared, demand paging operating system.

This paper reports empirical data obtained by using COSMOS to observe a large number of interactions in which a wide class of programs were being executed interactively. Distributions of think time, compute time, page fault behavior, and I/O frequency are presented. (Author)

Category: 1.2

Key words: Compute time; computer performance measurement; demand paging; interactive activity; I/O activity; operating system measurement; operating systems; page faults; software monitors; software performance measurement; think time; Virtual Memory Operating System (VMOS); virtual memory systems.

64. Pearlman, Jack M., Richard Snyder and Richard Caplan, A Communications Environment Emulator, in AFIPS, Proceedings of the Spring Joint Computer Conference, 1969, pp. 505-512.

The Honeywell Communications Environment Emulator (HCEE) is a communications network simulator whose prime purpose is to aid in the checkout and debugging of communication software. It will simulate up to 63 lines with up to 8 terminals per line, and can generate at least 70,000 messages of 700 characters each per hour. During execution HCEE generates, transmits, and logs queries; receives and analyzes error codes; and logs system responses. The characteristics of the system under test, the terminals, the users, and the reporting are all parameterized and under the control of the operator. It is interesting to note that the query generated by HCEE is chosen from a query vocabulary list which is part of its data base. The actual query itself is a random combination of words from that list. (MDA)
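The query-generation scheme described above (a random combination of words drawn from a vocabulary list held in the emulator's data base) can be sketched in a few lines. The vocabulary, query lengths, and output format below are invented for illustration; they are not specified in this entry.

```python
# Rough sketch of the query-generation idea attributed to HCEE: each query is a
# random combination of words drawn from a vocabulary list.  The word list and
# query-length bounds are hypothetical.
import random

VOCABULARY = ["INVENTORY", "STATUS", "PART", "LOCATION", "QUANTITY", "SHIP", "ORDER"]

def generate_query(min_words: int = 2, max_words: int = 5) -> str:
    """Build one query as a random combination of vocabulary words."""
    n = random.randint(min_words, max_words)
    return " ".join(random.choices(VOCABULARY, k=n))

# The emulator would generate, transmit, and log such queries for each simulated
# terminal, then analyze the error codes and responses from the system under test.
for _ in range(3):
    print(generate_query())
```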

Category: 3.5

Key words: Communications network simulators; Honeywell Communications Environment Emulator; measurement driver; query generator; simulators; software performance analysis; user characteristics; user simulation; workload generator.

65. Robinson, Louis, Computer Systems Performance Evaluation (and Bibliography), IBM Corp., November 1972, 35 pp. (6430157)

An overview of the state of the art, techniques and tools in use for measuring and evaluating computer systems performance. The bibliography consists of 365 citations, each annotated with keywords. (JLW)

Category: 1.5

Key words: Bibliography; computer performance evaluation.

66. Ruth, Stephen R., Using Business Concepts To Evaluate Large Multi-Level Business Systems - Some Applied Techniques, in Association for Computing Machinery, Proceedings of the SIGME Symposium, February 1973, pp. 73-77. (6430147)

The author proposes dollar costs as a basic element of any device involving system measurement and evaluation. Three examples are provided of the application of marginal analysis for the purpose of selection of the best mix of measurement and evaluation techniques. The examples present "before and after" data for cost tradeoffs in: (1) evaluation of a vendor-supplied linkage-checking routine; (2) evaluation of the use of system resources by a program involving data manipulation primarily; and (3) evaluation of compiler efficiency.

Category: 1.3

Key words: Compiler efficiency; compiler evaluation; compiler optimization; program analysis; software evaluation.

67. Scherr, Allan L., Time Sharing Measurement, Datamation, 12:4 (April 1966) pp. 22-26. (6430184)

The article describes measurements made of the performance of the MAC system during the 3-month period from December 1964 through February 1965. The system at that time consisted of an IBM-7094 (I) with two 32K memories, IBM-1301-2 discs, an IBM-7320A drum and an IBM-7750 connected to model 35 teletypes and to IBM-1050 terminals. The community of users consisted of nearly 300 people who were characterized by the computational load they placed on the system.

Users, or the programs serving them, were considered to be in one of six states, each of which is defined. These six states are: dead, command wait, working, input wait, output wait, and dormant.

The basic unit of work in a time-sharing system is considered to be the interaction, or "the following sequence of events: user thinks, types input, waits for a response from the system, and finally watches the response being printed." Thus the user may be in either of two states: working, while he is waiting for the system to execute a program, or command wait, while the system is waiting for the user. An interaction, then, can be defined as the activity that occurs between two successive exits from either the working or the command wait state.

Data were gathered by a program, running as part of the scheduling algorithm, which recorded the sequence and timing of the events comprising typical interactions. Approximately 80,000 commands of five types were monitored: file manipulation, source program input and editing, program execution and debugging, compilation and assembly, and miscellaneous commands such as save and resume core images, programs to generate commands, etc. Data from the measurements are presented in graphs which show "think" time per interaction, program size distribution, processor time per interaction, and interactions per command. Other graphs show a typical response time distribution measured from a simulation of MAC under a constant load of 25 interacting users, and simulation results of response time versus processor time per interaction. These last two parameters were derived from simulation in order to eliminate from the measurements the effects of a constantly changing load. The author feels that simulation models can be easily derived from these data for accurate performance predictions. These predictions have been confirmed by comparing them with actual performance data from the MAC system.
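The interaction and state definitions above suggest how per-interaction statistics of the kind graphed in the article could be derived from a log of state transitions. The sketch below is only an illustration of that derivation; the event-record format is an assumption, and the actual MAC instrumentation ran inside the scheduling algorithm rather than over an exported log.

```python
# Sketch of deriving per-interaction think time and working time from a
# time-ordered log of (timestamp, state) records, using the state names
# defined in the abstract.  The record format is hypothetical.
from typing import List, Tuple

def interaction_times(events: List[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Return (think_time, working_time) pairs, one per interaction.

    Think time is time spent in command wait; working time is time spent in the
    working state.  Leaving the working state ends one interaction.
    """
    results = []
    think = work = 0.0
    prev_t, prev_state = events[0]
    for t, state in events[1:]:
        if prev_state == "command wait":
            think += t - prev_t
        elif prev_state == "working":
            work += t - prev_t
        if prev_state == "working" and state != "working":
            results.append((think, work))
            think = work = 0.0
        prev_t, prev_state = t, state
    return results

log = [(0.0, "command wait"), (12.5, "working"), (14.1, "command wait"),
       (30.0, "working"), (30.9, "dormant")]
print(interaction_times(log))  # roughly [(12.5, 1.6), (15.9, 0.9)]
```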

Category: 1.2

Key words: Computer performance measurement; computer performance prediction; computer systems; IBM-7094; MAC system; man-computer interaction; multiprogrammed computer systems; system monitoring; think time; user characteristics; user simulation; work unit.

68. Schwemm, Richard E., Experience Gained in the Development and Use of TSS, in AFIPS, Proceedings of the Spring Joint Computer Conference, 1972, pp. 559-569, 10 refs. (6430207)

The author classifies the experience gained under four major areas: system structure; system performance analysis; software development tools; and management of software development. The second of these areas is of concern to benchmarking.

In the course of the development of TSS/360, a comprehensive scheme evolved for dealing with system performance. Three components comprised the scheme: establishment of performance objectives; creation of external (to the computer) performance measurement tools; and creation of internal recording tools and data reduction facilities. Since TSS/360 is designed for three modes of operation (batch, conversational, and a mixture of the two), performance was defined for each mode of use. However, only conversational performance is discussed in the paper.

Conversational performance is defined as the maximum number of tasks which the system will support with acceptable response time. Specifically, a benchmark terminal session was defined by dividing the interactions into three classes: trivial, non-trivial, and data-dependent. An acceptable response time was then defined for each of these classes.

Since the above benchmark terminal session was not typical of any user's conversational workload, most users specified their own benchmarks. However, the consensus is that the above definition of performance is adequate. The initial performance objective was to support 40 tasks with one CPU with 512K memory, 1 drum and 1 disk channel.

In order to measure a load imposed by live users on TSS/360, a measurement driver was created to simulate the live environment under controlled and reproducible conditions. A schematic diagram of the TSS/360 measurement driver is presented. To measure conversational performance, a series of driver runs is executed, with varying numbers of tasks for each run. A curve is then drawn representing response time as a function of the number of tasks. The paper includes such a curve for TSS/360 release 6.0 for 2 different system configurations.
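The measurement procedure just described (a series of driver runs with varying numbers of simulated tasks, from which a response-time curve is drawn) reduces to a simple search for the largest task count that still meets the response-time objective. The sketch below assumes a run_driver() callable standing in for the external measurement driver; in TSS/360 the driver was a separate program running on an IBM-360/40.

```python
# Simplified sketch of determining conversational performance: run the driver with
# varying numbers of simulated tasks, record the response time for each run, and
# report the largest task count that still meets the objective.  run_driver() is a
# hypothetical stand-in for the external measurement driver.

def conversational_capacity(task_counts, run_driver, acceptable_response: float) -> int:
    """Return the maximum number of tasks whose measured response time is acceptable."""
    capacity = 0
    for n in sorted(task_counts):
        response = run_driver(n)   # mean response time (seconds) with n simulated tasks
        if response <= acceptable_response:
            capacity = n
    return capacity

# Hypothetical measured curve: response time grows with the number of tasks.
measured = {10: 0.8, 20: 1.1, 30: 1.7, 40: 2.4, 50: 4.9}
print(conversational_capacity(measured, measured.get, acceptable_response=2.5))  # -> 40
```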

The measurement driver discussed here runs on the IBM-360/40 and has been used by some installations to evaluate system performance on their own benchmarks. A second measurement driver is mentioned, this one developed by Carnegie-Mellon University for the IBM-360/67 with the version of TSS under study in this paper. The Carnegie-Mellon Simulator (SLIN) is compatible in script and timing characteristics with IBM's and produces comparable output.

The paper concludes with a discussion of debugging aids and internal recording tools such as the Systems Performance Activity Recorder (SPAR), the Systems Internal Performance Evaluation (SIPE), and the Instruction Trace Monitor (ITM). (JLW)

Category: 5.2

Key words: Benchmark terminal session; computer systems; debugging aids; hardware monitors; IBM-360/40; IBM-360/67; man-computer interaction; measurement driver; multiprogrammed computer systems; operating systems; response time; software monitors; TSS/360; user scripts; user simulation; workload generator.

69. Schwetman, H. D. and J. C. Brown, An Experimental Study of Computer System Performance, in Association for Computing Machinery, Proceedings of the National Conference, 1972, pp. 693-703, 17 refs. (6430181)

This paper describes an experimental study of the performance of a large multiprogrammed computer system (the UT-1/CDC 6600 system at the University of Texas at Austin) under systematic variation of available resources and resource allocation algorithms. The experiments were carried out in a controlled and reproducible environment provided by a synthetic job stream generator. The experimental data were recorded by an event-driven software monitor which recorded a complete trace of system activities at the level of system-defined events. The study relates resource utilization and queuing patterns to the metric of job completion rate. The experiments undertaken in these studies are single-factor experiments. Compensatory reactions by this complex system to variation of individual resources are nonetheless revealed. The experiments also demonstrate the criticality of optimal scheduling of bottleneck resources and offer comparisons of the performance of multi-drive disk units under different conditions of availability and space assignment. The data gathering facility was also run in the production environment to determine baselines for comparison to the experiments as well as for an understanding of the production mode of operation of the system. (Author)
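A single-factor experiment of the kind described, with a fixed synthetic job stream and one resource or allocation parameter varied at a time, might be organized as in the following sketch. The configuration fields and the run_job_stream() placeholder are assumptions; the study's actual experiments varied things such as the disk configuration and the scheduling of bottleneck resources on the running system.

```python
# Sketch of a single-factor experimental design: hold the synthetic job stream fixed,
# vary one factor at a time, and record the job completion rate for each setting.
# run_job_stream() is a hypothetical placeholder for the controlled, reproducible run
# driven by the synthetic job stream generator.

def single_factor_experiment(factor_levels, run_job_stream, baseline_config):
    """Return {factor level: jobs completed per hour}, all other factors held at baseline."""
    results = {}
    for level in factor_levels:
        config = dict(baseline_config, **level)   # vary exactly one factor
        results[tuple(level.items())] = run_job_stream(config)
    return results

baseline = {"disk_drives": 4, "core_kwords": 128, "scheduler": "fifo"}
levels = [{"disk_drives": d} for d in (2, 3, 4, 5)]
# results = single_factor_experiment(levels, run_job_stream, baseline)  # driver supplied by the harness
```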

Category: 1.2

Key words: CDC-6600; computer performance analysis; computer performance measurement; computer systems; data gathering; experimentation; job completion ratio; queuing patterns; resource allocation; software monitors; synthetic job stream; synthetic job stream generator; Texas University.

70. Shope, W. L., K. L. Kashmarak, J. W. Inghram and W. F. Decker, System Performance Study, in Proceedings of SHARE XXXIV, Vol. 1, 1970, pp. 568-659. (6430106)

Directors and managers of computing facilities are faced each day with questions such as: How fast is our workload growing? Can current machinery contain future growth, and for how long? What do new hardware and software developments mean to this installation?

In order to obtain answers to these questions, the University of Iowa Computer Center began in the spring of 1969 to perform benchmark analyses of both hardware and software. It was apparent at that time and is still apparent today that very little valuable information exists to help guide the installation in making an evaluation. Prior to any analysis, many important questions regarding benchmarking techniques need to be resolved. What kind of data are needed? How can the data be obtained? Should software or hardware measurements be made?

One of the prime requisites for a meaningful benchmark analysis is a job stream representative of a given environment. At UCC, the decision was made to assemble a set of jobs from the real job stream which reflected normal operations. Fifty-two jobs were selected through a sampling procedure that used the distribution of jobs run versus time of day to generate random times of day at which samples were extracted. The samples consisted of jobs in execution at these times; selection, in turn, was on the basis of first-in-first-out and on job classification (which involved amount of core storage and type of processor required). The effect of this procedure was to weight selection in favor of hours of heavy usage and to distribute jobs by classes similarly to the normal job stream. The sampling period lasted three days. For testing purposes, the jobs were organized in order of time of selection to construct a job stream that required approximately 55 minutes of execution time on the University's IBM-360/65. In retrospect it was learned that average CPU utilization was approximately 10 percent higher in the test job stream than in the actual job stream. The test job stream was adjusted somewhat after the first set of tests; a comparison of actual versus test job streams (presented in the Appendix) revealed that the latter were representative.
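The sampling procedure described above can be sketched as follows: the observed distribution of jobs run versus time of day supplies the weights for drawing random sample times, and the jobs in execution at each sampled time become candidates for selection. The hourly counts below are invented for illustration, and the selection-by-class step is only indicated in a comment.

```python
# Sketch of time-of-day-weighted sampling of the job stream.  The hourly job counts
# are hypothetical stand-ins for the installation's records.
import random

hourly_job_counts = dict(zip(range(24),
    [2, 1, 1, 1, 1, 2, 4, 8, 14, 18, 20, 19, 17, 18, 20, 19, 16, 12, 9, 7, 6, 5, 4, 3]))

def sample_times(n: int):
    """Draw n sample hours, weighted by how many jobs ran in each hour."""
    hours = list(hourly_job_counts)
    weights = [hourly_job_counts[h] for h in hours]
    return random.choices(hours, weights=weights, k=n)

# For each sampled time, the jobs in execution would be examined and selected
# first-in-first-out within job class (core size and processor type), which weights
# the benchmark toward hours of heavy usage, as described above.
print(sample_times(10))
```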

Other material in the Appendix presents data on the 16 benchmark users and details on the jobs in the test job stream. A summary sheet describes changes instituted at the University Computer Center and an estimate of the value returned as a result of the tests. (JLW)

Categories: 3.3; 10

Key words: Benchmark run analysis; computer systems; cost-value; experimentation; hardware monitors; IBM-360/65; Iowa University; job characteristics; job classification; University Computing Center; workload construction.

71. Smith, J. Meredith, A Review and Comparison of Certain Methods of Computer Performance Evaluation, The Computer Bulletin, 12:1 (May 1968) pp. 13-18, 5 refs. (6430190).

The article presents a discussion of various types and relative merits of instruction mixes and benchmarks. For an estimate of the relative power of different computers over a wide range of applications, some means must be found for combining the run results of the benchmarks and mixes. A calculation of such a measure of relative performance is suggested and illustrated. The work unit, used by the British GPO for evaluating performance, is described briefly. (JLW)
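The abstract does not reproduce the calculation the author suggests for combining benchmark and mix results into a measure of relative performance. For illustration only, the sketch below uses a geometric mean of run-time ratios, which is one common way of making such a combination; it should not be read as the author's method.

```python
# Illustrative (not the paper's) combination of benchmark and mix run results into a
# single relative-performance figure, using the geometric mean of run-time ratios.
import math

def relative_performance(reference_times, candidate_times):
    """Geometric mean of reference/candidate run-time ratios over a set of benchmarks."""
    ratios = [reference_times[name] / candidate_times[name] for name in reference_times]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

ref  = {"mix_a": 120.0, "sort": 300.0, "update": 95.0}   # seconds on the reference machine
cand = {"mix_a": 80.0,  "sort": 240.0, "update": 50.0}   # seconds on the candidate machine
print(relative_performance(ref, cand))  # > 1.0 means the candidate is faster overall
```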

Category: 3.0

Key words: Benchmark run analysis; benchmarks; instruction mixes; work unit.

72. Sreenivasan, K. and A. Kleinman, On the Construction of Representative Synthetic Workloads, The Mitre Corp., Bedford, Mass., Rept. No. MTP-143, March 1973, 31 pp., 9 refs. (6430170)

The evaluation of computer systems is usually conducted for the purposes of improving the present performance, predicting the effects of changes in either the existing system or the workload, or comparing different systems. The evaluation may use analytical modeling, simulation, or experiments with the existing system. In all these cases there is a need for a drive workload that imitates the real workload with reasonable fidelity, but in an abbreviated form. This paper describes a method of constructing the drive workload using synthetic programs. The real workload is characterized by the magnitude of the demands placed on the various system resources; for example, the CPU time, number of I/O activities initiated, core used, and the usage of unit record devices. These are obtained from the system accounting data. The representative drive workload is constructed by matching the joint frequency distribution of the selected characteristics. The drive workload is realized by using a synthetic program that contains many parameters. By adjusting these parameters, any desired combination of workload characteristics can be obtained. Using this procedure a synthetic workload with 78 jobs is constructed to represent a month's workload (for an IBM-370/155) consisting of about 7000 jobs. (Author)
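The construction described in the abstract, matching the joint frequency distribution of selected workload characteristics, can be sketched roughly as follows. The bin edges and accounting-record field names are assumptions; only the idea of allocating synthetic jobs to distribution cells in proportion to the real workload is taken from the abstract.

```python
# Sketch of building a representative synthetic workload: characterize each accounted
# job by a few resource demands, bin the jobs into cells of the joint frequency
# distribution, and allocate synthetic jobs to cells in proportion to cell frequency.
from collections import Counter

def cell(job, cpu_edges=(1, 10, 60), io_edges=(100, 1000), core_edges=(64, 128)):
    """Map a job's (cpu_seconds, io_count, core_kbytes) onto a distribution cell."""
    def bucket(x, edges):
        return sum(x > e for e in edges)
    return (bucket(job["cpu_seconds"], cpu_edges),
            bucket(job["io_count"], io_edges),
            bucket(job["core_kbytes"], core_edges))

def representative_workload(accounting_jobs, target_size=78):
    """Allocate synthetic-job counts to cells in proportion to the real workload."""
    freq = Counter(cell(j) for j in accounting_jobs)
    total = sum(freq.values())
    # Each selected cell becomes one parameter setting of the synthetic program,
    # repeated in proportion to that cell's share of the month's jobs.
    return {c: max(1, round(target_size * n / total)) for c, n in freq.items()}
```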

Categories: 2.2; 3.2

Key words: Accounting data; computer systems; IBM-370/155; synthetic workload construction; workload characteristics; workload representation.

73. Sreenivasan, K. and A. J. Kleinman, On the Construction of a Representative Synthetic Workload, Comm. ACM, 17:3 (March 1974) pp. 127-133, 13 refs. (6430278)

A revised and later (July 1973) version of Mitre Rept. No. MTP-143. The method described was applied to the construction of a synthetic workload of 88 jobs, representing a month's workload consisting of about 6000 jobs. (JLW)

Categories: 2.2; 3.2

Key words: Accounting data; computer systems; IBM-370/155; synthetic workload construction; workload characteristics; workload representation.

74. Stanley, W. I., Measurement of System Operational Statistics, IBM Systems Journal, 8:4 (1969) pp. 299-308, 6 refs. (6430100)

The paper describes design factors and data gathered by JAS, the job accounting system developed for continuously and automatically monitoring a real-time operating system. Use of JAS provides a variety of statistics at optional levels of detail, at little cost in computing time or time lost in collecting unwanted information. All monitored information is stored in a data base for post-processing, which is flexible so that reports tailored to particular needs may be produced.

Operational statistics gathered by JAS are used for performance evaluation in two ways: to optimize the operating system performance and to characterize workloads in simulating system operation. Experience with the real-time environment indicates that few statistics are needed to characterize the workload, given suitably parameterized models calibrated with detailed, measured performance data. These data include types of jobs and job steps (assembly, linkage editing, etc.) and some information about the equipment (main storage capacity, CPU speed, etc.). Use of operational statistics in preference to hypothesizing workloads was found to enhance the accuracy of system simulation. (JLW)

Categories: 1.2; 2.2

Key words: Accounting data; job accounting system (JAS); simulation; statistical analysis; statistical models; system monitoring; workload characteristics.

75. Statland, N., R. Proctor, J. Zelnick, R. Getis and J. Anderson, An Approach to Computer Installation Performance Effectiveness Evaluation, Auerbach Corp., Philadelphia, Pa., Rept. No. 1243-TR-2, June 1965, 164 pp. (ESD-TR-65276; AD-617 613)

The process provides objective measures of performance efficiency based on both quantitative and qualitative data, and provides standards for measuring installation effectiveness. Specifications and characteristics are collected via questionnaires, once and only once, in four categories: computer hardware, extended machine (hardware/software interaction), software evaluation, and problem specification. An extension of this measurement of computer system performance provides a rating for the performance of a given software package on a given piece of hardware by comparing the time derived from the hand-tailored coding to the timing resulting from the object program produced by the software. This ratio measures the efficiency of the software on the specific hardware configuration. The aggregate ratios for all the individual performance criteria are used to derive a performance standard for a software system. Algorithms are used to summarize the raw data elements, and a computer program will select data elements, make simple arithmetic combinations of these elements into composites, and prepare the data for entry into a statistical analysis. Stepwise multiple regression analysis is utilized to determine the relative significance of various data elements and to calculate their relative weights. (Author)
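The efficiency ratio described above (hand-tailored timing compared with the timing of the software-produced object program) and its aggregation over performance criteria can be sketched as follows. The simple mean used to aggregate the ratios is an assumption; the report derives its performance standard from the aggregate ratios together with a stepwise multiple regression analysis not shown here.

```python
# Sketch of the software-efficiency rating: compare hand-tailored timing with the
# timing of the compiler-produced object program, then aggregate the per-criterion
# ratios.  The simple mean used for aggregation is an assumption for illustration.

def efficiency_ratio(hand_coded_time: float, compiled_time: float) -> float:
    """Ratio of hand-tailored timing to software-produced timing (1.0 = as good as hand code)."""
    return hand_coded_time / compiled_time

def performance_standard(timings):
    """Aggregate per-criterion ratios into a single figure for the software/hardware pair."""
    ratios = [efficiency_ratio(hand, compiled) for hand, compiled in timings]
    return sum(ratios) / len(ratios)

# (hand-tailored seconds, compiler-produced seconds) for three performance criteria
criteria = [(1.8, 2.4), (0.9, 1.5), (3.2, 3.5)]
print(performance_standard(criteria))  # < 1.0: object code slower than hand-tailored code
```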

Category: 1.3

Key words: Computer performance evaluation; computer performance measurement; program timing; questionnaires; software performance measurement; statistical analysis.

76. Steffes, Sylvester P., How the Air Force Selects Computers, Business Automation, 14:8 (August 1967) pp. 30-35. (6430161)

The article reports on an interview with Col. Steffes in which he describes the system analysis that preceded the formulation of the bid package prepared by the Electronic Data Processing Equipment Office, the Air Force's centralized agency for competitive evaluation and selection of commercial computer systems. The RFP was for the Air Force's Phase II Base Level Data Automation Standardization program, and involved the requisition of about 135 computer
