
The problem is that we usually show our management how different curves D1 and D2 are in Figure 3, when in fact the real difference between D1 and D2 is shown in Figure 8. If we were to base a decision to upgrade on Figure 3, we could make a serious mistake. Instead we should consider both the mean response time and its variance. Curve D1 in Figure 8 is now a very fat curve, or interval. This interval comes about because we want to be sure that the mean at Y terminals is in fact within the interval a certain percent of the time. For example, if we want to guarantee 95% of the time that the mean response time at 40 terminals is a certain amount, we could create an interval in which this mean would lie 95% of the time. The more strongly we want to guarantee a mean response time, the wider the interval. Therefore, in order to guarantee a difference between the curves in Figure 8, we must use both the mean and the variance at each possible point on the X-axis. We are now faced with comparing intervals, and the problem of determining when to upgrade and what to upgrade becomes more complex. We could compare alternative curves via regression analysis, and given the cost of various alternatives we could create econometric models to determine the most cost-effective upgrade. In addition, if a 95% guarantee is not needed, one might be willing to risk an 80% or 51% guarantee, in which case risk analysis or statistical decision theory becomes useful. Keep in mind that we have defined response time in the simplest way possible, and it really does not represent the real world, because a single synthetic job usually cannot recreate the load. What then... multiple synthetic jobs?
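As a concrete illustration of the interval idea, the following is a minimal sketch (in Python, with invented response-time figures and a normal-approximation critical value of 1.96, none of which come from the study itself) that computes an approximate 95% confidence interval for mean response time at a fixed number of terminals and then checks whether the intervals for two configurations, D1 and D2, overlap.

    import math
    import statistics

    def mean_response_interval(samples, z=1.96):
        """Return (mean, half-width) of an approximate 95% confidence
        interval for mean response time, using a normal approximation.
        `samples` are repeated response-time measurements (seconds) of
        one synthetic job at a fixed number of terminals."""
        m = statistics.mean(samples)
        s = statistics.stdev(samples)           # sample standard deviation
        half = z * s / math.sqrt(len(samples))  # half-width of the interval
        return m, half

    # Invented response times (seconds) at 40 terminals for two configurations.
    d1_samples = [3.1, 3.4, 2.9, 3.6, 3.2, 3.5, 3.0, 3.3]
    d2_samples = [2.4, 2.6, 2.2, 2.8, 2.5, 2.3, 2.7, 2.6]

    m1, h1 = mean_response_interval(d1_samples)
    m2, h2 = mean_response_interval(d2_samples)
    print(f"D1: {m1:.2f} +/- {h1:.2f} s   D2: {m2:.2f} +/- {h2:.2f} s")

    # The curves differ meaningfully at this point on the X-axis only if
    # the intervals do not overlap; otherwise the apparent difference in
    # the means may be an artifact of the variance.
    if m1 - h1 > m2 + h2 or m2 - h2 > m1 + h1:
        print("Intervals do not overlap: the difference holds at roughly 95%.")
    else:
        print("Intervals overlap: the difference may not be meaningful.")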

If we have more than one synthetic job, how is response time defined? If we are to recreate the real load on the system, then these synthetic jobs must have different response times. For example, one program may accept data, compute, and then respond, whereas another may accept a command and immediately respond. This source of variability between different jobs is usually so large that a single response time has no real meaning. One could create a graph like Figure 8 for each of the synthetic jobs. Now the response at twenty terminals would mean that when twenty terminals were connected to the system running the mix of synthetic jobs, we obtained a mean response time of X for synthetic job number one with a variance of Y. We are now faced with the problem of determining, not with one synthetic job but with many, when to upgrade and what to upgrade. One might find for synthetic job one that a new disk is required after 40 concurrent users, whereas synthetic job two indicates more memory at 35 concurrent users. We could get around this problem with some cost trade-offs, but we immediately land in another pit: we assumed that the multiple synthetic jobs recreated the load today. What about tomorrow? Will our decisions still be good if synthetic job one changes in its compute requirements? How good?
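A minimal sketch, with invented job names and measurements, of how the per-job statistics behind a family of Figure 8-style curves might be tabulated: for each synthetic job and each terminal count, repeated measurements are reduced to a mean and a variance, giving one curve per job rather than one overall response-time curve.

    import statistics

    # Invented raw measurements: response times (seconds) keyed by
    # (synthetic job, number of terminals), several runs per point.
    measurements = {
        ("job1", 20): [3.0, 3.2, 2.9, 3.4],
        ("job1", 40): [5.1, 5.8, 4.9, 6.0],
        ("job2", 20): [0.4, 0.5, 0.4, 0.6],
        ("job2", 40): [0.7, 0.9, 0.8, 1.1],
    }

    # One (mean, variance) point per synthetic job per terminal count.
    curves = {}
    for (job, terminals), samples in sorted(measurements.items()):
        point = (statistics.mean(samples), statistics.variance(samples))
        curves.setdefault(job, []).append((terminals, point))

    for job, points in curves.items():
        for terminals, (mean, var) in points:
            print(f"{job}: {terminals} terminals -> mean {mean:.2f} s, variance {var:.3f}")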


All of these problems arise because we used response time as our measurement criterion in time-sharing. We cannot use processor utilization as the criterion, so what now? Most of us will ignore variances and return to the simple definition of response time with one synthetic job and results like Figure 3. If that is the case, then the techniques outlined for a time-sharing growth plan are quite adequate. If we wish to use variances and multiple synthetic jobs, then we must first decide how to use the data. Then we can proceed to make the growth plans more rigorous.


The complexity of many present day computing systems has posed a special challenge to existing performance measurement techniques. With hardware architectures allowing for several independent processors, and with operating systems designed to support several classes of service in a multiprogramming environment, the problem of measuring the performance of such systems becomes increasingly difficult. The architecture of the CDC 6700 poses such a challenge.

With two CPU's and twenty peripheral processors (independent, simultaneously executing processors), monitoring the CDC 6700 becomes an exceptionally difficult task. With the operating system supporting four modes of service (local and remote batch, graphics, time-sharing, and real-time), all in a multiprogramming environment, the monitoring task becomes even more complex.

This paper presents a case study of an on-going effort to monitor the CDC 6700. The goals, approach, and future plans of this monitoring effort are outlined, in addition to the benefits already accrued as a result of this study. Several software monitors used in the study are discussed, together with some proposed hardware monitoring configurations.

The performance measurement study described here has proved to be an extremely worthwhile venture, not only in terms of its direct impact on improved system performance, but also in terms of "spin-off" benefits to other areas (benchmark construction, measurement of operator impact, feedback to on-site analysts and customer engineers).

I. Background

As a large R&D center with requirements for batch, interactive, and real-time computing, the Naval Weapons Laboratory (NWL) has particularly demanding computational needs. A CDC 6700 computer with a modified SCOPE 3.3 operating system is currently in use to support these needs. In order to fully appreciate the complexity of monitoring such a system, a brief description of its hardware and software architecture is in order.

The CDC 6700 consists of two CPU's (CPU-A approximately three times faster than CPU-B) and twenty peripheral processors (PP's). The peripheral processors are virtual machines with their own CPU and memory, operating independently of each other and of the two CPU's. The PP's may access both central memory and their own 4K of core. Central memory consists of 131,000 60-bit words. Twenty-four 841 disk drives, three 844 disk units, and two 6638 disks account for over 970 million characters of permanent file space and over 360 million characters of temporary scratch space.

The modified SCOPE 3.3 operating system supports local and remote batch, graphics, time-sharing, and real-time applications.

Up to fifteen jobs may be active at one time. Each active job is said to reside at a "control point" and may be in one of five stages of execution (executing with one of the CPU's, waiting for a CPU, waiting for some PP activity to complete, waiting for an operator action, or rolled out). The two CPU's may never be assigned to the same control point at the same time (i.e., the operating system does not support parallel processing of a single job).

The system monitor, MTR, resides in one of the PP's and oversees the total operation of the system (scheduling the CPU's, scheduling the other PP's, honoring CPU and PP requests, advancing the clock). As there are no hardware interrupts in the system, all requests for system resources are done through MTR. Of the remaining nineteen PP's, some have fixed tasks assigned to them (e.g., one is dedicated to driving the operator's display), while the others are available for performing a wide range of tasks (input-output, control-card processing, job initiation). A PP program may reside either on the system device or in central memory. When an available PP is assigned a task by MTR, the relevant PP program is loaded into the PP memory and execution begins. Upon completion, the PP is again available for a system task.
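To make the PP-pool mechanics concrete, the following is a small, purely illustrative simulation (not NWL or SCOPE code) of MTR-style task assignment: a request arrives, an available PP is chosen, the named PP program is "loaded" from central memory if resident there or from the system device otherwise, and the PP is returned to the pool on completion. The PP numbering, program names, and residence choices are assumptions for illustration only.

    from collections import deque

    # Illustrative pool: PP00 assumed to hold MTR, PP01 the display driver;
    # the remaining 18 PP's are available for system tasks.
    free_pps = deque(f"PP{i:02d}" for i in range(2, 20))
    cm_resident = {"1AJ", "CIO"}     # hypothetical central-memory-resident PP programs
    load_counts = {}                 # how often each PP program has been loaded

    def assign_task(program):
        """MTR-style assignment: give the task to a free PP and 'load' its program."""
        if not free_pps:
            return None              # in the real system the request would be queued
        pp = free_pps.popleft()
        source = "central memory" if program in cm_resident else "system device"
        load_counts[program] = load_counts.get(program, 0) + 1
        print(f"{pp}: loaded {program} from {source}")
        return pp

    def complete_task(pp):
        """On completion the PP becomes available for another system task."""
        free_pps.append(pp)

    pp = assign_task("CIO")          # e.g., an input-output request
    complete_task(pp)
    assign_task("1CC")               # hypothetical control-card processing program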

Clearly, the complexity of both the hardware and software architectures of the 6700 poses a tremendous challenge to existing performance measurement techniques. In light of the "multi" aspects of the system (multi-programming, multi-processing, multi-mode), not only is the acquisition of monitoring data difficult, but its interpretation is even more so.

In spite of these seemingly overwhelming difficulties, such a monitoring effort was undertaken at NWL in the fall of 1972. The goals of this effort were to:

1) determine any existing bottlenecks in the system;

2) provide a day-to-day "thermometer" with which any abnormal behavior could be detected;

3) aid in the planning of equipment configurations;

4) aid in determining the need for and in the selection of new equipment.

What follows is a description of: (1) the sequence of events leading up to the monitoring effort; (2) the monitoring effort itself; and (3) the results and future plans of the effort.

II. The Pre-monitoring Analysis

Prior to the monitoring effort, serious consideration was given to: (1) the available monitoring tools; and (2) the system activities to be monitored. Several software monitors were considered. Due to its flexibility, scope, and availability, the software monitor 1SA, written at Indiana University, was chosen as the primary software monitoring tool. 1SA resides in a dedicated PP and samples various system tables and flags. Due to basic design differences between the hardware and operating system at Indiana University and those at NWL, approximately three man-months of effort were required to implement 1SA and to make its data collection and analysis as automatic as possible.

A second software monitor was written to extract information from the system day file (a detailed, running log of major system events). And finally, a third software monitor which records individual user CPU activity was acquired. Termed SPY, this routine resides in a PP for the duration of a user's job, while continually "spying" on his control point.

Choosing the type of system activities to monitor was one of the more difficult parts of the pre-monitoring analysis. Without knowing a priori what the bottlenecks in the system were (if, indeed, any existed at all!), the choice of activities to monitor was based primarily on an intuitive feeling (on the part of several systems personnel) as to the major points of contention in the system, backed up by some rough empirical data. Some of the activities to be monitored were determined in part by the particular monitors chosen. Other activities were found to be best monitored by hardware monitoring.

Among the items added to the NWL version of 1SA was a count of absolute CPU program loads (e.g., loads of the FORTRAN compiler). Due to limitations in the size of PP memory, several items recorded in the original 1SA had to be deleted in order to facilitate these changes. In addition, the output medium of 1SA data was changed from punched cards to mass storage files.

Because several critical items in the system could not be software monitored (e.g., memory conflicts), a feasibility study was undertaken to employ a hardware monitor to capture some of the data. Relevant probe points were acquired, with details of the following hardware monitoring experiments completed by 1 April 1973 (a sketch of how such counter data might be reduced follows the list):

1) CPU state;

2) CPU idle vs. I-O activity;

3) number of seeks, number of transfers, average seek time, and average length of transfer for all mass storage devices;

4) CPU and PP wait time due to memory contention.
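The following is a minimal, hypothetical sketch of the kind of reduction such hardware counters permit: given total seek time, seek count, transfer count, and total words transferred for a device over a measurement interval, the average seek time and average transfer length follow directly. The field names and figures are assumptions, not measured NWL data.

    # Invented raw counts accumulated by a hardware monitor for one
    # mass storage device over a measurement interval.
    counters = {
        "seeks": 12500,             # number of seek operations
        "seek_time_ms": 950000,     # total time spent seeking, milliseconds
        "transfers": 11800,         # number of data transfers
        "words_moved": 7_080_000,   # total 60-bit words transferred
    }

    avg_seek_ms = counters["seek_time_ms"] / counters["seeks"]
    avg_transfer_words = counters["words_moved"] / counters["transfers"]

    print(f"average seek time:       {avg_seek_ms:.1f} ms")
    print(f"average transfer length: {avg_transfer_words:.0f} words")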

III. The Monitoring Effort

Full scale use of 1SA began in May, 1973. The following describes the data collection and analysis procedures of 1SA as they were and still are being used.


1SA is automatically loaded into a PP each time the system is brought up, whether for the first time each day or after a system malfunction. The system is normally up 24 hours a day, six days a week, with two hours a day allocated for software maintenance and two hours for hardware maintenance. Input parameters to 1SA allow for n batches of data to be dumped, where each batch represents m minutes' worth of monitoring data. After some experimentation, n and m were set to 24 and 60 respectively, minimizing the number of dumps while still maintaining a reasonable sampling period.
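A minimal sketch, with invented names and deliberately shortened timing, of the batching scheme just described: samples are accumulated for m minutes, then dumped as one batch, for n batches per system start. In practice n = 24 and m = 60 cover a full day; here the loop is only simulated.

    import random

    def run_monitor(n_batches=24, batch_minutes=60, samples_per_minute=1):
        """Simulate 1SA-style batching: accumulate samples for `batch_minutes`,
        then dump the batch; repeat `n_batches` times per system start."""
        dumps = []
        for batch in range(n_batches):
            samples = []
            for _ in range(batch_minutes * samples_per_minute):
                # Stand-in for sampling system tables and flags (here, an
                # invented CPU-A busy flag drawn at random).
                samples.append({"cpu_a_busy": random.random() < 0.7})
            dumps.append(samples)        # stand-in for a dump to mass storage
        return dumps

    dumps = run_monitor(n_batches=2, batch_minutes=3)   # shortened for the example
    for i, batch in enumerate(dumps, start=1):
        busy = sum(s["cpu_a_busy"] for s in batch) / len(batch)
        print(f"batch {i}: CPU-A busy {busy:.0%} of sampled instants")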

A GRAB program is run once a day to collect the dumped batches of data and consolidate them into one large file. An ANALIZE program then reads this file and produces a listing of meaningful statistics (a simplified sketch of this reduction follows the list below). Some system activities recorded by 1SA as a function of time are:

1) percent of time interactive users were editing text vs. executing a program;

2) average and maximum central memory used by interactive users;

3) central memory utilization;

4) CPU utilization (for both CPU-A and CPU-B);

5) control point activity (e.g., percent of time n control points were waiting for a CPU);

6) percent of free space on each of the mass storage devices;

7) average and maximum number of I-O requests outstanding for each mass storage device;

8) PP activity;

9) number of PP program loads;

10) number of absolute CPU program loads.
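The sketch below is a hypothetical illustration of the GRAB/ANALIZE step described above: daily batch records are consolidated into one file-like list and then reduced to hourly averages for a few of the recorded quantities. The record layout and values are invented for the example.

    import statistics
    from collections import defaultdict

    # Invented consolidated records, one per 1SA batch (hour of the day),
    # each carrying a few of the quantities listed above.
    batches = [
        {"hour": 9,  "cm_used_pct": 62, "cpu_a_busy_pct": 71, "io_outstanding": 3.2},
        {"hour": 10, "cm_used_pct": 75, "cpu_a_busy_pct": 83, "io_outstanding": 4.9},
        {"hour": 10, "cm_used_pct": 71, "cpu_a_busy_pct": 80, "io_outstanding": 4.1},
        {"hour": 11, "cm_used_pct": 68, "cpu_a_busy_pct": 77, "io_outstanding": 3.8},
    ]

    # ANALIZE-style reduction: average each quantity within every hourly time frame.
    by_hour = defaultdict(list)
    for record in batches:
        by_hour[record["hour"]].append(record)

    for hour in sorted(by_hour):
        records = by_hour[hour]
        cm = statistics.mean(r["cm_used_pct"] for r in records)
        cpu = statistics.mean(r["cpu_a_busy_pct"] for r in records)
        io = statistics.mean(r["io_outstanding"] for r in records)
        print(f"{hour:02d}00: CM {cm:.0f}%  CPU-A {cpu:.0f}%  I-O queue {io:.1f}")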

An additional option was subsequently added to ANALIZE to summarize all 1SA data collected within a given hourly time frame between any two given dates.

The dayfile analysis program was and still is being run daily. Data collected by the dayfile analyzer includes: turnaround time and load statistics for the various batch classes, frequency of tape read and write errors, abnormal system and user errors, frequency of recoverable and unrecoverable mass storage errors.
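As an illustration of what a dayfile analyzer of this kind does, the sketch below scans a list of invented dayfile lines and tallies the error classes mentioned above. The message formats are assumptions and not the actual SCOPE dayfile syntax.

    from collections import Counter

    # Invented dayfile lines; the real SCOPE dayfile format differs.
    dayfile = [
        "10.02.11. JOB123. TAPE READ ERROR CH 13",
        "10.07.43. JOB124. MS ERROR RECOVERED UNIT 42",
        "10.19.05. JOB125. USER ERROR MODE 1",
        "11.31.27. JOB130. TAPE WRITE ERROR CH 12",
        "11.44.50. JOB131. MS ERROR UNRECOVERED UNIT 40",
    ]

    # Classify each line by the kind of abnormal event it reports.
    patterns = {
        "tape read/write errors": ("TAPE READ ERROR", "TAPE WRITE ERROR"),
        "recoverable mass storage errors": ("MS ERROR RECOVERED",),
        "unrecoverable mass storage errors": ("MS ERROR UNRECOVERED",),
        "user errors": ("USER ERROR",),
    }

    counts = Counter()
    for line in dayfile:
        for category, keys in patterns.items():
            if any(key in line for key in keys):
                counts[category] += 1

    for category, count in counts.items():
        print(f"{category}: {count}")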

The SPY program was made available to the general user population in December, 1972. Although its use was originally limited to a few users, others immediately began to see its worth. Requiring only two control cards to execute and produce a CPU distribution map, SPY became extremely easy for the average programmer to use.
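A minimal sketch of the kind of report SPY produces: given sampled program-counter addresses (invented here) and a load map of routine address ranges (also invented), it builds a CPU distribution map showing the percentage of samples falling in each routine. This is an analogue of the idea only, not SPY's actual implementation.

    import bisect
    import random

    # Hypothetical load map: (start address, routine name), sorted by address.
    load_map = [
        (0o1000, "MAIN"),
        (0o4000, "SIN"),
        (0o4400, "COS"),
        (0o5000, "MATMUL"),
        (0o7000, "OUTPUT"),
    ]
    starts = [addr for addr, _ in load_map]

    def routine_at(address):
        """Name of the routine whose address range contains `address`."""
        i = bisect.bisect_right(starts, address) - 1
        return load_map[i][1]

    # Invented program-counter samples, as SPY might observe while a job runs.
    random.seed(1)
    samples = [random.randint(0o1000, 0o7777) for _ in range(2000)]

    # CPU distribution map: fraction of samples observed in each routine.
    counts = {}
    for pc in samples:
        name = routine_at(pc)
        counts[name] = counts.get(name, 0) + 1

    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {100.0 * count / len(samples):5.1f}%")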

After some initial problems with mating probe tips to probe points were overcome, several system activities (CPU state, memory contention) were successfully hardware monitored and their respective probe points verified. This verification procedure required two weekends of dedicated machine time. As a result of this feasibility study, an effort was immediately undertaken to acquire a dedicated, on-site hardware monitor. At the time of this writing, attempts to acquire such a monitor are still proceeding.

IV. Results

The number and type of benefits accrued as a direct result of the monitoring effort unquestionably proved its worth. Some of these benefits were expected, while others were rather pleasant surprises.

The dayfile analysis program proved to be an especially worthwhile tool for detecting abnormal hardware and software errors. Upon noticing such abnormalities, the personnel monitoring the data would immediately notify the on-site analysts or CE's, who would then undertake corrective action. Each week the daily turnaround time and load statistics were summarized and compared with data from previous weeks. A relative increase in turnaround time and a decrease in the number of jobs run per day were usually in direct proportion to the number of system interruptions. These numbers were thus good indicators of the relative "health" of the system.

The SPY program did provide some dramatic results. One particular CPU-bound program, for example, was found to be spending over 25% of its execution time in the SIN and COS routines.

The hardware monitoring study demonstrated the feasibility of an expanded hardware monitoring effort. In addition, it emphasized the need for the cooperation of local CE's in helping to develop and verify probe points, and for an extensive pre-monitoring analysis period.

The most fruitful results of the monitoring effort undoubtedly came from the 1SA software monitor. At least seven major benefits were directly attributable to 1SA data collection:

1. After the first two weeks of running 1SA, it became apparent that a number of PP programs were continually being read from disk and loaded into a PP (some on the order of 8 times a second). These most frequently loaded PP programs were immediately made central memory resident, thereby significantly reducing traffic on the channel to the system device, in addition to decreasing the PP wait time.
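A minimal sketch of the reasoning behind that change, with invented program names and load counts: given per-program PP load counts over an interval, programs whose load rate exceeds some threshold are flagged as candidates for central-memory residence. The threshold is chosen for illustration only.

    # Invented PP program load counts over a one-hour (3600 s) interval.
    interval_seconds = 3600
    pp_loads = {
        "1AJ": 2100,
        "CIO": 29500,     # roughly 8 loads per second in this made-up data
        "2TD": 140,
        "PFM": 3300,
    }

    # Flag programs loaded more often than once per second as candidates
    # for central-memory residence.
    threshold_per_second = 1.0
    candidates = [
        (name, count / interval_seconds)
        for name, count in pp_loads.items()
        if count / interval_seconds > threshold_per_second
    ]

    for name, rate in sorted(candidates, key=lambda kv: -kv[1]):
        print(f"{name}: {rate:.1f} loads/second -> make central memory resident")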

2. With the daily runs of 1SA indicating the amount of free permanent file space, a means existed for assessing the permanent file space situation. When space appeared to be running out, appropriate measures were taken before space became so critical that all operations ceased (as was sometimes the case in the past).

3. Two situations arose which dramatically showed the merit of 1SA as a feedback mechanism to on-site analysts and CE's. Immediately after the introduction of a new mass storage driver, 1SA data indicated that a certain PP overlay was being loaded on the average of 74 times a second, almost as frequently as the other 400 PP routines combined! The analyst who implemented the driver was easily persuaded to incorporate the overlay into the main body of the driver, thereby eliminating the unnecessary loads.

On another occasion 1SA data indicated that one particular scratch device was being used half as much as its identical counterpart. In discussing this situation with one of the CE's, it was learned that he had installed a "slow speed" valve in the unit while waiting for a "high speed" valve to be shipped. He had neglected to mention this action to any of the operations personnel.

4. Several times throughout the monitoring period the effect of operator interaction on the system was clearly reflected in the 1SA data. One specific example which occurred intermittently was an operator error in configuring the scratch devices. Specifically, the operator was told to 'OFF' a particular scratch device. However, 1SA data showed that scratch was being allocated to that device. The operator was notified and corrective action was taken.


available disk space (with the remaining space on the pack wasted)? Or should it reside on a 6638 with its dedicated channel, but with even more wasted space? 1SA has proved to be an invaluable tool in helping to make the above decision. Full configuration testing using 1SA is still continuing at the time of this writing.

7. Finally, 1SA has been a useful tool in the design of a representative benchmark. In benchmarking the SCOPE 3.4 operating system, a one-hour job mix was desired whose resource requirements matched as closely as possible those of a "typical" one-hour period of everyday operation. 1SA data averaged over a random two-week period was used as a "standard". Next, a random mix of jobs was obtained from the input queue and run with 1SA monitoring the system's performance. Monitoring data from this mix was then compared with the "standard". Since the random mix was too CPU-bound, several synthetic I-O-bound jobs were included. This new mix was run and its 1SA data was compared with the "standard". The iteration continued until a mix was obtained whose 1SA data matched that of the "standard" as closely as possible. This final mix was then declared a "representative benchmark" and was used in the benchmarking studies.
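The iteration just described can be viewed as adjusting a job mix until its resource profile is close to a standard profile. The sketch below is a hypothetical rendering of that loop: it compares a mix's CPU and I-O fractions against a standard and adds a synthetic I-O-bound (or CPU-bound) job until the profiles agree within a tolerance. The profiles, the synthetic job effects, and the tolerance are all invented.

    # Hypothetical resource profiles: fraction of the hour spent CPU-bound vs. I-O-bound.
    standard = {"cpu": 0.55, "io": 0.45}      # stand-in for 1SA data averaged over two weeks
    mix      = {"cpu": 0.72, "io": 0.28}      # initial random mix drawn from the input queue

    # Invented synthetic jobs and the shift each adds to the mix profile.
    synthetic_io_job  = {"cpu": -0.03, "io": +0.03}
    synthetic_cpu_job = {"cpu": +0.03, "io": -0.03}

    tolerance = 0.02
    iterations = 0
    while abs(mix["cpu"] - standard["cpu"]) > tolerance and iterations < 50:
        # Add whichever kind of synthetic job moves the mix toward the standard.
        job = synthetic_io_job if mix["cpu"] > standard["cpu"] else synthetic_cpu_job
        mix = {k: mix[k] + job[k] for k in mix}
        iterations += 1
        print(f"iteration {iterations}: mix is now CPU {mix['cpu']:.2f}, I-O {mix['io']:.2f}")

    print(f"declared representative after {iterations} added synthetic jobs")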

V. Future Plans


As the monitoring effort at NWL is a continuing one, enhancements to the monitoring tools are constantly being made. Future plans call for: full-scale hardware monitoring; implementation of a new, more flexible version of SPY; provisions for graphical output from the 1SA data analyzer; a parametric study of system resources based on data collected from 1SA; and dynamic interaction between the various monitors and the system resource allocators. A combination of the latter two efforts would effectively eliminate the human from the data-collection, human-interpretation, system-modification sequence. The performance monitors, observing the state of the machine at time t, could appropriately set various system parameters in order to optimize the allocation of resources at time t + Δt. An effort is currently underway to employ this technique to balance scratch allocation among the three different types of mass storage devices.
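A minimal sketch of such a feedback loop, in which everything except the idea itself is invented: at each interval the "monitor" reports how busy each class of scratch device is, and the "allocator" shifts the share of new scratch assignments toward the less busy classes for the next interval.

    # Hypothetical scratch-device classes corresponding to the three device types.
    classes = ["841", "844", "6638"]

    # Current share of new scratch allocations given to each class (sums to 1.0).
    shares = {c: 1.0 / len(classes) for c in classes}

    def observed_queue_lengths():
        """Stand-in for monitor data at time t: average outstanding I-O
        requests per device class (invented numbers)."""
        return {"841": 4.2, "844": 1.1, "6638": 2.0}

    def rebalance(shares, queues, gain=0.05):
        """Shift allocation shares away from busier classes for time t + dt."""
        mean_q = sum(queues.values()) / len(queues)
        adjusted = {c: max(0.05, shares[c] - gain * (queues[c] - mean_q)) for c in shares}
        total = sum(adjusted.values())
        return {c: v / total for c, v in adjusted.items()}   # renormalize to 1.0

    for step in range(3):                                    # three control intervals
        shares = rebalance(shares, observed_queue_lengths())
        print(f"t+{step + 1}: " + "  ".join(f"{c} {shares[c]:.2f}" for c in classes))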
