
This paper considers human behavior during experimental planning, test execution, and data analysis for computer performance evaluation. Human problems in the testing environment during a computer performance evaluation effort can be likened to those of a field test for a product already verified in a laboratory. That is, a hypothesis that explains user behavior or system relationships is usually first examined in a controlled environment. Once a hypothesis has been confirmed in the controlled environment, it is usually necessary to extend it to the normal environment; this exposes the hypothesis to the variability of human reactions. A computer performance experiment conducted in the normal system environment therefore requires that the analyst be aware of the human problems associated with testing hypotheses.


ENSURING PARTICIPANT AWARENESS

When an experiment relies on overt actions of people other than the analyst, it is critical that their orientation include an awareness of the experiment's relationship to the hypotheses; lack of awareness can lead to inconclusive results. This negative effect was clearly illustrated in an experiment in which normal operations were required to validate system-specific hypotheses.

The investigation involved increasing the peripheral capacity of an IBM 360/65 MVT system. Analysis of the workload processed by the computer system indicated that projected workloads would saturate the system. A channel was available for attaching additional peripheral devices, making it possible to validate a hypothesis that suggested a solution to this problem.

The hypothesis was stated as follows: adding peripheral devices to the available channel and directing spooled data to those devices will result in an increased machine capacity. The experiment included 1) testing the hypothesis using the specialized job stream that had led to stating the hypothesis; 2) if these controlled tests were positive, following them with a modified environment for normal operations; and 3) verifying a capacity increase by examining accounting data. The test could be executed only once, because it required a special device hookup that was borrowed from another computer system and had to be returned at the conclusion of the experiment; the available time frame was a 15-hour period.

The test results were available shortly after completion and were very positive.

In fact, substantially more work, in terms of jobs processed and computer resources used (CPU time, I/O executions, etc.), had been accomplished with the modified peripheral configuration than during the equivalent time period of the previous week (morning start-up to 10:00 AM). Only later did we discover that the operators, in an attempt to "help" the experiment, got started much faster than they had the previous week and thus made the system available for 25% more elapsed time than is usual during the period prior to 10:00 AM. The test results were invalidated; nothing could be said about the effect of a modified peripheral configuration on the normal environment, and the decision to add peripheral devices could only be based on the non-system-specific testing in the controlled environment using the test job stream.
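The general lesson is that throughput comparisons are meaningful only after normalizing for the time the system was actually available. A minimal sketch follows (in present-day Python, purely for illustration; the job counts and hours are assumed values, not figures from the experiment) showing how a raw job count can suggest a gain that disappears once availability is taken into account.

```python
# Illustrative sketch (not from the original experiment): throughput should be
# normalized by the time the system was actually available before two periods
# are compared. All numbers below are assumed values.

def jobs_per_available_hour(jobs_completed, available_hours):
    """Throughput normalized by the elapsed time the system was actually up."""
    return jobs_completed / available_hours

# Baseline week: morning start-up to 10:00 AM, assumed 4.0 hours of availability.
baseline = jobs_per_available_hour(jobs_completed=120, available_hours=4.0)

# Test week: operators started earlier, giving 25% more availability (5.0 hours).
test = jobs_per_available_hour(jobs_completed=150, available_hours=5.0)

# Raw counts (150 vs. 120) suggest a 25% gain, yet normalized throughput is
# 30 jobs per available hour in both cases, so the apparent gain came from the
# extra availability rather than from the modified peripheral configuration.
print(baseline, test)
```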


A later experiment addressed complaints from on-line users about being locked out of the system; the hypothesized remedy was a system modification that rotated dispatching priority, preventing any one job from gaining control for a prolonged time. The system modification was first tested in a controlled environment with special tests to verify its execution characteristics, and was then used for several days in the normal work environment.

Aside from verifying that on-line users were actively using the system (through accounting data and observation), the analyst confirmed the validity of the hypothesis by noting the abrupt end of the complaints about lockout. This experiment emphasized the need for an appropriate measure that is easy to collect from the users.

TESTING FOR OBJECTIVITY

Users are not always good subjective evaluators of system changes, and the analyst should not rely on their assessments alone. An experiment to evaluate the effects of reducing the memory dedicated to an on-line system required soliciting user evaluations to identify the effect of the reduced memory. The users were told that the reduced memory configuration would be available Monday, Wednesday, and Friday, and standard memory would be available Tuesday and Thursday. Following the experiment, the users claimed that the Monday/Wednesday/Friday service was unacceptable and the Tuesday/Thursday service was very good. Only subsequently were they informed that Monday/Wednesday/Friday service was based on the reduced memory configuration. Self-interest, combined with changed perceptions of reality, makes users poor subjective evaluators of system modifications.

VALIDATING THE ENVIRONMENT

Spurious workload changes can invalidate experimental results. If a testing period coincides with a particular event, such as annual accounting and reporting, the results can be deceiving when applied to the standard working environment. Normal fluctuations in workload over short periods (one or two days) can have the same effect.

One short test (one-half day) was seriously affected by a user who had just completed debugging several simulations and submitted them for execution. The execution of these simulations resulted in extraordinary CPU utilization (approximately double), definitely out of proportion to the workload characteristics when data over a longer period were examined. Fortunately, this workload shift was discovered, though many analysts forget that a computer system seldom reaches a "steady state". The workload characteristics of a system during a short test period must be compared with characteristics over a longer period. A formal comparison period with appropriate controls should be used to reduce the probability of invalid results caused by autonomous workload changes.

Accounting data should be examined to validate the normalcy of the environment. Measures that can be included are:

CPU utilization
Mass store channel utilization
Tape channel utilization
Memory requested per activity
Memory-seconds used per second
Average number of activities per job
Activities initiated per hour
Tape and disk files allocated per activity
Average number of aborts per 100 activities
Cards read and punched per activity
Lines printed per activity

If the objectives, hypotheses, and procedures for an experiment are clearly defined, many of the metrics listed above may be irrelevant and the analyst can ignore them.
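Where the relevant measures are retained, such a normalcy check can be mechanized. The sketch below is a minimal illustration (in present-day Python; the metric names, values, and the 20% tolerance are assumptions rather than figures from the paper) that flags any accounting measure whose test-period value departs markedly from its longer-term baseline.

```python
# Illustrative sketch (metric names, values, and the 20% tolerance are assumed,
# not taken from the paper): flag accounting measures whose test-period values
# deviate markedly from a longer baseline period.

BASELINE = {                       # averages over a long "normal" period
    "cpu_utilization": 0.62,
    "tape_channel_utilization": 0.18,
    "activities_initiated_per_hour": 410.0,
    "aborts_per_100_activities": 2.1,
}

TEST_PERIOD = {                    # the same measures during the short test
    "cpu_utilization": 0.71,
    "tape_channel_utilization": 0.17,
    "activities_initiated_per_hour": 395.0,
    "aborts_per_100_activities": 4.8,
}

TOLERANCE = 0.20                   # relative deviation allowed before flagging


def abnormal_measures(baseline, test, tolerance):
    """Return measures whose test value deviates from baseline by more than tolerance."""
    flagged = {}
    for name, base_value in baseline.items():
        deviation = abs(test[name] - base_value) / base_value
        if deviation > tolerance:
            flagged[name] = round(deviation, 2)
    return flagged


# A flagged measure suggests the environment was not "normal" during the test,
# and the experimental results should be interpreted (or discarded) accordingly.
print(abnormal_measures(BASELINE, TEST_PERIOD, TOLERANCE))
```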

USING MULTIPLE CRITERIA

Validating hypotheses about major changes to the computer system presents a serious problem. The magnitude and direction of the proposed modification usually dictate the level of investigation into its impact on the users. This level is usually quite high, particularly if the objective of the investigation is to reduce the availability of resources. People appear to accept results more readily when a number of different indicators all point toward the same conclusion.

An investigation considered removing a memory module, at a savings of $7,500.00 per month; the removal was definitely desirable from the computer center's viewpoint if the following could be validated (a minimal sketch of such a multi-criteria check follows the list):

o Batch turnaround would not increase beyond one hour for the majority of jobs.

o On-line system response time would not increase significantly.

o The quality of response to on-line systems would permit them to remain viable.
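
A minimal sketch of such a multi-criteria acceptance check follows (in present-day Python, for illustration only; the thresholds, sample data, and function names are assumptions, not measurements from the actual investigation).

```python
# Illustrative sketch (thresholds, sample data, and names are assumptions, not
# measurements from the investigation): accept a configuration change only when
# every validation criterion points toward the same conclusion.

def majority_under(values, limit, fraction=0.5):
    """True if more than `fraction` of the values fall below `limit`."""
    within = sum(1 for v in values if v < limit)
    return within / len(values) > fraction

# Assumed batch turnaround times (minutes) under the reduced-memory configuration.
batch_turnaround_minutes = [22, 35, 41, 48, 55, 58, 63, 70, 90, 130]

baseline_response_seconds = 1.8    # assumed on-line response before the change
test_response_seconds = 2.0        # assumed on-line response after the change

criteria = {
    # Batch turnaround stays under one hour for the majority of jobs.
    "batch_turnaround": majority_under(batch_turnaround_minutes, limit=60),
    # On-line response does not increase significantly (here, < 25% is assumed).
    "online_response": test_response_seconds <= 1.25 * baseline_response_seconds,
}

# Recommend the change only if all independent indicators agree.
recommend_change = all(criteria.values())
print(criteria, recommend_change)
```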

The experiment involved collecting measures from the system, systems personnel, and a hardware stimulator. Measurements from a three-week period were analyzed. The test plan called for the first and


Introducing human variability into the arena of computer performance evaluation experimentation requires that the analyst take more than the usual precautions, as illustrated above. In addition, the general procedure for validating hypotheses about a particular system that require testing in the uncontrolled environment should include verifying the hypotheses in a controlled environment and then exposing them to the uncontrolled environment. The normalcy of the environment should be validated through checks on the accounting data and through use of a control period, as appropriate. Including these steps should result in more accurate testing of hypotheses that involve human interactions.
