
the anticipated real world life of his system into a relatively short span of simulation time. This capability can provide the manager with a means of examining next week's (month's, year's) production problems this week; thus he can begin to anticipate the points where the operations will require modification. Moreover, he can examine alternative courses of action, prior to their being implemented in the system, to determine which decision is most effective. For example, the manager can increase the processing load in the simulation to determine where the saturation points are. Once these have been determined, he can hold these overloading states constant and vary the other variables (e.g., number of service units, types of devices, methods of operations) to determine how best to increase the system's capacity." (Blunt et al., 1967, p. 76).
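The load-sweep experiment described in this passage can be illustrated with a very small queueing simulation. The following is a minimal sketch, not the model used by Blunt et al.: the function name `mean_wait`, the exponential arrival and service times, and all numeric rates are assumptions chosen only for illustration.

```python
import heapq
import random


def mean_wait(arrival_rate, service_rate, servers, n_jobs=5000, seed=1):
    """Simulate a first-come-first-served queue with several service units.

    Returns the mean time a job waits before service starts.
    All distributions and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    free_at = [0.0] * servers      # time at which each service unit is next free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(arrival_rate)   # arrival time of the next job
        earliest = heapq.heappop(free_at)        # first service unit to become free
        start = max(clock, earliest)
        total_wait += start - clock
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / n_jobs


# Step 1: raise the processing load until waiting time grows sharply.
for load in (0.5, 0.8, 0.95, 1.2):
    print("load", load, "mean wait", round(mean_wait(load, 1.0, servers=1), 2))

# Step 2: hold the overload constant and vary another variable (service units).
for units in (1, 2, 3):
    print("units", units, "mean wait", round(mean_wait(1.2, 1.0, servers=units), 2))
```

The first loop looks for the saturation point by increasing the load; the second holds the overload fixed and varies the number of service units, which is the two-step procedure the quotation outlines.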

Mazzarese (1965) describes the Air Force Cambridge DX-1 system with a "dual computer concept” that permits investigators to change computer logic and configuration in one machine without interference to programs which run on its interconnected mate, especially for study of real time data filtering operations.

7.9 "A technique for servicing time-shared computers without shutting them down has been developed by Jesse T. Quatse, manager of engineering development in the Computation Center at the Carnegie Institute of Technology. The technique is called STROBES, an acronym for shared-time repair of big electronic systems. It includes a test program to exercise the computer, and modified test gear to detect faults in the system.” (Electronics 38, No. 18, 26 (1965)).

7.10 “Diagnostic engineering begins in the initial phases of system design. A maintenance strategy is defined and the system is designed to include features necessary to meet the requirements of this strategy. Special features, known as 'diagnostic handles', are needed for testing the system automatically, and for providing adequate error isolation." (Dent, 1967, p. 100).

"An instantaneous alarm followed by a quick and correct diagnosis in a self-controlling system will limit down-time in many cases to the mere time of repair. Instruments for error detection are unnecessary." (Steinbuch and Piske, 1963, p. 859).

7.11 Further, “when a digital system is partitioned under certain restrictions into subsystems it is possible to achieve self-diagnosis of the system through the mutual diagnosis of its subsystems.” (Forbes et al., 1965, p. 1074).

A diagnostic subsystem is that portion of a digital system capable of effectively diagnosing another portion of the digital system. It has been shown that at least two mutually exclusive diagnostic subsystems are needed in self-diagnosable systems. (Forbes et al., 1965, p. 1081).

7.12 "Systems are used to test themselves by generation of diagnostic programs using predefined data sets and by explicit controls permitting degradation of the environment." (Estrin et al., 1967, p. 645).

“The 'Night watchman' experiments are directed toward the maintenance problem. Attempts will be made to structure a maintenance concept that will allow for the remote-automatic-checkout of all the computers in the network from a single point. The concept is an extension of the 'FALT' principle mentioned previously. Diagnostic programs will be sent over the lines, during off-use time, to check components, aggregates of components, complete modules, and the entire system. The 'Sentinel' station of the network will be responsible for the gathering of statistical data concerning the data, the queries, the traffic, and the overall operations.” (Hoffman, 1965, pp. 98-100.)

“The Sentinel is the very heart of the experimental network. It is charged with the gathering of the information needed for long range planning, the formulation of data automation requirements, and the structuring of prototype systems. In addition to the gathering of statistical data, the sentinel will be the control center for the network, generating priority, policy, and operational details. The responsibility for the observance of security and proprietary procedures will rest with the sentinel station." (Hoffman, 1965, p. 100.)

“This data was taken by a program written to run as part of the CTSS Supervisory Program. The data-taking program was entered each time the Scheduling Algorithm was entered and thus was able to determine the exact time of user state changes.” (Scherr, 1965, pp. 27-28).

“Data taken over the summer of 1964 by T. Hastings indicates that the average program accesses (i.e., reads or writes) approximately 1500 disk words per interaction.” (Scherr, 1965, p. 29).

“We can and will develop instrumentation which will be automatically inserted at compile time. A user easily will be able to get a plot of the various running times of his program

Sutherland also refers to a Stanford University program which “plots the depth of a problem tree versus time was used to trace the operation of a Kalah-playing program.” (Sutherland, 1965, pp. 12-13).
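Instrumentation of the kind Sutherland describes, which records how often each part of a running program is executed, can be approximated with a modern tracing hook. The sketch below is an illustration only and is not Sutherland's program: it uses Python's standard `sys.settrace` facility, and the names `line_counts` and `triangle` are hypothetical.

```python
import sys
from collections import Counter


def line_counts(func, *args):
    """Run func(*args) and count how many times each of its lines executes.

    Keys of the returned Counter are line offsets from the 'def' line.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                counts[frame.f_lineno - code.co_firstlineno] += 1
            return tracer          # keep tracing lines inside func
        return None                # do not trace other functions

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)     # always restore the earlier trace hook
    return result, counts


def triangle(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, counts = line_counts(triangle, 100)
for offset, hits in sorted(counts.items()):
    print("line offset", offset, "executed", hits, "times")
```

The lines with the highest counts mark the most-used loop, which is where tightening the code pays off most.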

7.13 “The techniques of fault detection fall into two major categories:

1. Concurrent diagnosis by the application of error-detecting codes and special monitoring circuits. Detection occurs while the system is being used.

2. Periodic diagnosis using diagnostic hardware and/or programs. Use of the system is interrupted for diagnosis." (Avižienis, 1967, p. 734).

“The four principal techniques of correction are:

1. Correction of errors by the use of error-correcting codes and associated special purpose hardware and/or software (including recomputation).

2. Replacement of the faulty element or system by a stand-by spare.

3. Replacement as above, with subsequent maintenance of the replaced part and its return to the stand-by state.

4. Reorganization of the system into a different fault-free configuration which can continue the specified task.” (Avižienis, 1967, p. 734).

“The STAR (Self-Testing and Repairing) computer, scheduled to begin experimental operation at the Jet Propulsion Laboratory of the California Institute of Technology this fall, is expected to be one of the first computers with fully automatic self-repair as one of its normal operating functions. There are three 'recovery' functions of the STAR computer: (1) detection of faults; (2) recognition of temporary malfunctions and of permanent failures; and (3) module replacement by power switching. The occurrence of a fault is detected by applying an error-detecting code to all instructions and numbers within the computer. Temporary malfunctions are corrected by repeating a part of the program. If the fault persists, the faulty module is replaced.” (Avižienis, 1968, p. 13).
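The three recovery functions in the STAR description (detect, retry, replace) can be sketched in a few lines. This is a hedged illustration, not the STAR design: a single parity bit stands in for the error-detecting code, which the quotation does not specify, and the `Module` class, the `bad_calls` fault model, and `run_with_recovery` are hypothetical names introduced here.

```python
def parity(word):
    """Return 1 if word has an odd number of one-bits, else 0."""
    return bin(word).count("1") % 2


def encode(word):
    """Append an even-parity check bit (assumed stand-in for the real code)."""
    return (word << 1) | parity(word)


def check(coded):
    """A coded word is valid when its overall parity is even."""
    return parity(coded) == 0


class Module:
    """Hypothetical arithmetic module that corrupts its next bad_calls results."""

    def __init__(self, name, bad_calls=0):
        self.name = name
        self.bad_calls = bad_calls

    def add(self, a, b):
        result = encode((a + b) & 0xFF)
        if self.bad_calls > 0:
            self.bad_calls -= 1
            result ^= 0b10             # single-bit error, detectable by parity
        return result


def run_with_recovery(a, b, active, spares, max_retries=2):
    """Detect faults, retry for temporary ones, switch to a spare if they persist.

    Note: spares is consumed from the front as modules are replaced.
    """
    log = []
    while True:
        for _ in range(1 + max_retries):
            coded = active.add(a, b)
            if check(coded):
                return coded >> 1, active.name, log
            log.append("fault detected in " + active.name)
        if not spares:
            raise RuntimeError("no spare modules left")
        log.append(active.name + " replaced")
        active = spares.pop(0)


# A temporary malfunction is cleared by repeating the operation.
print(run_with_recovery(2, 3, Module("A", bad_calls=1), [Module("B")]))
# A permanent failure exhausts the retries and forces module replacement.
print(run_with_recovery(2, 3, Module("A", bad_calls=10**9), [Module("B")]))
```

A parity bit catches only odd numbers of bit errors; it is used here purely to keep the sketch short.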

7.14 “Diagnostic routines can check the operating of a computer for the following possible malfunctions: a single continuous malfunction, several continuous malfunctions, and intermittent malfunctions. When the test routine finds an error it can transfer program control to an appropriate malfunction isolation subroutine. This type of diagnostic technique is standard and has been well used by the computer industry for large package replacement." (Jacoby, 1959, p. 7-1).

“Needless to say, in order for any malfunction to be isolated by an automatic program, it is necessary for a minimum amount of equipment to function adequately. One of the functions of this minimum equipment includes the ability to sequence from one instruction to another, and to be able to interpret (correctly) and execute at least one transfer of control instruction so that logical choices can be made. The control functions of a computer can be defined as Boolean algebraic expressions of the instantaneous state of the computer. If we state that a line, or path, common to two control statements contains those components that are activated when either of the statements is true, this line is either a factor of both statements or a factor of terms of both statements. Similarly, if we consider circuit elements activated by one but not both of two ways to accomplish the same control function, we have a picture of two terms in the algebraic statement for the control function separated by the connector OR.

“A Boolean term will appear as a circuit which must be active for any statement, of which it is a factor, to be true. Hence the location of circuit malfunctions may be considered from the point of view of isolating the minimal Boolean term involved." (Jacoby, 1959, p. 7-1).

7.15 “A program testing method based on the monitoring of object-program instruction addresses (as opposed to a method dependent on, e.g., the occurrence of particular types of instruction, or the use of particular data addresses) would appear to be the most suitable, because the instruction address is the basic variable of this monitoring technique. Monitoring could be made 'selective' by specifying instruction addresses at which it is to start and stop: to start it at an arbitrary instruction address it is only necessary to replace the instruction located there by the first unconditional interrupt inserted, and similarly when monitoring is to stop and restart later.

“Another use in this field would be to include in the Monitor facilities for simulating any instruction, and to supply it with details of particular instructions suspected of malfunctioning. The Monitor could then stop any program just before one of these instructions was to be obeyed, simulate it, allow the program to execute the same instruction in the normal way, and then compare the results obtained by the normal action and by simulation." (Wetherfield, 1966, p. 165).
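The monitoring scheme Wetherfield describes, in which an instruction is replaced by an unconditional interrupt so that the Monitor gains control at that address, can be sketched on a toy machine. Everything below is an assumption made for illustration: the four-instruction set, the single accumulator, and the function name `run` are hypothetical, and the Monitor here simulates the displaced instruction itself rather than restoring it to the store.

```python
def run(program, watch_addresses, max_steps=1000):
    """Execute a toy object program under a Monitor that plants traps.

    Returns the final accumulator and a trace of (address, accumulator)
    pairs recorded just before each watched instruction is obeyed.
    """
    store = list(program)          # working copy of the object program
    saved = {}                     # address -> instruction displaced by a trap
    for addr in watch_addresses:
        saved[addr] = store[addr]
        store[addr] = ("TRAP",)    # the planted unconditional interrupt

    acc, pc, trace = 0, 0, []
    for _ in range(max_steps):
        instr = store[pc]
        if instr[0] == "TRAP":
            trace.append((pc, acc))   # Monitor inspects state before execution
            instr = saved[pc]         # Monitor simulates the displaced instruction
        op = instr[0]
        if op == "HALT":
            return acc, trace
        if op == "LOAD":
            acc = instr[1]
            pc += 1
        elif op == "ADD":
            acc += instr[1]
            pc += 1
        elif op == "JNZ":             # jump if accumulator is non-zero
            pc = instr[1] if acc != 0 else pc + 1
        else:
            raise ValueError("unknown instruction " + op)
    raise RuntimeError("step limit reached")


# A countdown loop: monitoring is requested at address 1 only.
program = [("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT",)]
print(run(program, {1}))
```

Because the trap replaces the stored instruction, a program that reads its own instructions as data would see the trap instead, which is one of the failure cases Wetherfield lists.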

"Of course, having achieved the aim of being able to trace in advance the exact course of the object program's instructions, the Monitor is then able to simulate their actions to any desired degree, and it is here that the power of the technique can be exploited. The contents of the store, registers, etc., before the execution of any instruction can be inspected by the Monitor if it temporarily replaces that instruction by an unconditional interrupt." (Wetherfield, 1966, p. 162).

“The monitoring operation can go wrong for any of the following three reasons.

“(1) In the first case one of the planted unconditional interrupt instructions actually overwrites the instruction at which the object program is going to resume (the one at which monitoring started). This would effectively bring things to a standstill since the situation will recur indefinitely. If the rules above have been followed, this situation can only arise when a branching instruction includes itself among its possible destinations, i.e., there is a potential loop stop in the object program. In order to cope with this situation, if it could occur, it may be necessary for the Monitor to simulate the action of the branch instruction completely and make the object program bypass it. The loop stop might still occur, but it would be foreseen.

“(2) The second possible reason for a failure of the monitoring operation occurs if one of the planted instructions overwrites part of the data of the object program, thus affecting the latter's behaviour. This 'data' might be a genuine instruction which is examined, as well as obeyed, by the object program. Alternatively it might be genuine data which happens to be stored in a position which is, by accident or design, a 'redundant' destination of a branching instruction. Both of these dangers can be anticipated by the Monitor, at the cost of a more detailed examination of instructions (to find out which store references by the object program involve a replaced instruction location) and more frequent interrupts.

“The situation savours of 'trick' programming. It is apparent that the monitoring process will be simplified if there is some guarantee that these oddities are absent from object programs." (Wetherfield, 1966, pp. 162-163).

7.16 "MAID (Monroe Automatic Internal Diagnosis) is a program that tells a machine how to measure its circuitry and test performance on sample problems - computer hypochondria." (Whiteman, 1966, p. 67).

7.17 “I have used a program which interprets the program under test and makes a plot of the memory address of the instruction being executed versus time. Such a plot shows the time the program spends doing its various jobs. In one case, it showed me an error which caused a loss of time in a program which nevertheless gave correct answers.

“Think of the thousands of dollars saved by tightening up that one most-used program loop. Instrumentation can identify which loop is the most used.” (Sutherland, 1965, pp. 12-13).

7.18 “On-Line Instrumentation will bring us better understanding of the interplay of the programs and data within the computer. Simple devices and programs to keep track, on-line, of what the computer does will bring us better understanding of what our information reprocessing systems are actually doing.” (Sutherland, 1965, p. 9).

7.19 “The process of building a pilot system configuration and then evaluating it, modifying it, and improving it is very costly both in time and money. Another approach is possible. Before he builds the system, the designer should be able to test his concepts on a simulation model of a document retrieval system. One such model for simulating information storage and retrieval systems was designed by Blunt and his co-workers at HRB-Singer, Inc., under a contract with the Office of Naval Research. In this model, the input parameters for the simulation reflects the configuration of the system, the work schedule of the system, the work schedule of the personnel, equipment availability, the likelihood and effect of errors in processing and the location and availability of the system user. Simulation output provides a study of system response time (both delay time and processing time), equipment and personnel work and idle time and the location and size of the data queues. The systems designer can thus vary the inputs, use the model to simulate the interactions among personnel, equipment, and data at each step of the information processing cycle, and then determine the effect on the system response time." (Borko, 1967, p. 55).

7.20 "Simulation is a tool for investigation and, like any tool, is limited to its inherent potential. Moreover, the realization of this potential is dependent upon economics, the craftsmanship of the designer and the ingenuity of the user. Digital simulation can expedite the analysis of a complex system under various stimuli if the aggregate can be divided into elements whose performance can be suitably described. If the smallest elements into which we can divide a system are themselves unpredictable (even in a probabilistic sense) digital simulation is not feasible. (Conway, et al., 1959, p. 94). This feasibility test uncovers an important limitation in today's simulation technology with respect to information systems. In many respects some of the more important man-information-system interactions cannot now be described in a formal manner; hence, cannot be characterized for digital simulation. For example, one can calculate the speed and costs of processing an inquiry, but cannot predict if the output will satisfy the user or estimate its impact on his operations.

"This limitation, therefore, (1) restricts simulation applications to examining the more mechanical aspects of data processing, or (2) forces the design engineer to adopt some simplifying assumptions concerning the effects of man's influence on the system. An example of the first point is a data flow simulation examining the rate of data processing without regard to the quality of the types and mixes of equipment and personnel. This capability for examining the resultant effects in varying parameters of the system enable the design engineer to explore more alternatives in less time and at less cost than ever before; e.g., he can develop cost-capability curves for different possible system configurations under both present and anticipated processing needs. Neglecting this aspect of systems analysis has sometimes led to the implementation of a system saturated by later requirements and confronted by an unnecessary high cost for modification or replacement.” (Blunt et al., 1967, pp. 75-76).

"To use simulation techniques in evaluating different computer systems, one must be able to specify formally the expected job mix and constraints under which the simulated system must operate, e.g., operating time per week. Equally important, one must carefully select a set of characteristics on which the competing systems will be judged. For different installations the most important characteristics may well be different. Each system under consideration is modelled, simulation runs are executed, and the results are compared on the selected characteristics.

“Unfortunately, the ideal case seldom occurs. Often the available information about the computer system's expected job mix is very limited. Furthermore, it is a well-known fact that an installation's job mix itself may be strongly influenced both qualitatively and quantitatively by the proposed changes in the system. For example, many of the difficulties with early time-sharing systems can be attributed to the changes in user practices caused by the introduction of the system. When statistics on job mix are available, they are often expressed in averages. Yet, it may be most important to simulate a system's performance under extreme conditions. Finally, it is often difficult to show that a simulation is valid - that is, that it actually does simulate the system in question.” (Huesmann and Goldberg, 1967, p. 150).

7.21 “The field of information retrieval has been marked by a paucity of mathematical models, and the basis of present operational computer retrieval systems is essentially heuristic in design." (Baker, 1965, p. 150).

“The semantic and linguistic aspects of information retrieval systems also lend themselves poorly to the rigidity of models and model techniques, for which claims often lack empirical support." (Blunt, 1965, p. 105).

7.22 “There are structures which can easily be defined but which present-day mathematics cannot handle because of the limitations of present-day theory.” (Hayes, 1963, p. 284).

"Markov models cannot, in general, be used to represent processes where other than random queuing is used.” (Scherr, 1965, p. 32).

“Clearly, we need some mathematical models permitting the derivation of methods which will accomplish the desired results and for which criteria of effectiveness can be determined. Such models do not appear often in the literature.” (Bryant, 1964, p. 504).

“First, it will be necessary to construct mathematical models of systems in which content, structure, communication, and decision variables all appear. For example, several cost variables are usually included in a typical operations research model. These are either taken as uncontrollable or as controllable only by manipulating such other variables as quantity purchased or produced, time of purchase or production, number and type of facilities, and allocation of jobs to these facilities. These costs, however, are always dependent on human performance, but the relevant variables dealing with personnel, structure, and communication seldom appear in such models. To a large extent this is due to the lack of operational definitions of many of these variables and, consequently, to the absence of suitable measures in terms of which they can be characterized.” (Ackoff, 1961, p. 38).

“Mathematical analysis of complex systems is very often impossible; experimentation with actual or pilot systems is costly and time consuming, and the relevant variables are not always subject to control. . .

“Simulation problems are characterized by being mathematically intractable and having resisted solution by analytic methods. The problems usually involve many variables, many parameters, functions which are not well-behaved mathematically, and random variables. Thus, simulation is a technique of last resort.” (Teichroew and Lubin, 1966, p. 724).

“The complex systems generally encountered in the real world do not lend themselves to neat mathematical formulations and in most cases the operations analyst is forced to reduce the problem to simpler terms to make it tractable." (Clapp, 1967, p. 5).

“Admittedly the degree to which identifiable factors can be measured - compared to the influence of unidentifiable factors - does help determine whether or not an approach can be scientific. It acts as a limit on the area where scientific methods can be applied. Precision in model building is relating to the difficulty of the problem and the state of human knowledge concerning specific techniques and their application.” (Kozmetsky and Kircher, 1956, p. 137).

7.23 “There is no guarantee that a model such as latent class analysis, factor analysis, or anything else borrowed from another field will meet the needs of its new context; however this should not dissuade one from investigating such plausible models.” (Baker, 1965, p. 150).

"Models must be used but must never be believed. As T. C. Chamberlain said, 'science is the holding of multiple working hypotheses'." (Tukey and Wilk, 1966, p. 697).

7.24 “System simulation or modeling was subsequently proposed as a substitute for deriving test problems and is still generally accepted as such even though its use introduced the new difficulty of determining and building meaningful models." (Davis, 1965, p. 82).

“The biggest problem in simulation modeling, as in all model building, is to retain all 'essential' detail and remove the nonessential features." (Scherr, 1965, p. 109).

"The fundamental problem in simulation of digital networks is that of economically constructing a mathematical model capable of faithfully replicating the real network's behavior in regard to simulation objectives." (Larsen and Mano, 1965, p. 308).

7.25 “At this time, there exists no special-purpose simulation programming language specifically for use with models of digital computer systems. The general-purpose languages, such as SIMSCRIPT, GPSS, etc., all have faults which render them unsuitable for this type of work." (Scherr, 1965, p. 43).

“The invention of an adequate simulation language promises to be the crowbar needed to bring the programming of operating systems to the level of sophistication of algebraic or commercial programming languages." (Perlis, 1965, p. 192).

"The technique of input simulation ... can be very expensive. The programs necessary to create the simulated inputs are far from trivial and may well constitute a second system larger than the operational system.” (Steel, 1965, p. 232).

7.26 “Those programs which require the simulated computer system and job mix to be specified in algebraic or assembly languages have proved useful; but as general computer systems simulation tools, they require too much difficult recording to be

completely satisfactory. One way to improve upon this situation has been to use languages specifically designed to simulate systems. Teichroew and Lubin in a recent review have listed more than twenty languages, among them GPSS, SIMSCRIPT, SOL, and CSL. These simulation languages allow the modeller to specify the computer configuration and job mix in a more convenient manner.” (Huesmann and Goldberg, 1967, p. 152).

7.27 “One of the more exotic applications of digital computers is to simulate a digital computer on another entirely different type of computer. Using a simulation program, application programs developed for the first computer, the source computer, may be executed on a second computer, the object computer.

“Simulation obviously provides many advantages in situations where a computer is replaced by a different computer, for which the applications have not yet been programmed. Simulation techniques enable an installation to continue solving problems using existing programs after the new computer has been installed and the old one removed....

"Another situation in which simulation is advantageous is during the development of a new computer. Once specifications for the new computer have been established, programming of applications for the computer can proceed in parallel with hardware developments. The use of a simulator in this situation enables the users to debug their applications before the hardware is actually available." (Trimble, 1965, p. 18).

"One of the most successful applications of the recent microprogramming technology is in the simulation of computers on computers.

“The microprogram control and the set of microprogram routines are in effect a simulation program that simulates the programmer's instruction set on a computer whose instruction set is the set of elementary operations. It may be equally possible to simulate computers with other programmer instruction sets in terms of the same set of elementary operations. This, slightly oversimplified perhaps, is the idea of hardware assisted simulation that is now usually called emulation." (Rosen, 1968, p. 1444).

“As a result of simulation's ability to deal with many details, it is a good tool for studying extensive and complicated computer systems. With simulation, one may assess the interaction of several subsystems, the performances of which are modified by internal feedback loops among the subsystems. For instance, in a teleprocessing system where programs are being read from drum storage and held temporarily in main storage, the number of messages in the processing unit depends upon the drum response time, which depends upon the drum access rate, which, in turn, depends upon the number of messages in main storage. In this case, only a system-wide simulation that includes communication lines, processing unit, and I/O subsystems will determine the impact of varying program priorities on main-storage usage. Studies of this nature can become very time consuming unless parameter selections and variations are carefully limited. It is no small problem to determine which are the major variations that affect the system. In this aspect, simulation is not as convenient as algorithmic methods with which many variations can be tabulated quickly and cheaply." (Seaman, 1966, p. 177).

7.28 "The [SIMSCRIPT] notation is an augmented version of FORTRAN, which is acceptable; but this organization does not take advantage of the modularity of digital systems.

"SIMSCRIPT is an event-based language. That is, the simulation is described, event by event, with small programs, one per event. Each event program (or sub-program) must specify the times for the events following it. Conditional scheduling of an event is extremely difficult." (Scherr, 1965, p. 43).

7.29 Edwards points out that “... the preparation of so-called scenarios, or sequences of events to occur as inputs to the simulation, is a major problem, perhaps the most important one, in the design of simulations, especially simulations of information-processing systems.” (Edwards, 1965, p. 152).

7.30 “Parallel processes can be rendered sequential, for simulation purpose; difficulties then arise when the processes influence each other, leading perhaps to incompatibilities barring a simultaneous development. Difficulties of this type cannot be avoided, as a matter of principle, and the system is thus not deterministic; the only way out would be to restore determinism through recourse to appropriate priority rules. This approach is justified only if it reflects priorities actually inherent in the system." (Caracciolo di Forino, 1965, p. 18).

7.31 “As a programming language, apart from simulation, SIMULA has extensive list processing facilities and introduces an extended co-routine concept in a high-level language." (Dahl and Nygaard, 1966, p. 671).

7.32 “The LOMUSS model of the Lockheed UNIVAC on-line, time-sharing, remote terminal system simulated two critical periods ... and provided information upon which the design of the 1108 system configuration was based. An effort is continuing which will monitor the level and characteristics of the workload, equipment utilization, turnaround time, etc., for further model validation.” (Hutchinson and Maguire, 1965, pp. 166-167).

“A digital computer is being used to simulate the logic, determine parts values, compute subunit loading, write wiring lists, design logic boards, print checkout charts and maintenance charts. Simulating the logic and computing the loading of subunits gives assurance that a computer design will function properly before the fabrication starts. After the logic equations are simulated, it is a matter of hours until all fabrication information and checkout information is available. Examples are given of the use of these
