
". . . thin films of MnBi which are used for Curie point magnetic recording." (Carlson and Ives, 1968, p. 1).

6.158 "The results of the studies described in this paper have established laser heat-mode recording as a very high resolution real-time recording process capable of using a wide variety of thin film recording media. The best results were obtained with images which are compatible with microscope-type optics. The signals are in electronic form prior to recording and can receive extensive processing before the recording process occurs. In fact, the recordings can be completely generated from electronic input." (Carlson and Ives, 1968, p. 7).

6.159 "Instead of recording a bit as a hole in a card, it is recorded on the file as a grating pattern of a certain spacing . . . A number of different grating patterns with different spacings can be superposed and when light passes through, each grating bends the light its characteristic amount, with the result that the pattern decodes itself . . . The new system allows for larger areas on the film to be used and lessens dust sensitivity and the possibility of dirt and scratch hazards." (Commun. ACM 9, No. 6, 467 (June 1966)).

6.160 "Both black-and-white and color video recordings have recently been made on magnetic film plated discs. .

"Permanent memory systems employing silverhalide film exposed by electron or laser beams. It is possible to record a higher density with beams. Readout at an acceptable error rate is the major problem." (Gross, 1967, p. 5).

"A recently conceived memory which uses optical readout. Instead of recording bits as pulses, bits are recorded as frequencies. An electron beam, intensity modulated with the appropriate frequencies, strikes the electron sensitive silver-halide film moving transverse to the direction of tape motion . . ." (Gross, 1967, p. 8).

"For recording analog information Ampex has focussed efforts on silver-halide film . . .[which] can be made sensitive to either electron or laser beams... packing density is an order of magnitude greater than the most dense magnetic recording." (Gross, 1967, p. 6).

"Recent work at Ampex indicates that the Kerr magneto-optic effect is likely to be practical for reading digital information. Recording on reflective plated tape for magneto-optic reproducing can be done by local heating with a laser or electron beam." [Erasable, potential density 1 × 10⁸.] (Gross, 1967, p. 8).

6.161 "At first glance, machining with electron beams, or adding ions, appear to be suitable for recording digital information. However, problems in obtaining sufficient linearity in the transfer function (the dynamic range and signal-to-noise limits), and accurately positioning the electron beam for reading make it impossible to read out the potential recording density with acceptable error rates.

"The reduced packing density necessary for acceptable error rates causes these approaches to suffer by comparison with magnetic recording." (Gross, 1967, p. 6).

6.162 "The advantages of electron beams over light are a thousandfold increase in energy density, easy control of intensity and position, and a substantial increase in resolution. To offset these advantages, there are the complications of a demountable vacuum system." (Thornley et al., 1964, p. 36).


6.163 "... Some [media] like thermoplastics, involve nearly reversible changes, and the noise content therefore rises with use." (Gross, 1967, p. 2).

6.164 "The standing-wave read-only memory is based on the Lippmann process .. [in which] a panchromatic photographic emulsion is placed in contact with a metallic mirror . . . Sufficiently coherent light passes normally through the emulsion, reflects from the mirror, and returns through the emulsion. This sets up standing waves with a node at the metallic mirror surface. Developable silver ions form in the regions of the antinodes of the standing wave . . . If several anharmonic waves are used to expose the same region of the emulsion, each will set up a separate layer structure... Conceivably, n color sources spaced appropriately over the band of sensitivity could provide n information bits, one per color, at each location". (Fleisher et al., 1965, p. 1).

Advantage would then be taken of "... the Bragg effect, which causes the reflected light to shift to shorter wavelength as the angle of incidence increases. . . With this method, a monochromatic light source, say of violet color, could read out the violet bit at normal incidence and the red bit at the appropriate angle from normal. Hence, a single monochromatic source, such as a laser, could be used to read out all bits . . ." (Fleisher et al., 1965, p. 2).
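The angle-for-wavelength trade described here is the Bragg condition for the Lippmann layer structure: layers recorded at wavelength λr reflect λr cos θ at incidence angle θ, so one short-wavelength source reads each color bit at its own angle. A small worked example in Python (the recorded wavelengths are assumed, not taken from the source):

    import math

    # Wavelengths (nm) at which bits were recorded, one colour per bit.
    RECORDED = {"violet": 410, "green": 530, "red": 650}   # assumed values
    SOURCE = 410   # single monochromatic (violet) readout source, nm

    # Bragg condition for the layer structure: the wavelength reflected at
    # angle theta is lambda_recorded * cos(theta), so the readout angle for
    # a bit satisfies cos(theta) = SOURCE / lambda_recorded.
    for colour, lam in RECORDED.items():
        theta = math.degrees(math.acos(SOURCE / lam))
        print(f"{colour} bit ({lam} nm): read at {theta:.1f} deg from normal")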

Further, "random word selection requires a summation of various injection lasers. . . or the use of a white light source in which all colors are present. This source is then deflected to the selected location by the electro-optical deflector. The output from the memory plane is then separated into the various colors by means of a prism or other dispersive medium for a parallel bit readout". (Fleisher et al., 1965, p. 19).

6.165 "A feature of the SWROM [standing-wave read-only memory] which appears to be unique is its capability of storing both digital and video (analog) information. This feature, combined with the capability of the memory for simultaneous, multibit readout with minimal cross talk, will give the SWROM an even wider range of application." (Fleisher et al., 1965, p. 25).

6.166 "Parallel word selection . . . could be

accomplished by fiber-optic light splitting. It could also be accomplished by flooding the area to be read out with monochromatic light whose frequency

is that of the bit or series of bits to be selected. This type of word selection would be useful for associative word selection." (Fleisher et al., 1965, p. 21).

7. Debugging, On-Line Diagnosis, Instrumentation, and Problems of Simulation

7.1 "The quantity and quality of debugging must be faced up to in facility design. This is perhaps the area which has been given more lip service and less attention than any other." (Wagner and Granholm, 1965, p. 284).

"Software checkout still remains an unstructured art and leaves a lot to be desired for the production of perfect code." (Weissman, 1967, p. 31).

"Debugging, regardless of the language used, is one of the most time consuming elements in the software process." (Rich, 1968, p. 34).

"It has been suggested that we are now entering an era in which computer use is 'debugging-limited'." (Evans and Darley, 1966, p. 49).

7.2 "As computing systems increase in size and complexity, it becomes both more important and more difficult to determine precisely what is happening within a computer. The two sorts of performance measurements which are readily available are not very useful; they are the microtiming information provided by the manufacturer (.4 microseconds/floating add) and the macro-timing information provided by the user ("why does it take three days to get my job back?"). The relationship, if any, between them is obscured by the intricate bulk of the operating system; if it is a multi-programming or time-sharing system, the obscurity is compounded.

"The tools available to the average installation for penetrating this maze are few and inadequate. Simulation is not particularly helpful: the information which is lacking is the very information necessary for the construction of an accurate model. Trace routines interfere excessively with the operation of the system, distorting storage requirements as well as relative timing information. Hardware monitors are not generally available, and though a wondrous future is foreseen for certain of them, they have yet to demonstrate their capabilities in an operational environment; furthermore, they are certain to be too costly for permanent installation, and perhaps too cumbersome for temporary use. The peripheral processor of the Control Data 6000 series computers, however, provides some installations with an easily utilized, programmable hardware monitor for temporary use at no extra cost.' (Stevens, 1968, p. C34).


"Without instrumentation, the user is swimming against the tide of history. It is commonly thought that a good programmer naturally achieves at least 80% of the maximum potential efficiency for a program. But while systems have increased greatly in size and complexity, the average expertise of programmers has decreased. In fact, it is axiomatic

that virtually any program can be speeded up 25 to 50% without significant redesign! Unless monitored and measured, a program's efficiency may easily be as low as 25%. What is worse, multiprogramming, multiprocessing, real time, and other present-day methods have created such a jumble of interactions and interferences that without instrumentation it would be impossible to know where effort applied for change would yield the best return. One tries to mine the highgrade ore first, while it still exists.” (Bemer and Ellison, 1968, p. C40).

7.3 For example, "another practical problem, which is now beginning to loom very large indeed and offers little prospect of a satisfactory solution, is that of checking the correctness of a large program." (Gill, 1965, p. 203).

"With the introduction of time-sharing systems, the conventional tools have become almost worthless. This has forced a reappraisal of debugging procedures. It has become apparent that a new type of debugging tool is needed to handle the special problems created by a time-sharing system. These special problems result from the dynamic character of a time-sharing system-a system in which the program environment is continually changing, a system in which a user is unaware of the actions of other users, a system in which program segments are rolled in and out of storage locations, and a system in which one copy of code can be shared by many users. To debug in this dynamic environment, the programmer needs a debugging support system-a set of debugging programs that operate largely independently from the operating system they service. . .

"What is needed for time-sharing is a debugging support system that meets the following require

ments:

● The system should permit a system programmer at a user terminal to debug system programs associated with his task. When used in this manner, the support system should operate in a time-sliced mode.

● When used to debug a separate task, the support system should provide the facility to modify a system program in relation to that task, without affecting the program as executed in relation to other tasks.

● When a system program bug cannot be located and repaired from a user terminal, the support system should permit a skilled system programmer at a central location to suspend time-sharing activity until the error is located and repaired. The support system should then permit time-sharing activity to be resumed as though there had been no interruption.

● The support system should permit a system programmer to monitor the progress of any task from a remote terminal or from the user's terminal.

● The support system should contain the facility to remain dormant until activated by a specified condition. When activated by the condition, the system should be able to gather specified data automatically and then permit processing to continue.

● In its dormant state, the support system should not impact the performance of the parent time-sharing system.

● The support system should use a minimum of main storage and reside primarily on high-speed external storage.

● The support system should be completely independent of the time-sharing system (that is, it must use none of the facilities of the parent system), and it must be simple enough to eliminate any requirement for a support system of its own.

"An effort is currently under way to produce a time-sharing support system that meets these requirements." (Bernstein and Owens, 1968, pp. 7, 9). "There has been far too little concern on the part of the hardware system designers with the problems of debugging of complex programs. Hardware aids to program debugging would be among the most important hardware aids to software production. On-line debugging is essential. It should be possible to monitor the performance of software on a cathode ray tube console, without interfering with the performance of the software. It should be possible to examine areas of peripheral storage as well as areas of core storage." (Rosen, 1968, p. 1448).

Further, "the error reporting rate from a program system of several million instructions is sufficient to occupy a staff larger than most computing installations possess." (Steel, 1965, p. 233).

7.4 "By online debugging we mean program debugging by a programmer in direct communication with a computer (through, typically, a typewriter or teletype), making changes, testing his program, making further changes, etc., all with a reasonably short response time from the computer, until a satisfactory result is achieved." (Evans and Darley, 1965, p. 321).

7.5 "Another area of contact between hardware and debugging is involved with trapping . . . The user may ask for a trap on any combination of a number of conditions, such as a store into a specified register, execution of an instruction at a specified location, or execution of any skip or jump instruction. The debugging program handles the interrupt and reports the relevant information to the user. (Evans and Darley, 1966, p. 44).

It is to be noted that although these authors, as of 1966, were concerned that "very little data seems to exist on the relative efficiency of on-line program debugging versus debugging in a batch-processing mode." (Evans and Darley, 1966, p. 48), by 1968 Sackman et al. could report "on the basis of the concrete results of these experiments, the online conditions resulted in substantially and, by and large, significantly better performance for debug man hours than the offline conditions." (Sackman et al., 1968, p. 8).

7.6 "We have, in general, merely copied the on-line assembly-language debugging aids, rather than design totally new facilities for higher-level languages. We have neither created new graphical formats in which to present the debugging information, nor provided a reasonable means by which users can specify the processing required on any available debugging data.

"These features have been largely ignored because of the difficulty of their implementation. The debugging systems for higher-level languages are much more complex than those for assembly code. They must locate the symbol table, find the beginning and end of source-level statements, and determine some way to extract the dynamic information-needed for debugging-about the program's behavior, which is now hidden in a sequence of machine instructions rather than being the obvious result of one machine instruction. Is it any wonder that, after all this effort merely to create a minimal environment in which to perform on-line higher-level language debugging, little energy remained for creating new debugging aids that would probably require an increased dynamic information-gathering capability.

"EXDAMS (EXtendable Debugging And Monitoring System) is an attempt to break this impasse by providing a single environment in which the users can easily add new on-line debugging aids to the system one-at-a-time without further modifying the source-level compilers, EXDAMS, or their programs to be debugged. It is hoped that EXDAMS will encourage the creation of new methods of debugging by reducing the cost of an attempt sufficiently to make experimentation practical. At the same time, it is similarly hoped that EXDAMS will stimulate interest in the closely related but largely neglected problem of monitoring a program by providing new ways of processing the program's behavioral information and presenting it to the user. Or, as a famous philosopher once almost said, 'Give me a suitable debugging environment and a tool-building facility powerful (and simple) enough, and I will debug the world'." (Balzer, 1969, p. 567).

7.7 "Diagnostics have been almost nonexistent as a part of operating software and very weak as a part of maintenance software. As a result needless time is spent determining the cause of malfunctions; whether they exist in the program, the hardware, the subsets, the facilities or the terminals." (Dantine, 1966, pp. 405-406).

7.8 "Another advantage of computer simulation is that it may enable a system's manager to shrink

the anticipated real world life of his system into a relatively short span of simulation time. This capability can provide the manager with a means of examining next week's (month's, year's) production problems this week; thus he can begin to anticipate the points where the operations will require modification. Moreover, he can examine alternative courses of action, prior to their being implemented in the system, to determine which decision is most effective. For example, the manager can increase the processing load in the simulation to determine where the saturation points are. Once these have been determined, he can hold these overloading states constant and vary the other variables (e.g., number of service units, types of devices, methods of operations) to determine how best to increase the system's capacity." (Blunt et al., 1967, p. 76).
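The overload experiment described can be stated concretely with a single-server queue: hold the service rate fixed, raise the arrival rate toward capacity, and watch the time in system explode. A minimal simulation sketch in Python (the rates and job counts are arbitrary illustration values):

    import random

    def simulate_mm1(arrival_rate, service_rate=1.0, n_jobs=20_000, seed=1):
        # Single-server queue: mean time a job spends in the system.
        rng = random.Random(seed)
        clock = server_free = total = 0.0
        for _ in range(n_jobs):
            clock += rng.expovariate(arrival_rate)    # next arrival
            start = max(clock, server_free)           # wait if server is busy
            server_free = start + rng.expovariate(service_rate)
            total += server_free - clock              # time in system
        return total / n_jobs

    # "Increase the processing load ... to determine where the saturation
    # points are": response time grows without bound as load nears capacity.
    for load in (0.5, 0.7, 0.9, 0.95, 0.99):
        print(f"load {load:.2f}: mean time in system {simulate_mm1(load):.1f}")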

Mazzarese (1965) describes the Air Force Cambridge DX-1 system with a "dual computer concept" that permits investigators to change computer logic and configuration in one machine without interference to programs which run on its interconnected mate, especially for study of real-time data filtering operations.

7.9 "A technique for servicing time-shared computers without shutting them down has been developed by Jesse T. Quatse, manager of engineering development in the Computation Center at the Carnegie Institute of Technology. The technique is called STROBES, an acronym for sharedtime repair of big electronic systems. It includes a test program to exercise the computer, and modified test gear to detect faults in the system." (Electronics 38, No. 18, 26 (1965)).

7.10 "Diagnostic engineering begins in the initial phases of system design. A maintenance strategy is defined and the system is designed to include features necessary to meet the requirements of this strategy. Special features, known as 'diagnostic handles', are needed for testing the system automatically, and for providing adequate error isolation." (Dent, 1967, p. 100).

"An instantaneous alarm followed by a quick and correct diagnosis in a self-controlling system will limit down-time in many cases to the mere time of repair. Instruments for error detection are unnecessary." (Steinbuch and Piske, 1963, p. 859).

7.11 Further, "when a digital system is partitioned under certain restrictions into subsystems it is possible to achieve self-diagnosis of the system through the mutual diagnosis of its subsystems." (Forbes et al., 1965, p. 1074).

"A diagnostic subsystem is that portion of a digital system capable of effectively diagnosing another portion of the digital system. It has been shown that at least two mutually exclusive diagnostic subsysare needed in self-diagnosable systems. (Forbes et al., 1965, p. 1081).

tems
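This mutual-diagnosis idea is usually formalized as in the PMC model of the same period: a fault-free unit tests its neighbour reliably, a faulty unit's verdicts are arbitrary, and the pattern of verdicts (the syndrome) identifies the faulty unit. A sketch for three units testing in a cycle (Python; the decoding rule assumes at most one fault):

    def diagnose(syndrome):
        # syndrome[i] = 1 if unit i judged unit (i + 1) % 3 faulty.
        if not any(syndrome):
            return None                   # nobody accused: fault-free system
        for k in range(3):
            # The faulty unit is accused by its (trustworthy) predecessor,
            # while its successor is passed by a trustworthy tester.
            if syndrome[(k - 1) % 3] == 1 and syndrome[(k + 1) % 3] == 0:
                return k

    print(diagnose([1, 0, 0]))   # -> 1 (unit 1 faulty; its own verdict was 'pass')
    print(diagnose([1, 1, 0]))   # -> 1 (unit 1 faulty; its own verdict was wrong)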

7.12 "Systems are used to test themselves by generation of diagnostic programs using predefined data sets and by explicit controls permitting

degradation of the environment." (Estrin et al., 1967, p. 645).

"The Nightwatchman' experiments are directed toward the maintenance problem. Attempts will be made to structure a maintenance concept that will allow for the remote-automatic-checkout of all the computers in the network from a single point. The concept is an extension of the 'FALT' principle mentioned previously. Diagnostic programs will be sent over the lines, during off-use time, to check components, aggregates of components, complete modules, and the entire system. The 'Sentinel' station of the network will be responsible for the gathering of statistical data concerning the data, the queries, the traffic, and the overall operations.' (Hoffman, 1965, pp. 98-100.)

"The Sentinel is the very heart of the experimental network. It is charged with the gathering of the information needed for long range planning, the formulation of data automation requirements, and the structuring of prototype systems. In addition to the gathering of statistical data, the sentinel will be the control center for the network, generating priority, policy, and operational details. The responsibility for the observance of security and proprietary procedures will rest with the sentinel station." (Hoffman, 1965, p. 100.)

"This data was taken by a program written to run as part of the CTSS Supervisory Program. The data-taking program was entered each time the Scheduling Algorithm was entered and thus was able to determine the exact time of user state changes." (Scherr, 1965, pp. 27-28).

"Data taken over the summer of 1964 by T. Hastings .. indicates that the average program accesses (i.e., reads or writes) approximately 1500 disk words per interaction." (Scherr, 1965, p. 29).

"We can and will develop instrumentation which will be automatically inserted at compile time. A user easily will be able to get a plot of the various running times of his program

Sutherland also refers to a Stanford University program which "plots the depth of a problem tree versus time was used to trace the operation of a Kalah-playing program. (Sutherland, 1965, pp. 12-13).
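Automatically inserted instrumentation of this sort can be approximated today by wrapping each routine at definition time. A hedged sketch in Python (the decorator stands in for compile-time insertion):

    import collections
    import functools
    import time

    timings = collections.defaultdict(list)

    def instrumented(fn):
        # Collect every call's running time for later plotting.
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[fn.__name__].append(time.perf_counter() - t0)
        return wrapper

    @instrumented
    def work(n):
        return sum(i * i for i in range(n))

    for n in (1000, 2000, 4000):
        work(n)
    print(timings["work"])   # the data behind "a plot of the various running times"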


7.13 "The techniques of fault detection fall into two major categories:

1. Concurrent diagnosis by the application of error-detecting codes and special monitoring circuits. Detection occurs while the system is being used.

2. Periodic diagnosis using diagnostic hardware and/or programs. Use of the system is interrupted for diagnosis." (Avižienis, 1967, p. 734).

"The four principal techniques of correction are:

1. Correction of errors by the use of error-correcting codes and associated special purpose hardware and/or software (including recomputation).


2. Replacement of the faulty element or system by a stand-by spare.

3. Replacement as above, with subsequent maintenance of the replaced part and its return to the stand-by state.

4. Reorganization of the system into a different fault-free configuration which can continue the specified task." (Avižienis, 1967, p. 734).
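Technique 1, correction by error-correcting code, is worth a concrete instance. A minimal Hamming (7,4) sketch in Python: three parity bits cover overlapping sets of positions, and the syndrome, read as a binary number, names the flipped position directly.

    def encode(d):   # d: four data bits
        p1 = d[0] ^ d[1] ^ d[3]   # parity over positions 1, 3, 5, 7
        p2 = d[0] ^ d[2] ^ d[3]   # parity over positions 2, 3, 6, 7
        p4 = d[1] ^ d[2] ^ d[3]   # parity over positions 4, 5, 6, 7
        return [p1, p2, d[0], p4, d[1], d[2], d[3]]   # positions 1..7

    def correct(c):   # c: seven bits, at most one of them flipped
        syndrome = 0
        for pos, bit in enumerate(c, start=1):
            if bit:
                syndrome ^= pos
        if syndrome:                  # a nonzero syndrome names the bad position
            c[syndrome - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]   # the recovered data bits

    word = encode([1, 0, 1, 1])
    word[4] ^= 1                      # a single-bit fault in storage
    print(correct(word))              # -> [1, 0, 1, 1]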


"The STAR (Self-Testing and Repairing) computer, scheduled to begin experimental operation at the Jet Propulsion Laboratory of the California Institute of Technology this fall, is expected to be one of the first computers with fully automatic selfrepair as one of its normal operating functions There are three 'recovery' functions of the STAR computer: (1) detection of faults; (2) recognition of temporary malfunctions and of permanent failures; and (3) module replacement by power switching. The occurrence of a fault is detected by applying an error-detecting code to all instructions and numbers within the computer. Temporary malfunctions are corrected by repeating a part of the program. If the fault persists, the faulty module is replaced." (Avižienis, 1968, p. 13).

7.14 "Diagnostic routines can check the operating of a computer for the following possible malfunctions: a single continuous malfunction, several continuous malfunctions, and intermittent malfunctions. When the test routine finds an error it can transfer program control to an appropriate malfunction isolation subroutine. This type of diagnostic technique is standard and has been well used by the computer industry for large package replacement." (Jacoby, 1959, p. 7–1).

"Needless to say, in order for any malfunction to be isolated by an automatic program, it is necessary for a minimum amount of equipment to function adequately. One of the functions of this minimum equipment includes the ability to sequence from one instruction to another, and to be able to interpret (correctly) and execute at least one transfer of control instruction so that logical choices can be made. The control functions of a computer can be defined as Boolean algebraic expressions of the instantaneous state of the computer. If we state that a line, or path, common to two control statements contains those components that are activated when either of the statements is true, this line is either a factor of both statements or a factor of terms of both statements. Similarly, if we consider circuit elements activated by one but not both of two ways to accomplish the same control function, we have a picture of two terms in the algebraic statement for the control function separated by the connector OR.

"A Boolean term will appear as a circuit which must be active for any statement, of which it is a factor, to be true. Hence the location of circuit malfunctions may be considered from the point of view of isolating the minimal Boolean term involved." (Jacoby, 1959, p. 7–1).

7.15 "A program testing method based on the monitoring of object-program instruction addresses (as opposed to a method dependent on, e.g., the occurrence of particular types of instruction, or the use of particular data addresses) would appear to be the most suitable, because the instruction address is the basic variable of this monitoring technique. Monitoring could be made 'selective' by specifying instruction addresses at which it is to start and stop: to start it at an arbitrary instruction address it is only necessary to replace the instruction located there by the first unconditional interrupt inserted, and similarly when monitoring is to stop and restart later.

"Another use in this field would be to include in the Monitor facilities for simulating any instruction, and to supply it with details of particular instructions suspected of malfunctioning. The Monitor could then stop any program just before one of these instructions was to be obeyed, simulate it, allow the program to execute the same instruction in the normal way, and then compare the results obtained by the normal action and by simulation.” (Wetherfield, 1966, p. 165).

"Of course, having achieved the aim of being able to trace in advance the exact course of the object program's instructions, the Monitor is then able to simulate their actions to any desired degree, and it is here that the power of the technique can be exploited. The contents of the store, registers, etc., before the execution of any instruction can be inspected by the Monitor if it temporarily replaces that instruction by an unconditional interrupt." (Wetherfield, 1966, p. 162).

"The monitoring operation can go wrong for any of the following three reasons.

"(1) In the first case one of the planted unconditional interrupt instructions actually overwrites the instruction at which the object program is going to resume (the one at which monitoring started). This would effectively bring things to a standstill since the situation will recur indefinitely. If the rules above have been followed, this situation can only arise when a branching instruction includes itself among its possible destinations, i.e., there is a potential loop stop in the object program. In order to cope with this situation, if it could occur, it may be necessary for the Monitor to simulate the action of the branch instruction completely and make the object program bypass it. The loop stop might still occur, but it would be foreseen.

"(2) The second possible reason for a failure of the monitoring operation occurs if one of the planted instructions overwrites part of the data of the object program, thus affecting the latter's behaviour. This 'data' might be a genuine instruction which is examined, as well as obeyed, by the object program. Alternatively it might be genuine data which happens to be stored in a position which is, by accident or design, a 'redundant' destination of a branching instruction. Both of these dangers can be anticipated by the Monitor, at the cost of a more
