Research and Development in the Computer and Information Sciences: Overall system design considerations; a selective literature review

time would be unlikely on a lightly loaded system. Second, programmers must simulate at night their counterparts of laymen users. Unfortunately, these two types of people tend to use application programs differently and to make different types of errors; so program debugging is again limited. Therefore, because the same system is used for both service and development, programs checked as rigorously as possible can still cause system failures when they are installed during actual service hours." (Castleman, 1967, p. 17).

"Protection of a disk system requires that no user be able to modify the system, purposely or inadvertently, thus preserving the integrity of the software. Also, a user must not be able to gain access to, or modify any other user's program or data. Protection in tape systems is accomplished: (1) by making the tape units holding the system records inaccessible to the user, (2) by making the input and output streams one-way (e.g., the input file cannot be backspaced), and (3) by placing a mark in the input stream which only the system can cross. In order to accomplish this, rather elaborate schemes have been devised both in hardware and software to prevent the user from accomplishing certain input-output manipulations. For example, in some hardware, unauthorized attempts at I/O manipulation will interrupt the computer.

"In disk-based systems, comparable protection devices must be employed. Since many different kinds of records (e.g., system input, user scratch area, translators, etc.) can exist in the same physical disk file, integrity protection requires that certain tracks, and not tape units, must be removed from the realm of user access and control. This is usually accomplished by partitioning schemes and central I/O software systems similar to those used in tapebased systems. The designer must be careful to preserve flexibility while guaranteeing protection." (Rosin, 1966, p. 242).

2.62 "Duplex computers are specified with the spare and active computers sharing I/O devices and key data in storage, so that the spare computer can take over the job on demand." (Aron, 1967, p. 54).

"The second channel operates in parallel with the main channel, and the results of the two channels are compared. Both channels must independently arrive at the same answer or operation cannot proceed. The duplication philosophy provides for two independent access arms on the Disk Storage Unit, two core buffers, and redundant power supplies." (Bowers et al., 1962, p. 109).

"Considerable effort has been continuously directed toward practical use of massive triple modular redundancy (TMR) in which logic signals are handled in three identical channels and faults are masked by vote-taking elements distributed throughout the system." (Avižienis, 1967, p. 735).

"He must give consideration to 1) back-up power supplies that include the communications gear, 2) dual or split communication cables into his data

center, 3) protection of the center and its gear from fire and other hazards, 4) insist that separate facilities via separate routes and used to connect locations on the MIS network, and 5) build extra capacity into the MIS hardware system." (Dantine, 1966, p. 409).

"It is far better to have the system running at half speed 5% of the time with no 100% failures than to have the system down 22% of the time." (Dantine, 1966, p. 409).

"Whenever possible, the two systems run in parallel under the supervision of the automatic recovery program. The operational system performs all required functions and monitors the back-up system. The back-up system constantly repeats a series of diagnostic tests on the computer, memory and other modules available to it and monitors the operational system. These tests are designed to maintain a high level of confidence in these modules so that should a respective counterpart in the operational system fail, the back-up unit can be safely substituted. The back-up system also has the capability of receiving instructions to perform tests on any of its elements and to execute these tests while continuing to monitor the operational system to confirm that the operational system has not hung up." (Armstrong et al., 1967, p. 409).

2.63 "The large number of papers on votetaking redundancy can be traced back to the fundamental paper of Von Neuman where multipleline redundancy was first established as a mathematical reality for the provision of arbitrarily reliable systems." (Short, 1968, p. 4).

2.64 "A computer system contains protective redundancy if faults can be tolerated because of the use of additional components or programs, or the use of more time for the computational tasks. . .

"In the massive (masking) redundancy approach the effect of a faulty component, circuit, signal, subsystem, or system is masked instantaneously by permanently connected and concurrently operating replicas of the faulty element. The level at which replication occurs ranges from individual circuit components to entire self-contained systems." (Avižienis, 1967, p. 733–734).

2.65 "An increase in the reliability of systems is frequently obtained in the conventional manner by replicating the important parts several (usually three) times, and a majority vote. . . A technique of diagnosis performed by nonbinary matrices. . require, for the same effect, only one duplicated part. This effect is achieved by connecting the described circuit in a periodically changing way to the duplicated part. If one part is disturbed the circuit gives an alarm, localizes the failure and simultaneously switches to the remaining part, so that a fast repair under operating conditions (and without additional measuring instruments) is possible." (Steinbuch and Piske, 1963, p. 859).

"Depending upon the attainable P, and Pnd, the theoretical reliability of a multi-module computing system may be degraded by adding more than a minimal amount of redundancy. For example, P=0.025 . . . it is more reliable to have only one spare module rather than two or four, for a typical current-day Pnd such as 0.075. Even for a Pnd as low as 0.03 (a very difficult Pnd to achieve in a computer), the improvement obtained in system reliability by adding a second spare unit to the system is minor." (Wyle and Burnett, 1967, pp. 746, 748).

k=(n-m) n!

k=1

(Pƒ) * (1 − Pƒ) n−k

(n−k!)k! Pķ(1 − Pƒ) n−k[1 − (1 − Pna)*].

(Wyle and Burnett, 1967, p. 746).

2.67 "One of the system design considerations is the determination of the optimum number of redundant units by means of which the required system reliability is to be reached. It will be seen that Pnd as well as P, must be considered in determining the most economical design." (Wyle and Burnett, 1967, p. 748).

"One of the prime requisites for a reliable, dependable communications data processing system is that it employ features for insuring message protection and for knowing the disposition of every message in the system (message accountability) in case of equipment failures. The degree of message protection and accountability will vary from application to application." (Probst, 1968, p. 21).

"Elaborate measures are called for to guarantee message protection. At any given moment, a switching center may be in the middle of processing many different messages in both directions. If a malfunction occurs in any storage or processing device, there must be enough information stored elsewhere in the center to analyze the situation, and to repeat whatever steps are necessary. This means

that any item of information must be stored in at least two independent places, and that the updating of queue tables and other auxiliary data must be carefully synchronized so that operation can continue smoothly after correction of a malfunction. If it cannot be determined exactly where a transmission was interrupted, procedures should lean toward pessimism. Repetition of a part of a message is less grievous than a loss of part of it." (Shafritz, 1964, p. N2.3-3).

"Reference copies are kept on magnetic tapes for protective accountability of each message. Random requests for retransmission are met by a computer search of the tape, withdrawal of the required messages and automatic reintroduction of the message into the communications system." (Jacobellis, 1964, p. N2.1-2).

"Every evening, the complete disc file inventory is pruned and saved on tape to be reloaded the following day. This gives a 24-hour 'rollback' capability for catastrophic disc failures." (Schwartz and Weissman, 1967, p. 267).

"It is necessary to provide means whereby the contents of the disc can be reinstated after they have been damaged by system failure. The most straightforward way of doing this is for the disc to be copied on to magnetic tape once or twice a day; re-writing the disc then puts the clock back, but users at least know where they are. Unfortunately, the copying of a large disc consumes a lot of computer time, and it seems essential to develop methods whereby files are copied on to magnetic tape only when they are created or modified. It would be nice to be able to consider the archive and recovery problems as independent, but reasons of efficiency demand that an attempt should be made to develop a satisfactory common system. We have, unfortunately, little experience in this area as yet, and are still groping our way." (Wilkes, 1967, p. 7).

"Our requirements, therefore, were threefold: security, retrieval, and storage. We investigated various means by which we could meet these requirements; and we decided on the use of microfilm, for two reasons. First, photographic copies of records, including those on microfilm, are acceptable as legal representations of documents. We could photograph our notebooks, store the film in a safe place, and destroy the books or, at least, move them to a larger storage area. Second, we found on the market equipment with which we could film the books and then, with a suitable indexing system, obtain quick retrieval of information from that film" (Murrill, 1966, p. 52).

"The file system is designed with the presumption that there will be mishaps, so that an automatic file backup mechanism is provided. The backup procedures must be prepared for contingencies ranging from a dropped bit on a magnetic tape to a fire in the computer room.

"Specifically, the following contingencies are provided for:

"1. A user may discover that he has accidentally deleted a recent file and may wish to recover

it. "2. There may be a specific system mishap which causes a particular file to be no longer readable for some 'inexplicable' reason. "3. There may be a total mishap. For example, the disk-memory read heads may irreversibly score the magnetic surfaces so that all diskstored information is destroyed.

"The general backup mechanism is provided by the system rather than the individual user, for the more reliable the system becomes, the more the user is unable to justify the overhead (or bother) of trying to arrange for the unlikely contingency of a mishap. Thus an individual user needs insurance, and, in fact, this is what is provided." (Corbatò and Vyssotsky, 1965, p. 193).

"Program roll-back for corrective action must be routine or function oriented since it is impractical from a storage requirement point of view to provide corrective action for each instruction. The rollback must be to a point where initial conditions are available from sensors, prestored, or reconstitutable. Even an intermittent memory malfunction during access becomes a persistent error since it is immediately rewritten in error. Thus, critical routines or high iteration rate real-time routines (for example, those which perform integration with respect to time) should be stored redundantly so that in the event of malfunction the redundantly stored routine is used to preclude routine malfunction or error buildup with time." (Bujnoski, 1968, p. 33).

2.68 "Restart procedures should be designed into the system from the beginning, and the necessity for the system to spend time in copying vital information from one place to another should be cheerfully accepted.

"Redundant information can be included in supervisor communication or data areas in order to enable errors caused by system failure to be corrected. Even a partial application of this idea could lead to important improvements in restart capability. A system will be judged as much as by the efficiencies of its restart procedures as by the facilities that it provides.

...

"Making it possible for the system to be restarted after a failure with as little loss as possible should be the constant preoccupation of the software designer." (Wilkes and Needham, 1968, p. 320). "Procedures must also be prescribed for work with the archive collection to prevent loss or contamination of the master records by tape erasure, statistical adjustment, aggregation or reclassification." (Glaser et al., 1967, p. 19).

2.69 "Standby equipment costs should receive some consideration, particularly in a cold war situation: duplicate tapes, raw data or semi

processed data. Also consider the possible costs of transporting classified data elsewhere for computation: express, courier, messenger, Brink's service." (Bush, 1956, p. 110).

"For companies in the middle range, the commercial underground vaults offer excellent facilities at low cost. Installations of this type are available in a number of states, including New York, Pennsylvania, Kansas, Missouri and California. In addition to maximum security, they provide pre-attack clerical services and post-attack conversion facilities. The usual storage charge ranges from $2 to $5 a cubic foot annually, depending on whether community or private storage is desired. .

"The instructions should detail procedure for converting each vital record to useable form, as well as for utilizing the converted data to perform the desired emergency functions. The language should be as simple as possible and free of 'shop' terms, since inexperienced personnel will probably use the instructions in the postattack." (Butler, 1962, pp. 65, 67.)

2.70 "The trend away from supporting records is a recent development that has not yet gained widespread acceptance. There is ample evidence, however, that their use will decline rapidly, if the cold war gets uncomfortably hot. Except for isolated areas in their operations, an increasing number of companies are electing to take a calculated risk in safeguarding basic records but not the supporting changes. For example, some of the insurance companies microfilm the basic in-force policy records annually and forego the changes that occur between duplicating cycles. This is a good business risk for two reasons: (1) supporting records are impractical for most emergency operations, and (2) a maximum one-year lag in the microfilm record would not seriously hamper emergency operations." (Butler, 1962, p. 62.)

"Mass storage devices hold valuable records, and backup is needed in the event of destruction or nonreadability of a record(s). Usually the entire file is copied periodically, and a journal of transactions is kept. If necessary, the file can be reconstructed from an earlier copy plus the journal to date." (Bonn, 1966, p. 1865).

2.71 "The life and stability of the [storage] medium under environmental conditions are other considerations to which a great deal of attention must be paid. How long will the medium last? How stable will it be under heat and humidity changes?" (Becker and Hayes, 1963, p. 284).

It must be noted that, in the present state of magnetic tape technology, the average accurate life of tape records is a matter of a few months only. The active master files are typically rewritten on new tapes regularly, as a part of normal updating and maintenance procedures. Special precautions must be undertaken, however, to assure the same for duplicate master tapes, wherever located.

"Security should also be considered in another

sense. Paper must be protected against fire and flooding, magnetic tapes against exposure to electromagnetic fields and related hazards. No special precaution is necessary for microfilm, provided

the reels are replaced periodically in updating cycles. Long-term storage of microfilm, however, will require proper temperature and humidity control in the storage area." (Butler, 1962, p. 64.)

3. Problems of System Networking

3.1 As noted in a previous report in this series: "Information processing systems are but one facet of an evolving field of intellectual activity called communication sciences. This is a generic term which is applied to those areas of study in which the interest centers on the properties of a system or the properties of arrays of symbols which come from their organization or structure rather than from their physical properties; that is, the study of what one M.I.T. colleague calls 'the problems of organized complexity'." (Wiesner, 1958, p. 268).

The terminology apparently originated with Warren Weaver. Weaver (1948) noted first that the areas typically tackled in scientific research and development efforts up to the twentieth century were largely concerned with two-variable problems of simplicity; then from about 1900 on, powerful techniques such as those of probability theory and statistical mechanics were developed to deal with problems of disorganized complexity (that is, those in which the number of variables is very large, the individual behavior of each of the many variables is erratic or unknown, but the system as a whole has analyzable average properties). Finally, he points to an intermediate region "which science has as yet little explored or conquered" (1948, p. 539), where by contrast to those disorganized or random situations with which the statistical techniques can cope, the problems of organized complexity require dealing simultaneously with a considerable number of variables that are interrelated in accordance with organizational factors.

3.2 "Organizational generality is an attribute of underrated importance. The correct functioning of on-line systems imposes requirements that have been met ad hoc by current designs. Future system designs must acknowledge the basic nature of the problems and provide general approaches to their resolution." (Dennis and Glaser, 1965, p. 5).

"Diversity of needs and divisibility of computer resources demand a much more sophisticated multiplexing strategy than the simple communication case where all users are treated alike." (David, 1966, p. 40).

"As we turn toward stage three, the stage characterized by the netting of geographically distributed computers, we find ourselves with a significant base of experience with special-purpose computer networks, but with essentially no experience with general-purpose computer networks of the kind that will come into being when multiple-access systems

such as those at M.I.T. and the Systems Development Corporation are linked by digital transmission channels. The difficulties involved in establishing computer networks appear not to be difficulties of hardware design or even of hardware availability. Rather, they appear to be difficulties of social and software organization, of conventions, formats, and standards, of programming, interaction, and communication languages. It is a situation in which there now exist all the component hardware facilities that can be said to be required, yet in which there do not now exist any general-purpose networks capable of supporting stage-three interaction." (Licklider, 1967, pp. 5-6).

"The state of affairs at the end of 1966 can be summarized as follows. Multiaccess-system application techniques and user-oriented subsystems have been developed to only a relatively primitive level. Far too many people, particularly those with a scientific-application bent, still hold the shortsighted view that the real value of large, multiaccess systems lies in their ability to simultaneously present a powerful 'private computer' to each of several tens or hundreds of programmers. Nearly all of the systems actually in operation are used in this way. Application-oriented systems that free the user of most, or all, of his concern with details of computer-system characteristics and conventional programming are coming much more slowly. Their development has been inhibited by, among other things, the temporary plateau that has been reached in basic multiaccess system technology." (Mills, 1967, p. 247).

3.3 "The analytical tools are simply not available . . ." (Baran, 1964, p. 27).

"The essence of rational benefit-cost analysis is the tracing of indirect as well as direct effects of programs and the evaluation and summing of these effects. Typically, the methodology for tracing all but the most obvious linkages is entirely lacking or fails to use the relevant information." (Glaser et al., 1967, p. 15).

"The problem associated with providing the interconnection of a network of processors is a major one." (Estrin and Kleinrock, 1967, p. 92).

"Solving the data base management problems of a distributed network has been beyond the state of the art." (Dennis, 1968, p. 373).

"Although techniques for multiplex communications are well developed, we are only beginning to learn how to multiplex computers." (David, 1966, p. 40).

"The formalism of hardward/software system

management is just beginning to take shape. SAGE is a landmark because it worked in spite of its immense size and complexity." (Aron, 1967, p. 50). "The system engineer presently lacks sufficient tools to efficiently design, modify or evaluate complex information systems." (Blunt, 1965, p. 69).

3.4 "The design and analysis problems associated with large communications networks are frequently not solvable by analytic means and it is therefore necessary to turn to simulation techniques. Even with networks which are not particularly large the computational difficulties encountered when other than very restrictive and simple models are to be considered preclude analysis. It has become clear that the study of network characteristics and traffic handling procedures must progress beyond the half-dozen switching center problem to consider networks of dozens of nodes with hundreds or even thousands of trunks so that those features unique to these large networks can be determined and used in the design of communications systems. Here it is evident that simulation is the major study tool." (Weber and Gimpelson, 1964, p. 233).

"The time and costs involved make it almost mandatory to 'prove' the 'workability' and feasibility of the potential solutions via pilot systems or by implementation in organizations or associations. which have some of the characteristics of the national system and which would therefore serve as a model or microcosm of the National Macrocosm.' (Ebersole, 1966, p. 34).

"A report by Churchill et al., specifically recognizes the need for theoretical research in order to build an adequate foundation on which to base systems analysis procedures. They point out that recent computer developments and particularly large computer systems have increased the need for research and the body of data that research can provide in such areas as data coding and file organization." (Borko, 1967, p. 37).

3.5 "The coming importance of networks of computers creates another source of applications for... multiple-queue disciplines. Computer network disciplines will also have to be dependent on transmission delays of service requests and jobs or parts of jobs from one computer to another as well as on the possible incompatibilities of various types between different computers. The synthesis and analysis of multiprocessor and multiple processor network priority disciplines remains fertile area of research whose development awaits broad multiprocessor application and an enlightening experience with the characteristics of these disciplines." (Coffman and Kleinrock, 1968, p. 20).

"We still are plagued by our inability to program for simultaneous action, even for the scheduling of large units in a computing system." (Gorn, 1966, p. 232).

"As computer time-sharing systems have evolved from a research activity to an operational activity, and have increased in size and complexity, it has become clear that significant problems occur in

controlling the use of such systems. These problems have evidenced themselves in computer scheduling, program capability constraints, and the allotment of auxiliary storage." (Linde and Chaney, 1966, p. 149).

"A network has to consider with great care the many possibilities of user access which approach more and more the vast possibilities and intricacies of direct human communication.” (Cain and Pizer, 1967, p. 262).

"Increased attention needs to be placed on the problem of techniques for scheduling the many users with their different priorities." (Bauer, 1965, p. 23).

3.6 "Much of the design effort in a messageswitching type communications system goes into the network which links the terminals and nodal points together. The distribution of terminals can be shown, the current message density is known, and programs exist to help lay out the network. With most interactive systems this is not the case." (Stephenson, 1968, p. 56).

3.7 "In looking toward computer-based, computer-linked library systems, that have been proposed as a national technical information network, studies of perceived needs among users are likely to be of very little use. Instead it would seem to be more appropriate to initiate smallscale experiments designed to produce, on a limited basis, the effects of a larger-scale system in order to determine whether such experiments produce the expected benefits." (Schon, 1965, p. 34).

3.8 "Transmitting data collection systems can assume a wide variety of equipment configurations, ranging from a single input unit with cable-connected recorded to a farflung network with multiple input units transmitting data to multiple recorders or computers by means of both common-carrier facilities and direct cable connections. Probably the most important parameter in planning the equipment configuration of a system is the maximum number of input stations that can be connected to a single central recording unit." (Hillegass and Melick, 1967, pp. 50-51).

3.9 Licklider stresses the importance of "coherence through networking" and emphasizes: “On the average, each of n cooperative users can draw n-1 programs from the files for each one he puts into the public files. That fact becomes so obviously significant as n increases that I can conclude by concluding that the most important factors in software economics are n, the number of netted users, and c, the coefficient of contributive cooperativeness that measures the value to his colleagues of each user's creative effort." (Licklider, 1967, p. 13).

"The circumstances which appear to call for the establishment of physical networks (as opposed to logical networks) are generally:

"1. The existence of special data banks or special collections of information located at a

« Previous Continue »

Books

Research and Development in the Computer and Information Sciences: Overall ...