
involves recognition of drug names that have been typed in, more or less phonetically, by doctors or nurses; in the longer view this is one aspect of a large effort that must be expended to free the man-machine interface from the need for letter-perfect information representation by the man. People just don't work that way, and systems must be developed that can tolerate normal human imprecision without disaster." (Mills, 1967, p. 243).


2.41 "... The object of the study is to determine if we can replace garbled characters in names. The basic plan was to develop the empirical frequency of occurrence of sets of characters in names and use these statistics to replace a missing character." (Carlson, 1966, p. 189).

"The specific effect on error reduction is impressive. If a scanner gives a 5% character error rate, the trigram replacement technique can correct approximately 95% of these errors. The remaining error is thus . . . 0.25% overall.

"A technique like this may, indeed, reduce the cost of verifying the mass of data input coming from scanners . . [and] reduce the cost of verifying massive data conversion coming from conventional data input devices like keyboards, remote terminals, etc." (Carlson, 1966, p. 191.)

2.42 "The rules established for coding structures are integrated in the program so that the computer is able to take a fairly sophisticated look at the chemist's coding and the keypunch operator's work. It will not allow any atom to have too many or too few bonds, nor is a '7' bond code permissible with atoms for which ionic bonds are not 'legal'. Improper atom and bond codes and misplaced characters are recognized by the computer, as are various other types of errors." (Waldo and DeBacker, 1959, p. 720).
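A bond-count check of the kind described can be sketched as below. The valence table, the data layout, and the function name are hypothetical simplifications chosen for illustration; they do not reproduce the coding scheme of Waldo and DeBacker.

```python
# Simplified, hypothetical table of permitted bond totals per element.
ALLOWED_BONDS = {"H": {1}, "C": {4}, "N": {3}, "O": {2}}

def check_structure(atoms, bonds):
    """atoms: {atom_id: element}; bonds: list of (id_a, id_b, order).
    Returns a list of error messages (empty if the coding passes)."""
    errors = []
    totals = {atom_id: 0 for atom_id in atoms}
    for a, b, order in bonds:
        for end in (a, b):
            if end not in atoms:
                errors.append(f"bond refers to unknown atom {end}")
            else:
                totals[end] += order
    for atom_id, element in atoms.items():
        allowed = ALLOWED_BONDS.get(element)
        if allowed is None:
            errors.append(f"atom {atom_id}: improper atom code {element!r}")
        elif totals[atom_id] not in allowed:
            errors.append(
                f"atom {atom_id} ({element}): {totals[atom_id]} bonds, "
                f"expected {sorted(allowed)}"
            )
    return errors

water = {1: "O", 2: "H", 3: "H"}
print(check_structure(water, [(1, 2, 1), (1, 3, 1)]))  # []
```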


2.43 "Extensive automatic verification of the file data was achieved by a variety of techniques. As an example, extracts were made of principal lines plus the sequence number of the record: specifically, all corporate name lines were extracted and sorted; any variations on a given name were altered to conform to the standard. Similarly, all law firm citations were checked against each other. All city-and-state fields are uniform. A zipcode-and-place-name abstract was made, with the resultant file being sorted by zip code: errors were easy to sort and correct, as with Des Moines appearing in the Philadelphia listing." (North, 1968, p. 110).
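The zip-code check lends itself to a short illustration. The sketch below is an assumption about how such a check might be mechanized today, not a description of North's procedure: the record layout, the three-digit prefix grouping, and the majority rule are all hypothetical.

```python
from collections import Counter, defaultdict

def flag_zip_mismatches(records, prefix_len=3):
    """records: iterable of (sequence_no, zip_code, city).
    Sort by zip code, group by zip prefix, and flag any record whose
    city differs from the most common city in its group."""
    groups = defaultdict(list)
    for seq, zip_code, city in sorted(records, key=lambda r: r[1]):
        groups[zip_code[:prefix_len]].append((seq, zip_code, city))
    flagged = []
    for rows in groups.values():
        majority_city, _ = Counter(city for _, _, city in rows).most_common(1)[0]
        flagged.extend(row for row in rows if row[2] != majority_city)
    return flagged

records = [
    (1, "19103", "Philadelphia"),
    (2, "19104", "Philadelphia"),
    (3, "19106", "Des Moines"),   # out of place in the Philadelphia listing
    (4, "50309", "Des Moines"),
    (5, "50310", "Des Moines"),
]
print(flag_zip_mismatches(records))  # [(3, '19106', 'Des Moines')]
```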


Then there is the even more sophisticated case where "... An important input characteristic is that the data is not entirely developed for processing or retrieval purposes. It is thus necessary first to standardize and develop the data before manipulating it. Thus, to mention one descriptor, 'location', the desired machine input might be 'coordinate', 'city', and 'state', if a city is mentioned; and 'state' alone when no city is noted. However, inputs to the system might contain a coordinate and city without mention of a state.

"It is therefore necessary to develop the data and standardize before further processing commences.

"It is then possible to process the data against the existing file information . . . The objective of the processing is to categorize the information with respect to all other information within the files . . . To categorize the information, a substantial amount of retrieval and association of data is often required ... Many [data] contradictions are resolvable by the system." (Gurk and Minker, 1961, pp. 263–264).

2.44 "A number of new developments are based on the need for serving clustered environments. A cluster is defined as a geographic area of about three miles in diameter. The basic concept is that within a cluster of stations and computers, it is possible to provide communication capabilities at low cost. Further, it is possible to provide communication paths between clusters, as well as inputs to and outputs from other arrangements as optional features, and still maintain economies within each cluster. This leads to a very adaptable system. It is expected to find wide application on university campuses, in hospitals, within industrial complexes, etc." (Simms, 1968, p. 23).

2.45 "Among the key findings are the following:

• Relative cost-effectiveness between time-sharing and batch processing is very sensitive to and varies widely with the precise man-machine conditions under which experimental comparisons are made.

• Time-sharing shows a tendency toward fewer man-hours and more computer time for experimental tasks than batch processing.

• The controversy is showing signs of narrowing down to a competition between conversationally interactive time-sharing versus fast-turnaround batch systems.

• Individual differences in user performance are generally much larger and are probably more economically important than time-sharing/batch-processing system differences.

• Users consistently and increasingly prefer interactive time-sharing or fast-turnaround batch over conventional batch systems.

• Very little is known about individual performance differences, user learning, and human decision-making, the key elements underlying the general behavioral dynamics of man-computer communication.

• Virtually no normative data are available on data-processing problems and tasks, nor on empirical use of computer languages and system support facilities - the kind of data necessary to permit representative sampling of problems, facilities and subjects for crucial experiments that warrant generalizable results." (Sackman, 1968, p. 350).

However, on at least some occasions, some clients of a multiple-access, time-shared system may be satisfied with, or actually prefer, operation in a batch or job-shop mode to extensive use of the conversational mode.

"Critics (see Patrick 1963, Emerson 1965, and MacDonald 1965) claim that the efficiency of timesharing systems is questionable when compared to modern closed-shop methods, or with economical small computers." (Sackman et al., 1968, p. 4).

Schatzoff et al. (1967) report on experimental comparisons of time-sharing operations (specifically, MIT's CTSS system) with batch processing as employed on IBM's IBSYS system.

"... One must consider the total spectrum of tasks to which a system will be applied, and their relative importance to the total computing load." (Orchard-Hays, 1965, p. 239).

"... A major factor to be considered in the design of an operating system is the expected job mix." (Morris et al., 1967, p. 74).

"In practice, a multiple system may contain both types of operation: a group of processors fed from a single queue, and many queues differentiated by the type of request being serviced by the attached processor group . ." (Scherr, 1965, p. 17).

2.46 "Normalization is a necessary preface to the merge or integration of our data. By merge, or integration, as I use the term here to represent the last stage in our processes, I am referring to a complex interfiling of segments of our data - the entries. In this 'interfiling,' we produce, for each article or book in our file, an entry which is a composite of information from our various sources. If one of our sources omits the name of the publisher of a book, but another gives it, the final entry will contain the publisher's name. If one source gives the volume of a journal in which an article appears, but not the month, and another gives the month, but not the volume, our final entry will contain both volume and month. And so on." (Sawin, 1965, p. 95).

"Normalize. Each individual printed source, which has been copied letter by letter, has features of typographical format and style, some of which are of no significance, others of which are the means by which a person consulting the work distinguishes the several 'elements' of the item. The family of programs for normalizing the several files of data will insert appropriate information separators to distinguish and identify the elements of each item and rearrange it according to a selected canonical style, which for the Pilot Study is one which conforms generally to that of the Modern Language Association." (Crosby, 1965, p. 43).

2.47 "Some degree of standardized processing and communication is at the heart of any information system, whether the system is the basis for mounting a major military effort, retrieving documents from a central library, updating the clerical and accounting records in a bank, assigning airline reservations, or maintaining a logistic inventory. There are two reasons for this. First, all information systems are formal schemes for handling the informational aspects of a formally specified venture.

Second, the job to be done always lies embedded within some formal organizational structure." (Bennett, 1964, p. 98).

"Formal organizing protocol exists relatively independently of an organization's purposes, origins, or methods. These established operating procedures of an organization impose constraints upon the available range of alternatives for individual behavior. In addition to such constraints upon the degrees of freedon within an organization as restrictions upon mode of dress, conduct, range of mobility, and style of performance, there are protocol constraints upon the format, mode, pattern, and sequence of information processing and information flow. It is this orderly constraint upon information processing and information flow that we call, for simplicity, the information system of an organization. The term 'system' implies little more than procedural restriction and orderliness. By 'information processing' we mean some actual change in the nature of data or documents. By 'information flow' we indicate a similar change in the location of these data or documents. Thus we may define an information system as simply that set of constraining specifications for the collection, storage, reduction, alteration, transfer, and display of organizational facts, opinions, and associated documentation which is established in order to manage, command if you will, and control the ultimate performance of an organization.


"With this in mind, it is possible to recognize the dangers associated with prematurely standardizing the information-processing tools, the forms, the data codes, the message layouts, the procedures for message sequencing, the file structures, the calculations, and especially the data-summary forms essential for automation. Standardization of these details of a system is relatively simple and can be accomplished by almost anyone familiar with the design of automatic procedures. However, if the precise nature of the job and its organizational implications are not understood in detail, it is not possible to know the exact influence that these standards will have on the performance of the system." (Bennett, 1964, pp. 99, 103).

2.48 "There is a need for design verification. That is, it is necessary to have some method for ensuring that the design is under control and that the nature of the resulting system can be predicted before the end of the design process. In command-and-control systems, the design cycle lasts from two to five years, the design evolving from a simple idea into complex organizations of hardware, software, computer programs, displays, human operations, training, and so forth. At all times during this cycle the design controller must be able to specify the status of the design, the impact that changes in the design will have on the command, and the probability that certain components of the system will work. Design verification is the process that gives the designer this control. The methods that make up the design-verification process range from analysis and simulation on paper to full-scale system testing." (Jacobs, 1964, p. 44).

2.49 "Measurement of the system was a major area which was not initially recognized. It was necessary to develop the tools to gather data and introduce program changes to generate counts and parameters of importance. Future systems designers should give this area more attention in the design phase to permit more efficient data collection." (Evans, 1967, p. 83.)

2.50 "[The user] is given several control statistics which tell him the amount of dispersion in each category, the amount of overlap of each category with every other category, and the discriminating power of the variables. . . These statistics are based on the sample of documents that he assigns to each category . . . Various users of an identical set of documents can thus derive their own structure of subjects from their individual points of view." (Williams, 1965, p. 219).

2.51 "We will probably see a trend toward the concept of a computer as a collection of memories, buses and processors with distributed control of their assignments on a dynamic basis." (Clippinger, 1965, p. 209).

"Both Dr. Gilbert C. McCann of Cal. Tech and Dr. Edward E. David, Jr., of Bell Telephone Laboratories stressed the need for hierarchies of computers interconnected in large systems to perform the many tasks of a time-sharing system." (Commun. ACM 9, 645 (Aug. 1966).)

2.52 "Every part of the system should consist of a pool of functionally identical units (memories, processors and so on) that can operate independently and can be used interchangeably or simultaneously at all times . .

"Moreover, the availability of duplicate units would simplify the problem of queuing and the allocation of time and space to users." (Fano and Corbató, 1966, pp. 134-135).

"Time-sharing demands high system reliability and maintainability, encourages redundant, modular, system design, and emphasizes high-volume storage (both core and auxiliary) with highly parallel system operation." (Gallenson and Weissman, 1965, p. 14).

"A properly organized multiple processor system provides great reliability (and the prospect of continuous operation) since a processor may be trivially added to or removed from the system. A processor undergoing repair or preventive maintenance merely lowers the capacity of the system, rather than rendering the system useless." (Saltzer, 1966, p. 2).

"Greater modularity of the systems will mean easier, quicker diagnosis and replacement of faulty parts." (Pyke, 1967, p. 162).

"To meet the requirements of flexibility of capacity and of reliability, the most natural form . . . is as a modular multiprocessor system arranged so that processors, memory modules and file storage

units may be added, removed or replaced in accordance with changing requirements." (Dennis and Van Horn, 1965, p. 4). See also notes 5.83, 5.84.

2.53 "The actual execution of data movement commands should be asynchronous with the main processing operation. It should be an excellent use of parallel processing capability." (Opler, 1965, p. 276).

2.54 "Work currently in progress [at Western Data Processing Center, UCLA] includes: investigations of intra-job parallel processing which will attempt to produce quantitative evaluations of component utilization; the increase in complexity of the task of programming; and the feasibility of compilers which perform the analysis necessary to convert sequential programs into parallel-path programs." (Dig. Computer Newsletter 16, No. 4, 21 (1964).)

2.55 "The motivation for encouraging the use of parallelism in a computation is not so much to make a particular computation run more efficiently as it is to relax constraints on the order in which parts of a computation are carried out. A multi-program scheduling algorithm should then be able to take advantage of this extra freedom to allocate system resources with greater efficiency." (Dennis and Van Horn, 1965, pp. 19–20).

2.56 Amdahl remarks that "the principal motivations for multiplicity of components functioning in an on-line system are to provide increased capacity or increased availability or both." (1965, p. 38). He notes further that "by pooling, the number of components provided need not be large enough to accommodate peak requirements occurring concurrently in each computer, but may instead accommodate a peak in one occurring at the same time as an average requirement in the other." (Amdahl, 1965, pp. 38-39).
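Amdahl's pooling remark reduces to a small calculation, sketched below for two computers. The function name and the sample figures are hypothetical, and the sketch takes as given his premise that the two peaks do not occur at the same time.

```python
def units_needed(peak_a, avg_a, peak_b, avg_b):
    """Return (separate, pooled) component counts for two computers.
    separate: each computer is provisioned for its own peak.
    pooled:   a shared pool covers a peak in one computer coinciding
              with an average requirement in the other."""
    separate = peak_a + peak_b
    pooled = max(peak_a + avg_b, avg_a + peak_b)
    return separate, pooled

# Hypothetical requirements, e.g. in memory modules.
print(units_needed(peak_a=8, avg_a=3, peak_b=6, avg_b=2))  # (14, 10)
```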

2.57 "No large system is a static entity - it must be capable of expansion of capacity and alteration of function to meet new and unforeseen requirements." (Dennis and Glaser, 1965, p. 5).

"Changing objectives, increased demands for use, added functions, improved algorithms and new technologies all call for flexible evolution of the system, both as a configuration of equipment and as a collection of programs." (Dennis and Van Horn, 1965, p. 4).

"A design problem of a slightly different character, but one that deserves considerable emphasis, is the development of a system that is 'open-ended'; i.e., one that is capable of expansion to handle new plants or offices, higher volumes of traffic, new applications, and other difficult-to-foresee developments associated with the growth of the business. The design and implementation of a data communications system is a major investment; proper planning at design time to provide for future growth will safeguard this investment." (Reagan, 1966, p. 24).

2.58 "Reconfiguration is used for two prime purposes: to remove a unit from the system for service or because of malfunction, or to reconfigure the system either because of the malfunction of one of the units or to 'partition' the system so as to have two or more independent systems. In this last case, partitioning would be used either to debug a new system supervisor or perhaps to aid in the diagnostic analysis of a hardware malfunction where more than a single system component were needed." (Glaser et al., 1965, p. 202).

"Often, failure of a portion of the system to provide services can entail serious consequences to the system users. Thus severe reliability standards are placed on the system hardware. Many of these systems must be capable of providing service to a range in the number of users and must be able to grow as the system finds more users. Thus, one finds the need for modularity to meet these demands. Finally, as these systems are used, they must be capable of change so that they can be adapted to the ever changing and wide variety of requirements, problems, formats, codes and other characteristics of their users. As a result general-purpose stored program computers should be used wherever possible." (Cohler and Rubenstein, 1964, p. 175).

2.59 "On-line systems are still in their early development stage, but now that systems are beginning to work, I think that it is obvious that more attention should be paid to the fail safe aspects of the problem." (Huskey, 1965, p. 141).

"From our experience we have concluded that system reliability . . . must provide for several levels of failure leading to the term 'fail-soft' rather than 'fail-safe"." (Baruch, 1967, p. 147).

Related terms are "graceful degradation" and "high availability", as follows:

"The military is becoming increasingly interested in multiprocessors organized to exhibit the property of graceful degradation. This means that when one of them fails, the others can recognize this and pick up the work load of the one that failed, continuing this process until all of them have failed." (Clippinger, 1965, p. 210).

"The term 'high availability' (like its synonym 'fail safe') has now become a cliche, and lacks any precise meaning. It connotes a system characteristic which permits recovery from all hardware errors. Specifically, it appears to promise that critical system and user data will not be destroyed, that system and job restarts will be minimized and that critical jobs can most surely be executed, despite failing hardware. If this is so, then multiprocessing per se aids in only one of the three characteristics of high availability." (Witt, 1968, p. 699). "The structure of a multi-computer system planned for high availability is principally determined by the permissible reconfiguration time and the ability to fail safely or softly. The multiplicity and modularity of system components should be chosen to provide the most economical realization of these requirements. . .

"A multi-computer system which can perform the full set of tasks in the presence of a single mal

function is fail-safe. Such a system requires at least one more unit of each type of system component, with the interconnection circuitry to permit it to replace any of its type in any configuration

"A multi-computer system which can perform a satisfactory subset of its tasks in the presence of a malfunction is fail-soft. The set of tasks which must still be performed to provide a satisfactory through degraded level of operation, determines the minimum number of each component required after a failure of one of its type." (Amdahl, 1965, p. 39).

"Systems are designed to provide either full service or graceful degradation in the face of failures that would normally cause operations to cease. A standby computer, extra mass storage devices, auxiliary power sources to protect against public utility failure, and extra peripherals and communication lines are sometimes used. Manual or automatic switching of spare peripherals between processors may also be provided." (Bonn, 1966, p. 1865).

2.60 "A third main feature of the communication system being described is high reliability. The emphasis here is not just on dependable hardware but on techniques to preserve the integrity of the data as it moves from entry device, through the temporary storage and data modes, over the transmission lines and eventually to computer tape or hard copy printer." (Hickey, 1966, p. 181.)

2.61 In addition to the examples cited in the discussion of client and system protection in the previous report in this series (on processing, storage, and output requirements, Section 2.2.4), we note the following:

"The primary objective of an evolving specialpurpose time-sharing system is to provide a real service for people who are generally not computer programmers and furthermore depend on the system to perform their duties. Therefore the biggest operational problem is reliability. Because the data attached to special-purpose system are important and also must be maintained for a long time, reliability is doubly crucial, since errors affecting the data base cannot only interrupt users' current procedures but also jeopardize past work." (Castleman, 1967, p. 17).

"If the system is designed to handle both specialpurpose functions and programming development, then why is reliability a problem? It is a problem because in a real operating environment some new 'dangerous' programs cannot be tested on the system at the same time that service is in effect. As a result, new software must be checked out during offhours, with two consequences. First, the system is not subjected to its usual daytime load during checkout time. It is a characteristic of time-shared programs that different 'bugs' may appear depending on the conditions of the overall system activity. For example, the 'time-sharing bug' of a program manipulating data incorrectly because another program processes the same data at virtually the same

time would be unlikely on a lightly loaded system. Second, programmers must simulate at night their counterparts of laymen users. Unfortunately, these two types of people tend to use application programs differently and to make different types of errors; so program debugging is again limited. Therefore, because the same system is used for both service and development, programs checked as rigorously as possible can still cause system failures when they are installed during actual service hours." (Castleman, 1967, p. 17).

"Protection of a disk system requires that no user be able to modify the system, purposely or inadvertently, thus preserving the integrity of the software. Also, a user must not be able to gain access to, or modify any other user's program or data. Protection in tape systems is accomplished: (1) by making the tape units holding the system records inaccessible to the user, (2) by making the input and output streams one-way (e.g., the input file cannot be backspaced), and (3) by placing a mark in the input stream which only the system can cross. In order to accomplish this, rather elaborate schemes have been devised both in hardware and software to prevent the user from accomplishing certain input-output manipulations. For example, in some hardware, unauthorized attempts at I/O manipulation will interrupt the computer.

"In disk-based systems, comparable protection devices must be employed. Since many different kinds of records (e.g., system input, user scratch area, translators, etc.) can exist in the same physical disk file, integrity protection requires that certain tracks, and not tape units, must be removed from the realm of user access and control. This is usually accomplished by partitioning schemes and central I/O software systems similar to those used in tapebased systems. The designer must be careful to preserve flexibility while guaranteeing protection." (Rosin, 1966, p. 242).

2.62 "Duplex computers are specified with the spare and active computers sharing I/O devices and key data in storage, so that the spare computer can take over the job on demand." (Aron, 1967, p. 54).

"The second channel operates in parallel with the main channel, and the results of the two channels are compared. Both channels must independently arrive at the same answer or operation cannot proceed. The duplication philosophy provides for two independent access arms on the Disk Storage Unit, two core buffers, and redundant power supplies." (Bowers et al., 1962, p. 109).

"Considerable effort has been continuously directed toward practical use of massive triple modular redundancy (TMR) in which logic signals are handled in three identical channels and faults are masked by vote-taking elements distributed throughout the system." (Avižienis, 1967, p. 735).

"He must give consideration to 1) back-up power supplies that include the communications gear, 2) dual or split communication cables into his data.

center, 3) protection of the center and its gear from fire and other hazards, 4) insist that separate facilities via separate routes and used to connect locations on the MIS network, and 5) build extra capacity into the MIS hardware system." (Dantine, 1966, p. 409).

"It is far better to have the system running at half speed 5% of the time with no 100% failures than to have the system down 22% of the time." (Dantine, 1966, p. 409).

"Whenever possible, the two systems run in parallel under the supervision of the automatic recovery program. The operational system performs all required functions and monitors the back-up system. The back-up system constantly repeats a series of diagnostic tests on the computer, memory and other modules available to it and monitors the operational system. These tests are designed to maintain a high level of confidence in these modules so that should a respective counterpart in the operational system fail, the back-up unit can be safely substituted. The back-up system also has the capability of receiving instructions to perform tests on any of its elements and to execute these tests while continuing to monitor the operational system to confirm that the operational system has not hung up." (Armstrong et al., 1967, p. 409).

2.63 "The large number of papers on vote-taking redundancy can be traced back to the fundamental paper of Von Neumann where multiple-line redundancy was first established as a mathematical reality for the provision of arbitrarily reliable systems." (Short, 1968, p. 4).

2.64 "A computer system contains protective redundancy if faults can be tolerated because of the use of additional components or programs, or the use of more time for the computational tasks.

"In the massive (masking) redundancy approach the effect of a faulty component, circuit, signal, subsystem, or system is masked instantaneously by permanently connected and concurrently operating replicas of the faulty element. The level at which replication occurs ranges from individual circuit components to entire self-contained systems." (Avižienis, 1967, p. 733–734).

2.65 "An increase in the reliability of systems is frequently obtained in the conventional manner by replicating the important parts several (usually three) times, and a majority vote . . . A technique of diagnosis performed by nonbinary matrices . . . require, for the same effect, only one duplicated part. This effect is achieved by connecting the described circuit in a periodically changing way to the duplicated part. If one part is disturbed the circuit gives an alarm, localizes the failure and simultaneously switches to the remaining part, so that a fast repair under operating conditions (and without additional measuring instruments) is possible." (Steinbuch and Piske, 1963, p. 859).
