
systems also become more practical and efficient because of new possibilities for automatic control of necessary interactions.

A major area of continuing R & D concern with respect to both requirements and resources analysis is the development of more adequate methodologies.2.3 Nevertheless, the new business on the agenda of the national information scene, that is, the challenge of system networking, offers new possibilities for a meshing of system design criteria that have to do with where and how the system is to be operated and with where and how it is to be used.

2.1. Requirements Analysis

Requirements analysis, as an operational sine qua non of system design, begins of course with suitable assessment of present and potential user needs. Elsewhere in this series of reports, some embarrassingly critical commentaries with respect to actual or prospective usage are selectively covered.2.4 Assuming, however, that there are definitive needs of some specifiable clientele for processing system services that can be identified, we must first attempt some quantifiable measures of what, who, when, where, and why the information-processing-system service requests are to be honored.2.5 In particular, improved techniques of analysis with respect to clientele requirements, information control requirements, and output and cost/benefit considerations are generally desired.

2.1.1. Clientele Requirements

It is noted first that "lack of communication between the client, that is, the man who will use the system, and the system designer is the first aspect of the brainware problem." (Clapp, 1967, p. 3). Considering the potential clients as individual users of an information processing system or service, the following are among the determinations that need to be made: 2.6

1. Who are the potential users?

2. Where are they located? 2.7

3. If there are many potential users, user groups, and user communities, how do needs for information and for processing services differ among them? 2.8

4. What are the likely patterns and frequencies of usage for different types of potential clients?

5. To what extent are potential clients both motivated and trained to use the type of facilities and services proposed? 2.9

However obvious these and other requirements analysis considerations may be, a present cause of critical concern is the general lack of experimental evidence on user reaction, user behavior, and user effectiveness.2.10

2.1.2. Information Control Requirements

Detailed consideration and decision-making with respect to controls over the quality and the quantity of information input, flow, processing, storage, retrieval, and output are essential to effective system design. Davis, in a late 1967 lecture, discussed many of the multifaceted problems involved in information control, in both system planning and system use. The varied aspects range from questions of information redundancy in information items to be processed and stored to those of error detection and correction with respect to an individual item record as received, processed, stored, and/or retrieved.

Among these information control requirements are: input and storage filtering and compression; quality control in the sense of the accuracy and reliability of the information to be processed in the system; questions of file integrity and the deliberate introduction of redundancy; problems of formatting, normalization, and standardization, and error detection and error correction techniques.

More particularly, Davis (1967) is concerned with problems of information control in a system with the following characteristics:

"1. It has several groups of users of differing administrative levels.

"2. The information within the system has imposed upon it varying privacy, security and/or confidentiality constraints.

"3. The information entering the system is of varying quality with respect to its substantive content; that is, it may be raw or unevaluated, it may have been subjected to a number of evaluation criteria or it may be invariant (grossly so) as standard reference data. "4. The user audience is both local and remote. "5. Individual users or user groups have individual access to the information contained within the system.

"6. The information within the system is multisource information.'

(Davis, 1967, p. 1-2).


We may note first the problems of controls that will govern the total amount of information that is to be received, processed, and stored in the system. These may consist of input filtering operations 2.11 as in sampling techniques applied to remote data acquisition processes 2.12 or in checking for duplications and redundancies in the file.2.13
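The duplicate and redundancy checking mentioned above can be sketched in modern terms. The following Python fragment is an illustrative assumption, not a method described in the report: it fingerprints each incoming record by hashing a normalized form of its text, so that trivially different copies of the same item are filtered out before they enter the file.

```python
import hashlib

def record_fingerprint(record: str) -> str:
    """Hash a normalized record so trivially different copies collide."""
    normalized = " ".join(record.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def filter_duplicates(records):
    """Yield only the first occurrence of each distinct record."""
    seen = set()
    for rec in records:
        fp = record_fingerprint(rec)
        if fp not in seen:
            seen.add(fp)
            yield rec

incoming = ["Report A, 1967", "report  a, 1967", "Report B, 1968"]
unique = list(filter_duplicates(incoming))  # the second, re-cased copy is dropped
```

A real system would normalize more aggressively (field order, abbreviations, punctuation); the hash simply makes the comparison cheap for large files.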

Other information control requirements with respect to the total amount of information in the system relate to problems of physical storage access, withdrawals and replacements of items to and from the store, maintenance problems including questions of whether or not integrity of the files must be provided (i.e., a master copy of each item accessible at all times),2.14 provisions for the periodic purging of obsolete items,2.15 revisions of the file organization in accordance with changing patterns of usage,2.16 response requirements,2.17 and requirements for display of all or part of an item and/or indications of its characteristics prior to physical retrieval.2.18

Another important area of information control is that of identification and authentication of material entering the system, with special problems likely to be involved, for example, in the dating of reports. (Croxton, 1955). As Davis (1967) also points out, the timeliness of information contained in the system depends not only on the time of its input but also upon the date or time it was recorded or reported and the date the information itself was originally acquired, including the special case of the "elastic ruler" (Birch, 1966).2.19 Another typical problem is that of transliterations and transcriptions between items or messages recorded in many different languages.2.20

A crucial area of R & D concern is that of the accuracy, integrity, and reliability of information in the system, although these questions are all too often neglected in system design and use.2.21 Again, Davis emphasizes the importance of information content controls. These may be achieved, on input, either by error-detecting checks on quantitative data or by "correctness control through 'common sense' or logical checks." (Davis, 1967, p. 10.) 2.22 Thus, the use of reliability indicators and automatic inference capabilities may provide significant advantages in improved information handling systems in the future.2.23
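The "common sense" or logical checks that Davis describes can be illustrated with a small validation routine. The field names and plausibility limits below are invented for illustration, not taken from the report: the idea is simply that quantitative data are screened on input against ranges and cross-field consistency rules.

```python
def check_record(rec):
    """Flag values that fail plausibility ('common sense') checks on input."""
    errors = []
    # Range check on a quantitative field (limits are illustrative).
    if not (0 <= rec["age"] <= 120):
        errors.append("age out of plausible range")
    # Cross-field logical check: a report cannot precede the observation.
    # ISO-format date strings compare correctly as plain strings.
    if rec["date_reported"] < rec["date_observed"]:
        errors.append("record reported before it was observed")
    return errors

good = {"age": 34, "date_observed": "1967-03-01", "date_reported": "1967-03-05"}
bad = {"age": 250, "date_observed": "1967-03-05", "date_reported": "1967-03-01"}
```

Records failing such checks would be rejected or routed for human review rather than stored as-is.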

One of the obvious difficulties in controlling accuracy and reliability of the information content of items in the system is that of correction and updating cycles.2.24 More commonly, however, errors affecting the accuracy and reliability of information are those of human errors in observation, recording, or transcription and those of transmission or equipment failure during communication and input. The incidence of such errors is in fact inevitable and poses a continuing challenge to the system designers which becomes increasingly severe as the systems themselves become more complex.2.25

It is to be noted, of course, that a major area of R & D concern in the communication sciences is that of information theoretic approaches to error detection, correction, and control. In terms of generalized information processing systems, however, we shall assume that advanced techniques of message encoding and decoding are available to the extent required, just as we assume adequate production quality controls in the manufacture and acceptance testing of, say, magnetic cores. Thus our concern here is with the control, detection, and (where feasible) correction of errors in the information content of items in an information processing system or network, regardless of whatever protective encoding measures have been employed.

It should be recognized first of all that any formulation of an information-carrying message or record is an act of reportage, whether it is performed by man or by machine. Such reportage may itself be in error (the gunshots apparently observed during riot conditions may have been backfiring from a truck, the dial indicator of a recording instrument may be out of calibration, and the like). The recording of the observation may be in error: misreading of, say, the dial indicator, transposition of digits in copying a numerical data display, and accidental or inadvertent misspellings of names are obvious examples.

With respect to errors introduced by transmission, examples of R & D requirements and progress were cited in the first report in this series ("Information Acquisition, Sensing, and Input", Section 3.4). Two further examples to be noted here include the discussion by Hickey (1966) of techniques designed to handle burst-type errors 2.26 and a report by Menkhaus (1967) on recent developments at the Bell Telephone Laboratories.2.27 For checking recording and/or transmission errors, a variety of error detection devices (such as input interlocks,2.28 parity information,2.29 check digits,2.30 hash totals,2.31 format controls and message lengths 2.32) have been widely used.2.33
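Of the devices listed, the check digit is easily illustrated. The mod-10 (Luhn) scheme shown here is chosen as a representative example, not as the specific scheme any cited system used: doubling alternate digits makes the check digit sensitive to the single-digit errors and adjacent transpositions that dominate manual keying.

```python
def mod10_check_digit(digits: str) -> int:
    """Compute a Luhn (mod-10) check digit for a numeric identifier."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:       # double every second digit, starting from the right
            d *= 2
            if d > 9:        # 'casting out nines' on two-digit products
                d -= 9
        total += d
    return (10 - total % 10) % 10

def is_valid(identifier: str) -> bool:
    """Verify an identifier whose final digit is its check digit."""
    return mod10_check_digit(identifier[:-1]) == int(identifier[-1])
```

For the payload "7992739871" the scheme yields check digit 3, and transposing two adjacent payload digits makes the full identifier fail verification.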

Problems introduced by alphanumeric digit transpositions or simple misspellings can often be attacked and solved by computer routines, provided that there is some sort of master authority list, or file, or the equivalent of this in terms of prior conditional matching.2.34 For example, Alberga (1967) discusses the comparative efficiency of various methods of detecting errors in character strings.
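Matching a suspect string against a master authority list is commonly done with an edit-distance measure; the sketch below (the authority entries and threshold are invented for illustration) uses the standard Levenshtein dynamic program, one of the comparison methods of the kind Alberga surveys.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(name, authority, max_dist=2):
    """Return the closest authority-list entry within max_dist, else None."""
    best = min(authority, key=lambda entry: edit_distance(name, entry))
    return best if edit_distance(name, best) <= max_dist else None

authority = ["Johnson", "Johnston", "Jensen"]
```

Here `correct("Jhonson", authority)` recovers "Johnson" (an adjacent transposition costs two substitutions), while a string far from every entry is left uncorrected rather than forced to the nearest match.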

The use of contextual information for error detection and possible correction in the case of automatic character recognition processes has been noted in a previous report in this series, that on information acquisition, sensing, and input. This is, of course, a special case of misspelling.2.35 Some of the pertinent literature references include Edwards and Chambers (1964), Thomas and Kassler (1967) and Vossler and Branston (1964). The latter investigators, in particular, suggest the use of lookup dictionaries specialized as to subject field and analysis of part-of-speech transitions.2.36

Context analysis is important, first, because such capabilities enable the human to predict (and therefore skim over or filter out) message redundancies and to decide, in the presence of uncertainties between alternative message readings, the most probably correct message contents when noise, errors, or omissions occur in the actual transmission of the message.2.37

Context analysis also provides means for automatic error detection and error correction in the input of text at the character level, the word level, and the level of the document itself, such as the detection of changes in terminology or the emergence of new content in a given subject field. For example, "various levels of context can be suggested, ranging from that of the characters surrounding the one in question to the more nebulous concept of the subject class of the document being read." (Thomas and Kassler, 1963, p. 5). In automatic character recognition, in particular, consideration has been given to letter digrams, trigrams, and syllable analysis approaches 2.38 as well as to dictionary lookups.
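The letter-digram approach can be sketched directly. The toy corpus below stands in for the frequency tables such systems would compile from large text samples (an illustrative assumption, not data from the cited studies): competing character-recognition readings of a word are scored by how common their letter pairs are, and the more plausible reading wins.

```python
from collections import Counter

# Stand-in training text; a real system would use large frequency tables.
corpus = "the quick brown fox jumps over the lazy dog the dog barks"
bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))

def plausibility(word: str) -> int:
    """Score a candidate reading by summed corpus bigram counts."""
    return sum(bigrams[word[i:i + 2]] for i in range(len(word) - 1))

# Two alternative OCR readings of the same printed word:
candidates = ["the", "tbe"]
best = max(candidates, key=plausibility)   # 'th'/'he' are frequent, 'tb' is not
```

Trigram or syllable tables refine the same idea, and a dictionary lookup can arbitrate the cases the statistics leave uncertain.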

Special problems, less amenable to contextual considerations, arise in the case of large files


Automatic inference and consistency checks may be applied to error detection and error correction as well as to identification and authentication procedures. Waldo and DeBacker (1958) give an early example as applied to chemical structure data.2.42 A man-machine interactive example has been described by North (1968).2.43 For the future, however, it can be predicted that: "Ways must be found for the machine to freely accept and use incomplete, qualitative information, to mix that information with internally-derived information, and to accept modifications as easily as the original information is accepted." (Jensen, 1967, p. 1-1).

Finally, we note that, in its broadest sense, the term "control" obviously implies the ability to predict whether a given machine procedure will or will not have a solution and whether or not a given computer program, once started running, will ever come to a halt. The field of information control may thus include the theories of automata, computability, and recursive functions, and questions of the equivalence of Turing machines to other formal models of computable processes.

2.1.3. Other System Design Requirements

Other system design considerations with respect to requirements analysis include questions of centralization or decentralization of functions and facilities, including compromises such as clusters; 2.44 questions of batch-processing as against time-sharing or mixtures of these modes,2.45 and questions of formatting, normalization,2.46 and standardization.2.47

A final area of requirements analysis involves the questions of system design change and modification 2.48 and of system measurement.2.49 In particular, information on types of system usage by various clients provides the basis for periodic re-design of system procedures and for appropriate reorganization of files. Such feedback information may also provide the client with system statistics that enable him to tailor his interest-profile or search strategy considerations both to the available collection characteristics and to his own selection requirements. As Williams suggests,2.50 this kind of facility is particularly valuable in systems where the client himself may establish and modify the categories of items in the files that are most likely to be of interest to him.

2.2. Resources Analysis

Complementing the determinations of user requirements, their locations, and the probable workloads (both as to types and also as to throughputs required), are the necessary analyses of the resources presently or potentially available. Resources analysis typically involves considerations of manpower availabilities, technological possibilities, and alternative procedural potentialities.

The question may well be raised with respect to an obvious spectrum of R & D requirements. Certainly there will be continuing areas of R & D concern with respect to advanced hardware technologies in processor and storage system design, and in materials and techniques that are related to these requirements. Next there are problems of "software", that is, of programming techniques to take full advantage of parallel processing capabilities, associative memory accessing and organization, and multiprogrammed and multiple-access system control.

Certain requirements are obviously overriding because they permeate the total system design and because they interact with many or all of the sub-systems involved. These include the problems of comparative pay-offs between various possible assemblies of hardware and software, the questions of programming languages and of suitable hierarchies of such languages, and the problems of man-machine interaction especially in the case of time-shared or multiple access systems.

Similarly, the requirements for handling a variety of input and output sensing modalities and for processing more than one I-O channel in an effectively simultaneous operation clearly indicate needs for continuing research and development efforts in the design and use of parallel processing techniques, multi-processor networks, time-shared multiple access scheduling, and multi-programming.

Hierarchies of languages are implied, ranging from those in which the remote console user speaks to the machine system in a relatively natural language (constrained to a greater or lesser degree) to those required for the highly sophisticated executive control, scheduling, file protection, accounting, monitoring, and instrumentation programs. For the future, increasing consideration needs to be given not only to hierarchies of languages for using systems, but to hierarchies of systems as well.2.51

There are, of course, concurrent hardware research, development, and effective usage requirements in all or most of these areas. Improvements in microform storage efficiency, lower per bit information-representation costs, communication channel utilization economies, improved quality of facsimile reproduction and transmission of items selected or retrieved, are obvious examples of directly foreseeable future demands. Some of the above considerations will be discussed in later sections of this report. Here we are concerned in particular with resources analysis in terms of system modularity, configuration and reconfiguration, and with provisions for safeguarding the information to be handled in the system.

2.2.1. System Modularity, Configuration, and Reconfiguration

Today, in increasingly complex information processing systems, there are typically requirements for considerable modularity and replication of system components in order to assure reliable, dependable, and continuous operation.2.52 The possibilities for the use of parallel processing techniques are receiving increased R & D attention. Such techniques may be used to carry out data transfers simultaneously with the processing operations,2.53 to provide analyses necessary to convert sequential processing programs into parallel-path programs,2.54 or to make allocations of system resources more efficiently because constraints on the sequence in which processing operations are executed can be relaxed.2.55

In terms of system configuration and reconfiguration, there is a continuing question of the extent of desirable replication of input-output units and other components or sub-assemblies. This may be particularly important for multiple-access and multiple-use systems.2.56 A particularly important system configuration feature desired as a resource for large-scale information processing systems is that of open-endedness.2.57

System reconfigurations, often necessary as changing task orders are received, are particularly important in the area of shifting the system facilities for system self-checking and repair.2.58 Thus Amdahl notes that "the process of eliminating and introducing components when changing tasks is reconfiguration. The time required to reconfigure upon occurrence of a malfunction may be a critical system parameter" (Amdahl, 1965, p. 39), and Dennis and Glaser emphasize that "the ability of a system to adapt to new hardware, improved procedures and new functions without interfering with normal system operation is mandatory." (Dennis and Glaser, 1965, p. 5.)

2.2.2. Safeguarding and Recovery Considerations

A first and obvious provision for "fail-safe" (or, more realistically, "fail-softly") 2.59 operation of an information processing system network is that of adequate information controls (for example, as discussed above) on the part of all member systems and components in the network.2.60 This requirement reflects, of course, the familiar ADP aphorism of 'garbage in, garbage out'. Again, the total system must be adequately protected from inadvertent misuse, abuse, or damage on the part of its least experienced user or its least reliable component. Users must be protected from unauthorized access and exploitation by other users, and they also must be protected from the system itself, not only in the sense of equitable management, scheduling, and costing but also in the sense that system failures and malfunctions should not cause intolerable delays or irretrievable losses.2.61

Tie-ins to widespread communication networks and the emergence of computer-communication networks obviously imply some degree of both modularity and replication of components, providing thereby some measure of safeguarding and recovery protection.2.62 An extensive bibliographic survey of proposed techniques for improving system reliability by providing various processes for introducing redundancy is provided by Short (1968).2.63 Protective redundancy of system components is, as we have seen, a major safeguarding provision in design for high system reliability and availability.2.64 In terms of continuing R & D concerns, however, we note the desirability of minimizing the costs of replication 2.65 and the possibilities for development of formal models that will facilitate the choice of appropriate trade-offs between risks and costs.2.66
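The risk-versus-cost trade-off for replication admits a very simple formal model. The figures below (unit availability, unit cost, availability target) are invented for illustration: if any one of n independent replicas suffices, combined availability is 1 - (1 - a)^n, and one can search for the cheapest replication level meeting a target.

```python
def availability(unit_availability: float, replicas: int) -> float:
    """Availability of a replicated component where any one replica suffices."""
    return 1 - (1 - unit_availability) ** replicas

def cheapest_meeting_target(unit_availability, unit_cost, target):
    """Smallest replication level (and its total cost) meeting the target.

    Assumes independent failures and target < 1, so the search terminates.
    """
    n = 1
    while availability(unit_availability, n) < target:
        n += 1
    return n, n * unit_cost

# Illustrative numbers: 0.95-available units at cost 1000, target 0.999.
n, cost = cheapest_meeting_target(0.95, 1000, 0.999)
```

With these numbers two replicas give 0.9975 and three give about 0.99988, so the model selects triplication. Real models must also treat correlated failures, which the independence assumption ignores.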

Finally, there are the questions of resources analysis with respect to the safeguarding of the information in the system or network, that is, the provisions for recovery, backup, rollback, and restart or repeat of messages, records, and files.2.67 The importance of adequate recovery techniques in the event of either system failure or destruction or loss of stored data, can hardly be overestimated.2.68

The lessons of the Pentagon computer installation fire, in the early days of automatic data processing operations, still indicate today that, in many situations, separate-site replication of the master files (not only of data but also often of programs) is mandatory.2.69 Otherwise, the system designer must determine whether or not the essential contents of the machine-usable master files can be recreated from preserved source data.2.70 If the file contents can be recreated, then the designer must decide in what form and on what storage media the backup source records are to be preserved.2.71

In terms of system planning and resource analysis for information processing network design, we note the following questions:

Can the network continue to provide at least minimal essential services in the case of one or more accidental or deliberate breaks in the links?

What are the minimal essential services to be maintained at fail-safe levels? To what extent will special priorities and priority re-scheduling be required?

Must dynamic re-routing of information flow be applied, or will store-and-forward with delayed re-routing techniques suffice?

There are known techniques for evaluating optimum or near-optimum paths through complex networks in the sense of efficiency (economic, workload balancing, and throughput or timeliness considerations). Can these techniques be reapplied to the fail-safe or fail-softly requirements, or must new methods and algorithms be developed?
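One known path-evaluation technique that carries over directly to re-routing after a link failure is shortest-path search. The sketch below (the network and link costs are invented for illustration) uses Dijkstra's algorithm to find a primary route, then finds a fallback route after a link is removed, which is the essence of dynamic re-routing around a break.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; graph maps node -> {neighbor: link cost}."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None                        # destination unreachable
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

net = {"A": {"B": 1, "C": 4}, "B": {"D": 2}, "C": {"D": 1}, "D": {}}
primary = shortest_path(net, "A", "D")     # cheapest route via B
del net["B"]["D"]                          # simulate a broken link
fallback = shortest_path(net, "A", "D")    # re-routed via C
```

A `None` result corresponds to the case where no fallback route exists and store-and-forward with delayed re-routing, or degraded service, must take over.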

What are the fallback mechanisms at all levels and nodes of the system for: (a) specific failures

