Appendix A. Background Notes on Overall System Design Requirements

In this Appendix we present further discussion and background material intended to highlight currently identifiable research and development requirements in the broad field of the computer and information sciences, with emphasis upon overall system design considerations with respect to information processing systems. A number of illustrative examples, pertinent quotations from the literature, and references to current R and D efforts have been assembled. These background notes have been referenced, as appropriate, in the summary text.

1. Introduction

1.1 There are certain obvious difficulties with respect to the organization of material for a series of reports on research and development requirements in the computer and information sciences and technologies. These problems stem from the overlaps between the functional areas in which man-machine interactions of both communication and control are sought; the techniques, tools, and instrumentation available to achieve such interactions; and the wide variety of application areas involved. The material that has been collected and reviewed to date is so multifaceted and so extensive as to require organization into reasonably tractable (but arbitrary) subdivisions. Having considered some of the R and D requirements affecting specific Boxes shown in Figure 1 (p. 2) in previous reports, we will discuss here some of the overall system design considerations affecting more than one of the processes or functions shown in Figure 1. Other topics to be covered in separate reports in this series will include specific problems of information storage, selection, and retrieval systems and the questions of maintaining the integrity of privileged files (i.e., some of the background considerations with respect to the issues of privacy, confidentiality, and/or security in the case of multiply-accessed, machine-based files, data banks, and computer-communication networks).
In general, the plan of attack in each individual report in the series will be to outline in relatively short discursive text the topics of concern, supplemented by background notes and quotations and by an appendix giving the bibliographic citations of quoted references. It is planned, however, that there will be a comprehensive summary, bibliography, and index for the series as a whole. Since problems of organization, terminology, and coverage have all been difficult in the preparation of this series of reports, certain disclaimers and observations with respect to the purpose and scope of this report, its necessary selectivity, and the problems of organization and emphasis are to be noted. Obviously, the reviewer's interests and limitations will emerge at least indirectly in terms of the selectivity that has been applied. In general, controversial opinions expressed or implied in any of the reports in this series are the sole responsibility of the author(s) of that report and are not intended in any way to represent the official policies of the Center for Computer Sciences and Technology, the National Bureau of Standards, or the Department of Commerce. However, every effort has been made to buttress potentially controversial statements or implications either with direct quotations or with illustrative examples from the pertinent literature in the field. It is especially to be noted that the references and quotations included in the text of this report, in the corroborative background notes, or in the bibliography, are necessarily highly selective. Neither inclusion nor citation is intended in any way to represent an endorsement of any specific commercially available device or system, of any particular investigator's results with respect to those of others, or of the objectives of projects that are mentioned. 
Conversely, omissions are often inadvertent and are in no sense intended to imply adverse evaluations of products, materials and media, equipment, systems, project goals and project results, or of bibliographic references not included. There will be quite obvious objections to this necessary selectivity from readers who are also R & D workers in the fields involved as to the representativeness of cited contributions from their own work or that of others. Such criticisms are almost inevitable. Nevertheless, these reports are not intended to be state-of-the-art reviews as such, but, rather, they are intended to provide provocative suggestions for further R & D efforts. Selectivity must also relate to a necessarily arbitrary cut-off date in terms of the literature covered. These reports, subject to the foregoing caveats, are offered as possible contributions to the understanding of the general state of the art, especially with respect to long-range research possibilities in a variety of disciplines that are potentially applicable to information processing problems. The reports are therefore directed to a varied audience among whom are those who plan, conduct, and support research in these varied disciplines. They are also addressed to applications specialists who may hope eventually to profit from the results of current research efforts. Inevitably, there must be some repetitions of the obvious or over-simplifications of certain topics for some readers, and there must also be some too brief or inadequately explained discussions on other topics for these and other readers. What is at best tutorial for one may be difficult for another to follow. It is hoped, however, that the notes and bibliographic citations will provide sufficient clues for further follow-up as desired. The literature survey upon which this report is based generally covered the period from mid-1962 to mid-1968, although a few earlier and a few later references have also been included as appropriate.
1.2 Certain features of the information flow and process schema of Figure 1 are to be noted. It is assumed, first, that the generalized information processing system should provide for automatic access from and to many users at many locations. This implies multiple inputs in parallel, system interruptibility, and interlacings of computer programs. It is assumed, further, that the overall scheme involves hierarchies of systems, devices and procedures, that processing involves multistep operations, and that multimode operation is possible, depending on job requirements, prior or tentative results, accessibility, costs, and the like. It should be noted, next, that techniques suggested for a specific system may apply to more than one operational box or function shown in the generalized diagram of Figure 1. Similarly, in a specific system, the various operations or processes may occur in different sequences (including iterations) and several different ones may be combined in various ways. Thus, for example, questions of remote console design may affect original item input, input of processing service requests, output, and entry of feedback information from the user or the system client. The specific solutions adopted may be implemented in each of these operational areas, or combined into one, e.g., by requiring all inputs and outputs to flow through the same hardware.

2. Requirements and Resources Analysis

2.1 "The single information flow concept is input-oriented. The system is organized so that essential data are inserted into a common reservoir through point-of-origin input/output devices. User requirements are then satisfied from this reservoir of fundamental data about transactions. "Thus, the single information flow concept is characterized by random entry of data, direct access to data in the system, and complete real-time processing . . . fast response, a high degree of reliability, and an easily expansible system." (Moravec, 1965, p. 173).
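Moravec's "single information flow" concept can be illustrated, in modern terms, by a toy sketch: point-of-origin devices insert transactions into one common reservoir, and all user requirements are satisfied from that same pool. This is an illustrative assumption, not part of the original design; all class, method, and field names below are invented.

```python
# Toy sketch of the "single information flow" concept (Moravec, 1965):
# random entry of data from point-of-origin devices into a common
# reservoir, with direct access to satisfy user requirements from it.

class Reservoir:
    def __init__(self):
        self.records = []           # common pool of fundamental data

    def enter(self, origin, transaction):
        """Random entry from any point-of-origin input device."""
        self.records.append({"origin": origin, "data": transaction})

    def query(self, predicate):
        """Direct access: satisfy user requirements from the pool."""
        return [r for r in self.records if predicate(r)]

pool = Reservoir()
pool.enter("terminal-7", {"item": "part-123", "qty": 4})
pool.enter("terminal-2", {"item": "part-999", "qty": 1})
```

The essential point of the sketch is that every consumer reads from the same reservoir the producers write into, rather than from per-application files.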
2.2 "In a highly distributed system, however, information on inputs to the organization flow directly to relatively low-level way stations where all possible processing is done and all actions are taken that are allowed by the protocol governing that level. In addition to the direct actions that it takes, the lowest, or reflexive, level of information processing ordinarily generates two classes of information. These are, first, summaries of actions taken or anticipated and, second, summaries of information inputs that, because of their type, salience, or criticality, fall outside the range of action that policy has established as appropriate for that level. "In computer terms, a highly distributed system involves a primary executive program that adds and subtracts subroutines to various primary libraries from which alternative subroutines are to be drawn and combined. Secondary executive programs, responding to separate inputs and conditions, select and organize subroutines from each of these primary libraries and add and subtract subroutines to various secondary libraries from which tertiary executive programs select alternative subroutines for use at their level and for controlling the library one level down, and so forth. The flexibility of a distributed system is an outgrowth of the ability of each of the lower executive programs to organize its program on the basis of separate inputs reaching it directly.” (Bennett, 1964, pp. 104-106). "By a distributed implementation of an information service system we mean that the data processing activity is carried out by several or many installations... The data base is now distributed among the installations making up the information network for this service system ... "The distributed information network should offer considerable advantage in reducing the cost of terminal communications by permitting installations to be located near concentrations of terminals." (Dennis, 1968, p. 373). 
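Bennett's layered-executive scheme for a highly distributed system can be sketched minimally: each executive level selects a handler ("subroutine") from its own library on the basis of inputs reaching it directly, acts where its protocol allows, and escalates what falls outside that range. The two-level hierarchy, the salience threshold, and all names here are illustrative assumptions, not part of the quoted design.

```python
# Hedged sketch of a layered executive scheme (after Bennett, 1964):
# each level acts on items within its protocol and passes summaries
# and out-of-range items upward to the next executive level.

class Executive:
    def __init__(self, level, library, threshold):
        self.level = level
        self.library = library      # subroutine name -> handler function
        self.threshold = threshold  # max salience this level may act on
        self.summaries = []         # actions taken / items escalated

    def install(self, name, handler):
        """A higher level may add or remove subroutines in this library."""
        self.library[name] = handler

    def process(self, item):
        """Act locally if protocol allows; otherwise escalate upward."""
        kind, salience = item
        if salience <= self.threshold and kind in self.library:
            result = self.library[kind](item)
            self.summaries.append(("acted", kind, result))
            return result
        self.summaries.append(("escalated", kind, salience))
        return None

# A two-level hierarchy: the reflexive level handles routine traffic;
# the secondary executive handles what the lower level escalates.
reflexive = Executive(1, {"routine": lambda item: "logged"}, threshold=3)
secondary = Executive(2, {"critical": lambda item: "alerted"}, threshold=9)

def dispatch(item):
    return reflexive.process(item) or secondary.process(item)
```

The flexibility the quotation describes corresponds here to each level organizing its own library (`install`) on the basis of inputs reaching it directly, rather than all control flowing through one central program.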
2.3 "A large number of factors (user communities, document forms, subject disciplines, desired services, to name but a few) compete for the attention of the designer of information service systems. A methodology for the careful organization of these factors and the orderly consideration of their relationships is essential if intelligent decisions are to be made." (Sparks et al., 1965, pp. 1-2).

"The lack of recognition of the nature and even, in some cases, the existence of the problems facing the information systems designer has meant that there has been little or no orderly development of generally agreed upon system methodology." (Hughes Dynamics, 1964, p. 1-7).

"To the best of our knowledge, no one has yet developed a completely satisfactory theory of information processing. Because there is no strong theoretical basis for the field, we must rely on intuition, experience and the application of heuristic notions each time we attempt to solve a new information processing problem." (Clapp, 1967, p. 4).

2.4 Additional examples are as follows: "Preliminary data support the previous indications (Werner, Trueswell, et al.) that the introduction of new services is not followed by an immediately high level use of them. The state-of-the-art of equipment, personnel, and documentation still offers continuing problems. Medical researchers in the study do not seem to look upon the system as being an essential source of information for their work, but as a convenient ancillary activity." (Rath and Werner, 1967, p. 62).

"A major study recently conducted by Auerbach Corporation into the manpower trends in the engineering support functions concerned with information ...
which involved investigations of a large number of company and government operations, was both surprising and disconcerting because it showed that there are large areas of both government and industry in which there is very little concern about, or work underway toward, solving the information flow and utilization problem." (Sayer, 1965, p. 24).

2.5 "There are seven properties of a system that can be stated explicitly by the organization requesting the system design: WHAT the system should be, WHERE the system is to be used, and WHERE, WHEN, WITH WHAT, FOR WHOM, and WITH WHOM the system is to be designed." (Davis, 1964, p. 20).

2.6 Consider also the following: "Consequently, it appears that two early areas of required investigation are those of determining: 1) who are the potential users of science and/or engineering information systems, where are they located, what is their sphere of activity? and 2) what is the real nature and volume of material that will flow through a national information system? . . .

"In undertaking a program to establish information service networks it is necessary to know:
1. Who are the users?
2. What are the user information needs?
3. Where are these users?
4. How many users and user groups are there, and how do their needs differ?
5. What information products and services will meet these needs?
6. What production operations are necessary to produce these information products and services?
7. Which of these products and services are really being produced now; by whom and where, and how well is an ultimate purpose already being achieved?
8. How will any new system best integrate with existing practices?
9. What are the operations best performed from a standpoint of quality and timeliness of service to users, economy of costs and overall network operations, available trained manpower, and ability to respond to change?" (Sayer, 1965, pp. 144-145).
"Some of the details the user must determine are the number and location of remote points, frequency of use, response time required, volume of data to be communicated, on line storage requirements, and the like." (Jones, 1965, p. 66).

2.7 "Neglect of 'WHERE the system is to be used' is the most frequent cause of inadequate system designs." (Davis, 1964, p. 21).

2.8 Thus Sayer points out the need for "population figures describing the user community in detail, its interest in subject disciplines, and the effect of this interest on the effective demand on the system from both initiative and responsive demands." (Sayer, 1965, p. 140). Sparks et al. raise the following considerations: "There are certain basic dimensions of an information service system which it is appropriate to recognize in a formal way. One of these is the spectrum of selected disciplines which are to be represented in the information processed by the system. Another of these is the geographical area to be served by the system and in which the user population will be distributed . . .

"The number of user communities into which the user population is divided determines (or is determined by) the number of direct-service information centers in the system. Thus, it has a major effect on system size and structure." (Sparks et al., 1965, pp. 2-6, 2-7).

2.9 "In structuring shiny, new information systems, we must be careful to allow for resistance to change long before the push buttons are installed, especially when the users of the systems have not been convinced that there is a real need for change." (Aines, 1965, p. 5).

"Examine the various systems characteristics such as: user/network interface; network usage patterns; training requirements; traffic control; service and organization requirements; response effectiveness; cost determinations; and network capacity." (Hoffman, 1965, pp. 90-91).

"As an appendage to a prototype network, some experimental retraining programs would be well advised ...
"A massive effort directed at retraining large numbers of personnel now functioning in libraries will be required to produce the manpower necessary for a real-time network ever to reach a fully operational status." (Brown et al., 1967, p. 68).

"Where do experimental studies of user performance fit into burgeoning information services? The answer is inescapable: the extent of experimental activity will effectively determine the level of excellence, in method and in substantive findings, with which key problems regarding user performance will be met. If experimental studies in man-computer communication continue to be virtually nonexistent, the gap in verified knowledge of user behavior will continue to be dominated by immediate cost and narrow technical considerations rather than by the users' long range interests. Everyone will be a loser. Neither the managers of computer utilities, nor the manufacturers, nor the designers of central systems will have tested, reliable knowledge of what the user needs, how he behaves, how long it takes him to master new services, or how well he performs. In turn, the user will not have reliable, validated guidance to plan, select, and become skilled in harnessing the information services best suited to his needs, his time, and his resources. Since he is last, the user loses most." (Sackman, 1968, p. 351).

2.10 "Everyone talks about the computer user, but virtually no one has studied him in a systematic, scientific manner. There is a growing experimental lag between verified knowledge about users and rapidly expanding applications for them. This experimental lag has always existed in computer technology. Technological innovation and aggressive marketing of computer wares have consistently outpaced established knowledge of user performance, a bias in computer technology largely attributable to current management outlook and practice.
With the advent of time-sharing systems, and with the imminence of the much-heralded information utility, the magnitude of this scientific lag may have reached a critical point. If unchecked, particularly in the crucial area of software management, it may become a crippling humanistic lag, a situation in which both the private and the public use of computers would be characterized by overriding concern for immediate machine efficiency and economy, and by an entrenched neglect of human needs, individual capabilities, and long-range social responsibilities." (Sackman, 1968, p. 349).

"Quite often the most important parameter in a system's performance is the behavior of the average user. This information is very rarely known in advance, and can only be obtained by gathering statistics. It is important to know, for example, how long a typical user stays on a time-sharing system during an average session, how many language processors he uses, how much computing power he requires during each 'interaction' with the system, and so forth. Modeling and simulation can be of great help in pre-determining this information if the environment is known, but in many commercial or University time-sharing systems there is little control over or prior knowledge of the characteristics of the users." (Yourdon, 1969, p. 124).

"The lag in user studies is a heritage which stems mainly from the professional mix that originally developed and used the technology of man-computer communications. For two critical, formative decades, the 1940's and the 1950's (comprising the birth and development of electronic digital computers), social scientists, human engineers and human factors specialists, the professionals trained to work with human subjects under experimental conditions, were only indirectly concerned with man-computer communications, dealing largely with knobs, buttons and dials rather than with the interactive problem-solving of the user.
In all fairness, there were some exceptions to this rule, but they were too few and too sporadic to make a significant and lasting impact on the mainstream of user development. Since there was, in effect, an applied scientific vacuum surrounding man-computer communication, it is not at all surprising that there does not exist today a significant, cumulative experimental tradition for testing and evaluating computer-user performance." (Sackman, 1968, p. 349).

"The problem is, of course, to get the right information to the right man at the right time and at his work station and with minimum effort on his part. What all this may well be saying is that the information problem that exists is considerably more subtle and complex than has been set forth . . . The study for development of a Methodology for Analysis of Information Systems Networks arrives, both directly and by implication, at the same conclusion as have a number of other recent studies. That conclusion is that much more has to be known about the user and his functions, and much more has to be known about what the process of RDT & E actually is and how information, as raw material input to the process, can flow most efficiently and most effectively." (Sayer, 1965, p. 146).

"The recurrent theme in general review articles concerned with man-computer communication is the glaring experimental lag. Innovation and unverified applications outrace experimental evaluation on all sides.

"In a review of man-computer communication, Ruth Davis points out that virtually no experimental work has been done on user effectiveness. She characterizes the status of user statistics as inadequate and 'primitive', and she urges the specification and development of extensive measures of user performance. . . .

"Pollack and Gildner reviewed the literature on user performance with manual input devices for man-computer communication.
Their extensive survey, covering large numbers and varieties of switches, pushbuttons, keyboards and encoders, revealed 'inadequate research data establishing performance for various devices and device characteristics, and incomplete specification of operator input tasks in existing systems.' There was a vast experimental gap between the literally hundreds of manual input devices surveyed and the very small subset of such devices certified by some form of user validation. They recommended an initial program of research on leading types of task/device couplings, and on newer and more natural modes of manual inputs such as speech and handwriting." (Sackman, 1968, p. 350).

2.11 ". . . volume of information units or reports to be received, processed, or stored can be gained through the use of filtering procedures to reduce the possible redundancies between items received. (Timing considerations are important in such procedures, as noted elsewhere, because we won't want a delayed and incorrect message to 'update' its own correction notice.)

"Secondly, input filtering procedures serve to reduce the total bulk of information to be processed or stored, both by elimination of duplicate items as such and by the compression of the quantitative amount of recording used to represent the original information unit or message within the system.

"A third technique of information control at input is directed to the control of redundancy within a single unit or report. Conversely, input filtering procedures of this type can be used to enhance the value of information to be stored. For example, in pictorial data processing, automatic boundary contrast enhancements or 'skeletonizations' may improve both subsequent human pattern perception and system storage efficiency. Another example is natural text processing, where systematic elimination of the 'little', 'common', and 'non-informing' words can significantly reduce the amount of text to be manipulated by the machine." (Davis, 1967, p. 49).
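Two of the input-filtering techniques Davis describes, elimination of duplicate items and removal of 'little', 'common', and 'non-informing' words, can be illustrated with a minimal sketch. The stop-word list and both function names below are invented for illustration; a real system would use far richer normalization before comparing items.

```python
# Illustrative sketch of input filtering (after Davis, 1967):
# within-unit compression by stop-word removal, and between-unit
# duplicate elimination against a set of previously seen items.

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def compress_text(text):
    """Drop 'little', 'common', 'non-informing' words from one unit."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

def filter_stream(items, seen=None):
    """Suppress duplicates among incoming information units."""
    seen = set() if seen is None else seen
    out = []
    for item in items:
        key = compress_text(item)      # normalize before comparing
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```

Note that compression and duplicate detection interact: normalizing each unit first makes trivially varying duplicates ("the report" versus "report") collapse to one stored item, which is exactly the dependence on normalization conventions discussed in 2.13 below.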
2.12 In this area, R & D requirements for the future include the very severe problems of sifting and filtering enormous masses of remotely collected data. For example, "our ability to acquire data is so far ahead of our ability to interpret and manage it that there is some question as to just how far we can go toward realizing the promise of much of this remote sensing. Probably 90% of the data gathered to date have not been utilized, and, with large multisensor programs in the offing, we face the danger of ending up prostrate beneath a mountain of utterly useless films, tapes, and charts." (Parker and Wolff, 1965, p. 31).

2.13 "Purging because of redundancy is extremely difficult to accomplish by computer program except in the case of 100% duplication. Redundancy purging success is keyed to practices of standardization, normalization, field formatting, abbreviation conventions and the like. As a case in point, document handling systems universally have problems with respect to bibliographic citation conventions, transliterations of proper names, periodical title abbreviations, corporate author listing practices and the like." (Davis, 1967, p. 20). See also Ebersole (1965), Penner (1965), and Sawin (1965), who points to some of the difficulties with respect to a bibliographic collection or file, as follows:

"1. Actual errors, such as incorrect spelling of words, incorrect report of pagination, in one or more of the duplicates. The error may be mechanically or humanly generated; the error may have been made in the source bibliography, or by project staff in transcription from source to paper tape. In any case, error is a factor in reducing the possibility of identity of duplicates.

"2. Variations among bibliographies both in style and content. A bibliographical citation gives several different kinds of information; that is, it contains several 'elements,' such as author of item, title, publication data, reviews and annotations.
Each source bibliography more or less consistently employs one style for expressing information, but each style differs from every other in some or all of the following ways:
a. number of elements
c. typographical details" (1965, p. 96).

2.14 "File integrity can often be a significant motivation for mechanization. To insure file integrity in airline maintenance records, files have been republished monthly in cartridge roll-microfilm form, since mechanics would not properly insert update sheets in maintenance manuals. Fremont Rider's original concept for the microcard, which was a combination of a catalog card and document in one record, failed in part because of the lack of file integrity. Every librarian knows that if there wasn't a rod through the hole in the catalog card they would not be able to maintain the integrity of the card catalog." (Tauber, 1966, p. 277).

2.15 "Retirement of outmoded data is the only long-range effective means of maintaining an efficient system." (Miller et al., 1960, p. 54). With respect to maintenance processes involving the deletion of obsolete items, there are substantial fact-finding research requirements for large-scale documentary item systems in terms of establishing efficient but realistic criteria for "purging". Kessler comments on this point as follows: "It is not just a matter of throwing away 'bad' papers as 'good' ones come along. The scientific literature is unique in that its best examples may have a rather short life of utility. A worker in the field of photoelectricity need not ordinarily be referred to Einstein's original paper on the subject. The purging of the system must be based on criteria of operational relevance rather than intrinsic value. These criteria are largely unknown to us and represent another basic area in need of research and invention." (1960, pp. 9-10).

"Chronological cutoff is that device attempted most frequently in automated information systems.
It is employed successfully in real-time systems such as aircraft or satellite tracking or airline reservations systems where the information is useless after very short time intervals and where it is so voluminous as to be prohibitive for future analyses . . .

"That purging which is done is primarily replacement." Data management or file management