notions each time we attempt to solve a new information processing problem." (Clapp, 1967, p. 4).

2.4 Additional examples are as follows:

"Some of the details the user must determine are the number and location of remote points, frequency of use, response time required, volume of data to be communicated, on line storage requirements, and the like." (Jones, 1965, p. 66).

2.5 "There are seven properties of a system that can be stated explicitly by the organization requesting the system design: WHAT the system should be, WHERE the system is to be used, and WHERE, WHEN, WITH WHAT, FOR WHOM, and WITH WHOM the system is to be designed." (Davis, 1964, p. 20).

2.6 Consider also the following:

"Consequently, it appears that two early areas of required investigation are those of determining: 1) who are the potential users of science and/or engineering information systems, where are they located, what is their sphere of activity? and 2) What is the real nature and volume of material that will flow through a national information system?

"In undertaking a program to establish information service networks it is necessary to know:

1. Who are the users?
2. What are the user information needs?
3. Where are these users?
4. How many users and user groups are there, and how do their needs differ?
5. What information products and services will meet these needs?
6. What production operations are necessary to produce these information products and services?
7. Which of these products and services are really being produced now; by whom and where and how well is an ultimate purpose already being achieved?
8. How will any new system best integrate with existing practices?
9. What are the operations best performed from a standpoint of quality and timeliness of service to users, economy of costs and overall network operations, available trained manpower, and ability to respond to change?" (Sayer, 1965, pp. 144-145).

"Preliminary data support the previous indications (Werner, Trueswell, et al.) that the introduction of new services is not followed by an immediately high level use of them. The state-of-the-art of equipment, personnel, and documentation still offers continuing problems. Medical researchers in the study do not seem to look upon the system as being an essential source of information for their work, but as a convenient ancillary activity." (Rath and Werner, 1967, p. 62).

"A major study recently conducted by Auerbach Corporation into the manpower trends in the engineering support functions concerned with information ... which involved investigations of a large number of company and government operations, was both surprising and disconcerting because it showed that there are large areas of both government and industry in which there is very little concern about, or work underway toward, solving the information flow and utilization problem." (Sayer, 1965, p. 24).

2.7 "Neglect of 'WHERE the system is to be used' is the most frequent cause of inadequate system designs." (Davis, 1964, p. 21).

2.8 Thus Sayer points out the need for "population figures describing the user community in detail, its interest in subject disciplines, and the effect of this interest on the effective demand on the system from both initiative and responsive demands." (Sayer, 1965, p. 140).

Sparks et al. raise the following considerations: "There are certain basic dimensions of an information service system which it is appropriate to recognize in a formal way. One of these is the spectrum of selected disciplines which are to be represented in the information processed by the system. Another of these is the geographical area to be served by the system and in which the user population will be distributed ...

"The number of user communities into which the user population is divided determines (or is determined by) the number of direct-service information centers in the system. Thus, it has a major effect on system size and structure." (Sparks et al., 1965, pp. 2-6, 2-7).

2.9 "In structuring shiny, new information systems, we must be careful to allow for resistance to change long before the push buttons are installed, especially when the users of the systems have not been convinced that there is a real need for change." (Aines, 1965, p. 5).

"Examine the various systems characteristics such as: user/network interface; network usage patterns; training requirements; traffic control; service and organization requirements; response effectiveness; cost determinations; and network capacity." (Hoffman, 1965, pp. 90-91).

"As an appendage to a prototype network, some experimental retraining programs would be well advised ...

"A massive effort directed at retraining large numbers of personnel now functioning in libraries will be required to produce the manpower necessary for a real-time network ever to reach a fully operational status." (Brown et al., 1967, p. 68).

"Where do experimental studies of user performance fit into burgeoning information services? The answer is inescapable: the extent of experimental activity will effectively determine the level of excellence, in method and in substantive findings, with which key problems regarding user performance will be met. If experimental studies in man-computer communication continue to be virtually nonexistent, the gap in verified knowledge of user behavior will continue to be dominated by immediate cost and narrow technical considerations rather than by the users' long range interests. Everyone will be a loser. Neither the computer utilities, nor the manufacturers, nor the designers of central systems will have tested, reliable knowledge of what the user needs, how he behaves, how long it takes him to master new services, or how well he performs. In turn, the user will not have reliable, validated guidance to plan, select, and become skilled in harnessing the information services best suited to his needs, his time, and his resources. Since he is last, the user loses most." (Sackman, 1968, p. 351).

2.10 "Everyone talks about the computer user, but virtually no one has studied him in a systematic, scientific manner. There is a growing experimental lag between verified knowledge about users and rapidly expanding applications for them. This experimental lag has always existed in computer technology. Technological innovation and aggressive marketing of computer wares have consistently outpaced established knowledge of user performance - a bias in computer technology largely attributable to current management outlook and practice. With the advent of time-sharing systems, and with the imminence of the much-heralded information utility, the magnitude of this scientific lag may have reached a critical point. If unchecked, particularly in the crucial area of software management, it may become a crippling humanistic lag - a situation in which both the private and the public use of computers would be characterized by overriding concern for immediate machine efficiency and economy, and by an entrenched neglect of human needs, individual capabilities, and long-range social responsibilities." (Sackman, 1968, p. 349).

"Quite often the most important parameter in a system's performance is the behavior of the average user. This information is very rarely known in advance, and can only be obtained by gathering statistics. It is important to know, for example, how long a typical user stays on a time-sharing system during an average session, how many language processors he uses, how much computing power he requires during each interaction with the system, and so forth. Modeling and simulation can be of great help in pre-determining this information if the environment is known, but in many commercial or university time-sharing systems there is little control over or prior knowledge of the characteristics of the users." (Yourdon, 1969, p. 124).


"The problem is, of course, to get the right information to the right man at the right time and at his work station and with minimum effort on his part. What all this may well be saying is that the information problem that exists is considerably more subtle and complex than has been set forth ... The study for development of a Methodology for Analysis of Information Systems Networks arrives, both directly and by implication, at the same conclusion as have a number of other recent studies. That conclusion is that much more has to be known about the user and his functions, and much more has to be known about what the process of RDT & E actually is and how information, as raw material input to the process, can flow most efficiently and most effectively." (Sayer, 1965, p. 146).

“The recurrent theme in general review articles concerned with man-computer communication is the glaring experimental lag. Innovation and unverified applications outrace experimental evaluation on all sides.

“In a review of man-computer communication, Ruth Davis points out that virtually no experimental work has been done on user effectiveness. She characterizes the status of user statistics as inadequate and 'primitive', and she urges the specification and development of extensive measures of user performance.

"Pollack and Gildner reviewed the literature on user performance with manual input devices for man-computer communication. Their extensive survey - covering large numbers and varieties of switches, pushbuttons, keyboards and encoders - revealed 'inadequate research data establishing performance for various devices and device characteristics, and incomplete specification of operator input tasks in existing systems.' There was a vast experimental gap between the literally hundreds of manual input devices surveyed and the very small subset of such devices certified by some form of user validation. They recommended an initial program of research on leading types of task/device couplings, and on newer and more natural modes of manual inputs such as speech and handwriting." (Sackman, 1968, p. 350).

"The lag in user studies is a heritage which stems mainly from the professional mix that originally developed and used the technology of man-computer communications. For two critical, formative decades, the 1940's and the 1950's - comprising the birth and development of electronic digital computers - social scientists, human engineers and human factors specialists, the professionals trained to work with human subjects under experimental conditions, were only indirectly concerned with man-computer communications, dealing largely with knobs, buttons and dials rather than with the interactive problem-solving of the user. In all fairness, there were some exceptions to this rule, but they were too few and too sporadic to make a significant and lasting impact on the mainstream of user development. Since there was, in effect, an applied scientific vacuum surrounding man-computer communication, it is not at all surprising that there does not exist today a significant, cumulative experimental tradition for testing and evaluating computer-user performance." (Sackman, 1968, p. 349).

2.11 "Information control at input can be used to achieve improved system efficiency in several different ways. First, a reduction in the total volume of information units or reports to be received, processed, or stored can be gained through the use of filtering procedures to reduce the possible redundancies between items received. (Timing considerations are important in such procedures, as noted elsewhere, because we won't want a delayed and incorrect message to 'update' its own correction notice.)

"Secondly, input filtering procedures serve to reduce the total bulk of information to be processed or stored - both by elimination of duplicate items as such and by the compression of the quantitative amount of recording used to represent the original information unit or message within the system.

"A third technique of information control at input is directed to the control of redundancy within a single unit or report. Conversely, input filtering procedures of this type can be used to enhance the value of information to be stored. For example, in pictorial data processing, automatic boundary contrast enhancements or 'skeletonizations' may improve both subsequent human pattern perception and system storage efficiency. Another example is natural text processing, where systematic elimination of the 'little', 'common', and 'non-informing' words can significantly reduce the amount of text to be manipulated by the machine." (Davis, 1967, p. 49).

2.12 In this area, R & D requirements for the future include the very severe problems of sifting and filtering enormous masses of remotely collected data. For example, "our ability to acquire data is so far ahead of our ability to interpret and manage it that there is some question as to just how far we can go toward realizing the promise of much of this remote sensing. Probably 90% of the data gathered to date have not been utilized, and, with large multi-sensor programs in the offing, we face the danger of ending up prostrate beneath a mountain of utterly useless films, tapes, and charts." (Parker and Wolff, 1965, p. 31).

2.13 "Purging because of redundancy is extremely difficult to accomplish by computer program except in the case of 100% duplication. Redundancy purging success is keyed to practices of standardization, normalization, field formatting, abbreviation conventions and the like. As a case in point, document handling systems universally have problems with respect to bibliographic citation conventions, transliterations of proper names, periodical title abbreviations, corporate author listing practices and the like." (Davis, 1967, p. 20).

See also Ebersole (1965), Penner (1965), and Sawin (1965), who points to some of the difficulties with respect to a bibliographic collection or file, as follows:

"1. Actual errors, such as incorrect spelling of words, incorrect report of pagination, in one or more of the duplicates. The error may be mechanically or humanly generated; the error may have been made in the source bibliography, or by project staff in transcription from source to paper tape. In any case, error is a factor in reducing the possibility of identity of duplicates.

"2. Variations among bibliographies both in style and content. A bibliographical citation gives several different kinds of information; that is, it contains several 'elements', such as author of item, title, publication data, reviews and annotations. Each source bibliography more or less consistently employs one style for expressing information, but each style differs from every other in some or all of the following ways:

a. number of elements
b. sequence of elements
c. typographical details" (1965, p. 96).

2.14 "File integrity can often be a significant motivation for mechanization. To insure file integrity in airline maintenance records, files have been republished monthly in cartridge roll-microfilm form, since mechanics would not properly insert update sheets in maintenance manuals. Freemont Rider's original concept for the microcard, which was a combination of a catalog card and document in one record, failed in part because of the lack of file integrity. Every librarian knows that if there wasn't a rod through the hole in the catalog card they would not be able to maintain the integrity of the card catalog." (Tauber, 1966, p. 277).

2.15 "Retirement of outmoded data is the only long-range effective [means] of maintaining an efficient system." (Miller et al., 1960, p. 54).

With respect to maintenance processes involving the deletion of obsolete items, there are substantial fact-finding research requirements for large-scale documentary item systems in terms of establishing efficient but realistic criteria for "purging". Kessler comments on this point as follows: "It is not just a matter of throwing away 'bad' papers as 'good' ones come along. The scientific literature is unique in that its best examples may have a rather short life of utility. A worker in the field of photoelectricity need not ordinarily be referred to Einstein's original paper on the subject. The purging of the system must be based on criteria of operational relevance rather than intrinsic value. These criteria are largely unknown to us and represent another basic area in need of research and invention." (1960, pp. 9-10).

"Chronological cutoff is that device attempted most frequently in automated information systems. It is employed successfully in real-time systems such as aircraft or satellite tracking or airline reservations systems where the information is useless after very short time intervals and where it is so voluminous as to be prohibitive for future analyses ...

"That purging which is done is primarily replacement. Data management/file management systems are generally programmed so that upon proper identification of an item during the manual input process it may replace an item already in the system data bank. The purpose of replacement as a purging device is not volume control. It is for purposes of accuracy, reliability or timeliness controls." (Davis, 1967, p. 15).
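Read with modern eyes, the "purging by replacement" Davis describes amounts to a keyed upsert: re-entering an item under a known identifier overwrites the stored version rather than growing the file. A minimal sketch in Python, with invented identifiers and field names:

```python
# Sketch of purging by replacement: re-entering an identified item replaces
# the stored version, serving accuracy/timeliness control, not volume control.
# The data bank, identifiers, and fields are illustrative, not from the source.

data_bank = {}

def enter_item(item_id, record):
    """Insert or replace; the bank does not grow when an item is re-entered."""
    replaced = item_id in data_bank
    data_bank[item_id] = record
    return replaced

enter_item("doc-001", {"title": "Initial report", "revised": 1966})
was_replacement = enter_item("doc-001", {"title": "Corrected report", "revised": 1967})
# was_replacement is True, and the bank still holds exactly one entry.
```

The design point in the quotation survives intact: the file's size is unchanged by replacement; only the quality of its contents improves.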

“The reluctance to purge has been a leading reason for accentuating file storage hierarchy considerations. Multi-level deactivation of information is substituted for purging. Deactivation proceeds through allocating the material so specified first to slower random-access storage devices and then to sequentially-accessed storage devices with decreasing rates of access all on-line with the computer. As the last step of deactivation the information is stored in off-line stores ...

“Automatic purging algorithms have been written for at least one military information system and for SDC's time-sharing system ... In the military system ... the purging program written allowed all dated units of information to be scanned and those prior to a prescribed date to be deleted and transcribed onto a magnetic tape for printing. The information thus nominated for purging was reviewed manually. If the programmed purge decision was overridden by a manual decision the falsely purged data then had to be re-entered into system files as would any newly received data.” (Davis, 1967, pp. 16-18).
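The military purge procedure quoted above (scan dated units, nominate everything before a prescribed date, hold nominations for manual review, and re-enter any falsely purged data) can be sketched as follows; the unit names, field names, and dates are hypothetical:

```python
from datetime import date

# Sketch of a date-cutoff purge with manual review, after the procedure
# Davis describes. All identifiers and dates are invented for illustration.

def nominate_for_purge(files, cutoff):
    """Split dated units into (kept, nominated-for-review) at the cutoff date."""
    kept = {k: v for k, v in files.items() if v["dated"] >= cutoff}
    nominated = {k: v for k, v in files.items() if v["dated"] < cutoff}
    return kept, nominated

files = {
    "u1": {"dated": date(1965, 3, 1)},
    "u2": {"dated": date(1967, 6, 1)},
}
kept, review_tape = nominate_for_purge(files, date(1966, 1, 1))

# A manual override returns a falsely purged unit to the system files,
# just as newly received data would be re-entered:
kept["u1"] = review_tape.pop("u1")
```

Note that the algorithm only nominates; the deletion decision stays human, which is exactly the safeguard the quoted system built in.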

"Automatic purging algorithms have been explored for the past three years. The current scheme attempts to dynamically maintain a 10 percent disc vacancy factor by automatically deleting the oldest files first. User options are provided which permit automatic dumping of files on a backup, inactive file tape ... prior to deletion." (Schwartz and Weissman, 1967, p. 267).
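The vacancy-driven scheme just quoted (delete oldest first until 10 percent of the disc is free, dumping each victim to backup beforehand) is simple to sketch; the capacity, sizes, and ages below are invented:

```python
# Sketch of the oldest-first, vacancy-driven purge described by Schwartz and
# Weissman. Capacity and file records are hypothetical illustration values.

DISC_CAPACITY = 1000  # storage units; invented

def enforce_vacancy(files, backup_tape, target_vacancy=0.10):
    """files: list of dicts with 'age' (higher = older) and 'size'."""
    files.sort(key=lambda f: f["age"], reverse=True)  # oldest first
    used = sum(f["size"] for f in files)
    while files and used > DISC_CAPACITY * (1 - target_vacancy):
        victim = files.pop(0)          # oldest remaining file
        backup_tape.append(victim)     # dump to backup tape before deletion
        used -= victim["size"]
    return used

files = [{"age": 3, "size": 400}, {"age": 1, "size": 300}, {"age": 2, "size": 250}]
backup = []
used = enforce_vacancy(files, backup)
# used drops from 950 to 550, and the oldest file now sits on the backup tape.
```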

"The newer time-sharing systems contemplate a hierarchy of file storage, with 'percolation' algorithms replacing purging algorithms. Files will be in constant motion, some moving 'down' into higher-volume, slower-speed bulk store, while others move 'up' into lower-volume, higher-speed memory - all as a function of age and reference frequency." (Schwartz and Weissman, 1967, p. 267).
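A two-tier version of this percolation idea can be sketched in a few lines. The "hotness" scoring rule (reference count, then recency) is an assumption standing in for whatever function of age and frequency a real system would tune:

```python
# Sketch of two-tier 'percolation': the hottest files occupy the small fast
# store; everything else percolates down to bulk. The scoring rule and the
# file records are invented stand-ins, not the systems' actual algorithms.

def percolate(fast, bulk, fast_slots):
    """Keep the fast_slots hottest files in fast storage; demote the rest."""
    everything = fast + bulk
    # Hotter = referenced more often, and more recently (lower age) on ties.
    everything.sort(key=lambda f: (f["refs"], -f["age"]), reverse=True)
    return everything[:fast_slots], everything[fast_slots:]

fast = [{"name": "a", "refs": 1, "age": 9}]
bulk = [{"name": "b", "refs": 7, "age": 2}, {"name": "c", "refs": 2, "age": 5}]
fast, bulk = percolate(fast, bulk, fast_slots=1)
# "b" moves up into the fast store; "a" percolates down into bulk.
```

Unlike purging, nothing is deleted here; files only change tiers, which is the distinction the quotation is drawing.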

2.16 "Some computer-oriented statistics are provided to assist in monitoring the system with minimum cost or time. Such statistics are tape length and length of record, checks on dictionary code number assignment, frequency of additions or deletions to the dictionary, and checks to see that the correct inverted file was updated.” (Smith and Jones, 1966, p. 190).

“Usage statistics as obsolescence criteria are commonly employed in scientific and technical information systems and reference data systems ...

"Usage statistics are also used in the deactivation process to organize file data in terms of its reference frequency. The Russian-to-English automated translation system at the Foreign Technology Division, Wright-Patterson AFB had its file system organized on this basis by IBM in the early 1960's. It was found from surveys of manual translators that the majority of vocabulary references were to less than one thousand words. These were isolated and located in the fastest-access memory; the rest of the dictionary was then relegated to lower priority locations." (Davis, 1967, pp. 18-19).
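The dictionary-allocation step described above, surveying which entries account for most references and pinning those in fastest-access memory, reduces to a frequency split. A minimal sketch, with an invented reference log:

```python
from collections import Counter

# Sketch of frequency-based dictionary allocation, after the Wright-Patterson
# example Davis reports: the most-referenced words go to fast memory, the rest
# to slower storage. The log and capacity below are invented for illustration.

def split_dictionary(reference_log, fast_capacity):
    """Return (fast_words, slow_words) from a log of dictionary lookups."""
    counts = Counter(reference_log)
    ranked = [word for word, _ in counts.most_common()]
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])

log = ["the", "of", "reactor", "the", "of", "the", "neutron"]
fast_words, slow_words = split_dictionary(log, fast_capacity=2)
# fast_words == {"the", "of"}; rarely referenced words go to slow storage.
```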

"The network might show publications being permanently retained at a particular location. This would allow others in the network to dispose of little-used materials and still have access to a copy if the unexpected need arose ...

"Such an 'archival' copy could, of course, be relocated to a relatively low-cost warehouse area for the mutual benefit of those agencies in the network. Statistics on frequency of usage might be very helpful in identifying inactive materials, and the network could also fill this need." (Brown et al., 1967, p. 66).

“Periodic reports to users on file activity may reveal possible misuse or tampering.” (Petersen and Turn, 1967, p. 293).

2.17 Accessibility. "For a system output, a measure of how readily the proper information was made available to the requesting user on the desired medium." (Davis, 1964, p. 469).

2.18 Consider also the following:

"The system study will consider that the document-retrieval problem lies primarily within the parameters of file integrity; activity and activity distribution; man-file interaction; the size, nature and organization of the file; its location and workplace layout; whether it is centralized or decentralized; access cycle time; and cost. Contributing factors are purging and update; archival considerations; indexing; type of response; peak-hour, peak-minute activity; permissible-error rates; and publishing urgency." (Tauber, 1966, p. 274).

Then there are questions of sequential decision-making and of time considerations generally. "Time consideration is explicitly, although informally, introduced by van Wijngaarden as 'the value of a text so far read'. Apart from other merits of van Wijngaarden's approach and his stressing the interaction between syntax and semantics, we would like to draw attention to the concept of 'value at time t', which seems to be a really basic concept in programming theory." (Caracciolo di Forino, 1965, p. 226). We note further that "T as the time a fact assertion is reported must be distinguished from the time of the fact history referred to by the assertion." (Travis, 1963, p. 334).

Avram et al. point more prosaically to practical problems in mechanized bibliographic reference data handling, as in the case of different types of searches on date: the case of requesting all works on, say, genetics written since 1960, as against that of all works on genetics published since 1960, which would include post-1960 reprints of pre-1960 original texts.

For the future, moreover, "In some instances, the search request would have to take into account which data has been used in the fixed field. For example, should one want a display of all the books in Hebrew published during a specific time frame, an adjustment would have to be made to the date in the search request to compensate for the adjustment made to the data at input time." (Avram et al., 1965, p. 42).

2.19 "Here you run into the phenomenon of the 'elastic ruler'. At the time when certain data were accumulated, the measurements were made with a standard inch or standard meter . . . whether researchers were using an inch standardized before a certain date, or one adopted later." (Birch, 1966, p. 165).

2.20 "Large libraries face the problem of converting records that exist in many languages. The most complete discussion of this problem to date is by Cain & Jolliffe of the British Museum. They suggest methods for encoding different languages and speculate on the extent to which certain transliterations could be done by machine. The possibility of storing certain exotic languages on videotapes is suggested as a way of handling the printing problem. At the Brasenose Conference at which this paper was presented, the authors analyzed the difficulties in bibliographic searching caused by transliteration of languages (this is the scheme most generally suggested by those in the data processing field)." (Markuson, 1967, p. 268).

2.21 “The question of integrity of information within an automated system is infrequently addressed.” (Davis, 1967, p. 13).

“No adequate reference service exists that would allow users to determine easily whether or not records have the characteristics of quality and compatibility that are appropriate to their analytical requirements." (Dunn, 1967, p. 22).

2.22 "Controls through 'common sense' or logical checks ... include the use of allowable numerical bounds such as checking bearings by assuming them to be bounded by 0° as a minimum and 360° as a maximum. They include consistency checks using redundant information fields such as social security number matched against aircraft type and aircraft speed. They also include current awareness checks such as matches of diplomat by name against reported location by city against known itinerary against known political views." (Davis, 1967, p. 36).
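Davis's 'common sense' controls, bounds checks on single fields and consistency checks across redundant fields, are easy to make concrete. In the sketch below, the bearing bound comes straight from the quotation; the aircraft table, field names, and speed values are invented:

```python
# Sketch of bounds and consistency checks after Davis: a bearing must lie in
# [0, 360], and reported speed must be consistent with the aircraft type.
# The speed table and report fields are hypothetical illustration values.

KNOWN_MAX_SPEED = {"C-130": 320, "F-4": 1400}  # knots; invented

def check_report(report):
    errors = []
    if not (0 <= report["bearing"] <= 360):            # allowable numerical bounds
        errors.append("bearing out of bounds")
    max_speed = KNOWN_MAX_SPEED.get(report["aircraft_type"])
    if max_speed is not None and report["speed"] > max_speed:
        errors.append("speed inconsistent with aircraft type")
    return errors

assert check_report({"bearing": 15, "aircraft_type": "C-130", "speed": 290}) == []
assert check_report({"bearing": 400, "aircraft_type": "C-130", "speed": 600}) == [
    "bearing out of bounds", "speed inconsistent with aircraft type"]
```

As in the quotation, these checks flag reports for attention; they do not decide what the correct values should have been.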

"A quite different kind of work is involved in examining for internal consistency the reports from the more than 3 million establishments covered in the 1954 Censuses of Manufacturers and Business. If these reports were all complete and self-consistent and if we were smart enough to foresee all the problems involved in classifying them, and if we made no errors in our office work, the job of getting out the Census reports would be laborious but straightforward. Unfortunately, some of the reports do contain omissions, errors, and evidence of misunderstanding. By checking for such inconsistencies we eliminate, for example, the large errors that would result when something has been improperly reported in pounds instead of in thousands of pounds. Perhaps one-third to one-half of the time our UNIVACs devote to processing these Censuses will be spent checking for such inconsistencies and eliminating them.

"Similar checking procedures are applied to the approximately 7,000 product lines for which we have reports. In a like manner we check to see whether such relationships as annual man hours and number of production workers, or value of shipments and cost of labor and materials, are within reasonable limits for the industry and area involved. ...

"For example, the computer might determine for an establishment classified as a jewelry repair shop, that employees' salaries amounted to less than 10 percent of total receipts. For this kind of service trade, expenditures for labor usually represent the major item of expenses and less than 10 percent for salaries is uncommonly low. Our computer would list this case for inspection, and a review of the report might result in a change in classification from 'jewelry repair shop' to 'retail jewelry store', for example." (Hansen and McPherson, 1956, pp. 59-60).
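The Census example above is a ratio-plausibility check: compare salaries to receipts against a floor expected for the trade classification, and list implausible cases for human review. A minimal sketch, where the threshold and records are invented:

```python
# Sketch of the Census-style ratio check Hansen and McPherson describe: flag
# establishments whose salaries-to-receipts ratio is implausibly low for their
# classification. The threshold table and records are invented illustrations.

MIN_SALARY_SHARE = {"jewelry repair shop": 0.10}  # labor-heavy service trade

def flag_for_inspection(establishments):
    flagged = []
    for est in establishments:
        floor = MIN_SALARY_SHARE.get(est["kind"])
        if floor is not None and est["salaries"] < floor * est["receipts"]:
            flagged.append(est["name"])  # listed for review, not auto-corrected
    return flagged

shops = [
    {"name": "A", "kind": "jewelry repair shop", "salaries": 5, "receipts": 100},
    {"name": "B", "kind": "jewelry repair shop", "salaries": 40, "receipts": 100},
]
# flag_for_inspection(shops) flags only shop "A".
```

As in the quoted procedure, the outcome of a flag is inspection and possible reclassification, never silent correction.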

2.23 "The use of logical systems for error control is in beginning primitive stages. Question-answering systems and inference-derivation programs may find their most value as error control procedures rather than as query programs or problem-solving programs." (Davis, 1967, p. 47).

"A theoretically significant result of introducing source indicators and reliability indicators to be carried along with fact assertions in an SFQA [question answering] system is that they provide a basis for applying purifying programs to the fact assertions stored in the system - i.e., for resolving contradictions among different assertions, for culling out unreliable assertions, etc.

"Reliability information might indicate such things as: S's degree of confidence in his own report if S is a person; S's probable error if S is a measuring instrument; S's dependability as determined by whether later experience confirmed S's earlier reports; conditions under which S made its report, etc." (Travis, 1963, p. 333).
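Travis's purifying program, culling unreliable assertions and resolving contradictions by source reliability, can be sketched as below. The 0-to-1 reliability scale, the threshold, and the keep-the-most-reliable tie-breaking rule are assumptions, not part of the source:

```python
# Sketch of a 'purifying' pass over fact assertions carrying source and
# reliability indicators, after Travis. The reliability scale, threshold,
# and resolution rule are invented assumptions for illustration.

def purify(assertions, min_reliability=0.5):
    """Drop low-reliability assertions; keep the most reliable one per fact."""
    best = {}
    for a in assertions:
        if a["reliability"] < min_reliability:
            continue                        # cull unreliable assertions
        prior = best.get(a["fact"])
        if prior is None or a["reliability"] > prior["reliability"]:
            best[a["fact"]] = a             # resolve contradictions by reliability
    return best

facts = purify([
    {"fact": "ship-position", "value": "A", "source": "S1", "reliability": 0.9},
    {"fact": "ship-position", "value": "B", "source": "S2", "reliability": 0.6},
    {"fact": "cargo", "value": "C", "source": "S3", "reliability": 0.2},
])
# Only the S1 report survives: the contradiction is resolved in its favor,
# and the unreliable cargo assertion is culled.
```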

2.24 "Another interesting distinction can be made between files on the basis of their accuracy. A clean file is a collection of entries, each of which was precisely correct at the time of its inclusion in the file. On the other hand, a dirty file is a file that contains a significant portion of errors. A recirculating file is purged and cleansed as it cycles - a utility-company billing file is of this nature. After the file 'settles down,' the proportion of errors imbedded in the file is a function of the new activity applied to the file. The error rate is normalized with respect to the business cycle." (Patrick and Black, 1964, p. 39).


