
compatibility (or at least convertibility) as between large and small system configurations available at different locations. However, we may also consider, with respect to system requirements for efficient and economical storage, not only the problems of hierarchies of access but also those of compression of the data actually to be filed.

A typical instance is that of pictorial data representation. Beyond the photographic (or equivalent) means for full, conservative, input data storage (copy, microfilm or otherwise, for the facsimile representation of the input item) lie possibilities of reductive transformations.* Here, the continuing R & D requirements with respect to information processing system design involve fact-finding and technical analyses.

In particular, we need to learn whether such reductive storage can be so developed as to provide the appropriate type of reproducibility. A given system may require either reduced facsimile of the original, giving only pre-defined significant features, or an enhanced facsimile eliminating noise or redundancy and emphasizing features such as boundaries and edges, or a complete replica of the initial input image. In other situations, processes of re-computation and re-creation to produce a reasonable reconstruction of the original input may suffice.

We would also look forward to increased use of redundancy-eliminating fact-correlations, duplicate checking, and validation techniques applied to items available as input for potential retrieval in order to keep storage requirements within manageable bounds. In addition, as Mooers (1959) has pointed out,3.10 the achievement of a minimal redundancy store is of considerable significance from the client's standpoint and from that of overall system effectiveness. In particular: "The . . . [more compelling] reason to avoid redundancy and filler matter is that putting such matter into the store is not the end of it, if the retrieval system is any good. The text, with all its filler and redundancies will keep coming out of the system, time after time, with a consequent vast waste of human effort that will be continued indefinitely into the future." (pp. 27-28)

The basic procedures for storage in the overall system should obviously be simultaneously effective, efficient, and economical. Effectiveness and economy combined may relate, for example, to storage of textual messages in facsimile-reproducible form but with brief, independently searchable, content-indicia and with appropriate selection-retrieval addresses as typical query-output. Yet many considerations of user-acceptance, user-convenience, availability of suitable microform readers, queuing considerations, and other factors may well affect the economics of the situation.

For example, cost considerations (specifically including considerations of total storage requirements, including those of an archival nature), may well dictate microform storage, yet the effectiveness and efficiency of the system may be lowered because of client low-usage or dissatisfaction with microform outputs generally.3.10a The economics of storage for subsequent selection and retrieval of specified items relate then, first, to the costs and problems of storage as such and, secondly but not least as critically, to the problems of selective recall and effective utilization. The obvious requirements here are therefore those of effective file organization both for purposes of economical storage and for those of efficient search, selection, and retrieval. Beyond these are possibilities for adaptive reorganizations of the file based upon various types of system feedbacks.

* See also Section 3.4 of the first report in this series, Information Acquisition, Sensing, and Input.

Prywes comments that "file organization, which in the past has been given a secondary position in overall system performance, should instead be the kernel of future systems", and, further, that "the sharing of the information in the common store is technically one of the most demanding, and intellectually the most intricate, functions of the system. Continuous enhancement of this capability must be provided through reindexing and reclassifying the changing organization of the total information." (Prywes, 1966, p. 460).

3.2. Problems of File Organization and Structure

Problems of file organization again involve the need for dynamic memory allocation facilities for the hierarchy of storage systems so that files may be set up in layers of segmentation and be rapidly re-organized according to frequencies of usage. Thus, "the system must have dynamic memory allocation, alternate forms of data structure, and a data management and transfer mechanism so that the same data can be used in all aspects of the problem solution." (Roos, 1965, p. 423).

For whatever purpose machine-searchable files are used (record-keeping, inventory control, documentary-item-surrogate storage, or data-, record-, item-, and fact-retrieval), there is a commonality of three principal kinds of operations upon the files. These three types of file processing are those of file input and organization, file search and selection, and file maintenance and up-dating.3.10b An example of multilevel file organization for a dynamic system is provided by ver Hoef (1966) with respect to the INTIPS (Integrated Information Processing System) of the Rome Air Development Center.3.10c

The design of effective file organizations will involve first the consideration of the levels of storage required in terms of items of different size and of different type. Similarly, consideration should be given to frequencies of estimated usage. This may take the form of either activity analysis 3.11 or of estimates based upon the age, currency or likely usefulness of items stored in the file.3.12 Possible self-adaptive features should also be considered.

In particular, file organization must be capable of changing in response to changes in needs for information and also in response to both qualitative and quantitative changes in the contents of the file. There exists therefore a requirement to explore concepts and techniques that will make feasible self-adaptive features in both file organization and in search strategy. (J. Blum and J. Guy, private communication).

There is a wide variety of alternatives with respect to different methods of file organization, whether these are planned with reference to most efficient storage-retrieval operations or whether they are designed with reference to presumed search-selective effectiveness. We may find relatively random file organizations or arbitrary orderings of the records to be stored in the file; for example, in libraries or document collections, by physical dimensions of the stored items, by date of accession, by age class, by journal volume identification, and the like.

Partially arbitrary orderings of the files may be imposed in the form of alphabetical sequences of stored items by their source or author, or by whether or not they fall within certain prescribed chronological time periods, as in many typical correspondence files. Then, in the case of uneven file distributions as determined by continuing activity analysis, there are possible random orderings based upon such assumptions as the following: "The generally accepted solution is to pick a storage location capacity (bucket), which tends to make overflows likely and empty buckets rare. The storage method then goes on to dispose of the overflows by a second rule, e.g., chaining, storing overflow addresses in the bucket, or by assignment to another area." (Dumey, 1965, pp. 258–259).
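The bucket-and-overflow arrangement that Dumey describes can be illustrated by a minimal sketch. It is offered only as an illustration under stated assumptions: the class name `BucketFile`, the character-sum address transformation, and the bucket count and capacity are all hypothetical and are not taken from Dumey or from any system cited here. Overflows are disposed of by the "chaining" rule, one overflow chain per bucket.

```python
class BucketFile:
    """Fixed-capacity buckets with a chained overflow area (illustrative only)."""

    def __init__(self, n_buckets=4, capacity=2):
        self.capacity = capacity
        self.buckets = [[] for _ in range(n_buckets)]   # primary storage locations
        self.overflow = [[] for _ in range(n_buckets)]  # one overflow chain per bucket

    def _address(self, key):
        # Hypothetical key-to-address transformation: sum of character codes.
        return sum(ord(c) for c in key) % len(self.buckets)

    def store(self, key, record):
        i = self._address(key)
        # Use the primary bucket until it is full, then fall back to the chain.
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            self.overflow[i].append((key, record))

    def fetch(self, key):
        i = self._address(key)
        # Search the primary bucket first, then follow the overflow chain.
        for stored_key, record in self.buckets[i] + self.overflow[i]:
            if stored_key == key:
                return record
        return None
```

A small bucket capacity makes empty buckets rare at the price of more frequent overflows; the appropriate balance would presumably depend on the activity analysis for the particular file.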

Hierarchically ordered files typically involve classification schemes and structures of various types and, where it is not possible to determine an exclusive classification for a stored item, the use of cross-reference techniques including the placement of multiple copies of a particular item under several different classification categories, thus providing multiple parallel access to different sections of the file. This is the practice, for example, not only in the hard copy files of the U.S. Patent Office, but in microform storage and retrieval systems such as the Eastman Kodak Minicard developments.3.12a

In the case of a partially ordered file organization, there are frequently to be found broad groupings of stored items with perhaps random or arbitrary orderings within each group (e.g., a "bin" approach) or orderings by frequency of usage and the like. The latter type of organization is designed to be particularly responsive to a specific clientele or usage environment.3.13
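Ordering within a group by frequency of usage can be sketched as follows. This is a minimal illustration, assuming a simple sequential scan within the group and a periodic reorganization step; the class `UsageOrderedFile` and its methods are hypothetical names introduced here and do not describe any of the systems cited.

```python
from collections import Counter


class UsageOrderedFile:
    """One group ("bin") of items, periodically reordered by observed usage."""

    def __init__(self, items):
        self.items = list(items)
        self.usage = Counter()  # retrieval count per item

    def retrieve(self, wanted):
        # Sequential scan; returns how many records were passed over before the hit.
        for position, item in enumerate(self.items):
            if item == wanted:
                self.usage[item] += 1
                return position
        return None

    def reorganize(self):
        # Stable sort: most-used items first, ties keep their current order.
        self.items.sort(key=lambda item: -self.usage[item])
```

After reorganization the most frequently requested items are reached with the fewest comparisons, which is the sense in which such a file is responsive to a specific clientele.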

Next to be considered in generalized information processing system design and use are problems of input-output with respect to the files involving considerations of volume and processing time requirements and questions of efficient space allocation and utilization. Then there are problems of whether new material may be substituted for or used to replace other material in whole or in part, updating problems generally, and questions of whether incoming items require transcriptions, re-recordings, encodings, reductive transformations of various types, or reproduction as microforms, and the like.

A special problem with respect to efficient and economical storage of items to be searched and retrieved is that of the encoding of pictorial and graphic information for compact storage, but with full-scale facsimile reproduction capabilities available upon demand. In this area of pictorial data coding, a special case is that of two- or three-dimensional representations of chemical structure information. As has been noted in the first report in this series (on information acquisition, sensing, and input), a number of coding, ciphering, and notation schemes have been under development for linear representations of such structural data in machine-useful form.

Holm (1965) notes that: "Much work is under way to store pictorial representation, such as the chemical structure, in packed coded or binary form, with the reproduction of the original pictorial form upon request either as a display on film, or printed." However, other pictorial data, such as photographs, may probably be stored most efficiently in their original form, in reduced facsimile such as microform, or as TV recordings.3.14

Other design requirements relate to problems of access to physical storage and to withdrawals and replacements of items to and from the store. There are maintenance problems including questions of whether or not the integrity of the files must be maintained (i.e., a master copy of each original item accessible at all times), and whether provisions should be made for the periodic purging of obsolete items and revisions of the file organization in accordance with changing patterns of usage or response requirements.3.14a Other design questions relate to requirements for display of all or part of an item and/or indications of its characteristics prior to physical retrieval.

With respect to storage media and equipment considerations, the information processing system designer must not only be concerned with the characteristics of materials suitable for storing information (to be discussed in another report in this series, on overall system design requirements), but also with those characteristics affecting the selected methodology of file organization and the total system design. For example, he must carefully consider the interaction between such variables as the size of the file, its organization, the search strategy or strategies to be used, and equipment speed. (See Blunt, 1965, p. 14).


We shall return to these and other R & D considerations in other reports in this series when we discuss the more specialized problems of information storage, selection, and retrieval systems as such, but we note here that many levels of file organization and file compartmentalization may be used to speed search processes, to conduct multiple searches in parallel, and the like. In particular, data bank management design requirements create file organization problems of increasing severity.3.14b In general, where there are increasing opportunities for the establishment and multiple-access use of large-scale data banks, there are increasingly difficult R & D problems in file organization, file maintenance, and file protection.

As Orchard-Hays has emphasized: "Probably the knottiest problem facing system designers today is how to set up, maintain, control and protect huge libraries of heterogeneous data, all changing at different rates and in different ways. . . It is clear that the systems designs of the past are entirely inadequate. . ." (Orchard-Hays, 1965, p. 240). Moreover, it is claimed that "unfortunately little research has been done on methods of organizing and structuring large files. As a result, the available concepts are primitive. One principle is clear: our needs for information are changing; therefore, our file organization must be capable of changing." (Borko, 1965, p. 24).

3.3. Associative Memory Considerations

The concept of associative or content-addressable memories 3.15 has been hailed for well over a decade 3.16 as the potential panacea for many interlocking, multiple-aspect processing problems, specifically including those of information storage, selection and retrieval systems.3.17 As of 1967-1968, however, little practical realization has been achieved except on a very small scale 3.18 and some observers predict that this condition will continue for some time to come.3.18a

Large-scale associative memories have, on the other hand, been simulated on computers, notably in Fuller's 1963 dissertation investigations,3.19 at the Moore School by Prywes and associates,3.20 by Landauer 3.20a and by Feldman.3.21 In particular, Prywes and Gray (1963) claim that the use of an addressable memory to carry out an associative memory scheme provides a flexibility that would be difficult to achieve with built-in associative hardware.

In a 1967 state-of-the-art review, Minker and Sable comment: "It was refreshing to see the Government support studies leading to quantitative results in the study of hardware vs. software implementation of associative memories. We note that these studies did not show the hardware associative memories to be significantly advantageous. Additional quantitative studies are needed to define the types of problems for which hardware associative memories of various sizes could be useful." (Minker and Sable, 1967, p. 151).

The following desiderata, however, are indicative of continuing R & D concern with respect to memory system design: "Features of the memory structure desirable for a complex processing system are listed below:

1. The number of different lists of items of information, length of lists, and length of information of the item in the memory should be perfectly flexible (except for the total memory size).

2. It should be possible to add, delete, insert and rearrange items of information in a list at any time and in any way . . .

3. The nature of the items in a list should not be restricted. An item may be a symbol, a number, a combination of both in any length or an arbitrary list.

4. It should be possible for the same item to appear on any number of lists simultaneously." (Hormann, 1960, p. 4).
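The four features listed by Hormann can be made concrete with a brief sketch. It assumes nothing beyond the built-in list type of a present-day language and is not drawn from Hormann's own design; the names `item`, `parts_list`, and `orders_list` are hypothetical.

```python
# One item of information: a combination of a symbol and a number.
item = {"symbol": "A7", "value": 42}

# Feature 3: items of unrestricted kind (a record, a symbol, a number, a list).
parts_list = [item, "bolt", 3]
orders_list = [["nested", "list"], item]

# Features 1 and 2: lists grow, shrink, and are rearranged at any time.
orders_list.insert(0, "rush")
parts_list.remove("bolt")

# Feature 4: the same item appears on both lists, so a change made through
# one list is visible through the other.
item["value"] = 43
```

Here both lists hold a reference to the single stored item rather than separate copies, which is one way, among others, of allowing an item to appear on any number of lists simultaneously.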

The area of "associative" or "content-addressable" memories will thus require considerable further R & D effort in both hardware and software, including new approaches to file organization.

Starting on the software side, we note the emergence of list-processing languages intended to facilitate symbol manipulation directly and thereby problem-solving activities more generally.3.22 Limitations with respect to multiply-associated data have led to variations involving threaded-lists, inverted lists, and multilist program structures,3.23 and special systems such as Rover.3.24 A deliberate attempt to compromise between fixed file organization and list-processing techniques, providing for the building of associative sublists if and only if needed, was indicated in the relatively early NBS model of "selective recall". (Stevens, 1960).

For multiply-related, multiply-associated data in a large file, the problems of efficient storage, selection, and access may involve considerable emphasis upon formal modellings of the possible system parameters and configurations. Here, considerations of efficient machine manipulations of graph-theoretic techniques, input-output economics in the most general sense, control system theory, and studies of the problems of aggregation and partial aggregation may have considerable pertinence.

New technologically-feasible approaches to truly massive direct access file media and to file structurings of the associative memory type point to significant alleviation of some of these problems in the not too distant future. On the other hand, it is not yet clear that enough is now known about multiply-related and associated data to establish organizational schemas that would take best advantage of these promised technological advantages. Other questions as yet largely unresolved include those of the development of performance measures adequate to depict the appropriate trade-offs between storage economy and selection and retrieval effectiveness for a particular application. A familiar question in the literature of the information storage, selection and retrieval field relates to the relative efficiency of "linear" or "unit record" or "term-on-item" as versus "inverted" or "item-on-term" files, and combinations of these two approaches.3.24a
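The distinction between the two file arrangements may be illustrated by a minimal sketch. The three sample records and the function names are hypothetical and serve only to show the two structures side by side; no claim is made as to which arrangement is the more efficient for a given application.

```python
from collections import defaultdict

# Linear ("unit record") file: each item carries its own list of terms.
linear_file = {
    "doc1": ["memory", "associative"],
    "doc2": ["memory", "microform"],
    "doc3": ["microform", "retrieval"],
}


def invert(linear):
    """Build the inverted file: each term carries the list of items it indexes."""
    inverted = defaultdict(list)
    for item, terms in linear.items():
        for term in terms:
            inverted[term].append(item)
    return dict(inverted)


def search_linear(linear, term):
    # Every record in the file must be examined.
    return [item for item, terms in linear.items() if term in terms]


def search_inverted(inverted, term):
    # A single lookup under the term yields the item list directly.
    return inverted.get(term, [])


inverted_file = invert(linear_file)
```

The inverted file answers a single-term query with one lookup but must be revised under every affected term whenever an item is added or changed; the linear file is simple to update but requires a full pass for each search.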

Some investigators who are concerned with problems of efficient file organization and file structuring from the points of view of effective compartmentalization and efficient search strategies have nevertheless tended to neglect the problems and prospects of screening or sieving devices as an important contribution to search tactics.3.24b

An example of compartmentalization and screening techniques has been suggested in the case of fingerprint identification as follows: "If each fingerprint in a set is simply classified according to whether it conforms to a particular type of fingerprint pattern, e.g., 'whorl', the file can immediately be divided into 1,024 separate file sections representing the different possible combinations of the 10 fingers. More detailed analysis permits further refinement of the groups. With over a thousand file sections and the potential for easy subdivision within each section, the searching of even a multimillion-print file is not too forbidding." (Cuadra, 1966, p. 7).
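The figure of 1,024 sections follows from ten independent two-way classifications, since 2^10 = 1,024. A minimal sketch of such a sectioning rule is given below; the function name `section_number` and the choice of treating the first finger as the high-order bit are assumptions made for illustration and are not part of Cuadra's description.

```python
N_SECTIONS = 2 ** 10  # ten fingers, two classes each


def section_number(is_whorl):
    """Map ten yes/no classifications (one per finger) to a file section index."""
    if len(is_whorl) != 10:
        raise ValueError("expected one classification per finger")
    index = 0
    for flag in is_whorl:
        # Shift in one bit per finger; the first finger becomes the high-order bit.
        index = (index << 1) | int(bool(flag))
    return index
```

Only the one section so selected need then be searched. If the prints were evenly spread over the sections, each would hold about one-thousandth of the file; actual pattern frequencies are presumably uneven, so that some sections would be considerably larger than others.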

On the hardware side, we are faced with severe problems of economic and practical feasibility in achieving large-scale, relationally associated data files to date. Small, very fast (e.g., tens of nanoseconds performance), memories of the associative or content-addressable type are beginning to appear in operational systems, primarily as "scratchpad" memories, which are defined as "small uniform access memories with access and cycle times matched to the clock of the logic" of the main processor. (Gluck, 1965, p. 662).

These scratchpad memories are typically used for such purposes as reducing time of access to instructions, microprogramming, buffering of instructions or of data that is transferable in small blocks (as in the "four-fetch" design of the Burroughs B 8500 system),3.25 storage of intermediate processing results, table lookup operations, use as index registers,3.25a and, to a limited extent, content addressing.3.26 Gunderson et al., of Honeywell (1966), discuss associative memory techniques as used for control functions in a multiprocessor system (more specifically, to provide dynamic control over processor assignments, to mechanize automatic page turning schemes, and to provide other functions relating to I/O executions and to parallel processing).3.27

Other than such scratchpad memory usage, several special-purpose and experimental associative memory developments may be of interest. One is Librascope's APP (Associative Parallel Processor), as described by Fuller and Bird (1965), intended for use in such tasks as pattern-property-extraction and pattern classification.3.27a Another example is the Pattern Articulation Unit and related parallel processing capabilities of the ILLIAC III computer. 3.27b

In a 1966 survey, Hanlon reports that "memory cycle times have been reported as low as 50 nanoseconds" and that "although the majority of research to date has been with small memories (up to 1000 cells), projections are indicated in the 10^7-10^8 bit range" (p. 519). Hobbs, in one of a series of state-of-the-art reviews (1966), concurs in the opinion that advantages can be attained and disadvantages lessened by the use of a relatively small associative memory coupled to a large capacity random access store.

There may be other, as yet inadequately explored alternatives, however. At Sylvania, for example, a reference-pattern-plane organization for the processing of unknown pattern inputs against many stored reference pattern property-criteria has been experimentally realized in a combination of automatically processed plastic sheets affecting the behavior of a solenoid-transformer array for character recognition purposes. There are possibilities that the same transformer-sheet arrays, addressed in parallel searching mode, can be utilized as a practical associative memory of somewhat larger capacity, but slower speed, than those techniques available for internal scratchpad auxiliaries.3.28

In the light of present limitations of associative memory developments, Giuliano (1965) suggests this alternative:

"In my opinion . . . it would pay to look further into the area of large capacity, inexpensive permanent memory devices which would handle associative processing in a special-purpose manner". (p. 260) Wilkes (1965) suggests still another possibility: "So far the slave principle has been applied to very small super-speed memories associated with the control of a computer. There would, however, appear to be possibilities in the use of a normal sized core memory as a slave to a large core memory . . ." (p. 270)

Climenson comments: "A basic organization concept receiving little attention recently is King's photostore, where a large read-only disc memory is considered a logical extension of the 'stored program' concept. The store is content-addressed, using a longest-match principle; the function found can be data or instructions, or both. The photostore is mentioned here because of the ultimate influence such devices could have on file organization. There may be renewed interest with the announcement of ITEK's version of the photostore: the Memory Centered Processor (MCP). ITEK's view is that the MCP is eminently suitable for sorting, compiling, file conversion, typesetting, and a host of other applications beyond the usual table lookup and file search." (Climenson, 1966, p. 112).
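The longest-match principle mentioned by Climenson can be sketched in a few lines. The sketch below illustrates only the general principle, using an ordinary in-memory table; it makes no claim to represent the photostore or the MCP, and the function name `longest_match` and its arguments are hypothetical.

```python
def longest_match(store, text, start=0):
    """Return (key, value) for the longest stored key that matches text at start.

    `store` is any mapping from key strings to stored functions (data or
    instructions); None is returned when no stored key matches.
    """
    max_len = max((len(key) for key in store), default=0)
    # Try the longest candidate first and shorten until a stored key is found.
    for length in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in store:
            return candidate, store[candidate]
    return None
```

With a store containing the keys "fi", "file", and "file search", the input "file searching" is matched against "file search" rather than against either of the shorter keys.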


Present use of large-capacity associative memory techniques, however, has been limited not only by technological constraints but also by the even more difficult problems of deciding, in advance, what associations in a given body of data are most likely to be valuable for future processing service requests. Similarly, there are questions of how to represent efficiently the multiple cross-associations and interdependencies that may be identifiable. For example, in a 1966 study, Dugan et al. conclude that considerations involving the interrelationships between associative memories and general-purpose computers in various system configurations are the most important from the standpoint of system effectiveness.

Licklider has remarked that: "Associative, or content-addressable, memories are beginning to make their appearance in the computer technology. The first generation is, of course, too small and too expensive for applications of the kind we are interested in here, but the basic schema seems highly relevant." (1965, p. 64). He adds a caveat, however, as follows: "Only when the relative merits of various associative-memory organizations are understood in relation to various information-handling problems, we believe, should actual hardware memories be constructed." (p. 65).

4. Mass Storage Considerations

Turning next to very-large-capacity storage requirements, especially for permanent or archival information storage (such as record files, document collections, and libraries), we note the questions of digital as against facsimile storage media and techniques, the problems of document miniaturization or compacting of the document store, and the availability of so-called random access devices and systems. (The development of general purpose file management procedures and programs will be considered in a later report in this series).

4.1. Digital vs. Facsimile Storage

In the general area of efficient and economical storage, there is considerable continuing controversy with respect to digital as against facsimile storage, especially for documentary items to be maintained in files, with various levels of possible compromise also to be considered. There are, indeed, many considerations indicating that paper records, documents, and books will be always with us (the most obvious form of facsimile storage being that of storing the physical item itself), notwithstanding the possibilities for future development of new forms and media of information recording, publication, and dissemination.4.1

Furthermore, "the storage of the massive amounts of lexical material in libraries in digital (machine readable) form such as on the card catalogs, not to mention citations, abstracts, tables of contents or full text, requires memories of a size and organization not being met elsewhere in the data-processing industry." (King, 1965, p. 91). On the other hand, however, it has also been suggested that "it will not be long before it will be cheaper to store English text in the mass memory of a computer system than on paper in cabinet files." (Fano, 1967, p. 32). In general, there are both advantages and disadvantages in the choice of either digitalized or facsimile-reproducible storage for documentary items.

4.1.1. Facsimile Storage

Let us consider first the more conventional case of facsimile storage which may consist of the storage of one or more copies of the original record or document itself (e.g., carbon copies in correspondence files or bound volumes of printed pages and books on library shelves). Even in the most conventional library case, however, critical system considerations may arise in terms of the questions of whether, and when, to bind or to film.4.2

With respect to problems of document miniaturization for more compact facsimile-reproducible storage, the solutions of preference, to date, are those involving the use of microforms.4.2 The actual microform used may be roll microform; 4.3 cassettes, cartridges, or strips; 4.4 microfilm aperture cards,4.5 or microfiche. 4.6

Beyond the conventional microform having typical reduction ratios of 35-50:1 or less, a few systems involve 100:1 or greater reduction ratios, but typically only for very large, and very expensive, experimental systems.4.7 Other factors are technological and involve, for example, the effects of size, contrast, media, resolution and other factors on the usefulness of microcopy, especially at high reduction ratios. 4.8

Another approach to document miniaturization with facsimile image reproduction capabilities involves the use of video tape, but areas of continuing R & D concern are generally similar to those of the photographic-media microform techniques. In addition, it appears that video tape has the disadvantage of allowing only a limited number of replays (on the order of 100 or less).

Continuing system design problems in the handling and use of conventional microforms include questions of quality control of copy reproduction through several generations from a master 4.9 and of aging and preservation (Fig. 3), especially of rare or irreplaceable items.4.10 In general, system and clientele requirements continue to point to needs for higher quality and higher resolution, with greater accessibility and convenience of use, but at lower overall costs both to the system management requirements and to the individual user.

It may be noted in passing that an earlier claim for microform storage and retrieval advantages (namely, the convenience of using integral index techniques where content-indicating and other selection criteria codes are recorded physically adjacent to the item image(s) and where matching of query and item codes automatically triggers selection and reproduction of the indicated images 4.11) is currently somewhat out of favor. A major reason for this is the developing sophistication of search strategy and multiple index manipulation techniques which may nevertheless be coupled with direct and automatic retrieval from microform files, but there are other reasons as well.4.12

An unusual example combining microform storage of document identifications, citations, and abstracts with coordinate indexing and Peek-a-boo type search and retrieval techniques is provided in the Microcite developments at the National Bureau of Standards.
