Page images


p. 5-1).

[ocr errors]

throughput and minimum requirement for storage format, but all records within a file must be identical space. Ease of programming, program size, and in format. New files may be created or old files running time are also important considerations.' changed to meet new requirements. Data can be (Bonn, 1966, p. 1866).

added to files, or changes can be made to correct "The most widespread information retrieval errors in existing files." (Baker and Triest, 1966, systems are those for data base file management, which process records organized into fields, each "The Formatted File System (FFS) developed for containing a type of data in the record." (Hayes, the Defense Intelligence Agency is a general1968, p. 23).

purpose data management system for the IBM 1410 "Data Management Problems. It is interesting which is coupled to the 1410/7010 Operating Systo review a few of the problems raised by G. H. tem. It is oriented to a set of users (technicians) who Dobbs at the summary session of the first sym- can maintain an intimate knowledge of the structure posium on data management systems mentioned of their files and the query language to access them. above. One of these was the diverse terminology It employs both tapes and disc to define, maintain, and points of view, which make it difficult to extract and query a set of independent files. A table of any basic principles. Another was lack of concern contents and cross index can be defined and mainto input quality control. Still another was lack of tained on tape or disc. An FFS file must have a appreciation for the real-life data base problems unique key field group in each record. A single level as the user sees them. At the second symposium, embedded files (periodic sets) is permitted in the two years later, Galentine described the relative record. Except for the last field of a record, all fields lack of progress as ‘apalling.' Dobbs, at this sec- are fixed in length. The query language permits ond session, identified several specific technical general logical conditions and relations and provides areas needing further development – among these, several geographical and statistical operators. FFS is the ability to allow an unsophisticated user to one of the few general-purpose data management describe data structures, capability to change data systems which are operational.” (Minker and Sable, and file organization, ability to share files among 1967, p. 148). simultaneous users with adequate file security "The users of non-numeric systems had require‘lockout' procedures, and the need for more flexible ments for very long alphanumeric records. Some report formatting.” (Climenrson, 1966, p. 128). of the records were formatted as were unit records 5.19 One example of a developmental system

but the fields were not all of predetermined length. claiming to incorporate these features is the Catalog

To cope with this, the formatted file concept was Input/Output System at RAND Corporation. More

developed. It had the ability to handle records of specifically: "Computer applications in linguistics,

variable length by referring to a data definition library science, and social science are creating a

which described the permissible record contents, need for very large, intricately structured, and in

context, and internal structure. The data definition some cases tentatively organized files of data. The

could be carried within each record but was more catalog - a generalized format for data structures

normally separated into a data definition table to is designed to meet that need ... The computer

eliminate redundant entries. The formatted file programs will:

could handle variable length records but could not

interpret completely free form text. Special techa) Facilitate partitioning, rearranging, and con

niques were developed to handle free text which, in verting data from any source in preparation

general, relied on the usual delimiters in the text, for writing the catalog.

such as periods and commas, to identify the end of b) Format and convert data for printing on one

each structural unit. Free text then could be of a variety of printers.

interpreted by scanning it as though the computer c) Sort the data elements within a catalog and

were reading it from left to right." (Aron, 1968, p. 7). merge data from two or more separate catalogs.

5.21 "Univac's B-O or Flow-Matic, which was d) Restructure a file by rearranging the order running in 1956, was probably the first true Data

of classes of data - catalog transformations. Processing compiler. It introduced the idea of e) Address nodes in the structure, retrieve data file descriptions, consisting of detailed record and

from the structure, and add to or delete from item descriptions, separate from the description of the structure - file maintenance.” (Kay et al., program procedures. It also introduced the idea of 1966, pp. 1-2).

using the English language as a programming

language.” (Rosen, 1964, p. 8). 5.20 “In recent years there has been a rapid 5.22 "General Electric has announced GECOS growth in the use of so-called 'formatted file sys- III (General Comprehensive Operating Supervisor tems’. These systems are general-purpose data III), an advanced operating system for large-scale storage, maintenance and retrieval systems de- computers. GECOS III integrates requirements for signed to provide the user with a maximum amount on-line batch, remote batch, and time-sharing into of flexibility. They feature the use of a single set of one system using a common data base. The 'heart' programs to handle a variety of demands on a group of the GECOS III is a centralized file system of of large files. Each file may possess a different hierarchical, tree-structured design which provides

[ocr errors]

multiprocessor access to a common data base, full file protection, and access control.” (Commun. ACM 11, 71 (Jan. 1968).)

5.23 “The Integrated Data Store (IDS), developed by Bachman of the General Electric Company, is a data processing programming system that relies on linkage of all types for its retrieval and maintenance strategies. Through extensions to the COBOL language and compiler, IDS permits the programmer to use mass random-access storage as an extension of memory.” (Minker and Sable, 1967, p. 126).

. “The IDS file structure allows a linked-list structure in which the last item on every list is linked back to the parent item that started the list. Thus, it is possible to return to the parent item without a recursive list of return points. In IDS, each record is an element in a linked list. A file of records may be subordinated to a master record by linking it to the first member of the subordinate file and chaining from that point, through each record in the subordinate file, through the last one, and back to the master record. There is no inherent limit in IDS to the number of records that may exist in a chain or to the number of detailed chains that may be linked to a given record with a single master record. There is also no inherent limit to the depth of nesting that is permitted; i.e., a record in a chain that is subordinate to a given record may, in turn, have subordinate record chains.” (Minker and Sable, 1967, pp. 126-127).

“The G.E. Integrated Data Store is an example of a linked file organization. Master and detail items are organized in a series of linked chains to form records. Each chain at least one master item and one more detail items. Each item contains linking or chaining information which contains the addresses of the next item and the previous item in the chain. An item may belong to several chains, and linking information to all chains is included in the record." (Bonn, 1966, p. 1867).

5.24 "Franks describes the SDC Time-Shared Data Management System (TDMS), whose design draws upon ADAM and the earlier LUCID. TDMS employs an interesting data structure involving only a single appearance of each item of data with appropriately organized pointers to represent order, multiple instances, etc. This is a muchdiscussed idea that has needed exploration in a large system. TDMS, like too many similar systems, lacks means by which the system can ‘learn' frequently traversed paths through the data, a mechanism that would permit subsequent identical or similar searches to be handled more efficiently." (Mills, 1967, p. 240).

“Williams & Bartram have developed a report generator as part of the TDMS. The object of this program is to give a nonprogrammer the ability, while he is on-line with the system, to access a large file of data for the purpose of de

scribing and generating a report. Another work in this area is by Roberts.” (Minker and Sable, 1967, p. 137).

“The file organization of TDMS is an inverted tree structure with self-defining entries. This organization has made it possible for TDMS to meet its goal of providing rapid responses to unpredictable queries in a time-shared environment. Although this organization requires more on-line, random-access storage than most other file organizations, the benefits obtained far outweigh this storage cost." (Bleier and Vorhaus, 1968, p. F97).

5.25 “IBM has developed a Generalized Information System based on experience gained with military file applications. Because of IBM's intention to provide this system as part of its applications library for the System/360 series, this system undoubtedly will be examined quite closely by a variety of potential users. The system has two basic modules: one for defining, maintaining, and retrieving files of 'formatted' data, and one for text processing and concordance-type retrieval.” (Climenson, 1966, p. 126).

“The text-processing module of GIS includes three basic files: (1) a dictionary ordered on key word, each record containing: pointers to synonyms and equivalents, key word frequency data, and a pointer to (2) the inverted file, which can contain a variable number of document numbers indexed by the given key word. Finally, (3) the master file can contain bibliographic data and all words stored for that document. Given the document number, the bibliographic data, and key words from the document, the system can automatically generate the above files." (Climenson, 1966, p. 127).

5.26 “GRED, the Generalized Random Extract Device developed at Thiokol Chemical Corporation and described by Heiner & Leishman, is written in COBOL. Files within the system follow the COBOL restrictions of fixed-record size and fixedfield length size for each field; the files are also restricted to tape. File definitions are provided at run time by the user, who specifies the file and record description. A file definition library option is provided and can be maintained by input request. The system, developed for the IBM 7010 computer, has the ability to sort and output data, and is useful for small files." (Minker and Sable, 1967,


p. 147).

5.27 “At a second symposium held in September 1965, a benchmark problem was used to organize the discussion of specific systems. The problem involved a management data base, that is, an organ: ization table and personnel files. Five systems were given the same file data and asked to create the file(s) and perform several kinds of operations on it. The five systems were: COLINGO of MITRE Cor. poration, the Mark III File Management System of Informatics, Inc., the on-line data management system by Bolt, Beranek, and Newman, Inc., the BEST System of National Cash Register, and the


*’ "Integrated Data Store' of General Electric." (Climenson, 1966, p. 125).

5.28 “This work has led to the definition of FILER (File Information Language Executive Routine), the formalization of a calculus of operations with specific relevance to problems of file organization." (Hughes Dynamics, 1964, p. 1.3).

5.29 “Apart from languages at the implementation or procedural level, a number of user-oriented systems employ list-structured data as a basic mechanism. Bernstein & Slojkowski describe a system called Program Management System (PMS), which stores data in a two-level file structure. Associated with each file is a file name, a list of pointers to its subfiles, and a list of pointers to other files." (Minker and Sable, 1967, p. 126).

5.30 “The long-range objectives are:
“1. To define some of the characteristics of large

files, and to develop the structure for a large
file of heterogeneous scientific and technical
information including chemical structure
representations, with particular attention to
the necessity for:
a. Manipulating information which is in

some cases formatted and in others

completely amorphous. b. Making provision for inclusion of cer

tain kinds of information when it exists, and for later filling of gaps when the information is not available at the time

of file initiation. c. Creating a multi-level file of information

so that provision may be made for the inclusion of general information, as well as the addition of various levels of specificity as required by the user, either simply because the additional levels of specificity exist and might be useful at a later date or because there is a user

requirement for that degree of specificity. d. Examining the inter-structuring of files

that are part of the larger files but which

may be geographically separated. “2. To provide list-processing capability in the

file structure in order to:

a. Maintain flexibility in the file.
b. Provide for efficient

c. Permit additions to files, both in

classes existing already in the system and in the entry of new classes of

information. d. Free the system from the constraints

of fixed-length and formatted files. e. Permit aggregations of data from files

that are geographically separated. "3. To investigate the techniques of file manipula

tion in order to provide systems of subfiles, special files, desirable redundancy in exchange for multiple access to information, and the necessary keying or

referencing facility required for such a system of multi-level, multi-subject files. This work must take into account also the planning and development of several kinds of computer programs which are required

for file maintenance "4. To determine the kind of organization “

of information which will most readily permit questioning of the file by different groups of questioners who have varied (and varying) requirements for the kinds of information contained in the file, and who have, furthermore, requirements for different degrees of specificity in the information they are seeking." (Anderson et al., 1966,

pp. 2–4). 5.31 "Fossum & Kaskey examine different kinds of file organization and investigate the potential of a 'list ordered file for economizing on the work-load on a computer when carrying out a search. The data base used is the terms assigned to a number of DDC documents, using the DDC thesaurus. The relevance of this study for this chapter is the analysis of word associations, and the objective of the analysis is, in effect, to permit a certain amount of precoordination of terms, in order to reduce the amount of processing necessary in the retrieval operation. The means to achieving this objective is the determination of the mutual exclusiveness of terms. Unlike the rigorous intellectual marshalling of terms in facet analysis, the procedure here is to determine mutual exclusiveness on the basis of whether or not terms appear together in the indexing of individual documents. The attempt to develop an economic list-organized file, the specific purpose of the work reported here, is abortive, but the technique of handling terms in this way remains an intriguing possibility as a machine aid to the generation of schedules for faceted classification schemes." (Sharp, 1967, p. 102).

5.32 “The fundamental importance of data structures may be illustrated by considering the problem of designing a single language that would be the preferred language either for a purely arithmetic job or for a job in symbol manipulation. Attempts to produce such a language have been disappointing. The difficulty is that the data structures required for efficient implementation in the two cases are entirely different. Perhaps we should recognize this difficulty as a fundamental one, and abandon the quest for an omnibus language which will be all things to all men." (Wilkes, 1968, p. 5).

5.33 “Historically primary preoccupation with three classes of data structures (real-complex scalars and arrays, alphanumeric strings and files, and symbolic lists) have led to three major language developments; exemplified - but not exhaustively defined - by ALGOL, COBOL, and LISP, respec. tively. A major concern of procedural language designers is the reconcilization of these diverse




languages has as yet been established as 'standard' even in an informal sense." (Raphael, 1966, p. 67).

“The concept of list processing, or chaining, has been used as a technique for the manipulation of logical data strings for many years and has been formalized as a language for handling data in computer storage. List processing has also been utilized with limited success for the control of data in directaccess storage devices. In general, when list structures are used for external data control, only a subset of the possible data structures is implemented, and the logical and physical relationships are approached as a single entity. Thus, the many ventures into this area are highly individualized, resulting in duplication and incompatibility.” (Henry, 1969, p. 2).

5.38 "List processing is a convenient mode of description for many of the more interesting and provocative areas of computer application, but the available languages - such as IPL-V, COMIT, LISP, SIMSCRIPT-require a level of programmer sophistication that makes them almost prohibitive for normal classroom use." (Conway et al., 1965,

[ocr errors]
[ocr errors]

p. 215).

data types and their transformations with one language.” (Perlis, 1965, p. 189).

“Studies in artificial intelligence call for powerful list-processing techniques, and involve such operations as the placing of items on lists, the searching of lists for items according to specified keys, and so on. Languages in which such operations can be easily specified are indicated. The pioneer language in this regard was IPL developed by Newell, Simon and Shaw." (Wilkes, 1964, p. F 1-3).

5.34 "Early list processing languages were invented in order to carry out specific projects of a non-numerical nature. The sequence of languages called IPL ... Started out in order to provide satisfactory methods of organizing information for work in theorem proving and problem solving, a field in which the amount of storage needed is very variable and unpredictable, and in which the structure of the lists carries important information. LISP . . . was developed partly for the use of a project called the 'Advice Taker' which was intended to operate in a complex way on English statements about situations. A language called FLPL... which was embedded in FORTRAN was produced to write programs for proving theorems in geometry. COMIT..

... was devised for language research." (Foster, 1967, p. 3).

5.35 “Furthermore, since a list is itself an ordered set of items which may themselves be lists, there is no limit to the complexity of the structures that can be built up except the total memory space available. Also, there is no restriction to the number of lists on which an item can appear." (Hormann, 1960, p. 14).

“List structures have, as their fundamental property, the substitution rule that allows any symbol in a list to be substituted by a list structure. The natural control procedure for such rules is recu[r]sion: the use of rules whose definition contains calls on itself.” (Perlis, 1965, p. 189).

5.36 “List structures allow dynamic storage allocation for units of data larger than can be stored in a single computer word. Normally blocks of consecutive storage cells are allocated for data sets prior to program execution based on estimates of the maximum size of these sets. If the sizes of data sets are variable or unknown, wasteful amounts of storage must be assigned to guard against program failure caused by overflow at some block of assigned storage. Faced with this problem, Newell, Shaw, and Simon organized information in memory into an associative list structure format.” (Fuller, 1963, p. 5).

“Since the order relation among items is determined only by links, this relation can be changed simply by changing the links without moving the items physically, thus allowing simple and quick changes of organization of memory content. Processes such as insertion and deletion of items in a list become very simple." (Hormann, 1960, p. 13).

5.37 For example, "list processing is relatively new; none of the forms for the components of list

5.39 “While the list processing languages . make it possible to handle reasonably complicated data structures, the programmer is nevertheless faced with a number of difficult problems as soon as he attempts to use such languages in practice ... it becomes necessary to fit the given data structures into what often turns out to be an unnatural format . . . At the very least, this type of data organization wastes a great deal of storage space ... Furthermore, the inefficient space allocation also results in extremely slow execution times for the object programs." (Salton, 1966, p. 205).

"The user of list-processing languages is frequently plagued by the lack of memory space. Sophisticated means of 'garbage collection' have become important to circumvent memory space limitations; however, they constitute mostly a palliative that seldom satisfies the user's appetite for extra space." (Cohen, 1967, p. 82).

“In their relation to machines the list processing languages are burdensome, first because the data structures involved may be represented only through explicit links, and second because the free growth and cancellation of structures requires a troublesome administration of the storage. (Naur, 1965, p. 197).

"Powerful software schemes (e.g., the list

“ processing languages) have been developed to deal with the problem of treating scattered data as a contiguous string, but they pay a very heavy price in memory overhead (in some schemes over three-fourths of the available memory is required to handle the addressing mechanisms) and in the processing time required to perform the address arithmetic.

“An alternative solution is proposed here which involves the addition of a small Associative Memory

(AM) to the addressing machinery of the computer (or peripheral direct access storage device). As will be shown, this hardware modification will permit scattered data to appear contiguous, with only a token overhead cost in memory and processing time.” (Fischler and Reiter, 1969, p. 381).

5.40 “The process of item deletion involves the elimination of an existing item in the file. This task is complicated by the fact that after deletion, linkage information must be modified if the system is to operate properly. This can be accomplished in two ways. In the first case the item can be physically deleted and each list associated with the item can be searched and the associated link deleted. Alternatively, the specified item can be flagged in the control section. Then during retrieval flagged items are ignored." (Prywes, 1965, p. 20).

"There are several penalties paid for obtaining the flexibility of a memory organized into a list structure. One of these is the requirement for additional memory bits to store appropriate linking addresses. A far more serious penalty is the time required to retrieve symbols from lists and for operations required to add symbols to lists, reorder lists, delete symbols from lists, erase lists, transfer lists to and from bulk storage, and manipulate push-down registers. These tasks are taken out of the hands of the programmer by various listprocessing languages, but they still must be performed by the machine." (Fuller, 1963, p. 12).

5.41 “Most list-processing languages have suffered from their inability to deal directly with complex data structure and/or from their inability to perform the complete range of programming language operations upon the data list structures. (Lawson, 1967, p. 358).

5.42 ... In many cases, especially among the list-oriented languages, they simply have not geared themselves to the large amounts of data and data processing required in this special field.” (Simmons, 1966, p. 214).

“The reason list processing has yet to acquire universal popularity in non-numeric data processing is that, while these mechanics are easy in the list processing language, they can be very time-consuming when performed by a computer, so much so that the economics of use of present-day list processing languages in most information retrieval applications is questionable. Because of the saving in programming time that is possible, however, we may anticipate future hardware and software devel. opments that will enable these methods to be made practical.” (Meadow, 1967, pp. 200–201.)

“This suggests that a conventional linked list structure is inadequate for representing map (or picture) structure. Alternative forms might be an associative memory or a more general linked element structure.” (Pfaltz et al., 1968, p. 369).

5.43 “It must be said also that numeric information is often awkward to store in terms of list struc

tures, and arithmetic procedures may be correspondingly difficult to perform.” (Salton, 1966, p. 208).

“The well-known list structure languages such as LISP were not designed for graphics and are not very efficient or easy to use for such multidimensional problems. They are well suited for processing strings of text but break down when two-way associations between list elements are the rule rather than the exception." (Roberts, 1965, p. 212).

5.44 “The list processing section of the compiler should make it possible to handle variable length data structures, in such a way that each data item may be associated with a variable number of list pointers, as specified by a special parameter." (Salton, 1966, p. 209).

5.45 "It might be useful to think about a general graph and tree manipulator, capable of performing most of the common transformations on abstract graphs and trees." (Salton, 1966, p. 209).

“The value of a generalized file processing system is directly related to the variety of file structures it can accommodate. With most existing systems, the user is limited to structures having the form of hierarchies or trees. For many problems, such structures are either not adequate or not efficient; instead, these problems require structures of the form of graphs, of which the tree is a special case. Examples of problems of this type are network and scheduling problems, computer graphic problems, and information retrieval problems.” (McGee, 1968,

p. F68).

“Most complex data structures can be adequately represented as directed graphs. A directed graph is defined informally as a set of labeled points or nodes, and a set of directed line segments or arcs connecting pairs of nodes. An arc running from node x to node y denotes that the entities represented by x and y are related in some way, e.g., x is superior to y, or x precedes y. If x and y bear multiple relationships to each other, they may be connected by multiple arcs, each arc being labeled with the relation being represented.” (McGee, 1968,

p. F68).

5.46 “Since Rover has more features that are common to IPLs than any other language, it may be said to be a member of the IPL family. Some additional features are present in Rover which attempt to provide a more powerful tool for synthesizing complex processes.

“The original design of IPL memory structure has only DLs (down links) which, by themselves can form lists and achieve a great flexibility. We are introducing ULs (up links] here in order to give greater flexibility and to overcome some of the difficulties encountered by having only DLs. . .

“Since the order relation among items is determined only by links, this relation can be changed simply by changing the links without moving the items physically, thus allowing simple and quick changes of organization of memory content.

« PreviousContinue »