
reflect assessment of costs in relation to the potential marketability among campus users and industrial clients of KASC, but also include consideration of maximum effectiveness for users in information seeking. For example, two commercial data bases are being acquired from the Institute for Scientific Information for user comparison with the specialized data bases already available. ERIC files are being strongly considered for acquisition.

3.1.2 Discipline Coverage:

Discipline coverage for CBIS is intended to be comprehensive, through locally held data bases or by remote access to such services as the New York Times data bank, MEDLINE and MEDLARS. KASC provides service on certain data bases, e.g., COMPENDEX, through other universities participating in the NASA dissemination project.

3.2 Data Preparation, Conversion and Entry:
Pittsburgh search software (PIRATES) utilizes one
common input format, and all acquired data bases
must be converted. Conversion delays have caused
temporary interruptions to standard services, such
as retrospective search on NASA files.

Two conversion steps are involved. Since most data files are received in an IBM-oriented format (e.g., EBCDIC code), they are first converted to a standard text tape format according to DEC PDP-10 conventions. The second step converts these tape files into the PIRATES standard search format, which involves a linked-list organization of all words in a text record, suitable for a binary search against a set of user profile terms.
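The two conversion steps can be illustrated with a minimal Python sketch. It is not the PIRATES or CONVERT code, whose internal formats are not documented here. It assumes the EBCDIC code page is cp037, uses a plain Python list as a stand-in for the linked-list word organization, and all function names are hypothetical.

```python
import bisect

def ebcdic_to_text(raw):
    """Step 1: decode EBCDIC bytes into ordinary text.
    Assumption: code page cp037; real tapes may use another EBCDIC variant."""
    return raw.decode("cp037")

def record_words(text):
    """Step 2 (simplified): the words of one record, in order of occurrence."""
    words = []
    for token in text.split():
        word = token.strip(".,;:()").upper()
        if word:
            words.append(word)
    return words

def matching_terms(words, sorted_terms):
    """Look up each record word in the sorted profile-term table by binary search."""
    hits = []
    for word in words:
        i = bisect.bisect_left(sorted_terms, word)
        if i < len(sorted_terms) and sorted_terms[i] == word:
            hits.append(word)
    return hits

raw_record = "Oxidation of copper alloys.".encode("cp037")  # stand-in for tape data
profile_terms = sorted(["ZINC", "COPPER", "ALLOYS"])
print(matching_terms(record_words(ebcdic_to_text(raw_record)), profile_terms))
# prints ['COPPER', 'ALLOYS']
```

Keeping the profile terms sorted is what makes the binary search possible: each word of a record costs a logarithmic number of comparisons, however many terms the collected profiles contain.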

3.3 Data Base Contents:

The KASC fee schedule indicates the available data bases, described more fully below. Regarding CA-Condensates, Pittsburgh has found the five major subject categories particularly suited to its user population. Profile studies have shown that one subject category will provide 75% of the relevant citations in the entire file, while two selected categories can provide 90% of relevant documents.

The Library of Congress MARC file is being used experimentally in conjunction with computer-output microfilm equipment. The CBIS project foresees this as a means to economically provide specialized card catalogs on microfilm for IUL's and special library centers.

3.3.1 Data Base (1): CA-Condensates (CA-C)

CA-C is the computer searchable complement to the printed publication, Chemical Abstracts (CA), which covers the full range of chemistry, referencing 250,000 articles per year.

CA-Condensates is issued weekly; the content corresponds to an issue of CA. The tape version, CA-Condensates, precedes the corresponding printed issue of CA by several weeks due to the time required to print, bind, and distribute CA printed issues.

The abstracts in CA and CA-Condensates are grouped into five categories: Biochemistry, Organic Chemistry, Macromolecular Chemistry, Applied Chemistry and Chemical Engineering, and Physical and Analytical Chemistry. The first two groupings are published as an odd numbered issue one week, and the last three groupings are published as an even numbered issue the following week. Searches may be limited to odd or even numbered issues if desired.
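The odd/even arrangement amounts to a fixed mapping from subject category to issue parity. The following Python sketch shows how a search could be restricted to the relevant issues. The function names and the use of bare issue numbers are assumptions made for illustration, not part of the KASC service.

```python
# Category-to-issue mapping as described in the text.
ODD_ISSUE = {"Biochemistry", "Organic Chemistry"}
EVEN_ISSUE = {
    "Macromolecular Chemistry",
    "Applied Chemistry and Chemical Engineering",
    "Physical and Analytical Chemistry",
}

def issue_parity(category):
    """Return 'odd' or 'even' for one of the five CA subject groupings."""
    if category in ODD_ISSUE:
        return "odd"
    if category in EVEN_ISSUE:
        return "even"
    raise ValueError("unknown category: " + category)

def issues_to_search(issue_numbers, categories):
    """Keep only the issues whose parity carries one of the wanted categories."""
    wanted = {issue_parity(c) for c in categories}
    return [n for n in issue_numbers if ("odd" if n % 2 else "even") in wanted]

print(issues_to_search(range(1, 9), ["Organic Chemistry"]))
# prints [1, 3, 5, 7]
```

A profile confined to a single grouping therefore needs to be run against only half of the weekly tapes.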

Pittsburgh has available seven volumes
beginning July 1968.

3.3.2 Data Base (2): Chemical Titles (CT)

CT, which is issued by Chemical Abstracts Service, contains journal references to approximately 4,500 articles per issue appearing in 650 important U.S. and non-U.S. chemical and chemical engineering journals. Titles that appear in CT represent over 65% of the total abstracts that later appear in Chemical Abstracts. Chemical Titles offers journal references to articles approximately 70 days before their abstracts are published in Chemical Abstracts. In many cases titles appear in Chemical Titles before the journal containing the article is published. Thus Chemical Titles is valuable as an alerting service.

3.3.3 Data Base (3): Computerized Engineering Index (COMPENDEX)

COMPENDEX, issued monthly by Engineering Index, Inc., is the computer-readable version of the printed publication, Engineering Index Monthly, which contains references spanning all engineering disciplines. These references are taken from professional and trade journals, publications of engineering organizations, papers from conferences and symposiums, and books and other documents.

This data base is made available at Pittsburgh through Indiana University (batch service only).

3.3.4 Data Base (4): GRA

Government Reports Announcements is available in machine readable form from the Department of Commerce. It consists of unclassified government reports resulting from government-sponsored research.

3.3.5 Data Base (5): NASA File

The NASA File consists of the STAR and IAA documents. Tapes are received monthly containing approximately 4,500 documents.

3.3.6 Data Base (6): ASM/IM

The American Society for Metals' METADEX file, issued every month, contains approximately 1,600 entries.

This data base is made available at Pittsburgh through the University of Connecticut, an associate center in the NASA regional dissemination activity.

4. HARDWARE CONFIGURATION

4.1 Main Frame:

Digital Equipment Corporation PDP-10 dual processor (KI-10) configuration.

4.2 Core Size:

256K words per processor

4.3 Mass Storage Devices:

15 RP03 magnetic disk drives, 51M char. each.

4.4 Input-Output Devices:

Data terminals must be ASR-33 teletype compatible.

5. SOFTWARE CONFIGURATION

5.1 Operating System:

Standard time-shared operating system for PDP-10 dual processor offered by DEC.

5.2 Operational Environment:

Multiprogrammed, with foreground time-sharing and background batch processing.

5.3 Information System

5.3.1 Name and Brief Description:

PIRATES is a full text search system developed at Pittsburgh, based upon original design concepts and aimed at effectively utilizing the particular capabilities of the PDP-10, including interactive time-sharing. Users may specify search terms of 24 characters maximum length, embedded blanks being prohibited. Left and/or right truncation may be used to construct a stem. Boolean AND, OR, NOT logic may be used to combine terms in a profile, and a connector operator is provided to concatenate single terms into multiple word phrases. The types of data to be searched for a match (e.g. title, author, journal citation, keywords) may be selected in various combinations. The profile specification statements have a card-oriented format (e.g. columns 73-80 are not used, and all statements must begin with a proper character, not a blank, in column 1) whether entered from a console or via a card deck. The software gives minimal guidance and cues to the console user, who is asked to answer a few questions to select various options (e.g. DO YOU WISH DOCUMENTS DISPLAYED?) and is given the several possible acceptable answers (e.g. ANSWER "YES", "NO", "HEAD", OR "LEVELS"). After examining potential retrievals from on-line searching of a small file, the console user may restart the profile entry sequence to refine his profile definition.
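The term and statement rules described above can be made concrete with a short Python sketch. The actual PIRATES profile syntax, including how truncation is marked, is not given in this report, so truncation is passed here as two hypothetical flags and every function name is an assumption made for illustration.

```python
MAX_TERM_LEN = 24  # maximum term length stated in the text

def validate_term(term):
    """Reject empty terms, terms over 24 characters, and terms with embedded blanks."""
    if not term or len(term) > MAX_TERM_LEN:
        raise ValueError("term must be 1 to 24 characters long")
    if " " in term:
        raise ValueError("embedded blanks are not allowed")
    return term

def stem_matches(term, word, left_trunc=False, right_trunc=False):
    """Compare a record word with a term, honoring left and/or right truncation."""
    if left_trunc and right_trunc:
        return term in word           # stem may occur anywhere in the word
    if left_trunc:
        return word.endswith(term)    # anything may precede the stem
    if right_trunc:
        return word.startswith(term)  # anything may follow the stem
    return word == term

def card_statement(line):
    """Apply the card conventions: ignore columns 73-80, require text in column 1."""
    body = line[:72].rstrip()
    if not body or body[0] == " ":
        raise ValueError("statement must begin in column 1")
    return body

print(stem_matches(validate_term("POLYMER"), "POLYMERIZATION", right_trunc=True))
# prints True
print(card_statement("COPPER AND ALLOYS".ljust(72) + "00000010"))
# prints COPPER AND ALLOYS
```

The second call shows why the same statement works from a console or a card deck: whatever occupies columns 73-80, such as a card sequence number, is dropped before the statement is interpreted.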

5.3.2 Source Language:

PIRATES is implemented in PDP-10 assembly language, using available macros of the operating system.

5.3.3 Mode:

The basic search strategy is to serially pass all document records in a file against the collected set of profiles, looking for a match with any of them. The system operates using about 10K words of core memory during the search.
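A serial search of this kind can be sketched in a few lines of Python. The sketch reads the file once and tests each record against every stored profile. The dictionary form of a profile, with "all", "any" and "none" lists standing for AND, OR and NOT, is a simplification assumed here and not the PIRATES profile format.

```python
def profile_matches(profile, words):
    """Evaluate one profile against the set of words in a record.
    'all' terms are ANDed, 'any' terms are ORed, 'none' terms are NOTed."""
    all_ok = all(t in words for t in profile.get("all", []))
    any_terms = profile.get("any", [])
    any_ok = not any_terms or any(t in words for t in any_terms)
    none_ok = not any(t in words for t in profile.get("none", []))
    return all_ok and any_ok and none_ok

def serial_search(records, profiles):
    """Make one pass over the file, testing each record against all profiles."""
    hits = {name: [] for name in profiles}
    for rec_id, text in records:
        words = set(text.upper().split())
        for name, profile in profiles.items():
            if profile_matches(profile, words):
                hits[name].append(rec_id)
    return hits

records = [
    (1, "Corrosion of copper alloys"),
    (2, "Copper catalysts in organic synthesis"),
    (3, "Zinc plating"),
]
profiles = {
    "smith": {"all": ["COPPER"], "none": ["CATALYSTS"]},
    "jones": {"any": ["ZINC", "CATALYSTS"]},
}
print(serial_search(records, profiles))
# prints {'smith': [1], 'jones': [2, 3]}
```

Because only the current record and the collected profiles are held at any moment, memory use does not grow with the size of the file, which is consistent with the small core requirement reported above.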

5.3.4 Generalized Packages Used:

Excluding operating system macros and utility routines, the system uses no major generalized software packages. PIRATES is modularized into a basic search package and a "co-routine controller" which governs input-output and the order of functions performed.

5.3.5 Availability:

Although partially developed with University funds as well as NASA and NSF funds, PIRATES is available with minimal charge for reproduction of tapes, decks, and existing documentation.

6. COMPUTER PROCESSING FUNCTIONS

Development and improvement of PIRATES are continuing, and documentation on its design has not been completed. Certain design attributes are reported from conversation with the principal designer, Professor Dale Isner.

6.1 Data Definition:

Searchable files are created by one program, CONVERT. The program accepts input text on-line from a data terminal. Alternatively, a text file may be created on-line using the PDP-10 system
