Chapter 2

READER TYPE DEVICES

2.1 Equipment Category: Optical Character Readers (OCR)* - An optical character reader is a device which recognizes the shape of characters by the contrast of light and dark areas created when light is reflected from the surface of a document (or transmitted through a film).

OCR's are generally classified by three characteristics:

• the style of characters or fonts which can be recognized

• the character sets and repertoire which can be recognized

• the size or type of forms which can be accommodated

Based on the number of character styles (fonts) an OCR can recognize, it is classified as a single-font, multi-font, multiple-font, omni-font, or handprint reader. Within these classes, OCR's may be further divided by the character set(s) recognized: numeric, alphanumeric, symbols, and special function notation. In addition, OCR's are distinguished by the size or type of forms that the device will accept. Page readers typically accept forms up to 11 by 14 inches while document readers typically accept forms up to 4 by 9 inches. An example of a form used on a document reader is the "turnaround" credit card document. Document readers generally read one or two lines per document, while page readers are capable of reading a whole page.

Other forms frequently used as input to OCR devices include journal tapes (e.g., paper cash register rolls) and microfilm.

There are two basic modes of operation used with OCR systems. When the generation of source documents can be controlled (i.e., the correct font is used, the quality of paper and of the printing is consistent, and acceptable document sizes are used) a Direct Read mode is used. This means the source documents are read directly by the OCR. However, if the generation of source documents cannot be controlled, a Retype mode is used. In this mode, part or all of the source data is retyped on acceptable forms using an appropriate font.

Before the advent of key/disk systems, typing on electric typewriters and subsequent scanning (the retype mode of operation) proved to be more economical and accurate than keypunching and verifying, if the installation was utilizing a large typing pool comprising thirty or more employees. At the present time, using the retype mode of operation would, in most cases, be difficult to justify.

*OCR is also used as an abbreviation for Optical Character Recognition.

2.1.1 Equipment Characteristics

2.1.1.1 Data Capacity/Speed

2.1.1.1.1 Transfer/Transmission Speed - OCR speeds are limited by three factors: the rated speed of the communications interface used, the capabilities of the mechanism that handles the documents or pages to be read, and the electronic character reading speed of the device.

2.1.1.1.2 Volume Per Unit of Time - Typical speeds for page and document readers are as follows:

                    Pages/Min.     Characters/Sec.

Page Readers        Up to 400      Up to 3600

Document Readers    Up to 1600     Up to 3600
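As a rough illustration of how the three limiting factors interact, the following sketch takes the sustained character rate to be the slowest of the three stages. It is a simplification under stated assumptions, not a vendor formula: the function names, the 10,000 characters per second interface rating, and the 80 characters per document are hypothetical, while the 400 pages per minute, 1600 documents per minute, and 3600 characters per second limits come from the table, and 7,000 characters per page is the approximate upper record size this chapter cites for sophisticated page readers.

```python
def effective_chars_per_sec(interface_cps, transport_units_per_min,
                            chars_per_unit, recognition_cps):
    """Sustained character rate, taken as the slowest of the three stages."""
    # Convert the paper-handling rate (pages or documents per minute)
    # into characters per second for the given density.
    transport_cps = transport_units_per_min * chars_per_unit / 60.0
    return min(interface_cps, transport_cps, recognition_cps)


def minutes_to_read(total_units, chars_per_unit, cps):
    """Minutes needed to read a batch at a sustained rate of cps chars/second."""
    return total_units * chars_per_unit / cps / 60.0


# Hypothetical interface rating (an assumption, not a figure from the text).
INTERFACE_CPS = 10_000

# Page reader at the table's upper limits, reading dense 7,000-character pages:
# the recognition speed is the limiting stage.
page_rate = effective_chars_per_sec(INTERFACE_CPS, 400, 7_000, 3_600)

# Document reader at the table's upper limits, assuming 80 characters per
# document (hypothetical): the document transport is the limiting stage.
doc_rate = effective_chars_per_sec(INTERFACE_CPS, 1_600, 80, 3_600)

# Time to read 1,000 dense pages at the page reader's sustained rate.
batch_minutes = minutes_to_read(1_000, 7_000, page_rate)
```

Under these assumptions a page reader is bounded by its recognition speed on dense pages, whereas a document reader reading only a line or two per document is bounded by how fast it can move paper.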

2.1.1.1.3 Operator Speed - Not applicable.

2.1.1.2 Operational and Environmental Requirements

2.1.1.2.1 Temperature/Humidity Requirements - Most OCR units can operate in ordinary office environments; however, some of the larger units require supplemental air conditioning.

2.1.1.2.2 Area/Physical Location - Most of the OCR units can be connected directly to a computer through an I/O channel similar to that used by a card reader. Other OCR units are designed for remote operations and are connected to a computer through a modem and communication lines. Some OCR's are designed to be used in an off-line mode and provide an intermediate magnetic tape, magnetic disk, punched cards, or punched paper tape.

2.1.1.3 Input Characteristics

2.1.1.3.1 Record Sizes - Minimum and maximum record sizes for OCR devices range from several characters per document for the simple, single line special purpose devices up to approximately 7,000 characters per page for the more sophisticated page readers.

2.1.1.3.2 Character Sets Available - The most significant difference among OCR's is their ability to recognize different shapes (styles) of machine-generated characters (fonts) and their ability to recognize handprinted characters. The more common machine-generated fonts include ANSI OCR-A and OCR-B, Farrington 7B, and IBM 1428. Most OCR's with the ability to recognize handprinted characters are limited to a small repertoire of characters including numerics only or numerics and a few alpha and special characters such as C, T, X, N, Z, +, and -. However, several OCR manufacturers provide machines that can recognize the full set of alpha and numeric handprinted characters. Although handprinting at the source seems to be a very good technique of collecting data in certain application areas where data collection personnel can be adequately trained, error rates ranging from 3 percent to 20 percent are common depending on the operating environment and data preparation controls. FIPS PUB 32 "Optical Character Recognition Character Sets" and FIPS PUB 33 "Character Set for Handprinting" provide information in this area of interest.
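The restricted handprint repertoire described above lends itself to a simple field check. The sketch below is illustrative only: the repertoire is the example set named in the text (numerics plus C, T, X, N, Z, +, and -), while the "?" reject marker and the function name are hypothetical and do not describe any particular reader.

```python
# Example restricted handprint repertoire from the text: numerics plus a
# few alpha and special characters.
RESTRICTED_HANDPRINT = set("0123456789CTXNZ+-")

# Hypothetical marker standing in for a character the reader could not
# recognize; real devices signal rejects in device-specific ways.
REJECT = "?"


def check_field(text, repertoire=RESTRICTED_HANDPRINT):
    """Return (position, character, reason) for each problem in one field."""
    problems = []
    for pos, ch in enumerate(text):
        if ch == REJECT:
            problems.append((pos, ch, "reject"))
        elif ch not in repertoire:
            problems.append((pos, ch, "outside repertoire"))
    return problems


clean = check_field("12X-7")    # every character is in the repertoire
flagged = check_field("1?B4")   # one reject and one out-of-repertoire letter
```

A check of this kind only confirms that each character belongs to the permitted set; it cannot detect a legible but wrong character, which is why data preparation controls still matter.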

2.1.1.3.3 Special Form Requirements - Forms are probably the most critical requirement of an OCR operation. The design of the form must meet the requirements of the reading device and also have an efficient layout for the user. The OCR depends on the contrast of the characters to the background of the form for character recognition. The paper must be of high quality, free of dirt and other foreign substances, and its thickness must be in a specified range. Data usually must be in predefined locations and the characters properly registered,* crisp, and well defined with very little skew. FIPS PUB 40 "Guideline for Optical Character Recognition Forms" provides guidance in this area.

2.1.1.4 Output Capabilities - The input to optical character readers is in human-readable form and the output is in machine-readable form. Optical character readers can operate in either an online or an off-line mode. When operating online, the OCR transmits to a host computer via communication interfaces or direct channel connection. Off-line, the OCR uses a temporary storage medium for output. This may take the form of punched cards, paper tape or magnetic tape. Some OCR's use a CRT for communications with the operator, displaying operational status and error conditions. The CRT also facilitates error correction when employed with a keyboard.

2.1.1.5 Edit/Validate Capabilities

The control component of any OCR system is a minicomputer. As with medium and large scale computers, the capability to manipulate and edit or validate data in local storage is thus an inherent feature of all OCR equipment.

The following are examples of editing functions that can be performed by the OCR:

• Character deletion

*The physical positioning of a character, vertically and horizontally, with relation to the form.

Reject correction can be accomplished "on the fly" using a CRT device.

2.1.3 Operator Requirements - If the OCR system uses a keyboard device (CRT or teleprinter) for error correction, keying skills are beneficial. However, the personnel who place the data on the forms read by the OCR are the critical link in the success of an OCR system. This is especially important where the reading of handprinted characters is involved. If the OCR input is prepared on a typewriter, the typist should be specifically trained in the preparation of the OCR forms. If the input will be handprinted, the personnel preparing the OCR input should be thoroughly familiar with the rigid requirements of OCR handprinting.

2.1.4 Cost Ranges - The price of OCR has decreased considerably during the last several years due to the decreasing cost of electronic logic components. Document readers can be rented for under $1,000 per month, single-font page readers for $2,000 to $4,000 per month, and multi-font readers for over $10,000 per month. Purchase prices for single-font readers range from $20,000 to $30,000 while purchase prices for multi-font readers range from $100,000 to $1,000,000. Although low-cost OCR devices offer an economical means of data capture, they require tighter controls on preparation of forms than do higher-priced, more sophisticated units.
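The rental and purchase figures above can be compared with a simple break-even calculation. The sketch below is a rough illustration under stated assumptions: it ignores maintenance, financing, and resale value, it assumes the single-font rental and purchase ranges describe the same class of machine, and the function name is hypothetical.

```python
def break_even_months(purchase_price, monthly_rent):
    """Months of rental payments that add up to the purchase price."""
    return purchase_price / monthly_rent


# Single-font readers, using the ranges quoted in the text.
# Shortest case: lowest purchase price against the highest rent.
fastest = break_even_months(20_000, 4_000)

# Longest case: highest purchase price against the lowest rent.
slowest = break_even_months(30_000, 2_000)
```

On these figures alone, cumulative rent on a single-font reader would equal its purchase price after roughly 5 to 15 months.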

2.1.5 Typical Data Entry Applications - In many instances, OCR's are used as a direct replacement for keypunches with typing and visual verification being substituted for the keypunching and verification processes. An application which requires the generation of a hard copy as a by-product of data collection could lend itself very easily to an OCR system. OCR's have had a tremendous impact on applications using "turn-around" documents as in the insurance, retail and oil industries. The ability of Optical Character Readers to read handprinted characters should make the OCR a prime candidate for applications where small amounts of data need to be collected from many locations. This ability makes the pencil and paper a powerful source data entry tool.

2.1.6 Advantages and Strong Points

OCR's convert human-readable and human-prepared data directly into computer-readable codes. The key to efficient and more economical data entry lies in the elimination of data transcription and the associated labor costs by direct data capture at the source. Optical Character Readers offer a means to accomplish this feat by direct reading of printed, typed, or handprinted data without special manual intervention.

When the OCR is used as a keypunch replacement, the keypunch can be replaced by a typewriter. Reference sources state that the operation of the typewriter is faster and generally more accurate than keypunching. Users consistently support the idea that accuracy via typing and proofreading is comparable with that obtained by keypunching and verification, i.e., 1 to 3 percent error rate.

Extensive edit/validation procedures can be applied to the data as it is being read by the OCR. Software executed on the OCR's minicomputer can assume some of the data manipulation normally accomplished on the larger host computer.

2.1.7 Disadvantages and Limitations

OCR operations are unique in that document control plays a significantly more important role in reading reliability than does any other single consideration.

When there is strict control over source documents, OCR works well. Controlled conditions exist when operating personnel are experienced, well trained, and can be directly supervised. While low-cost OCR devices offer an economical means of data capture, they require tighter controls over the form than do higher-priced, more sophisticated units. To date, only very large applications can justify the expense of the more sophisticated character readers capable of functioning satisfactorily with completely uncontrolled field documents.
