maxima in the response. Combining these constraints, the solution becomes:

$f(x) = a_1 e^{\alpha x} \cos(\omega x + \theta_1) + a_2 e^{-\alpha x} \cos(\omega x + \theta_2)$ [8]

When the values of the constants are solved for, the solution can be approximated by the first derivative of a Gaussian function. To extend this solution to two dimensions, a Gaussian is also used to project the two dimensional slope onto one dimension, and the two operators are convolved together. Since edges can be approximated by linear segments, highly directional operators at several orientations are used, and the widths of the operators are varied to cope with varying signal to noise ratios in the image. The results are integrated into a single description.
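A minimal sketch of this one-dimensional operator, assuming NumPy and SciPy: the kernel is the first derivative of a Gaussian, convolved here along image rows. The value of sigma, the kernel radius, and the synthetic test image are choices made for the example, not part of the original analysis.

```python
import numpy as np
from scipy.ndimage import convolve1d

def deriv_of_gaussian(sigma, radius=None):
    """First derivative of a Gaussian: the function that approximates
    Canny's optimal one-dimensional step edge operator."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return -x / sigma ** 2 * g / g.sum()

# Convolve each image row with the operator; extrema in the response
# mark the positions of vertical step edges.
image = np.zeros((8, 16))
image[:, 8:] = 1.0                              # ideal vertical step edge
response = convolve1d(image, deriv_of_gaussian(1.0), axis=1)
print(int(np.argmax(np.abs(response[0]))))      # column of the edge
```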

The one dimensional edge operator described by Canny provides results similar to the zero-crossings of the Laplacian operator described by Marr and Hildreth [MARR75]. The Laplacian operator, shown below:

0  1  0
1 -4  1
0  1  0

is a digital approximation to the second partial derivatives $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$ in the same way that the gradient methods discussed previously are approximations to the first partial derivatives. The Laplacian operator, though, does not provide useful directional information, and it doubly enhances the noise in an image. The work by Marr and Hildreth advocates filtering an image using four Gaussians which have different bandpass characteristics. The filtered images are then convolved with the Laplacian operator, and places where changes in sign occur correspond to edges in the original image.
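A minimal sketch of this zero-crossing scheme for a single bandpass channel (running it at four values of sigma would correspond to Marr and Hildreth's four Gaussians); the combined Laplacian-of-Gaussian filter from SciPy is an implementation convenience, not part of the original formulation:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_zero_crossings(image, sigma):
    """Marr-Hildreth style edges: filter with a Laplacian of Gaussian
    at one bandpass (one value of sigma), then mark sign changes."""
    f = gaussian_laplace(image.astype(float), sigma)
    edges = np.zeros(f.shape, dtype=bool)
    edges[:, :-1] |= (f[:, :-1] * f[:, 1:]) < 0    # horizontal sign change
    edges[:-1, :] |= (f[:-1, :] * f[1:, :]) < 0    # vertical sign change
    return edges
```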

B8.1.2. Template Matching

In template matching, an edge pattern is centered on each pixel in an image, and the closeness of their correspondence is measured. Since these templates often represent second differences of step edges, the operators are similar to the difference operators of Appendix B8.1.1. The Prewitt and Sobel operators can be generalized to eight masks corresponding to eight edge orientations. The Kirsch operator is related to the edge gradient by:

$S(x) = \max\left[\,1,\ \max_k \sum \left| f(x_k) - f(x) \right| \,\right]$ [9]

where the $f(x_k)$ are the eight pixels surrounding $x$. The corresponding masks are shown below.

[Eight 3×3 Kirsch masks, one for each of the eight edge orientations]
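The masks themselves are illegible in the scan; the sketch below uses the standard Kirsch compass masks with 5/-3 weights, and takes the edge magnitude as the maximum response over the eight orientations, a common simplification of equation [9]:

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    """The eight 3x3 Kirsch compass masks, generated by rotating the
    outer ring of the base mask (three 5s facing one direction)."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    values = [5, 5, 5, -3, -3, -3, -3, -3]
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for (r, c), v in zip(ring, np.roll(values, k)):
            m[r, c] = v
        masks.append(m)
    return masks

def kirsch(image):
    """Edge magnitude: maximum response over all eight orientations."""
    responses = [convolve(image.astype(float), m) for m in kirsch_masks()]
    return np.max(responses, axis=0)
```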

In practice, the operator is sensitive to the magnitude of f(x); templates with larger spans offer the advantage of being less sensitive to noise, but they have difficulty resolving the detail of fine texture. Marr [MARR81] presents methods for choosing the appropriate span. Using these ideas, Nevatia and Babu [NEVAT80] use six 5×5 masks that correspond to an ideal step edge at orientations spaced 30° apart:

[Six 5×5 Nevatia-Babu masks for step edges at 0°, 30°, 60°, 90°, 120°, and 150°]

Another template-based method, used by Frei and Chen [FREI77], chooses orthogonal 3×3 masks as a basis for expansion. This expansion yields a space of 3×3 masks with which local neighborhoods in the image can be compared. Given these templates, an edge response is measured by computing the generalized correlation measure between the grey level values in the template and those in the image window. Define the mean, $\mu$, and the variance, $\sigma^2$, of the template as:

$\mu = \frac{1}{n^2} \sum_{l,m} p_{l,m}$ [11]

$\sigma^2 = \frac{1}{n^2} \sum_{l,m} \left( p_{l,m} - \mu \right)^2$ [12]

(where $p_{l,m}$ is the pixel of the template at position $(l,m)$, with $l$ and $m$ running from $-k$ to $k$, and the template size $n$ is odd, $n = 2k+1$), and define the corresponding mean, $\mu_{i,j}$, and variance, $\sigma^2_{i,j}$, of the grey level values $g$ in the image window centered at pixel $(i,j)$ as:

$\mu_{i,j} = \frac{1}{n^2} \sum_{l,m} g_{i+l,j+m}, \qquad \sigma^2_{i,j} = \frac{1}{n^2} \sum_{l,m} \left( g_{i+l,j+m} - \mu_{i,j} \right)^2$ [13]

Then the generalized correlation measure between the image and the template at pixel $(i,j)$ is:

$C_{i,j} = \frac{\frac{1}{n^2} \sum_{l,m} \left( p_{l,m} - \mu \right) \left( g_{i+l,j+m} - \mu_{i,j} \right)}{\sigma \, \sigma_{i,j}}$ [14]

Close correlation between the template and the window indicates an edge of the type depicted in the template.
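A direct transcription of the reconstructed equations [11] through [14]; the function name and the flattening of the arrays into vectors are implementation choices:

```python
import numpy as np

def correlation_measure(window, template):
    """Generalized correlation measure between an image window and an
    edge template of the same odd size n = 2k+1 (equations [11]-[14]).
    A value near 1 indicates close correlation with the template."""
    t = template.astype(float).ravel()
    w = window.astype(float).ravel()
    mu_t, mu_w = t.mean(), w.mean()             # [11] and [13]
    var_t = ((t - mu_t) ** 2).mean()            # [12]
    var_w = ((w - mu_w) ** 2).mean()            # [13]
    cov = ((t - mu_t) * (w - mu_w)).mean()
    return cov / np.sqrt(var_t * var_w)         # [14]
```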

B8.1.3. Parametric Edge Modeling

Parametric edge models provide more information than the magnitude and direction of the gradient discussed for the previous edge detection methods. This approach involves expanding the image and the step edge functions in terms of a set of orthogonal basis functions. Hueckel [HUECK71] proposed analyzing the frequency behavior by observing the zero crossings of eight basis functions defined on a disk.

By minimizing the sum of the squared error between the image and the edge model in the circular neighborhood, measures of the slope of the step edge and the average intensity values on either side of it can be obtained. Other parametric edge models include those of Nevatia [NEVAT77], who used a subset of Hueckel's basis functions, and of O'Gorman [O'GORM78] and Mero and Vassy [MERO75], whose bases were defined on squares. Parametric edge models determine more about an edge's structure, but they are also more computationally expensive than other methods of edge detection.
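Hueckel's basis functions are not legible in the scan, so the sketch below fits the step edge model directly: for each candidate orientation it estimates the two side intensities and keeps the parameters with the smallest squared error. The square patch and the sampled orientations stand in for the circular neighborhood and the basis expansion; all names are illustrative.

```python
import numpy as np

def fit_step_edge(patch, orientations=16):
    """Fit an ideal step edge to a square patch by minimizing the sum
    of squared errors; returns (error, angle, side intensities a, b)."""
    n = patch.shape[0]
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    best = None
    for theta in np.linspace(0, np.pi, orientations, endpoint=False):
        # partition the patch by an edge line through its centre
        side = (x * np.cos(theta) + y * np.sin(theta)) >= 0
        a = patch[side].mean()          # average intensity on one side
        b = patch[~side].mean()         # average intensity on the other
        model = np.where(side, a, b)
        err = ((patch - model) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, theta, a, b)
    return best
```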

B8.2. Region Extraction

The class of segmentation methods which label pixels according to similarities is termed region or surface patch extraction. These methods can be viewed as the complement of edge extraction. Region based methods group pixels which share some intensity based property and which exhibit spatial continuity. The intensity variations can occur on a large scale, where the classification is based on shading, or on a small scale, where the differences are based on texture.

B8.2.1. Intensity and Color

When viewed objects are uniform in color or intensity, labeling pixels according to these characteristics is a natural way to segment the image. The classification can be accomplished by labeling pixels based on a number of criteria: one approach labels pixels by comparing them to a threshold value, while another labels them based on the connectedness of adjacent pixels. Each of these approaches is described in more detail below.

Weszka [WESZK78] describes numerous global, local, and dynamic methods for choosing threshold values. Global threshold values separate the peaks of an image's histogram into two or more categories; however, as the grey level subpopulations become less distinct, reliable threshold selection becomes more difficult. Local threshold techniques label each pixel based on the properties of its surrounding neighbors. These methods are susceptible to minor variations in intensity but have the advantage of being applicable in parallel. A dynamic method, designed to operate on low quality images, uses the statistical variance in a local neighborhood to select a threshold. The methods described can be used to perform binary or multilevel thresholding on grey level or multispectral images.
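As a concrete instance of global threshold selection, here is one simple iterative scheme (a sketch of the general idea, not necessarily one of the methods Weszka describes): start at the overall mean and move the threshold to the midpoint of the two group means until it settles. The bimodal test image is synthetic.

```python
import numpy as np

def iterative_threshold(image, eps=0.5):
    """Global threshold: repeatedly split the grey levels into two
    groups and move the threshold to the midpoint of the group means."""
    t = image.mean()
    while True:
        lo, hi = image[image <= t], image[image > t]
        t_new = (lo.mean() + hi.mean()) / 2.0
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# Binary segmentation: label each pixel by comparison to the threshold.
rng = np.random.default_rng(0)
image = np.concatenate([rng.normal(60, 10, 500),
                        rng.normal(180, 10, 500)]).reshape(20, 50)
labels = image > iterative_threshold(image)
```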

B8.2.2. Texture

Textured patterns are regions of uniform brightness that have many internal edges. Consequently, methods that apply to smooth region extraction (Appendix B8.2.1) cannot be used to classify pixels in a textured region [ROSEN88]. Textured regions can be segmented by information determined from individual pixels, local features, or larger regions. Measurements can be made on pixels or local features using statistical relationships, which characterize the distributions and relations of pixels or regions. Structural methods describe primitives and the patterns used to generate a texture. Both of these classes of methods are described in more detail below.

Uniformly spaced elements of similar shape in an image produce texture. Both the autocorrelation function of an image and its Fourier transform measure the spatial frequency that characterizes a pattern. Autocorrelation indicates how each pixel in an image influences surrounding pixels; it is a linear model that describes how the intensity of a pixel relates to that of its shifted neighbor. The maxima and minima of this two dimensional function indicate the size and separation of the texture primitives that compose the image. The coarseness of the pattern is indicated by the slope of the central peak in any given window of the image, and the periodicity of the peaks characterizes the frequency of the texture pattern. The autocorrelation function is given by:

$A(\Delta x, \Delta y) = \frac{\sum_{i,j} f(i,j)\, f(i + \Delta x,\ j + \Delta y)}{\sum_{i,j} f(i,j)^2}$ [15]

where $i$ and $j$ lie within a window and $\Delta x$ and $\Delta y$ represent the shift between a pixel and its compared neighbor. The autocorrelation function does not provide good discrimination for natural textures, since their coarseness tends not to be distinct.
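A minimal sketch of equation [15] over an image window; the wrap-around shift is a simplification, since a full implementation would restrict the sums to the region where the shifted and unshifted windows overlap:

```python
import numpy as np

def autocorrelation(window, max_shift):
    """Autocorrelation A(dx, dy) of an image window as in equation [15].
    np.roll wraps around at the borders, a simplification for brevity."""
    f = window.astype(float)
    norm = (f ** 2).sum()
    size = 2 * max_shift + 1
    out = np.zeros((size, size))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(f, dy, axis=0), dx, axis=1)
            out[dy + max_shift, dx + max_shift] = (f * shifted).sum() / norm
    return out
```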

B8.3. Optical Flow

Optical flow is defined as the motion of object points across an image resulting from the relative motion between a camera and objects in the scene. It is calculated from local temporal and spatial variations in sequences of grey level images. The optical flow, or instantaneous velocity field, assigns a two dimensional "retinal velocity" to every point in the visual field. The results of this measurement are used as input for higher level methods which compute camera motion, depth maps, and surface normals.

There are two general classes of methods for extracting optical flow from sequences of images: gradient based methods and correlation based methods [HONG89]. The first uses the spatial and temporal derivatives of pixel brightness; the second tracks features in small regions of images over time. Level 1 processing includes the gradient method of optical flow extraction. The assumption of gradient based techniques is that pixel intensity in an image is constant over time, and thus any change in intensity at a point in the image is due to camera motion. The optical flow $(u, v)$ is defined by:

$u_i = \frac{1}{z_i} \left( x_i v_z - f v_x \right) + \frac{\alpha\, x_i y_i}{f} + \frac{\beta\, x_i^2}{f} + \beta f - \gamma y_i$ [16]
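The gradient based computation itself can be sketched from the brightness constancy assumption, which gives $I_x u + I_y v + I_t = 0$ at every pixel; solving this by least squares over a small window is the Lucas-Kanade approach, shown here as an assumed illustration rather than the specific formulation behind equation [16]:

```python
import numpy as np

def lucas_kanade(frame1, frame2, window=5):
    """Gradient based optical flow: solve Ix*u + Iy*v + It = 0 by
    least squares over a small window around each interior pixel."""
    f1, f2 = frame1.astype(float), frame2.astype(float)
    Iy, Ix = np.gradient(f1)        # spatial derivatives (rows, cols)
    It = f2 - f1                    # temporal derivative
    k = window // 2
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for i in range(k, f1.shape[0] - k):
        for j in range(k, f1.shape[1] - k):
            sl = (slice(i - k, i + k + 1), slice(j - k, j + k + 1))
            A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
            b = -It[sl].ravel()
            flow, *_ = np.linalg.lstsq(A, b, rcond=None)
            u[i, j], v[i, j] = flow
    return u, v
```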
