A Preliminary Study of Content-based Mammographic Masses Retrieval

Yimo Tao^{a,b,*}, Shih-Chung B. Lo^{b}, Matthew T. Freedman^{b}, and Jianhua Xuan^{a}

a Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
b Department of Radiology, Georgetown University Medical Center, Washington DC 20007, USA

ABSTRACT

The purpose of this study is to develop a Content-Based Image Retrieval (CBIR) system for mammographic computer-aided diagnosis. We investigated the potential of shape, texture, and intensity features to categorize masses, so that similar image patterns can be sorted to facilitate clinical viewing of mammographic masses. Experiments were conducted on a database of 243 masses (122 benign and 121 malignant). The retrieval performance of each individual feature was evaluated, and the best precision, 79.9%, was obtained with the curvature scale space descriptor (CSSD). Combining several selected shape features improved the precision to 81.4%. Combining the shape, texture, and intensity features improved the precision to 82.3%.

Keywords: computer-aided diagnosis, image retrieval, mammography, masses, shape, curvature scale space.

1. INTRODUCTION

In the United States, breast cancer accounts for one-third of all cancer diagnoses among women, and it has the second highest mortality rate of all cancers in women. Breast cancer studies are therefore essential for its ultimate eradication. Several studies show that only 13%-29% of suspicious masses are determined to be malignant [1-3]. In normal clinical practice, experienced radiologists often refer to their mental images of previously proven cases when making diagnoses and patient management decisions. This memory comparison is used both to detect cancer and to reduce call-backs for tissue overlap patterns that in the past did not turn out to be true lesions. A computerized library can refresh the radiologist’s memory with a broad array of proven cases and concrete visualizations. The system can also assist less experienced radiologists in making diagnoses by referring them not only to histologically proven cases but also to the statistical distribution of features. It can help them see how the pattern in their current case closely resembles a pattern in cases previously proven to be non-cancerous, thereby improving specificity. Freed from the bias induced by recent experience, such a library system should help to improve radiologists’ accuracy.

Compared with a traditional computer-aided diagnosis (CAD) system, a CBIR system is more interactive in providing visual cues to assist radiologists in the characterization of mammographic masses. Since CBIR technology [4] has already been recognized as an important research direction for clinical decision support in medical image interpretation, we performed a study to investigate the potential of CBIR in mammographic image research in radiology.

2. MATERIAL AND METHODS

2.1 Database of Mammograms

In this study, we used 243 regions of interest (ROIs) of breast masses collected from the University of South Florida’s (USF) Digital Database for Screening Mammography [5]. The USF films were scanned at pixel sizes of 43.5 and 50 µm. Of the 243 masses, 121 were malignant and 122 were benign. These masses have been ranked by an expert radiologist on a scale from 1 to 5, in which 1 represents the easiest case and 5 represents the most difficult case. Table 1 lists the distribution of the masses studied according to their subtlety ratings. The images were of varying contrasts, and the masses were of varying sizes. Contours of the masses were first traced by an automatic delineation program [6] and then manually modified by a senior radiologist at Georgetown University Medical Center (GUMC).

Table 1. Subtlety ratings of the masses within the database.

Subtlety   Benign   Malignant
1          25       21
2          60       49
3          31       39
4          5        12
5          1        0

* [email protected]; phone: (1)202-687-5135; http://www.cbil.vt.edu

2.2 Features

Feature extraction is an important aspect of any CBIR or CAD application. The selected features should effectively represent the content of the images for indexing and retrieval purposes. In this preliminary study, we extracted several shape, texture, and intensity features, which are described in sections 2.2.1 to 2.2.3.

2.2.1 Shape Features

We investigated a wide range of shape features in this study. Figure 1 shows the top-down study design. These shape features are described in detail in the following subsections.

Figure 1. Extracted shape features of mass lesions.

2.2.1.1 Moment Invariant

Moments and functions of moments have often been used as global invariant image features in pattern recognition, shape retrieval, and image classification work. They have also been utilized in the classification of breast masses [7]. A geometric moment of order $(p + q)$ of an object is given by:

$$ m_{pq} = \sum_i \sum_j i^p j^q f(i, j), \quad p, q = 0, 1, 2, \ldots \tag{1} $$

where $f(i, j)$ is the image intensity of the pixel at coordinate $(i, j)$ within the object boundary. In this work, we converted the delineated mass into a binary image so that the intensities of all pixels within the mass are set to 1, and the intensities of all pixels outside the mass are set to 0. The geometric moments in (1) are not invariant to scaling, translation, or rotation. Translation invariance can be achieved using the central moment:

$$ \mu_{pq} = \sum_i \sum_j (i - x_c)^p (j - y_c)^q f(i, j) \tag{2} $$

where $x_c$, $y_c$ are the coordinates of the object’s center of gravity, which are given by:

$$ x_c = \frac{m_{10}}{m_{00}}, \quad y_c = \frac{m_{01}}{m_{00}} \tag{3} $$

The normalized un-scaled central moment $\vartheta_{pq}$ is given by:

$$ \vartheta_{pq} = \frac{\mu_{pq}}{(\mu_{00})^{\gamma}} \tag{4} $$

where $\gamma = \frac{p + q}{2} + 1$.
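The moment definitions (1) to (4) can be illustrated with a minimal sketch. It assumes the binary mass image is stored as a list of rows of 0/1 values; the function names and the `rectangle_mask` helper are hypothetical and only serve the illustration, they are not part of the original system.

```python
def raw_moment(mask, p, q):
    """Geometric moment m_pq of eq. (1); mask is a list of rows of 0/1 values."""
    return sum((i ** p) * (j ** q) * v
               for i, row in enumerate(mask)
               for j, v in enumerate(row))


def centroid(mask):
    """Center of gravity (x_c, y_c) of eq. (3)."""
    m00 = raw_moment(mask, 0, 0)
    return raw_moment(mask, 1, 0) / m00, raw_moment(mask, 0, 1) / m00


def central_moment(mask, p, q):
    """Central moment mu_pq of eq. (2), invariant to translation."""
    xc, yc = centroid(mask)
    return sum(((i - xc) ** p) * ((j - yc) ** q) * v
               for i, row in enumerate(mask)
               for j, v in enumerate(row))


def normalized_moment(mask, p, q):
    """Normalized central moment of eq. (4) with gamma = (p + q) / 2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(mask, p, q) / central_moment(mask, 0, 0) ** gamma


def rectangle_mask(height, width, top, left, h, w):
    """Hypothetical helper: a binary image containing one filled rectangle."""
    return [[1 if top <= i < top + h and left <= j < left + w else 0
             for j in range(width)]
            for i in range(height)]


# The same 4 x 6 rectangle placed at two different positions.
original = rectangle_mask(20, 20, 2, 3, 4, 6)
shifted = rectangle_mask(20, 20, 10, 9, 4, 6)
mu20_original = central_moment(original, 2, 0)
mu20_shifted = central_moment(shifted, 2, 0)
theta20 = normalized_moment(original, 2, 0)
```

Both placements of the rectangle give the same central moment, which is the translation invariance that (2) is meant to provide. On a pixel grid the normalization of (4) removes scale only approximately, so the sketch does not claim exact scale invariance.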

Hu [8] introduced seven central moment invariants, which are derived from the second- and third-order normalized un-scaled central moments:

$$ M_1 = \vartheta_{20} + \vartheta_{02} \tag{5} $$

$$ M_2 = (\vartheta_{20} - \vartheta_{02})^2 + 4\vartheta_{11}^2 \tag{6} $$

$$ M_3 = (\vartheta_{30} - 3\vartheta_{12})^2 + (3\vartheta_{21} - \vartheta_{03})^2 \tag{7} $$

$$ M_4 = (\vartheta_{30} + \vartheta_{12})^2 + (\vartheta_{21} + \vartheta_{03})^2 \tag{8} $$

$$ M_5 = (\vartheta_{30} - 3\vartheta_{12})(\vartheta_{30} + \vartheta_{12})[(\vartheta_{30} + \vartheta_{12})^2 - 3(\vartheta_{21} + \vartheta_{03})^2] + (3\vartheta_{21} - \vartheta_{03})(\vartheta_{21} + \vartheta_{03})[3(\vartheta_{30} + \vartheta_{12})^2 - (\vartheta_{21} + \vartheta_{03})^2] \tag{9} $$

$$ M_6 = (\vartheta_{20} - \vartheta_{02})[(\vartheta_{30} + \vartheta_{12})^2 - (\vartheta_{21} + \vartheta_{03})^2] + 4\vartheta_{11}(\vartheta_{30} + \vartheta_{12})(\vartheta_{21} + \vartheta_{03}) \tag{10} $$

$$ M_7 = (3\vartheta_{21} - \vartheta_{03})(\vartheta_{30} + \vartheta_{12})[(\vartheta_{30} + \vartheta_{12})^2 - 3(\vartheta_{21} + \vartheta_{03})^2] - (\vartheta_{30} - 3\vartheta_{12})(\vartheta_{21} + \vartheta_{03})[3(\vartheta_{30} + \vartheta_{12})^2 - (\vartheta_{21} + \vartheta_{03})^2] \tag{11} $$

These low-order moment invariants capture global shape properties associated with the local pixel distribution. For example, the first moment invariant $M_1$ indicates the compactness/spreadness of the shape [9]. In this work, we computed two sets of the moment invariants listed above for each mass: one set $M^{i}_{1 \to 7}$ computed from the solid region within the mass boundary, and one set $M^{b}_{1 \to 7}$ computed from the pixels on the mass boundary. The similarity between a query mass q and a model mass m in the database based on the region moment invariants is given by their Euclidean distance:

$$ d_{M^i}(q, m) = \left( \sum_{i=1}^{7} \left| M_q^i - M_m^i \right|^2 \right)^{1/2} \tag{12} $$

where $M_q^i$ is the ith region moment invariant of the query mass q, and $M_m^i$ is the ith region moment invariant of the model mass m. The similarity based on the boundary moment invariants is defined in the same way as (12).

2.2.1.2 Fourier Descriptor

Fourier descriptors (FDs) are among the most widely used boundary descriptors. In general, FDs are obtained by applying the Fourier transform to a shape signature, which is any 1-D function representing a 2-D area or boundary. These descriptors represent the shape of the object in the frequency domain. Sahiner et al. [10] used FDs derived from complex coordinates for mammographic mass classification. However, as shown in [11], FDs derived from the radial length function outperformed FDs derived from complex coordinates. Therefore, in this work, we adopted FDs derived from the radial length function as one of the mass shape indices.

(A) Radial Length Function

The radial length function is given by the distance of each boundary point from the centroid $(x_c, y_c)$ of the mass shape:

$$ r(t) = [(x(t) - x_c)^2 + (y(t) - y_c)^2]^{1/2} \tag{13} $$

Before applying the Fourier transform to the shape signature, we first sampled the contour into 256 points with equal arc length, which best preserves the topological structure of the boundary.
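The equal arc length sampling and the radial length function of (13) can be sketched as follows. This is a minimal illustration, assuming the contour is a closed polygon given as a list of (x, y) vertices and that the centroid is approximated by the mean of the resampled boundary points; the paper does not state how its centroid was computed for this step, and the function names are hypothetical.

```python
import math


def resample_closed_contour(points, n):
    """Resample a closed polygon to n points equally spaced in arc length."""
    closed = points + [points[0]]
    seg = [math.hypot(closed[k + 1][0] - closed[k][0],
                      closed[k + 1][1] - closed[k][1])
           for k in range(len(points))]
    total = sum(seg)
    out, k, start = [], 0, 0.0  # start = arc length accumulated up to vertex k
    for s in range(n):
        target = total * s / n
        # Advance to the segment that contains the target arc length.
        while k < len(seg) - 1 and start + seg[k] < target:
            start += seg[k]
            k += 1
        frac = (target - start) / seg[k] if seg[k] > 0 else 0.0
        x = closed[k][0] + frac * (closed[k + 1][0] - closed[k][0])
        y = closed[k][1] + frac * (closed[k + 1][1] - closed[k][1])
        out.append((x, y))
    return out


def radial_length(points):
    """Radial length r(t) of eq. (13), measured from the boundary centroid."""
    xc = sum(x for x, _ in points) / len(points)
    yc = sum(y for _, y in points) / len(points)
    return [math.hypot(x - xc, y - yc) for x, y in points]


# A 4 x 4 square resampled to 8 points: the corners and the edge midpoints.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
square8 = resample_closed_contour(square, 8)
square_r = radial_length(square8)
```

For the study's setting, `n` would be 256. The square example alternates between the corner distance and the edge midpoint distance, which is the kind of oscillation in r(t) that the Fourier analysis picks up.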

(B) Discrete Fourier Transform of Radial Length Function

For a given contour $r(t)$ in (13), assuming it is sampled to $N$ points, the discrete Fourier transform of $r(t)$ is given by:

$$ FD_n = \frac{1}{N} \sum_{t=0}^{N-1} r(t) \exp\left(-\frac{j 2\pi n t}{N}\right), \quad n = 0, 1, \ldots, N-1 \tag{14} $$

The coefficients $FD_n$, $n = 0, 1, \ldots, N-1$, are called the Fourier descriptors of the shape. Since $r(t)$ is a real-valued function, only half of the FDs in (14) are distinct; therefore, only half of them are needed to index the shape. Using only the magnitudes of the FDs, the normalized FDs (NFDs) are given by:

$$ NFD_n = \frac{|FD_n|}{|FD_0|}, \quad n = 1, 2, \ldots, N/2 \tag{15} $$
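Equations (14) and (15) can be sketched directly with the standard library. The code below is an illustration, not the original implementation: it uses a plain O(N^2) transform for readability, and the synthetic five-lobed radial signature is a made-up test shape rather than data from the study.

```python
import cmath
import math


def fourier_descriptors(r):
    """FD_n of eq. (14): discrete Fourier transform of the radial length r(t)."""
    n_pts = len(r)
    return [sum(r[t] * cmath.exp(-2j * math.pi * n * t / n_pts)
                for t in range(n_pts)) / n_pts
            for n in range(n_pts)]


def normalized_fds(r, count):
    """NFD_n = |FD_n| / |FD_0| for n = 1..count, following eq. (15)."""
    fd = fourier_descriptors(r)
    return [abs(fd[n]) / abs(fd[0]) for n in range(1, count + 1)]


N_POINTS = 64
circle_r = [1.0] * N_POINTS
# Hypothetical five-lobed signature: r(t) = 1 + 0.3 cos(2 pi 5 t / N).
lobed_r = [1.0 + 0.3 * math.cos(2 * math.pi * 5 * t / N_POINTS)
           for t in range(N_POINTS)]

nfd_circle = normalized_fds(circle_r, 20)
nfd_lobed = normalized_fds(lobed_r, 20)
# Scaling the shape or changing the contour's starting point leaves NFDs unchanged.
nfd_scaled = normalized_fds([3.0 * v for v in lobed_r], 20)
nfd_rotated = normalized_fds(lobed_r[7:] + lobed_r[:7], 20)
```

A perfect circle has no energy outside $FD_0$, while the lobed signature concentrates its energy in the fifth descriptor. Dividing by $|FD_0|$ removes the scale, and taking magnitudes discards the phase that a different starting point would introduce.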

Note that these NFD values are invariant to scaling, translation, and rotation. Since the radial lengths of spiculated and irregular masses vary more frequently than those of circumscribed and regular masses, the high-frequency descriptors of a spiculated (or irregular) mass will be larger than those of a circumscribed one. In this sense, FDs derived from the radial length provide explicit physical characteristics to discriminate circumscribed and spiculated masses. Although the number of coefficients generated is large, a subset of the coefficients may be sufficient to capture the features of the mass shape. Since the highest-frequency coefficients describe fine details of the shape, which usually are not significant in shape discrimination, they can be safely ignored. In our experiment, we found that the first 20 NFDs were sufficient. Hence, when comparing these NFD features, we define the similarity between a query mass q and a model mass m in the database as:

$$ d_{NFD}(q, m) = \left( \sum_{i=1}^{20} \left| NFD_q^i - NFD_m^i \right|^2 \right)^{1/2} \tag{16} $$

where $NFD_q^i$ is the ith NFD of the query mass q and $NFD_m^i$ is the ith NFD of the model mass m.

2.2.1.3 Curvature Scale Space Descriptor

The curvature scale space descriptors (CSSDs) [12] describe key local shape features, which represent the degree of convexity (or concavity) of curve segments on a shape boundary. Since curvature is an important local measure associated with the degree of contour turning, the use of CSSDs may closely match human visual perception. The CSSD was selected as a contour-based shape descriptor for MPEG-7. Figure 2 shows the steps to compute CSSDs.

Figure 2. Steps for the processing of CSSDs.

The first step is to obtain the coordinates of the mass boundary $(x(t), y(t))$, $t = 0, 1, \ldots, N-1$. In order to match shapes with different numbers of boundary points, we sampled all shape boundaries into a fixed number of points. In this work, we use 256 sampling points with equal arc length, the same as in computing the Fourier descriptors. The remaining steps are used to obtain the CSSDs from the CSS image, which is a multi-scale organization of the inflection points (or curvature zero crossings) of the shape boundary. The curvature of a planar curve, at a point on the curve, is given by:

$$ \kappa(t) = \frac{\dot{x}(t)\ddot{y}(t) - \ddot{x}(t)\dot{y}(t)}{(\dot{x}(t)^2 + \dot{y}(t)^2)^{3/2}} \tag{17} $$

where $\dot{x}(t)$, $\ddot{x}(t)$, $\dot{y}(t)$, and $\ddot{y}(t)$ are the first and second derivatives of $x(t)$ and $y(t)$ respectively. We convolve $x(t)$ and $y(t)$ respectively with a 1-D Gaussian filter $g_t(t, \delta)$ of standard deviation $\delta$. Then, the smoothed curve is given by:

$$ X(t, \delta) = x(t) \otimes g_t(t, \delta) \tag{18} $$

$$ Y(t, \delta) = y(t) \otimes g_t(t, \delta) \tag{19} $$

Thus, the smoothed curvature is given by:

$$ \kappa(t, \delta) = \frac{X_t(t, \delta) Y_{tt}(t, \delta) - X_{tt}(t, \delta) Y_t(t, \delta)}{(X_t(t, \delta)^2 + Y_t(t, \delta)^2)^{3/2}} \tag{20} $$

where $X_t(t, \delta)$, $X_{tt}(t, \delta)$, $Y_t(t, \delta)$, and $Y_{tt}(t, \delta)$ are the first and second derivatives of $X(t, \delta)$ and $Y(t, \delta)$. Curvature zero crossings are then located on the shape boundary where $\kappa(t, \delta) = 0$. In order to obtain the curvature zero crossings at multiple scales, we convolve the shape boundary with $g_t(t, \delta)$ for successive $\delta$ values. As $\delta$ increases, the shape boundary shrinks and becomes smoother, and the number of curvature zero crossings decreases, finally to zero. The whole process is called contour evolution. All curvature zero crossings are located during evolution and mapped to the CSS image, in which the horizontal axis represents the normalized arc length parameter on the original shape boundary, and the vertical axis represents the $\delta$ value of the Gaussian filter. Figure 3 shows the evolution process of a spiculated mass and a circumscribed mass.
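The contour evolution described above can be sketched as follows. This is a simplified illustration under stated assumptions: the Gaussian is truncated at three standard deviations, derivatives are central differences on the closed contour, and `sigma` (the paper's $\delta$) is measured in sample positions. Only the numerator of (20) is evaluated, because the denominator is positive and cannot change where the curvature crosses zero. The five-lobed test contour is synthetic.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at 3 standard deviations."""
    half = int(math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_closed(signal, sigma):
    """Circular convolution with the Gaussian, as the contour is closed."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    n = len(signal)
    return [sum(kernel[k + half] * signal[(t - k) % n]
                for k in range(-half, half + 1))
            for t in range(n)]


def curvature_numerator(x, y):
    """X_t * Y_tt - X_tt * Y_t of eq. (20), using circular central differences."""
    n = len(x)
    values = []
    for t in range(n):
        nxt = (t + 1) % n
        xt = (x[nxt] - x[t - 1]) / 2.0
        yt = (y[nxt] - y[t - 1]) / 2.0
        xtt = x[nxt] - 2.0 * x[t] + x[t - 1]
        ytt = y[nxt] - 2.0 * y[t] + y[t - 1]
        values.append(xt * ytt - xtt * yt)
    return values


def count_zero_crossings(values):
    """Number of sign changes around the closed contour."""
    return sum(1 for t in range(len(values)) if values[t - 1] * values[t] < 0)


def css_zero_crossing_counts(x, y, sigmas):
    """Zero-crossing count of the smoothed curvature at each scale."""
    counts = []
    for sigma in sigmas:
        xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
        counts.append(count_zero_crossings(curvature_numerator(xs, ys)))
    return counts


# Synthetic five-lobed contour sampled at 256 points.
N = 256
theta = [2 * math.pi * t / N for t in range(N)]
radius = [1 + 0.3 * math.cos(5 * a) for a in theta]
lobed_x = [r * math.cos(a) for r, a in zip(radius, theta)]
lobed_y = [r * math.sin(a) for r, a in zip(radius, theta)]
counts = css_zero_crossing_counts(lobed_x, lobed_y, [1, 4, 40])
```

At the smallest scale the lobed contour has two inflection points per lobe; at a sufficiently large scale the contour becomes convex and the zero crossings vanish. A full CSS image would additionally record the arc length position of every crossing at every scale, which this sketch omits.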

Figure 3. The contour evolution processes of a spiculated mass and a circumscribed mass, and their corresponding CSS images. (a) The contours of masses. (b) The evolution processes (δ = 1, 4, 7, 9). (c) The CSS images (the black dots correspond to the extracted peaks from the height adjusted CSS images).

The peaks (or the maxima) of the CSS image are then extracted and sorted in descending order of δ value as CSSDs, which are then used to index the mass shape. While the peaks with large δ values represent major inflections on the shape boundary, the peaks with small δ values represent minor ripples on the shape boundary and can be safely ignored. As seen in Figure 3, a regular and circumscribed mass evolves faster than an irregular and spiculated mass. The extracted peaks of irregular and spiculated masses thus have larger δ values than those of regular and circumscribed masses. Hence, it is expected that CSSDs can provide a physical interpretation in terms of curvature measurements to discriminate circumscribed and spiculated masses, as well as regular and irregular masses. In this work, we extracted CSSDs from an enhanced CSS image [13], which is called a height adjusted CSS image. The height adjusted CSS image solves the shallow concavity problems that exist in the original CSS image.

Concentrating on discriminating the regularity of the mass shapes, we only used the δ values of the extracted CSSDs for mass retrieval. In this experiment, we evaluated the use of 5 to 15 CSSDs, and found that 11 CSSDs gave the best retrieval performance. Hence, when comparing the CSSD features, we define the similarity between a query mass q and a model mass m in the database as:

$$ d_{CSSD}(q, m) = \left( \sum_{i=1}^{11} \left| \delta_q^i - \delta_m^i \right|^2 \right)^{1/2} \tag{21} $$

where $\delta_q^i$ is the δ value of the ith CSSD of the query mass q, and $\delta_m^i$ is the δ value of the ith CSSD of the model mass m.

2.2.1.4 Other Global Shape Features

A large group of shape description techniques is represented by heuristic approaches, which yield acceptable results in the description of simple shapes. In this work, we employed six global features: area, perimeter, compactness [7], solidity, eccentricity, and elongation.

Compactness:

$$ G_{Compactness} = 1 - \frac{4\pi A}{P^2} \tag{22} $$

where A is the area and P is the perimeter of the boundary.

Solidity:

$$ G_{Solidity} = \frac{A}{H} \tag{23} $$

where A is the area and H is the area of the corresponding convex hull of the shape.

Eccentricity:

$$ G_{Eccentricity} = \frac{\mu_{20} + \mu_{02} - \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}}{\mu_{20} + \mu_{02} + \sqrt{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2}} \tag{24} $$

where $\mu_{20}$, $\mu_{02}$, and $\mu_{11}$ are the central moments defined in (2). Eccentricity [9] is actually the ratio of the short axis length to the long axis length of the minimum bounding ellipse of the shape.

Elongation:

$$ G_{Elongation} = \frac{I_{min}}{I_{max}} \tag{25} $$

where $I_{min}$ is the short axis length of the minimum bounding rectangle, and $I_{max}$ is the long axis length of the minimum bounding rectangle.

The similarity between the query mass q and a model mass m in the database based on the ith global shape feature is given by:

$$ d^{i}_{Global} = \left| G_q^i - G_m^i \right| \tag{26} $$

where $G_q^i$ is the ith global shape feature of the query mass q and $G_m^i$ is the ith global shape feature of the model mass m.

2.2.1.5 Statistics Derived From Normalized Radial Length

Statistics derived from the normalized radial length (NRL) have previously been employed as features for mammographic mass classification [14, 15]. The NRL is defined as the radial length of (13) divided by the maximum of the radial length. While computing the NRL, we used 256 sampling points with equal arc length, the same as for computing FDs and CSSDs. In this work, we employed seven statistics derived from the NRL: mean, deviation, skewness, kurtosis, area ratio, zero-crossing count, and boundary roughness [13]. The similarity between the query mass q and a model mass m in the database based on the ith NRL derived feature is given by:

$$ d^{i}_{NRL} = \left| NRL_q^i - NRL_m^i \right| \tag{27} $$

where $NRL_q^i$ is the ith NRL derived feature of the query mass q and $NRL_m^i$ is the ith NRL derived feature of the model mass m.
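Three of the NRL statistics can be sketched as below. The text defers the exact definitions to [14, 15], so the formulas here are assumptions: the deviation is taken as the population standard deviation, and the zero-crossing count is taken as the number of times the NRL crosses its own mean along the closed contour. The function names and the lobed test signature are hypothetical.

```python
import math


def normalized_radial_length(r):
    """NRL: the radial length of eq. (13) divided by its maximum."""
    peak = max(r)
    return [v / peak for v in r]


def nrl_statistics(r):
    """Mean, standard deviation, and mean-crossing count of the NRL (assumed definitions)."""
    nrl = normalized_radial_length(r)
    n = len(nrl)
    mean = sum(nrl) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in nrl) / n)
    # A crossing is a sign change of (NRL - mean) between neighboring samples;
    # index -1 wraps around, so the contour is treated as closed.
    crossings = sum(1 for t in range(n)
                    if (nrl[t - 1] - mean) * (nrl[t] - mean) < 0)
    return {"mean": mean, "std": std, "zero_crossings": crossings}


def feature_distance(feature_q, feature_m):
    """Absolute difference of a single scalar feature, as in eq. (27)."""
    return abs(feature_q - feature_m)


N_SAMPLES = 64
circle_stats = nrl_statistics([2.0] * N_SAMPLES)
# Hypothetical five-lobed signature, offset by half a sample so that no
# sample falls exactly on the mean.
lobed_stats = nrl_statistics(
    [1.0 + 0.3 * math.cos(2 * math.pi * 5 * (t + 0.5) / N_SAMPLES)
     for t in range(N_SAMPLES)])
crossing_distance = feature_distance(lobed_stats["zero_crossings"],
                                     circle_stats["zero_crossings"])
```

A circular boundary gives a constant NRL with zero deviation and no crossings, whereas each lobe of the irregular signature contributes two crossings, which is why such statistics separate smooth from spiculated outlines.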

2.2.2 Texture Features

Mammograms display a variety of textures corresponding to parenchyma, fat, normal tissues, abnormal tissues, and masses. The texture features derived from the gray level co-occurrence matrix (GLCM) [16] have been widely adopted in mammographic CAD applications [10, 15]. We computed five texture features from the extended margin of the mass: energy, inertia, entropy, inverse difference moment, and difference entropy. The extended margin region was obtained by dilating the binary image of the mass with a circular structuring element. The radius of this structuring element is given by:

$$ R_s = \max\{7, 0.3 R_{eq}\} \tag{28} $$

where $R_{eq}$ is the radius of a circle with the same area as the mass. The five texture features were computed in four directions ($\theta = 0^\circ, 45^\circ, 90^\circ$, and $135^\circ$) at a distance of 1 pixel. In order to preserve rotation invariance, we summed each of the five features over the θ angles, thus obtaining the final five texture features: sum energy, sum inertia, sum entropy, sum inverse difference moment, and sum difference entropy. The similarity between a query mass q and a model mass m in the database based on the above texture features is given by their Euclidean distance:

$$ d_{Texture}(q, m) = \left( \frac{1}{5} \sum_{i=1}^{5} \left| T_q^i - T_m^i \right|^2 \right)^{1/2} \tag{29} $$

where $T_q^i$ is the ith texture feature of the query mass q and $T_m^i$ is the ith texture feature of the model mass m. Before computing these texture features, we applied a background correction algorithm [17] to remove the non-uniform background that is unrelated to the characteristics of the mass. The intensity range of the background corrected image was linearly scaled to the range 0 to 255.

2.2.3 Intensity Features

Four intensity-based statistical features (mean, deviation, skewness, and kurtosis) were extracted using the pixels within the mass region of the background corrected image. The similarity between a query mass q and a model mass m in the database based on the intensity features is given by their Euclidean distance:

$$ d_{Intensity}(q, m) = \left( \frac{1}{4} \sum_{i=1}^{4} \left| I_q^i - I_m^i \right|^2 \right)^{1/2} \tag{30} $$

where $I_q^i$ is the ith intensity feature of the query mass q and $I_m^i$ is the ith intensity feature of the model mass m.

2.3 Feature Normalization

In order to combine different individual features for mass retrieval, we had to normalize each feature into the same range. In this work, we tested two normalization schemes: linear scaling and unit variance linear scaling (NVLS) [18]. The performance difference between these two normalization schemes was found to be trivial. Therefore, we only present the results using the NVLS normalization scheme here. The normalized feature under this scheme is given by:

$$ \tilde{x} = \frac{x - \bar{x}}{6\delta_x} + \frac{1}{2} \tag{31} $$

where $\bar{x}$ is the average value of $x$, and $\delta_x$ is the standard deviation of $x$. Provided that the values of a feature follow a normal distribution, the above normalization scheme will place 99% of $\tilde{x}$ in the [0, 1] range, because this range corresponds to three standard deviations on either side of the mean. We then truncated the out-of-range components to either 0 or 1.
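The normalization of (31), including the truncation step, can be sketched as follows. The sketch assumes that the spread in (31) is the standard deviation, which is what the 99% figure for a normal distribution implies; the function name and the handling of a constant feature are illustrative choices.

```python
import math


def nvls_normalize(values):
    """Map feature values with eq. (31), then truncate the result to [0, 1]."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant feature carries no information; place it at the center.
        return [0.5] * n
    return [min(1.0, max(0.0, (v - mean) / (6 * std) + 0.5)) for v in values]


# The mean of the feature maps to 0.5.
centered = nvls_normalize([2.0, 4.0, 6.0])
# A value more than three standard deviations from the mean is truncated to 1.
with_outlier = nvls_normalize([0.0] * 20 + [100.0])
```

After this step every feature lives in [0, 1], so the per-feature distances become comparable before they are weighted and summed.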

3. EXPERIMENTS AND RESULTS

3.1 Measure of Retrieval Performance

To evaluate retrieval performance, we considered a retrieved image relevant if it belongs to the same class (benign or malignant) as the query image. The precision [19] of the retrieval is then defined as:

$$ precision = \frac{r}{k} \tag{32} $$

where r is the number of relevant retrieved images, and k is the total number of retrieved images. Since it is more practical to present radiologists with a small number of relevant retrieved images in a busy clinical environment, we chose k = 5 for this study. Our experiment was carried out in a leave-one-out manner: every image in the database served in turn as the query image to retrieve the top five matched images from the remaining images. The average precision over all queries is the final measure of retrieval performance.
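The leave-one-out evaluation can be sketched as below. The `distance` argument stands for any of the distance measures of section 2.2; the one-dimensional toy feature and the labels are invented purely to make the example runnable.

```python
def precision_at_k(distance, labels, query, k=5):
    """Precision r / k of eq. (32) for one query, excluding the query itself."""
    others = [m for m in range(len(labels)) if m != query]
    ranked = sorted(others, key=lambda m: distance(query, m))
    top = ranked[:k]
    relevant = sum(1 for m in top if labels[m] == labels[query])
    return relevant / k


def average_precision_loo(distance, labels, k=5):
    """Average precision when every item serves once as the query."""
    n = len(labels)
    return sum(precision_at_k(distance, labels, q, k) for q in range(n)) / n


# Toy data: one scalar feature per mass, well separated between the classes.
features = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15,
            0.90, 0.91, 0.92, 0.93, 0.94, 0.95]
labels = ["benign"] * 6 + ["malignant"] * 6


def toy_distance(q, m):
    """Absolute difference of the toy feature between items q and m."""
    return abs(features[q] - features[m])


toy_precision = average_precision_loo(toy_distance, labels, k=5)
```

With perfectly separated classes, every query retrieves five items of its own class, so the toy precision is 1.0. Real features overlap, and the reported precisions reflect how often a wrong-class mass enters the top five.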

3.2 Results

3.2.1 Individual Feature’s Precision

Table 2 shows the retrieval precision using each individual feature. The second columns in Table 2 (a), (b), and (c) show the precision of experiments using all 243 masses. We then removed the 18 masses with subtlety rank 4-5 (see Table 1) from the dataset and repeated the experiment using the remaining 225 masses (subtlety rank 1-3). The third columns in Table 2 (a), (b), and (c) show the resulting precision.

Table 2. The experiment results of individual features. (a) The retrieval precision of shape features. (b) The retrieval precision of texture features. (c) The retrieval precision of intensity features.

(a) Shape features

Feature                             Precision (%), 243 masses   Precision (%), 225 masses
Region-based moment invariants      70.9                        73.6
Boundary-based moment invariants    59.5                        60.7
Top 20 NFDs                         74.8                        78
Perimeter                           73.8                        80.1
Area                                58.9                        59.8
Compactness                         78.6                        84.1
Elongation                          49.9                        50.4
Eccentricity                        49.5                        49.2
Solidity                            75.8                        80.2
Top 11 CSSDs                        79.9                        86
Mean (NRL)                          58.4                        58.8
Standard Deviation (NRL)            57.2                        57
Skewness (NRL)                      50.1                        49.7
Kurtosis (NRL)                      50.6                        51.8
Zero-crossing count                 70                          75
Area ratio                          55.5                        54
Roughness                           75.2                        81.7

(b) Texture features

Feature                             Precision (%), 243 masses   Precision (%), 225 masses
Sum Energy                          55.1                        54.9
Sum Inertia                         53.8                        55.4
Sum Entropy                         53.3                        55.6
Sum Inverse Difference Moment       51.3                        51.9
Sum Difference Entropy              53.2                        55.8
Combined Texture                    54.7                        56.5

(c) Intensity features

Feature                             Precision (%), 243 masses   Precision (%), 225 masses
Mean                                48.1                        47.1
Standard Deviation                  53.2                        53.2
Skewness                            51.6                        52.6
Kurtosis                            59.8                        62.6
Combined Intensity                  59.5                        60.9

3.2.2 Combining Different Shape Features

It is conceivable that many of the shape features are more or less correlated with each other. Therefore, it is unnecessary to combine all of these shape features for retrieval. An exhaustive search for the optimal feature subset with its corresponding weighting coefficients is unrealistic. For this preliminary work, we combined several shape features to investigate the possible improvement in overall retrieval precision. The selected features are CSSD, compactness, and solidity. The reasons for selecting these three features are: (1) they are the three shape features that produced the highest retrieval precision on the full database; (2) while CSSD provides local shape information, compactness and solidity complement it with global shape information.

Note that the normalization of section 2.3 is not applied to CSSDs and FDs, since they are in descending order according to their sigma values or normalized magnitudes. Top ranked CSSDs (or FDs) have large sigma (or normalized magnitude) values and therefore contribute more to the computed distance. Low ranked CSSDs (or FDs) have smaller values and therefore contribute less to the computed distance. In order to combine the distance computed using CSSDs (or FDs) with the distances computed using the individual features, we had to normalize the distance computed using CSSDs (or FDs) into the same range [0, 1]. We first computed all distances between the paired masses in the database using CSSDs (or FDs). Then the normalized distance between the query mass q and a model mass m in the database is given by:

$$ \tilde{d}(q, m) = \frac{d(q, m) - \bar{d}}{6\delta_d} + \frac{1}{2} \tag{33} $$

where $d(q, m)$ is the original distance, $\bar{d}$ is the average distance, and $\delta_d$ is the standard deviation of the distances. As in (31), we simply truncated the out-of-range values to either 0 or 1. The distance between the query mass q and a model mass m in the database based on these three shape features is then given by:

$$ d_{Shape}(q, m) = w_1 d_{CSSD}(q, m) + w_2 d_{Compactness}(q, m) + w_3 d_{Solidity}(q, m) \tag{34} $$

where $d_{CSSD}(q, m)$ is the normalized distance computed using CSSD, $d_{Compactness}(q, m)$ is the distance computed using the normalized compactness feature, $d_{Solidity}(q, m)$ is the distance computed using the normalized solidity feature, and $w_i$ ($i = 1, 2, 3$) are the weighting coefficients. Here, we simply evaluated the different combinations of $w_i$, using $w_i = k$, $k = 0, 1, \ldots, 10$. The suboptimal weighting coefficients for (34) were found to be $w_1 = 5$, $w_2 = 5$, and $w_3 = 6$. After normalizing $w_i$ ($i = 1, 2, 3$) so that they sum to 1, we obtained $w_1 = 0.3125$, $w_2 = 0.3125$, and $w_3 = 0.375$. A more sophisticated scheme to formulate the suboptimal weighting coefficients can be found in [20].

Table 3. The combination of three shape features.

The combination of three shape features                                   Precision (%)
0.3125 * d_CSSD + 0.3125 * d_Compactness + 0.375 * d_Solidity             81.4
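The weight search over integer values 0 to 10 and the subsequent normalization can be sketched as follows. The `score` callback is a placeholder for the leave-one-out precision obtained with a given set of weights; the toy score used here is invented only so that the example runs, and the function names are hypothetical.

```python
from itertools import product


def normalize_weights(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combined_distance(distances, weights):
    """Weighted sum of per-feature distances, in the form of eq. (34)."""
    return sum(w * d for w, d in zip(weights, distances))


def grid_search(score, n_features=3, max_weight=10):
    """Try every integer weight combination and keep the best-scoring one."""
    best_weights, best_score = None, None
    for weights in product(range(max_weight + 1), repeat=n_features):
        if sum(weights) == 0:
            continue  # all-zero weights define no distance
        s = score(normalize_weights(weights))
        if best_score is None or s > best_score:
            best_weights, best_score = weights, s
    return best_weights, best_score


# Normalizing the reported integer weights reproduces the values in Table 3.
reported = normalize_weights([5, 5, 6])

# Toy score peaking at the reported weights; a real run would instead compute
# the average leave-one-out precision for each candidate.
TARGET = [0.3125, 0.3125, 0.375]


def toy_score(weights):
    """Negative squared distance between the candidate weights and TARGET."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best_weights, best_score = grid_search(toy_score)
example = combined_distance([0.2, 0.4, 0.1], reported)
```

The grid contains 11^3 = 1331 combinations, many of which are proportional to each other and therefore yield the same ranking, so the search is cheap but redundant. This is consistent with the text calling the resulting coefficients suboptimal.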

3.2.3 Combining Shape, Texture, and Intensity Features

After evaluating the suboptimal weighting scheme for combining the shape features, we combined (34) with the texture-based distance (29) and the intensity-based distance (30). In the same manner as in (34), the final overall distance combining shape, texture, and intensity features is given by:

$$ D(q, m) = w_1 d_{Shape}(q, m) + w_2 d_{Texture}(q, m) + w_3 d_{Intensity}(q, m) \tag{35} $$

where $d_{Shape}(q, m)$ is from (34), $d_{Texture}(q, m)$ is the distance of (29) computed using the normalized texture features, $d_{Intensity}(q, m)$ is the distance of (30) computed using the normalized intensity features, and $w_i$ ($i = 1, 2, 3$) are the weighting coefficients. In the same manner as described in section 3.2.2, the suboptimal weighting coefficients for (35) were found to be $w_1 = 0.643$, $w_2 = 0$, and $w_3 = 0.357$. Because the texture weight is zero, the best combination found uses only the shape and intensity distances.

Table 4. The combination of shape and intensity features.

The combination of shape, texture and intensity features      Precision (%)
0.643 * d_Shape + 0.357 * d_Intensity                          82.3

3.2.4 Retrieval Examples

Several retrieval examples using individual and combined features are shown in Figure 4. The left mass with the blue margin is the query mass, and the remaining five masses to its right are the top five retrieved masses, arranged from left to right in increasing order of distance from the query mass.

Figure 4. Retrieval result of two query masses using different features. (a) The result of one spiculated query mass using CSSD. (b) The result of the same query mass in (a) using CSSD, compactness, and solidity. (c) The result of one circumscribed query mass using CSSD. (d) The result of the same query mass in (c) using CSSD, compactness, and solidity. (e) The result of the same query mass in (c) using CSSD, compactness, solidity and area.

The results shown in Figure 4(a) and (c) demonstrate the effectiveness of CSSD in the characterization of circumscribed and spiculated masses. The top five retrieved masses in the two examples are consistent with the query masses in shape. The effect of adding a further shape feature is shown in Figure 4(e): with area included, three of the five retrieved masses in Figure 4(e) differ from those in Figure 4(d), and the retrieved masses in Figure 4(e) are visually similar in size to the query mass.

4. CONCLUSIONS AND DISCUSSION

The results in Table 2 (a) show that the shape features play an important role in differentiating benign and malignant masses. However, the performance of the shape features, especially the contour-based shape features, relies heavily on the obtained mass contour. This places high demands on the mass segmentation algorithm. In this work, the mass boundaries were semi-automatically traced. The integration of an automatic mass segmentation algorithm, e.g., [6], into the current system would be of great value in a clinical environment. We also found that the performance depended on whether the subtle cases were included in the image library. The precision increased after removing the subtle cases for most of the individual features listed in Table 2.

The simple weighting scheme for combining different shape features does not provide a significant improvement over the individual shape features. This indicates that the three shape features, i.e., CSSD, compactness, and solidity, may be highly correlated with each other, so that their combination does not introduce significant additional information for differentiating benign and malignant masses. The combination may merely adjust the order of the top matched masses. It does not produce significant rank changes in most cases, and it does not contribute significantly to differentiating the subtle cases, which may need additional texture and intensity information.

The current combination scheme for shape, texture, and intensity features is simple, and the improvement after combination is insignificant. A more robust combination scheme and more effective features will be needed for discriminating the subtle cases, which are of real clinical value. In order to prevent the negative influence of correlation between different features during combination, a feature analysis and selection step would be necessary before combining these features for retrieval. This part of the study calls for further investigation.

The shape features outperform the texture and intensity features for this database. This result is consistent with the results presented in [10, 21], where the retrieval and classification precision using individual shape features is higher than that using individual texture features. This may be due to the fact that the majority of the mammographic masses collected in this database are large. The spiculation features of large malignant masses are somewhat better developed than those of mid-size and small masses.

ACKNOWLEDGEMENTS This project was supported in part by an NIH/NCI grant (No. R21CA102960).

REFERENCES

1. J. E. Meyer, D. B. Kopans, P. C. Somper, and K. K. Lindfors, "Occult breast abnormalities: percutaneous preoperative needle localization," Radiology, pp. 335-337, 1984.
2. A. L. Rosenberg, G. F. Schwartz, S. A. Feig, and A. S. Patchefsky, "Clinically occult breast lesions: localization and significance," Radiology, pp. 167-170, 1987.
3. B. C. Yankaskas, M. H. Knelson, M. L. Abernethy, J. T. Cuttino, and R. L. Clark, "Needle localization biopsy of occult lesions of the breast," Radiology, pp. 729-733, 1988.
4. H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, "A review of content-based image retrieval systems in medical applications-clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
5. M. Heath, K. W. Bowyer, D. Kopans, P. Kegelmeyer Jr, R. Moore, K. Chang, and S. Munishkumaran, "Current status of the Digital Database for Screening Mammography," in Digital Mammography, Kluwer Academic Publishers, 1998, pp. 460-547.
6. L. Kinnard, S. C. B. Lo, E. Makariou, T. Osicka, P. Wang, M. F. Chouikha, and M. T. Freedman, "Steepest changes of a probability-based cost function for delineation of mammographic masses: A validation study," Medical Physics, vol. 31, pp. 2796-2796, 2004.
7. R. M. Rangayyan, N. M. El-Faramawy, J. E. L. Desautels, and O. A. Alim, "Measures of Acutance and Shape for Classification of Breast Tumors," IEEE Trans. on Medical Imaging, vol. 16, p. 799, 1997.
8. M. K. Hu, "Visual pattern recognition by moment invariants," IEEE Trans. on Information Theory, vol. 8, pp. 179-187, 1962.
9. J. G. Leu, "Computing a shape's moments from its boundary," Pattern Recognition, vol. 24, pp. 949-957, 1991.
10. B. Sahiner, H. P. Chan, N. Petrick, M. A. Helvie, and L. M. Hadjiiski, "Improvement of mammographic mass characterization using spiculation measures and morphological features," Medical Physics, vol. 28, p. 1455, 2001.
11. D. Zhang and G. Lu, "A comparative study on shape retrieval using Fourier descriptors with different shape signatures," in Proc. of International Conference on Intelligent Multimedia and Distance Education (ICIMADE01), 2001, pp. 1-9.
12. S. Abbasi, F. Mokhtarian, and J. Kittler, "Curvature scale space image in shape similarity retrieval," Multimedia Systems, vol. 7, pp. 467-476, 1999.
13. S. Abbasi, F. Mokhtarian, and J. Kittler, "Enhancing CSS-based shape retrieval for objects with shallow concavities," Image and Vision Computing, vol. 18, pp. 199-211, 2000.
14. L. M. Bruce and R. R. Adhami, "Classifying Mammographic Mass Shapes Using the Wavelet Transform Modulus-Maxima Method," IEEE Trans. on Medical Imaging, vol. 18, p. 12, 1999.
15. D. M. Catarious Jr, A. H. Baydush, and C. E. Floyd Jr, "Characterization of difference of Gaussian filters in the detection of mammographic regions," Medical Physics, vol. 33, p. 4104, 2006.
16. R. M. Haralick, I. Dinstein, and K. Shanmugam, "Textural features for image classification," IEEE Trans. on Systems, Man, and Cybernetics, vol. 3, pp. 610-621, 1973.
17. P. Campadelli, E. Casiraghi, and D. Artioli, "A fully automated method for lung nodule detection from posteroanterior chest radiographs," IEEE Trans. on Medical Imaging, vol. 25, pp. 1588-1603, 2006.
18. S. Aksoy and R. M. Haralick, "Feature normalization and likelihood-based similarity measures for image retrieval," Pattern Recognition Letters, vol. 22, pp. 563-582, 2001.
19. J. R. Smith, "Image Retrieval Evaluation," in IEEE Workshop on Content-Based Access of Image and Video Libraries, 1998, pp. 112-113.
20. D. Comaniciu, P. Meer, and D. J. Foran, "Image-guided decision support system for pathology," Machine Vision and Applications, vol. 11, pp. 213-224, 1999.
21. H. Alto, R. M. Rangayyan, and J. E. L. Desautels, "Content-based retrieval and analysis of mammographic masses," Journal of Electronic Imaging, vol. 14, pp. 23016:1-17, 2005.
