Content-Based Access to Medical Image Collections
Juan C. Caicedo, National University of Colombia, Colombia
Jorge E. Camargo, National University of Colombia, Colombia
Fabio A. González, National University of Colombia, Colombia
Abstract

Medical images are a very important resource for clinical practice. Thousands of them are acquired daily in hospitals to diagnose the health state of patients. However, once they have been archived as part of a large image database, it is very difficult to retrieve a given image again, since doing so requires remembering dates or names. Furthermore, when the same image is not required but a physician is looking for images with particular contents, current technologies are not able to offer such functionality. The ability to find the right visual information in the right place, at the right time, can have a great impact on the medical decision-making process. This chapter presents two computational strategies for accessing a large collection of medical images: retrieving relevant images given an explicit query and visualizing the structure of the whole collection. Both strategies take advantage of image contents, allowing users to find or identify images that are related by their visual composition. In addition, these strategies are based on machine learning methods to handle complex image patterns, semantic medical concepts, image collection visualizations and summarizations.

Keywords: Content-based medical image retrieval, medical imaging, kernel methods, image collection visualization, image clustering

Introduction

Large amounts of medical images are produced daily in hospitals and health centers. For instance, the University Hospital of Geneva reported a production of 70,000 images per day during 2007 in the Radiology department alone (Pitkanen et al., 2008). The management of such large image collections is a challenging task nowadays, mainly because of the difficulty of accessing the image database to obtain useful information. The computational power required to archive and process the image database has been rising during the last few years, making it possible to store large collections of medical images in
specialized systems such as Picture Archiving and Communication Systems (PACS). These systems may be extended to archive more digital images according to the hospital's needs, and they usually also support the workflow in the radiology department as well as in other specialized services. However, even when capacity is expanded, the functionality of these systems remains static and still provides only very basic operations to query and search for medical images. The contents of a large medical image collection may be used as a reference set of previously evaluated cases, with annotations associated with diagnoses and the evolution of patients. Then, a physician attending a new patient may check medical records from other patients, evaluated by other experts, and hence clinical decisions can be highly enriched by the information stored in the database. In addition, clinical training in medical schools may be supported by these real reference collections, allowing students and professors to access the experience accumulated from the thousands of cases previously diagnosed. The actual problem is how to query and explore the collection in an effective way, that is, with an immediate relevant response. The first approach that may be considered is the use of textual annotations through a standard information retrieval system, so that users can write keywords associated with an information need. However, a collection of medical images does not necessarily have complete and descriptive annotations for all images, so this method would prevent full access to the database. Furthermore, users are not always aware of their information needs in terms of keywords, and this would lead to a trial-and-error loop for finding the right answers from the system. The good news is that physicians may have example images from the current case to query the system, which include the kind of visual patterns they are interested in.
Content-based Image Retrieval (CBIR) is an interesting alternative to support the decision-making process in a clinical workflow. CBIR systems are designed to search for similar images using visual contents instead of associated data (Datta et al., 2008). So, given an example image, the system should be able to extract visual features, structure them and find the semantically matching images in the database. This approach is known as the Query-By-Example (QBE) paradigm for image search, which has been widely studied. In general, a CBIR system has to consider two main aspects in order to provide that functionality: (1) image content representation and (2) similarity measures. Content representation is related to image processing methods and feature extraction algorithms, and aims to identify characteristic image descriptors such as points, regions or objects. Ideally, image descriptors should clearly match real-life objects and concepts, but in practice it is very difficult to get such a result due to the semantic gap, i.e. the lack of coincidence between extracted features and human interpretations (Smeulders et al., 2000). On the other hand, similarity measures are needed to accurately distinguish images that share the same features, so that the system will be able to recommend the most similar images to a user when an example query image is provided. There are several scenarios in which a physician may have an example image to query the system, for instance, when attending new patients or reading electronic papers. However, having an example image at hand whenever relevant images are needed is not always the case, for instance, in the middle of a lecture. This problem is known as the zero-page problem of the query-by-example paradigm in information retrieval (La Cascia et al., 1998). In such cases, the common alternative is to provide keyword-based search functionality, but the problem of image annotations pops up again.
So, the question is: how to provide content-based access to a large medical image collection for physicians, professors and students who would like to obtain examples of biological structures, diseases or diagnoses, but do not have images at hand to perform a query? Some simple alternatives are used in production CBIR systems, such as offering random images from the database to choose an example and start the query, but this is a suboptimal solution. However, the spirit of offering images from the database to explore the contents of
the collection has inspired other approaches that attempt to make image examples available in a non-random way. An interesting new strategy to enable users to search for images in a CBIR system consists of handing over a visualization of the whole image collection using a 2D map metaphor. This strategy tries to exploit the human brain's capacity for efficiently recognizing visual patterns, so that an ordered display of many images at the same time may help users find the right information. The visualization is built so that physicians can see different images distributed on the screen according to their visual similarity and can intuitively start to explore the image collection. Three computational criteria are considered to generate image collection visualizations: (1) image similarity definition, (2) image collection projection onto a 2D plane and (3) image collection summarization. First, the image similarity definition is similar to that of standard CBIR systems, taking into account the content representation, a similarity measure and the possible effects of the semantic gap. Second, the image collection projection is usually approached using a dimensionality reduction method on the image representation, obtaining a 2D coordinate for each image. Third, when the database is too large, image overlapping in the visualization is reduced using a summarization technique, so that the user sees only a representative set of images. Users are then able to see the image database contents from a global perspective. This chapter presents some approaches to provide content-based access to a medical image collection, using both the query-by-example paradigm and the visualization strategy. A very important aspect of the results provided by any access method is that they have to meet user requirements in terms of usefulness and relevance of the retrieved images.
Effective systems must cope with medical concepts and complex human interpretations, since the kinds of tasks these systems will support are medical knowledge-driven processes. The use of machine learning methods is suitable for the development of successful systems to access image collections because of their ability to automatically learn thoroughly different tasks, such as pattern recognition or data structure discovery. Specifically, this chapter will show how supervised and unsupervised strategies are powerful approaches to process an image collection in order to build search indexes or visualizations. Four problems related to accessing a medical image collection are covered in this chapter using machine learning methods. First, image content representation, which requires the ability of the system to recognize visual patterns associated with medical concepts. Second, image retrieval using the query-by-example paradigm, taking into account that the method should return to the user a list of semantically related images instead of merely visually-alike images. Third, image collection projection onto a 2D plane to generate a visualization of the image collection, organizing images on the screen following semantic criteria. And finally, image collection summarization, which tries to identify a set of images that represent as many semantic topics in the collection as possible. These four problems are approached using machine learning algorithms, and all four require handling semantic concepts to provide effective functionality in the CBIR system. This chapter shows how machine learning methods allow modeling strategies that may help to bridge the semantic gap in image retrieval, and also shows how they provide robust foundations for designing new access methods such as image collection visualization.
A set of experiments performed on a collection of real histopathology images is presented in this chapter, using different techniques for solving the problems described above. The collection is composed of 1,502 images that have been annotated by expert pathologists; this information is used as ground truth. In general, each image is associated with several classes from a list of 18 pathological concepts. The most important findings in image retrieval and visualization are reported to illustrate the potential of machine learning to provide effective models for the problem of accessing a large collection of medical images by
content. The organization of this chapter is as follows: Section "Understanding Medical Images" describes two approaches to represent and understand medical image contents according to the domain knowledge. Section "Medical Image Retrieval" describes a medical image retrieval system based on the query by example paradigm, using visual data and semantic information. Section "Image Collection Visualization" describes methods for image collection visualization and summarization to support exploration in a CBIR system. Finally, the last Section presents the concluding remarks and future work.
Understanding Medical Images

Obtaining a representation to understand the structure of images is a problem that may be approached from two different perspectives, depending on the underlying task. For image transfer and storage applications, the image representation is an important issue, usually referred to as coding, whose main goal is to define an efficient basis, usually sparse or statistically independent. Images are then processed as signals that may be decomposed in terms of that basis (Engan & Aase, 1999). On the other hand, for computer vision tasks or semantic content analysis, the image representation is usually designed to capture visual patterns or objects that are meaningful in the application domain (Csurka et al., 2004). There is a main difference between the two image representation approaches: given an image representation, semantic content analysis does not attempt to reconstruct the original image in terms of smaller information units, as required for coding applications; instead, the main goal is to raise a conceptual understanding of the scene. Usually, image representations for coding applications are easily adapted to semantic content analysis, while the other way around is not always true. This Section presents some methods to represent images for semantic content analysis, a requirement to meet user expectations in a system to access a large collection of medical images. Representation for semantic analysis in medical images has been an active research problem for several years. Automatic identification of normal and abnormal biological structures is the main focus of several methods, including segmentation algorithms, geometric analysis and multiresolution approaches, among others.
Many algorithms to recognize medical image contents have been designed to work in specific domains such as spine X-rays (Long et al., 2005) and mammograms (Qian et al., 1999), resulting in methods that are difficult or even impossible to apply to other problems. On the other hand, more generic descriptors have been proposed to represent medical image contents, such as downscaled representations or histograms (Güld et al., 2004). However, these kinds of descriptors lack explicit semantics for many image analysis tasks. The decoupling of image representation and image analysis is herein proposed to effectively understand medical image semantics. The image representation may be obtained using generic descriptors or adaptable strategies, while the image analysis is tackled using machine learning algorithms. By dividing the problem into these two steps, a system is able to obtain a successful performance in a wide variety of medical image contexts. In contrast, direct identification of image objects in a single step may lead to more complex models that are not necessarily applicable to different contexts. This Section presents two strategies for image representation and description: the first based on the bag-of-features approach and the second based on a combination of low-level features. Both strategies may be used on different kinds of images since they are generic image representations. Moreover, these representations are obtained through the use of machine learning algorithms that adapt the visual image representation to the particular image collection. The image analysis is then performed using supervised learning methods to include the image-semantics recognition ability in the system.
The Bag-of-Features Representation

This representation scheme is inspired by the text processing community, following two main principles: a document is represented by the frequencies of a set of words predefined in a dictionary, and the relationships among words are ignored. That approach is known as the bag-of-words model for text categorization and retrieval. In the computer vision community, the bag-of-features representation has been proposed to model image contents using a predefined codebook of visual patterns (codeblocks), whose relationships are likewise ignored in the main representation (Csurka et al., 2004). It is expected that a learning algorithm will be able to find correlations among codeblocks to recognize complex objects or scenes. To build a bag-of-features representation for a collection of medical images, four steps are followed: (1) feature detection and description, (2) codebook construction, (3) construction of the bag of features for image representation and, finally, (4) training and evaluation of the learning algorithms. This Subsection discusses the first three steps of the image representation process; the last step is discussed in the final Subsection. In order to illustrate the different steps of the bag-of-features representation strategy, this Subsection presents a practical application of the strategy to the histopathology image collection mentioned in the Introduction (Caicedo et al., 2009a). First, feature detection and description is performed by applying dense random sampling of image blocks. Those blocks are set to 9x9 pixels at different image scales. So the detection of features in this application is a random block selection, and the description is given by the explicit block pixels, known as a raw-block. Second, the construction of the codebook is performed using unsupervised learning, in particular the k-means algorithm, on the complete set of raw-blocks extracted from a training image collection.
This step makes the bag-of-features approach a flexible framework for image analysis, since the visual vocabulary is built through the analysis of a large number of patterns from the whole collection. Figure 1 shows a codebook of 150 visual words automatically obtained from the collection of histopathology images. Note that each visual word is highly related to visual primitives in histopathology images. The third and last step is done by counting the occurrences of each codeblock inside the image, so that a histogram is used to represent the structure of an image.
Figure 1. A codebook automatically identified from the histopathology image collection using unsupervised learning.
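The three representation steps above (random raw-block sampling, k-means codebook construction and codeblock counting) can be sketched in a few lines. This is a minimal, pure-Python illustration, not the authors' implementation; a real system would sample blocks at several scales and use much larger codebooks:

```python
import random

def sample_patches(image, patch_size=9, n_patches=50, rng=None):
    """Randomly sample square pixel blocks (raw-blocks) from a grayscale image
    given as a list of rows. Each patch is returned as a flat list of pixels."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(n_patches):
        y = rng.randrange(h - patch_size + 1)
        x = rng.randrange(w - patch_size + 1)
        patches.append([image[y + i][x + j]
                        for i in range(patch_size) for j in range(patch_size)])
    return patches

def kmeans(points, k, n_iter=20, rng=None):
    """Plain k-means over raw-blocks: the k centroids form the visual codebook."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[idx].append(p)
        for c in range(k):
            if clusters[c]:  # recompute centroid as the mean of its cluster
                centroids[c] = [sum(vals) / len(vals) for vals in zip(*clusters[c])]
    return centroids

def bag_of_features(image, codebook, **sampling_kwargs):
    """Represent an image as a normalized histogram of codeblock occurrences."""
    hist = [0] * len(codebook)
    for p in sample_patches(image, **sampling_kwargs):
        idx = min(range(len(codebook)),
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(p, codebook[c])))
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

In this sketch the codebook is built once from patches pooled over a training collection, and every image is afterwards mapped to a fixed-length histogram, which is what the learning algorithms in the final Subsection consume.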
One of the most important steps of the bag-of-features representation is the construction of a visual vocabulary. This step is also known as dictionary learning. In image and video coding applications, dictionary learning refers to finding a set of basis functions (Olshausen & Field, 1997; Engan et al., 1999; Gribonval et al., 2003). Each signal (image, video, etc.) is expressed as a combination of these basis functions, in such a way that the signal may be represented by the corresponding coefficients of the combination. The goal is to find a good set of basis functions that allows representing signals in a compact way (a sparse representation). There is a fundamental difference between coding based on dictionary learning and the bag-of-features representation: in the first case the representation allows reconstructing the original image (in some cases incurring some loss); in the second case this is not possible, since the 'bag' representation does not store visual word locations. However, there are important coincidences between both approaches. In both cases, the dictionary gathers basic patterns that constitute the building blocks of the images in the collection, and these patterns are used as the language to represent the image. As was mentioned before, an image representation generated for coding applications may be adapted for semantic image analysis. In particular, the coefficients of the linear combination in an image representation for coding may be used to learn the image semantics. In fact, this is done in practice in those approaches in which the image representation is based on Fourier, Wavelet or Gabor transforms.

Low-Level Feature Combination

Low-level features describe basic image characteristics related to colors, textures and edges.
In particular, histopathology images are described using six different histogram features (Caicedo et al., 2008a): RGB histogram, gray histogram, Local Binary Patterns, Tamura texture, Sobel histogram and Invariant Feature Histogram. Note that all low-level features used in this experimentation are histograms. The aim is to combine all those features into a unique image representation that includes information about different characteristics. A simple combination of histogram features may be achieved by merging all histograms into a single feature vector. However, this simple approach is not necessarily the best alternative, since it takes into account neither the particular structure of histogram data nor the relative importance of each histogram feature. A kernel solution is introduced to achieve the combination of low-level visual features. Kernel methods have become a popular approach in machine learning and pattern recognition due to their simplicity and robustness. The main characteristic of these methods is that a similarity function is used to map structured objects into a high-dimensional feature space, in which linear patterns can be found (Shawe-Taylor & Cristianini, 2004). In this problem, the Histogram Intersection kernel (Barla et al., 2003) is used as an image similarity measure between histograms of the same type. Intuitively, this function measures the common area between two histograms, and it has been proved that this value is a dot product in a high-dimensional feature space, hence it is a valid Mercer kernel. Using this kernel function, a high-dimensional feature space is generated for each individual low-level feature, although the applied methods do not need to explicitly deal with high-dimensional vector representations, i.e. the kernel function implicitly generates such a feature space.
Instead, kernel-based learning algorithms only need the inner-product information among the vectors in the feature space, and that is exactly the information that the histogram intersection kernel provides. The goal is to combine different feature spaces to obtain a unique image representation with many low-level visual features. A linear combination of kernel functions is known to be a valid kernel too, and the feature space induced by the new kernel is composed of all dimensions of the feature spaces generated by the basic kernels. Hence, by linearly combining kernel functions of individual low-level features, a new image representation space is generated with the information of all visual characteristics. This linear combination may be parameterized using weights for individual features to emphasize the importance of some of them. In this work, the weighting of low-level features is learned by maximizing the kernel-target alignment measure (Kandola et al., 2002). Kernel alignment measures how good a kernel function is for solving a certain classification task. Since the linear combination of low-level features may lead to many valid kernels by choosing different weights, kernel alignment allows identifying the most appropriate combination with respect to the problem at hand. Following this scheme, the image representation is adapted according to the important concepts in the collection. As a result, a kernel function is obtained to discriminate each concept, so kernel-based learning algorithms can be directly applied.

Learning Image Semantics

At this stage, the image representation has been modeled using two approaches, both based on machine learning analysis: the bag-of-features approach, based on unsupervised learning to build a visual vocabulary, and the low-level feature combination, based on supervised learning to find appropriate feature weights. These representations include intermediate image semantics, since they have been adapted according to the contents of the whole collection in the first case and according to histopathology concepts in the second case. This Subsection presents a classification strategy to build a more explicit semantic representation of images, that is, a model to automatically recognize concepts in images. This approach is also known as automatic image annotation (Jeon et al., 2003), which aims to select the most appropriate keyword descriptions for images using machine learning. As mentioned in the Introduction, the histopathology image collection used in this chapter has example images of 18 different medical concepts. These concepts include biological structures such as glands and abnormal findings such as nodules.
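Both representations rely on the histogram intersection kernel and, for the low-level features, on its weighted linear combination. A minimal sketch follows; the feature names and weights are illustrative, whereas in the actual system the weights are learned by kernel-target alignment:

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: the shared area between two normalized
    histograms. This is a valid Mercer kernel (Barla et al., 2003)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def combined_kernel(features_x, features_y, weights):
    """Weighted linear combination of per-feature histogram intersection kernels.
    features_x / features_y map hypothetical feature names (e.g. 'rgb', 'sobel')
    to histograms of the same type; a linear combination of Mercer kernels with
    non-negative weights is itself a valid Mercer kernel."""
    return sum(w * histogram_intersection(features_x[name], features_y[name])
               for name, w in weights.items())
```

For two identical normalized histograms the intersection is 1.0 (their full common area), and the combined kernel simply mixes the per-feature similarities according to the learned weights.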
The purpose of the auto-annotation strategy is to evaluate the contents of each image and decide whether the image contains a particular concept or not. Importantly, images in this collection may contain one or many of those concepts. In this framework, an individual classifier is trained to recognize each concept following a one-against-all strategy, that is, the system needs 18 binary classifiers, each specialized in identifying the presence or absence of a concept. The classifiers used in this framework are Support Vector Machines (SVM) (Schölkopf et al., 2002), which are suitable for finding a linear boundary between two classes in the feature space induced by a kernel function. The low-level feature combination has an associated kernel function, and since the bag of features is a histogram of visual-word frequencies, it is also processed using the histogram intersection kernel. The bag of features in these experiments has been built using a codebook of 250 visual patterns. A more extensive experimentation with the bag-of-features approach on this image collection can be found in (Caicedo et al., 2009a). Having kernel functions to represent image contents allows the direct use of SVMs for recognizing image semantics. For experimental purposes, the data set has been split into 80% for training and 20% for testing. SVM parameters are determined following a 10-fold cross-validation procedure on the training set. Finally, the array of 18 SVMs is applied to each test image in order to detect the presence or absence of concepts. This leads to a set of annotations for each test image, on which performance measures are calculated.

Table 1. Performance measures for the automatic image annotation task in a histopathology image collection.
Image Representation              Precision   Recall   F-Measure
Bag of Features                   0.67        0.16     0.24
Low-level Feature Combination     0.70        0.38     0.48
Experimental results are presented in Table 1. Reported values have been averaged over all 18 possible classes, that is, the precision and recall values have been calculated individually for each class, and the reported results correspond to an overall performance. The results show that the combination of low-level features gives a better performance in both precision and recall, and hence in F-measure. One of the reasons for this behavior may be the number of features used in each representation. The bag of features stands on a histogram of 250 bins, while the low-level feature combination has 256 bins for each histogram; hence, the feature space induced by a linear combination of those features is of higher dimensionality than the bag-of-features one. The unsupervised or supervised nature of each strategy may also impact the results, since the low-level feature combination includes information about the target classification task. Although the low-level feature combination has beaten the bag-of-features approach here, this does not mean that the latter strategy is not useful in general. This is a particular task on histopathology images, but there are medical images from many other modalities in which this representation scheme may be useful. For example, Tomassi et al. (2007) have shown that the bag of features is a successful approach to automatically annotate thousands of radiology images. In such a problem, the combination of low-level features as described in this Section may lead to poorer results since, for example, radiology images do not include color information. However, the proposed scheme to combine different features could easily be extended to include other visual features that are more appropriate for the application domain. Furthermore, the bag of features may be an additional input to this approach, so that local and global information are included in a unique representation.
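The one-against-all annotation scheme and the macro-averaged evaluation behind Table 1 can be sketched as follows. The decision functions below stand in for trained SVMs and are purely illustrative:

```python
def annotate(image_features, classifiers):
    """One-against-all annotation: apply one binary classifier per concept.
    `classifiers` maps each concept name to a decision function that returns
    a positive score when the concept is judged present in the image."""
    return {concept for concept, f in classifiers.items() if f(image_features) > 0}

def macro_averaged_scores(predicted, ground_truth, concepts):
    """Per-concept precision, recall and F-measure, averaged over all concepts
    (the macro averaging used for Table 1). `predicted` and `ground_truth` are
    parallel lists of concept sets, one entry per test image."""
    precisions, recalls, fmeasures = [], [], []
    for c in concepts:
        tp = sum(1 for p, g in zip(predicted, ground_truth) if c in p and c in g)
        fp = sum(1 for p, g in zip(predicted, ground_truth) if c in p and c not in g)
        fn = sum(1 for p, g in zip(predicted, ground_truth) if c not in p and c in g)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fmeasures.append(f)
    n = len(concepts)
    return sum(precisions) / n, sum(recalls) / n, sum(fmeasures) / n
```

In the chapter's setting there would be 18 concepts and 18 trained SVM decision functions; the sketch only fixes the bookkeeping of multi-label prediction and macro-averaged scoring.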
Medical Image Retrieval

A content-based image retrieval system in medical applications is oriented to finding similar images following a query-by-example procedure. That is, the user picks an image of interest and requests a set of similar images from the database. Under this scheme, the system processes the input image to evaluate its contents and extract its features. All images in the database have been previously processed using the same algorithms; the system then matches query features against image database features using a similarity measure. Finally, the system sorts the results by similarity and returns the set of most similar images to the user. This mechanism is followed by the majority of systems for content-based image retrieval, in which two main components are considered: (1) feature indexing and (2) similarity measures or search algorithms. The problem of accessing an image collection to retrieve useful pictures was first approached by the research community in computer vision. Hence, the use of matching algorithms and registration methods was proposed to calculate image similarity. The evaluation of all kinds of visual features was proposed, including global and local features, Fourier and wavelet coefficients, salient points and regions of interest (Deselaers, 2003), among others. In medical image retrieval, some systems have been developed following these strategies (Müller et al., 2004). One of the main issues of these approaches is the semantic gap, i.e. the lack of coincidence between calculated visual features and human interpretations of image contents (Smeulders et al., 2000). In other words, many algorithms have proved their ability to match visual properties even under complex deformations, but human beings instantly recognize that, although the
visual appearance is alike, the image contents refer to different real-world concepts. Subsequent developments restricted the application domain to model in a more precise way the kind of objects and entities present in images, for instance, CBIR systems devoted to searching a spine X-ray image database (Xu et al., 2008) or HRCT images of the lung (Shyu et al., 1999). In those cases, advanced computer vision and image processing techniques may be designed to directly identify or segment objects of interest, so that the semantic interpretation is close to the calculated features. However, these techniques are usually not able to handle other visual contents, so they are not extensible to other domains. Machine learning provides a more flexible framework to deal with such problems. As was mentioned in the previous Section, following a machine learning approach it is possible to decouple image content characterization from image content interpretation. The former may be obtained using a variety of features to guarantee a precise description of visual contents, and the latter may be modeled using different kinds of classification methods. This separation is nowadays followed in many image processing and computer vision tasks, such as image classification, image categorization and object detection. Particularly in image retrieval, the problem has been oriented to an automatic image annotation task, that is, given a set of images with examples of concepts in a restricted vocabulary, the system learns a model to generate semantic descriptions for images that do not have any annotation. This approach has been successfully applied to different medical image retrieval problems (Müller et al., 2008), including histology image retrieval (Tang et al., 2003). This Section presents a general auto-annotation strategy to build a semantic index for searching medical images, using the content representations described in Section 2.
In particular, the image classification methods are used to generate automatic annotations, on which a similarity measure is applied to find images semantically related to the query.

Semantic Indexing of Medical Images

The architecture of a system to search images by content using the query-by-example paradigm is shown in Figure 2. This system considers the characterization of image contents and the detection of image semantics in two separate submodules: low-level feature extraction and automatic semantic annotation. Using these submodules, the image search index is built. The other important component of the system is the retrieval algorithm, which relies on a similarity measure. As shown in the Figure, similarity scores are calculated with respect to semantic annotations, so that the top result images are expected to share meaningful relationships according to the domain knowledge. Additional similarity measures may be included in the system to search using visual information, i.e. to search for visually similar images.
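The retrieval algorithm of such an architecture can be sketched as follows, assuming each indexed image is described by a vector of per-concept scores; the image identifiers and the choice of histogram intersection as the semantic similarity are illustrative assumptions:

```python
def semantic_search(query_vector, index, top_k=5):
    """Rank database images by the similarity between semantic feature vectors
    (one per-concept score per vocabulary entry). Histogram intersection is
    used here as the similarity measure; `index` maps image ids to vectors.
    Returns the ids of the top_k most similar images."""
    def intersection(a, b):
        return sum(min(x, y) for x, y in zip(a, b))
    ranked = sorted(index.items(),
                    key=lambda item: intersection(query_vector, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]
```

In a full system the query vector would itself be produced by running the query image through the feature extraction and automatic annotation submodules before ranking.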
Figure 2. The architecture of a system for semantic image retrieval. The system implementation starts with the incorporation of feature extraction algorithms. Features used in this system include colors, edges and textures, as described in Section 2. When a new image is stored in the system, six histogram features are extracted and stored. In this implementation, histograms constitute a visual index for images in the database. To bridge the semantic gap, the automatic annotation module operates on low-level features to generate semantic descriptions for each image. This module requires a semantic vocabulary defined by domain experts, as well as a set of images with examples of each semantic term to learn the auto-annotation model. The vocabulary may be automatically extracted from the metadata associated with images although, in some cases, metadata is not as reliable as users would expect, and manual semantic description is required. Importantly, once a subset of images from the database has been annotated, the system automatically learns to identify concepts in other unlabeled images. In addition, if new contents are added to the image database or the vocabulary is extended, the system only requires a set of examples with the new knowledge. In the histopathology image database used throughout this chapter, a set of 18 different concepts has been identified by expert pathologists. An image in this collection may have one or several associated concepts. For each semantic concept in the predefined vocabulary, a classifier is required to recognize images of that kind. The classification algorithms used herein are Support Vector Machines (SVM) (Schölkopf & Smola, 2002), which receive as input a kernel obtained from the combination of different low-level features. Each classifier indicates whether an image has the associated concept or not. For the construction of the search index, the binary decisions of the SVMs are not used.
Instead, a presence degree, or probability that the image has the associated concept, is modeled using the continuous output of the classification function (Caicedo et al., 2008b). In this way, each image has as many probabilities of being associated with concepts as the size of the semantic vocabulary, leading to a semantic feature vector. This representation scheme is very similar to the vector-space model for information retrieval (Manning et al., 2008), in which documents are indexed using term frequencies. The main difference with the present
strategy is that the term frequencies in the image index have been generated from the analysis of visual contents. Once the semantic index for image contents has been built, the next component of the system is related to search algorithms. Given a query image, the system extracts a set of low-level features, processes them in the automatic annotation sub-module to construct a semantic representation, and finally calculates the similarity with other images. The similarity measure should exploit the structure of the image representation to accurately discriminate images with similar contents. For instance, if the search is set to use histogram features, a similarity measure for histograms should be preferred. In this work, the histogram intersection kernel is used as the similarity measure between visual contents. In the case of semantic descriptions, the cosine similarity or the Tanimoto coefficient may be applied, similarly to text-based information retrieval systems (Manning et al., 2008). Evaluation of CBIR systems This Section has introduced an indexing scheme for medical images with a flexible strategy to represent semantic concepts. In addition, the system architecture may be extended to search using visual features by adding an appropriate similarity measure for that kind of contents. However, although it has been repeatedly highlighted that visual contents alone are not suitable to search images by content, it is desirable to assess the actual performance improvement of a semantic strategy. In general, the evaluation of information retrieval systems is a user-oriented task with a subjective bias due to particular user preferences. However, the research community has adopted some measures that evaluate the desired behaviour of a retrieval system and that, in general, are related to the relevance of the retrieval results.
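As an illustrative sketch of the index and similarity measures just described, the retrieval step may be written as follows. This assumes the per-concept SVM decision values are already available, and uses a logistic squashing as a simple stand-in for the probabilistic modeling of the classifier outputs (Caicedo et al., 2008b); the function and variable names are hypothetical.

```python
import numpy as np

def semantic_vector(decision_values):
    """Squash continuous SVM outputs (one per concept) into presence degrees in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-np.asarray(decision_values, dtype=float)))

def histogram_intersection(h1, h2):
    """Similarity between two visual histograms (histogram intersection kernel)."""
    return float(np.minimum(h1, h2).sum())

def cosine_similarity(a, b):
    """Cosine similarity between two semantic feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_semantics(query_decisions, db_decisions):
    """Return database indices ordered by decreasing semantic similarity to the query."""
    q = semantic_vector(query_decisions)
    scores = [cosine_similarity(q, semantic_vector(d)) for d in db_decisions]
    return np.argsort(scores)[::-1]
```

Here, images whose concept profiles resemble the query's profile are ranked first, regardless of whether their raw visual appearance matches.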
There are two main measures for CBIR system evaluation: Precision, defined as the proportion of relevant images in a subset of the results, and Recall, the proportion of all relevant images that have been retrieved (Manning et al., 2008). Associated with these, the Mean Average Precision (MAP) can also be calculated. In general, information retrieval is a precision-oriented task, so the higher the precision-related scores, the better the system. Another family of measures includes the Rank of the First image Retrieved (Rank1), i.e., the position in the result list of the first relevant image, and the Average Rank, the average position of all relevant images in a result list. To calculate these measures, an assessor is required to judge the results of a set of particular queries. In other words, the assessor indicates which images are relevant to each query. A set of experiments was carried out on the histopathology image database to evaluate the performance of both the visual-based search and the semantic-based search. A set of approximately 160 different queries was defined and performance measures were averaged. The results for a query are evaluated as relevant or not according to the annotations made by pathologists.

Table 2. Performance measures for CBIR experiments on the histopathology image collection

Strategy                        MAP     Rank 1   Avg. Rank   Prec. at 1   Prec. at 20
Best Low-level Feature alone    0.10    11.71    588.5       0.56         0.26
Semantic Representation         0.23    16.14    256.3       0.59         0.53
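The evaluation measures defined above can be sketched, for binary relevance judgments in rank order, as:

```python
def precision_at_k(relevance, k):
    """Proportion of relevant images among the top-k results (relevance: 0/1 list)."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Average of the precision values at each rank where a relevant image appears;
    MAP is this quantity averaged over a set of queries."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def rank_of_first_relevant(relevance):
    """Position of the first relevant image in the result list (Rank1)."""
    return next(rank for rank, rel in enumerate(relevance, start=1) if rel)
```

The scores in Table 2 correspond to these quantities averaged over the set of evaluation queries.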
Table 2 shows the performance measures from the experimentation, in which the low-level feature with the best performance is compared with the semantic search. The best low-level feature in these experiments was the Sobel histogram. The Table shows a very large performance improvement when using the semantic strategy; for instance, in terms of MAP, an improvement of more than 100% is achieved. In addition, from the initial precision (Prec. at 1) to the precision at the 20th result (Prec. at 20), the semantic strategy maintains a better response. That is, when searching with the semantic strategy, more than half of the first 20 results are relevant on average, while using visual features the proportion of relevant images is only about a quarter. This shows that the semantic strategy provides a set of results that are more suitable to match user expectations, as illustrated by the content-based query in Figure 3. An image example of the concept lymphocyte infiltrate is used to query the system. Figure 3.a shows that the results obtained using visual search are visually alike to the query image. All of them present similar colors and a globally similar appearance. However, only the first two results are relevant in the top 5. In contrast, all images retrieved by the semantic strategy are relevant in the top 5 results, as shown in Figure 3.b, even when the query and the results do not share a completely similar appearance.
Figure 3.a. Results obtained when the system uses only visual information.
Figure 3.b. Results obtained when the system uses semantic annotations. Figure 3. Illustration of a content-based query. The query is the first image from left to right. The top-5 results are shown in relevance order from left to right. Results are marked with R if they are relevant and with N if they are not. The query image is used to search for images with lymphocyte infiltrate. (Caicedo et al., 2009b) This Section has presented the first strategy to access a medical image collection: a CBIR system that uses the query-by-example paradigm. This strategy uses a semantic search index, which is built from visual content analysis using machine learning algorithms. This access method is suitable for users that have an image at hand to query the system, for instance, when physicians receive medical images to diagnose the actual health state of patients or when professors or students are reading digital libraries of medicine. Since the search target for these users is images, a keyword-based query is not always the most appropriate strategy to query the system. The methods presented in this Section are intended to enable users to search for similar images when some content catches their attention or when they want to find reference images to support a decision-making process. Image Collection Visualization
Image collection visualization consists of methods to visualize collections of images such that users can see the structure of the data set and explore it in an intuitive way. In the medical domain, visualization may be used by physicians to easily find images that provide important information for new medical cases. Image collection visualization techniques provide a good alternative to generate compact representations of the collection, allowing users to navigate it to quickly find the needed information and to discover the underlying structure of the image collection. The use of projection methods based only on low-level features is a common image collection visualization strategy, but its main drawback is that it ignores the semantic content associated with the images. Systems such as Google Image Search present image query results using a regular grid arrangement ordered according to relevance. This type of visualization has some problems: (1) the visualization does not make explicit the relationships among the retrieved images; (2) users can see a limited set of results per page, but it is not possible to visualize the global structure of the collection; and (3) navigation controls are not intuitive, since they are inspired by relational database controls, where users explore the data represented as table records. Machine learning methods and information visualization techniques offer powerful tools that may help to solve these problems. A visualization framework Figure 4 shows a general diagram of the visualization process. First, visual features are extracted to build an image representation based on low-level characteristics. Next, an image similarity matrix is calculated. Ideally, the similarity measure among images should involve semantic knowledge instead of only visual similarity, as has been discussed in previous Sections. Next, a summary is built, i.e., an image collection subset that faithfully represents the entire collection.
Finally, it is necessary to reduce the high-dimensional original image representation to a low-dimensional space, producing a set of coordinates that are projected onto a 2D layout.
Figure 4. An overview of the process for visualizing and summarizing an image collection
The image feature extraction process was already discussed in the "Image Understanding" Section. In that Section, a specialized kernel function that optimally combines low-level features to better represent the collection concepts was also presented. This kernel is in fact a similarity measure and, thus, it can be used to compute the similarity matrix required by the visualization process. The summary is built using clustering strategies such as k-means and spectral clustering. The summary is constructed in the original high-dimensional space and then projected into a 2D space using one of the projection methods that will be described in the next Subsection. Making explicit the image relationships In general, an image is represented by a large set of features, which implies a high-dimensional representation space. The visualization of this space requires its projection into a low-dimensional space, typically 2D or 3D, without losing much information. The main problem is how to project the original image space into a 2D space. Projection methods formally state this problem as follows. Let D = {d1,..., dn} be the image collection and let S: D × D → R be a similarity measure between two images. The goal is to find a function P such that Corr(S(di, dj), ||(xi − xj, yi − yj)||) ≈ −1, where (xi, yi) = P(di) and (xj, yj) = P(dj). That is to say, a projection function such that there is an inverse correlation between the similarity of two arbitrary images and the Euclidean distance between their corresponding projections. This general problem has been dealt with using different approaches, which are briefly discussed in the following paragraphs. There are different methods for reducing the dimensionality of a set of data points. Generally, these methods select the dimensions that best preserve the original information.
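The stated criterion can be checked directly: given a pairwise similarity matrix and candidate 2D coordinates, the Pearson correlation between similarities and projected distances should approach −1 for a good projection. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def projection_quality(similarity, coords):
    """Pearson correlation between the pairwise similarities S(di, dj) and the
    Euclidean distances of the corresponding 2D projections. A good projection
    yields a value close to -1 (inverse correlation)."""
    n = len(coords)
    sims, dists = [], []
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            sims.append(similarity[i][j])
            dists.append(np.linalg.norm(np.asarray(coords[i]) - np.asarray(coords[j])))
    return float(np.corrcoef(sims, dists)[0, 1])
```

Such a score can be used to compare alternative layouts of the same collection produced by different projection methods.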
Methods like Multidimensional Scaling (MDS) (Torgerson, 1958), Principal Component Analysis (PCA) (Jolliffe, 1989), and Isometric Feature Mapping (Isomap) (Tenenbaum et al., 2000) have been useful for this projection task. MDS is a technique that focuses on finding the subspace that best preserves the inter-point distances. The linear algebra solution to the problem involves the calculation of eigenvalues and eigenvectors of a scalar product matrix derived from a proximity matrix. The input is a similarity matrix of images in a high-dimensional space and the result is a set of coordinates that represent the images in a low-dimensional space (Zhang, 2008). Isomap uses graph-based distance computation in order to measure the distance along local structures. The technique builds a neighborhood graph using k-nearest neighbors and applies Dijkstra's algorithm to find the shortest paths between every pair of points in the graph. The distance for each pair is assigned the length of this shortest path and finally, once the distances have been recomputed, MDS is applied to the new distance matrix (Nguyen & Worring, 2008). In addition to Isomap, which preserves the nonlinear structure of the relationships, there exist other methods like Locally Linear Embedding (LLE) (Roweis, 2000), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. SNE (Hinton & Roweis, 2003) is a method based on the computation of neighborhood probabilities, assuming a Gaussian distribution in both the high-dimensional and the 2D space. The method then tries to match the two probability distributions. Nguyen & Worring (2008) propose a combination of nonlinear methods to build new methods. The methods described above help to make the relationships among images explicit. This is accomplished by mapping similar images to close coordinates in the low-dimensional representation space. This implicitly associates regions of the representation space with different low- and high-level patterns shared by the images mapped to the respective region. Building an image collection summary In a relatively large image collection it is not possible to simultaneously display all images to the user. Therefore, it is necessary to provide a mechanism that summarizes the entire collection. This summary represents an overview of the dataset and allows the user to start the exploration process. How can an image collection summary be built? How can the quality of the summary be measured? How many images are sufficient and necessary to build a summary that expresses the underlying structure of the entire collection, taking into account the limitations of screen devices? Is it possible to use this summary as a structure for indexing the collection? These questions are a matter of current active research; however, the following paragraphs offer some insights that may help to answer them. The summary can be built using clustering methods (Stan & Sethi, 2003; Simon et al., 2007), similarity pyramid methods (Chen et al., 2000), graph methods (Cai et al., 2004; Gao et al., 2005), and neural network methods (Deng, 2007), among others. For example, k-medoids is a clustering algorithm related to the k-means algorithm. It breaks the dataset up into groups and attempts to minimize the total distance between the points in a cluster and a point designated as the medoid of that cluster. This algorithm chooses k data points as centers (images in this case).
The main goal here is to obtain the k most representative images of the collection in order to show them to the user. A good summary may be a collection subset in which a user can easily find one or more interesting images; a bad summary may be one that does not permit finding images easily. The quality of the collection summaries may be assessed using quality measures from clustering algorithms (Ng & Jiawei, 2002). Measures such as separation and cohesion may be used to quantify the compactness of images belonging to the same cluster (similar images) and the distance among images belonging to different clusters (dissimilar images). Entropy, a concept from information theory, may also be used to measure the homogeneity of the image summary with respect to the a priori knowledge (class labels) available in the training phase.
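A minimal sketch of k-medoids summarization over a precomputed distance matrix may look as follows. This is a naive alternation between assignment and medoid update (not an optimized PAM implementation), and the function name is hypothetical:

```python
import random

def k_medoids(dist, k, max_iter=100, seed=0):
    """Naive k-medoids: the summary is the set of k images (medoids) that
    minimize the total distance to the members of their clusters.
    dist is a precomputed n x n distance matrix."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        # assignment step: each image joins its closest medoid
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # update step: the new medoid of each cluster minimizes intra-cluster distance
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

Because medoids are actual images rather than averaged feature vectors, they can be displayed directly as the collection summary.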
Figure 6. Visualization of the histopathology collection highlighting the most representative images obtained in the summary A prototype system was built to test some of the proposed visualization strategies. Different image collection summaries of the histopathology data set were obtained using different approaches. Figure 6 shows a visualization of the histopathology image collection obtained with the framework described in the first Subsection. Herein, the RGB low-level feature representation and the Isomap projection method were used. The most representative images (medoids) are highlighted in this Figure to illustrate their layout with respect to the entire collection. In the ideal case, a summary composed of 18 images should have one representative of each class. Figure 7 shows the previously obtained summary, visualizing only the images belonging to the summary.
Figure 7. Visualization of the collection summary
The visualization may be improved by user feedback. Relevance feedback (Salton & Buckley, 1990) is a common mechanism in information retrieval systems for query reformulation based on user interaction. The main idea is to choose important terms from previous queries that returned relevant documents in order to build a new, expanded query. In this case, it is possible to learn from the user's selection actions to automatically modify the visualization. This Section has presented the second strategy to access a medical image collection: a visualization framework that allows physicians to visually explore the collection. This access method is suitable when the user does not have an image at hand to query the system. With the methods presented in this Section, users can see the global structure of the data set and visually navigate it. In this exploration process, the user can find a useful image based on the visual similarity among images. Future Research Directions This chapter has presented two different strategies to access a medical image collection. Although the presented methods provide effective solutions to find and explore relevant images, there are still different research problems to be addressed. This Section highlights some future research directions. Medical Image Representation
Understanding image structure is one of the main problems in accessing an image collection by content. The bag of features is an interesting approach that may be adapted in different ways; for instance, instead of using static blocks, algorithms to detect relevant points or regions in medical images may be considered. In addition, more robust visual descriptors may also be applied. Another interesting research direction for image representation is the study of the theoretical properties of dictionary learning from image coding. The number of components in the dictionary and their orthogonality or statistical independence are well-defined problems in image coding, while they are usually ignored in the bag-of-features approach. Thus, the connections between both disciplines may lead to more robust models for image content analysis. Kernel methods to represent medical images are also an interesting strategy that both describes image contents and provides similarity measures. In particular, kernel methods provide a framework to deal with structured data instead of simple feature vectors. An example of structured data was presented in this chapter, using a multiple-histogram representation, but other more complex representations may be designed using trees or graphs. Medical Image Retrieval Content-based image retrieval is an active research area nowadays. The main problem of image retrieval is the definition of semantic strategies that allow finding images with the right information. The state of the art in image retrieval includes the use of several strategies such as modeling invariant and statistical image signatures, classification and clustering of image representations, relevance feedback, automatic image annotation and multimodal fusion. All these approaches address the problem of finding semantically related images in a collection of very diverse contents, taking into account that purely visual approaches usually do not match human interpretations.
In particular, auto-annotation models that explain image contents in a detailed way using words will offer an effective image access solution. On the other hand, multimodal fusion will provide strategies to take advantage of visual features and surrounding text annotations together, as may be present in many medical documents, including scholarly articles and health records. Image Collection Visualization Medical image collection visualization is an unexplored area that offers interesting and challenging problems. Exploration issues can be addressed in order to learn from user interaction and to improve the visualization according to the browsing process. Currently, in the literature on image collection visualization, there is a lack of formal methods for measuring the quality of different visualizations. The majority of the works propose experimental setups with user participation in which aspects like search time, ease of use and user experience are the experimental goals. However, although these aspects are very important, it would be useful to define formal measures that allow the quality of the methods to be assessed objectively. Psychophysical experiments with physicians to evaluate the visualization framework should also be addressed in future work. It would be interesting to address visualization issues from a human-computer interaction perspective; new devices like multi-touch screens for interacting with the screen and making the exploration process easier are also interesting challenges. Access Performance The summary structure obtained in a summarization process may be used for building an index to search for similar images, since this structure represents a synthesis of the semantic content of the whole collection. Images in the summary structure may be used as index pivots to solve query-by-example
requests. So, when a query is executed, instead of calculating the similarity measure against all images in the database, the algorithm only takes into account the pivots. The query may be propagated through the most similar pivots to the next level of the index structure, thus finding the most similar images in the collection. Although the performance of executing a query will evidently be improved, other performance measures like precision and recall may be affected, so it is an interesting challenge to find an index structure that takes this into account. Conclusions The problem of accessing a medical image collection has been considered in this Chapter. The huge amount of medical images produced routinely in health centers demands effective and efficient techniques to search, explore and retrieve useful information. Traditional information systems are able to deal only with alphanumeric data in relational databases and, since images are a more complex data type with implicit semantics, that kind of system cannot provide full access to an image collection. Moreover, standard computational methods to manage information are not enough to deal with image collection complexities. Currently, academic image collections for classroom study or advanced research in medicine are managed by an expert who carefully organizes images according to domain knowledge criteria. However, these collections have no more than a few hundred images, since the capacity of human beings to deal with large data collections is limited. On the other hand, computers are able to deal with large amounts of data but do not have the ability to interpret or understand knowledge as human experts do. An approach to deal with such complex knowledge is machine learning, a branch of artificial intelligence that enables computers to learn from examples or to discover patterns in large data sets.
This chapter presented two content-based strategies to access medical image collections, using machine learning as the core approach: the first is a search system based on the query-by-example paradigm and the second is an exploration system based on the visualization of the whole collection structure. Both strategies pose different computational challenges that are effectively approached using machine learning algorithms, taking into account that image semantics and image collection structure are determined by complex patterns. Starting from the understanding of medical images in terms of their visual structure and semantic interpretation, supervised and unsupervised learning offered effective methods to represent image contents, using a bag-of-features approach and an optimized combination of low-level features. The architecture of a system to search images by content was presented, in which a semantic index to search for medical images is built using supervised learning. In particular, SVM classifiers are used to determine from visual contents whether or not images contain a certain concept from a predefined semantic vocabulary. Then, the system is able to search for images with similar concepts even in the absence of descriptive metadata. In fact, the semantic content-based index is designed to face the problem of incomplete or ambiguous text descriptions associated with medical images, so the system learns to recognize a set of concepts to automatically annotate all images in the database. Furthermore, the index may be used such that users do not need to define a keyword-based query; instead, an example image is provided and the system is responsible for doing the work. On the other hand, a visualization framework was presented as a new strategy to access medical images, especially when example images are not available to query the system. This strategy allows exploring the image collection structure to identify relevant information in an intuitive way, since the content-based
relationships are made explicit. Information visualization aims at finding new ways to display information to users, given that conventional methods are not sufficient. Content-based access methods to medical image collections will improve the quality of health services in modern hospitals, using advanced and easy-to-use systems to support the decision-making process in medicine. In addition, effective access methods will allow students, professors and researchers to deal with larger collections of medical images, in which a wide variety of clinical knowledge currently remains unused.
References
Barla, A., Odone, F., & Verri, A. (2003). Histogram intersection kernel for image classification. Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on, 3:513–516.
Cai, D., He, X., Li, Z., Ma, W.-Y., & Wen, J.-R. (2004). Hierarchical clustering of WWW image search results using visual, textual and link information. Paper presented at the 12th Annual ACM International Conference on Multimedia, 952–959.
Caicedo, J. C., Cruz, A., & Gonzalez, F. (2009a). Histopathology image classification using bag of features and kernel functions. Artificial Intelligence in Medicine Conference, AIME 2009, LNAI 5651:126–135.
Caicedo, J. C., Gonzalez, F. A., & Romero, E. (2009b). Content-based medical image retrieval using a kernel-based semantic annotation framework. Technical Report UNBI200901, Bioingenium Research Group, National University of Colombia.
Caicedo, J. C., Gonzalez, F. A., & Romero, E. (2008a). Content-based medical image retrieval using low-level visual features and modality identification. CLEF 2007 Proceedings in the LNCS Series.
Caicedo, J. C., Gonzalez, F. A., & Romero, E. (2008b). A semantic content-based retrieval method for histopathology images. Information Retrieval Technology, LNCS 4993:51–60.
Chen, J. Y., Bouman, C. A., & Dalton, J. C. (2000). Hierarchical browsing and search of large image databases. IEEE Transactions on Image Processing, 9(3), 442–455.
Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision.
Datta, R., Joshi, D., Li, J., & Wang, J. Z. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv., 40(2):1–60.
Deng, D. (2007). Content-based image collection summarization and comparison using self-organizing maps. Pattern Recognition, 40(2), 718–727.
Deselaers, T. (2003). Features for Image Retrieval. PhD thesis, RWTH Aachen University, Aachen, Germany.
Gao, B., Liu, T.-Y., Qin, T., Zheng, X., Cheng, Q.-S., & Ma, W.-Y. (2005). Web image clustering by consistent utilization of visual features and surrounding texts. In MULTIMEDIA '05: Proceedings of the 13th Annual ACM International Conference on Multimedia, 112–121. New York, NY, USA: ACM.
Güld, M. O., Keysers, D., Deselaers, T., Leisten, M., Schubert, H., Ney, H., & Lehmann, T. M. (2004). Comparison of global features for categorization of medical images. Medical Imaging, 5371:211–222.
Hinton, G. & Roweis, S. (2003). Stochastic neighbor embedding. In Advances in Neural Information Processing Systems 15 (pp. 857–872). MIT Press.
Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In ACM SIGIR Conference on Research and Development in Information Retrieval, 119–126. New York, NY, USA: ACM Press.
Jolliffe, I. (1989). Principal Component Analysis. Springer-Verlag.
Kandola, J., Shawe-Taylor, J., & Cristianini, N. (2002). Optimizing kernel alignment over combinations of kernels. Technical report, Department of Computer Science, Royal Holloway, University of London, UK.
La Cascia, M., Sethi, S., & Sclaroff, S. (1998). Combining textual and visual cues for content-based image retrieval on the World Wide Web. In IEEE Workshop on Content-Based Access of Image and Video Libraries, 24–28.
Long, L., Antani, S., & Thoma, G. (2005). Image informatics at a national research center. Computerized Medical Imaging and Graphics, 29(2–3):171–193.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Müller, H., Kalpathy-Cramer, J., Kahn Jr., C. E., Hatt, W., Bedrick, S., & Hersh, W. (2008). Overview of the ImageCLEFmed 2008 medical image retrieval task. Working Notes for the CLEF 2008 Workshop.
Müller, H., Michoux, N., Bandon, D., & Geissbuhler, A. (2004).
A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. International Journal of Medical Informatics, 73(1):1–23.
Müller, H., Marchand-Maillet, S., & Pun, T. (2002). The truth about Corel: evaluation in image retrieval. Paper presented at the International Conference on the Challenge of Image and Video Retrieval (CIVR 2002), 38–49. Springer-Verlag.
Ng, R. T. & Jiawei, H. (2002). CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5), 1003–1016.
Nguyen, G. P. & Worring, M. (2008). Interactive access to large image collections using similarity-based visualization. Journal of Visual Languages & Computing, 19(2), 203–224.
Pitkanen, M. J., Zhou, X., Hyvarinen, A., & Muller, H. (2008). Using the grid for enhancing the performance of a medical image search engine. In 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS '08) (pp. 367–372).
Qian, W., Li, L., & Clarke, L. P. (1999). Image feature extraction for mass detection in digital mammography: Influence of wavelet analysis. Medical Physics, 26(3), 402–408.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
Salton, G., & Buckley, C. (1990). Improving retrieval performance. Journal of the American Society for Information Science, 41(4), 355–364.
Schölkopf, B., & Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. New York, NY: Cambridge University Press.
Shneiderman, B. (1997). Designing the User Interface: Strategies for Effective Human-Computer Interaction (3rd ed.). Boston, MA: Addison-Wesley Publishing.
Shyu, C.-R., Brodley, C., Kak, A., Kosaka, A., Aisen, A. M., & Broderick, L. S. (1999). ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding, 75, 111–132.
Simon, I., Snavely, N., & Seitz, S. M. (2007). Scene summarization for online image collections. In IEEE 11th International Conference on Computer Vision (ICCV 2007).
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Stan, D., & Sethi, I. K. (2003). eID: A system for exploration of image databases. Information Processing & Management, 39(3), 335–361.
Tang, H. L., Hanka, R., & Ip, H. H. S. (2003). Histological image retrieval based on semantic content analysis.
IEEE Transactions on Information Technology in Biomedicine, 7(1), 26–36.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Tommasi, T., Orabona, F., & Caputo, B. (2007). CLEF2007 image annotation task: An SVM-based cue integration approach. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary.
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419.
Xu, X., Lee, D.-J., Antani, S., & Long, L. R. (2008). A spine X-ray image retrieval system using partial shape matching. IEEE Transactions on Information Technology in Biomedicine, 12(1), 100–108.
Zhang, J. (2008). Visualization for Information Retrieval. Springer.