A Machine Learning Framework for Image Collection Summarization
Jorge E. Camargo M.Sc, Fabio A. González Ph.D
Doctorado en Ingeniería de Sistemas y Computación, Bioingenium Research Group, Universidad Nacional de Colombia, Sede Bogotá D.C.
Objective
We propose a machine learning framework for summarizing and visualizing large collections of images. Because it is not possible to display all images at once, a summary that represents the entire collection is shown instead. In the process, image features are extracted, domain knowledge is incorporated into the computation of the similarity function, and a summary is built using clustering methods that select a subset of the image collection that faithfully represents the complete data set. Finally, dimensionality reduction techniques are used to project the images into a 2D space.
Framework
[Figure: framework pipeline with the stages image collection, feature extraction, involving domain knowledge, clustering, and projection]
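The framework stages can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBF kernels, the alignment-proportional weighting (a simple heuristic rather than an optimized combination), the k-medoids selection, classical MDS as the projection, and all function names are choices made here for illustration. Expert annotations are assumed to be one class label per image.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def alignment(K1, K2):
    """Kernel alignment: Frobenius inner product divided by the norms."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def combine_kernels(kernels, target):
    """Weight each feature kernel by its alignment with the target kernel.
    Heuristic weighting, assumed here; not an optimized combination."""
    w = np.array([max(alignment(K, target), 0.0) for K in kernels])
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels)), w

def kernel_to_distance(K):
    """Distance induced by a kernel: d(i,j)^2 = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    return np.sqrt(np.maximum(d[:, None] + d[None, :] - 2 * K, 0.0))

def k_medoids(D, k, n_iter=50, seed=0):
    """Select k representative items (medoids) from a distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # The medoid minimizes total distance to its cluster members.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

def classical_mds(D, dim=2):
    """Classical multidimensional scaling of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy data: 40 "images", two feature types, two annotated classes.
rng = np.random.default_rng(0)
expert_labels = np.repeat([0, 1], 20)
X_informative = rng.normal(0, 0.5, (40, 4)) + 5.0 * expert_labels[:, None]
X_noise = rng.normal(0, 1.0, (40, 4))

# Target kernel from annotations: 1 if two images share a label, else 0.
target = (expert_labels[:, None] == expert_labels[None, :]).astype(float)
kernels = [rbf_kernel(X_informative, 0.5), rbf_kernel(X_noise, 0.5)]

K, weights = combine_kernels(kernels, target)   # involve domain knowledge
D = kernel_to_distance(K)                        # distance matrix
summary_idx, cluster_labels = k_medoids(D, k=2)  # collection summary
coords = classical_mds(D, dim=2)                 # 2D projection
```

In this sketch the feature kernel that agrees with the annotations receives the larger weight, `summary_idx` holds the indices of the images chosen as the summary, and `coords` gives one 2D point per image for display. Isomap could replace the MDS step; it is not shown here.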
Results
[Figure: panels labeled "Histopathology image collection", "Corel image collection", and "Collection summary"]
Methodology
Feature extraction
Images are processed to extract low-level features: texture, edges, color, and gray levels.
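As an illustration of the feature extraction step, the sketch below computes one simple descriptor per feature type. The poster does not specify the descriptors, so the histograms used here (gray levels, RGB channels, gradient orientations for edges, neighbor differences for texture), the bin count, and the name `extract_features` are all assumptions.

```python
import numpy as np

def _normalize(hist):
    """Scale a histogram to sum to 1; leave an all-zero histogram as zeros."""
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

def extract_features(image, bins=8):
    """Low-level descriptors for an RGB image of shape (H, W, 3), values 0..255.
    Descriptor choices are illustrative assumptions."""
    img = np.asarray(image, dtype=np.float64)
    gray = img.mean(axis=2)  # simple gray-level conversion

    # Gray levels: intensity histogram.
    gray_hist = np.histogram(gray, bins=bins, range=(0, 256))[0]

    # Color: one histogram per RGB channel, concatenated.
    color_hist = np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ])

    # Edges: gradient orientation histogram weighted by gradient magnitude.
    gy, gx = np.gradient(gray)
    edge_hist = np.histogram(
        np.arctan2(gy, gx), bins=bins, range=(-np.pi, np.pi),
        weights=np.hypot(gx, gy),
    )[0]

    # Texture: histogram of absolute differences between horizontal neighbors.
    diffs = np.abs(np.diff(gray, axis=1))
    texture_hist = np.histogram(diffs, bins=bins, range=(0, 256))[0]

    return {
        "gray": _normalize(gray_hist),
        "color": _normalize(color_hist),
        "edges": _normalize(edge_hist),
        "texture": _normalize(texture_hist),
    }

# Usage on a random image standing in for a real one.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
features = extract_features(image)
```

Keeping one vector per feature type, rather than a single concatenated vector, lets each type get its own similarity function later on.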
Involving domain knowledge
Domain knowledge is incorporated using the kernel alignment method. Low-level features and expert annotations are combined in an optimal way to obtain a semantic representation.

Clustering
The distance matrix is clustered to select the k most representative images. Similar images fall in the same cluster.

Visualization
Dimensionality reduction techniques such as Multidimensional Scaling and Isomap are applied to obtain a 2D representation.

Conclusions
This framework is a new strategy for accessing medical images, especially when example images are not available to query the system.
This strategy lets users explore the structure of the image collection and identify relevant information intuitively, since the content-based relationships are made explicit. Information visualization aims to find new ways to display information to users because conventional methods are not sufficient. With the exponential growth of multimedia content and the ease of publishing, new information access mechanisms are required.