
Nonparametric Bottom-Up Saliency Detection Using Hypercomplex Spectral Contrast∗

Ce Li, Jianru Xue, Nanning Zheng, Zhiqiang Tian

Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi, P.R. China, 710049

{celi, jrxue, nnzheng, zqtian}@aiar.xjtu.edu.cn

ABSTRACT

Saliency detection is a useful technique for image semantic analysis tasks such as automatic image segmentation, image retargeting, advertising design and image compression. Inspired by two existing saliency detection algorithms, spectral residual (SR) and phase spectrum of quaternion Fourier transform (PQFT), we propose a new bottom-up saliency detection method whose distinguishing feature is the hypercomplex spectral contrast (HSC). The proposed HSC algorithm represents the HSV color image in a hypercomplex number space and incorporates both amplitude spectral contrast and phase spectral contrast into the saliency model. In addition, we incorporate the nonuniform sampling of human vision into our model; this common phenomenon directs visual attention toward the logarithmic center of an image in natural scenes. Experimental results on two public saliency detection datasets show that our approach performs remarkably better than four state-of-the-art approaches.

Categories and Subject Descriptors I.2.10 [ARTIFICIAL INTELLIGENCE]: Vision and Scene Understanding—Perceptual reasoning

General Terms Algorithm, Experimentation, Performance

Keywords Visual saliency, Hypercomplex Fourier Transform, Spectral Contrast, Nonuniform Sampling

∗Area chair: Tat-Seng Chua

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’11, November 28–December 1, 2011, Scottsdale, Arizona, USA. Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.

1. INTRODUCTION

Selective visual attention is an important mechanism of the human brain and visual system: humans can quickly focus on salient regions in an image or a video. These salient regions often contain important semantic content and are useful for image semantic analysis tasks such as image segmentation [5], image editing [2] and image compression [14]. Saliency detection methods are usually classified as bottom-up or top-down, and most existing saliency detection models are based on the bottom-up computational framework. Among these models, the model of Itti and Koch [9] is the best known. Following the feature integration theory [13], it computes the saliency map by applying center-surround operators to a set of low-level features and normalizing the results. Building on this model, Bruce and Tsotsos proposed a detection model based on information maximization [3]; Liu et al. learned to detect salient objects with a conditional random field (CRF) [11]; and Goferman et al. introduced context information into salient object detection [6]. More recently, departing from classic image statistical models, Hou and Zhang [8] designed a fast spectral residual algorithm based on Fourier analysis for static image saliency, in which the residual of the amplitude spectrum is considered the key factor that stimulates visual attention. Guo et al. [7] then proposed a saliency detection algorithm based on the phase spectrum of the quaternion Fourier transform, and Achanta et al. [1] gave a simple and effective frequency-tuned solution for salient region detection. As these models show, visual saliency detection in the frequency domain has become an active research focus. This raises a question for saliency detection: which is more important, the amplitude spectrum or the phase spectrum? Following the view of Piotrowski and Campbell [12], we consider that the phase spectrum carries the image structure information, while the amplitude spectrum carries information about the magnitude of visual perception.
Hence, we propose an algorithm named HSC that jointly considers the amplitude spectrum and the phase spectrum for saliency detection in a multi-scale hypercomplex [4] HSV color space. Supported by existing theories [13, 10] and existing saliency detection methods [9, 11, 8, 7, 1, 6], it rests on three observations. (1) In the frequency domain, both the amplitude spectrum and the phase spectrum are significant for saliency detection; the salient amplitude spectrum or the salient phase spectrum alone cannot reconstruct the whole saliency map. (2) A saliency map results from the combined stimulation of various visual features, and a unified multi-feature vector representation allows fast computation. (3) A pixel's position matters for saliency detection, because visual attention differs with position in an image. The contribution of this paper is twofold. On the one hand, we propose a new saliency detection method based on hypercomplex spectral contrast (Sections 2.1 and 2.2). On the other hand, we introduce a log-polar biased sampling mechanism that imitates the nonuniform sampling of human vision in visual attention (Section 2.3). Section 3 reports experimental results comparing our model with four state-of-the-art methods on more than 1000 natural and psychological images. Finally, we conclude with a discussion in Section 4.

2. OUR APPROACH

The framework of our approach is illustrated in Figure 1. Inspired by [7] and [8], we use the hypercomplex Fourier transform to compute the amplitude spectral contrast and the phase spectral contrast in a multi-scale HSV color space. The saliency map is then produced from the two hypercomplex spectral contrast maps together, by reconstruction and nonuniform sampling. Our HSC method consists of four main steps:

Step 1: Convert a raw image I to the HSV color space, then blur I with a 2D Gaussian filter on a three-level pyramid to eliminate fine texture details and to average the energy of image I.

Step 2: Represent the image pixels as pure quaternions (hypercomplex numbers) in the HSV color space, then calculate the amplitude spectrum and the phase spectrum of the image at different scales with the hypercomplex Fourier transform [4].

Step 3: At each scale of the raw image, calculate the amplitude and phase spectral contrast between the raw image and the blurred image, then reconstruct contrast maps from the amplitude spectral contrast and the phase spectral contrast.

Step 4: Normalize the reconstructed spectral contrast maps and apply log-polar nonuniform sampling to obtain the final saliency map.

Figure 1: Overview of the HSC saliency detection framework.

2.1 Hypercomplex of HSV Color Image

A quaternion is a kind of hypercomplex number. Color image pixels inherently have three components, so they can be represented in quaternion form as pure quaternions [4]. A commonly used color space that corresponds more naturally to human perception is the HSV color space, whose three components are hue, saturation and value. Therefore, in this paper each pixel of the raw image is represented by a hypercomplex number (quaternion) built from the three HSV color components; unlike [7], we do not use color opponent components (RG or BY) or intensity. Thus a hypercomplex HSV image q(x, y) is defined as follows²:

$$ q = H\,i + S\,j + V\,k \tag{1} $$

² Here i, j and k satisfy $i^2 = j^2 = k^2 = -1$, $i \perp j$, $j \perp k$, $i \perp k$ and $k = ij$.

To facilitate the next step, each pixel of the hypercomplex HSV image q is rewritten by symplectic decomposition as Eq. (2):

$$ q = f_1 + f_2\,j, \qquad f_1 = H\,i, \qquad f_2 = S + V\,i \tag{2} $$

2.2 Saliency Detection Using HSC

In bottom-up models, a salient visual stimulus is usually generated by strongly contrasting signals, which tend to carry larger spectral energy. In other words, strong amplitude and phase spectral contrast forms the main component of salient signals. In this paper, we calculate the amplitude spectrum and the phase spectrum of the HSV color image using the hypercomplex Fourier transform (HFT) [4]. Based on Eq. (2), the HFT of the hypercomplex image q can be calculated by two complex Fourier transforms of the symplectic parts, as in Eq. (3):

$$ Q[u, v] = F_1[u, v] + F_2[u, v]\,j \tag{3} $$

The forward and inverse transforms of each symplectic part in Eq. (3) are defined in Eq. (4):

$$
\begin{aligned}
F_i[u, v] &= \frac{1}{\sqrt{MN}} \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} e^{-j2\pi\left(\frac{xu}{N} + \frac{yv}{M}\right)} f_i(x, y), \\
f_i(x, y) &= \frac{1}{\sqrt{MN}} \sum_{v=0}^{M-1} \sum_{u=0}^{N-1} e^{j2\pi\left(\frac{xu}{N} + \frac{yv}{M}\right)} F_i[u, v]
\end{aligned}
\tag{4}
$$

where (x, y) is the spatial location of each pixel, (u, v) is the location in the frequency domain, and M and N are the image's height and width. Using Eqs. (1)–(4), we complete the transform from q to Q in the hypercomplex frequency domain. Q can also be written in polar form:

$$ Q = \|Q\|\, e^{j\phi} \tag{5} $$

where $\|Q\|$, $\phi$ and $j$ are the amplitude spectrum, the phase spectrum and a unit pure hypercomplex number, respectively. In what follows, we first define the single-scale saliency of HSC, and then introduce multi-scale analysis into the HSC method to enhance the saliency detection result.

Single-scale saliency of HSC: First, consider a single scale l. Given an input raw image I, we obtain a blurred image $I_b$ with a 2D Gaussian filter ($\sigma = 4$). Using Eqs. (1)–(5), we calculate the amplitude spectra ($\|Q^l_I\|$, $\|Q^l_b\|$) and the phase spectra ($\phi^l_I$, $\phi^l_b$) of the raw image and the blurred image in the HSV color space, respectively, as follows:

$$ Q^{l}_{I} = \|Q^{l}_{I}\|\, e^{j\phi^{l}_{I}}, \qquad Q^{l}_{b} = \|Q^{l}_{b}\|\, e^{j\phi^{l}_{b}} \tag{6} $$

Then, the amplitude spectral contrast $CQ^l_{am}$ and the phase spectral contrast $CQ^l_{ph}$ are obtained by Eq. (7), where the phase spectral contrast keeps only the phase of the raw image (its amplitude is set to 1):

$$
\begin{aligned}
CQ^{l}_{ph} &= e^{j\phi^{l}_{I}}, \qquad CQ^{l}_{am} = \frac{\|Q^{l}_{I}\|}{\|Q^{l}_{b}\|}\, e^{j\phi^{l}_{I}}, \\
CQ^{l} &= CQ^{l}_{am} + CQ^{l}_{ph} = \left\{ \frac{\|Q^{l}_{I}\|}{\|Q^{l}_{b}\|} + 1 \right\} e^{j\phi^{l}_{I}}
\end{aligned}
\tag{7}
$$

where $CQ^l$ is the total hypercomplex spectral contrast. Following [8], we regard the blurred image as carrying the average spectral energy in the hypercomplex frequency domain. Thus, the amplitude spectral contrast $CQ^l_{am}$ represents the salient energy in the hypercomplex frequency domain, and the phase spectral contrast $CQ^l_{ph}$ represents the salient structure information. Following [8], the final hypercomplex spectral contrast $CQ^l_{final}$ is expressed as:

$$ CQ^{l}_{final} = \log\left( \frac{\|Q^{l}_{I}\|}{\|Q^{l}_{b}\|} + 1 \right) e^{j\phi^{l}_{I}} \tag{8} $$

We then apply the inverse transform of Eq. (4) to obtain the reconstruction of $CQ^l_{final}$, denoted $cq^l_I$:

$$ cq^{l}_{I} = a + b\,i + c\,j + d\,k \tag{9} $$

Finally, the HSC saliency map $S^l$ at scale l is obtained by Eq. (10), where $f_{gaussian}$ is a 2D Gaussian filter and $*$ denotes convolution:

$$ S^{l} = f_{gaussian} * \|cq^{l}_{I}\|^{2}, \qquad \sigma = 4 \tag{10} $$
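The single-scale computation of Eqs. (1)–(10) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, the hypercomplex transform is evaluated through the symplectic split of Eq. (2) with two ordinary complex FFTs (the FFT's imaginary unit acts on the plane spanned by 1 and i, in which f1 and f2 lie), the unitary 1/√(MN) factor of Eq. (4) corresponds to `norm="ortho"`, and both the small `eps` guarding the divisions and the reflect padding of the Gaussian filter are our own choices. The blur that produces I_b and the final smoothing both use σ = 4, as in the text.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorised RGB -> HSV for an (H, W, 3) array in [0, 1]; outputs in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                      # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    cc = np.where(c > 0, c, 1.0)                  # avoid division by zero for greys
    h = np.where(v == r, ((g - b) / cc) % 6.0,
        np.where(v == g, (b - r) / cc + 2.0, (r - g) / cc + 4.0))
    h = np.where(c > 0, h / 6.0, 0.0)
    return np.stack([h, s, v], axis=-1)

def gaussian_blur(img, sigma):
    """Separable 2D Gaussian filter with reflect padding (assumed boundary handling)."""
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    k /= k.sum()
    out = np.pad(img, radius, mode="reflect")
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 0, out)
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, out)

def hft(h, s, v):
    """Eqs. (2)-(4): HFT of q = Hi + Sj + Vk via FFTs of f1 = Hi and f2 = S + Vi."""
    return np.fft.fft2(1j * h, norm="ortho"), np.fft.fft2(s + 1j * v, norm="ortho")

def hsc_single_scale(hsv, sigma=4.0, eps=1e-12):
    """Single-scale HSC saliency S^l, Eqs. (6)-(10)."""
    blurred = [gaussian_blur(hsv[..., c], sigma) for c in range(3)]   # image I_b
    F1, F2 = hft(hsv[..., 0], hsv[..., 1], hsv[..., 2])               # Q_I = F1 + F2 j
    B1, B2 = hft(*blurred)                                            # Q_b = B1 + B2 j
    amp_I = np.sqrt(np.abs(F1)**2 + np.abs(F2)**2)                    # ||Q_I||
    amp_b = np.sqrt(np.abs(B1)**2 + np.abs(B2)**2)                    # ||Q_b||
    magnitude = np.log(amp_I / (amp_b + eps) + 1.0)                   # Eq. (8) magnitude
    # Replace ||Q_I|| by the new magnitude but keep the phase e^{j phi_I}:
    # Q_I / ||Q_I|| is a unit quaternion, and a real factor scales F1 and F2 alike.
    factor = magnitude / (amp_I + eps)
    c1 = np.fft.ifft2(factor * F1, norm="ortho")                      # Eq. (9), symplectic parts
    c2 = np.fft.ifft2(factor * F2, norm="ortho")
    energy = np.abs(c1)**2 + np.abs(c2)**2                            # ||cq_I^l||^2
    return gaussian_blur(energy, sigma)                               # Eq. (10)
```

Because ‖Q‖² = |F1|² + |F2|² for Q = F1 + F2 j, scaling both symplectic parts by the same real factor changes the amplitude to that of Eq. (8) while leaving the phase term e^{jφ_I^l} untouched, so no quaternion library is needed for this step.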

Multi-scale saliency of HSC: Similar to [6], to enhance the saliency detection result we use a 2D Gaussian pyramid to obtain a set of multi-scale blurred images at three scales (1, 0.5 and 0.25 of the original size), indexed by l = 1, …, L with L = 3. The HSC saliency map is then the average of the saliency maps over all scales:

$$ S = \frac{1}{L} \sum_{l=1}^{L} S^{l} \tag{11} $$

Figure 2: Comparison of our method with [7, 8] on psychological patterns. The first column is the raw image; the second to fourth columns are the results produced by our method (HSC), [7] and [8], respectively.
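Eq. (11) only averages per-scale maps, so the sketch below takes the per-scale computation as a function argument (for instance, the single-scale saliency of Eqs. (6)–(10)). It assumes a standard 5-tap binomial kernel for the 2D Gaussian pyramid and nearest-neighbour upsampling to bring every S^l back to full resolution before averaging; neither detail is given in the text, and the function names are hypothetical.

```python
import numpy as np

def pyramid_down(channel):
    """One Gaussian-pyramid level: 5-tap binomial blur, then keep every second pixel."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    p = np.pad(channel, 2, mode="reflect")
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 0, p)
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, p)
    return p[::2, ::2]

def resize_nearest(img, shape):
    """Nearest-neighbour resize of a 2D map to `shape` (rows, cols)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def multiscale_saliency(image, saliency_fn, levels=3):
    """Eq. (11): mean of per-scale maps S^l at scales 1, 1/2, 1/4 (for levels=3)."""
    full_shape = image.shape[:2]
    current = image
    maps = []
    for level in range(levels):
        if level > 0:  # move one level down the pyramid, channel by channel
            current = np.stack([pyramid_down(current[..., c])
                                for c in range(current.shape[-1])], axis=-1)
        maps.append(resize_nearest(saliency_fn(current), full_shape))
    return np.mean(maps, axis=0)
```

With `levels=3`, `saliency_fn` is evaluated on the full, half and quarter resolution images, and the three upsampled maps are averaged with equal weight, matching the 1/L factor of Eq. (11).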

2.3 Nonuniform Sampling and Saliency Map

We often understand nature through nonuniform observations in space or time. Humans usually begin browsing a natural image from its center, which means that a pixel's position matters for saliency detection in an image. Based on this observation, we design a simple log-center-bias weight that simulates a log-polar nonuniform sampling transform starting from the image center. We calculate the log-center distance $D_{log}(x, y)$ between each pixel (x, y) and the image center, and obtain the final saliency map as follows:

$$ SM_{final}(x, y) = \frac{S(x, y)}{1 + D_{log}(x, y)} \tag{12} $$

In the HSC algorithm, we use multi-scale hypercomplex spectral contrast and the log-center bias to implement saliency detection. The method is simple and effective, and can serve as an image preprocessing step for many digital media applications.

3. EXPERIMENTAL RESULTS

To evaluate the performance of our method, we carry out two comparison experiments: one on two publicly available datasets with more than 1000 images and corresponding ground truth [1, 8], and one on five psychological patterns [3]. We compare our approach with four state-of-the-art saliency detection approaches: a classic model (IT [9]), a recent high-performing method (CA [6]), and two methods related to ours (SR [8] and PQFT [7]). For IT, CA and SR, we used the code provided by the authors on the Web; for PQFT, we implemented the method in MATLAB using the qtfm toolbox [4], since we did not have access to the authors' code. All tests were run in MATLAB 7.0 on Windows XP, on a PC with a 2.2 GHz Pentium 4 CPU and 2 GB of memory.

3.1 Responses to Psychological Patterns

We test our model on various psychological stimuli that are commonly used to represent pre-attentive visual features, and on some mixed stimuli [3]. In this experiment we use five stimulus patterns and, to keep the psychological test fair, our model does not include the nonuniform sampling step. As shown in Figure 2, we compare our results with SR and PQFT. In Figure 2, the first and second rows show the test results for the basic "color" and "curvature" stimuli, respectively. Both our method and PQFT successfully find the stimulus target, but SR makes an error on the "color" stimulus. The third row is a pattern combining the "intersection" and "color" stimuli, on which our method produces a stronger salient response than PQFT and SR. In particular, the last two rows are complex stimuli combining "line orientation" and "color"; PQFT and SR almost completely fail on them because they consider the amplitude spectrum or the phase spectrum separately. In contrast, our method performs well in this case because its saliency model jointly considers the contrast of the amplitude and phase spectra.
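The psychological tests above skip the nonuniform sampling of Eq. (12). Section 2.3 does not spell out the log-center distance, so the sketch below assumes D_log(x, y) = log(1 + r), where r is the Euclidean pixel distance to the image center; that form and the function name are hypothetical. Because 1 + D_log grows only logarithmically, pixels far from the center are attenuated gently rather than suppressed.

```python
import numpy as np

def log_center_bias(saliency):
    """Eq. (12): SM_final = S / (1 + D_log), with D_log assumed to be log(1 + r)."""
    h, w = saliency.shape
    y, x = np.mgrid[0:h, 0:w]
    r = np.hypot(y - (h - 1) / 2.0, x - (w - 1) / 2.0)   # distance to the image centre
    d_log = np.log1p(r)                                   # hypothetical log-center distance
    return saliency / (1.0 + d_log)
```

On a uniform map the weight is exactly 1 at the center and decreases monotonically toward the corners, so the ranking of equally salient regions is decided by their distance from the center.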

3.2 Natural Images

In this section, we test our method on the salient object detection dataset provided by Achanta et al. [1] and on the saliency detection dataset used for the frequency-domain method of Hou and Zhang [8]. Together, the two datasets contain 1062 images with corresponding ground truth. For a fair test, we compute the saliency map at a resolution of 320 × 240 in all experiments and then resize it to the raw image size. For a better visual effect, a 2D Gaussian filter with σ = 4 is applied to all results. We compare our approach with the four existing methods through both qualitative and quantitative performance evaluation.
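The quantitative evaluation follows [1]. A minimal per-image sketch, assuming the adaptive threshold of [1], under which a pixel counts as salient when its value exceeds twice the mean saliency of the map; the text does not state which threshold was used, and the function name is hypothetical. The weight is passed as `beta2`, the β² of the F-measure in Eq. (13); [1] sets β² = 0.3 to weigh precision more than recall. Dataset-level figures are the averages of these values over all images.

```python
import numpy as np

def precision_recall_f(saliency, ground_truth, beta2=0.3):
    """Precision, recall and F_beta (Eq. (13)) of one saliency map against a binary mask."""
    predicted = saliency > 2.0 * saliency.mean()   # adaptive threshold, assumed as in [1]
    truth = ground_truth.astype(bool)
    tp = np.logical_and(predicted, truth).sum()
    precision = tp / predicted.sum() if predicted.any() else 0.0
    recall = tp / truth.sum() if truth.any() else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
    return precision, recall, f
```

A perfect segmentation yields precision = recall = F = 1; with β² < 1, missing part of the object (lower recall) costs less than marking background as salient (lower precision).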

In the qualitative comparison, Figure 3 shows our saliency maps alongside those of the four state-of-the-art algorithms (IT [9], SR [8], PQFT [7], CA [6]). Our approach is more robust and detects salient regions that are closer to the human hand-labeled images than the other models. Although SR and PQFT, like our method, detect saliency in the frequency domain, our method performs better than both because it considers not only the amplitude spectrum but also the phase spectrum for global contrast in an image. Although the saliency maps of CA [6] are very similar to ours, under the same computing conditions our method takes 0.2 seconds on average to obtain a saliency map, whereas CA needs 60 seconds.

In the quantitative evaluation, we compare our model with the above four methods using the measure introduced in [1]. The results are shown in Figure 4. Average values of precision, recall and F-measure are computed over the same ground-truth datasets. The F-measure is defined as in [1]:

$$ F_{\beta} = \frac{(1 + \beta^{2})\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^{2} \times \mathrm{Precision} + \mathrm{Recall}} \tag{13} $$

In Eq. (13), β² = 0.3, following [1]. Figure 4 shows that, measured against the human hand-labeled ground truth, our method performs better than the other methods.

Figure 3: Comparison of our method with [9, 8, 7, 6]. The first column is the raw image and the last column is the hand-labeled image; the second to sixth columns are the results produced by [9], [8], [7], [6] and our method, respectively.

Figure 4: Precision-recall bars of HSC and the other methods on the two datasets [1, 8].

4. CONCLUSIONS

This paper proposes a new saliency detection method for natural images that jointly uses amplitude and phase spectral contrast in the hypercomplex Fourier domain together with nonuniform sampling. The method detects salient regions in an image effectively and quickly, and responds better to psychological patterns. Experimental results show that our method outperforms four other state-of-the-art methods on two public static image datasets. In future work, we plan to extend it to spatiotemporal salient object detection for video semantic analysis.

5. ACKNOWLEDGMENTS

This work was supported by the 973 Program under Grant No. 2010CB327902, and by the NSFC under Grants 60875008, 90920301 and 60805044.

6. REFERENCES

[1] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk. Frequency-tuned salient region detection. CVPR, 2009.
[2] R. Achanta and S. Süsstrunk. Saliency detection for content-aware image resizing. ICIP, 2009.
[3] N. Bruce and J. Tsotsos. Saliency based on information maximization. NIPS, 2006.
[4] T. Ell and S. Sangwine. Hypercomplex Fourier transforms of color images. IEEE TIP, 16(1):22–35, 2007.
[5] Y. Fu, J. Cheng, Z.L. Li, and H.Q. Lu. Saliency cuts: An automatic approach to object segmentation. ICPR, 2008.
[6] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. CVPR, 2010.
[7] C.L. Guo, Q. Ma, and L.M. Zhang. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. CVPR, 2008.
[8] X.D. Hou and L.Q. Zhang. Saliency detection: A spectral residual approach. CVPR, 2007.
[9] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11):1254–1259, 1998.
[10] C. Koch and T. Poggio. Predicting the visual world: silence is golden. Nature Neuroscience, 2:9–10, 1999.
[11] T. Liu, J. Sun, N.N. Zheng, X.O. Tang, and H.Y. Shum. Learning to detect a salient object. CVPR, 2007.
[12] L. Piotrowski and F. Campbell. A demonstration of the visual importance and flexibility of spatial-frequency amplitude and phase. Perception, 11:337–346, 1982.
[13] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136, 1980.
[14] J.R. Xue, C. Li, and N.N. Zheng. Proto-object based rate control for JPEG2000: an approach to content-based scalability. IEEE TIP, 20(4):1177–1184, 2011.
