1579-Shrink Boost for Selecting Multi-LBP Histogram Features in ...


Shrink Boost for Selecting Multi-LBP Histogram Features in Object Detection

Cher Keng Heng¹, Sumio Yokomitsu², Yuichi Matsumoto², Hajime Tamura²
¹Panasonic Singapore Laboratories, ²Panasonic System Networks

[email protected], {yokomitsu.sumio, matsumoto.yuichi, tamura.hajime}@jp.panasonic.com

Abstract

Feature selection from sparse, high-dimensional features using conventional greedy boosting yields classifiers that generalize poorly. We propose a novel “shrink boost” method to address this problem. It solves a sparsity-regularized problem with two iterative steps. First, a “boosting” step uses weighted training samples to learn a full high-dimensional classifier on all features. This avoids over-fitting to a few features and improves generalization. Next, a “shrinkage” step shrinks the least discriminative classifier dimensions to zero to remove redundant features. In our object detection system, we use shrink boost to select sparse features from histograms of local binary patterns (LBP) computed with multiple quantizations on multiple image channels, and to learn a classifier made of additive lookup tables (LUT). Our evaluation shows that our classifier generalizes much better than classifiers from greedy boosting and from SVM methods, even with a limited number of training samples. On public human and pedestrian detection datasets, we outperform the state of the art. On our more challenging bird detection dataset, we show promising results.

1. Introduction

Object detection in computer vision has many practical applications. These include face detection in surveillance, pedestrian detection in automotive safety, and even cat, dog or bird detection for camera auto-focus in pet photography. To handle a variety of objects, general object detection systems use highly discriminative histogram features such as HOG [1, 2], COHOG [3] and LBP [4, 5]. Histogram features are high dimensional, so feature selection is required to keep the classifier small for fast detection and low memory usage. However, current feature selection methods do not handle sparse histogram features well. For “filter” methods, which select features solely from statistical feature properties (e.g. class correlation [6], dominant occurrence [7]), sparsity makes feature statistics unreliable when training samples are limited. Similarly, for “wrapper” methods, which use a trained classifier to select features by classification error, sparsity causes over-fitting. Conventional boosting [8] based on greedy forward selection is an example. Lastly, “embedded” methods select features by enforcing sparsity while training the full-dimension classifier. Such methods [9] are seldom used in object detection. One reason is poor training efficiency at high dimension. Another is that such methods are usually limited to learning linear classifiers, whereas non-linear classifiers (e.g. RBF, intersection kernel) give better performance with histogram features [1, 2].

To overcome the limitations of previous feature selection methods, we propose a novel method called “shrink boost”. Shrink boost is an embedded method. It uses a proximal solver to minimize the exponential classification loss under a sparsity-enforcing regularizer. It consists of two iterative steps. First, a “boosting” step performs gradient descent on the exponential loss and uses weighted training samples to learn a full high-dimensional classifier on all features. This avoids over-fitting to a few features and improves generalization. Next, a “shrinkage” step shrinks the least discriminative classifier dimensions to zero to remove redundant features. To train non-linear classifiers efficiently at high dimension, we apply the “explicit feature mapping” of the intersection kernel used in [2].

[Figure 1: Overview of our system. Multiple channels (Gray, Cb, Cr, low pass, high pass) with multiple LBP quantizations give concatenated histograms, forming a high-dimension sparse feature. Shrink boost, which alternates boosting and shrinkage, outputs the selected features and the trained classifiers.]

In this paper, we use shrink boost to select sparse features from histograms of “multi-LBP”, i.e. LBP computed with multiple quantizations on multiple image channels [11]. Figure 1 gives an overview of our system. Our main contribution is a feature selection method for sparse histogram features based on iterative boosting. To our knowledge, this is the first paper that applies boosting to a sparsity-regularized minimization problem as an embedded feature selection method in object detection. The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the formulation of shrink boost. Section 4 presents a detailed experimental evaluation, and Section 5 concludes.

2. Related work

Feature selection methods can be broadly divided into filter, wrapper and embedded methods [12]. We describe how previous work applies these methods to classification problems, especially for LBP histogram features.

Filter methods: A filter method uses feature properties to select discriminative LBP patterns as a preprocessing step. The classifier is then trained separately on the selected patterns. Smith and Windeatt [6] used fast correlation-based filtering (FCBF) with symmetric information gain to select the LBP patterns most correlated with the target class. Guo et al. [13] used the Fisher separation criterion in FSC-LBP. Liao et al. [7] used the most frequently occurring patterns for dominant LBP. Maturana et al. [14] used a decision tree algorithm to learn tree-quantized LBP patterns, known as DT-LBP.

Wrapper methods: In a wrapper method, classifier performance is used to select discriminative LBP patterns, so the classifier is learned together with the features. Boosting [8] is a popular wrapper method based on sequential forward greedy selection. It has been used to select optimal LBP settings (e.g. neighbor topology, block region). Trefny and Matas [20] used boosting to select the most discriminative bins of the LBP histogram.

Embedded methods: An embedded method performs selection during training by solving a regularized minimization problem. The result is a sparse classifier that uses only a subset of the features. Embedded methods are seldom used for LBP feature selection. However, they have been used for HOG features, which are lower dimensional than LBP, in L1-norm minimization learning [21] and group lasso [9].

Other related methods: Another broad class of methods uses projection to reduce the high dimension of LBP. These include PCA, LDA, locality preserving projection and partial least squares. We call such methods “feature reduction” rather than “feature selection”.

3. Shrink boost for feature selection

In this section, we describe our solution, “shrink boost”, for selecting histogram features and learning a reduced set of additive LUTs in the classifier. We explain how our learning objective and solution ensure good classifier generalization even when the features are sparse.

3.1. Problem statement

Let $(\mathbf{x}_m, y_m)$, $1 \le m \le M$, denote a set of M training samples. Vector $\mathbf{x} \in \mathbb{Z}^D$ is the input high-dimensional histogram feature and $y \in \{-1, +1\}$ is its target class label. The integer element $x_d$ is a histogram vote count restricted to the range $0 \le x_d \le N$. Our classifier F is modeled by a set of additive lookup tables $\mathrm{LUT}_d$, each indexed by the element $x_d$:

$$F(\mathbf{x}) = \sum_{d=1}^{D} \mathrm{LUT}_d[x_d]. \tag{1}$$

Our learning goal is to minimize the objective value:

$$F = \arg\min_{F} \{ L(F) + R(F) \}. \tag{2}$$

Here L(·) is the classification loss and R(·) is the regularization for feature selection.

Classification model: We first choose the classifier model, i.e. what information the LUTs encode. For this, we use the explicit histogram intersection mapping of Maji et al. [2], which turns the histogram intersection operation into a linear dot product. Because F(x) is essentially nonlinear, this feature map linearizes the problem and makes learning fast enough to be practical. Histogram intersection is also a robust similarity measure for histogram features, so a linear classifier trained in the mapped space is discriminative. Let Π denote the “intersection mapping”. Vector x is mapped into a matrix $Z = \Pi(\mathbf{x})$, where $Z \in \mathbb{R}^{D \times N}$ and:

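$$z_{dn} = \begin{cases} 1 & \text{if } x_d \ge n \\ 0 & \text{if } x_d < n \end{cases} \tag{3}$$

Similarly, the classifier F is mapped into a matrix $L = \Pi(F)$, where $L \in \mathbb{R}^{D \times N}$ and:

$$l_{dn} = \begin{cases} \mathrm{LUT}_d[n] & \text{if } n = 1 \\ \mathrm{LUT}_d[n] - \mathrm{LUT}_d[n-1] & \text{if } n > 1 \end{cases} \tag{4}$$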
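Eqs. (3) and (4) can be checked numerically with the following small NumPy sketch. It is our own illustration, not the authors' code. It builds Z = Π(x) and L = Π(F) for a toy example and confirms two things: the Frobenius inner product L • Z reproduces Eq. (1), and Z₁ • Z₂ equals the histogram intersection Σ_d min(x₁,d, x₂,d). The sketch assumes each classifier is stored as a D × (N+1) array whose column n holds LUT_d[n], with LUT_d[0] = 0. The function names are hypothetical.

```python
import numpy as np

def intersection_map(x, N):
    """Eq. (3): Z = Pi(x) with z_dn = 1 if x_d >= n else 0, for n = 1..N."""
    n = np.arange(1, N + 1)
    return (np.asarray(x)[:, None] >= n[None, :]).astype(float)

def lut_to_L(lut):
    """Eq. (4): L = Pi(F) from a (D, N+1) table whose column n is LUT_d[n]."""
    L = lut[:, 1:].astype(float)          # l_d1 = LUT_d[1]
    L[:, 1:] -= lut[:, 1:-1]              # l_dn = LUT_d[n] - LUT_d[n-1] for n > 1
    return L

def F_direct(lut, x):
    """Eq. (1): F(x) = sum_d LUT_d[x_d]."""
    return lut[np.arange(len(x)), x].sum()

N = 4
x1, x2 = np.array([0, 3, 4]), np.array([2, 1, 4])
lut = np.array([[0, 1, 2, 3, 4],
                [0, -1, 5, 0, 2],
                [0, 0.5, 0.5, 0.5, 7]])   # column 0 (LUT_d[0]) kept at 0

Z1, Z2 = intersection_map(x1, N), intersection_map(x2, N)
print(np.sum(Z1 * Z2), np.minimum(x1, x2).sum())       # Eq. (5): both equal 5
print(np.sum(lut_to_L(lut) * Z1), F_direct(lut, x1))  # Eq. (6) vs. Eq. (1): both equal 7
```

Both identities rely on the telescoping sum Σ_{n ≤ x_d} l_dn = LUT_d[x_d]. This is why LUT_d[0] must be zero under the mapping.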

Using “•” to denote the matrix Frobenius inner product, $Z_1 \bullet Z_2$ is the histogram intersection of vectors $\mathbf{x}_1, \mathbf{x}_2$, and $L \bullet Z$ is F(x):

$$\mathrm{Intersect}(\mathbf{x}_1, \mathbf{x}_2) = \sum_{d=1}^{D}\sum_{n=1}^{N} z_{1,dn}\, z_{2,dn} = Z_1 \bullet Z_2, \tag{5}$$

$$F(\mathbf{x}) = \sum_{d=1}^{D}\sum_{n=1}^{N} l_{dn}\, z_{dn} = L \bullet Z. \tag{6}$$

Using Eqs. (4) and (6), we can express the learning objective of Eq. (2) in terms of L.

Learning objective: We now explain the choice of L(·) and R(·). For the classification loss, we use the boosting exponential loss because it is smooth to differentiate and has a large-margin property. To remove a feature $x_d$ from the LBP histogram, we require $\mathrm{LUT}_d[n] = 0$, or equivalently $l_{dn} = 0$, for all $1 \le n \le N$. We achieve this by enforcing group sparsity with the L1/L2 mixed norm as the regularizer on L. The final learning objective is therefore:

$$L = \arg\min_{L} \{ L(L) + R(L) \} = \arg\min_{L} \left\{ \sum_{m=1}^{M} \exp(-y_m\, L \bullet Z_m) + \|L\|_{1,2} \right\}. \tag{7}$$

Note that the explicit mapping expands the dimensionality of the problem from D in x to DN in Z, which conventional regularization methods cannot solve efficiently. For example, our experiments use histogram features with dimension D between 200,000 and 400,000 (see Section 4).

3.2. Solution

To minimize the sum of the two convex functions in Eq. (7), we use the iterative proximal method “Forward Backward Splitting” (FOBOS) [22] with two simple iterative steps:

1) Unconstrained gradient descent step:

$$L_{t+1/2} = L_t - \lambda_t \Delta L_t, \quad \text{where } \Delta L = \partial L(L) / \partial L. \tag{8}$$

2) Regularization step:

$$L_{t+1} = \arg\min_{L} \left\{ \tfrac{1}{2} \| L - L_{t+1/2} \|^2 + \lambda_t R(L) \right\}. \tag{9}$$

Here t is the iteration number and λ is the learning rate; we use λ = 1. We call the resulting solution “shrink boost”, shown in Figure 2. Our method always learns a full high-dimensional classifier on all features, which avoids over-fitting to a few features, and shrinks the classifier afterwards. The gradient descent step requires solving a large-scale regularized least squares problem in the expanded dimension produced by the explicit feature mapping. Linear operations in the expanded mapped dimension can be performed efficiently in the original dimension. We exploit this to derive an approximate least squares solution based on an SVM-like dual formulation.

Algorithm 1: “Shrink boost” for learning a reduced additive LUT classifier
Inputs: Training samples $(\mathbf{x}_m, y_m)$ for 1≤m≤M. Target sparsity γ.
Output: Classifier $\mathrm{LUT}_d[n]$ for 1≤d≤D, 1≤n≤N.
Initialize $\mathrm{LUT}_d[n] = 0$. Set iteration number t = 0.
While the training error is not optimal:
• Compute the sample weights $w_m = \exp\left(-y_m \sum_{d=1}^{D} \mathrm{LUT}^t_d[x_{m,d}]\right)$.
• Gradient descent step: Compute the gradient $\Delta\mathrm{LUT}^t_d[n]$ by “dual coordinate descent” with random variables (see Algorithm 2).
• Update the noisy classifier: $\mathrm{LUT}^{t+1/2}_d[n] = \mathrm{LUT}^t_d[n] - \Delta\mathrm{LUT}^t_d[n]$.
• Regularization step: Shrink the noisy classifier $\mathrm{LUT}^{t+1/2}_d[n]$ to the target γ-sparse classifier $\mathrm{LUT}^{t+1}_d[n]$ by “mixed norm L1/L2 shrinkage” (see Algorithm 3).
• Increment t = t + 1.
Figure 2: Shrink boost algorithm
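The quantities in Eq. (7) and Algorithm 1 can be evaluated directly from the lookup tables, as in the sketch below. It is an illustration under our own storage assumptions, not the authors' implementation. Because L • Z_m = F(x_m) by Eq. (6), the exponential loss needs only table lookups in x space. The mixed norm ‖L‖₁,₂ = Σ_d ‖L_d‖₂ is computed from the row differences of Eq. (4). Tables are assumed to be a D × (N+1) array with LUT_d[0] = 0. The function names are hypothetical.

```python
import numpy as np

def scores(lut, X):
    """F(x_m) = sum_d LUT_d[x_{m,d}] for every row of X (Eqs. (1) and (6))."""
    D = lut.shape[0]
    return lut[np.arange(D)[None, :], X].sum(axis=1)

def sample_weights(lut, X, y):
    """w_m = exp(-y_m F(x_m)), computed at the start of each Algorithm 1 pass."""
    return np.exp(-y * scores(lut, X))

def objective(lut, X, y):
    """Eq. (7): exponential loss plus the L1/L2 mixed norm sum_d ||L_d||_2."""
    L = np.diff(lut, axis=1)          # Eq. (4); valid because LUT_d[0] = 0
    return sample_weights(lut, X, y).sum() + np.sqrt((L ** 2).sum(axis=1)).sum()

lut = np.array([[0.0, 1.0, 1.0],
                [0.0, 0.0, -2.0]])    # D = 2 features, N = 2
X = np.array([[1, 2], [2, 0]])        # M = 2 histograms
y = np.array([1.0, -1.0])
print(scores(lut, X))                 # [-1.  1.]
print(objective(lut, X, y))           # 2e + (1 + 2), about 8.4366
```

The weights w_m are exactly the per-sample terms of the exponential loss. Misclassified samples (negative margin y_m F(x_m)) therefore receive weights above 1 and dominate the next boosting step.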

Gradient descent step: The gradient descent step is the boosting step. As in gentle boost, a Taylor expansion of the exponential loss in Eq. (7) gives a regularized weighted least squares (RWLS) problem for the gradient ΔL in Eq. (8). Writing $w_m = \exp(-y_m\, L \bullet Z_m)$, we have:

$$\Delta L = \arg\min_{\Delta L} \left\{ \tfrac{1}{2}\|\Delta L\|^2 + \sum_{m=1}^{M} w_m (\Delta L \bullet Z_m - y_m)^2 \right\}. \tag{10}$$

The regularizer $\|\Delta L\|^2$ keeps ΔL small enough for the Taylor expansion to remain valid. To solve the very large-scale RWLS in Eq. (10), we adopt dual coordinate descent with random selection of variables, similar to [2]. The primal and dual forms of Eq. (10) are:

Primal: $$P(\Delta L, \xi_m) = \tfrac{1}{2}\|\Delta L\|^2 + \sum_{m=1}^{M} w_m \xi_m^2, \quad \text{s.t. } y_m - \Delta L \bullet Z_m = \xi_m. \tag{11}$$

Dual: $$D(\alpha) = \tfrac{1}{2}\sum_{m=1}^{M}\sum_{m'=1}^{M} \alpha_m \alpha_{m'}\, Z_m \bullet Z_{m'} + \tfrac{1}{2}\sum_{m=1}^{M} \frac{\alpha_m^2}{2 w_m} - \sum_{m=1}^{M} \alpha_m y_m. \tag{12}$$

Eq. (12) can be solved iteratively with Newton's method, by coordinate descent over randomly selected dual variables α. To exploit the linearity of the intersection mapping, we compute the first and second derivatives efficiently in x space rather than in the mapped Z space:

$$D'(\alpha_i) = \Delta L \bullet Z_i - y_i + \frac{\alpha_i}{2 w_i} = \sum_{d=1}^{D} \Delta\mathrm{LUT}_d[x_{i,d}] - y_i + \frac{\alpha_i}{2 w_i}, \tag{13}$$

$$D''(\alpha_i) = Z_i \bullet Z_i + \frac{1}{2 w_i} = \sum_{d=1}^{D} x_{i,d} + \frac{1}{2 w_i}. \tag{14}$$

The Newton updates of the dual solution and of the classifier L are then:

$$\alpha_i \leftarrow \alpha_i - \frac{D'(\alpha_i)}{D''(\alpha_i)}, \tag{15}$$

$$\Delta L = \sum_{m=1}^{M} \alpha_m Z_m, \quad \text{i.e.} \quad \Delta l_{dn} \leftarrow \Delta l_{dn} - \frac{D'(\alpha_i)}{D''(\alpha_i)}\, z_{i,dn}. \tag{16}$$

If we define $\Delta\mathrm{LUT} = \Pi^{-1}(\Delta L)$ in Eq. (16), we can update the lookup tables directly:

$$\Delta\mathrm{LUT}_d[n] \leftarrow \Delta\mathrm{LUT}_d[n] - \min(n, x_{i,d})\, \frac{D'(\alpha_i)}{D''(\alpha_i)}. \tag{17}$$

Figure 3 gives an overview of the algorithm.

Algorithm 2: Dual coordinate descent of random variables for learning an approximate weighted least squares solution with intersection mapping
Inputs: Weighted training samples $(w_m, \mathbf{x}_m, y_m)$ for 1≤m≤M.
Output: Classifier $\Delta\mathrm{LUT}_d[n]$ for 1≤d≤D, 1≤n≤N.
Initialize all $\Delta\mathrm{LUT}_d[n] = 0$. Initialize all dual variables $\alpha_m = 0$.
While the $\alpha_m$ are not optimal:
• Choose a random coordinate i under the distribution {w_m}.
• Compute the first and second derivatives: $D'(\alpha_i) = \sum_{d=1}^{D} \Delta\mathrm{LUT}_d[x_{i,d}] - y_i + \frac{\alpha_i}{2 w_i}$ and $D''(\alpha_i) = \sum_{d=1}^{D} x_{i,d} + \frac{1}{2 w_i}$.
• Update the dual variable, $\alpha_i \leftarrow \alpha_i - D'(\alpha_i)/D''(\alpha_i)$, and all lookup tables by Newton's method: $\Delta\mathrm{LUT}_d[n] \leftarrow \Delta\mathrm{LUT}_d[n] - \min(n, x_{i,d})\, D'(\alpha_i)/D''(\alpha_i)$.
Figure 3: Dual coordinate descent of random variables algorithm
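Algorithm 2 can be sketched in NumPy as follows. This is our own illustration under stated assumptions, not the authors' code. Tables are stored as a D × (N+1) array whose column 0 (ΔLUT_d[0]) stays 0. Coordinates are drawn with probability proportional to w_m, as in Figure 3. A fixed iteration budget `n_iter` (a hypothetical parameter) replaces the "α_m are not optimal" test. Every step uses only table lookups and min(n, x_{i,d}), so the D × N matrices Z_m are never formed.

```python
import numpy as np

def dual_coordinate_descent(X, y, w, N, n_iter=20000, seed=0):
    """Approximate the RWLS step of Eq. (10) by random dual coordinate descent.

    X: (M, D) integer histograms with 0 <= X[m, d] <= N; y: labels in {-1, +1};
    w: positive sample weights. Returns (dLUT, alpha); dLUT has shape (D, N+1).
    """
    rng = np.random.default_rng(seed)
    M, D = X.shape
    dLUT = np.zeros((D, N + 1))
    alpha = np.zeros(M)
    n_range = np.arange(N + 1)                  # table indices 0..N
    d_idx = np.arange(D)
    second = X.sum(axis=1) + 1.0 / (2.0 * w)    # Eq. (14), fixed per sample
    for i in rng.choice(M, size=n_iter, p=w / w.sum()):
        # Eq. (13): Delta L . Z_i evaluated with D table lookups in x space.
        first = dLUT[d_idx, X[i]].sum() - y[i] + alpha[i] / (2.0 * w[i])
        step = first / second[i]
        alpha[i] -= step                         # Eq. (15)
        # Eq. (17): dLUT_d[n] -= min(n, x_{i,d}) * step, for all d and n.
        dLUT -= np.minimum(n_range[None, :], X[i][:, None]) * step
    return dLUT, alpha

rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(6, 3))             # M = 6 samples, D = 3, N = 4
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
w = rng.uniform(0.5, 2.0, size=6)
dLUT, alpha = dual_coordinate_descent(X, y, w, N=4, n_iter=60000)

# Exact check: the minimizer of Eq. (12) solves (K + diag(1/(2w))) alpha = y,
# where K is the intersection kernel of Eq. (5).
K = np.minimum(X[:, None, :], X[None, :, :]).sum(axis=2)
alpha_exact = np.linalg.solve(K + np.diag(1.0 / (2.0 * w)), y)
print(np.abs(alpha - alpha_exact).max())        # should be very small
```

Eq. (12) is a strictly convex quadratic, and each Newton step minimizes it exactly along one coordinate. The iterates therefore converge to the linear-system solution used in the check above, which is only practical for small M.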

Regularization step: The regularization step is the shrinking step. From [22], the solution of Eq. (9) is:

$$L_d^{t+1} = \left[1 - \frac{\gamma}{\|L_d^{t+1/2}\|_2}\right]_+ L_d^{t+1/2}, \tag{18}$$

where $L_d = [l_{d1} \ldots l_{dN}]$ is the d-th row of matrix L and $[\cdot]_+$ is the shrinkage operator, $[a]_+ = \max(a, 0)$. γ controls the sparsity of the solution and is an input training parameter. Figure 4 shows the shrinking algorithm.
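The shrinking step (Eq. (18), and Algorithm 3 in Figure 4) can be sketched as follows, following our reading of Algorithm 3. Rows are ranked by the squared norm ‖L_d‖₂². The threshold T is the k-th smallest value with k = round((1−γ)D), so roughly a fraction γ of the rows survive. Rows at or below T become zero, and the remaining rows are scaled by 1 − T/‖L_d‖₂². The rounding rule, the D × (N+1) table layout with LUT_d[0] = 0, and the function name are our assumptions.

```python
import numpy as np

def mixed_norm_shrink(lut, gamma):
    """Shrink a noisy (D, N+1) table so that about gamma * D rows stay non-zero.

    Column n of `lut` holds LUT_d[n]; column 0 (LUT_d[0]) is assumed to be 0.
    """
    D = lut.shape[0]
    L = np.diff(lut, axis=1)                   # Eq. (4) rows L_d, shape (D, N)
    sq = (L ** 2).sum(axis=1)                  # ||L_d||_2^2
    k = int(round((1.0 - gamma) * D))          # rows at or below T are zeroed
    T = np.sort(sq)[k - 1] if k > 0 else 0.0   # "(1 - gamma)-th" sorted value
    safe = np.where(sq > 0, sq, 1.0)           # avoid dividing by zero norms
    scale = np.where(sq > 0, np.maximum(0.0, 1.0 - T / safe), 0.0)
    L_shrunk = scale[:, None] * L              # [1 - T / ||L_d||^2]_+ L_d
    # Inverse mapping: LUT~_d[n] = sum_{i <= n} l~_di, with LUT~_d[0] = 0.
    return np.hstack([np.zeros((D, 1)), np.cumsum(L_shrunk, axis=1)])

lut = np.array([[0, 1, 1], [0, 0, 2], [0, 3, 3], [0, 0, 0.5], [0, 1, 2]])
out = mixed_norm_shrink(lut, gamma=0.4)        # keep 2 of 5 rows
print(np.flatnonzero(np.abs(out).sum(axis=1))) # [1 2]
```

Each row is either zeroed as a whole or rescaled as a whole. A feature x_d is therefore either dropped entirely or kept with its full lookup table, which is the group-sparsity behavior the L1/L2 norm is meant to enforce.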

Algorithm 3: Mixed norm L1/L2 shrinkage with intersection mapping
Inputs: Noisy classifier $\mathrm{LUT}_d[n]$ for 1≤d≤D, 1≤n≤N. Target sparsity γ.
Output: Sparse classifier $\widetilde{\mathrm{LUT}}_d[n]$.
• Compute all values $l_{dn}$ using Eq. (4).
• Compute $\|L_d\|_2^2 = \sum_{n=1}^{N} l_{dn}^2$.
• Sort the $\|L_d\|_2^2$ in ascending order. Set the shrinkage threshold T to the (1−γ)-th value.
• Shrink the values: $\tilde{L}_d = \left[1 - \frac{T}{\|L_d\|_2^2}\right]_+ L_d$.
• Compute the sparse classifier by inverse mapping $\widetilde{\mathrm{LUT}} = \Pi^{-1}(\tilde{L})$: $\widetilde{\mathrm{LUT}}_d[n] = \sum_{i=1}^{n} \tilde{l}_{di}$.
Figure 4: Mixed norm L1/L2 shrinkage algorithm

4. Experiments and results

We first describe the object detection system used in our experiments.

4.1. Multi LBP histogram feature setup

Channel setup: Let RGB(x,y) denote the input color image. In our experiments, the color image is transformed into 5 channels by linear and nonlinear filtering:
• Gray and color channels: The input RGB image is converted into the three YCbCr channels in the standard way. Cb and Cr are quantized with a step of 4 to suppress color noise in the image.
• Low-pass filtered channel: A 2x2 average box filter (low pass) is applied to the Y gray channel. The filtered image captures low-frequency image structure.
• High-pass filtered channel: A 3x3 Laplacian filter (high pass) is applied to the Y gray channel, and we take the absolute value of the filtered output. The filtered image captures high-frequency image structure, such as edges.
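The five-channel setup above can be sketched in NumPy as follows. The text does not fix the details, so these choices are our assumptions: full-range ITU-R BT.601 (JPEG-style) coefficients for the "standard" YCbCr conversion, rounding chroma to integers before quantizing with step 4, a 4-neighbour Laplacian kernel, and edge padding at the image border. The function name is hypothetical.

```python
import numpy as np

def five_channels(rgb):
    """Return the Y, Cb, Cr, low-pass and high-pass channels of an (H, W, 3) RGB image."""
    rgb = np.asarray(rgb, dtype=float)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Assumed conversion: full-range ITU-R BT.601 (as used by JPEG).
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = 128.0 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128.0 + 0.5 * R - 0.418688 * G - 0.081312 * B
    # Round chroma to 8-bit integers, then quantize with step 4 to suppress color noise.
    Cb_q = np.rint(Cb) // 4
    Cr_q = np.rint(Cr) // 4
    # Low pass: 2x2 average box filter on Y, edge-padded at the bottom/right border.
    p = np.pad(Y, ((0, 1), (0, 1)), mode="edge")
    low = (p[:-1, :-1] + p[1:, :-1] + p[:-1, 1:] + p[1:, 1:]) / 4.0
    # High pass: absolute 4-neighbour 3x3 Laplacian response on Y, edge-padded.
    q = np.pad(Y, 1, mode="edge")
    lap = (q[:-2, 1:-1] + q[2:, 1:-1] + q[1:-1, :-2] + q[1:-1, 2:]
           - 4.0 * q[1:-1, 1:-1])
    return {"Y": Y, "Cb": Cb_q, "Cr": Cr_q, "low": low, "high": np.abs(lap)}

img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, 3:] = 200                                # vertical step edge
ch = five_channels(img)
print(ch["high"][0])                            # non-zero only at columns 2 and 3
```

Taking the absolute Laplacian response makes the high-pass channel respond equally to both sides of an edge, which matches the stated goal of capturing edge structure independently of contrast polarity.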

Quantization setup: For LBP quantization, we use the 3x3 neighbor topology shown in Figure 5. Let p, q be a pair of pixels selected from the neighbors {n1, …, n9}. The LBP pattern value is computed by comparing eight pixel pairs $(p_i, q_i)$:

$$\mathrm{LBP} = \sum_{i=1}^{8} 2^{i-1}\, \delta(p_i > q_i), \qquad \delta(z) = \begin{cases} 1 & \text{if } z \text{ is true} \\ 0 & \text{if } z \text{ is false} \end{cases} \tag{19}$$

In total, we use up to 4 different LBP quantizations, each with a different pair set. They capture different local characteristics of the image channels. Each quantization has $2^8 = 256$ discrete LBP values.

LBP-1: This is the original LBP [20] with center-surround quantization, i.e. the center pixel is compared with each of its 8 neighbors.
LBP-2: This quantization is used in [20]. It complements the center-surround quantization above by encoding the relationship between the 8 neighbors at an inter-neighbor distance of 1.
LBP-3: We introduce a new quantization that encodes the relationship between the 8 neighbors at inter-neighbor distances of 2 and 2√2. Note that the CS-LBP [20] quantization uses a subset of our pixel pairs.
LBP-4: We introduce yet another new quantization that encodes the relationship between the 8 neighbors at inter-neighbor distances of 2 and 2√2.

[Figure 5 panels: the 3x3 neighbor topology (pixels n1 to n9) and the 8 pixel pairs of each scheme, LBP-1 to LBP-4.]

Figure 5: LBP quantization schemes
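Eq. (19) can be implemented for any set of eight pixel pairs; the choice of pair set is what distinguishes LBP-1 to LBP-4. The sketch below is illustrative only. Figure 5's pair sets are not reproduced in the text, so the neighbour ordering (clockwise from the top-left) and the two example pair sets are hypothetical. They loosely follow the descriptions above: neighbour-versus-centre comparisons for LBP-1, and comparisons of adjacent ring pixels (distance 1) for LBP-2. The direction of each comparison is also an assumption. Codes are computed for interior pixels only.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours, clockwise from the top-left (assumed order).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
# Hypothetical pair sets of (p_i, q_i) offsets, compared as p_i > q_i.
LBP1_PAIRS = [(nb, (0, 0)) for nb in RING]                      # neighbour vs. centre
LBP2_PAIRS = [(RING[i], RING[(i + 1) % 8]) for i in range(8)]   # adjacent ring pixels

def lbp_codes(img, pairs):
    """Eq. (19): LBP = sum_i 2^(i-1) * delta(p_i > q_i), for interior pixels."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape

    def shifted(dy, dx):
        # Element [r, c] of this view is pixel (r + 1 + dy, c + 1 + dx).
        return img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]

    code = np.zeros((H - 2, W - 2), dtype=np.int64)
    for i, ((py, px), (qy, qx)) in enumerate(pairs):
        code |= (shifted(py, px) > shifted(qy, qx)).astype(np.int64) << i
    return code                                   # values in 0..255

patch = np.array([[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]])
print(lbp_codes(patch, LBP1_PAIRS))               # [[85]]
```

Because each scheme is just a list of offset pairs, adding LBP-3 or LBP-4 only requires writing down their eight pairs. The comparison and bit-packing code stays the same.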

Histogram feature setup: We use a scan-window-based object detection system. Each window is divided into B non-overlapping blocks. For each block, C channels are computed as described in Section 4.1. For each channel, H LBP histograms are computed using the quantizations in Section 4.1.2. All the LBP histograms are concatenated into one long feature vector. Since each LBP histogram has length 256, the dimension of the final multi-LBP histogram feature vector is DIM = B×C×H×256. Unlike previous works [1, 4], we do not apply histogram normalization. We find that using multiple LBPs makes the feature representation robust to image noise and to various parameter settings, including the choice of histogram normalization scheme.

4.2. Experiments

We evaluate how well shrink boost learns reduced LUT classifiers for multi-LBP histogram features. We use three datasets, for human, pedestrian and bird detection. Figure 6 shows the datasets and the characteristics of their multi-LBP histogram features, including dimension and average sparsity. The sparsity of a histogram feature vector x is the count of its non-zero elements, i.e. its L0 norm:

$$\mathrm{Sparsity}(\mathbf{x}) = \|\mathbf{x}\|_0. \tag{20}$$

Figure 6 confirms that multi-LBP histogram features are indeed high dimensional and sparse. Next, we explain the experimental objectives and results for each dataset.
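The multi-LBP histogram feature and the sparsity measure of Eq. (20) can be sketched as below. This is an illustration, not the authors' code, and the text does not specify these details: it assumes one LBP code map per (channel, quantization) pair, non-overlapping square blocks with partial border blocks dropped, and a block-major concatenation order. The dimension is then B·C·H·256 (written C×Q×B×256 in Figure 6), e.g. 84 × 5 × 4 × 256 = 430,080 for INRIA. The function names are hypothetical.

```python
import numpy as np

def multi_lbp_feature(code_maps, block):
    """Concatenate 256-bin LBP histograms over non-overlapping square blocks.

    code_maps: list of equally sized integer LBP code maps (values 0..255),
    one per (channel, quantization) pair.
    """
    rows, cols = code_maps[0].shape
    hists = []
    for by in range(0, rows - block + 1, block):          # blocks, row-major
        for bx in range(0, cols - block + 1, block):
            for codes in code_maps:                        # channels x quantizations
                patch = codes[by:by + block, bx:bx + block].ravel()
                hists.append(np.bincount(patch, minlength=256))
    return np.concatenate(hists)              # length B * len(code_maps) * 256

def sparsity(x):
    """Eq. (20): number of non-zero elements, ||x||_0."""
    return int(np.count_nonzero(x))

maps = [np.zeros((6, 6), dtype=np.int64), np.arange(36).reshape(6, 6)]
x = multi_lbp_feature(maps, block=3)          # B = 4 blocks, 2 code maps
print(x.size, sparsity(x), sparsity(x) / x.size)   # 2048 40 0.01953125
```

A block of b×b pixels casts only b² votes into each 256-bin histogram, so at most b² bins can be non-zero. This is why small blocks (such as 3x3 for DC) give especially sparse features.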

4.3. INRIA human detection

Experiment objectives: The INRIA dataset [1] is a well-established dataset for human detection. Our first objective on this dataset is to compare the generalization performance of classifiers trained by the proposed shrink boost with those from conventional greedy selection boosting [8, 16] and from an SVM with intersection kernel [2]. For shrink boost, we try target train sparsities of 100% (no shrinking), 50%, 10%, 5% and 1%. We use greedy selection boosting to select 10%, 5% and 1% of the features. All classifiers are trained on the same training samples: 2416 +ve samples and 12,180 -ve samples randomly bootstrapped from 1218 background training images. For testing, we use the 1132 +ve samples provided and 10,000 -ve samples randomly bootstrapped from 453 background test images. Our second objective is to compare shrink boost with other state-of-the-art results. For this, we use the standard bootstrapping training process used with SVM [1, 2, 3]. An initial classifier is first learned from the initial samples. Its false positives on the background training images are then added as new -ve training samples to learn the final classifier. Note that this approach uses significantly fewer training samples than conventional greedy selection based boosting. For testing, we follow the test protocol of [1]. For both comparisons, the multi-LBP histogram features use all the channels and quantizations of Section 4.1. For block division, we use 84 non-overlapping 8x8 blocks in the inner 48x112 region of the 64x128 scan window, which avoids the boundary artifacts observed in [18]. The final dimension of the concatenated histogram vector is 430,080, and the average sparsity (percentage of non-zero vector elements) is about 12.3%.

| Dataset | Train | Test | Scan window size (WxH) | C | Q | B | DIM | Sparsity |
|---|---|---|---|---|---|---|---|---|
| 1. INRIA Human [1] | 2416 +ve samples, 1218 –ve images | 1132 +ve samples, 453 –ve images | 48x112 | 5 | 4 | 84 | 430,080 | 12.3% |
| 2. DaimlerChrysler Pedestrian [10] | 3 sets of: 4800 +ve samples, 5000 –ve images | 3 sets of: 4800 +ve samples, 5000 –ve images | 16x38 | 3 | 4 | 72 | 230,400 | 5.0% |
| 3. Bird | 5000 +ve samples, 100000 –ve samples | 2500 +ve samples, 10000 –ve samples | 40x40 | 5 | 4 | 64 | 327,680 | 10.6% |

* Available LBP features: C = no. of channels, Q = no. of quantizations, B = no. of blocks, DIM = CxQxBx256.

[Figure 6 panels: example +ve and -ve samples with mean edge and color images for 1. INRIA human (48x112, 8x8 blocks), 2. DC pedestrian (18x36, 3x3 blocks) and 3. Bird (40x40, 5x5 blocks).]

Figure 6: Evaluation data and multi LBP histogram feature characteristics

Results and analysis: Figure 7(a) compares the generalization performance of our training method with that of the conventional methods, using the hit rate at FPR=1e-4. With a train sparsity of 100% (no shrinking, so shrink boost reduces to gentle boosting at full dimension), shrink boost achieves a hit rate of 96.0%, about 1.5% better than SVM. Performance generally degrades as features are removed. At 10% sparsity, the hit rate drops by only 1%, to 95%. At 1% sparsity, it drops to 85%. Greedy selection boosting fails to give good results; for example, its best hit rate is 5% at FPR=3e-3 with 10% of the features selected. This is not surprising: the high sparsity of the features and the limited training samples cause greedy methods to over-fit. Figure 7(b) compares our results with the state of the art. We show that a single bootstrapping round is sufficient for shrink boost. Our final classifier uses a train sparsity of 10%, i.e. it is restricted to 43,008 features, and its hit rate is about 1% lower than with the full feature set. It achieves 95.4% at FPR=1e-5. As Figure 7(c) shows, this exceeds the best previous results of 94.8% with LBP-HOG [4] and 94.2% with co-occurrence features [17].

4.4. DaimlerChrysler (DC) pedestrian detection

Experiment objectives: The DaimlerChrysler (DC) dataset [10] is a pedestrian detection dataset. Its image samples are small (18x36) and grayscale. The multi-LBP histogram features of the DC set have a sparsity of only 5% and a dimension of 230,400. Both are smaller than for the INRIA set, because of the smaller 3x3 block size and the lower block count. On this dataset, we investigate the selective power of shrink boost: when the pool of candidate features grows and the added features become noisy or irrelevant, does the performance of shrink boost drop? In this evaluation, we set train sparsity = 30,000/dimension, so the number of selected features is always 30,000. We measure classifier performance in two settings: (i) using only the gray Y channel while increasing the number of LBP quantizations from 1 to 4, and (ii) using 4 LBP quantizations while increasing the number of channels from 1 to 3 (gray Y plus the low- and high-pass filtered channels). Finally, we compare our results with the state of the art. In all experiments, we follow the evaluation protocol of [10] and report average ROC results.

[Figure 7 (a): hit rate vs. false positive rate on INRIA for shrink boost (100%*, 50%, 10%, 5%, 1%), SVM-intersect (100%) and greedy boost (10%, 5%, 1%). (b): hit rate vs. false positive rate for proposed #1, proposed #2, SCHWARTZ [17], WANG [4], WATANABE [3], HUANG [16] and DALAL [1]. * % of features selected.]

(c)

| Methods | Hit rate @ FPR=1e-5 |
|---|---|
| Multi-LBP + shrink boost, proposed #1 (feature=100%*) | 96.1% |
| Multi-LBP + shrink boost, proposed #2 (feature=10%*) | 95.4% |
| Co-occurrence + PLS, SCHWARTZ [17] | 94.2% |
| LBP-HOG + SVM-linear, WANG [4] | 94.8% |
| COHOG + SVM-linear, WATANABE [3] | 92.0% |
| JROG + realBoost, HUANG [16] | 93.8% |
| HOG + SVM-linear, DALAL [1] | 80.0% |

* % of features selected

Figure 7: Evaluation results for INRIA human detection

Results and analysis: Figure 8(a) shows the hit rate at FPR=0.05, taken from the average ROC, for different combinations of channels and quantizations. The hit rate increases as more LBP types are added and levels off at about 97% once enough discriminative features are available. Performance does not degrade even when “more than enough” features are added. This matters for a general detection system that detects multiple objects from a large shared pool of features: shrink boost can select a small set of the most discriminative features for each object's classifier. Figure 8(b) compares our results with other methods. Our ROC curve is based on “case 4” of Figure 8(a), which uses the Y channel with the four quantizations LBP1+2+3+4. We obtain a 97% hit rate, compared to 96% for 2nd-order HOG [15], the best previously published result.

(a)

| Case | Channels | Quantization | Hit rate @ FPR=0.05 |
|---|---|---|---|
| 1 | Y (Gray) | LBP1 | 81.6% |
| 2 | Y | LBP1+2 | 85.7% |
| 3 | Y | LBP1+2+3 | 89.0% |
| 4 | Y | LBP1+2+3+4 | 97.0% |
| 5 | Y + low | LBP1+2+3+4 | 97.3% |
| 6 | Y + low + high | LBP1+2+3+4 | 97.2% |

Features selected fixed at 30,000.

[Figure 8 (b): hit rate vs. false positive rate (0 to 0.10) for Multi-LBP + shrink boost, proposed (case 4); 2nd Order HOG + SVM-linear, CAO [15]; COHOG + SVM-linear, WATANABE [3]; Haar Wavelet + AdaBoost, DOLLAR [19]; Multi-scale-HOG + SVM-intersect, MAJI [2].]

Figure 8: Evaluation results for DC pedestrian detection

4.5. Bird detection

Experiment objectives: The last dataset is our own collection of 40x40 short-beak bird samples. It contains 5000 +ve and 100,000 –ve training samples. The bird samples are normalized by beak location, as shown in Figure 6. The multi-LBP histogram features use all channels and quantizations. The block size is 5x5, giving a histogram feature dimension of 327,680 with an average sparsity of 10.6%. We use this dataset to evaluate shrink boost classifiers on more challenging objects, such as birds that vary in appearance and color.

Results and analysis: Figure 9(a) shows the ROC of shrink boost at different train sparsities. As a trade-off between classifier size and accuracy, we choose sparsity=10% for the final classifier. Its hit rate is about 90% at FPR=1e-4. By comparison, greedy boost does not achieve good results at 10% sparsity. Intersection SVM with the full feature set performs about the same as shrink boost at 50% sparsity. These results mirror the INRIA human detection experiments in Section 4.3. Figure 9(b) shows some detection results from the final classifier after arbitration of the hit windows. The evaluation shows that multi-LBP histogram features combined with shrink boost can detect challenging objects such as birds.

[Figure 9 (a): hit rate vs. false positive rate on the bird dataset for shrink boost (100%*, 50%, 10%, 5%, 1%), SVM-intersect (100%) and greedy boost (10%); * % of features selected. (b): example detection results.]

Figure 9: Evaluation results for our bird detection

5. Conclusion

We presented “shrink boost”, a novel method for selecting sparse, high-dimensional histogram features for object detection. By combining boosting and shrinkage in a regularized learning problem, we train classifiers that use few features and generalize well. Our experiments show that shrink-boosted classifiers generalize better than conventional greedily boosted classifiers and SVM classifiers, even with limited training samples.

References
[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[2] S. Maji, A. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In CVPR, 2009.
[3] T. Watanabe, S. Ito, and K. Yokoi. Co-occurrence histograms of oriented gradients for pedestrian detection. In PSIVT, 2009.
[4] X. Wang, T.X. Han, and S. Yan. An HOG-LBP human detector with partial occlusion handling. In ICCV, 2009.
[5] S. Hussain and B. Triggs. Feature sets and dimensionality reduction for visual object detection. In BMVC, 2010.
[6] R.S. Smith and T. Windeatt. Facial expression detection using filtered local binary pattern features with ECOC classifiers and Platt scaling. In JMLR Workshop on Applications of Pattern Analysis, 2010.
[7] S. Liao, M. Law, and C.S. Chung. Dominant local binary patterns for texture classification. IEEE Trans. on Image Processing, 2009.
[8] C. Huang, B. Wu, H. Ai, and S. Lao. Omni-directional face detection based on real AdaBoost. In ICIP, 2004.
[9] L. Zini and F. Odone. Efficient pedestrian detection with group lasso. In ICCV Workshop on Visual Surveillance, 2011.
[10] S. Munder and D.M. Gavrila. An experimental study on pedestrian classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2006.
[11] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC, 2009.
[12] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. JMLR, 3, 2003.
[13] Z. Guo, L. Zhang, and D. Zhang. Completed modeling of local binary pattern operator for texture classification. IEEE Trans. on Image Processing, 2010.
[14] D. Maturana, A. Soto, and D. Mery. Face recognition with decision tree-based local binary patterns. In ACCV, 2010.
[15] H. Cao, K. Yamaguchi, T. Naito, and Y. Ninomiya. Pedestrian recognition using second-order HOG feature. In ACCV, 2009.
[16] C. Huang and R. Nevatia. High performance object detection by collaborative learning of joint ranking of granule features. In CVPR, 2010.
[17] W.R. Schwartz, A. Kembhavi, D. Harwood, and L.S. Davis. Human detection using partial least squares analysis. In ICCV, 2009.
[18] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: a benchmark. In CVPR, 2009.
[19] P. Dollár, Z. Tu, H. Tao, and S. Belongie. Feature mining for image classification. In CVPR, 2007.
[20] J. Trefny and J. Matas. Extended set of local binary patterns for rapid object detection. In CVWW, 2010.
[21] R. Xu, B. Zhang, Q. Ye, and J. Jiao. Cascaded L1-norm minimization learning (CLML) classifier for human detection. In CVPR, 2010.
[22] J. Duchi and Y. Singer. Efficient learning using forward-backward splitting. In NIPS, 2009.
