ROBUST IMAGE FEATURE DESCRIPTION, MATCHING AND APPLICATIONS

A Thesis Submitted in Partial Fulfillment of the Requirements for the Award of the Degree of

DOCTOR OF PHILOSOPHY

Submitted by
Shiv Ram Dubey

Under the Supervision of
Dr. Satish Kumar Singh & Dr. Rajat Kumar Singh

to the

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, DEVGHAT, JHALWA, ALLAHABAD-211012 (U.P.)-INDIA

June, 2016

INDIAN INSTITUTE OF INFORMATION TECHNOLOGY

ALLAHABAD (A Centre of Excellence in Information Technology Established by Govt. of India)

CERTIFICATE

It is certified that the work contained in the thesis titled “Robust Image Feature Description, Matching and Applications,” by “Shiv Ram Dubey,” has been carried out under our supervision and that this work has not been submitted elsewhere for a degree.

Dr. Satish Kumar Singh (Supervisor) Assistant Professor Indian Institute of Information Technology, Devghat, Jhalwa, Allahabad-211012

Dr. Rajat Kumar Singh (Supervisor) Associate Professor Indian Institute of Information Technology, Devghat, Jhalwa, Allahabad-211012

June 21, 2016


INDIAN INSTITUTE OF INFORMATION TECHNOLOGY

ALLAHABAD (A Centre of Excellence in Information Technology Established by Govt. of India)

CANDIDATE DECLARATION

I, Shiv Ram Dubey, Roll No. RS136, certify that this thesis work entitled “Robust Image Feature Description, Matching and Applications” is submitted by me in partial fulfillment of the requirements for the Degree of Ph.D. in the Department of Electronics and Communication Engineering, Indian Institute of Information Technology, Allahabad.

I understand that plagiarism includes:
1. Reproducing someone else's work (fully or partially) or ideas and claiming it as one's own.
2. Reproducing someone else's work (verbatim copying or paraphrasing) without crediting.
3. Committing literary theft (copying some unique literary construct).
I have given due credit to the original authors/sources, through proper citation, for all the words, ideas, diagrams, graphics, computer programs, experiments, results and websites that are not my original contribution. I have used quotation marks to identify verbatim sentences and given credit to the original authors/sources. I affirm that no portion of my work is plagiarized. In the event of a complaint of plagiarism, I shall be fully responsible. I understand that my Supervisor may not be in a position to verify that this work is not plagiarized.

Name: Shiv Ram Dubey Enrolment No: RS136 Indian Institute of Information Technology, Devghat, Jhalwa, Allahabad-211012, U.P., India


Date: June 21, 2016

Abstract

Most applications of computer vision, such as image correspondence, image retrieval, object recognition, texture recognition, face and facial expression recognition, 3D reconstruction, etc., require the matching of two images. In order to match two images, or in other words to find the similarity/dissimilarity between two images, some description of the image is required, because matching the raw intensity values of two images is more time consuming and is affected by even small variations in inherent properties such as brightness, orientation and scale. Thus, images can be matched via a description derived from basic image properties such as color, texture and shape. This description is called the feature descriptor/signature of the image. The main objectives of any descriptor are 1) to capture the discriminative information of the image, 2) to provide invariance to geometric and photometric changes, and 3) to reduce the dimension of the feature to be matched. The main focus of this thesis is to construct image descriptors with discriminative power, robustness against image variations, and low dimensionality. We have proposed an interleaved intensity order based local descriptor (IOLD) for region based image matching under various geometric and photometric transformation conditions. We have also proposed four grayscale local image descriptors, namely the local diagonal extrema pattern (LDEP), local bit-plane decoded pattern (LBDP), local bit-plane dissimilarity pattern (LBDISP), and local wavelet pattern (LWP), for biomedical image retrieval in MRI and CT databases. We have reported four color based local descriptors, namely the local color occurrence descriptor (LCOD), rotation and scale robust hybrid descriptor (RSHD), multichannel adder based local binary pattern (maLBP), and multichannel decoder based local binary pattern (mdLBP), for natural and texture image retrieval. An illumination compensation mechanism has been reported as a preferred pre-processing step. Bag-of-Filters and SVD based approaches have been proposed for boosting the performance of the descriptors.


Dedicated to my teachers and family

Acknowledgements

Gratitude is first due to all my teachers who have guided me throughout my life; without them, it would not have been possible to reach this milestone. Specifically, I would like to thank my supervisors, Dr. Satish Kumar Singh & Dr. Rajat Kumar Singh, for their guidance and advice during my PhD research at the Indian Institute of Information Technology Allahabad. They gave me excellent advice, especially in guiding my research efforts to the right problems. Their guidance has enriched my knowledge in the field of image and vision processing. I am very grateful to our Hon'ble Director, Prof. S. Biswas, for providing the best possible research facilities. I would like to extend my sincere thanks to the SPGC and DPGC chairs. I would like to thank Dr. Anurag Mittal, with whom I worked for a short period at the Indian Institute of Technology Madras. I would also like to thank Prof. Anand Singh Jalal, who guided me during my studies at GLA University Mathura. Thanks are due to many friends and colleagues at the institute with whom I have had many fruitful discussions and from whom I have learnt a lot: Vijay Bhaskar Semwal, Soumendu Chakraborty and Sumit Kumar, among others. Last, but not least, this dissertation would not have been possible without the understanding and support of my family. My grandparents and parents were very accommodating in allowing me to pursue higher studies. My wife and daughters have shown great patience during this time and have been a constant source of love and inspiration in my life. Finally, I offer my thanks to almighty God, and to all the people who motivated and supported me, directly or indirectly, during my PhD work.


Table of Contents

Certificate iii
Abstract vii
Acknowledgements xi
List of Tables xvii
List of Figures xix
List of Acronyms xxvii

1 Introduction 1
  1.1 Motivation 1
  1.2 Image Feature Descriptors 3
    1.2.1 Where to Compute the Descriptors? 4
    1.2.2 How to Compute the Descriptors? 4
    1.2.3 How to Compare the Descriptors? 5
  1.3 CBIR: An Application of Image Descriptors 6
  1.4 Objective of the Thesis 7
  1.5 Outline of the Thesis 7

2 Literature Review 9
  2.1 Region Based Local Descriptor 9
  2.2 Local Gray Scale Descriptors 11
  2.3 Color Image Descriptors 12
  2.4 Brightness Robust Image Descriptors 15
  2.5 Boosting Performance of Image Descriptors 16
  2.6 Problem Formulation 19
  2.7 Summary 20

3 Interleaved Intensity Order Based Local Descriptor 21
  3.1 Proposed Descriptor Construction 21
    3.1.1 Pre-processing, Feature Detection and Normalization 22
    3.1.2 Rotation Invariant Local Features 22
    3.1.3 Local Neighbor Partitioning Into Interleaved Sets 23
    3.1.4 Computing Multiple Local Intensity Order Patterns 26
    3.1.5 Descriptor Construction 27
  3.2 Experiments and Results 28
    3.2.1 Evaluation Criteria 29
    3.2.2 Performance Evaluation Over Oxford Dataset 29
    3.2.3 Performance Evaluation Over Complex Illumination Dataset 33
    3.2.4 Performance Evaluation Over Large Image Matching Dataset 34
  3.3 Observations and Discussions 34
    3.3.1 Effect of Drastic Illumination Change Over Descriptor 36
    3.3.2 Effect of Noise Over Descriptor 37
    3.3.3 Matching Time Analysis using Number of Matched Key-points 37
  3.4 Summary 38

4 Local Gray Scale Image Descriptors for Biomedical Image Retrieval 41
  4.1 Local Gray Scale Image Descriptors for Biomedical Image Retrieval 42
    4.1.1 Local Diagonal Extrema Pattern (LDEP) 42
    4.1.2 Local Bit-plane Decoded Pattern (LBDP) 44
    4.1.3 Local Bit-plane Dissimilarity Pattern (LBDISP) 49
    4.1.4 Local Wavelet Pattern (LWP) 52
  4.2 Similarity Measurement and Evaluation Criteria 56
    4.2.1 Similarity Measurement 56
    4.2.2 Evaluation Criteria 56
  4.3 Results and Discussion 56
    4.3.1 Biomedical Databases Used 57
    4.3.2 Experimental Results 61
  4.4 Summary 66

5 Local Color Image Descriptors for Natural and Textural Image Retrieval 67
  5.1 Local Color Image Descriptors and Retrieval 68
    5.1.1 Local Color Occurrence Descriptor (LCOD) 68
    5.1.2 Rotation and Scale Invariant Hybrid Descriptor (RSHD) 71
    5.1.3 Multichannel Decoded Local Binary Pattern 74
  5.2 Results and Discussions 78
    5.2.1 Databases 79
    5.2.2 Effect of Distance Measures 80
    5.2.3 Experimental Results 80
    5.2.4 Analyzing Robustness of the Descriptors 85
  5.3 Summary 85

6 Brightness Invariant Image Retrieval Using Illumination Compensation 87
  6.1 Illumination Compensation Mechanism 87
    6.1.1 Color Intensity Reduction 88
    6.1.2 Contrast Mapping 90
  6.2 Brightness Invariant Image Retrieval 91
  6.3 Similarity Measures and Evaluation Criteria 91
  6.4 Experiments and Results 93
    6.4.1 Datasets 93
    6.4.2 Experimental Results 93
  6.5 Comparison and Analysis 100
    6.5.1 Comparison with Existing Illumination Compensation Methods 101
    6.5.2 Performance Evaluation using Illumination Invariant Descriptors 101
    6.5.3 Performance Evaluation using the Descriptors of Chapter 5 101
  6.6 Summary 102

7 Boosting Local Descriptors with BoF and SVD 103
  7.1 Natural and Textural Image Retrieval using BoF-LBP 104
    7.1.1 Methodology 104
    7.1.2 Experiments, Results and Discussions 106
  7.2 NIR Face Image Retrieval using SVD and Local Descriptors 109
    7.2.1 Methodology 109
    7.2.2 Experiments, Results and Discussions 112
  7.3 Summary 116

8 Conclusions and Future Directions 119
  8.1 Conclusions 119
  8.2 Future Scopes 120

References 123
Publications 131

List of Tables

2.1 Analysis of some important reviewed descriptors in terms of their properties 18
3.1 Matching time reduced by IOLD1125 over LIOP1116 in % over each category of the Oxford dataset 31
5.1 Truth Table of the Adder and Decoder maps with 3 input channels 75
7.1 Image databases summary 107
7.2 ARP values using BoF-LBP when the number of top matches is 10 107
7.3 Performance comparison of … with … in terms of the ARP values over each database when … 108

List of Figures

1.1 Comparing pixels of two regions (images are taken from Corel-database [6]). 2
1.2 Comparing using descriptor function (images are taken from [7] and the patterns are computed using our RSHD descriptor explained in Chapter 5). 3
1.3 Descriptor construction from interest regions of an image [5]. 4
1.4 Extracting regions using (a) Grid, (b) Key-points, and (c) Global approach [5]. 5
1.5 Similarity depends upon the application requirements. 5
2.1 Illustration of four types of the multichannel feature extraction technique using two input channels, (a) each channel is quantized and merged to form a single channel and then the descriptor is computed over it, (b) binary patterns extracted over each channel are concatenated to form a single binary pattern and then a histogram is computed over it, which obviously results in a high dimensional feature vector, (c) histograms of binary patterns extracted over each channel are concatenated to form the final feature vector, so the mutual information among channels is obviously not utilized, and (d) binary patterns extracted over each channel are converted into other binary patterns using some processing and finally the histograms of the generated binary patterns are concatenated to form the final feature vector (generalized versions are proposed in Chapter 5). 13
3.1 Generating a circular patch/region of fixed size by normalizing the detected patch/region of elliptical shape and arbitrary size returned by affine invariant region detectors such as Harris-Affine/Hessian-Affine. 23
3.2 Rotation invariant coordinate system to compute the location of the local features; O is the center of the patch and Xi is the sample point. 23
3.3 Considering the local neighborhood as a set of different interleaved local neighborhoods. The original N neighbors are divided into k neighboring sets having d = N/k neighbors each. 24
3.4 Illustration of the proposed concept of local neighborhood division into multiple interleaved sets and construction of the IOLD pattern using an example, (a) example patch for pixel Xi, (b) intensity values of 8 local neighbors of the considered patch, (c) partitioning of the 8 local neighbors into 2 interleaved sets having 4 local neighbors each and their orders, (d) ordering patterns over each set, (e) weighted ordering patterns, and (f) final pattern for pixel Xi. 25
3.5 IOLD descriptor construction process; B support regions are used with C sub-regions in each support region. The IOLD descriptor is constructed by accumulating the local descriptor in each sub-region from all support regions. 26
3.6 Comparison between the pattern dimension using LIOP and the proposed approach. 28
3.7 Descriptors performance for (k, d) = (1,4), (2,4), (1,5), (2,5) and (1,6) when B=1 and C=1, using the Harris-Affine region detector over the Oxford dataset. 30
3.8 Descriptors performance for (k, d) = (1,4), (2,4), (1,5), (2,5) and (1,6) when B=1 and C=1, using the Hessian-Affine region detector over the Oxford dataset. 31
3.9 (a) Matching results and (b) matching time over the Oxford dataset in conjunction with B and C, while (k, d) = (1,3) and (2,3) and (B, C) up to (2,2), for the Harris-Affine detector. 32
3.10 Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Oxford dataset in terms of (a) ROC and (b) matching time using the Harris-Affine detector. 32
3.11 Images of the (a) Corridor and (b) Desktop category of the Complex illumination change dataset. 33
3.12 (a-d) Descriptors performance and (e) matching time for (k, d) = (1,4), (2,4), (1,5), (2,5) and (1,6) when (B, C) = (1,1), using both region detectors over the Complex illumination change dataset. 33
3.13 Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Complex illumination change dataset in terms of (a) recall-precision and (b) matching time using the Harris-Affine detector. 35
3.14 Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Large image matching dataset in terms of (a) recall-precision and (b) matching time using the Harris-Affine detector. 35
3.15 Visualization of the performance of SIFT, LIOP and IOLD under illumination change, (a) a patch from the 1st image of the corridor, (b) the same patch as in (a) but from the 4th image of the corridor, (c) the difference between the patterns of both patches for each descriptor, and (d) normal distribution of the dissimilarities of (c) at zero mean (µ=0). 35
3.16 Similarity between histograms of original and noised frames (effect of Gaussian noise); the first desktop frame is the original frame and the remaining frames are obtained after adding Gaussian noise to the original frame with zero mean and σ variance. 36
3.17 Matching time vs number of matched key-points using the IOLD descriptor and the LIOP descriptor. 37
4.1 The biomedical image retrieval using local descriptors. 42
4.2 (a) The axes of the image in the Cartesian coordinate system, and (b) the origin of the axes at the upper left corner of the image, with $P_{i,j}$ the pixel of the image at coordinate $(i, j)$. 43
4.3 The computation of the $LDEP_{i,j}$ pattern for center pixel $P_{i,j}$ (intensity value $I_{i,j}$) using the flow diagram with an example. 43
4.4 The local neighbors (i.e. $P^{i,j}_{R,N,t}$ for $\forall t \in [1, N]$) of a center pixel ($P_{i,j}$) in the polar coordinate system. 45
4.5 (a) Cylindrical coordinate system axes, (b) the local bit-plane decomposition. The cylinder can be thought of as B + 1 horizontal slices. The base slice of the cylinder is composed of the original center pixel and its neighbors, with the center pixel at the origin. The remaining B slices correspond to the B bit-planes of the local neighbors of the base slice. The $(k+1)^{th}$ slice from the base corresponds to the $k^{th}$ bit-plane of the base slice. 46
4.6 An example of bit-plane decomposition into $B = 8$ bit-planes, (a) a sample pattern with $R = 1$ and $N = 8$, (b) the decomposed bit-planes of the neighbors; the 'red' and 'green' circles represent '1' and '0' respectively. 47
4.7 Example of local bit-plane transformed value maps for each bit-plane, (a) sample image, (b) LBP map over the sample image, (c-j) local bit-plane transformed value maps for each bit-plane. 47
4.8 The LBDISP maps plotted in the $(k+1)^{th}$ column corresponding to the $k^{th}$ bit-plane for $k = 1$ to $k = 8$ respectively, for the input image in the 1st column. It can be seen that the higher bit-planes (i.e. MSB) yield coarser information, whereas the lower bit-planes (i.e. LSB) yield more detailed information. The original input image is taken from the OASIS-MRI database. 51
4.9 The transformation of an $N$-dimensional vector $I^{i,j}_{R,N}$ to another $N$-dimensional vector $W^{i,j,l}_{R,N}$ at the $l^{th}$ level using the 1-D Haar wavelet. 54
4.10 OASIS-MRI example images, four images from each group. 58
4.11 Images from the Emphysema-CT database, one image from each class. 58
4.12 Sample images from each category of the NEMA-CT database. 58
4.13 Some images of the TCIA-CT database, one image from each category. 59
4.14 EXACT09-CT example images, one image from each group. 59
4.15 The performance comparison of the LDEP, LBDP, LBDISP and LWP descriptors with the LBP, LTP, CSLBP, CSLTP, LDP, LTrP, LTCoP, LMeP and SS3DLTP descriptors over the OASIS-MRI database using the D1 distance measure, in terms of the ARP, ARR, F-Score and ANMRR as a function of the number of top matches (i.e. number of retrieved images). 60
4.16 The performance comparison of descriptors over the Emphysema-CT, NEMA-CT, TCIA-CT and EXACT09-CT databases using the D1 distance measure, in terms of the ARP and ANMRR as a function of the number of top matches (i.e. number of retrieved images). 62
4.17 The performance of the LDEP, LBDP, LBDISP and LWP descriptors in terms of ARP with different distance measures such as Euclidean, Cosine, Emd, Canberra, L1, D1 and Chi-square, when either 25 or 50 images are retrieved. 63
4.18 The results comparison for different levels of wavelet decomposition of the LWP descriptor in terms of the ARP vs ω over the (a) TCIA-CT, (b) EXACT09-CT, and (c) NEMA-CT databases. The values of N and R are 8 and 1 in this analysis, so the possible levels of wavelet decomposition are 1, 2 and 3. 64
4.19 Retrieval results from the TCIA-CT database using LBP (1st row), LTP (2nd row), CSLBP (3rd row), CSLTP (4th row), LDP (5th row), LTrP (6th row), LTCoP (7th row), LMeP (8th row), SS3DLTP (9th row), LDEP (10th row), LBDP (11th row), LBDISP (12th row) and LWP (13th row) feature vectors. The first image in each row is the query image and the rest are the retrieved images in order of decreasing similarity from left to right. Note that the images in red rectangles are the false positives. 65
5.1 An illustration of computing the local colour occurrence binary pattern for (a) D = 2, and (b) D = 1. The number of shades is considered as 5 in this example. 70
5.2 Five structure elements containing (a) only one, (b) two consecutive, (c) two non-consecutive, (d) three consecutive, and (e) four consecutive active elements. 72
5.3 Six patterns derived from the five structure elements, representing (a) no structure, (b) type 1 structure, (c) type 2 structure, (d) type 3 structure, (e) type 4 structure, and (f) type 5 structure. 72
5.4 Extraction of the structure map for each quantized shade; $sp_\rho$ represents the pattern over the $\rho^{th}$ quantized shade for a particular pixel. In this example, the number of quantized color shades is set to 4. 72
5.5 Three examples to illustrate the computation of RSHD patterns over each quantized shade for a particular pixel; in this example also, the number of quantized shades is set to 4. 73
5.6 The local neighbors $I_t^n(x, y)$ of a center pixel $I_t(x, y)$ in the $t^{th}$ channel in the polar coordinate system for $n \in [1, N]$ and $t \in [1, c]$. 75
5.7 (a) RGB image, (b) R channel, (c) G channel, (d) B channel, (e) LBP map over the R channel, (f) LBP map over the G channel, (g) LBP map over the B channel, (h-k) 4 output channels of the adder, and (l-s) 8 output channels of the decoder using the 3 input LBP maps of the R, G and B channels. 76
5.8 The flowchart of the computation of the multichannel adder based local binary pattern feature vector (i.e. maLBP) and the multichannel decoder based local binary pattern feature vector (i.e. mdLBP) of an image from its Red (R), Green (G) and Blue (B) channels. 77
5.9 The performance of the LCOD, RSHD, maLBP and mdLBP descriptors with varying distances over the Corel-1k, Corel-10k, MIT-VisTex and STex-512S databases. 80
5.10 The result comparison of the LCOD, RSHD, maLBP and mdLBP descriptors with the LBP, cLBP, mscLBP, mCENTRIST, SEH and CDH descriptors over the (a) Corel-1k, (b) Corel-10k, (c) MIT-VisTex and (d) STex-512S databases in terms of the ARP and ANMRR. 81
5.11 The results of the LCOD, RSHD, maLBP and mdLBP descriptors over each category of the Corel-1k database in terms of the average precision. 82
5.12 Top 10 retrieved images using each descriptor for a query image from the Corel-1k database. Note that the 10 rows correspond to the different descriptors, LBP (1st row), cLBP (2nd row), mscLBP (3rd row), mCENTRIST (4th row), SEH (5th row), CDH (6th row), LCOD (7th row), RSHD (8th row), maLBP (9th row) and mdLBP (10th row), and the 10 last columns correspond to the 10 retrieved images in decreasing order of similarity for the query image in the first column. 82
5.13 Top 10 retrieved images using each descriptor for a query image from the MIT-VisTex database. Note that the 10 rows correspond to the different descriptors, LBP (1st row), cLBP (2nd row), mscLBP (3rd row), mCENTRIST (4th row), SEH (5th row), CDH (6th row), LCOD (7th row), RSHD (8th row), maLBP (9th row) and mdLBP (10th row), and the 10 last columns correspond to the 10 retrieved images in decreasing order of similarity for the query image in the first column. 83
5.14 The results comparison of different descriptors over the (a) Corel-1k-Rotate, (b) Corel-1k-Scale and (c) Corel-1k-Illumination databases. 84
6.1 Work flow of illumination compensation in the $R_{IC}G_{IC}B_{IC}$ color space. 88
6.2 Visualization of the illumination compensation steps, (1st row) original images having uniform illumination differences, (2nd row) intensity subtracted images, and (3rd row) contrast stretched images. This example image is taken from the Phos database [178]. 90
6.3 Visualization of the illumination compensation steps, (a) original images having non-uniform illumination differences, (b) intensity subtracted images, and (c) contrast stretched images. This example image is taken from the Phos database [178]. 91
6.4 Image retrieval using the illumination compensation mechanism. 91
6.5 Sample images from the (a) Corel-uniform and (b) Corel-non-uniform datasets. 92
6.6 Results in terms of ARP and ARR curves for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Phos illumination benchmark dataset. 94
6.7 Retrieval results using each feature for a query image of the Phos dataset. 95
6.8 ARP and ARR curves for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Corel-uniform illumination synthesized dataset. 96
6.9 Retrieval results using each feature from the Corel-uniform dataset. 97
6.10 ARP and ARR results for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Corel-non-uniform synthesized dataset. 98
6.11 Image retrieval results over the Corel-non-uniform dataset using the dCDH distance. 99
6.12 Comparison between the proposed illumination compensation method and existing illumination compensation methods in terms of the retrieval performance over the Corel-non-uniform dataset using the GCH, CCV, BIC, CDH, SEH and SSLBP feature descriptors. 100
6.13 Performance evaluation of the proposed illumination compensation approach using the illumination invariant feature descriptors LBP, CSLBP, LIOP, LTP, CSLTP and HOG over the Corel-non-uniform dataset. 100
6.14 Performance evaluation of the proposed illumination compensation approach using the (a) RSHD, (b) LCOD, (c) maLBP and (d) mdLBP descriptors proposed in Chapter 5 over the Phos illumination database. 102
7.1 The working framework of the proposed Content Based Image Retrieval (CBIR) system using Bag-of-Filters and Local Binary Pattern (LBP). 104
7.2 The five types of filters used in this chapter as the Bag-of-Filters, (a) average filter, i.e. $F_1$, (b) horizontal-vertical difference filter, i.e. $F_2$, (c) diagonal filter, i.e. $F_3$, (d) Sobel edge in the vertical direction, i.e. $F_4$, and (e) Sobel edge in the horizontal direction, i.e. $F_5$. 105
7.3 (a) An example image, (b-f) the images obtained after applying the 5 filters with mask $F_a|_{a=1,2,3,4,5}$ respectively over the example image of (a). 105
7.4 The performance comparison of the BoF-LBP descriptor with the LBP, SLBP, SOBEL-LBP, LTP, LDP, LTrP, and SS-3D-LTP descriptors over the Corel-1k database using (a) ARP (%) and (b) ARR (%). 107
7.5 The performance comparison of the proposed descriptor with other descriptors over the Corel-10k database using (a) ARP (%) and (b) ARR (%). 108
7.6 Performance comparison using ARP (%) over the (a) MIT-VisTex and (b) STex-512S databases. 108
7.7 The proposed framework for NIR face retrieval using SVD and local descriptors. 110
7.8 Illustration of the sub-band formation (i.e. S, U, V and D sub-bands) from the SVD factorization of any input PL using an example of size 4×4. 111
7.9 The performance comparison of the LBP, SLBP, DBC and LGBP descriptors in the 1st, 2nd, 3rd and 4th column respectively over different sub-bands of the SVD in terms of ARP (%) vs $\eta$ (in the 1st row) and ARR (%) vs $\eta$ (in the 2nd row) over the PolyU-NIR face database. 113
7.10 The performance comparison of the LBP, SLBP, DBC and LGBP descriptors in the 1st, 2nd, 3rd and 4th column respectively over different sub-bands of the SVD in terms of ARP (%) vs $\eta$ (in the 1st row) and ARR (%) vs $\eta$ (in the 2nd row) over the CASIA-NIR face database. 114
7.11 Comparison among different levels of SVD decomposition using the S sub-band in conjunction with different descriptors over the (a) PolyU-NIR and (b) CASIA-NIR face databases in terms of the F (%) and ANMRR (%). 114
7.12 Retrieval results from the CASIA-NIR face database using the LBP (1st row), SVD-S-LBP (2nd row), SLBP (3rd row), SVD-S-SLBP (4th row), DBC (5th row), SVD-S-DBC (6th row), LGBP (7th row) and SVD-S-LGBP (8th row) descriptors. The first image in each row is the query face and the rest are the retrieved faces. The faces in rectangles are the false positives. 114
7.13 Retrieval results from the PolyU-NIR face database using the LBP (1st row), SVD-S-LBP (2nd row), SLBP (3rd row), SVD-S-SLBP (4th row), DBC (5th row), SVD-S-DBC (6th row), LGBP (7th row) and SVD-S-LGBP (8th row) descriptors. The first image in each row is the query face and the rest are the retrieved faces. The faces in rectangles are the false positives. 115

List of Acronyms

ANMRR  Average Normalized Modified Retrieval Rank
AP  Average Precision
AR  Average Recall
ARP  Average Retrieval Precision
ARR  Average Retrieval Rate
BIC  Border-Interior Classification
BoF  Bag-of-Filters
CBIR  Content Based Image Retrieval
CCV  Color Coherence Vector
CDH  Color Difference Histogram
cLBP  Color Local Binary Pattern
CLBP  Completed Local Binary Pattern
CSLBP  Centre Symmetric Local Binary Pattern
CSLTP  Centre Symmetric Local Ternary Pattern
CT  Computed Tomography
DBC  Directional Binary Code
DCT  Discrete Cosine Transform
DICOM  Digital Imaging and Communications In Medicine
Emd  Earth Mover's Distance
EXACT  Extraction of Airways from CT
GCH  Global Color Histogram
GLOH  Gradient Localization Oriented Histogram
HOG  Histogram of Oriented Gradients
HRI  Histogram of Relative Intensities
HSI  Hue Saturation Intensity Color Space
IOLD  Interleaved Intensity Order Based Local Descriptor
IRFET  Illumination Robust Feature Extraction Transform
LBDISP  Local Bit-plane Dissimilarity Pattern
LBDP  Local Bit-plane Decoded Pattern
LBP  Local Binary Pattern
LBPu2  Uniform Local Binary Pattern
LBPriu2  Rotation Invariant Uniform Local Binary Pattern
LCOD  Local Color Occurrence Descriptor
LDEP  Local Diagonal Extrema Pattern
LDP  Local Derivative Pattern
LEBP  Local Edge Binary Pattern
LFRs  Local Feature Regions
LGBP  Local Gabor Binary Pattern
LGH  Logarithm Gradient Histogram
LIOP  Local Intensity Order Pattern
LMeP  Local Mesh Pattern
LSB  Least Significant Bit
LTCoP  Local Ternary Co-occurrence Pattern
LTP  Local Ternary Pattern
LTrP  Local Tetra Pattern
LWP  Local Wavelet Pattern
maLBP  Multichannel Adder Based Local Binary Pattern
mdLBP  Multichannel Decoder Based Local Binary Pattern
MRF  Markov Random Field
MRI  Magnetic Resonance Imaging
MSB  Most Significant Bit
mscLBP  Multi-Scale Color Local Binary Pattern
NEMA  National Electrical Manufacturers Association
NIR  Near-InfraRed
NNDR  Nearest Neighbor Distance Ratio
OASIS  Open Access Series of Imaging Studies
OSID  Ordinal Spatial Intensity Distribution
PET  Positron-Emission-Tomography
PS+HE  Plane Subtraction and Histogram Equalization
RGB  Red Green Blue Color Space
RIFT  Rotation Invariant Feature Transform
RSHD  Rotation and Scale Invariant Hybrid Descriptor
SIFT  Scale Invariant Feature Transform
SEH  Structure Element Histogram
SLBP  Semi Structure Local Binary Pattern
SOBEL-LBP  Sobel Local Binary Pattern
SQI  Self-Quotient Image
SS-3D-LTP  Spherical Symmetric 3D Local Ternary Pattern
SSLBP  Square Symmetric Local Binary Pattern
SURF  Speeded Up Robust Features
SVD  Singular Value Decomposition
TCIA  The Cancer Image Archive

Chapter 1

Introduction

Human beings perceive their environment through audio, visual, heat, smell and touch stimuli, and react accordingly. Among these, the visual system is the key component through which the human brain collects most of its environmental information. Computer vision is dedicated to designing artificial systems that sense and understand their surroundings. It is likely to improve the quality of life and benefit society in several ways: it can be used for disease diagnosis to improve decision making in health care, to retrieve information from image databases, for face, facial expression, gesture and action recognition, for surveillance to improve security, to facilitate monitoring in agriculture, and much more. In most computer vision problems, the information contained in one image or part of an image must be matched with the information contained in another image or part of an image [1]. The basic aim of image matching is to automatically recognize whether two digital images contain a similar scene, on the basis of features derived from them [2]. The challenges in image matching arise from potential photometric and geometric transformations, such as scale change, rotation, compression, blur, affine change, viewpoint variations, and illumination changes [1]. It is very difficult to match images of the same object taken under different imaging conditions. To facilitate image matching, image descriptors came into existence, and images are now matched using their descriptors [1-5]. This chapter is organized in the following manner: Section 1.1 provides the motivation behind designing robust image feature descriptors; Section 1.2 introduces image descriptors and the matching criteria to be used; Section 1.3 discusses Content Based Image Retrieval (CBIR), an important application of image descriptors; Section 1.4 highlights the objectives of this thesis; and finally, Section 1.5 outlines the content of this thesis.

1.1. Motivation

Most computer vision problems demand the detection and classification of visual features. Are these two images representing the same scene captured under different conditions? Does a given biomedical image show some disease? Which type of object is present in this image? Is a human face present in this image? Are defects present in images of fruits, vegetables and plants? In order to automatically detect and decide on the characteristics of images and videos in real time, image descriptors that are both highly discriminative and efficient need to be investigated. Bela Julesz [3] did revolutionary work in 1962 on describing images using appearance based features. A great many image descriptors have been reported since then, obtaining good results on several computer vision problems such as matching, retrieval, object classification, and face recognition. Real life images, even of the same scene at different times, exhibit a tremendous amount of variation, which makes computer vision problems more and more challenging and requires adaptive solutions. Rotation, scaling, viewpoint change, non-rigid deformations, occlusion, background clutter, lighting variations, blurring, noise, etc. are the major types of changes present in real life images. Over the last few years, researchers have been trying to reduce inter-class similarity and intra-class dissimilarity by using machine learning techniques. Machine learning techniques take the feature descriptors of a training database as input and try to learn the intra- and inter-class patterns. If the descriptors are not good, then the learned patterns will not be accurate enough to be used for classification. Thus, the goodness of the descriptors is ultimately the backbone of most computer vision systems.

The above issues are transformed into the problem of image similarity: are two given images, or parts of images, similar or not? Regional or global descriptors are the most appropriate tools to solve such matching problems. Matching can be done in many ways, and the simplest way is to compare the raw intensity values of the pixels. It can be observed in Fig. 1.1 that the intensity values of two corresponding windows are quite different due to the variation in orientation, scale and color of the similar dinosaur object. Hence, comparison using descriptors becomes necessary. Basically, image feature description provides the means to compare two images/regions: to match two images/regions, their descriptors are matched. The problem of image matching becomes more challenging in the presence of effects like rotation, scaling, illumination change, etc. In Fig. 1.2, the two images on the left-hand side differ by a rotation; if we try to match these two images pixel by pixel using raw intensity values, it is certainly not possible to decide that they contain the same scene.

Fig. 1.1. Comparing pixels of two regions (images are taken from Corel-database [6])


Fig. 1.2. Comparing using descriptor function (images are taken from [7] and the patterns are computed using our RSHD descriptor explained in Chapter 5)

The RSHD patterns (refer to Chapter 5 for RSHD) computed over these images are shown on the right-hand side. It can be observed that the RSHD patterns of the two images on the left are quite similar, so a correct decision can be made from their comparison. Thus, robust matching is facilitated by descriptors that tolerate some degree of transformation between the images being matched.

1.2. Image Feature Descriptors

In the area of computer vision, image descriptors or visual descriptors are characterizations of the information hidden in images, described by basic characteristics such as motion, texture, color and shape. A descriptor must be highly distinctive, to differentiate one type of object from another, and sufficiently robust, to facilitate matching in the presence of noise and various geometric and photometric transformations [1]. Over the last few decades, a rapid growth has been observed in the amount of visual content in digital form, due to the widespread and diversified use of the internet and new technologies. In order to index, retrieve and categorize multimedia content efficiently and accurately, it is extremely important to investigate systems which characterize the multimedia information as per the application requirement. Appearance based image descriptors are the key medium that represents the information carried by any image/region/object and allows effective and accurate decision making. Fig. 1.3 depicts the formation of descriptors, which can be summarized as follows: a) extract features from the image as small regions, b) describe each region using a feature descriptor, and c) use the descriptors in the application for comparison, training, classification, etc. The main problems associated with the development of effective image descriptors are categorized in the form of the following three questions [5]: a) Where to compute the descriptors? b) How to compute the descriptors? c) How to compare the descriptors?


Fig. 1.3. Descriptor construction from interest regions of an image [5]

1.2.1. Where to Compute the Descriptors?

The descriptors are computed over interest regions, which can be extracted using three approaches: Grid [8], Key-Points [9-15], and Global [16-25]. Fig. 1.4 shows feature detection using the grid, key-point and global approaches. In the grid based approach, the image is divided into several regions using a rectangular grid, each grid cell representing a region, and the descriptor is computed over each grid cell separately (a minimal sketch of this appears after the property list below). In the key-point approach, interest points are extracted in the image and the descriptors are computed in the neighborhood of each interest point. In the global approach, the image itself is treated as a single region and one descriptor is formed over it. The dimension of grid and key-point region based descriptors is generally high, as both compute descriptors over multiple regions of the image, whereas the dimension of global region based descriptors is generally low, as a single descriptor is computed. Global region based descriptors are mostly used for large image databases. Good features should have the following properties [9, 26-27]:
Locality: The features must be local, to reduce the probability of occlusion caused by view-dependent image deformations.
Pose invariance: The orientation, scale, etc. must be automatically identified and selected by the feature-point detector.
Distinctiveness: A low false positive rate and a high detection rate must be obtained by the feature descriptors.
Repeatability: Under different transformations, the same point must be detected.
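To make the grid based extraction concrete, here is a minimal sketch (an illustration only, not the thesis's implementation; it assumes a grayscale image stored as a 2-D NumPy array and uses a plain intensity histogram as the per-cell descriptor):

```python
import numpy as np

def grid_descriptor(image, rows=4, cols=4, bins=16):
    """Grid based extraction: split the image into rows x cols cells,
    describe each cell by a normalized intensity histogram, and
    concatenate the per-cell descriptors into one feature vector."""
    h, w = image.shape
    cell_h, cell_w = h // rows, w // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * cell_h:(r + 1) * cell_h,
                         c * cell_w:(c + 1) * cell_w]
            hist, _ = np.histogram(cell, bins=bins, range=(0, 256))
            features.append(hist / max(hist.sum(), 1))  # per-cell normalization
    return np.concatenate(features)  # dimension = rows * cols * bins
```

As the code makes explicit, the grid descriptor's dimension grows with the number of cells, which is why grid (and key-point) based descriptors are higher dimensional than a single global descriptor.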

1.2.2. How to Compute the Descriptors?

A descriptor can be computed in several ways. The construction of the descriptor depends entirely upon the application and its requirements. Different applications require different invariances and therefore different descriptors. For example, the images in Fig. 1.5 are similar if we consider only the shape of the image, whereas they are dissimilar if we consider the color distribution of the image.


Fig. 1.4. Extracting regions using (a) Grid, (b) Key-points, and (c) Global approach [5]

Fig. 1.5. Similarity depends upon the application requirements

Color, texture and shape are the main characteristics of most images and also the basic types of features used to describe an image [24]. The most basic quality of visual information is color. The RGB histogram, opponent histogram, hue histogram, rg histogram, transformed color distribution, color moments, color moment invariants, etc. are basic types of color features [28]. Texture is also an essential characteristic for characterizing image regions; texture features typically measure the mean, variance, energy, correlation, entropy, contrast, homogeneity, cluster shade, etc. within a region [29]. For the semantic description of the content of an image, shape representation plays an important role. Fourier descriptors, the curvature scale space descriptor, the angular radial transform, image moments, etc. are important shape based image feature descriptors [30]. Several descriptors have been proposed for different applications using combinations of different color, texture and shape features. Some descriptors are also investigated in this thesis for different applications, as described in the remaining chapters. A small sketch of the simplest of these features, the color histogram, follows.
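The following minimal sketch (assuming an H×W×3 uint8 RGB image as a NumPy array; this is a generic joint RGB histogram, not one of the descriptors proposed in this thesis) quantizes each channel and counts the quantized colors:

```python
import numpy as np

def rgb_histogram(image, bins=8):
    """Joint RGB histogram: quantize each channel into `bins` levels,
    combine the three levels into one color code per pixel, and
    return the normalized histogram of codes as the descriptor."""
    quant = (image.astype(np.int32) * bins) // 256  # per-channel level in [0, bins)
    codes = (quant[..., 0] * bins + quant[..., 1]) * bins + quant[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()  # normalized; dimension = bins^3
```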

1.2.3. How to Compare the Descriptors?

Generally, an image descriptor is proposed along with a distance measure to match the descriptors. The most commonly used distance measure reported in the literature for computing the dissimilarity between descriptors is the Euclidean distance. If the Euclidean distance between two descriptors is small, the two descriptors are more likely to be similar, and hence the corresponding images are also similar [31]. Other common distances are L1, Earth Mover's Distance (Emd), Cosine, Canberra, D1, and Chi-square [32-35]. The fundamental task of a similarity measure is to find the dissimilarity between the descriptors of two images. Let the descriptors of two images $A$ and $B$ be denoted as $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{b} = (b_1, b_2, \ldots, b_n)$ respectively, where $n$ is the dimension of the feature vectors. The different distances are defined as follows:

Euclidean Distance
$$d_{Euclidean}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

L1 Distance
$$d_{L1}(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} |a_i - b_i|$$

Cosine Distance
$$d_{Cosine}(\mathbf{a}, \mathbf{b}) = 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

Emd Distance
$$d_{Emd}(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \left|\mathrm{cdf}_i(\mathbf{a}) - \mathrm{cdf}_i(\mathbf{b})\right|$$
where cdf is the cumulative distribution function computed by the cumulative sum.

Canberra Distance
$$d_{Canberra}(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \frac{|a_i - b_i|}{|a_i| + |b_i|}$$

The D1 Distance
$$d_{D1}(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \frac{|a_i - b_i|}{1 + a_i + b_i}$$

Chi-square Distance
$$d_{\chi^2}(\mathbf{a}, \mathbf{b}) = \frac{1}{2} \sum_{i=1}^{n} \frac{(a_i - b_i)^2}{a_i + b_i}$$
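The following sketch translates the formulas above into Python/NumPy (an illustration only; a small epsilon is added to guard against division by zero, which the formulas leave implicit):

```python
import numpy as np

def descriptor_distance(a, b, kind="euclidean"):
    """Dissimilarity between two equal-length feature vectors a and b,
    following the definitions given above."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    eps = 1e-12  # avoids division by zero for empty histogram bins
    if kind == "euclidean":
        return np.sqrt(np.sum((a - b) ** 2))
    if kind == "l1":
        return np.sum(np.abs(a - b))
    if kind == "cosine":
        return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    if kind == "emd":  # for 1-D histograms, Emd is the L1 distance of the cdfs
        return np.sum(np.abs(np.cumsum(a) - np.cumsum(b)))
    if kind == "canberra":
        return np.sum(np.abs(a - b) / (np.abs(a) + np.abs(b) + eps))
    if kind == "d1":
        return np.sum(np.abs(a - b) / (1.0 + a + b))
    if kind == "chi-square":
        return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
    raise ValueError("unknown distance: " + kind)
```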

The above mentioned distances are used in this thesis to find the dissimilarity between two regions/images for the local region matching and image retrieval problems. The suitability of these distances for different descriptors is also investigated in the coming chapters for different applications.

1.3. CBIR: An Application of Image Descriptors

Image descriptors are used in several applications such as image indexing and retrieval, biometric verification and identification, object recognition, motion tracking, 3D reconstruction, etc. Content-based image retrieval (CBIR) is one such technique, and it has gained extensive attention in the era of information technology [36]. CBIR provides a solution for searching for similar images in large databases. It searches for images based on their content, such as color, texture, shape, etc., rather than metadata such as tags and keywords. Any type of database can be used for CBIR, such as natural scene, texture, biomedical and face databases. Recently, several feature descriptors have been proposed for CBIR over complex datasets, i.e. biomedical images [32-34, 37], natural images [20, 22-24], texture images [20, 38-40], etc. As the internet and technology cause a tremendous increase in image and video databases, it is highly desirable to design more discriminative, robust and efficient retrieval algorithms. A minimal sketch of the retrieval loop itself appears below.
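To show where descriptors and distances fit together, here is the core retrieval loop as a minimal sketch (a generic illustration, not the thesis's experimental code; `distance` can be any dissimilarity function, such as the `descriptor_distance` sketch of Section 1.2.3):

```python
import numpy as np

def retrieve(query_desc, database_descs, distance, top_k=10):
    """Rank database images by descriptor dissimilarity to the query
    and return the indices of the top_k most similar images."""
    scores = [distance(query_desc, d) for d in database_descs]
    return np.argsort(scores)[:top_k]  # smallest dissimilarity first
```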

1.4. Objective of the Thesis

The objectives of this thesis, which mainly cover distinctive, robust and low dimensional image descriptors for computer vision problems, are as follows:

- To carry out a literature survey and analysis of various image feature descriptors for computer vision applications.
- To construct distinctive descriptors for biomedical, natural, textural and facial image retrieval.
- To construct descriptors robust against scaling, rotation, illumination, blur, compression, and viewpoint variations.
- To construct low dimensional descriptors to minimize the matching time.
- To construct multichannel descriptors for color natural and texture image retrieval.

1.5. Outline of the Thesis

We have organized this thesis in the following chapters, covering the literature survey, problem formulation, proposed methodologies, experimental results and observations, and conclusions:

Chapter 2: Literature Review
In this chapter, we summarized the state-of-the-art findings on discriminative, efficient and robust image feature description and presented a survey of efforts made in the recent past by various researchers to address this problem.

Chapter 3: Interleaved Intensity Order Based Local Descriptor
In this chapter, we described a local image descriptor for region description based on interleaved intensity orders, which is inherently invariant to rotation and monotonic intensity change. We demonstrated the effectiveness of the descriptor on the region matching problem under various geometric and photometric transformation scenarios.

Chapter 4: Local Gray Scale Image Descriptors for Biomedical Image Retrieval
In this chapter, we developed local image descriptors that utilize the relationship between the center and neighboring pixels to describe gray scale images, and applied them to biomedical image retrieval, mainly over MRI and CT image databases.

Chapter 5: Local Color Image Descriptors for Natural & Texture Image Retrieval
In this chapter, we described color images from multichannel information in multiple ways, to enhance their discriminative ability as well as to increase their robustness to image rotation and scaling. We used natural and textural color databases for the performance evaluation of these color based descriptors in the image retrieval framework.

Chapter 6: Brightness Invariant Image Retrieval Using Illumination Compensation
In this chapter, we developed a framework for the illumination compensation of color images under both uniform and non-uniform illumination, and applied it to brightness invariant image retrieval.

Chapter 7: Boosting the Performance of Local Descriptors using BoF and SVD
In this chapter, we investigated the effect of Bag-of-Filters and singular value decomposition on local image descriptors such as the local binary pattern, and applied them to image retrieval over natural, texture and near-infrared face images.

Chapter 8: Conclusions and Future Scopes
In this chapter, we provided the main findings of this thesis and also identified the scope for further research.


Chapter 2

Literature Review

This chapter presents an extensive literature survey of image feature descriptors in many application areas of computer vision. We covered image descriptors designed for gray scale as well as color images, including natural, texture, biomedical and face images. One aspect of this chapter is also to explore the distinctiveness, discriminativeness, robustness and effectiveness of the existing state-of-the-art descriptors. We also surveyed various pre-processing and post-processing approaches employed to improve the quality of the descriptors, mainly for content based image retrieval over gray scale natural, texture and face databases. This chapter is organized in the following manner: we review the region-based local descriptors, mainly with gradient distribution and order properties, in Section 2.1; we review the local gray scale descriptors, primarily for biomedical images, in Section 2.2; we survey the local color descriptors for natural and textural images in Section 2.3; we survey the brightness robust local descriptors in Section 2.4; we review the pre- and post-processing approaches for boosting local descriptors in Section 2.5; we list the research gaps identified from the literature survey in Section 2.6; and finally, we summarize this chapter in Section 2.7.

2.1. Region Based Local Descriptor

Computer vision researchers have widely studied local feature descriptors constructed over detected interest regions. In recent years, local features have been frequently used in a large number of vision problems such as 3D reconstruction, panoramic stitching, object recognition, image classification, facial expression recognition, and structure from motion [9, 41-45]. The main focus while describing local image features is to enhance distinctiveness while maintaining robustness to various image transformations. The basic goal is to first find affine invariant interest regions and then extract a feature pattern descriptor for each of them. The Hessian-Affine and Harris-Affine [46-47] detectors have been widely used for the extraction of interest regions. After detecting a region of interest, feature descriptors are constructed over it in order to facilitate region matching. In the literature, many feature descriptors have been proposed alongside the increasing interest in region detectors [48], and it has been observed that the performance of distribution based image descriptors is significantly better than that of descriptors based on the spin image, shape context, and steerable filters [49-51]. The distributions of gradients are widely used by distribution-based methods. For example, a histogram of gradient orientations over 4×4 location cells is computed by the SIFT descriptor [9]. Many other local image feature descriptors similar to SIFT, such as GLOH, SURF, and DAISY, have been introduced in the literature, encouraged by the success of the SIFT descriptor [52-55]. Some recent works involve Gaussian shapes as a feature descriptor [56], a descriptor for face recognition applications under image blur [57], and the use of alternate Hough and inverted Hough transforms for robust feature matching [58]. Theoretically rotation invariant feature descriptors (i.e. the Rotation Invariant Feature Transform (RIFT) and the spin image [50]) also exist in the literature, but these descriptors discard spatial information and become less distinctive. Exact orders have been utilized for image feature description by Kim et al. [14]; they combined exact global and local orders to generate the EOD descriptor. Orthogonal LBPs are combined with color information to describe image regions in [59]. Distribution-based descriptors are partially or fully robust to many geometric image transformations, such as rotation, scale, and occlusion, but cannot handle more complex illumination changes. To ease this problem, some researchers have proposed to consider the orders of local intensities rather than the raw intensity values, because invariance to monotonic intensity change is obtained by using the order of the intensity values. Some common order based descriptors are OSID, LBP, uniform LBP, CS-LBP, HRI, CSLTP and LIOP [11-12, 16-17, 60-62]. The local binary pattern (LBP) creates a pattern for every pixel based on ordering information [62]; a minimal sketch of this encoding follows this paragraph. The major benefits of LBP are its computational simplicity and its invariance to illumination changes, but LBP has some drawbacks, such as a high dimensional feature and sensitivity to Gaussian noise in uniform areas. Observing that a small subset of the LBPs contains most of the textural information, the uniform LBP was proposed in [16]. The CS-LBP reduces the dimension of LBP by comparing only center-symmetric pixel intensity differences [17]. The CS-LTP descriptor was introduced by considering only diagonal comparisons among the neighboring points [11]. HRI and CS-LTP contain complementary information and are combined to construct a single HRI-CSLTP descriptor [11]. Recently, Wang et al. [12] proposed the LIOP descriptor to encode the intensity order pattern among the neighbors located at a fixed radius from a given pixel. They assigned a unique order to each neighbor, partitioned the whole patch into different regions according to the global ordering of each pixel in the patch, calculated the LIOP in each region, and concatenated them in order to obtain a single pattern.
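To make the ordering based encoding concrete, here is a minimal LBP sketch (an illustration of the standard LBP [62] using the 8 axis-aligned neighbors at radius 1 on a grayscale NumPy image; the published operator samples neighbors on a circle and has uniform and rotation invariant variants not shown here):

```python
import numpy as np

def lbp_histogram(image):
    """8-neighbor LBP: threshold each pixel's neighbors at the center
    value, pack the 8 comparison bits into a code in [0, 255], and
    describe the image by the normalized histogram of codes."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    # 8 neighbors, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Because the codes depend only on the sign of intensity differences, any monotonic change of illumination leaves the histogram unchanged, which is the invariance noted above.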


The main problem associated with LIOP is its exponentially increasing dimension with an increase in the number of neighboring pixels considered for descriptor construction. LIOP utilizes only 4 neighbors and has shown very promising performance under several imaging conditions; however, it was not tested by its inventors for more than 4 neighbors.

2.2. Local Gray Scale Descriptors

In the field of medicine, images play a crucial role in data management, disease diagnosis and training. Various types of images are now generated by medical imaging devices such as computed tomography (CT), magnetic resonance imaging (MRI), visible and nuclear imaging, etc., to capture the characteristics of body parts [63]. These images serve as an aid to diagnosis. However, due to the rapid, day-by-day increase in the number of medical images, patient diagnosis in medical institutions and hospitals is becoming more challenging and requires more accurate and efficient image searching, indexing and retrieval methods. Content-based image indexing and retrieval is continuously evolving to combat this problem on the basis of the digital content of the image [64], such as color, texture, shape, structure, etc. Extensive and comprehensive literature surveys on content based image retrieval (CBIR) are presented in [36, 65]. Medical image retrieval systems are used mostly by physicians, who are experts, to point out the disorder present in an image by retrieving the most similar images, with their associated information, from related reference images. Many medical image retrieval systems have been presented by various researchers in the published literature [66-72]. Muller et al. reviewed medical CBIR approaches on the basis of their clinical benefits [73]. Feature vectors are extracted from each image in order to facilitate image retrieval, and the feature vector of a query image is compared with the feature vectors of the database images. The performance and efficiency of any CBIR system depend heavily upon the feature vectors. The feature vectors used in recent retrieval and classification systems exploit the visual information of the image such as shape [74-75], texture [32-33], edges [34, 76], color histograms [77-78], etc.

Texture based image descriptors have been widely used in the field of pattern recognition to capture the fine details of the image. Ojala et al. [62] introduced the local binary pattern (LBP) for texture classification. The LBP operator became popular due to its reduced computational complexity and enhanced performance in several applications such as face recognition [79], analysis of facial paralysis [80], analysis of pulmonary emphysema [81], etc. Several other LBP variants [16-20, 82-84] have also been proposed for texture representation, in view of the high success of LBP. The center symmetric local binary pattern (CSLBP) was investigated to reduce the dimension of LBP for local region matching [17]. The local ternary pattern (LTP) was introduced as a generalization of LBP for face recognition under varying lighting conditions [18]; a small sketch of LTP is given at the end of this section. These methods are generally illumination invariant. Peng et al. extracted texture cues in chest CT images on the basis of the uniformity of structure and brightness in the image [85]; they depicted the structure and brightness in the image using an extended rotation invariant local binary pattern and differences in gradient orientations. Region of interest retrieval in brain MR images was proposed by Unay et al. [37] on the basis of the local structure present in the image. SVM-based feature selection is applied over textural features for tumor recognition in wireless capsule endoscopy images [86]. Felipe et al. used co-occurrence based gradient texture features for tissue identification in a CBIR system [87]. To reduce the memory required for image storage, a physiological kinetic feature was presented by Cai et al. [88] for positron-emission-tomography (PET) image retrieval. Some methods designed for medical image retrieval using distance metric/similarity learning are described in [89-91]. Wavelet based features have also been presented by some researchers in medical CBIR systems [92-93]; these methods mainly apply the wavelet transformation over the images globally [94] (i.e. a 2-D wavelet transformation of the whole image). The local feature descriptors presented in the published literature have utilized the relationship of a referenced pixel with its neighboring pixels [16, 18, 32]. Some approaches have also tried to utilize the relationships among the neighboring pixels, with some success, but at the expense of a high dimensionality which is generally more time consuming for image retrieval [33, 83].
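As an example of these variants, here is a minimal LTP sketch (following the usual three-valued formulation of [18] with threshold t, split into 'upper' and 'lower' binary patterns; a grayscale NumPy image and the 8 axis-aligned neighbors of the LBP sketch in Section 2.1 are assumed):

```python
import numpy as np

def ltp_histograms(image, t=5):
    """Local ternary pattern: neighbors more than t above the center
    code as +1, more than t below as -1, else 0; the ternary code is
    split into two LBP-like binary codes whose histograms are
    concatenated."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        upper |= (n > center + t).astype(np.int32) << bit
        lower |= (n < center - t).astype(np.int32) << bit
    hu = np.bincount(upper.ravel(), minlength=256).astype(float)
    hl = np.bincount(lower.ravel(), minlength=256).astype(float)
    return np.concatenate([hu / hu.sum(), hl / hl.sum()])
```

The dead zone of width 2t around the center value is what makes LTP less sensitive than LBP to small noise in near-uniform regions.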

2.3. Color Image Descriptors

The performance of any image retrieval system heavily depends upon the image feature descriptors being matched [36]. Color, texture, shape, gradient, etc. are the basic types of features used to describe an image [36], and color and texture based image feature descriptors are very common in the research community. A performance evaluation of color descriptors such as Color SIFT and Opponent SIFT for object and scene recognition is made in [28]. These descriptors first find regions in the image using region detectors, then compute a descriptor over each region, and finally form the descriptor using a bag-of-words model. Researchers are also working to upgrade the bag-of-words model [95]. Another interesting descriptor is GIST, which is a holistic representation of features and has gained wide popularity due to its high discriminative ability [96-98]. Recently, local pattern based descriptors have been used as image feature descriptors [1, 16-19, 45, 79, 84, 99-101]. These approaches were introduced basically for gray images, in other words for only one channel, and performed well; but in real scenarios it is often the natural color images, having more than one channel, that must be characterized. To describe color images using local patterns, several researchers have adopted multichannel feature extraction approaches. These techniques can be classified into four categories, illustrated in Fig. 2.1.

[Fig. 2.1 appears here]
Fig. 2.1. Illustration of four types of multichannel feature extraction technique using two input channels: (a) each channel is quantized and merged to form a single channel, and the descriptor is computed over it; (b) the binary patterns extracted over each channel are concatenated to form a single binary pattern, and the histogram is computed over it, which obviously results in a high dimensional feature vector; (c) the histograms of the binary patterns extracted over each channel are concatenated to form the final feature vector, so the mutual information among the channels is not utilized; and (d) the binary patterns extracted over each channel are converted into other binary patterns using some processing, and finally the histograms of the generated binary patterns are concatenated to form the final feature vector (generalized versions are proposed in Chapter 5).


The first category, shown in Fig. 2.1(a), first quantizes each channel, then merges the quantized channels into a single channel and forms the feature vector over it. Typical examples of this category are the Structure Element Histogram (SEH) [24], the Color Difference Histogram (CDH) [102] and Color CENTRIST [103]. The major drawback of these methods is their reduced robustness against rotation and scaling of the image. The second category simply concatenates the binary patterns of all channels into a single one, as depicted in Fig. 2.1(b); the dimension of the final descriptor is very high and not suited to real time computer vision applications. In the third category (see Fig. 2.1(c)), the histograms are computed for each channel independently and finally aggregated to form the feature descriptor, for example in [104-108]; a sketch contrasting the second and third categories follows this paragraph. Heng et al. [104] computed multiple types of LBP patterns over multiple channels of the image, such as Cr, Cb, Gray, low-pass and high-pass channels, and concatenated the histograms of all LBPs to form a single feature descriptor; to reduce the dimension of the feature descriptor, they selected some features from the histograms of the LBPs using a shrink boost method. Choi et al. [105] computed the LBP histograms over each channel of a YIQ color image and finally concatenated them to form the final features. Zhu et al. [106] extracted multi-scale LBPs by varying the number of local neighbors and the radius of the local neighborhood over each channel of the image and concatenated all LBPs to construct a single descriptor. The histograms of multi-scale LBPs are also aggregated in [107], but over each channel of multiple color spaces such as RGB, HSV and YCbCr; to reduce the dimension of the descriptor, Principal Component Analysis is employed in [108]. A local color vector binary pattern was defined by Lee et al. for face recognition [108]: they computed the histogram of the color norm pattern (i.e., the LBP of color norm values) using the Y, I and Q channels as well as the histogram of the color angular pattern (i.e., the LBP of color angle values) using the Y and I channels, and finally concatenated these histograms to form the descriptor. The main problem with these approaches is that the discriminative ability is not much improved, because they do not utilize the inter-channel information of the images very efficiently and instead simply combine the color information. In order to overcome the drawback of the third category, the fourth category came into the picture, in which some bits of the binary patterns of two channels are transformed, after which the histogram computation and concatenation take place over the transformed binary patterns, as portrayed in Fig. 2.1(d). mCENTRIST [109] is an example of this category, where Xiao et al. [109] used at most two channels at a time for the transformation. In this method, a problem arises when more than two channels are to be modeled: the authors suggest applying the same mechanism over each combination of two channels, which in turn increases the computational cost of the descriptor and also increases its redundancy.
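To contrast the dimension behavior of categories (b) and (c), a small hypothetical sketch follows; histogram_fn stands for any per-channel binary-pattern histogram (such as the LBP histogram sketched in Section 2.2).

```python
import numpy as np

def category_c_descriptor(channels, histogram_fn):
    """Category (c): one binary-pattern histogram per channel,
    concatenated. The dimension grows linearly in the channel count
    (e.g. 3 channels x 256 bins = 768 for an 8-bit LBP)."""
    return np.concatenate([histogram_fn(ch) for ch in channels])

def category_b_dimension(num_channels, bits_per_channel=8):
    """Category (b): the per-channel binary patterns are concatenated
    into one long pattern *before* the histogram, so the histogram
    needs 2**(bits * channels) bins -- exponential in the channels."""
    return 2 ** (bits_per_channel * num_channels)

# Category (c) for RGB with an 8-bit LBP: 3 * 256 = 768 bins.
# Category (b) for the same setting: 2**24 = 16,777,216 bins.
print(category_b_dimension(3))
```

The exponential blow-up is why category (b) is unsuitable for real time use, while category (c) keeps the dimension manageable at the cost of discarding inter-channel information.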


2.4. Brightness Robust Image Descriptors

Illumination robust image matching and retrieval has been a bottleneck for the last two decades. In the published literature, various approaches have been introduced to tackle the image retrieval problem [110-114]. Most descriptors are based on low-level features such as color, texture, shape and sketch. Color is a very important visual cue for differentiating between two images and is a widely used low-level feature in CBIR systems because of its simplicity and invariance; the scale and rotation invariance of color histograms further favors this feature for image retrieval and classification. The simplest color encoding is the global color histogram (GCH), which represents the frequency of each color in the image [115]. Other color based feature descriptors are the Color Coherence Vector (CCV) [116] and the Color Difference Histogram (CDH) [102]: CCV encodes the color information into connected regions, whereas CDH represents the image using the color difference of two pixels for each color and edge orientation. Recently, color information has also been encoded in the form of histograms over local feature regions (LFRs) extracted using the multi-scale Harris-Laplace detector [117]. Texture is another important piece of visual information of the image and is widely adopted to represent images. Recently, the square symmetric local binary pattern (SSLBP), a variant of LBP, was introduced by Shi et al. [118]. The CLBP has been used for apple fruit disease recognition [119]. Texture features adopted for image retrieval include Border-Interior pixel Classification (BIC) [120] and the Structure Element Histogram (SEH) [24]. Angular information has also been used to represent the texture of the image [121-122]. Color and texture features are integrated by Wang et al. for CBIR [123]. Shape and sketch based descriptors have also been used to represent the features of the image in the form of patterns and have shown attractive results for image retrieval [124-126]. A study of shape based retrieval techniques was done by Shahabi and Safar [124]. Saavedra and Bustos applied the concept of local keyshapes, as an alternative to local keypoints, to encode the shapes of objects for sketch-based image retrieval [125]. Shape retrieval has also been incorporated into sketch synthesis to obtain information from both shape and sketch [126]. Shape and sketch based descriptors generally require edge detection and image segmentation, which limits their applicability. Other retrieval approaches include quadtree classified vector quantization [127], spatial relations [128], similarity refinement [129] and the fusion of short and long term learning [130]. Low-level feature descriptors represent the information in the image efficiently and are used in several image matching problems, but these features are sensitive to illumination differences (i.e., photometric transformations) and fail to produce good results under them. Some approaches have been reported in the published literature to cope with the problem of illumination sensitivity. The illumination robust feature extraction transform (IRFET) was reported recently to detect the interest points in the image using contrast stretching functions (i.e., contrast signatures) [131]. In [132], a bi-log transformation is used to remove the effect of illumination variation, but this approach needs to break the intensity value into illumination and reflectance parts, which imposes extra computation and is not always desirable. Some methods have also been suggested for illumination robust face recognition [133, 134]. The Logarithm Gradient Histogram (LGH) is proposed using the spectral wavelength and the magnitude and direction of the illumination for the face recognition problem under varying illumination [133]. In [134], illumination differences are compensated and normalized using the discrete cosine transform (DCT) in the logarithmic domain by truncating low-frequency DCT coefficients; however, this method requires tuning the number of DCT coefficients to be truncated, which differs across illumination conditions. Texture has also been represented by a Markov random field (MRF) [135] to achieve spectrum invariance for CBIR systems. Ranganathan et al. [136] utilized the probability distribution of the descriptor space of training data to model lighting changes in the feature descriptor and to learn the system. To attain anisotropic diffusion and retain the gradient magnitude information, the intensity level is multiplied by a large constant weight (i.e., 2-D image patches are embedded as 3-D surfaces), which leads to a descriptor invariant to intensity changes [137]. Order based approaches have shown promising image matching results under uniform illumination change [11-12, 60, 138]; a small sketch of this invariance principle is given at the end of this section. In [138], the orders between certain pixels are used to penalize the difference between patches: only locally stable pixel pairs are chosen, and the order within each chosen pair is summarized to find the required penalty. The histogram of relative intensities (HRI) is used to encode the orders of pixels relative to the entire patch by Raj et al. [11]; they also proposed the Center-Symmetric Local Ternary Pattern (CSLTP), a generalization of CSLBP, and combined it with HRI to aggregate complementary information. The local intensity order pattern (LIOP), which is intrinsically invariant to monotonic intensity changes, was introduced in [12]; LIOP is constructed from the orders among the intensity values of the neighboring sample points of each pixel in the image. Tang et al. [60] introduced the local OSID feature descriptor from ordinal spatial distributions; the OSID descriptor is obtained by grouping the pixel intensity values into both spatial and ordinal spaces. While these features perform well for matching images having uniform intensity changes, they cannot handle complex or non-uniform illumination differences.
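The core reason order based features survive uniform (monotonic) illumination change can be shown in a few lines; this generic sketch illustrates the principle only and is not the exact HRI/LIOP/OSID construction.

```python
import numpy as np

def rank_order(samples):
    """Return the rank of each sample (its index in the sorted order)."""
    return np.argsort(np.argsort(samples))

raw = np.array([52.0, 180.0, 97.0, 133.0])
# A monotonic (order preserving) intensity change, e.g. gamma plus gain.
transformed = 1.8 * raw ** 0.7 + 10.0

# Both give the same order pattern, so an order based descriptor built
# from these samples is unchanged by this illumination shift.
assert (rank_order(raw) == rank_order(transformed)).all()
print(rank_order(raw))  # [0 3 1 2]
```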

2.5. Boosting Performance of Image Descriptors

Various feature descriptors have been proposed and used for image retrieval and have gained wide popularity [20-21, 83, 113]. These methods utilize the color, texture and local information of the image to design the feature descriptor. LBP and its variants are mainly computed over raw intensity values [1, 16, 18, 20, 142-147]. In order to utilize richer local information, many researchers perform some kind of preprocessing before the feature extraction. Typical examples are the Local Edge Binary Pattern (LEBP) [21], the Sobel Local Binary Pattern (SOBEL-LBP) [139], the Semi Structure Local Binary Pattern (SLBP) [140] and the Spherical Symmetric 3D Local Ternary Pattern (SS-3D-LTP) [83]. These descriptors utilize only a few filters to boost the performance. James compared preprocessed images obtained by multiple filtering directly for face recognition [141], but did not utilize the power of a descriptor computed over the multiple filtered images. Near-infrared (NIR) facial analysis is also an open challenge at present [142, 148]. The images of the same face under different lighting directions in visible light are negatively correlated, whereas closely correlated face images of the same individual are produced by near-infrared imaging [148]; moreover, near-infrared imaging is more useful for indoor, cooperative-user scenarios [148]. The performance of humans as well as machines under near-infrared and visible light was investigated by Hollingsworth et al. using periocular biometrics [149]. In a recent study of visual versus near-infrared face image matching [150], the authors used an NIR face image as the probe while enrollment was done with visual face images. The Singular Value Decomposition (SVD) is one of the most elemental matrix computations in numerical linear algebra [151]. It can be viewed as an extended version of the theory of linear filtering, or as a tool for image enhancement [152, 153]. SVD has several uses, such as in signal analysis [154], gate characterization [155], image coding [156], image watermarking [157], image steganalysis [158], image compression [159] and face recognition [160]. A novel concept of sub-band decomposition and multiresolution representation of digital color images using SVD was introduced by Singh et al. [161]: they apply the SVD over regions of the image and finally form 4 sub-bands. The SVD over the face image is used for illumination robust face recognition in [162-163]. The literature surveyed above points out that SVD is used very effectively in several applications as a feature, whereas it has not been investigated in conjunction with descriptors; a minimal numerical sketch of the decomposition is given below. We summarize the various descriptors in Table 2.1 on the basis of several characteristics. The third column states whether the descriptor is based on key-point regions or on the global region. The type of the descriptor is categorized in column 4 into gradient based, binary, order based, difference based, structure based, etc. The invariance properties of the state-of-the-art descriptors are compared in columns 5-8 against scale, rotation, illumination and viewpoint change respectively; in these columns, 'Y', 'P' and 'N' denote that the particular invariance property is present, partially present and not present in a particular descriptor respectively. The last column states whether the descriptor was introduced for gray scale or color images, represented by 'G' and 'C' respectively.
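As a minimal NumPy illustration of the decomposition and a low-rank reconstruction (a random matrix stands in for an image, and the rank value is an arbitrary choice for the example):

```python
import numpy as np

# Any grayscale image (here a random stand-in) factors as A = U @ diag(s) @ Vt.
rng = np.random.default_rng(0)
A = rng.random((64, 64))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keeping only the r largest singular values gives the best rank-r
# approximation in the least-squares sense -- the basis of the SVD-based
# compression and enhancement schemes cited above.
r = 8
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
print(np.linalg.norm(A - A_r) / np.linalg.norm(A))  # relative error
```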


Table 2.1. Analysis of some important reviewed descriptors in terms of their properties

Method     | Reference | Keypoint Based | Type                    | Scale | Rotation | Illumination | Viewpoint | Color/Grayscale
SIFT       | [9]       | Yes            | Gradient                | Y     | Y        | N            | N         | G
Edge-SIFT  | [10]      | Yes            | Binary                  | Y     | Y        | N            | N         | G
MROGH      | [15]      | Yes            | Order+Gradient          | P     | Y        | N            | Y         | G
HRI        | [11]      | Yes            | Order                   | N     | Y        | P            | N         | G
LIOP       | [12]      | Yes            | Order                   | N     | Y        | Y            | N         | G
MRRID      | [13]      | Yes            | Order                   | P     | Y        | Y            | Y         | G
EOD        | [14]      | Yes            | Order                   | N     | Y        | Y            | N         | G
LBP        | [16]      | No             | Difference              | N     | N        | Y            | N         | G
CS-LBP     | [17]      | Yes            | Difference              | N     | N        | Y            | N         | G
LTP        | [18]      | No             | Difference              | N     | N        | Y            | N         | G
CS-LTP     | [11]      | Yes            | Difference              | N     | N        | Y            | N         | G
LDP        | [19]      | No             | Gradient+Difference     | N     | N        | Y            | N         | G
LTrP       | [20]      | No             | Direction               | N     | N        | Y            | N         | G
LEBP       | [21]      | No             | Filter+Difference       | N     | N        | Y            | N         | G
SEH        | [24]      | No             | Structure               | N     | N        | N            | N         | C
CDH        | [102]     | No             | Gradient                | P     | Y        | N            | N         | C
LMeP       | [33]      | No             | Difference              | N     | N        | Y            | N         | G
LTCoP      | [32]      | No             | Difference              | N     | N        | Y            | N         | G
SS-3D-LTP  | [83]      | No             | Filter+Difference       | N     | N        | P            | N         | G
SLBP       | [140]     | No             | Filter+Difference       | N     | N        | Y            | N         | G
SOBEL-LBP  | [139]     | No             | Filter+Difference       | N     | N        | Y            | N         | G
DBC        | [142]     | No             | Difference              | N     | N        | Y            | N         | G
SSLBP      | [118]     | No             | Difference              | N     | N        | Y            | N         | G
GCH        | [115]     | No             | Color Histogram         | N     | Y        | N            | N         | C
CCV        | [116]     | No             | Color Histogram         | N     | Y        | N            | N         | C
BIC        | [120]     | No             | Color Histogram         | N     | Y        | N            | N         | C
cLBP       | [105]     | No             | Difference              | N     | N        | Y            | N         | C
mscLBP     | [106]     | No             | Difference              | Y     | N        | Y            | P         | C
LCVBP      | [108]     | No             | Norm+Angular+Difference | N     | N        | N            | N         | C
mCENTRIST  | [109]     | No             | Difference              | N     | N        | Y            | N         | C


2.6. Problem Formulation

On the basis of the literature survey and a comparative analysis of various recent state-of-the-art methods, we found the following research gaps to be addressed by the research work reported in this thesis:

1. Local ordering based descriptors have shown very promising results for region matching under various transformations, but the dimension of such descriptors increases exponentially with the number of local neighbors considered [12]. Thus, the matching of two regions becomes infeasible beyond a certain number of local neighbors. One identified problem of recent order based local descriptors is therefore to employ a larger number of local neighbors in the construction of the descriptor while maintaining a reasonable feature length.

2. Most local descriptors have considered either the relation of the center with its neighbors or the relation among the neighbors [16, 18-20, 32-33, 83]. Very few descriptors have considered both kinds of relationships, and the dimension of those that do is very high. We identified as a research gap the need to investigate descriptors that encode both kinds of relationships, i.e., among the local neighbors as well as between the center and its neighbors, without increasing the dimension too much.

3. The color and texture descriptors designed so far using multichannel information of the color image [105-106, 108-109] suffer from limitations in either robustness or discriminative capability. Moreover, most existing multichannel descriptors have not utilized cross-channel information to encode the descriptor. Another identified research problem is therefore to explore distinctive and robust descriptors that utilize cross-channel information.

4. Illumination differences are one of the major difficulties in computer vision. Some methods in the literature are not robust against illumination changes [115-116, 120]; others work well for a certain degree of illumination change but fail under drastic illumination changes [16, 18-19]. We identified illumination robustness as a research gap that needs to be tackled, since illumination differences between images are common due to several environmental factors.

5. The performance of local descriptors improves when they are extracted in conjunction with other techniques [21, 139-140] for different applications. A further research gap is therefore to examine techniques that can be used with existing local descriptors in order to boost their performance for different computer vision applications.


2.7. Summary

In this chapter, we surveyed image feature descriptors on the basis of their properties: whether a descriptor is region based or not, whether it is designed for grayscale or color images, whether it targets natural, biomedical, textural or facial images, what robustness it has towards different photometric and geometric transformations, and whether it is combined with pre- or post-processing algorithms. A literature survey and analytical comparison have been made among recent state-of-the-art descriptors, on the basis of which the 5 most challenging research gaps that need to be addressed have been identified.


Chapter 3
Interleaved Intensity Order Based Local Descriptor

Region descriptors using local intensity ordering patterns have become popular in recent years for image matching due to their enhanced discriminative ability. However, the dimension of these descriptors increases rapidly with even a slight increase in the number of local neighbors under consideration and becomes unreasonable for image matching due to time constraints. In this chapter, we reduce the dimension of the descriptor and the matching time significantly, while maintaining comparable performance, by considering the neighboring sample points in an interleaved manner. The proposed interleaved order based local descriptor (IOLD) treats the local neighbors of a pixel as a set of interleaved neighborhoods, constructs the descriptor over each set separately, and finally combines them to produce a single pattern. Image matching experiments suggest that the proposed IOLD descriptor achieves improved performance at reduced time. This chapter is organized in the following manner: Section 3.1 presents the detailed construction process of the proposed IOLD descriptor; Section 3.2 illustrates the detailed results of image matching experiments over the Oxford dataset, the Drastic illumination change dataset and the Large image matching dataset; Section 3.3 discusses the effect of complex illumination and noisy conditions; and Section 3.4 summarizes the chapter with observations.

3.1. Proposed Descriptor Construction

We present the construction process of the proposed IOLD descriptor in this section. First, we explain the steps involved in preprocessing, region detection and normalization; then we discuss the concept of generating rotation invariant local features and propose the partitioning of neighboring pixels into interleaved sets. The final pattern is generated by concatenating the LIOP [12] calculated over each set.

The content of this chapter is published in the following research article:
- S.R. Dubey, S.K. Singh, and R.K. Singh, "Rotation and Illumination Invariant Interleaved Intensity Order-Based Local Descriptor," IEEE Transactions on Image Processing, Vol. 23, No. 12, pp. 5323-5333, 2014.

3.1.1. Pre-processing, Feature Detection and Normalization

The steps involved in pre-processing, feature detection and normalization are similar to [12-13, 15, 47, 52, 164]. To remove noise, Gaussian smoothing with σp is applied initially. To find the position and the neighboring structure of a point of interest, the Harris-Affine/Hessian-Affine region detectors are considered. A circular region of size 41×41 pixels with radius 20.5, similar to other approaches [12-13, 15, 52, 165], is generated by normalizing the detected region (Fig. 3.1). Finally, a Gaussian filter with sigma σn is applied again to cope with the noise introduced by interpolation.

3.1.2. Rotation Invariant Local Features

In order to facilitate local feature extraction in a rotation invariant manner, we consider a local coordinate approach, similar to [12-13, 15, 50]. Fig. 3.2 illustrates such a coordinate system, where $O$ represents the center of the patch and $X_i$ is any pixel within the patch. A local rotation invariant coordinate system centered at the sample pixel $X_i$ is generated from $O$ and $X_i$ by taking the positive $y$-axis along the direction from $O$ to $X_i$:

$\text{positive } y\text{-axis} \parallel \overrightarrow{OX_i} \qquad (3.1)$

[Fig. 3.1 appears here]
Fig. 3.1. Generating a circular patch/region of fixed size by normalizing the detected patch/region of elliptical shape and arbitrary size returned by affine invariant region detectors such as Harris-Affine/Hessian-Affine.

[Fig. 3.2 appears here]
Fig. 3.2. Rotation invariant coordinate system to compute the location of the local features; O is the center of the patch and Xi is the sample point.

Let $\{X_i^1, X_i^2, \ldots, X_i^N\}$ be the $N$ neighbors of $X_i$, equally spaced on a circle of radius $R$ centered at $X_i$. The angle $\phi$ is defined as

$\phi = \tan^{-1}\!\left(\frac{P_y}{P_x}\right) \qquad (3.2)$

where $P_x$ and $P_y$ are the coordinates of the pixel $X_i$ with respect to the center of the patch $O$. The coordinates of the $N$ neighbors of $X_i$ with respect to $X_i$ are given by

$X_i^j = R\,(\cos\theta_j,\ \sin\theta_j), \quad \text{where } j \in [1, N] \qquad (3.3)$

and the angle $\theta_j$ is defined as

$\theta_j = \phi + \frac{2\pi(j-1)}{N} \qquad (3.4)$

We represent the coordinate of $X_i^j$ w.r.t. $X_i$ as the complex number $Z_i^j$ using $R$ and $\theta_j$ as

$Z_i^j = R\cos\theta_j + \iota\,R\sin\theta_j \qquad (3.5)$

From (3.3) and (3.5), $Z_i^j$ is written as

$Z_i^j = R\,(\cos\theta_j + \iota\sin\theta_j) \qquad (3.6)$

We represent (3.6) using Euler's formula, and $Z_i^j$ in Euler form is given as

$Z_i^j = R\,e^{\iota\theta_j} \qquad (3.7)$

where $\iota = \sqrt{-1}$. The intensity value of any neighboring pixel $X_i^j$ is determined by the gray value at the coordinate $Z_i^j$ w.r.t. $X_i$; it is denoted by $I(X_i^j)$, and we refer to all the neighboring intensity values of pixel $X_i$ as $I(X_i) = \{I(X_i^1), I(X_i^2), \ldots, I(X_i^N)\}$. Using this coordinate system, rotation invariance is obtained inherently: it is easily observable that the position of each $X_i^j$ w.r.t. $X_i$ remains unchanged under rotation.
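A small sketch of the neighbor geometry of (3.2)-(3.7): given the location $(P_x, P_y)$ of $X_i$ relative to the patch center, the $N$ neighbor offsets are generated in the rotated local frame. Interpolation of the gray values at these offsets is omitted; the function name and its uniform-angle convention are illustrative choices consistent with the equations above.

```python
import numpy as np

def neighbor_coordinates(px, py, N, R):
    """Coordinates of the N circular neighbors of a pixel X_i, relative
    to X_i, in the rotation invariant local frame of (3.3)-(3.7).

    (px, py): coordinates of X_i with respect to the patch center O.
    Rotating the whole patch rotates phi by the same angle, so each
    neighbor keeps its position relative to X_i.
    """
    phi = np.arctan2(py, px)                   # eq. (3.2)
    j = np.arange(N)
    theta = phi + 2.0 * np.pi * j / N          # eq. (3.4)
    z = R * np.exp(1j * theta)                 # eq. (3.7), complex form
    return np.stack([z.real, z.imag], axis=1)  # (N, 2) pixel offsets

coords = neighbor_coordinates(px=5.0, py=3.0, N=8, R=6.0)
```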

3.1.3. Local Neighbor Partitioning Into Interleaved Sets

The main problem with the earlier intensity order based descriptor [12] is the rapid increase in the dimension of the descriptor with a slight increase in the number of neighboring points. In this section, we propose to partition the $N$ neighbors into $k$ interleaved sets to overcome this problem. Fig. 3.3 illustrates the proposed approach of dividing the original neighbors into multiple interleaved sets of local neighbors. Fig. 3.3(a) shows the original $N$ neighbors $X_i^j$ for $j \in [1, N]$ of a sample point $X_i$, which are equally spaced at a distance $R$ from the center $X_i$. Figs. 3.3(b-d) represent the $k$ interleaved neighboring sets having $d$ neighbors each, generated from the original $N$ neighbors, where $d = N/k$. The coordinate of the $u$th neighbor of $X_i$ in the $v$th neighboring set is given as

$Z_{i,v}^{u} = Z_i^{\,v+(u-1)k} \qquad (3.8)$

where $u \in [1, d]$ and $v \in [1, k]$. By solving (3.8) with (3.7), $Z_{i,v}^{u}$ is represented as

$Z_{i,v}^{u} = R\,e^{\iota\,\theta_{v+(u-1)k}} \qquad (3.9)$

If $\theta_{v}^{u} = \theta_{v+(u-1)k}$, then $Z_{i,v}^{u}$ is written as

$Z_{i,v}^{u} = R\,e^{\iota\,\theta_{v}^{u}} \qquad (3.10)$

The ranges of $v+(u-1)k$ and $\theta_{v}^{u}$ are computed from the ranges of $u$ and $v$ as

$v+(u-1)k \in [1, N] \qquad (3.11)$

$\theta_{v}^{u} \in [\theta_v,\ \theta_{v+(d-1)k}] \qquad (3.12)$

The range of $\theta_{v}^{u}$ is computed from (3.11) and (3.12) using (3.4) as

$\theta_{v}^{u} \in \left[\phi + \frac{2\pi(v-1)}{N},\ \phi + \frac{2\pi(v-1)}{N} + \frac{2\pi(d-1)}{d}\right] \qquad (3.13)$

If $\phi_v = \phi + \frac{2\pi(v-1)}{N}$ and $\theta_{v}^{u} = \phi_v + \frac{2\pi(u-1)}{d}$, then the range of $\theta_{v}^{u}$ becomes

$\theta_{v}^{u} \in \left[\phi_v,\ \phi_v + \frac{2\pi(d-1)}{d}\right] \qquad (3.14)$

Now, the range of $\theta_{v}^{u}$ with $\phi_v$ is the same as the range of $\theta_j$ used in (3.7), with $\phi$ in place of $\phi_v$, $k$ replaced with 1 and $d$ replaced with $N$. By replacing $\theta_{v}^{u}$, $Z_{i,v}^{u}$ in (3.10) becomes

$Z_{i,v}^{u} = R\,e^{\iota\left(\phi_v + \frac{2\pi(u-1)}{d}\right)} \qquad (3.15)$

k 1 X k 2 X i i

X k 1 i

Y

Y

Xk i

X 2k i

X

X N k 1 i

X N k 2 X N i i

O

i

X2 i X1 i

X

(a) Neighbors of Xi

X k 2 i

X N k 1 i

O

X

Y X

i

i

X2 i

...

Y

Xk i

X 2k i

X

i

X1 i

X

X N k 2 i

O

X

(b) Neighboring set 1 (c) Neighboring set 2

XN i

O

X

(d) Neighboring set k

Fig. 3.3. Considering local neighborhood as a set of different interleaved local neighborhood. The original N neighbors are divided into k neighboring sets having d=N/k neighbors each.


[Fig. 3.4 appears here]
Fig. 3.4. Illustration of the proposed concept of local neighborhood division into multiple interleaved sets and construction of the IOLD pattern, using an example with two interleaved sets: (a) example patch for pixel Xi, (b) intensity values of the 8 local neighbors of the considered patch, (c) partitioning of the 8 local neighbors into 2 interleaved sets having 4 local neighbors each, and their orders, (d) ordering patterns over each set, (e) weighted ordering patterns, and (f) final pattern for pixel Xi.

From (3.4) and (3.14), $\phi_v$ for $v = 1$ is written as

$\phi_1 = \phi \qquad (3.16)$

From (3.7) and (3.16), we conclude that

$Z_{i,1}^{u} = Z_i^{u} \quad \text{for } k = 1 \text{ and } d = N \qquad (3.17)$

This means that we recover the original neighbors without division only if $k = 1$ and $d = N$, which is the configuration used by LIOP [12] (i.e., LIOP is a special case of our proposed approach). We also observe from Fig. 3.3 that the neighboring points in each neighboring set are themselves equally spaced on a circle of radius $R$ centered at $X_i$. This is an advantage of our local neighborhood division: it retains the symmetric information in the pattern. We also illustrate the proposed idea of partitioning the local neighbors into multiple interleaved sets with an example in Fig. 3.4(a-c). An example patch for a pixel $X_i$ is shown in Fig. 3.4(a). We consider 8 local neighbors of $X_i$ in this example, as depicted in Fig. 3.4(b), and partition them into 2 interleaved sets consisting of 4 local neighbors each. The intensity values of the local neighbors in each set are demonstrated in Fig. 3.4(c); the index bookkeeping is sketched in code below.
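The partitioning of (3.8) is straightforward in code; the following sketch (with made-up intensity values, since the exact numbers of Fig. 3.4 are not reproduced here) partitions 8 neighbors into the 2 interleaved sets of the example.

```python
import numpy as np

def interleave(neighbor_values, k):
    """Partition N neighbor intensities into k interleaved sets of
    d = N/k values each, following eq. (3.8): set v takes the original
    neighbors v, v+k, v+2k, ... (1-based indexing in the text)."""
    values = np.asarray(neighbor_values)
    N = values.size
    assert N % k == 0, "k must divide N so that d = N/k is an integer"
    # Row v-1 holds interleaved set v.
    return values.reshape(N // k, k).T

# An 8-neighbor example with two interleaved sets of 4, as in Fig. 3.4.
sets = interleave([23, 91, 47, 65, 12, 88, 54, 30], k=2)
# sets[0] -> original neighbors 1,3,5,7; sets[1] -> neighbors 2,4,6,8
```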

3.1.4. Computing Multiple Local Intensity Order Patterns

In this subsection, for each interleaved set, we construct the corresponding LIOP [12] pattern and then concatenate all the LIOPs to obtain the final pattern for a particular pixel. Let the intensity values of the elements of neighboring set $v$ (i.e., the points that fall in the $v$th interleaved set) be defined as

$I_{i,v} = \left(I(Z_{i,v}^{1}),\ I(Z_{i,v}^{2}),\ \ldots,\ I(Z_{i,v}^{d})\right) \qquad (3.18)$

where $v \in [1, k]$ and $I(Z_{i,v}^{u})$ is the intensity value of point $Z_{i,v}^{u}$. Note that the value of $k$ is chosen in such a way that $d$ is a positive integer. We calculate a weighted ordering pattern over each neighboring set using the method introduced in [12] as follows:

$P_{i,v} = w(I_{i,v}) \times \gamma(I_{i,v}) \qquad (3.19)$

where $w(I_{i,v})$ is a weight that encodes the dissimilarity information among the neighboring sample points and $\gamma(I_{i,v})$ is the ordering pattern of length $d!$. The final interleaved order based local descriptor pattern is computed by concatenating the patterns of all neighboring sets. Mathematically, we define the final pattern for pixel $X_i$ as follows:

$P_i = (P_{i,1},\ P_{i,2},\ \ldots,\ P_{i,k}) \qquad (3.20)$

[Fig. 3.5 appears here]
Fig. 3.5. IOLD descriptor construction process: B support regions are used, with C sub-regions in each support region. The IOLD descriptor is constructed by accumulating the local descriptor in each sub-region from all support regions.


According to [12], the dimension $\dim(P_{i,v})$ of $P_{i,v}$ is given by

$\dim(P_{i,v}) = d! \qquad (3.21)$

It means

$\dim(P_i) = k \times d! \qquad (3.22)$

For the two interleaved sets of intensity values of Fig. 3.4(c), the orders and ordering patterns computed using [12] are illustrated in Fig. 3.4(c-e) respectively. Note that in the example we have partitioned 8 neighbors into 2 sets of 4 neighbors, so the length of each ordering pattern is $4! = 24$. Only the element corresponding to the index value of the order is set to 1 in the ordering pattern, and the rest are zeros, as illustrated in Fig. 3.4(d). We also calculate the weight for each set using its intensity values and multiply it with the ordering patterns to get the weighted ordering patterns, as depicted in Fig. 3.4(e). Finally, both weighted ordering patterns are concatenated to form the final pattern (see Fig. 3.4(f)) for the pixel $X_i$ of Fig. 3.4(a) using its 8 local neighbors; a sketch of this computation follows.
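The mapping from d neighbor intensities to a one-hot ordering pattern of length d! can be sketched as below; the lexicographic permutation indexing is one concrete illustrative choice (the reference method [12] fixes its own enumeration), and the weighting step of (3.19) is omitted.

```python
from itertools import permutations
from math import factorial

def ordering_pattern(values):
    """One-hot pattern of length d!: the single 1 sits at the index of
    the permutation that sorts the d values (cf. Fig. 3.4(d))."""
    d = len(values)
    order = tuple(sorted(range(d), key=lambda i: values[i]))
    index = list(permutations(range(d))).index(order)
    pattern = [0] * factorial(d)
    pattern[index] = 1
    return pattern

# d = 4 neighbors -> a 24-dimensional pattern, as in the example; two
# interleaved sets therefore give a 2 x 24 = 48-dimensional pixel pattern.
p = ordering_pattern([23, 47, 12, 54])
assert len(p) == 24 and sum(p) == 1
```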

3.1.5. Descriptor Construction

The proposed IOLD descriptor construction workflow is demonstrated in Fig. 3.5. We consider $B$ support regions centered at the feature point of the minimal support region, with uniformly increasing sizes, similar to [13, 15]. Circular regions of size 41×41 are obtained by normalizing each support region. Each support region is divided into $C$ sub-regions based on the global intensity orders of the pixels in that support region, similar to [12, 15]. The pattern over a sub-region is extracted by summing the patterns of all pixels belonging to that sub-region. We refer to the $j$th sub-region of the $i$th support region as $S_{i,j}$ and to the descriptor over sub-region $S_{i,j}$ as $D_{i,j}$. Then $D_{i,j}$ is calculated as follows:

$D_{i,j} = \sum_{X \in S_{i,j}} P(X) \qquad (3.23)$

where $i \in [1, B]$, $j \in [1, C]$ and $P(X)$ is given by (3.20). The descriptor over a support region is computed by concatenating the descriptors computed over each sub-region of that support region, so the descriptor over the $i$th support region becomes

$D_i = (D_{i,1},\ D_{i,2},\ \ldots,\ D_{i,C}) \qquad (3.24)$

The descriptors extracted over the support regions are concatenated to compute the final IOLD descriptor. Mathematically, the IOLD descriptor is given by

$\mathrm{IOLD} = (D_1,\ D_2,\ \ldots,\ D_B) \qquad (3.25)$

From (3.24) and (3.25), it is derived that the IOLD descriptor can also be represented as in (3.26) below.


[Fig. 3.6 appears here: a log-scale plot of pattern length versus the number of neighboring sample points N for the LIOP pattern and our pattern with k = 2, 3 and 4.]
Fig. 3.6. Comparison between the pattern dimension using LIOP and the proposed approach.

$\mathrm{IOLD} = (D_{1,1},\ \ldots,\ D_{1,C},\ D_{2,1},\ \ldots,\ D_{B,C}) \qquad (3.26)$

The dimension of IOLD for $B$ support regions and $C$ sub-regions, using (3.22), is given as

$\dim(\mathrm{IOLD}) = B \times C \times k \times d! \qquad (3.27)$

The dimension of LIOP [12] for $N$ neighboring sample points is given as

$\dim(\mathrm{LIOP}) = B \times C \times N! \qquad (3.28)$

It is shown in Fig. 3.6 that $k \times d!$ is much smaller than $N!$. It means

$\dim(\mathrm{IOLD}) \ll \dim(\mathrm{LIOP}) \quad \text{for } k > 1 \qquad (3.29)$

By adopting the division of the local neighborhood into several local neighborhoods (i.e., $k$ sets), we reduce the pattern size significantly while keeping comparable performance. The proposed ordering pattern is distinctive because it holds the invariance property for rotation and illumination differences; moreover, the symmetric information around the center pixel makes it more discriminative. It has also been shown that the neighborhood division approach greatly reduces the descriptor size (see Fig. 3.6); the arithmetic is verified in the sketch below.
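As a quick numerical check of (3.27), the following sketch (the helper name iold_dim is ours) verifies the dimensions of the BCkd combinations used in the experiments below, where a label such as IOLD1125 encodes B=1, C=1, k=2, d=5.

```python
from math import factorial

def iold_dim(B, C, k, d):
    """Eq. (3.27): dim(IOLD) = B * C * k * d!  (LIOP is the k=1 case)."""
    return B * C * k * factorial(d)

# Combinations used in the experiments of Section 3.2:
assert iold_dim(1, 1, 1, 6) == 720   # LIOP1116
assert iold_dim(1, 1, 2, 5) == 240   # IOLD1125: N = 10 neighbors, k = 2
assert iold_dim(2, 2, 2, 3) == 48    # IOLD2223
```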

3.2. Experiments and Results

We compare the SIFT [9], HRI-CSLTP [11] and LIOP [12] descriptors with the IOLD descriptor to measure the effectiveness and discriminative ability of the proposed descriptor. For evaluation purposes, three widely used standard datasets, namely the Oxford image matching dataset [166], the Complex illumination change dataset [167] and the Large image matching dataset [168], have been used. The Oxford dataset comprises different geometric and photometric transformed image sets with textured and structured scenes. We used the Harris-Affine and Hessian-Affine detectors to detect the interest regions [166]. All the matching experiments are conducted on a computer system having an Intel(R) Core(TM) i5 CPU with 4 GB of RAM.

3.2.1. Evaluation Criteria

The criterion introduced in [52] is used for the evaluation of the descriptors in this chapter. Each region of one image is matched with every region of the second image, and precision and recall values are generated according to the numbers of false and correct matches. We calculate all matches using the nearest neighbor distance ratio (NNDR) matching strategy. According to this scheme, a distance ratio is computed between the 1st and 2nd nearest regions, and a match with the 1st nearest region is declared only if this ratio is above a threshold. By varying this distance threshold, different precision and recall values are obtained. We use the overlap error [47] to determine the ground truth correspondences and the number of correct matches. The target region is transformed over the source region using a homography, and the ratio of the areas of intersection and union of the two regions (i.e., the original source region and the transformed target region) is used to find the overlap error. A match between two regions is acceptable if the overlap error is less than 0.5. If $A_1$ and $A_2$ are the two regions, then the overlap error between $A_1$ and $A_2$ is defined as

$\text{overlap error}(A_1, A_2) = 1 - \frac{|A_1 \cap A_2|}{|A_1 \cup A_2|} \qquad (3.30)$

We use recall vs. 1-precision plots to present the matching results. If the numbers of correct, false, all and ground truth matches are represented by #correct matches, #false matches, #all matches and #correspondences respectively, then

$\text{recall} = \frac{\#\text{correct matches}}{\#\text{correspondences}}, \qquad 1 - \text{precision} = \frac{\#\text{false matches}}{\#\text{all matches}} \qquad (3.31)$
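A compact sketch of the NNDR rule as described above and the measures of (3.31); the descriptor arrays and the threshold convention are illustrative, and the ground-truth bookkeeping via the overlap error of (3.30) is abstracted into precomputed counts.

```python
import numpy as np

def nndr_matches(desc1, desc2, threshold):
    """Nearest neighbor distance ratio matching: for each descriptor in
    desc1, a match with its 1st nearest neighbor in desc2 is declared
    only when the 2nd-to-1st nearest distance ratio exceeds threshold."""
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] > 0 and dist[j2] / dist[j1] > threshold:
            matches.append((i, j1))
    return matches

def recall_and_one_minus_precision(n_correct, n_false, n_all, n_gt):
    """Eq. (3.31): recall and 1-precision from the match counts."""
    return n_correct / n_gt, n_false / n_all
```

Sweeping the threshold over a range of values produces the recall vs. 1-precision curves plotted in the figures that follow.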

We have used 1.0, 1.2 and 6 as the values of σp, σn and R respectively, similar to [12], for all experiments, so that a fair comparison can be made between LIOP and IOLD.

3.2.2. Performance Evaluation Over Oxford Dataset

We used the standard Oxford image matching dataset [166] for the evaluation of the IOLD descriptor. IOLD is evaluated and compared for both the Harris-Affine and Hessian-Affine (i.e., haraff and hesaff) detectors. We considered 6 sequences of the Oxford dataset, namely leuven (illumination change), bikes (image blur), ubc (JPEG compression), boat (rotation and scale), graf (viewpoint change) and wall (viewpoint change). Each sequence consists of 6 images with increasing degree of the corresponding transformation. For a particular sequence, the first image is matched with the remaining five images (i.e., 5 pairs). Results are depicted in terms of the average performance over each pair for each sequence in Fig. 3.7-3.8 using recall and 1-precision. We compared the average performance and matching time by changing the number of neighboring sets k and the number of elements d in each neighboring set. To isolate the effect of k and d, the values of B (the number of support regions) and C (the number of partitions in a region) are set to 1. The value of k is taken as 1 and 2, and the value of d as 4, 5 and 6. When k=1 we denote the descriptor by LIOP, because in this case IOLD is equivalent to LIOP. In the source paper of LIOP, only 3 and 4 neighboring sample points are considered, whereas in this chapter we also experiment with LIOP using a larger number of neighboring sample points (N) to test the effect of N on performance and matching time. Five combinations of BCkd(dim) (i.e., 1114(24), 1124(48), 1115(120), 1125(240) and 1116(720) respectively) are compared (see Fig. 3.7-3.8), where dim is the dimension of the descriptor. Fig. 3.7(a-f) and Fig. 3.8(a-f) show the results when the haraff and hesaff detectors are used respectively, while Fig. 3.7(g) and Fig. 3.8(g) show the matching time of each combination of BCkd for each sequence of the Oxford dataset using the haraff and hesaff detectors respectively.

haraff - ubc 0.9

0.8 LIOP1114(24) IOLD1124(48) LIOP1115(120) IOLD1125(240) LIOP1116(720)

0.6 0.5 0.4 0.3 0

0.1

0.2

0.3

0.4

recall

0.7

0.7

recall

recall

0.8

0.6 0.5

0.3 0

0.5

0.1

0.2

0.4

(a)

(b)

haraff - boat

haraff - graf

0.6 0

0.5

0.6 0.5

0.4 0.3

0.1

0.4 0.3 0.2

0.2 0.8

0.1 0.3

0.4

0.5

0.6

0.7

0.8

0

0.2

0.4

1-precision

1-precision

(d)

(e)

(f)

Time in seconds (haraff)

1-precision

400 300

0.3

(c)

0.2

0.6

0.2

haraff - wall

recall

0.4

0.4

0.1

1-precision

0.5

recall

recall

0.3

1-precision

0.6

0.2

0.7

0.4

1-precision

0 0

0.8

0.6

0.8

LIOP1114 IOLD1124 LIOP1115 IOLD1125 LIOP1116

200 100 0

leuven

bikes

ubc

boat

graf

wall

Data Set

(g) Fig. 3.7. Descriptors performance for kd=14, 24, 15, 25 and 16 when B=1 and C=1 using Harris-Affine region detector over the Oxford dataset.


[Fig. 3.8 appears here]
Fig. 3.8. Descriptors performance for kd=14, 24, 15, 25 and 16 when B=1 and C=1 using the Hessian-Affine region detector over the Oxford dataset.

Table 3.1. Matching time reduced by IOLD1125 over LIOP1116 (in %) over each category of the Oxford dataset

Detector Used  | leuven | bikes | ubc   | boat  | graf  | wall
Harris-Affine  | 70.53  | 59.66 | 41.32 | 35.27 | 41.23 | 37.83
Hessian-Affine | 69.60  | 75.34 | 50.86 | 43.66 | 47.71 | 102.81

A significant improvement in performance is observed when the value of k is increased to 2 for a particular d (i.e., between IOLD1124 and LIOP1114, and between IOLD1125 and LIOP1115). Considering the LIOP1116 and IOLD1125 combinations, the image matching time consumed by the former is much higher than that of the latter for each sequence, because the dimension of LIOP1116 (720) is three times that of IOLD1125 (240), while the performance of IOLD1125 is either better than or nearly equal to that of LIOP1116. Table 3.1 depicts the percentage of matching time reduced by IOLD1125 over LIOP1116 for each set of images of the Oxford dataset using both detectors. The highest improvement in matching complexity, 102.81%, is reported for the wall sequence when the hesaff detector is used. The plots of Fig. 3.7-3.8 convey that by increasing only N, the dimension increases much more rapidly than the performance; this problem can be overcome by dividing the N neighbors into k interleaved sets. We also tested the proposed approach in conjunction with the multiple support region and region division concepts. Fig. 3.9 reports the results and time consumption using the haraff detector for the B=2, C=2, k=1, 2 and d=3 combinations. We implemented LIOP2213 as LIOP over multiple support regions to show the effect of multiple support regions on LIOP and to compare it with IOLD implemented over multiple support regions. Here we report the average performance and matching time over the full Oxford dataset for all combinations using each detector. It is observed that if k is increased, the performance of the descriptor still improves significantly using the haraff detector, though the degree of improvement is smaller, while the matching time is nearly the same (see Fig. 3.9(b)). The results for the hesaff detector follow the same trend as for the haraff detector. Fig. 3.10 demonstrates the average image matching performance and matching time over the Oxford dataset for the SIFT, HRI-CSLTP, LIOP1116 and IOLD1125 descriptors using the haraff detector.

[Fig. 3.9 appears here]
Fig. 3.9. (a) Matching results and (b) matching time over the Oxford dataset in conjunction with B and C, while kd=13 and 23 and BC=22, for the Harris-Affine detector.

[Fig. 3.10 appears here]
Fig. 3.10. Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Oxford dataset in terms of (a) ROC and (b) matching time using the Harris-Affine detector.


[Fig. 3.11 appears here]
Fig. 3.11. Images of (a) the Corridor and (b) the Desktop category of the Complex illumination change dataset.

[Fig. 3.12 appears here]
Fig. 3.12. (a-d) Descriptors performance and (e) matching time for kd=14, 24, 15, 25 and 16 when BC=11 using both region detectors over the Complex illumination change dataset.

We considered BC=11 for both LIOP and IOLD so that a fair comparison can be made with respect to the introduced concept. It is evident from Fig. 3.10 that the performance of IOLD1125 is better than that of the remaining descriptors for the haraff detector, and that it is 49.40% and 15.53% faster than LIOP1116 and HRI-CSLTP respectively.

3.2.3. Performance Evaluation Over Complex Illumination Change Dataset

We used the Complex illumination change dataset [167] in order to evaluate the proposed descriptor under large illumination changes. Two image sets, corridor and desktop, of 6 images each, having drastic illumination differences, are used in this chapter, as shown in Fig. 3.11. We synthesized the 6th image of the corridor (i.e., corridor 6) from the 1st image of the corridor such that it has the largest illumination difference from the 1st image. The 5th and 6th images of the desktop are the square and the square root of the 4th image of the desktop respectively. Fig. 3.12 demonstrates the descriptors' performance and matching time for kd=14, 24, 15, 25 and 16 when BC=11 using both region detectors over the Complex illumination change dataset. It is observed here also that the performance of the IOLD descriptor with k=2 improves significantly compared to the LIOP descriptor with k=1 for a particular d. The results of IOLD1125 (dim: 240) are better than those of LIOP1116 (dim: 720), whereas the matching time with LIOP1116 is much higher than the matching time with IOLD1125. In Fig. 3.13, we compare the IOLD descriptor with the SIFT, HRI-CSLTP and LIOP descriptors using the haraff detector over the full Complex illumination change dataset in terms of average precision, average recall and matching time. Both the LIOP and IOLD descriptors outperform the SIFT and HRI-CSLTP descriptors because LIOP and IOLD are inherently invariant to monotonic intensity changes. The performance of IOLD remains comparable with LIOP while maintaining a low dimensional feature description, and the matching time with IOLD is significantly lower than that with LIOP. It is observed across the plots (Fig. 3.12-3.13) that IOLD is able to maintain better results with a low dimensional feature descriptor under the drastic illumination difference scenario.

3.2.4. Performance Evaluation Over Large Image Matching Dataset

To demonstrate the performance of the proposed descriptor over the large image matching dataset, we considered 190 pairs of images, consisting of 84, 63 and 43 pairs of the rotation, illumination and zoom categories respectively [168]. The image pairs already used in the Oxford dataset are excluded from this experiment. The average results and matching time over the large image matching dataset using SIFT, HRI-CSLTP, LIOP1116 and IOLD1125 are shown in Fig. 3.14. IOLD outperforms the other descriptors using the Harris-Affine region detector (see Fig. 3.14(a)). We observed that the performance of the IOLD descriptor is also comparable in the case of the Hessian-Affine region detector. Matching using IOLD is faster than LIOP and HRI-CSLTP by factors of 1.43 and 1.13 respectively, and is 0.89 times the speed of SIFT, using the hesaff detector; a similar speedup is also obtained using the haraff detector, as shown in Fig. 3.14(b). The results and matching time suggest that the IOLD descriptor matches the images more precisely and accurately at a reasonable speed.

3.3. Observations and Discussions

In this section, we present some observations and discussion of the performance of the IOLD descriptor under drastic illumination change and noisy conditions. In the last part of this section, we analyze the matching time in terms of the number of matched key-points.


[Fig. 3.13 appears here]
Fig. 3.13. Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Complex illumination change dataset in terms of (a) recall-precision and (b) matching time using the Harris-Affine detector.

[Fig. 3.14 appears here]
Fig. 3.14. Comparison of IOLD with LIOP, SIFT and HRI-CSLTP over the Large image matching dataset in terms of (a) recall-precision and (b) matching time using the Harris-Affine detector.

[Fig. 3.15 appears here]
Fig. 3.15. Visualization of the performance of SIFT, LIOP and IOLD under illumination change: (a) a patch from the 1st image of the corridor, (b) the same patch as in (a) but from the 4th image of the corridor, (c) the difference between the patterns of both patches for each descriptor, and (d) normal distribution of the dissimilarities of (c) at zero mean (µ=0).


[Fig. 3.16 appears here: (a) original image, (b-f) frames with Gaussian noise of variance σ = 0.02, 0.04, 0.06, 0.08 and 0.1, and (g) the effect of noise on histogram similarity for LBP, CSLBP, CSLTP, LIOP and IOLD.]
Fig. 3.16. Similarity between the histograms of the original and noisy frames (effect of Gaussian noise); the first desktop frame is the original frame, and the remaining frames are obtained by adding Gaussian noise with zero mean and variance σ to the original frame.

3.3.1. Effect of Drastic Illumination Change Over Descriptor

To visualize the effect of the proposed approach under drastic illumination change, we considered a patch from the 1st image of the corridor and the same patch from the 4th image of the corridor. SIFT, LIOP1114 and IOLD1125 are computed for both patches. LIOP and IOLD are quantized to the size of SIFT so that the dimension of the patterns becomes the same for each descriptor. The difference between the patterns of the two patches is computed for each descriptor. Fig. 3.15 presents the patches and the similarity plots: the two corresponding patches are shown in (a) and (b), and the similarity plot of the pattern difference is shown in (c). We observe that the global peak values (both positive and negative) are lowest for IOLD and highest for the SIFT descriptor. Another important factor is the overall deviation from zero in both directions for each bin, which is lowest for IOLD. We compare the normal distributions in (d) at zero mean (µ=0); the curve for IOLD is the most concentrated around the mean and also has the highest peak value. From Fig. 3.15(d), it is concluded that the patterns of the two patches are most similar when using IOLD. Thus, we believe that by incorporating the order based approach, the proposed descriptor becomes more robust to monotonic intensity changes and provides more similar patterns for similar patches under large illumination differences.

3.3.2. Effect of Noise Over Descriptor

While good performance is achieved by the LBP [16], CS-LBP [17], CS-LTP [11] and LIOP [12] operators (i.e., local order based methods), these methods are sensitive to noise. We synthesized ten noisy frames from a desktop frame by adding Gaussian noise with zero mean and variance σ (σ = [0.01, 0.1] at an interval of 0.01) to illustrate the effect of noise on the descriptors. We compared LBP, CS-LBP, CS-LTP, LIOP1116 and IOLD1125 (i.e., all methods based on local ordering) using these noisy frames. Fig. 3.16(a) depicts the original desktop frame used in this experiment, and Fig. 3.16(b-f) shows some of the noisy frames obtained after adding the Gaussian noise to the original frame. The descriptor of the original frame is compared with that of each noisy frame. We used the histogram intersection method [169] to compare two histograms (a minimal sketch is given below): the closer the similarity value is to one, the more similar the histograms are (i.e., the more robust the method is to noise). The performance of each method is shown in Fig. 3.16(g). LIOP and IOLD are less sensitive to noise than LBP, CS-LBP and CS-LTP. LIOP is more robust than CS-LTP because it is the generalization of CS-LTP, while IOLD is more robust to noise than LIOP because LIOP is a special case of the proposed descriptor. IOLD is also consistent across the different degrees of noise.
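Histogram intersection [169], as used for this comparison, has a one-line form; a minimal sketch with small illustrative histograms:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity of two (L1-normalized) histograms: the sum of the
    bin-wise minima, equal to 1.0 for identical histograms."""
    return np.minimum(h1, h2).sum()

# Example: compare the descriptor of a clean frame with that of a
# noisy frame (both assumed to be normalized histograms).
h_clean = np.array([0.5, 0.3, 0.2])
h_noisy = np.array([0.45, 0.35, 0.2])
print(histogram_intersection(h_clean, h_noisy))  # 0.95
```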

[Fig. 3.17 appears here]
Fig. 3.17. Matching time vs. number of matched key-points using the IOLD descriptor and the LIOP descriptor.

3.3.3. Matching Time Analysis Using Number of Matched Key-points

We have shown in the previous section that the dimension of the proposed IOLD descriptor is significantly lower than that of the LIOP descriptor [12], whereas the performance of the IOLD descriptor is either improved or nearly the same. Here, we analyze the matching time in terms of the number of matched key-points using the IOLD and LIOP descriptors. Consider LIOP constructed from 6 local neighbors (i.e., LIOP1116 with dimension 720) and IOLD constructed from 10 local neighbors and two neighboring sets (i.e., IOLD1125 with dimension 240). We calculated the matching time for each pair of images of the Oxford image matching dataset [166]. The total number of image pairs in the Oxford dataset is 30, but we matched each pair using both the Harris-Affine and Hessian-Affine detectors, so the total number of image pair comparisons is 60. If the two images of a pair have $n_1$ and $n_2$ key-points returned by a particular detector, then the total number of matched key-points for that pair is $n_1 \times n_2$. Fig. 3.17 presents the matching time vs. the total number of matched key-points for both the LIOP and IOLD descriptors. It is observed that the matching time for the IOLD descriptor is always less than that for the LIOP descriptor; in other words, the proposed IOLD descriptor is more efficient than the LIOP descriptor in terms of matching time. Moreover, the degree of improvement increases with the number of matched key-points, so the proposed approach is clearly more efficient when the images contain more detail and hence more extracted key-points. The experiments show that the introduced interleaved order based local descriptor (IOLD) is better than other order based descriptors such as LIOP and HRI-CSLTP in terms of both performance and time complexity. The performance of the proposed descriptor is better under each geometric and photometric transformation considered in this chapter (i.e., scale change, JPEG compression, viewpoint change, image rotation, image blur and illumination difference). The IOLD descriptor also performs very well under drastic illumination differences. We also compared the proposed descriptor under noisy conditions and found that IOLD is less prone to noise than LBP, CSLBP, CSLTP and LIOP. The multiple intensity orders computed from different neighboring sets provide the discriminative ability of the proposed descriptor and make it more robust to different image transformations. The results obtained using the IOLD descriptor indicate that it outperforms other prominent descriptors proposed recently.

38

Chapter 3

Interleaved Intensity Order Based Local Descriptor

greatly reduces the matching time on an average with a factor of 49.40% while having the comparable performance. Results obtained on the image matching experiments suggest that the proposed IOLD descriptor is more time efficient and able to discriminate the images more robustly. In the presence of noise also IOLD is performing more robustly. IOLD outperforms other state-of-the-art descriptors under different imaging conditions.


Chapter 4
Local Gray Scale Image Descriptors for Biomedical Image Retrieval
Biomedical image retrieval plays an important role in medical diagnosis: a physician can retrieve the most similar template images for the query image of a particular patient, which in turn enhances expert decision making. The recently proposed feature descriptors for biomedical image retrieval mainly suffer from either less discriminative power or high dimensionality [32-33, 83, 85]. In order to match two biomedical images, gray scale image descriptors are generally useful. Thus, we investigate the following four new local image descriptors in this chapter: Local Diagonal Extrema Pattern (LDEP), Local Bit-plane Decoded Pattern (LBDP), Local Bit-plane Dissimilarity Pattern (LBDISP) and Local Wavelet Pattern (LWP). The main characteristics of the proposed descriptors are improved discriminative ability and lower dimensional features. The LDEP exploits the concept of first-order local diagonal derivatives. The LBDP and LBDISP use the characteristics at the local bit-plane level. The LWP utilizes the concept of the 1-D Haar wavelet in the LBP framework. These descriptors are validated for biomedical image retrieval through the experiments in this chapter. The remainder of this chapter is organized as follows. Section 4.1 presents the proposed descriptors for biomedical image retrieval. Section 4.2 describes the distance measures and evaluation criteria for image retrieval. In Section 4.3, biomedical image retrieval experiments are performed over one Magnetic Resonance Imaging (MRI) and four Computed Tomography (CT) image databases; finally, the chapter is summarized in Section 4.4.



The content of this chapter is published in the following research articles:
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Local Wavelet Pattern: A New Feature Descriptor for Image Retrieval in Medical CT Databases,” IEEE Transactions on Image Processing, Vol. 24, No. 12, pp. 5892-5903, 2015.
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Local Bit-plane Decoded Pattern: A Novel Feature Descriptor for Biomedical Image Retrieval,” IEEE Journal of Biomedical and Health Informatics, Vol. 20, No. 4, pp. 1139-1147, 2016.
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Local Diagonal Extrema Pattern: A New and Efficient Feature Descriptor for CT Image Retrieval,” IEEE Signal Processing Letters, Vol. 22, No. 9, pp. 1215-1219, 2015.
• S.R. Dubey, S.K. Singh, and R.K. Singh, “A Novel Local Bit-plane Dissimilarity Pattern for CT Image Retrieval,” IET Electronics Letters, Vol. 52, No. 15, pp. 1290-1292, 2016.


4.1. Local Gray Scale Image Descriptors for Biomedical Image Retrieval
In this section, we present the framework for biomedical image retrieval using the local diagonal extrema pattern (LDEP), local bit-plane decoded pattern (LBDP), local bit-plane dissimilarity pattern (LBDISP) and local wavelet pattern (LWP). Fig. 4.1 shows the proposed framework using block diagrams: first, the patterns are computed over the image; then, the feature vectors are formed from these patterns, which are finally used for the similarity computation between two images and for retrieval. In the rest of this section, the above mentioned methods are described. In order to describe the pattern extraction process, let $I$ be a grayscale biomedical image of dimension $X \times Y$, and let $P_{x,y}$ denote the pixel of $I$ at coordinate $(x, y)$ in a Cartesian coordinate system having its origin at the upper-left corner, as shown in Fig. 4.2; the intensity value at pixel $P_{x,y}$ is $I_{x,y}$.

4.1.1. Local Diagonal Extrema Pattern (LDEP)

In this sub-section, we introduce a new and efficient image feature descriptor using the local diagonal extrema pattern (LDEP), computed from the center pixel and its local diagonal neighbors. The relationship of the local diagonal extremes (i.e. maxima and minima) with the center pixel is used to encode the LDEP descriptor. The computation process is illustrated with an example in Fig. 4.3. Fig. 4.3(a) shows the positions of the four diagonal neighbors, with intensity values $I^{1}_{x,y}$, $I^{2}_{x,y}$, $I^{3}_{x,y}$ and $I^{4}_{x,y}$, of a center pixel with intensity value $I_{x,y}$; the example considered is depicted in Fig. 4.3(b). Let $\tau_{max}$ and $\tau_{min}$ be the positions (indexes) of the maximum and minimum diagonal neighbors. The values of the maximum and minimum diagonal neighbors (i.e. $I_{max}$ and $I_{min}$ respectively) as well as of the center pixel are extracted in Fig. 4.3(c), and the values of $\tau_{max}$ and $\tau_{min}$ are shown in Fig. 4.3(d). The values and indexes of the local diagonal extremes are computed and then used together with the central pixel to form the local diagonal extrema pattern.


Fig. 4.1. The framework for biomedical image retrieval using the local descriptors (LDEP, LBDP, LBDISP and LWP): pattern computation, feature vector computation, similarity measurement and image retrieval.


Fig. 4.2. (a) The axes of the image in the Cartesian coordinate system, and (b) the origin of the axes at the upper-left corner of the image; $P_{x,y}$ is the pixel of image $I$ at coordinate $(x, y)$.

Fig. 4.3. The flow diagram of the LDEP computation for a center pixel (intensity value $I_{x,y}$), illustrated with an example: (a) positions of the four diagonal neighbors, (b) an example neighborhood, (c)-(f) the extracted extrema values, indexes and extrema-center relationship factors, and (g) the final local diagonal extrema pattern.

We represent the local diagonal extrema pattern (LDEP) for $P_{x,y}$ with a binary pattern $LDEP_{x,y} = \big(LDEP^{1}_{x,y}, LDEP^{2}_{x,y}, \dots, LDEP^{B_{LDEP}}_{x,y}\big)$, where $B_{LDEP}$ is the length of the LDEP pattern and $LDEP^{j}_{x,y}$ is its $j$-th element. The LDEP pattern is all 0's with two 1's; the positions of the two 1's are determined on the basis of the indexes of the maxima and minima ($\tau_{max}$ and $\tau_{min}$) and the relationship of the extremes with the center pixel, quantified by an extrema-center relationship factor.


The extrema-center relationship factor quantifies the relationship of the center with the extremes. Note that the dimension of the pattern is the maximum possible value of the element index $j$, which in turn depends upon the maximum possible values of $\tau_{max}$, $\tau_{min}$ and the relationship factor. For four diagonal neighbors this maximum value is 24, which means that the dimension of the LDEP is 24. The low dimension of the proposed pattern is the main benefit of our method. The value of the relationship factor for this example is demonstrated in Fig. 4.3(e), and Fig. 4.3(f) shows the resulting positions of the two 1's. Finally, the LDEP pattern for this example is depicted in Fig. 4.3(g); only two elements of the pattern are set to 1 and the rest are zeros. The local diagonal extrema pattern descriptor over the whole image $I$ is obtained as the bin-wise sum of the patterns of all pixels (i.e. a histogram),

$$LDEP(j) = \sum_{x,y} LDEP^{j}_{x,y}, \qquad j = 1, 2, \dots, 24.$$
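To make the construction concrete, the following is a minimal Python sketch of the per-pixel LDEP computation. The mapping of the extrema indexes and a three-valued extrema-center relationship factor onto the 24 bins is an assumption for illustration (the thesis fixes one specific 24-bin layout); the function name and bin layout are hypothetical.

```python
import numpy as np

def ldep_pattern(center, diag):
    """Sketch of the per-pixel LDEP: a 24-bin binary pattern with two 1's,
    one for the diagonal maximum and one for the diagonal minimum.
    `diag` holds the four diagonal neighbor intensities I^1..I^4."""
    diag = np.asarray(diag)
    t_max, t_min = int(np.argmax(diag)), int(np.argmin(diag))  # indexes 0..3

    def phi(extreme):
        # assumed 3-valued extrema-center relationship factor
        if center > extreme:
            return 0
        if center < extreme:
            return 1
        return 2

    pattern = np.zeros(24, dtype=np.uint8)
    # assumed layout: bins 0-11 encode the maximum, bins 12-23 the minimum
    pattern[4 * phi(diag[t_max]) + t_max] = 1
    pattern[12 + 4 * phi(diag[t_min]) + t_min] = 1
    return pattern

# The image-level descriptor is then the bin-wise sum of these patterns.
```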

4.1.2. Local Bit-plane Decoded Pattern (LBDP)

In this sub-section, we propose the local bit-plane decoded pattern (LBDP), which consists of the following main components: local bit-plane decomposition, local bit-plane transformation and local bit-plane decoded pattern computation; the local neighborhood extraction needed by these steps is described first. In the rest of this sub-section, we describe each component in detail.

a) Local Neighborhood Extraction
To facilitate the computation of LBDP, we first need to extract the local neighborhood of any given pixel. We extract the local neighbors in such a manner that all of them are equally spaced at a particular radius from the center pixel, similar to [16, 33, 79]. Let $P^{R,N,t}_{x,y}$ for $t \in [1, N]$ denote the $N$ local neighbors of $P_{x,y}$ equally distributed over a circle of radius $R$ centered at $P_{x,y}$, where $R$ and $N$ are positive integers; as depicted in Fig. 4.4, the $t$-th neighbor of $P_{x,y}$ has intensity value $I^{R,N,t}_{x,y}$. It should be noted that we can consider only those pixels as a central pixel for which the coordinates of all $N$ local neighbors lie within the dimensions of the image $I$. The coordinate of the $t$-th neighbor with respect to the origin of the image in the Cartesian coordinate system is obtained from its position in the polar coordinate system w.r.t. the center pixel, i.e.

$$x_t = x + R\cos\theta_t, \qquad y_t = y - R\sin\theta_t, \qquad \theta_t = \frac{2\pi(t-1)}{N}$$

(a standard circular-sampling construction; the angular convention here is illustrative).

Fig. 4.4. The local neighbors $P^{R,N,t}_{x,y}$ for $t = 1, 2, \dots, N$ of a center pixel $P_{x,y}$ in the polar coordinate system.
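As a concrete illustration of this sampling step, the sketch below computes the $N$ neighbor intensities with bilinear interpolation for off-grid positions; the function name and the row/column convention are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def circular_neighbors(img, x, y, R=1, N=8):
    """Intensities of N neighbors equally spaced on a circle of radius R
    around (x, y); off-grid samples use bilinear interpolation. Assumes the
    center is far enough from the border, as required in the text."""
    vals = []
    for t in range(N):
        ang = 2.0 * np.pi * t / N
        xt, yt = x + R * np.cos(ang), y - R * np.sin(ang)
        x0, y0 = int(np.floor(xt)), int(np.floor(yt))
        dx, dy = xt - x0, yt - y0
        # bilinear interpolation over the four surrounding pixels
        v = (img[x0, y0] * (1 - dx) * (1 - dy)
             + img[x0 + 1, y0] * dx * (1 - dy)
             + img[x0, y0 + 1] * (1 - dx) * dy
             + img[x0 + 1, y0 + 1] * dx * dy)
        vals.append(v)
    return np.array(vals)
```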

b) Local Bit-plane Decomposition
The local bit-plane decomposition step is performed to separate each bit-plane of the local neighboring structure of the pixel $P_{x,y}$, where the bit-depth of image $I$ is $B$ (a positive integer). The local bit-plane decomposition step yields the binary values of the neighbors in each bit-plane; note that this step is applied over the $N$ neighbors only and not over the center pixel $P_{x,y}$. It is convenient to represent each bit-plane in a cylindrical coordinate system, because this coordinate system is easily demonstrated with the help of the polar coordinate system. We represent the elements of the bit-planes in the cylindrical coordinate system as shown in Fig. 4.5, where the first two coordinates give the spatial position in the polar coordinate system and the third coordinate gives the bit-plane number. The cylinder is composed of $B+1$ stacked horizontal slices, where the base slice represents the original pixels (i.e. one center pixel with its $N$ neighbors).


Fig. 4.5. (a) The axes of the cylindrical coordinate system, and (b) the local bit-plane decomposition. The cylinder can be thought of as $B+1$ horizontal slices: the base slice is composed of the original center pixel and its neighbors, with the center pixel at the origin, and the remaining $B$ slices correspond to the $B$ bit-planes of the local neighbors of the base slice, with the $k$-th slice from the base corresponding to the $k$-th bit-plane and the most significant bit-plane at the top.

The original center pixel becomes the origin of the cylindrical coordinate system, and the elements of the base slice keep their polar-coordinate notation because they are contained by the base of the cylinder. Let $I^{R,N,t,k}_{x,y}$ represent the element corresponding to the $t$-th neighbor of $P_{x,y}$ in the $k$-th bit-plane, and let its binary value be denoted by $b^{R,N,t,k}_{x,y}$.


Fig. 4.6. An example of the local bit-plane decomposition into $B = 8$ bit-planes: (a) a sample pattern with $N = 8$ and $R = 1$, (b) the decomposed bit-planes of the neighbors; the ‘red’ and ‘green’ circles represent ‘1’ and ‘0’ respectively.

Fig. 4.7. Example of the local bit-plane transformed value maps for each bit-plane: (a) sample image, (b) LBP map over the sample image, (c)-(j) local bit-plane transformed value maps for each of the eight bit-planes.

The binary value $b^{R,N,t,k}_{x,y}$ of the $t$-th neighbor of $P_{x,y}$ in the $k$-th bit-plane is defined from the intensity value of that neighbor (i.e. $I^{R,N,t}_{x,y}$) as,

$$b^{R,N,t,k}_{x,y} = \left\lfloor \frac{I^{R,N,t}_{x,y}}{2^{k-1}} \right\rfloor \bmod 2$$

The local bit-plane decomposition is illustrated using an example pattern in Fig. 4.6 for $N = 8$, $R = 1$ and $B = 8$. The example pattern shown in Fig. 4.6(a) comprises a center pixel and its 8 neighbors. The 8 bit-planes consisting of the binary values of each neighbor are depicted in Fig. 4.6(b) with the help of circles of two colors; the ‘red’ and ‘green’ circles stand for ‘1’ and ‘0’ respectively.


c) Local Bit-plane Transformation
The local binary pattern [16] uses the raw intensity values of the neighboring pixels without any transformation. This motivated us to introduce the concept of a local bit-plane transformation, which captures the local information in each bit-plane separately, with the lower and higher bit-planes capturing the fine and coarse details respectively. The local bit-plane transformed value for the $k$-th bit-plane (i.e. $LBPT^{R,N,k}_{x,y}$) is defined using the decomposed binary values of all the neighbors in that bit-plane as follows,

$$LBPT^{R,N,k}_{x,y} = \sum_{t=1}^{N} b^{R,N,t,k}_{x,y} \times 2^{t-1}$$

The local bit-plane transformed values for the local bit-planes of Fig. 4.6(b) are computed as 244, 239, 98, 8, 33, 164, 71 and 0 for bit-plane numbers 1 to 8 respectively. We consider an example image from the OASIS-MRI database in Fig. 4.7(a) to show the characteristics of the introduced local bit-plane transformation. Fig. 4.7(b) portrays the LBP map [16] computed over the considered image. The local bit-plane transformed value maps are displayed in Fig. 4.7(c-j) for bit-plane numbers 1 to 8 respectively; these maps are generated for $N = 8$ and $R = 1$. It is observed from Fig. 4.7 that LBP is not able to encode the very fine details of the image, which are encoded by the lower bit-planes of the local bit-plane transformation scheme. It is also deduced from Fig. 4.7 that the lower bit-plane maps capture finer details (i.e. edges, corners, etc.) whereas the higher bit-plane maps catch the coarser details (i.e. ambience, essence, etc.), which provides more distinctive ability to the description of the image.
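A small Python sketch of the decomposition and transformation steps follows; the sample neighbor intensities are hypothetical, and the bit weighting $2^{t-1}$ mirrors the formula reconstructed above.

```python
import numpy as np

def local_bitplane_transform(neighbors, B=8):
    """Decompose N neighbor intensities into B bit-planes and compute the
    local bit-plane transformed value of each plane (plane 1 = LSB)."""
    neighbors = np.asarray(neighbors, dtype=np.int64)
    N = len(neighbors)
    lbpt = np.empty(B, dtype=np.int64)
    for k in range(1, B + 1):
        bits = (neighbors >> (k - 1)) & 1                # k-th bit of each neighbor
        lbpt[k - 1] = int(np.sum(bits << np.arange(N)))  # sum_t bit_t * 2^(t-1)
    return lbpt

# Hypothetical neighbor intensities, for illustration only:
print(local_bitplane_transform([52, 173, 96, 14, 230, 7, 121, 88]))
```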

d) Local Bit-plane Decoded Pattern
The $LBDP$ pattern for pixel $P_{x,y}$ of image $I$, using $N$ neighbors at radius $R$, is given by concatenating the binary LBDP values computed for each of the $B$ bit-planes as follows,

$$LBDP_{x,y} = \big(LBDP^{1}_{x,y}, LBDP^{2}_{x,y}, \dots, LBDP^{B}_{x,y}\big)$$

where $LBDP^{k}_{x,y}$ is a binary LBDP pattern value computed over the $k$-th bit-plane from the sign of the difference between the range-matched local bit-plane transformed value and the center intensity,

$$LBDP^{k}_{x,y} = \begin{cases} 1, & \text{if}\;\; \widehat{LBPT}^{R,N,k}_{x,y} - I_{x,y} \ge 0\\ 0, & \text{otherwise} \end{cases}$$

where $\widehat{LBPT}^{R,N,k}_{x,y}$ is obtained after range matching of $LBPT^{R,N,k}_{x,y}$. Note that the range of $LBPT^{R,N,k}_{x,y}$ is $[0, 2^{N}-1]$, whereas the range of $I_{x,y}$ is $[0, 2^{B}-1]$. We match the range of $LBPT^{R,N,k}_{x,y}$ with the range of $I_{x,y}$ by dividing it by the range matching factor $2^{N}/2^{B}$. For the case when the number of local neighbors is equal to the number of bit-planes (i.e. $N = B$), the range matching is not needed. Thus, a binary pattern of length $B$ is generated. For the example of Fig. 4.6, the local bit-plane transformed difference values are 219, 214, 73, -17, 8, 139, 46, and -25 for bit-plane numbers 1 to 8 respectively, obtained by subtracting the center pixel's intensity value 25 from the local bit-plane transformed values 244, 239, 98, 8, 33, 164, 71 and 0 generated previously. According to the signs of the computed local bit-plane transformed difference values, the $LBDP$ pattern of the center pixel of the example considered in Fig. 4.6 is ‘11101110’.
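The worked example can be checked directly with a few lines of Python (the values below are the ones computed in the text):

```python
# Transformed values from the Fig. 4.6 example and the center intensity 25;
# since N = B = 8, no range matching is needed.
lbpt = [244, 239, 98, 8, 33, 164, 71, 0]
center = 25
pattern = ''.join('1' if v - center >= 0 else '0' for v in lbpt)
print(pattern)  # -> 11101110, matching the pattern derived in the text
```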

e) Local Bit-plane Decoded Feature Vector
We generated the local bit-plane decoded patterns in the previous subsection using the proposed local bit-plane transformation scheme. The $LBDP$ values are computed in binary form for a particular pixel and need to be converted into the form of a histogram. The local bit-plane decoded feature vector of dimension $2^{B}$ is calculated from the decimal $LBDP$ values of every pixel of image $I$. When $N$ neighbors evenly distributed on a circle of radius $R$ are used to generate the patterns, the feature vector is the normalized histogram

$$LBDP\_FV(\delta) = \frac{1}{|I|}\sum_{x,y} \big[\, val(LBDP_{x,y}) = \delta \,\big], \qquad \delta \in [0, 2^{B}-1]$$

where $val(LBDP_{x,y}) = \sum_{k=1}^{B} LBDP^{k}_{x,y}\, 2^{k-1}$ is the decimal value of the binary pattern, $[\cdot]$ is 1 when its argument is true and 0 otherwise, and $|I|$ is the number of contributing pixels.

4.1.3. Local Bit-plane Dissimilarity Pattern (LBDISP)

The existing local descriptors have at least one of two main downsides: 1) relatively low discriminative power and 2) high dimensionality. In order to overcome these problems, we design a more discriminative and low dimensional Local Bit-plane Dissimilarity Pattern (LBDISP) based descriptor in this sub-section for biomedical image retrieval. The introduced method first extracts the local neighborhood and decomposes it into bit-planes (similar to LBDP, see subsections 4.1.2(a-b)), then computes the dissimilarity map by computing the dissimilarity between the center and its neighbors at each bit-plane, and finally finds the Local Bit-plane Dissimilarity Pattern (LBDISP) by exploiting the relationship of the center with the dissimilarity map. The consideration of dissimilarity at the bit-plane level enhances the discriminative power of the LBDISP descriptor. The dimension of LBDISP depends upon the number of neighbors, similar to LBP [16], which is relatively low compared to most recently proposed descriptors. The remainder of this sub-section elaborates the local bit-plane dissimilarity map generation, LBDISP computation, and feature descriptor calculation steps in detail. In the local bit-plane decomposition step of LBDP the center value was not decomposed, whereas here both the center value and the neighboring values are decomposed into bit-planes, similar to (4.10) and (4.11).

a) Local Bit-plane Dissimilarity Encoding
The local bit-plane dissimilarity encoding step is proposed in this thesis to utilize the local information at the bit level in order to enhance the discriminative ability of the proposed descriptor. It basically encodes the difference between the center and its neighbors at each bit-plane. One characteristic of this step is that both its input and output are in binary form. Let $d^{R,N,t,k}_{x,y}$ be the binary value obtained after local bit-plane dissimilarity encoding in the $k$-th bit-plane for the $t$-th neighbor of the center pixel. Its computation from $b^{R,N,t,k}_{x,y}$ and $b^{k}_{x,y}$ (i.e. the binary values in the $k$-th bit-plane corresponding to the $t$-th neighbor and the center respectively, after local bit-plane decomposition) is defined as follows:

$$d^{R,N,t,k}_{x,y} = \begin{cases} 0, & \text{if}\;\; b^{R,N,t,k}_{x,y} = b^{k}_{x,y}\\ 1, & \text{otherwise} \end{cases}$$

i.e. the dissimilarity bit is 1 exactly where the neighbor and center bits differ (an exclusive-OR). Note that this step does not affect the binary values of the center; in other words, the encoded center value is equal to its decomposed value.
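A compact sketch of this encoding in Python, assuming the exclusive-OR form given above (variable names are illustrative):

```python
import numpy as np

def bitplane_dissimilarity(center, neighbors, B=8):
    """Dissimilarity bits between the center and each neighbor in every
    bit-plane: d[t, k] = 1 where the two bits differ (plane k+1, LSB first)."""
    neighbors = np.asarray(neighbors, dtype=np.int64)
    d = np.empty((len(neighbors), B), dtype=np.uint8)
    for k in range(B):
        cb = (center >> k) & 1          # center bit in bit-plane k+1
        nb = (neighbors >> k) & 1       # neighbor bits in bit-plane k+1
        d[:, k] = nb ^ cb               # exclusive-OR dissimilarity
    return d
```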

b) Local Bit-plane Dissimilarity Map
The local bit-plane dissimilarity map basically consists of the processed values of the center pixel and its $N$ neighbors. This map is generated from the binary values of the center as well as its neighbors obtained over the $B$ bit-planes in the previous dissimilarity encoding step. The input for this step is binary values and the output is decimal values of bit-depth $B$. The $t$-th neighboring value of the center pixel in the local bit-plane dissimilarity map is represented by $M^{R,N,t}_{x,y}$ and is computed by recombining the dissimilarity bits over the bit-planes:

$$M^{R,N,t}_{x,y} = \sum_{k=1}^{B} d^{R,N,t,k}_{x,y} \times 2^{k-1}$$

and the center value of the local bit-plane dissimilarity map, denoted $M^{0}_{x,y}$, is equal to the actual center value, i.e. $M^{0}_{x,y} = I_{x,y}$. The values of the center and its neighbors of the local bit-plane dissimilarity map will be utilized by the next step to compute the LBDISP.


c) Local Bit-plane Dissimilarity Pattern
The local bit-plane dissimilarity pattern corresponding to the center pixel $P_{x,y}$ is denoted by $LBDISP_{x,y}$ and is a binary pattern of length $N$,

$$LBDISP_{x,y} = \big(LBDISP^{1}_{x,y}, LBDISP^{2}_{x,y}, \dots, LBDISP^{N}_{x,y}\big)$$

where the $t$-th element $LBDISP^{t}_{x,y}$ represents the relationship between the center and the $t$-th neighbor of the dissimilarity map and is computed as follows,

$$LBDISP^{t}_{x,y} = \begin{cases} 1, & \text{if}\;\; M^{R,N,t}_{x,y} \ge M^{0}_{x,y}\\ 0, & \text{otherwise.} \end{cases}$$

d) LBDISP Feature Descriptor
The computation process of LBDISP was discussed previously for a particular pixel at coordinate $(x, y)$ of a gray scale image $I$ of size $X \times Y$ (i.e. $X$ rows and $Y$ columns). The LBDISP feature descriptor (i.e. the histogram of the LBDISPs) is computed over the whole image as,

$$LBDISP\_FV(\delta) = \frac{1}{|I|}\sum_{x,y} \big[\, val(LBDISP_{x,y}) = \delta \,\big], \qquad \delta \in [0, 2^{N}-1]$$

where $val(LBDISP_{x,y})$ is the decimal value of the binary pattern, $[\cdot]$ is 1 when its argument is true and 0 otherwise, and $|I|$ is the number of contributing pixels.

The LBDISP maps for the different bit-planes are displayed in the second to eighth columns of Fig. 4.8, for the image in the first column. The image is taken from the OASIS-MRI database. It is noticed that the higher bit-plane (i.e. MSB) encodes the low frequency information, whereas the lower bit-plane (i.e. LSB) encodes the high frequency information. The presence of more coarse to more detailed information in the different bit-planes boosts the discriminative ability of the proposed descriptor.

Fig. 4.8. The LBDISP maps for different bit-planes, plotted in the second to eighth columns for the input image in the 1st column. The higher bit-plane (i.e. MSB) yields coarser information, whereas the lower bit-plane (i.e. LSB) yields more detailed information. The original input image is taken from the OASIS-MRI database.

4.1.4. Local Wavelet Pattern (LWP)

The local feature descriptors presented in the published literature have utilized the relationship of a referenced pixel with its neighboring pixels [16, 18]. Some approaches have also tried to utilize the relationship among the neighboring pixels, with some success, but at the expense of a high dimensionality that is generally more time consuming for image retrieval [33]. This motivated us to propose a new local wavelet pattern (LWP) based feature vector of low dimension. The LWP uses both relationships, i.e. among the local neighbors and between the center pixel and its local neighbors, to construct the descriptor. It encodes the relationship among the local neighbors using a local wavelet decomposition method and finally produces a binary pattern by comparing the decomposed values with the transformed center pixel value. The superior performance and efficiency of the LWP are confirmed through the biomedical image retrieval experiments. Local neighborhood extraction, local wavelet decomposition, center pixel transformation, local wavelet pattern generation and feature vector generation are the steps to find the LWP based descriptor. The local neighborhood extraction approach adopted here is the same as used by LBDP in subsection 4.1.2(a) (see also Fig. 4.4 for the local neighbors of any center pixel). The remaining steps are described next.

a) Local Wavelet Decomposition
The $t$-th neighbor at radius $R$ of any pixel $P_{x,y}$ has intensity value $I^{R,N,t}_{x,y}$ for $t \in [1, N]$, where $N$ is the total number of neighbors. We use these intensity values to encode the relationship existing among the neighbors of the center pixel using the concept of local wavelet decomposition. We apply the 1-D Haar wavelet decomposition to transform the intensity values $\big(I^{R,N,1}_{x,y}, \dots, I^{R,N,N}_{x,y}\big)$ into the decomposed values $W_{\lambda} = \big(W^{1}_{\lambda}, \dots, W^{N}_{\lambda}\big)$, where $\lambda$ is a positive integer denoting the level of transformation. Note that the value of $\lambda$ should be chosen in such a way that $mod(N, 2^{\lambda}) = 0$, where $mod(a, b)$ is a function returning the remainder when $a$ is divided by $b$; thus the maximum possible value of $\lambda$ (i.e. $\lambda_{max}$) depends upon the total number of neighbors $N$. The computation of the wavelet decomposed values at the $\lambda$-th level from the values at the $(\lambda-1)$-th level, by using the basis function of the 1-D Haar wavelet at that level, is shown in Fig. 4.9. Mathematically, this is defined as follows,

$$W_{\lambda} = \psi_{\lambda}\, W_{\lambda-1}, \qquad W_{0} = \big(I^{R,N,1}_{x,y}, I^{R,N,2}_{x,y}, \dots, I^{R,N,N}_{x,y}\big)^{T} \tag{4.22}$$

where $\psi_{\lambda}$ is the basis matrix at the $\lambda$-th level for $N$ values (4.23); it is composed of the 1-D Haar wavelet square basis matrix, which acts on the approximation coefficients of the previous level, and a unit matrix, which leaves the remaining (detail) coefficients unchanged. The values of the elements of the Haar basis matrix depend upon the level of transformation $\lambda$ and upon the row and column numbers of the matrix, following the standard 1-D Haar construction of averaging and differencing pairs of adjacent coefficients (4.24)-(4.25). From (4.22) and (4.23), $W_{\lambda}$ can be re-written in the form (4.26); after replacing $W_{\lambda-1}$ using (4.22) in (4.26) and simplifying, $W_{\lambda}$ can also be expressed directly in terms of the input values,

$$W_{\lambda} = \psi_{\lambda}\,\psi_{\lambda-1}\cdots\psi_{1}\, W_{0} \tag{4.27}$$


Fig. 4.9. The transformation of an $N$-dimensional vector $W_{\lambda-1}$ to another $N$-dimensional vector $W_{\lambda}$ at the $\lambda$-th level using the 1-D Haar wavelet.



-dimensional vector obtained after 1-D Haar wavelet decomposition of

level. Note that in (4.22) the basis matrix and directly applied with input values

also be obtained recursively from using the basis matrix of

level (i.e.

is computed recursively from

to obtain the

. Whereas,

without recursive computation of

can instead by

) only as deduced in (4.27).
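For intuition, the following Python sketch performs the multi-level decomposition by repeatedly averaging and differencing the current approximation coefficients; the unnormalized Haar form and the in-place layout (approximations first, then details) are assumptions for illustration.

```python
import numpy as np

def haar_decompose(values, level):
    """Multi-level 1-D Haar decomposition sketch: at each level, the current
    approximation part is replaced by pairwise averages (approximation) and
    halved pairwise differences (detail), keeping the vector N-dimensional.
    Requires mod(N, 2^level) == 0, as stated in the text."""
    w = np.asarray(values, dtype=float).copy()
    n = len(w)
    assert n % (2 ** level) == 0, "level too high for N neighbors"
    m = n
    for _ in range(level):
        a = w[:m]
        avg = (a[0::2] + a[1::2]) / 2.0   # approximation coefficients
        dif = (a[0::2] - a[1::2]) / 2.0   # detail coefficients
        w[:m // 2], w[m // 2:m] = avg, dif
        m //= 2
    return w

# Hypothetical neighbor intensities, for illustration only:
print(haar_decompose([52, 173, 96, 14, 230, 7, 121, 88], level=2))
```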

b) Center Pixel Transformation
We encoded the relationship among the neighboring values using the local wavelet decomposition. Finally, it is required to encode the relationship existing between the center value $I_{x,y}$ and the decomposed neighboring values $W^{t}_{\lambda}$. Obviously, we cannot compare these values directly, because the range of the decomposed values has changed. To cope with this problem, we propose a center pixel transformation scheme which transforms the center value into an array $C_{\lambda} = (C^{1}_{\lambda}, \dots, C^{N}_{\lambda})$ of length $N$, such that the range of $C^{t}_{\lambda}$ is the same as the range of $W^{t}_{\lambda}$ at level $\lambda$, according to equation (4.28), where $G$ is the number of gray levels in image $I$ and is equal to $2^{B}$.

c) Local Wavelet Pattern
Here, the $C^{t}_{\lambda}$ and $W^{t}_{\lambda}$ values are used to encode the relationship between the center pixel and its neighbors into binary form. We term this relation the local wavelet pattern (LWP), which is basically a binary array of $N$ values, one corresponding to each neighbor of $P_{x,y}$, defined as follows,

$$LWP_{x,y} = \big(LWP^{1}_{x,y}, LWP^{2}_{x,y}, \dots, LWP^{N}_{x,y}\big) \tag{4.29}$$

where $LWP^{t}_{x,y}$ is the binary LWP value for the $t$-th neighbor of $P_{x,y}$, when $N$ neighbors at radius $R$ are used for the wavelet decomposition at the $\lambda$-th level, computed as,

$$LWP^{t}_{x,y} = \begin{cases} 1, & \text{if}\;\; W^{t}_{\lambda} \ge C^{t}_{\lambda}\\ 0, & \text{otherwise.} \end{cases}$$

We define the local wavelet pattern map for $P_{x,y}$ using its local wavelet patterns defined in (4.29) as follows,

$$LWP\_Map_{x,y} = \sum_{t=1}^{N} LWP^{t}_{x,y} \times 2^{t-1}$$

Note that the values of the map depend upon the number of neighbors $N$ considered to form the pattern; in other words, the range of $LWP\_Map_{x,y}$ is $[0, 2^{N}-1]$.
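Putting the last two steps together, a minimal sketch (names are illustrative; the center-transformation of (4.28) is taken as given and passed in as `C`):

```python
import numpy as np

def lwp_map(W, C):
    """Binary LWP from the decomposed neighbor values W and the transformed
    center array C, packed into a decimal map value in [0, 2^N - 1]."""
    bits = (np.asarray(W) >= np.asarray(C)).astype(np.int64)
    return int(np.sum(bits << np.arange(len(bits))))
```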

d) LWP Feature Vector
The local wavelet pattern feature vector of dimension $2^{N}$ is calculated from the $LWP\_Map$ values of every pixel of image $I$, computed at the $\lambda$-th level of local wavelet decomposition when $N$ neighbors at radius $R$ are considered. The feature vector is the normalized histogram

$$LWP\_FV(\delta) = \frac{1}{|I|}\sum_{x,y} \big[\, LWP\_Map_{x,y} = \delta \,\big], \qquad \delta \in [0, 2^{N}-1]$$

where $|I|$ is the total number of contributing pixels of the image and $[\cdot]$ is 1 when its argument is true and 0 otherwise.

4.2. Similarity Measurement and Evaluation Criteria

4.2.1. Similarity Measurement
We use the distances discussed in Section 1.2.3 for the similarity measurement between the descriptors of two images in this chapter.

4.2.2. Evaluation Criteria

Each image of the database is considered as a query image and matched with all images of that database. The top $\eta$ matching images are retrieved using different similarity measures. A retrieved image is counted as correct if it belongs to the same category as the query image. In order to analyze the performance of the proposed descriptors, the average retrieval precision (ARP) and average retrieval rate (ARR) are calculated as the means of the average precision (AP) and average recall (AR) respectively over all categories. AP and AR for a particular category are computed by averaging the precisions and recalls obtained by turning each image of that category into the query image. The precision and recall for any query image are given as follows,

$$Precision = \frac{\text{number of correctly retrieved images}}{\text{number of retrieved images}}, \qquad Recall = \frac{\text{number of correctly retrieved images}}{\text{number of relevant images in the database}}$$

The F-score or F-measure metric is given as follows,

$$F\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

In order to find the average normalized modified retrieval rank (ANMRR) metric, the algorithm given in [170] is followed, and it is represented as a percentage as a function of the number of retrieved images. Higher values of ARP, ARR and F-score represent better retrieval performance, whereas a lower value of ANMRR means a better retrieval result.
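These per-query quantities are straightforward to compute; a small Python helper (the function name and argument layout are illustrative):

```python
def retrieval_metrics(retrieved_labels, query_label, num_relevant):
    """Precision, recall and F-score for a single query, following the
    standard definitions above."""
    correct = sum(1 for lab in retrieved_labels if lab == query_label)
    precision = correct / len(retrieved_labels)
    recall = correct / num_relevant
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    return precision, recall, f_score
```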

4.3. Results and Discussion
This section is devoted to the biomedical image retrieval experiments and comparisons. One MRI and four CT image retrieval experiments are performed over the publicly available OASIS-MRI [171], Emphysema-CT [81], NEMA-CT [172], TCIA-CT [173] and EXACT09-CT [174] databases. Unless otherwise specified, 8 local neighbors ($N$) equally spaced at a radius ($R$) of 1 are used to construct the LBDP, LBDISP and LWP descriptors, the bit-depth ($B$) is set to 8 for LBDP and LBDISP, and local wavelet decomposition at the 2nd level is adopted to construct the LWP feature vector. We compare the results of the LDEP, LBDP, LBDISP and LWP feature vectors with the results of the LBP [16], LTP [18], CSLBP [17], CSLTP [11], LDP [19], LTrP [20], LTCoP [32], LMeP [33] and SS3DLTP [83] feature vectors.

4.3.1. Biomedical Databases Used

a) OASIS-MRI Database
In this experiment, a magnetic resonance imaging (MRI) database is considered, which has been made public for research and analysis by the Open Access Series of Imaging Studies (OASIS) [171]. The cross-sectional images (resolution 176×208) of 421 persons in the age group between 18 and 96 years are included in this database. We partitioned this database into four categories having 106, 89, 102 and 124 images, on the basis of the ventricular shape inside the images, for image retrieval purposes. Fig. 4.10 depicts example images (four images per category) from the OASIS-MRI database; the ventricular shape inside the images can be observed easily.

b) Emphysema-CT Database
Emphysema is identified by the loss of lung tissue. In order to analyze the emphysema disease in more detail, it is crucial to distinguish healthy and emphysematous lung tissues. We have used the CT images of the emphysema disease in the Emphysema-CT database [81]. The Emphysema-CT database comprises 3 classes, namely Normal Tissue (NT), Centrilobular Emphysema (CLE), and Paraseptal Emphysema (PSE), with 59, 50 and 59 images respectively, collected from 39 subjects [81]. In this thesis, the emphysema morphology is described by featuring the Emphysema-CT images with the help of texture based descriptors, and the retrieval results are analyzed over the Emphysema-CT database. Fig. 4.11 displays one example image from each category of this database.

c) NEMA-CT Database
The National Electrical Manufacturers Association (NEMA) [172] created the Digital Imaging and Communications in Medicine (DICOM) standard to facilitate the storage and use of medical images for research purposes. We considered the CT0001, CT0003, CT0020, CT0057, CT0060, CT0080, CT0082, and CT0083 cases of this database for the experiments.



Fig. 4.10. OASIS-MRI example images, four images from each group.


Fig. 4.11. Images from Emphysema-CT database, one image from each class.


Fig. 4.12. Sample images from each category of the NEMA-CT database.


Fig. 4.13. Some images of TCIA-CT database, one image from each category.

Fig. 4.14. EXACT09-CT example images, one image from each group.

We considered 499 CT images of resolution 512×512, taken from different body parts, from NEMA in this study, and classified them into 8 different classes (one class per body part) consisting of 104, 46, 29, 71, 108, 39, 33 and 69 images respectively to form the NEMA-CT database. Sample images from each class of the NEMA-CT database are displayed in Fig. 4.12.

d) TCIA-CT Database
The Cancer Imaging Archive (TCIA) provides storage for a huge amount of research, clinical and medical cancer images [173]. These images are made publicly downloadable for the purpose of research and are stored in the Digital Imaging and Communications in Medicine (DICOM) image format. We prepared the TCIA-CT database by collecting 604 Colo_prone 1.0B30f CT images of the DICOM series number 1.3.6.1.4.1.9328.50.4.2 of study instance UID 1.3.6.1.4.1.9328.50.4.1 for subject 1.3.6.1.4.1.9328.50.4.0001. According to the size and structure of the Colo_prone images, we manually grouped the collected 604 images into 8 categories having 75, 50, 58, 140, 70, 92, 78, and 41 images respectively. All images in this database have a dimension of 512×512 pixels. Fig. 4.13 displays some example images of the TCIA-CT database, with one image from each category.

e) EXACT09-CT Database
Extraction of Airways from CT 2009 (EXACT09) is a database of chest CT scans [174]. This database contains images in two sets, training and testing, with 20 cases in each set; the CT scan images are stored in the DICOM format. We considered the 675 CT images of CASE23 of the testing set of EXACT09 for the image retrieval experiment in this chapter and grouped these images, on the basis of the structure and size of the CT scans, into 19 categories having 36, 23, 30, 30, 50, 42, 20, 45, 50, 24, 28, 24, 35, 40, 50, 35, 30, 28 and 55 CT images respectively to form the EXACT09-CT database. The dimension of the images is 512×512. Fig. 4.14 depicts example images of the EXACT09-CT database, with one image from each group.


Fig. 4.15. The performance comparison of the LDEP, LBDP, LBDISP and LWP descriptors with the LBP, LTP, CSLBP, CSLTP, LDP, LTrP, LTCoP, LMeP and SS3DLTP descriptors over the OASIS-MRI database using the D1 distance measure, in terms of ARP, ARR, F-Score and ANMRR as a function of the number of top matches (i.e. the number of retrieved images).

4.3.2. Experimental Results

The performance comparison of the LDEP, LBDP, LBDISP and LWP descriptors with the LBP, LTP, CSLBP, CSLTP, LDP, LTrP, LTCoP, LMeP and SS3DLTP descriptors over the OASIS-MRI database is depicted in Fig. 4.15 using ARP, ARR, F-Score and ANMRR against the number of retrieved images (i.e. the number of top matches). It is observed across the plots that the performance of the bit-plane based descriptors LBDP and LBDISP is substantially better than that of the remaining descriptors. The results also highlight that the LDEP and LWP descriptors are not well suited to the OASIS-MRI database. It is also noticed that LBDP performs better than LBDISP up to 35 top matches, after which the order is reversed. The results over the Emphysema-CT, NEMA-CT, TCIA-CT and EXACT09-CT databases are compared in Fig. 4.16 in terms of ARP and ANMRR. The performance of the LBDP and LWP feature descriptors is comparable over the Emphysema-CT database, and both outperform the other descriptors. The performance of the LDEP descriptor (with very low dimensionality) is also comparable to that of the better performing descriptors over the Emphysema-CT database. On the other hand, the performance of SS3DLTP (with very high dimensionality compared to the proposed LDEP, LBDP, LBDISP and LWP descriptors) is very poor over this database. The LBDISP descriptor is not best suited to the Emphysema-CT database. In the case of the NEMA-CT database, the LBDISP descriptor outperforms the remaining descriptors in terms of both ARP and ANMRR; the LBDP and LTCoP descriptors are the second and third best methods respectively. The performance of the LDEP and LWP descriptors is also good over NEMA-CT, with LDEP performing extraordinarily well given that its dimension is very low compared to the top performing descriptors. This database has images from different parts of the body, so it can be deduced that the LBDP and LWP descriptors are better suited when the inter-class variation is high and the intra-class similarity is high. The five top performing descriptors over the TCIA-CT database, in decreasing order, are LBDP, LWP, SS3DLTP, LBDISP and LDEP; the improvement of LBDP over LWP is very high, as is the improvement of LWP over SS3DLTP. In the case of the EXACT09-CT database, the LBDP, LBDISP and LWP descriptors have comparable performance and outperform the remaining descriptors. The performance of LDEP is also very fair despite its low dimension. It is deduced that the LBDP and LWP descriptors are the ideal ones for CT databases whose images come from a single subject, because both the TCIA-CT and EXACT09-CT databases are extracted from a single but distinct subject. The LDEP performs well for each CT database.



Fig. 4.16. The performance comparison of the descriptors over the Emphysema-CT, NEMA-CT, TCIA-CT and EXACT09-CT databases using the D1 distance measure, in terms of ARP and ANMRR as a function of the number of top matches (i.e. the number of retrieved images).



Fig. 4.17. The performance of the LDEP, LBDP, LBDISP and LWP descriptors in terms of ARP with different distance measures (Euclidean, Cosine, Emd, Canberra, L1, D1 and Chi-square) when either 25 or 50 images are retrieved.

a) Effect of Distance Measures
The D1 distance measure was used in the results reported previously in this chapter. The LDEP, LBDP, LBDISP and LWP descriptors are also tested with the Euclidean, Cosine, Emd, Canberra, L1, D1 and Chi-square distances over each database, and the results are displayed in Fig. 4.17 in terms of ARP when either 25 or 50 images are retrieved. These results suggest that no single distance fits every database and every descriptor; the performance of the descriptors can be boosted further by adopting the best distance for a particular descriptor over a particular database.
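For reference, sketches of two of these distances between feature histograms $Q$ and $T$; the exact D1 form below is the one commonly used in the local-pattern retrieval literature and is an assumption here:

```python
import numpy as np

def d1_distance(q, t):
    """Assumed D1 distance: sum of |q - t| / (1 + q + t) over all bins."""
    q, t = np.asarray(q, float), np.asarray(t, float)
    return float(np.sum(np.abs(q - t) / (1.0 + q + t)))

def chi_square_distance(q, t):
    """Chi-square distance between two histograms."""
    q, t = np.asarray(q, float), np.asarray(t, float)
    denom = q + t
    mask = denom > 0
    return float(0.5 * np.sum((q[mask] - t[mask]) ** 2 / denom[mask]))
```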



Fig. 4.18. The comparison results for different levels of wavelet decomposition of the LWP descriptor, in terms of ARP vs the number of top matches, over the (a) TCIA-CT, (b) EXACT09-CT, and (c) NEMA-CT databases. The values of $N$ and $R$ are 8 and 1 in this analysis, so the possible levels of wavelet decomposition are 1, 2 and 3.

b) Effect of Level of Wavelet Decomposition of LWP
The effect of the level $\lambda$ of local wavelet decomposition on the performance of the LWP descriptor is also analyzed. For 8 local neighbors at a radius of 1 (i.e. $N = 8$ and $R = 1$), the maximum possible value of $\lambda$ is 3. We tested the LWP descriptor for $\lambda = 1, 2, 3$ in Fig. 4.18(a-c) over the TCIA-CT, EXACT09-CT and NEMA-CT databases respectively. It is observed across the plots of Fig. 4.18 that the performance of LWP generally increases over the TCIA-CT and EXACT09-CT databases with an increase in $\lambda$. In the case of the NEMA-CT database, the performance of LWP is better for $\lambda = 2$, which is the level we used for LWP in the results so far. From this analysis, it is desirable to use a higher level of wavelet decomposition for databases having images from the same body part, such as the TCIA-CT and EXACT09-CT databases, and a lower level of wavelet decomposition for databases having images from different body parts, such as the NEMA-CT database. Moreover, the dimension of the LWP descriptor does not change with the level of wavelet decomposition, which is a major advantage of our approach.


c) Performance vs. Complexity
The dimensions of the compared descriptors differ widely: the dimension of the proposed LDEP is only 24, and the dimensions of LBDP, LBDISP and LWP are equal to that of LBP, whereas descriptors such as LTP, LDP, LTrP, LTCoP, LMeP and SS3DLTP have considerably higher dimensions. The dimensions of the proposed LDEP, LBDP, LBDISP and LWP descriptors are thus very low compared to the remaining descriptors, except LBP, CSLBP and CSLTP. The dimension of LDEP is very low, yet its performance is very good over each CT database. The dimensions of LBDP, LBDISP and LWP are equal to that of LBP, but the performance is much improved. All the experiments are conducted using a system with an Intel(R) Core(TM) i5 CPU at 2.67 GHz, 4 GB RAM, and the 32-bit Windows 7 Ultimate operating system.

Fig. 4.19. Retrieval results from the TCIA-CT database using the LBP (1st row), LTP (2nd row), CSLBP (3rd row), CSLTP (4th row), LDP (5th row), LTrP (6th row), LTCoP (7th row), LMeP (8th row), SS3DLTP (9th row), LDEP (10th row), LBDP (11th row), LBDISP (12th row) and LWP (13th row) feature vectors. The first image in each row is the query image and the rest are the retrieved images in order of decreasing similarity from left to right. The images in red rectangles are the false positives.


d) Retrieval Results
The top 10 retrieved images for a query image taken from the TCIA-CT database using each descriptor are displayed in Fig. 4.19. The 1st to 13th rows correspond to the LBP, LTP, CSLBP, CSLTP, LDP, LTrP, LTCoP, LMeP, SS3DLTP, LDEP, LBDP, LBDISP and LWP descriptors respectively. It can be observed that LBDP, LBDISP and LWP are the descriptors that achieve 100% precision. The number of correct images retrieved by LDEP is 7, which is still comparable with descriptors of much higher dimensionality. From the experimental results in terms of ARP, ARR, F-Score and ANMRR over the OASIS-MRI, Emphysema-CT, NEMA-CT, TCIA-CT and EXACT09-CT databases, it is pointed out that the proposed LDEP, LBDP, LBDISP and LWP feature descriptors are more discriminative and efficient than recently proposed state-of-the-art descriptors such as LBP, LDP, LTrP, LTCoP, LMeP and SS3DLTP. It is also pointed out that the LDEP and LWP descriptors are suited only to the CT databases, whereas the LBDP and LBDISP descriptors are suited to both the MRI and the CT databases.

4.4. Summary

We proposed four new image feature descriptors, namely LDEP, LBDP, LBDISP and LWP, in this chapter for biomedical image retrieval. The LDEP exploits the relationship of the center with its diagonal neighbors. The LBDP and LBDISP use the local relationships at the bit-plane level. The LWP is constructed by taking the 1-D wavelet decomposition of the local neighborhood. The basic aim of these descriptors is to encode the relationship among the local neighbors as well as between the center and its neighbors, enhancing the discriminative ability while maintaining the low dimensionality of the descriptors. These descriptors were tested on one MRI and four CT databases. The CT databases considered have different characteristics: some contain images from the same part of the body while others span different parts, and some are constructed from a single subject while others are constructed from different subjects. The image retrieval framework was used for the performance comparison, and various state-of-the-art descriptors were compared with the proposed descriptors. The performance of the proposed descriptors was found to be appealing in most cases in terms of precision, recall and retrieval rank. The introduced descriptors were also experimented with different distances, and it was found that the performance can be boosted further by adopting the best performing distance. The dimensions of the descriptors were also compared, and it was deduced that the proposed descriptors have much lower dimensionality than most existing descriptors. It is evident from the experiments and analysis that the proposed descriptors are more efficient as well as more discriminative for biomedical image retrieval over MRI and CT databases.


Chapter 5
Local Color Image Descriptors for Natural and Textural Image Retrieval
Content-based image retrieval demands accurate and efficient retrieval approaches to index and retrieve the most similar images from huge image databases. Color is a very important characteristic of real images; thus, it is necessary to encode the color information while designing any descriptor. This chapter introduces four color based local descriptors, namely the local color occurrence descriptor (LCOD), the rotation and scale invariant hybrid image descriptor (RSHD), the multichannel adder based local binary pattern (maLBP) and the multichannel decoder based local binary pattern (mdLBP). In LCOD and RSHD, the color information is processed in two steps: first, the number of colors is reduced to a smaller number of shades by quantizing the RGB color space; second, the reduced color shade information of the local neighborhood is used to compute the descriptor. In order to encode LCOD, a local color occurrence binary pattern is generated. Color and textural features are fused together to construct the RSHD image descriptor by computing the texture information using structuring patterns over the quantized color shades. The maLBP and mdLBP utilize the concepts of an adder and a decoder respectively to exploit the cross-channel information. LCOD, RSHD, maLBP and mdLBP are tested over two colored natural and two colored texture databases for CBIR, and the results confirm the superiority of these descriptors. The performance of the proposed descriptors is also promising in the case of rotation and scaling, so they can be used effectively for accurate image retrieval under various image transformations. The rest of this chapter is organized as follows: Section 5.1 proposes the LCOD, RSHD, maLBP and mdLBP descriptors for color image retrieval; Section 5.2 presents the detailed experimental results, analysis and discussions over the four databases, including two natural color and two textural color databases; and finally Section 5.3 concludes the chapter.

The content of this chapter is published in the following research articles:
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Multichannel Decoded Local Binary Patterns for Content Based Image Retrieval,” IEEE Transactions on Image Processing, Vol. 25, No. 9, pp. 4018-4032, 2016.
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Local Neighbourhood-based Robust Colour Occurrence Descriptor for Colour Image Retrieval,” IET Image Processing, Vol. 9, No. 7, pp. 578-586, 2015.
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Rotation and Scale Invariant Hybrid Image Descriptor and Retrieval,” Computers & Electrical Engineering, Vol. 46, pp. 288-302, 2015.


5.1. Local Color Image Descriptors and Retrieval
In this section, we propose four color descriptors, namely the local color occurrence descriptor (LCOD), the rotation and scale invariant hybrid descriptor (RSHD), the multichannel adder based local binary pattern (maLBP), and the multichannel decoder based local binary pattern (mdLBP). LCOD and RSHD are based on color quantization followed by pattern extraction, whereas maLBP and mdLBP are based on pattern extraction over the color channels followed by a transformation over the binary patterns of the multiple channels of the image.

5.1.1. Local Color Occurrence Descriptor (LCOD)

In this sub-section, we describe the construction process of the proposed local neighborhood based robust color occurrence descriptor (LCOD). We focus on constructing the descriptor only on the basis of the color features of the image, computed over the quantized image described below. The whole construction process consists of three steps. In the first step, a color quantization is performed to reduce the number of color shades of the image. In the second step, a local color occurrence binary pattern is generated for each pixel of the image over its local neighborhood. In the third step, the binary patterns of all pixels of the image are aggregated into a single pattern.

a) Color quantization
RGB color images contain three channels representing the Red (R), Green (G) and Blue (B) colors respectively. According to the RGB color space, the range of shades is [0, l-1], where l is the number of distinguished shades in each channel. The number of different colors possible in this color space is l³, which is a large number. Considering all the colors for feature description is not feasible, because the dimension of the descriptor should be as small as possible. We quantize the RGB color space so that it becomes a single channel with a reduced number of shades. To reduce the complexity of the computation, the RGB color space is quantized into q×q×q = q³ bins, i.e. each color is quantized into q bins, where q << l. To retain equal weighting of each color, all color components are quantized into an equal number of bins. The steps involved in the quantization are as follows:

(1) Divide each Red, Green and Blue component of image I into q shades from the original l shades. The reduced color components (i.e. Rred, Gred and Bred) are computed as,

$$R_{red} = \left\lceil \frac{(R+1)\times q}{l} \right\rceil, \qquad G_{red} = \left\lceil \frac{(G+1)\times q}{l} \right\rceil, \qquad B_{red} = \left\lceil \frac{(B+1)\times q}{l} \right\rceil$$

where ┌ ┐ is the ceiling operator, so that $R_{red}, G_{red}, B_{red} \in [1, q]$.


(2) Combine the three components Rred, Gred and Bred into one dimension to construct the reduced color image Ired, with one shade per quantized (R, G, B) triple, as follows,

$$I_{red} = (R_{red} - 1)\times q^{2} + (G_{red} - 1)\times q + B_{red}$$

so that $I_{red} \in [1, q^{3}]$.
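A Python sketch of this quantization under the reconstructed formulas above (the exact rounding convention is an assumption):

```python
import numpy as np

def quantize_rgb(img, q=4, l=256):
    """Map an RGB image (channel values 0..l-1) to a single-channel image
    whose shades are numbered 1..q^3, one shade per quantized (R, G, B) triple."""
    img = img.astype(np.int64)
    r = np.ceil((img[..., 0] + 1) * q / l).astype(np.int64)  # 1..q
    g = np.ceil((img[..., 1] + 1) * q / l).astype(np.int64)  # 1..q
    b = np.ceil((img[..., 2] + 1) * q / l).astype(np.int64)  # 1..q
    return (r - 1) * q * q + (g - 1) * q + b                 # 1..q^3
```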

We quantize each color of the RGB image into an equal number of shades, which retains the symmetric information. Liu et al. [22] also quantized the RGB color space into 64 shades, whereas [24] and [23] quantized the HSV color space into 72 shades and [102] quantized the L*a*b* color space into 90 shades. In this chapter, the value of q is chosen as 4 (i.e. 64 shades after quantization) unless otherwise specified.

b) Local neighborhood based color occurrence binary pattern
The performance of most descriptors is restricted under geometric and photometric transformation conditions because they use spatial information. To overcome this problem, we consider the local occurrences of the individual reduced color shades over a local neighboring region. A local color occurrence binary pattern is generated for each pixel of the image: first, we find the number of occurrences of each shade in the local neighborhood and represent it in binary form; then, the binary patterns of all shades are concatenated to obtain a single pattern for that pixel. The binary pattern generated for each pixel is used to construct the descriptor. Let I be an image of size u×v and Ired the image obtained after quantization of I. We represent the number of occurrences of shade ρ within distance D (city block) of pixel P(x, y) of image Ired by $o_{\rho}(x, y)$, where $\rho \in [1, q^{3}]$, $x \in [1, u]$ and $y \in [1, v]$. To illustrate the methodology of computing the final binary pattern for a given pixel, we consider an example in Fig. 5.1. In this example, we find the pattern for the middle pixel having shade value 3 (highlighted in green in Fig. 5.1), considering the value of D as 2 and 1 in Fig. 5.1(a) and Fig. 5.1(b) respectively. Only 5 different shades are considered in this example (i.e. ρ = [1, 5]). In Fig. 5.1(a), the value of D is 2, so the maximum possible occurrence count becomes 25, which requires a minimum of k = 5 bits to represent in binary. The number of occurrences of shade ρ is 6, 5, 4, 5, and 5 for ρ = 1, 2, 3, 4, and 5 respectively. The pattern for the pixel is computed by concatenating the k-bit binary representations of the occurrence counts for ρ = [1, 5]. The pattern is also computed in Fig. 5.1(b) for the same example with D = 1; the number of bits required for D = 1 is 4, because the maximum possible number of occurrences is 9. The length of the pattern is 25 in Fig. 5.1(a), whereas it is 20 in Fig. 5.1(b). Thus, the length of the final pattern is k·q³.


Fig. 5.1. An illustration to compute the local color occurrence binary pattern for (a) D = 2, and

(b) D = 1. The number of quantized color shades is considered as 5 in this example for both the cases.
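A sketch of this pattern computation in Python; note that the maximum count of 25 for D = 2 quoted in the example implies the full (2D+1)×(2D+1) window, which is what the sketch uses (the function name and window convention are illustrative):

```python
import numpy as np

def occurrence_pattern(ired, x, y, D=2, num_shades=5):
    """Concatenated fixed-width binary occurrence counts of each shade in the
    (2D+1)x(2D+1) neighborhood of (x, y); shades are numbered 1..num_shades."""
    k = int(np.ceil(np.log2((2 * D + 1) ** 2 + 1)))  # bits per shade: 5 for D=2, 4 for D=1
    h, w = ired.shape
    counts = np.zeros(num_shades, dtype=np.int64)
    for i in range(max(0, x - D), min(h, x + D + 1)):
        for j in range(max(0, y - D), min(w, y + D + 1)):
            counts[ired[i, j] - 1] += 1
    return ''.join(format(c, '0{}b'.format(k)) for c in counts)  # length k * num_shades
```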

c) Local neighborhood based robust color occurrence descriptor
The local color occurrence binary pattern is generated for each pixel of the image as discussed previously. To obtain a lower dimensional descriptor, we adopt bin-wise addition of the binary patterns: the value of the descriptor des for the j-th bin is the sum over all pixels of the j-th bit of their binary patterns,

$$des(j) = \sum_{x=1}^{u}\sum_{y=1}^{v} pat_{x,y}(j), \qquad j = 1, 2, \dots, k\,q^{3}$$

where $pat_{x,y}(j)$ denotes the $j$-th bit of the binary pattern of pixel $(x, y)$. The construction process of des depends upon the size of the image (i.e. the values of u and v): the values of the elements of des will be larger for higher resolution images and vice-versa. To overcome the effect of the image size on the descriptor, we normalize des so that it becomes invariant to the scale of the image. The normalization is carried out in such a way that the sum of the resultant descriptor becomes 1. Thus, the final local color occurrence descriptor (LCOD) is given as,

$$LCOD(j) = \frac{des(j)}{\sum_{j'=1}^{k q^{3}} des(j')}$$

5.1.2. Rotation and Scale Invariant Hybrid Descriptor (RSHD)

This sub-section describes the proposed rotation and scale invariant hybrid descriptor (RSHD). The RSHD is constructed over the color quantized image obtained as in subsection 5.1.1(a).

First, we present the five rotation-invariant structure elements used in this work to encode the texture information, and then we describe the descriptor construction process in detail with illustrative examples.

a) Rotation-invariant structure element
Color, texture and shape are the key information present in natural images; we can think of an image as an arrangement of regions with different colors, textures and shapes. The local neighborhood contains much of the texture and shape information and plays an important role in the human visual system's perception of local structures. The structure element has a strong influence on texture representation and has recently been used by some researchers to encode texture information [22-24]; however, these structure elements are only partially invariant to rotation and scale. In the present work, we enhance the concept of the structure element by considering rotation-invariant structure elements, and the descriptor is constructed by combining local neighborhood textural information with color information. The local neighboring structures of images from the same class are quite similar, and if such information can be encoded in a rotation-invariant manner, it may become an important description for image matching. The RSHD is an efficient fusion of the color and textural cues present in the image. To encode the color information, we quantize the RGB color space into a single channel having a smaller number of shades. The textural feature is computed in the form of a binary structuring pattern, generated by considering the local neighboring structure elements. In order to fuse the color and texture, we extract the structuring pattern for each quantized shade independently; in this way, we are able to encode the color and texture information simultaneously. The local neighboring structure elements endow the proposed descriptor with the rotation and scale invariance properties. We incorporate the basic local structures into the framework of neighboring patterns. The five structure elements used in RSHD are shown in Fig. 5.2; each is represented by the number and orientation of its active elements (i.e. highlighted pixels). We refer to the structures of Fig. 5.2(a-e) as type 1, type 2, type 3, type 4, and type 5 structures respectively. Note that the structure elements in Fig. 5.2(a-e) have (4, 4, 2, 4, 1) degrees of freedom respectively towards rotation. This freedom of rotation, together with the closest neighboring information, allows the construction of a rotation and scale invariant hybrid descriptor using these five structure elements. If the number of active elements is larger, it means that the local neighborhood of the particular pixel is dense with a particular shade. We quantize the original RGB color space into 64 different shades for the RSHD descriptor.

Fig. 5.2. Five structure elements containing (a) only one, (b) two consecutive, (c) two non-consecutive, (d) three consecutive, and (e) four consecutive active elements.

Fig. 5.3. Six patterns derived from the five structure elements, representing (a) no structure, (b) type 1 structure, (c) type 2 structure, (d) type 3 structure, (e) type 4 structure, and (f) type 5 structure.

Fig. 5.4. Extraction of the structure map for each quantized shade; sp_ρ represents the pattern over the ρ-th quantized shade for a particular pixel. In this example, the number of quantized color shades is set to 4.

Fig. 5.5. Three examples to illustrate the computation of RSHD patterns over each quantized shade for a particular pixel; in this example also, the number of quantized shades is set to 4.

The five structure elements shown in Fig. 5.2 serve as the base templates to extract the texture information over each quantized shade for each pixel in the image. To illustrate the extraction of the pattern using these structure elements, consider only two quantized shades in the image. We define six patterns, Pattern_i for i = 0 to 5, of 5 bins each, from the five structure elements. The six patterns derived from the five structure elements are illustrated in Fig. 5.3. Pattern_0 corresponds to no structure for a particular quantized shade, and the values of all five bins are set to zero. Pattern_i for i = 1 to 5 corresponds to the type i structure element, and only the i-th bin is set to 1 in the corresponding pattern (all other bins are set to zero).

b) Descriptor construction using structuring patterns

The structuring pattern encodes the textural information, on the basis of the six patterns derived from the five structure elements, for each pixel in the image over all quantized shades. To illustrate the concept, suppose we have only four quantized shades, from 1 to 4. Fig. 5.4 illustrates the extraction of the local neighborhood map for each quantized shade; according to the structure of the map, its type is defined. Fig. 5.5 shows three examples of computing the structuring pattern SP for a pixel from its four neighboring pixels. The structuring pattern SP_ρ over each quantized shade ρ is computed first, and these are then combined to form a single structuring pattern SP for the given pixel. In Fig. 5.5(a), SP_1 is the pattern of the type 2 structure over shade 1; SP_2 is the pattern of the type 1 structure over shade 2; SP_3 is the pattern of no structure over shade 3; SP_4 is the pattern of the type 1 structure over shade 4; and SP is the concatenation of SP_1, SP_2, SP_3 and SP_4, which represents the final structuring pattern obtained for the middle pixel. The SP for the middle pixel in Fig. 5.5(b-c) is generated similarly. The length of SP becomes 20 for 4 quantized shades considering 5 structure elements; in general, the length of SP is 5q3. Having extracted the structuring pattern SP for each pixel of the image, our goal is to combine these SPs in an efficient manner to construct the final descriptor, as sketched below. We combine the SPs of all pixels in the image by summing them bin-wise; that is, the value of the j-th bin of the final descriptor is the number of 1's at the j-th bin over all SPs (i.e. the sum of the j-th bin of all SPs). Finally, a normalization scheme similar to that of the LCOD descriptor is adopted here to make the RSHD descriptor invariant to image scale.
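As a concrete illustration of this construction, the following is a minimal Python sketch. It assumes the four neighbors of a pixel are its 4-connected neighbors taken in cyclic order (top, right, bottom, left); the function names and the exact neighbor ordering are illustrative and may differ from the actual implementation.

```python
import numpy as np

def structure_type(bits):
    """Classify one shade's occupancy of the 4 neighbors (cyclic order).

    bits: four 0/1 values for the (top, right, bottom, left) neighbors.
    Returns 0 for 'no structure' (Pattern0) or the structure type 1..5.
    """
    c = sum(bits)
    if c == 0:
        return 0                        # shade absent around this pixel
    if c == 1:
        return 1                        # type 1: one active element
    if c == 2:                          # two active elements: consecutive or not?
        consecutive = any(bits[i] and bits[(i + 1) % 4] for i in range(4))
        return 2 if consecutive else 3  # type 2 vs type 3
    if c == 3:
        return 4                        # type 4: three consecutive elements
    return 5                            # type 5: all four elements active

def rshd(quantized, q=64):
    """Accumulate and normalize the RSHD descriptor of a shade-quantized image."""
    h, w = quantized.shape
    des = np.zeros(5 * q)
    for x in range(1, h - 1):
        for y in range(1, w - 1):
            nbrs = (quantized[x - 1, y], quantized[x, y + 1],
                    quantized[x + 1, y], quantized[x, y - 1])
            for shade in range(q):
                t = structure_type([int(n == shade) for n in nbrs])
                if t:                   # set bin t of this shade's 5-bin pattern
                    des[5 * shade + t - 1] += 1
    return des / des.sum()              # scale-invariant normalization, as in LCOD
```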


The final RSHD descriptor is invariant to rotation because it is constructed using rotation-invariant structure elements, which keeps it discriminative under rotation. Note that the use of local neighboring information also makes RSHD scale-invariant, because the information in the immediate local neighborhood of a pixel does not change much with a change in the scale of the image. In this chapter, the value of q3 and the number of structure elements are set to 64 and 5 respectively, which leads to an RSHD descriptor of size 64 × 5 = 320. The proposed image feature descriptor RSHD is an improvement over SEH [24]. SEH does not consider the neighboring information around a center pixel, which we incorporate in the RSHD descriptor. Methodologically, our descriptor is based on the binary representation of rotation-invariant structuring patterns, whereas SEH is based on the decimal representation of square structuring patterns.

5.1.3. Multichannel Decoded Local Binary Pattern

In this sub-section, we propose two multichannel decoded local binary pattern approaches, namely the multichannel adder based local binary pattern (maLBP) and the multichannel decoder based local binary pattern (mdLBP), to utilize the local binary pattern information of multiple channels in an efficient manner. In total, t+1 and 2^t output channels are generated by the multichannel adder and decoder respectively from the t input channels. Let I^c be the c-th channel of any image of size m × n for c ∈ [1, t], where t is the total number of channels. Let the N neighbors equally spaced at radius R of any pixel (x, y) be denoted by (x_i, y_i) for i ∈ [1, N] (see Fig. 5.6). Then, according to the definition of the LBP [16], a local binary pattern in the c-th channel is generated by computing a binary value B^c_{x,y,i} for each neighbor of the pixel (x, y), given as follows,

B^c_{x,y,i} = Φ(I^c_{x_i,y_i} − I^c_{x,y})   (5.5)

where,

Φ(z) = 1 if z ≥ 0, and 0 otherwise   (5.6)

Let the multichannel adder based local binary pattern maLBP^k for k ∈ [1, t+1] and the multichannel decoder based local binary pattern mdLBP^k for k ∈ [1, 2^t] be the outputs of the multichannel LBP adder and decoder respectively. Note that the values of B^c_{x,y,i} are in binary form (i.e. either 0 or 1). Thus, the values of ma^k_{x,y,i} and md^k_{x,y,i}, generated from the multichannel adder map t^a_{x,y,i} and the multichannel decoder map t^d_{x,y,i} respectively corresponding to each neighbor i of pixel (x, y), are also in binary form.

Fig. 5.6. The local neighbors (x_i, y_i) of a center pixel (x, y) in the c-th channel in the polar coordinate system for i ∈ [1, N] and c ∈ [1, t].

Table 5.1. Truth table of the adder map (t^a) and decoder map (t^d) with 3 input channels

B^1   B^2   B^3   t^a (adder)   t^d (decoder)
 0     0     0        0              0
 0     0     1        1              1
 0     1     0        1              2
 0     1     1        2              3
 1     0     0        1              4
 1     0     1        2              5
 1     1     0        2              6
 1     1     1        3              7

The truth maps of t^a and t^d for t = 3 are shown in Table 5.1, having 4 and 8 distinct values respectively. Mathematically, t^a and t^d are defined as,

t^a_{x,y,i} = Σ_{c=1}^{t} B^c_{x,y,i}   (5.7)

t^d_{x,y,i} = Σ_{c=1}^{t} 2^{t−c} B^c_{x,y,i}   (5.8)

We denote LBP^c for c ∈ [1, t] as the input patterns, maLBP^k for k ∈ [1, t+1] as the adder patterns, and mdLBP^k for k ∈ [1, 2^t] as the decoder patterns. The multichannel adder map ma^k_{x,y,i} for pixel (x, y) is defined from t^a_{x,y,i} as,

ma^k_{x,y,i} = 1 if t^a_{x,y,i} = k − 1, and 0 otherwise   (5.9)

Similarly, the multichannel decoder map md^k_{x,y,i} for pixel (x, y) can be computed from t^d_{x,y,i} as,

md^k_{x,y,i} = 1 if t^d_{x,y,i} = k − 1, and 0 otherwise   (5.10)

The multichannel adder based local binary patterns (maLBP^k) and multichannel decoder based local binary patterns (mdLBP^k) for the center pixel (x, y) are computed using ma^k and md^k respectively in the following manner,

maLBP^k_{x,y} = Σ_{i=1}^{N} 2^{i−1} ma^k_{x,y,i}   (5.11)

mdLBP^k_{x,y} = Σ_{i=1}^{N} 2^{i−1} md^k_{x,y,i}   (5.12)

Fig. 5.7. (a) RGB image, (b) R channel, (c) G channel, (d) B channel, (e) LBP map over the R channel, (f) LBP map over the G channel, (g) LBP map over the B channel, (h-k) the 4 output channels of the adder, and (l-s) the 8 output channels of the decoder using the 3 input LBP maps of the R, G and B channels.

Fig. 5.8. The flowchart of computation of the multichannel adder based local binary pattern feature vector (i.e. maLBP) and the multichannel decoder based local binary pattern feature vector (i.e. mdLBP) of an image from its Red (R), Green (G) and Blue (B) channels.

An illustration of the adder output channels and decoder output channels is presented in Fig. 5.7 for an example image of the Corel-1k database. An input image in the RGB color space (i.e. t = 3) is shown in Fig. 5.7(a). The corresponding Red (R), Green (G) and Blue (B) channels are extracted in Fig. 5.7(b-d) respectively. The three LBP maps corresponding to Fig. 5.7(b-d) are portrayed in Fig. 5.7(e-g) for the R, G and B channels respectively. The four output channels of the adder and the eight output channels of the decoder are displayed in Fig. 5.7(h-k) and Fig. 5.7(l-s) respectively. It can be perceived from Fig. 5.7 that the decoder channels have better texture differentiation than the adder channels and input channels, while the adder channels are better differentiated than the input channels. In other words, we can say that by applying the adder and decoder transformations, the inter-channel decorrelated information among the adder and decoder channels increases.

The feature vector (i.e. histogram) of the k-th output channel of the adder (i.e. maLBP^k) is computed as follows,

H^k_{ma}(β) = Σ_{x=1}^{m} Σ_{y=1}^{n} δ(maLBP^k_{x,y}, β)

for k ∈ [1, t+1] and β ∈ [0, 2^N − 1], where δ(α, β) = 1 if α = β and 0 otherwise, and m × n is the dimension of the input image (i.e. the total number of pixels). Similarly, the feature vector of the k-th output channel of the decoder (i.e. mdLBP^k) is computed as,

H^k_{md}(β) = Σ_{x=1}^{m} Σ_{y=1}^{n} δ(mdLBP^k_{x,y}, β)

for k ∈ [1, 2^t] and β ∈ [0, 2^N − 1]. The final maLBP and mdLBP feature vectors are given by concatenating the histograms of maLBP^k and mdLBP^k over each output channel respectively and are given as,

maLBP = [H^1_{ma}, H^2_{ma}, …, H^{t+1}_{ma}] and mdLBP = [H^1_{md}, H^2_{md}, …, H^{2^t}_{md}]

The process of computation of the maLBP and mdLBP feature descriptors of an image is illustrated in Fig. 5.8 with the help of a schematic diagram. In this diagram, the Red, Green and Blue channels of the image are considered as the three input channels. Thus, 4 and 8 output channels are produced by the adder and decoder respectively.
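For concreteness, the following is a minimal NumPy sketch of the maLBP/mdLBP computation for an RGB image. It assumes N = 8 neighbors sampled on the square grid at radius 1, whereas the thesis samples the neighbors circularly as in Fig. 5.6, so this is an approximation; the function names are illustrative only.

```python
import numpy as np

def lbp_bits(channel):
    """Binary maps B_i (Eqs. 5.5-5.6) for the 8 grid neighbors of every interior pixel."""
    h, w = channel.shape
    center = channel[1:-1, 1:-1].astype(np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    return [(channel[1 + dx:h - 1 + dx, 1 + dy:w - 1 + dy].astype(np.int32)
             >= center).astype(np.int64) for dx, dy in offsets]

def ma_md_lbp(rgb):
    """maLBP and mdLBP histogram feature vectors of an RGB image."""
    bits = [lbp_bits(rgb[:, :, c]) for c in range(3)]        # t = 3 input channels
    n = 8                                                    # N = 8 neighbors
    t_a = [bits[0][i] + bits[1][i] + bits[2][i] for i in range(n)]          # Eq. (5.7)
    t_d = [4 * bits[0][i] + 2 * bits[1][i] + bits[2][i] for i in range(n)]  # Eq. (5.8)
    weights = 2 ** np.arange(n)                              # 2^(i-1), Eqs. (5.11)-(5.12)
    features = []
    for maps, n_out in ((t_a, 4), (t_d, 8)):                 # t+1 adder / 2^t decoder channels
        hists = []
        for k in range(n_out):
            # Eqs. (5.9)-(5.10): binary map of output channel k, then code per pixel
            code = sum(wt * (m == k) for wt, m in zip(weights, maps))
            hists.append(np.bincount(code.ravel(), minlength=2 ** n))
        features.append(np.concatenate(hists))               # histogram concatenation
    return features[0], features[1]                          # maLBP, mdLBP vectors
```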

5.2. Results and Discussions

In the experiments, each image of the database is used in turn as the query image. For each query image, the system retrieves the top matching images from the database on the basis of the smallest distance between the query image and the database images, measured using different distance measures. If a returned image is from the category of the query image, then we say that the system has appropriately retrieved the target image; otherwise, the system has failed to retrieve the target image. The performances of the different descriptors are investigated using ARP and ANMRR, explained in sub-section 4.2.2. To demonstrate the effectiveness of the proposed approach, we compared the results of LCOD, RSHD, maLBP and mdLBP with existing methods such as Local Binary Pattern (LBP) [16], Color Local Binary Pattern (cLBP) [105], Multi-Scale Color Local Binary Pattern (mscLBP) [106], mCENTRIST [109], SEH [24] and CDH [102]. We have considered N equally spaced neighbors at radius R (see Fig. 5.6) for maLBP and mdLBP.

5.2.1. Databases

a) Corel-1k and Corel-10k databases

We evaluated the proposed descriptors on the Corel-1k dataset containing 1000 images from 10 categories, with 100 images per category, taken from [6]. The resolution of the images in the Corel-1k dataset is either 384×256 or 256×384. The 10 categories of the Corel-1k dataset are building, bus, dragon, elephant, flower, food, horse, human being, beach and mountain. In order to evaluate the proposed descriptors over a large database, a standard Corel database for CBIR [175] is also used in this chapter. A large number of images with different content are present in the Corel image database, ranging from natural scenarios and outdoor sports to animals. We refer to this database as the Corel-10k database. The Corel-10k database consists of 80 categories, each having roughly 100 or more images, totaling 10800 images. The images in a particular group are category-homogeneous. The resolution of the images is either 120×80 or 80×120 pixels.

b) MIT-VisTex and STex-512S databases

In this experiment, we used the MIT-VisTex colored texture database consisting of 40 different images. The images have dimension 512×512 and were collected from [176]. We created a database of 640 images (i.e. 40×16) by partitioning each 512×512 image into sixteen 128×128 non-overlapping sub-images. We also used the STex-512S colored texture database from [177] for the retrieval experiment. This database consists of 7616 color images of dimension 128×128 from 26 categories.

c) Corel-1k-Rotate, Corel-1k-Scale and Corel-1k-Illumination databases

In order to assess the performance of the proposed descriptors under rotation, we synthesized a Corel-1k-Rotate database by rotating the first 25 images of each category of Corel-1k by angles of 0, 90, 180, and 270 degrees. Thus, the Corel-1k-Rotate database contains 100 images per category, i.e. 1000 images in total. We also synthesized a Corel-1k-Scale database by scaling the first 20 images of each category of Corel-1k at scales of 0.5, 0.75, 1, 1.25, and 1.5; the Corel-1k-Scale database thus contains 100 images per category, with 1000 images in total. To test the performance of the descriptors under monotonic intensity change, we also synthesized a Corel-1k-Illumination database by adding -60, -30, 0, 30, and 60 to all channels (i.e. Red, Green and Blue) of the first 20 images of each category of the Corel-1k database. Thus, the Corel-1k-Illumination database also consists of 1000 images with 100 images per category.

5.2.2. Effect of Distance Measures

We compared the performance of image retrieval using the LCOD, RSHD, maLBP and mdLBP descriptors in terms of the average retrieval precision (ARP) using the Euclidean, Cosine, Emd, Canberra, L1, D1 and Chi-square (χ²) distance measures defined in section 1.2.3; the results are shown in Fig. 5.9 over the Corel-1k, Corel-10k, MIT-VisTex and STex-512S databases. It is observed that the Chi-square distance is better suited to each descriptor over nearly all the databases. Thus, the Chi-square distance is used as the similarity measure in the rest of the results of this chapter.
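For reference, one common form of the Chi-square histogram distance (the precise definition used here is the one given in section 1.2.3) can be sketched as:

```python
import numpy as np

def chi_square(td, fd, eps=1e-10):
    # one common chi-square form between query (td) and database (fd) histograms
    return np.sum((td - fd) ** 2 / (td + fd + eps))
```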

5.2.3. Experimental Results

The image retrieval results obtained using LCOD, RSHD, maLBP and mdLBP descriptors are compared with LBP, cLBP, mscLBP, mCENTRIST, SEH and CDH descriptors.

Fig. 5.9. The performance of the LCOD, RSHD, maLBP and mdLBP descriptors with varying distances over the Corel-1k, Corel-10k, MIT-VisTex and STex-512S databases.


Fig. 5.10. The result comparison of the LCOD, RSHD, maLBP and mdLBP descriptors with the LBP, cLBP, mscLBP, mCENTRIST, SEH and CDH descriptors over the (a) Corel-1k, (b) Corel-10k, (c) MIT-VisTex and (d) STex-512S databases in terms of the ARP and ANMRR.

Fig. 5.11. The results of the LCOD, RSHD, maLBP and mdLBP descriptors over each category of the Corel-1k database in terms of the average precision.

Fig. 5.12. Top 10 retrieved images using each descriptor for a query image from the Corel-1k database. Note that the 10 rows correspond to the different descriptors, i.e. LBP (1st row), cLBP (2nd row), mscLBP (3rd row), mCENTRIST (4th row), SEH (5th row), CDH (6th row), LCOD (7th row), RSHD (8th row), maLBP (9th row) and mdLBP (10th row), and the last 10 columns correspond to the 10 retrieved images in decreasing order of similarity for the query image in the first column.


Fig. 5.13. Top 10 retrieved images using each descriptor for a query image from the MIT-VisTex database. Note that the 10 rows correspond to the different descriptors, i.e. LBP (1st row), cLBP (2nd row), mscLBP (3rd row), mCENTRIST (4th row), SEH (5th row), CDH (6th row), LCOD (7th row), RSHD (8th row), maLBP (9th row) and mdLBP (10th row), and the last 10 columns correspond to the 10 retrieved images in decreasing order of similarity for the query image in the first column.

The result comparison plots in terms of ARP and ANMRR over the (a) Corel-1k, (b) Corel-10k, (c) MIT-VisTex and (d) STex-512S databases are illustrated in Fig. 5.10. The RSHD descriptor outperforms the remaining descriptors over the Corel-1k database, as its ARP is the highest and its ANMRR is the lowest. The LCOD descriptor is the second best descriptor over the Corel-1k database. The maLBP and mdLBP descriptors also perform well, with mdLBP the third best performing over Corel-1k. In the case of the large database (i.e. the Corel-10k database), the results of LCOD, RSHD and mdLBP are nearly the same and outperform the remaining descriptors. The mdLBP is the best suited descriptor for colored texture databases, as its performance is greatly improved over the MIT-VisTex and STex-512S databases. Moreover, the maLBP descriptor also gives comparable results. Over the colored texture databases, the SEH and CDH descriptors fail drastically, whereas the LCOD and RSHD descriptors are able to maintain better results than SEH and CDH. It is observed from the results of Fig. 5.10 that the LCOD and RSHD descriptors are better for color natural databases and the maLBP and mdLBP descriptors are better for color texture databases.

Fig. 5.14. The result comparison of different descriptors over the (a) Corel-1k-Rotate, (b) Corel-1k-Scale and (c) Corel-1k-Illumination databases.


The average precision over each category of the Corel-1k database using the LCOD, RSHD, maLBP and mdLBP descriptors is depicted in Fig. 5.11, to identify the categories which need more attention. It is observed across the plots that the performance of all descriptors is lower for the 'beach' and 'mountain' categories. The top 10 retrieved images for a query image from the Corel-1k database using each descriptor are displayed in Fig. 5.12. Note that the rows represent the retrieved images using the different descriptors in the following manner: 1st row using LBP, 2nd row using cLBP, 3rd row using mscLBP, 4th row using mCENTRIST, 5th row using SEH, 6th row using CDH, 7th row using LCOD, 8th row using RSHD, 9th row using maLBP and 10th row using mdLBP; the last 10 columns correspond to the 10 retrieved images in decreasing order of similarity, and the images in the 1st column are the query images. Fig. 5.12 shows the retrieved images for a query image from the 'Horse' category. The precisions obtained by LBP, cLBP, mscLBP, mCENTRIST, SEH, CDH, LCOD, RSHD, maLBP and mdLBP in this example are 30%, 20%, 20%, 30%, 70%, 30%, 90%, 100%, 60% and 70% respectively. This means that the RSHD descriptor retrieves the most correct images. The top 10 retrieved images are also depicted in Fig. 5.13 for a query image from the MIT-VisTex database. The numbers of correct images retrieved using LBP, cLBP, mscLBP, mCENTRIST, SEH, CDH, LCOD, RSHD, maLBP and mdLBP are 4, 7, 8, 9, 9, 7, 8, 9, 9 and 10 respectively in Fig. 5.13. Here, only the mdLBP descriptor is able to attain 100% precision.

5.2.4. Analyzing Robustness of the Descriptors

In this sub-section, we test the robustness of the descriptors towards image rotation, image scaling and uniform illumination difference. The comparison results of the different descriptors are displayed in Fig. 5.14 over the (a) Corel-1k-Rotate, (b) Corel-1k-Scale and (c) Corel-1k-Illumination databases. It is observed that the LCOD and RSHD descriptors are robust to the rotation and scale of the image, whereas maLBP and mdLBP fail to achieve the same. It is also observed that mdLBP is more robust to uniform illumination changes than the other descriptors. The CDH descriptor fails drastically in the case of illumination difference, whereas it is partially robust to image scaling. The SEH descriptor exhibits only partial robustness towards image rotation, image scaling and uniform illumination change.

5.3. Summary

In this chapter, four color image descriptors, namely LCOD, RSHD, maLBP and mdLBP, have been proposed. LCOD and RSHD are based on color quantization. maLBP and mdLBP utilize the local information of multiple channels on the basis of the adder and decoder concepts. The proposed descriptors are evaluated using image retrieval experiments over four color databases having images of natural scenery and textures. The experimental results point out that the LCOD and RSHD descriptors show very promising results over natural color databases, whereas maLBP and mdLBP outperform state-of-the-art descriptors over color texture databases. The experiments also showed that the Chi-square distance works very well with the proposed descriptors. LCOD and RSHD are notably robust towards image rotation and image scaling, while mdLBP shows the highest degree of robustness towards uniform illumination change. It is deduced from the experimental results that the proposed descriptors outperform the LBP based as well as non-LBP based descriptors over color images.


Chapter 6
Brightness Invariant Image Retrieval Using Illumination Compensation

Retrieving the most similar images to a given image from a huge database accurately and robustly is still challenging in image retrieval. It becomes even more challenging for images having drastic illumination differences. Most feature descriptors with good retrieval performance degrade in the case of illumination change. To circumvent this problem, we compensate the varying illumination in the image using multi-channel information. We use the red, green and blue channels of the RGB color space and the I channel of the HSI color space to remove the intensity change in the image. Finally, we design an illumination compensated color space over which the feature descriptor is computed. The proposed idea is generic and can be implemented with most feature descriptors. We use several state-of-the-art feature descriptors to show the effectiveness and robustness of the proposed color transformation towards uniform and non-uniform illumination changes. The experimental results suggest that the proposed brightness invariant color transformation can be applied effectively in the retrieval task. The rest of this chapter is organized as follows: Section 6.1 proposes a multi-channel based illumination compensation mechanism; Section 6.2 introduces brightness invariant image retrieval; Section 6.3 explores the similarity measures; Section 6.4 presents the datasets and a detailed result analysis over one natural and two synthesized illumination datasets; Section 6.5 provides comparison and analysis; and finally Section 6.6 concludes the chapter.


The content of this chapter is published in the following research article:
 S.R. Dubey, S.K. Singh, and R.K. Singh, “A multi-channel based illumination compensation mechanism for brightness invariant image retrieval,” Multimedia Tools and Applications, Vol. 74, No. 24, pp. 11223-53, 2015.

6.1. Illumination Compensation Mechanism

In this section, we introduce an approach to remove the effect of illumination differences in color images. Color intensity compensation is performed using the Red, Green, Blue and Intensity channels of the RGB and HSI color spaces of the image. By doing so, we generate a new illumination compensated color space RICGICBIC. The RICGICBIC color space is composed of three channels, namely illumination compensated Red (RIC), illumination compensated Green (GIC) and illumination compensated Blue (BIC). The flowchart of the illumination compensation is illustrated in Fig. 6.1. The mechanism operates in two phases: (1) color intensity reduction and (2) contrast mapping.

Fig. 6.1. Work flow of illumination compensation in RICGICBIC color space.

6.1.1. Color Intensity Reduction

Color intensity reduction is the first step of the illumination compensation to remove the effect of illumination from the image. We have the initial RGB image M as the input image to be processed for illumination compensation. First, the method extracts the Red (R), Green (G), Blue (B) and Intensity (I) channels of the RGB and HSI representations of the image. To extract the I component, the image is transformed into the HSI color space. The range of I is between 0 and 1, whereas the range of R, G and B is between 0 and l − 1, where l is the number of shades in each channel. To bring each channel to the same range, the R, G and B channels are normalized such that their ranges fall between 0 and 1. The normalization is performed by dividing each value of the R, G and B components by l − 1. To compensate the illumination, the intensity I is subtracted from the R, G and B channels. Let f_R(x, y), f_G(x, y) and f_B(x, y) be the functions representing the normalized R, G and B components of the image M, where x and y are the row and column of any pixel. We generate an illumination removed image M_I which consists of three components, i.e. intensity reduced Red (R_I), intensity reduced Green (G_I) and intensity reduced Blue (B_I). R_I, G_I and B_I are obtained by reducing the I component from the R, G and B components respectively as,

R_I = |f_R − I|, G_I = |f_G − I| and B_I = |f_B − I|   (6.1)

where '| |' is the absolute value operator. The I channel of the HSI color space can be derived from f_R, f_G and f_B as,

I = (f_R + f_G + f_B)/3   (6.2)

From (6.1) and (6.2), R_I, G_I and B_I can be written as,

R_I = |(2f_R − f_G − f_B)/3|, G_I = |(2f_G − f_R − f_B)/3| and B_I = |(2f_B − f_R − f_G)/3|   (6.3)

Let f_R′, f_G′ and f_B′ be the normalized Red, Green and Blue components of the image M′ obtained after a change in the illumination of image M. We represent f_R′, f_G′ and f_B′ of M′ by the following equations,

f_R′ = f_R + e, f_G′ = f_G + e and f_B′ = f_B + e   (6.4)

where e is a function of x and y representing the difference in each channel caused by the change in illumination. Note that the difference e is uniform for a uniform intensity change and non-uniform for a non-uniform intensity change. In the case of a uniform illumination change, e becomes a constant value. Conceptually, the range of e is −(l−1) to (l−1), where l is the number of shades in each channel of the image. The R_I′, G_I′ and B_I′ components of the illumination reduced M_I′ using (6.3) are given as,

R_I′ = |(2f_R − f_G − f_B)/3|, G_I′ = |(2f_G − f_R − f_B)/3| and B_I′ = |(2f_B − f_R − f_G)/3|   (6.5)

From (6.3) and (6.5), it is observed that,

R_I′ = R_I, G_I′ = G_I and B_I′ = B_I   (6.6)

or, equivalently,

M_I′ = M_I   (6.7)

whereas, from (6.4), it can be pointed out that,

M′ ≠ M if e ≠ 0   (6.8)

From (6.7) and (6.8), it is deduced that images having an illumination difference are not the same, whereas after color intensity reduction they become the same (i.e. the effect of either uniform or non-uniform illumination is removed).

6.1.2. Contrast Mapping

It is observed that the range of each channel is reduced after the color intensity subtraction phase. To map the ranges of the channels R_I, G_I and B_I back between 0 and l − 1, a contrast stretching strategy is adopted. We refer to the outputs of this step for R_I, G_I and B_I as illumination compensated Red (R_IC), illumination compensated Green (G_IC) and illumination compensated Blue (B_IC). The mapping function used in this chapter for contrast stretching is given as,

R_IC = (R_I − lb) × (l − 1)/(ub − lb), G_IC = (G_I − lb) × (l − 1)/(ub − lb) and B_IC = (B_I − lb) × (l − 1)/(ub − lb)   (6.9)

where lb and ub are the lower and upper bounds for contrast stretching, given by,

lb = min(lb_{R_I}, lb_{G_I}, lb_{B_I}) and ub = max(ub_{R_I}, ub_{G_I}, ub_{B_I})   (6.10)

where 'min' and 'max' are the operators finding the minimum and maximum values in a set of values respectively. lb_{R_I}, lb_{G_I} and lb_{B_I} represent the bottom 1% values of all pixel values of R_I, G_I and B_I respectively, and ub_{R_I}, ub_{G_I} and ub_{B_I} represent the top 1% values of all pixel values of R_I, G_I and B_I respectively. It should be noted that this is a linear mapping in which lb and ub are mapped to 0 and l − 1 respectively. The effect of intensity subtraction and contrast stretching can be visualized in Fig. 6.2-6.3 for images having illumination differences taken from the Phos illumination benchmark dataset [178]. In Fig. 6.2, a uniform illumination change is considered, whereas a non-uniform illumination difference is considered in Fig. 6.3. It can be easily observed that the images generated after the illumination compensation have nearly the same illumination irrespective of the amount of change in the illumination in their original images.
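A minimal NumPy sketch of this two-phase mechanism is given below, assuming an 8-bit input image (l = 256) and interpreting the bottom/top 1% bounds of (6.10) as the 1st and 99th percentiles of each intensity reduced channel; the function name is illustrative only.

```python
import numpy as np

def ric_gic_bic(rgb, l=256):
    """Transform an RGB image into the RICGICBIC color space."""
    f = rgb.astype(np.float64) / (l - 1)                 # normalized fR, fG, fB in [0, 1]
    i = f.mean(axis=2, keepdims=True)                    # I = (fR + fG + fB)/3, Eq. (6.2)
    reduced = np.abs(f - i)                              # RI, GI, BI, Eq. (6.1)
    lb = np.percentile(reduced, 1, axis=(0, 1)).min()    # lower bound lb, Eq. (6.10)
    ub = np.percentile(reduced, 99, axis=(0, 1)).max()   # upper bound ub, Eq. (6.10)
    out = (reduced - lb) * (l - 1) / (ub - lb)           # contrast mapping, Eq. (6.9)
    return np.clip(out, 0, l - 1).astype(np.uint8)       # RIC, GIC, BIC channels
```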

Fig. 6.2. Visualization of the illumination compensation steps: (1st row) original images having uniform illumination differences, (2nd row) intensity subtracted images, and (3rd row) contrast stretched images. This example image is taken from the Phos database [178].


Fig. 6.3. Visualization of the illumination compensation steps: (a) original images having non-uniform illumination differences, (b) intensity subtracted images, and (c) contrast stretched images. This example image is taken from the Phos database [178].

Fig. 6.4. Image retrieval using the illumination compensation mechanism.

6.2. Brightness Invariant Image Retrieval

In this section, we describe image retrieval using illumination compensation, which is inherently illumination invariant. Fig. 6.4 shows the steps involved in this task. The whole process consists of three steps: (1) illumination compensation, (2) feature extraction, and (3) similarity measure and retrieval. In this chapter, we extract 6 different color or texture features, namely Global Color Histogram (GCH) [115], Color Coherence Vector (CCV) [116], Border-Interior Classification (BIC) [120], Color Difference Histogram (CDH) [102], Structure Element Histogram (SEH) [24] and Square Symmetric Local Binary Pattern (SSLBP) [118], and perform the experiments with each feature individually. We refer to the GCH, CCV, BIC, CDH, SEH and SSLBP features extracted over the illumination compensated image as GCHIC, CCVIC, BICIC, CDHIC, SEHIC and SSLBPIC respectively in this chapter, whereas the features extracted without illumination compensation are denoted by GCH, CCV, BIC, CDH, SEH and SSLBP.

6.3. Similarity Measures and Evaluation Criteria

The main goal of image retrieval is to find the most similar images to a given image from a database of images. The similar images are extracted on the basis of the similarity score between the descriptors of the query image and the database images. Let FD = {fd_1, fd_2, …, fd_dim} be the descriptor of an image in the database and TD = {td_1, td_2, …, td_dim} be the descriptor of the query image, where dim is the dimension of the descriptor. Different similarity measures produce different similarity scores for the same pair of image descriptors. The two distance measures used in SEH [24] and CDH [102] are adopted here. The distance metric used in [24] is defined as,

d_SEH(TD, FD) = Σ_{w=1}^{dim} |fd_w − td_w| / (1 + fd_w + td_w)   (6.11)

The distance metric used in [102] is defined as,

d_CDH(TD, FD) = Σ_{w=1}^{dim} |fd_w − td_w| / (|fd_w + f_μ| + |td_w + t_μ|)   (6.12)

where f_μ = Σ_{w=1}^{dim} fd_w / dim and t_μ = Σ_{w=1}^{dim} td_w / dim.

The performances of the features extracted with and without illumination compensation are compared using these two distance measures in the experiments. The average retrieval precision (ARP) and average retrieval rate (ARR) defined in section 4.2.2 are also adopted in this chapter to report the retrieval results. In this chapter, ARP is plotted against ARR in a single plot. The number of retrieved images is varied from 5 to 15 in intervals of 1 for the Phos dataset and from 5 to 40 in intervals of 5 for the Corel-uniform and Corel-non-uniform datasets.
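A minimal NumPy sketch of (6.11) and (6.12), assuming the descriptors are non-negative 1-D arrays; the function names mirror the notation above:

```python
import numpy as np

def d_seh(td, fd):
    # Eq. (6.11): distance between query descriptor td and database descriptor fd
    return np.sum(np.abs(fd - td) / (1.0 + fd + td))

def d_cdh(td, fd):
    # Eq. (6.12): f_mu and t_mu are the means of the two descriptors
    f_mu, t_mu = fd.mean(), td.mean()
    return np.sum(np.abs(fd - td) / (np.abs(fd + f_mu) + np.abs(td + t_mu)))
```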

Fig. 6.5. Sample images from the (a) Corel-uniform and (b) Corel-non-uniform datasets.


6.4. Experiments and Results

This section presents the results obtained by applying various descriptors with and without illumination compensation for image retrieval. We test the robustness of the proposed illumination compensation mechanism under varying illumination conditions. In this section, we first discuss the illumination datasets used for evaluation and then present the experimental results with discussion.

6.4.1. Datasets

In order to evaluate the proposed approach, a standard Phos natural illumination database and two Corel synthesized illumination databases are used in this chapter for image retrieval. The Phos database consists of 15 different categories with 15 images per category having different degrees of uniform (9 images) and non-uniform illumination (6 images) [178-180]. Some sample images of the Phos dataset are already shown in Fig. 6.2-6.3. We also synthesized two datasets, Corel-uniform and Corel-non-uniform, from the Corel-1k dataset. The Corel-1k dataset is obtained from [6] and contains 10 categories. To synthesize the Corel-uniform dataset, we selected the first 20 images in each category of Corel-1k and generated 5 new images from each by adding -60, -30, 0, 30, and 60 to each channel of the original image. The number of images in each category of Corel-uniform thus becomes 100 (20 images × 5 different intensities of uniform illumination). Fig. 6.5(a) shows sample images from the Corel-uniform dataset. We also synthesized Corel-non-uniform from the same 20 images per category of the Corel-1k dataset. Five different degrees of non-uniform illumination are adopted to generate 5 images from each original image, including the original one. Let im(x, y, t) be a color image of size u × v, where the value of t is 1, 2, and 3 for the red, green and blue components respectively, and x and y are the row and column of any pixel. The 5 images im_w for w = -2, -1, 0, 1, and 2 are generated by the following equation,

im_w(x, y, t) = im(x, y, t) − w·x·(30/u) − w·(v−y)·(30/v) − w·(v−x)·(20/v) − w·y·(20/u)   (6.13)

Now, the Corel-non-uniform dataset has 10 categories with 100 images per category (20 images × 5 different degrees of non-uniform illumination) (see Fig. 6.5(b)).

6.4.2. Experimental Results

In image retrieval, a query image is matched with all images in a database, and the most appropriate similar images are returned on the basis of certain features. Matching two similar images having some geometric and photometric differences is still problematic. The main aim of illumination compensation is to match the images under varying uniform and non-uniform illumination.

Fig. 6.6. Results in terms of ARP and ARR curves for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Phos illumination benchmark dataset.

a) Results on Phos dataset

We performed the image retrieval experiment over the complete set of the Phos illumination benchmark; the precision-recall curves are presented in terms of the average retrieval precision (ARP) and average retrieval rate (ARR) in Fig. 6.6 using the GCH, CCV, BIC, CDH, SEH and SSLBP features. We used the two similarity measures dSEH and dCDH to test the effect of different distances on the proposed compensation method. Both the ARP and ARR values are higher using both distances for each feature when the images are preprocessed with the proposed approach to remove the effect of illumination. It is noted that the degree of improvement in GCHIC, CCVIC, BICIC and CDHIC is higher compared to SEHIC and SSLBPIC. The results of SSLBPIC are not greatly improved, but they are still better than the results of SSLBP. To visualize the effect of the proposed mechanism, the top 10 retrieved images similar to a query image of the Phos dataset are shown in Fig. 6.7 for each feature using the dCDH distance. The GCH, CCV and BIC features are able to retrieve only two images of the same category (Fig. 6.7(b, d, f)), whereas GCHIC, CCVIC and BICIC each retrieved 4 similar images with 40% precision (Fig. 6.7(c, e, g)). Using the CDH feature 3 similar images are returned, whereas 7 are returned using the CDHIC feature (see Fig. 6.7(h-i)). Using both SEH and SEHIC, 5 images are retrieved, but the rank of the returned images is better in the case of SEHIC (see Fig. 6.7(j-k)). It is observed that the 6th image among the images returned using SEH is similar to the query image and should rank before the 5th image, which is from a different category; this is successfully achieved using SEHIC. Using SSLBP and SSLBPIC, 4 and 6 similar images respectively are found by the retrieval experiment, as displayed in Fig. 6.7(l-m). The largest number of similar images is retrieved by the CDHIC feature. From the retrieval results of Fig. 6.7, it is concluded that each descriptor retrieves more similar images having different brightness when it is operated with the proposed method.

Fig. 6.7. Retrieval results using each feature for a query image of the Phos dataset.


b) Results on Corel-uniform dataset

We also tested the discriminative ability of the different features and the robustness of the proposed method over the Corel-uniform illumination synthesized dataset. The results are depicted in Fig. 6.8 in terms of ARP and ARR values using both similarity measures. An outstanding performance improvement is reported using the CDH feature over uniform illumination. All features gained impressive improvements except SSLBPIC in this case, but the result of SSLBPIC is still comparable. We also retrieved the top 10 similar images to a query image of type flower to visualize the effect of the introduced approach using the dCDH distance measure over the Corel-uniform dataset (see Fig. 6.9). The query image is shown in Fig. 6.9(a). In the Corel-uniform dataset, there are 5 instances of the query image with varying brightness, whereas in total 100 relevant images are present in the dataset. Using the GCH feature, only three images are retrieved from the same category, whereas using GCHIC all 10 returned images are of the flower category (Fig. 6.9(b-c)). Another critical observation is the successful retrieval of all 5 instances of the query image by GCHIC. The CCV and BIC features also fail to retrieve all the instances, and using both, only 4 images are retrieved from the same category (Fig. 6.9(d, f)), whereas CCVIC and BICIC are able to retrieve all instances of the query image with 100% precision (Fig. 6.9(e, g)). The CDH feature also fails to retrieve all instances, and 5 similar images are retrieved by it with only 50% precision, as depicted in Fig. 6.9(h); this is successfully overcome by CDHIC, which retrieves all instances with 100% retrieval precision, as displayed in Fig. 6.9(i). Three instances of the query image are returned by the retrieval system using the SEH texture feature with 50% precision, whereas SEHIC retrieved all instances with 50% precision (see Fig. 6.9(j, k)). In this case the precision is the same for both SEH and SEHIC, but using SEHIC more semantically correct images are retrieved.

Fig. 6.8. ARP and ARR curves for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Corel-uniform illumination synthesized dataset.


Fig. 6.9. Retrieval results using each feature from the Corel-uniform dataset.

The performance of the SSLBP feature is good in this example: using this feature, the system is able to attain 100% precision, but only 4 instances of the query image are retrieved, as shown in Fig. 6.9(l). The performance of SSLBP is further boosted if it is computed after our illumination compensation preprocessing step (i.e. SSLBPIC), where 100% precision is earned and all instances of the query image are also retrieved successfully (see Fig. 6.9(m)). It is deduced that the performance of each feature descriptor is enhanced by an attractive amount when applied within the proposed retrieval system over the Corel-uniform dataset.

c) Results on Corel-non-uniform dataset

We also carried out an experiment over the Corel-non-uniform dataset using each feature derived with (i.e. in the RICGICBIC color space) and without (i.e. in the RGB color space) illumination compensation. The results are demonstrated in Fig. 6.10 for each feature using both distance measures. In this case also, each feature in the proposed RICGICBIC color space performs outstandingly, with the highest improvement in the CDHIC feature. It should be noted that the performance of SSLBPIC was not better than that of SSLBP in the uniform illumination case, but this is reversed in the non-uniform illumination case (i.e. here SSLBPIC performs better than SSLBP). Fig. 6.11(b-m) shows the top 10 images retrieved by each feature for a query image of type elephant, displayed in Fig. 6.11(a), from the Corel-non-uniform dataset using the dCDH distance. It should be noted that there are in total 100 images similar to the query image in the dataset, among which 5 images are just different instances of the query image with varying degrees of non-uniform illumination difference. The retrieval precision achieved using each of the GCH, CCV and BIC features is just 20%, whereas it is 90%, 100% and 100% for GCHIC, CCVIC and BICIC respectively (see Fig. 6.11(b-g)). CCVIC and BICIC are able to retrieve all instances of the query image, while 4 instances are retrieved using GCHIC.

Fig. 6.10. ARP and ARR results for different features with and without illumination compensation using the dSEH and dCDH similarity measures over the Corel-non-uniform synthesized dataset.


Fig. 6.11. Image retrieval results over the Corel-non-uniform dataset using the dCDH distance.

Both CDH and CDHIC obtained 70% precision by retrieving 7 images of the elephant category, but only two instances of the query image are returned using CDH, whereas all five instances are returned by CDHIC (Fig. 6.11(h-i)). Nearly the same situation is depicted in Fig. 6.11(j-k) for the case of SEH and SEHIC, where both gained 50% precision, but SEH retrieved 4 instances while SEHIC retrieved all five instances. A similar scenario also arises with SSLBP and SSLBPIC, where for both the system returned 8 similar images, but the output images using SSLBPIC are more semantically accurate, retrieving all instances of the query image at different non-uniform illuminations. In the case of Corel-non-uniform, all the features in the proposed illumination compensated color space are more robust and perform better.

Fig. 6.12. Comparison between the proposed illumination compensation method and existing illumination compensation methods in terms of the retrieval performance over the Corel-non-uniform dataset using the GCH, CCV, BIC, CDH, SEH and SSLBP feature descriptors.

Fig. 6.13. Performance evaluation of the proposed illumination compensation approach using the illumination invariant feature descriptors LBP, CSLBP, LIOP, LTP, CSLTP and HOG over the Corel-non-uniform dataset.

6.5. Comparison and Analysis

In this section, we first compare the proposed illumination compensation method with existing illumination compensation methods, and then we test the efficiency of the proposed approach using illumination invariant descriptors.

6.5.1. Comparison with Existing Illumination Compensation Methods

To demonstrate the efficiency and discriminative ability of the proposed illumination compensation mechanism using color intensity reduction and contrast mapping, we compared its performance with that of existing illumination compensation mechanisms such as the self-quotient image (SQI) [181-182], plane subtraction and histogram equalization (PS+HE) [183-184] and the discrete cosine transform in the logarithmic domain (DCT_LD) [134]. Fig. 6.12 illustrates the precision-recall curves obtained by applying the GCH, CCV, BIC, CDH, SEH and SSLBP feature descriptors with the proposed, SQI, PS+HE and DCT_LD illumination compensation methods over the Corel-non-uniform dataset using the dCDH distance measure. The performance of the proposed method is improved the most with the CDH descriptor, because our method is based on color information and CDH is also based on color information. From the experimental results, it is clear that the proposed approach outperforms the existing approaches for the illumination invariant image retrieval task.

6.5.2. Performance Evaluation using Illumination Invariant Descriptors

We also tested our approach with illumination invariant descriptors such as the local binary pattern (LBP) [16], center symmetric local binary pattern (CSLBP) [17], local intensity order pattern (LIOP) [12], local ternary pattern (LTP) [18], center symmetric local ternary pattern (CSLTP) [11] and histogram of oriented gradients (HOG) [185], because these descriptors already account for illumination. Fig. 6.13 presents the precision-recall plots over the Corel-non-uniform dataset using the different illumination invariant descriptors with and without our illumination compensation step, using the dCDH distance measure. We observe across the plots that the performance of each illumination invariant descriptor is improved significantly in conjunction with our illumination reduction step, compared to not using our method. This shows that although some descriptors are illumination invariant in nature, they cannot handle complex illumination differences, which require an efficient illumination compensation mechanism. It is also observed that feature descriptors of low dimension, such as CSLBP and CSLTP, show more improvement than feature descriptors of high dimension.

6.5.3. Performance Evaluation using the Descriptors Proposed in Chapter 5

The significance of the proposed illumination compensation approach is also tested with the color descriptors RSHD, LCOD, maLBP and mdLBP proposed in Chapter 5. The ARP over the Phos database is presented in Fig. 6.14 using these descriptors with and without illumination compensation. It is clearly demonstrated that the performance of these descriptors is improved significantly with illumination compensation over the Phos illumination database.

Fig. 6.14. Performance evaluation of the proposed illumination compensation approach using the (a) RSHD, (b) LCOD, (c) maLBP and (d) mdLBP descriptors proposed in Chapter 5 over the Phos illumination database.

6.6. Summary

An illumination invariant color space transformation is presented in this chapter. The new illumination compensated color space RICGICBIC is generated by first removing the intensity information from the R, G and B channels of the RGB image and then applying contrast stretching. The introduced multi-channel based illumination compensation mechanism can be seen as a preprocessing step to remove the effect of illumination variations on an image. The proposed approach is generic and can be used with most color feature descriptors, but it is applicable only to those scenarios where illumination robust image matching is required. We tested its performance on the image retrieval problem using 6 state-of-the-art features over 3 datasets having images with varying illumination, including the standard Phos illumination benchmark. The proposed approach outperforms the existing illumination compensation approaches. The performances of illumination invariant feature descriptors are also boosted in conjunction with the proposed illumination compensation mechanism. The experimental results point out the robustness of the proposed mechanism and suggest that it can be successfully utilized as a pre-processing step to compensate the illumination.


Chapter 7
Boosting Local Descriptors with BoF and SVD

Local descriptors are mostly extracted over the raw intensity values of the image and require the analysis of patterns in local pixel neighborhoods. Raw images do not provide much of the local relationship information required by local descriptors. To resolve this problem, a new method of local image feature description is proposed in this chapter based on the patterns extracted from multiple filtered images. The whole process is structured as follows: first, the images are filtered with the bag of filters (BoF), then the local binary pattern (LBP) is computed over each filtered image, and finally the results are concatenated to form the single BoF-LBP descriptor. The enriched local information in the filtered images improves the discriminative ability of the final descriptor. Content based image retrieval (CBIR) experiments are performed to observe the effectiveness of the proposed approach, which is compared with recent state-of-the-art methods. The experiments are conducted over four benchmark databases having natural and texture images, namely Corel-1k, Corel-10k, MIT-VisTex and STex-512S. The experimental results confirm that the introduced approach is able to improve the retrieval performance of the CBIR system. We also preprocess the images in the form of 4 sub-bands (i.e. the S, U, V, and D sub-bands), which are obtained by applying the Singular Value Decomposition (SVD) over the original image. The local descriptors are computed over these sub-bands (mainly the S sub-band) and are termed the SVD based local descriptors. The performance of four local descriptors over the SVD sub-bands is tested for near-infrared face retrieval using the PolyU-NIR and CASIA-NIR face databases and compared with the results obtained without using SVD. The experimental results confirm the superiority of using the S sub-band of SVD in terms of the performance of the local descriptors for NIR face retrieval.



The content of this chapter is published/communicated in the following research articles:
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Boosting Local Binary Pattern with Bag-of-Filters for Content Based Image Retrieval,” IEEE UP Section Conference on Electrical, Computer and Electronics (UPCON), 2015. (Best Paper Award)
• S.R. Dubey, S.K. Singh, and R.K. Singh, “Boosting Performance of Local Descriptors with SVD Sub-band for Near-Infrared Face Retrieval,” IET Image Processing. (Submitted in Revised Form)

Fig. 7.1. The working framework of the proposed Content Based Image Retrieval (CBIR) system using Bag-of-Filters and Local Binary Pattern (LBP): the query and database images are filtered with the bag of k filters, the LBP operator is applied over each filtered image (I1, ..., Ik), the resulting maps (LBP1, ..., LBPk) are concatenated into the final BoF-LBP descriptor, and retrieval is performed through similarity measurement.

The rest of this chapter is organized as follows: Section 7.1 proposes the Bag-of-Filtered Local Binary Pattern and validates it through image retrieval experiments over natural and textural databases; Section 7.2 introduces the Singular Value Decomposition based local descriptors and reports the results over NIR face databases; and finally, concluding remarks are highlighted in Section 7.3.

7.1. Natural and Textural Image Retrieval using BoF-LBP

7.1.1. Methodology

The schematic diagram of the proposed CBIR system is presented in Fig. 7.1. First, the images are processed by the Bag-of-Filters (BoF) to obtain multiple filtered images, each carrying different crucial information such as edges, corners, etc. Basically, these filters enhance the local information contained in the image. In order to encode such information locally in descriptor form, the Local Binary Pattern (LBP) operator is applied over each filtered image, and finally all descriptors are concatenated to construct the final BoF-LBP feature descriptor. On the basis of the proposed descriptor, the query image is matched with the database images by finding the distance between the BoF-LBP descriptor of the query image and those of the database images. The most relevant images for the query are retrieved from the database on the basis of the shortest distances. Let $I$ be a grayscale image of dimension $m_x \times m_y$ (i.e. $I$ has $m_x$ rows and $m_y$ columns) over which we want to compute the BoF-LBP descriptor. A particular pixel $(i,j)$ of image $I$ is denoted by $I^{i,j}$. Consider $F_a|_{a=1,2,\ldots,k}$ to be the mask of the $a$th filter of the BoF. Let $I_a$ be the filtered image obtained by applying the $a$th filter over the image $I$, defined by the following formula,

$$I_a = I * F_a \qquad (7.1)$$


where the operator $*$ represents the convolution of image $I$ with the mask $F_a$. $I_a$ basically highlights the important features of $I$ that match the characteristics of the filter mask $F_a$. Note that if the dimensions of $I$ and $F_a$ are $m_x \times m_y$ and $m^F_{x,a} \times m^F_{y,a}$ respectively, then the dimension of $I_a$ (i.e. $m^I_{x,a} \times m^I_{y,a}$) is,

$$m^I_{x,a} = m_x - m^F_{x,a} + 1 \quad \text{and} \quad m^I_{y,a} = m_y - m^F_{y,a} + 1 \qquad (7.2)$$

Now, in order to encode the local characteristics of the image $I_a$, the Local Binary Pattern (LBP) [16] operator is computed over it, and the resulting map is represented by $LBP_a$, which has dimension $m^{LBP}_a \times n^{LBP}_a$. Let the LBP value computed for a particular pixel $(i,j)$ of $I_a$ (i.e. for $I_a^{i,j}$) be represented by $LBP_a^{i,j}$. The $LBP_a^{i,j}$ is calculated in the following manner,

$$LBP_a^{i,j} = \sum_{\omega=1}^{N} LBP_a^{i,j,\omega} \times 2^{\omega-1} \qquad (7.3)$$

where $N$ is the number of local neighbors of pixel $(i,j)$, equally spaced on a circle of radius $R$ centered at $(i,j)$. If $I_a^{i_\omega,j_\omega}$ is the intensity value of the $\omega$th neighbor of $LBP_a^{i,j}$, then its coordinates $(i_\omega, j_\omega)$ are given as follows,

$$i_\omega = i - R \times \sin\left((\omega-1)\times\frac{2\pi}{N}\right) \quad \text{and} \quad j_\omega = j + R \times \cos\left((\omega-1)\times\frac{2\pi}{N}\right) \qquad (7.4)$$

The $LBP_a^{i,j,\omega}$ is the binary value obtained as follows,

$$LBP_a^{i,j,\omega} = \begin{cases} 1, & \text{if } I_a^{i_\omega,j_\omega} \ge I_a^{i,j} \\ 0, & \text{otherwise} \end{cases} \qquad (7.5)$$

$$F_1 = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad F_2 = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}, \quad F_3 = \begin{bmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{bmatrix}, \quad F_4 = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \quad F_5 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$

Fig. 7.2. The five types of filters used in this chapter as the Bag-of-Filters: (a) Average filter, i.e. $F_1$, (b) Horizontal-vertical difference filter, i.e. $F_2$, (c) Diagonal filter, i.e. $F_3$, (d) Sobel edge in vertical direction, i.e. $F_4$, and (e) Sobel edge in horizontal direction, i.e. $F_5$.

Fig. 7.3. (a) An example image; (b-f) the images obtained after applying the 5 filters with masks $F_a|_{a=1,2,3,4,5}$ respectively over the example image of (a).



The final BoF-LBP feature vector is calculated by concatenating the LBP feature vectors computed over each filtered image, and is given as follows,

$$BoF\text{-}LBP = [LBP_{1,h}, LBP_{2,h}, \ldots, LBP_{k,h}] \qquad (7.6)$$

where $LBP_{a,h}$ is the histogram of $LBP_a$ (i.e. of the LBP map over the $a$th filtered image), determined using the following formula,

$$LBP_{a,h}(\gamma) = \sum_{i=R+1}^{m^I_{x,a}-R} \; \sum_{j=R+1}^{m^I_{y,a}-R} \xi\left(LBP_a^{i,j}, \gamma\right) \qquad (7.7)$$

for all $\gamma \in [0, 2^N-1]$, where $\xi(\alpha, \beta)$ is given as,

$$\xi(\alpha, \beta) = \begin{cases} 1, & \text{if } \alpha = \beta \\ 0, & \text{otherwise} \end{cases} \qquad (7.8)$$

Finally, the BoF-LBP feature vector is normalized such that the range of its values becomes 0 to 1. In this chapter, we have used $k = 5$ different types of filter masks as the Bag-of-Filters, capturing varying ranges of information. The five filters used here are the Average, Horizontal-vertical difference, Diagonal difference, Sobel edge in vertical direction and Sobel edge in horizontal direction filters (refer to [141] for more detail about these filters). Fig. 7.2 presents the 3×3 masks for these filters. The images obtained by convolving the five considered filter masks with an example image are depicted in Fig. 7.3. It can be observed that the Average filter (i.e. $F_1$) gives the low frequency information (i.e. smooth variations), whereas the remaining filters (i.e. $F_a|_{a=2,3,4,5}$) provide high frequency oriented information (i.e. edges in particular directions). The combination of both types of filters increases the discriminating ability of the BoF-LBP descriptor. An end-to-end sketch of this construction is given below.
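The following is a minimal end-to-end sketch of the BoF-LBP construction of Eqs. (7.1)-(7.8), assuming N = 8 neighbors at radius R = 1 sampled as in Eq. (7.4). Neighbor positions are rounded to the nearest pixel for simplicity (standard LBP interpolates non-integer positions bilinearly), and function and variable names are illustrative, not from the thesis code.

    import numpy as np
    from scipy.signal import convolve2d

    # The five Bag-of-Filters masks of Fig. 7.2.
    FILTERS = [
        np.ones((3, 3)) / 9.0,                               # F1: average
        np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]]),     # F2: horizontal-vertical difference
        np.array([[-1, 0, -1], [0, 4, 0], [-1, 0, -1]]),     # F3: diagonal difference
        np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]]),      # F4: Sobel vertical
        np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]]),      # F5: Sobel horizontal
    ]

    def lbp_histogram(img, N=8, R=1):
        """LBP histogram of Eqs. (7.3)-(7.5) and (7.7) (nearest-pixel sampling)."""
        h, w = img.shape
        hist = np.zeros(2 ** N)
        # neighbor offsets of Eq. (7.4), rounded to the nearest pixel
        offs = [(-int(round(R * np.sin(o * 2 * np.pi / N))),
                 int(round(R * np.cos(o * 2 * np.pi / N)))) for o in range(N)]
        for i in range(R, h - R):
            for j in range(R, w - R):
                code = 0
                for bit, (di, dj) in enumerate(offs):
                    if img[i + di, j + dj] >= img[i, j]:     # Eq. (7.5)
                        code += 2 ** bit                     # Eq. (7.3)
                hist[code] += 1
        return hist

    def bof_lbp(img):
        """BoF-LBP descriptor of Eq. (7.6), min-max normalized to [0, 1]."""
        feats = [lbp_histogram(convolve2d(img, F, mode='valid')) for F in FILTERS]
        desc = np.concatenate(feats)
        return (desc - desc.min()) / (desc.max() - desc.min() + 1e-12)

Under this sketch, with R = 1 and N = 8, each histogram has 256 bins, so the concatenated BoF-LBP descriptor has 5 × 256 = 1280 dimensions.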

7.1.2. Experiments, Results and Discussions

In this subsection, first we discuss the databases used for the experiments, then the distance measures and evaluation criteria are explained, and finally the experimental results are presented with discussions. Four widely adopted benchmark databases, namely Corel-1k [6], Corel-10k [175], MIT-VisTex [176] and STex-512S [177], are used for the image retrieval experiments. Corel-1k and Corel-10k are natural image databases, whereas MIT-VisTex and STex-512S are texture databases. Any color image is converted into grayscale before the extraction of the descriptors. The number of categories in each database, the number of images per category and the total number of images are summarized in Table 7.1. In order to find the distance (i.e. dissimilarity) between two feature vectors, the distance measures described in section 1.3.3 are used in this chapter. We evaluated the performance of descriptors for the CBIR system over a particular database using the Average Retrieval Precision (ARP) and Average Retrieval Rate (ARR) metrics defined in section 4.3.2. In order to justify the improved performance of the proposed BoF-LBP descriptor, we have compared the retrieval results with recent state-of-the-art descriptors such as Local Binary Pattern (LBP) [16], Semi-structure Local Binary Pattern (SLBP) [140], Sobel Local Binary Pattern (SOBEL-LBP) [139], Local Ternary Pattern (LTP) [18], Local Derivative Pattern (LDP) [19], Local Tetra Pattern (LTrP) [20], and Spherical Symmetric 3-Dimensional Local Ternary Pattern (SS-3D-LTP) [83]. We have used the online available code of LBP [186] and implemented the rest of the descriptors ourselves.

Table 7.1. Image databases summary

No.   Database Name   Image Size   #Class   #Images in Each Class   #Images (Total)
1.    Corel-1k        384×256      10       100                     1000
2.    Corel-10k       120×80       80       Vary                    10,800
3.    MIT-VisTex      128×128      40       16                      640
4.    STex-512S       128×128      26       Vary                    7616

Table 7.2. ARP values using BoF-LBP when the number of top matches is 10

Database/Distance   Euclidean   Canberra   L1      D1      Chi-square
Corel-1k            63.53       73.84      71.35   71.59   72.50
Corel-10k           33.07       31.25      38.35   38.65   39.87
MIT-VisTex          67.06       67.67      72.66   72.64   73.25
STex-512S           71.35       69.43      79.18   79.32   78.95
Average             58.75       60.55      65.39   65.55   66.14

Fig. 7.4. The performance comparison of the BoF-LBP descriptor with the LBP, SLBP, SOBEL-LBP, LTP, LDP, LTrP, and SS-3D-LTP descriptors over the Corel-1k database using (a) ARP (%) and (b) ARR (%) as functions of the number of top matches.



Fig. 7.5. The performance comparison of the proposed descriptor with the other descriptors over the Corel-10k database using (a) ARP (%) and (b) ARR (%) as functions of the number of top matches.

Fig. 7.6. Performance comparison using ARP (%) over the (a) MIT-VisTex and (b) STex-512S databases.

Table 7.3. Performance comparison of $BoF\text{-}LBP$ with $LBP_{a,h}|_{a=1,2,3,4,5}$ in terms of the ARP values over each database when $\#IR = 10$

Database     LBP_{1,h}   LBP_{2,h}   LBP_{3,h}   LBP_{4,h}   LBP_{5,h}   BoF-LBP
Corel-1k     68.47       68.45       70.29       66.37       70.22       72.50
Corel-10k    35.02       30.95       31.30       28.67       29.06       39.87
MIT-VisTex   67.47       63.92       64.56       65.70       66.58       73.25
STex-512S    68.78       64.87       61.63       60.79       60.50       78.95

The effect of the Euclidean, Canberra, L1, D1, and Chi-square distance measures on the performance of the BoF-LBP descriptor is listed in Table 7.2 in terms of the ARP when the number of images retrieved (#IR) is 10. The performance of the proposed descriptor over Corel-1k and STex-512S is better using the Canberra and D1 distances respectively, whereas its performance is better using the Chi-square distance over the other databases. The average performance using each distance measure over all databases is also reported in the last row of this table, with the best average performance obtained using Chi-square. On the basis of this result, we adopt the Chi-square distance measure to find the dissimilarity between two feature vectors in the rest of the results of this chapter. The performance comparison among the descriptors is made using plots of ARP (%) vs #IR and ARR (%) vs #IR over the natural databases in Figs. 7.4-7.5 (i.e. the results over the Corel-1k and Corel-10k databases are plotted in Fig. 7.4 and Fig. 7.5 respectively). It is clearly seen that the proposed BoF-LBP descriptor outperforms the LBP, SLBP, SOBEL-LBP, LTP, LDP, LTrP, and SS-3D-LTP descriptors over both natural databases using both the ARP and ARR evaluation metrics. We also report the results over the textural databases MIT-VisTex and STex-512S in terms of ARP (%) as a function of the number of top matches (i.e. the number of retrieved images) in Fig. 7.6. The performance of our method is clearly better than that of the others over both textural databases. It is also noticed across the plots of Figs. 7.4-7.6 that the improvement achieved by the proposed descriptor over the remaining descriptors is larger when the size of the database is large (see the database sizes and the results over the Corel-10k and STex-512S databases). From these results, we can assert that the introduced feature descriptor outperforms recent state-of-the-art feature descriptors over both natural and textural images. The ARP values in percentage, when the number of top matches is 10, over each database for the $LBP_{a,h}|_{a=1,2,3,4,5}$ and BoF-LBP descriptors (i.e. the standalone filtered LBPs as well as their combination) are given in Table 7.3. It can be seen that the performance of the combined filtered LBP (i.e. BoF-LBP) is highly improved as compared to any standalone filtered LBP. Among the standalone filtered LBPs, the descriptor over the average filtered image (i.e. for $a = 1$) has more discriminative ability than the remaining ones in most cases. The low frequency information captured by the averaging filter is the possible cause of this.
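As a companion to these results, the sketch below ranks database images by Chi-square distance and computes precision over the top #IR matches; averaging this precision over all queries gives the ARP of section 4.3.2. The function names are illustrative, and the query image is assumed to be part of the database as is usual in these CBIR experiments.

    import numpy as np

    def chi_square(p, q, eps=1e-12):
        return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

    def retrieve(query_feat, db_feats, top=10):
        """Indices of the 'top' database images closest to the query."""
        d = np.array([chi_square(query_feat, f) for f in db_feats])
        return np.argsort(d)[:top]

    def average_retrieval_precision(feats, labels, top=10):
        """Mean precision over the top matches, averaged over all queries."""
        precisions = []
        for qi in range(len(feats)):
            idx = retrieve(feats[qi], feats, top)
            precisions.append(np.mean(labels[idx] == labels[qi]))
        return 100.0 * np.mean(precisions)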

7.2. NIR Face Image Retrieval using SVD and Local Descriptors

7.2.1. Methodology

The proposed framework for NIR face retrieval is illustrated in Fig. 7.7. It consists of four main components, namely Singular Value Decomposition (SVD) sub-band formation, local descriptor extraction, feature vector computation, and similarity measurement with NIR face retrieval. The feature vector is formed from the local descriptors extracted over the SVD sub-bands, which are obtained by applying the SVD decomposition over each face image of the database as well as over the query face image. The face retrieval is performed by finding the best similarities between the feature vector of the query face and the feature vectors of the database faces. In this section, we describe each component of the introduced methodology in detail.


a) SVD Sub-bands Formation

This sub-section is devoted to the construction process of the sub-bands of the singular value decomposition (SVD) at a particular level (i.e. multi-resolution) over a given image. The concept of SVD based sub-band decomposition and multi-resolution representation [161] is used here to enhance the features of the original image (i.e. to cope with the illumination problem of NIR imaging). Let $I^{i,j}$ be the intensity value of the pixel at the $i$th row and $j$th column of a grayscale NIR face image $I$ having $m_x$ rows and $m_y$ columns (i.e. the dimension of $I$ is $m_x \times m_y$). Let $P_L$ be the input image of dimension $m_{xL} \times m_{yL}$ for the $L$th level of SVD factorization. The input image $P_L$ is divided into $2 \times 2$ non-overlapping blocks and the SVD is applied to each block. Thus, a total of $n_L = n_{xL} \times n_{yL}$ SVDs are required, where $n_{xL} = m_{xL}/2$ and $n_{yL} = m_{yL}/2$. The intensity values of the $(t_x, t_y)$th block of the image $P_L$ are given as follows,

$$P_{L,t_x,t_y}\Big|_{t_x \in [1,\, n_{xL}],\; t_y \in [1,\, n_{yL}]} = \begin{bmatrix} P_L^{2t_x-1,\,2t_y-1} & P_L^{2t_x-1,\,2t_y} \\ P_L^{2t_x,\,2t_y-1} & P_L^{2t_x,\,2t_y} \end{bmatrix} \qquad (7.9)$$

The SVD of the $(t_x, t_y)$th block of $P_L$ (i.e. $P_{L,t_x,t_y}$) can be represented in the following factorization form,

$$P_{L,t_x,t_y} = A_{L,t_x,t_y}\, B_{L,t_x,t_y}\, \left(C_{L,t_x,t_y}\right)^T \qquad (7.10)$$

where $A_{L,t_x,t_y}$ and $C_{L,t_x,t_y}$ are $2 \times 2$ matrices containing the orthogonal column vectors, and $B_{L,t_x,t_y}$ is a $2 \times 2$ diagonal matrix with the singular values on the main diagonal.
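A quick numerical check of Eq. (7.10) on a single 2×2 block, using numpy's SVD (whose third output is $C^T$ directly); the block values are arbitrary placeholders.

    import numpy as np

    block = np.array([[8., 3.],
                      [5., 2.]])             # a (t_x, t_y)-th 2x2 block of P_L
    A, s, Ct = np.linalg.svd(block)          # A, diag(B), C^T of Eq. (7.10)
    B = np.diag(s)
    assert np.allclose(A @ B @ Ct, block)    # the factorization reproduces the block
    # Eq. (7.12) sub-band samples from this block:
    S, U, V, D = s[0], A[0, 0], Ct[0, 0], s[1]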

Fig. 7.7. The proposed framework for NIR face retrieval using SVD and local descriptors: the S, U, V and D sub-bands are formed from each NIR face image in the database and from the query face, local descriptors are extracted over the sub-bands, feature vectors are computed, and retrieval is performed through similarity measurement.



Fig. 7.8. Illustration of the sub-band formation (i.e. the S, U, V and D sub-bands) from the SVD factorization of an input $P_L$, using an example of size 4×4.

We can write Eq. (7.10) as follows,

$$\begin{bmatrix} P_L^{2t_x-1,\,2t_y-1} & P_L^{2t_x-1,\,2t_y} \\ P_L^{2t_x,\,2t_y-1} & P_L^{2t_x,\,2t_y} \end{bmatrix} = \begin{bmatrix} A_{L,11}^{t_x,t_y} & A_{L,12}^{t_x,t_y} \\ A_{L,21}^{t_x,t_y} & A_{L,22}^{t_x,t_y} \end{bmatrix} \begin{bmatrix} B_{L,11}^{t_x,t_y} & 0 \\ 0 & B_{L,22}^{t_x,t_y} \end{bmatrix} \begin{bmatrix} C_{L,11}^{t_x,t_y} & C_{L,12}^{t_x,t_y} \\ C_{L,21}^{t_x,t_y} & C_{L,22}^{t_x,t_y} \end{bmatrix}^T \qquad (7.11)$$

Four sub-bands, namely S, U, V and D, are formed from Eq. (7.11); their values in the $t_x$th row and $t_y$th column are given as follows:

$$S_L^{t_x,t_y} = B_{L,11}^{t_x,t_y}, \quad U_L^{t_x,t_y} = A_{L,11}^{t_x,t_y}, \quad V_L^{t_x,t_y} = C_{L,11}^{t_x,t_y} \quad \text{and} \quad D_L^{t_x,t_y} = B_{L,22}^{t_x,t_y} \qquad (7.12)$$

The input image $P_L$ for the SVD at the $L$th level and its dimensions (i.e. $m_{xL}$ and $m_{yL}$) are defined recursively in terms of $I$ and the S sub-band ($S_{L-1}$) at the $(L-1)$th level as:

$$(P_L, m_{xL}, m_{yL}) = \begin{cases} (I, m_x, m_y), & \text{if } L = 1 \\ \left(S_{L-1}, \dfrac{m_{x,L-1}}{2}, \dfrac{m_{y,L-1}}{2}\right), & \text{otherwise} \end{cases} \qquad (7.13)$$

For multi-resolution sub-bands, the S sub-band obtained at the previous level is treated as the input image at the next level. An example of SVD based sub-band formation is shown in Fig. 7.8.
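The following sketch realizes Eqs. (7.9)-(7.13): each 2×2 block of the level-L input is factorized with the SVD, the four sub-band samples of Eq. (7.12) are collected, and the S sub-band is fed back as the next level's input. It is a minimal illustration; the function name is not from the thesis code, and the image sides are assumed divisible by 2^L.

    import numpy as np

    def svd_subbands(I, level=1):
        """S, U, V, D sub-bands of Eqs. (7.9)-(7.12) at the given level, Eq. (7.13)."""
        P = np.asarray(I, dtype=np.float64)
        S = U = V = D = None
        for _ in range(level):                       # recursion of Eq. (7.13)
            nx, ny = P.shape[0] // 2, P.shape[1] // 2
            S = np.empty((nx, ny)); U = np.empty((nx, ny))
            V = np.empty((nx, ny)); D = np.empty((nx, ny))
            for tx in range(nx):
                for ty in range(ny):
                    block = P[2*tx:2*tx+2, 2*ty:2*ty+2]   # Eq. (7.9)
                    A, s, Ct = np.linalg.svd(block)       # Eqs. (7.10)-(7.11)
                    S[tx, ty], D[tx, ty] = s[0], s[1]     # singular values
                    U[tx, ty], V[tx, ty] = A[0, 0], Ct[0, 0]
            P = S                                        # S sub-band feeds the next level
        return S, U, V, D

For a 128×128 face image, svd_subbands(img, level=1) yields 64×64 sub-bands, level=2 yields 32×32, and so on.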

b) Local Descriptor Extraction

In this sub-section, we describe the extraction of local descriptors over the sub-band face images obtained after applying the SVD at a particular level over the original face image. In order to show the significance of constructing local descriptors over SVD sub-bands for NIR face retrieval, we extracted the Local Binary Pattern (LBP) [16], Semi-structure Local Binary Pattern (SLBP) [140], Directional Binary Code (DBC) [142] and Local Gabor Binary Pattern (LGBP) [143] based local descriptors. The LBP descriptors computed over the S, U, V, and D sub-bands are denoted as SVD-S-LBP, SVD-U-LBP, SVD-V-LBP and SVD-D-LBP respectively. The other local descriptors (i.e. SLBP, DBC and LGBP) are also computed over each sub-band of the SVD, and their naming convention over the S, U, V, and D sub-bands follows that of LBP.
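Combining the two sketches given earlier, a descriptor such as SVD-S-LBP can be obtained by computing an LBP histogram over the S sub-band. This assumes the svd_subbands function of section 7.2.1(a) and the lbp_histogram function of section 7.1.1 are available; the wrapper name itself is illustrative.

    def svd_s_lbp(face_img, level=1, N=8, R=1):
        """SVD-S-LBP sketch: LBP histogram (2^N bins) over the level-L S sub-band."""
        S, _, _, _ = svd_subbands(face_img, level=level)   # sketch from section 7.2.1(a)
        return lbp_histogram(S, N=N, R=R)                  # sketch from section 7.1.1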

7.2.2. Experiments, Results and Discussions

The introduced SVD based local descriptors are used in near-infrared (NIR) face image retrieval experiments to investigate their performance and discriminative ability. The number of local neighbors $N$ and the radius of the neighborhood $R$ are set to 8 and 1 respectively in all the experiments of this work. The state-of-the-art descriptors LBP ($dim$: 256), SLBP ($dim$: 256), DBC ($dim$: 4×512) and LGBP ($dim$: 3×256) are used for the comparison of the results with and without SVD. We have used the online available code [186] of LBP, whereas we implemented SLBP, DBC and LGBP following their algorithms. For the computation of the LGBP descriptor, we have considered three scales and a single orientation. The Chi-square distance is used to find the similarity between two images. The average retrieval precision (ARP), average retrieval rate (ARR), F-score (F) and average normalized modified retrieval rank (ANMRR) evaluation measures defined in section 4.3.2 are used to report the NIR face image retrieval results. Two widely adopted benchmark NIR face databases, namely PolyU-NIR [187] and CASIA-NIR [188], are used for the face retrieval experiments. We have considered set-1 of the PolyU-NIR face database, which consists of a total of 7277 images from 55 subjects. The CASIA-NIR face database is comprised of a total of 3940 images from 197 subjects, having 20 faces each. We cropped the face region in the CASIA-NIR face database because too much variation is present in the background of the images of this database.

a) Experiments over Different SVD Sub-bands

In this experiment, the local descriptors LBP, SLBP, DBC and LGBP over the S, U, V and D sub-bands of the SVD are compared using ARP vs $\eta$ and ARR vs $\eta$ over both the PolyU-NIR and CASIA-NIR face databases. The results over the PolyU-NIR and CASIA-NIR databases are presented in Figs. 7.9 and 7.10 respectively. The performance of the local descriptors degrades over the U, V and D sub-bands in each case, except for the SVD-U-LBP, SVD-U-DBC and SVD-U-LGBP descriptors over the PolyU-NIR database. A tremendous improvement in the performance is observed over the S sub-band for each descriptor over each database.



Fig. 7.9. The performance comparison of the LBP, SLBP, DBC and LGBP descriptors (in the 1st, 2nd, 3rd and 4th columns respectively) over the different sub-bands of the SVD, in terms of ARP (%) vs $\eta$ (1st row) and ARR (%) vs $\eta$ (2nd row), over the PolyU-NIR face database.

Fig. 7.10. The performance comparison of the LBP, SLBP, DBC and LGBP descriptors (in the 1st, 2nd, 3rd and 4th columns respectively) over the different sub-bands of the SVD, in terms of ARP (%) vs $\eta$ (1st row) and ARR (%) vs $\eta$ (2nd row), over the CASIA-NIR face database.

The {F-score, ANMRR} of SVD-S-LBP is improved by {63.22%, 36.85%} and {11.22%, 8.35%} as compared to LBP over the PolyU-NIR and CASIA-NIR databases respectively. The performance of SVD-S-DBC and SVD-S-LGBP is improved by {67.62%, 31.57%} and {86.99%, 39.76%} as compared to DBC and LGBP respectively, in terms of the {F-score, ANMRR}, over the PolyU-NIR database. The performance of SVD-S-SLBP is also improved by 12.74% in terms of the F-score over the CASIA-NIR database. The performance gain using SLBP is smaller because SLBP already incorporates a pre-processing step to enhance the local features. The results suggest that the S sub-band of the SVD is preferable to the other sub-bands for computing the descriptors. The performance of the local descriptors is enhanced only over the S sub-band because the S sub-band has richer local information (by approximation), whereas the U, V and D sub-bands only contain the horizontal, vertical and diagonal information.
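The quoted percentages are relative gains. As a brief check, the sketch below computes the F-score from precision and recall and the relative improvement of a boosted descriptor over its baseline; ANMRR improves downward, so its gain is the relative decrease. The numeric inputs are placeholders, not values from the thesis.

    def f_score(precision, recall):
        return 2 * precision * recall / (precision + recall)

    def relative_gain(new, old, lower_is_better=False):
        """Percent improvement of 'new' over 'old'."""
        return 100.0 * ((old - new) / old if lower_is_better else (new - old) / old)

    # e.g. a baseline F of 0.30 improved to 0.49 is a ~63.3% relative gain:
    print(relative_gain(0.49, 0.30))                        # ~63.3
    print(relative_gain(0.24, 0.38, lower_is_better=True))  # ANMRR: ~36.8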



Fig. 7.11. Comparison among different levels of SVD decomposition (L = 1, 2, 3) using the S sub-band in conjunction with the LBP, SLBP, DBC and LGBP descriptors over the (a) PolyU-NIR and (b) CASIA-NIR face databases, in terms of F (%) and ANMRR (%).

Fig. 7.12. Retrieval results from the CASIA-NIR face database using the LBP (1st row), SVD-S-LBP (2nd row), SLBP (3rd row), SVD-S-SLBP (4th row), DBC (5th row), SVD-S-DBC (6th row), LGBP (7th row) and SVD-S-LGBP (8th row) descriptors. The first image in each row is the query face and the rest are the retrieved faces. The faces in rectangles are the false positives.



Fig. 7.13. Retrieval results from the PolyU-NIR face database using the LBP (1st row), SVD-S-LBP (2nd row), SLBP (3rd row), SVD-S-SLBP (4th row), DBC (5th row), SVD-S-DBC (6th row), LGBP (7th row) and SVD-S-LGBP (8th row) descriptors. The first image in each row is the query face and the rest are the retrieved faces. The faces in rectangles are the false positives.

b) Experiments over the Level of SVD

We also investigated the effect of the level (L) of the SVD factorization by comparing the results for L = 1, 2 and 3 using the SVD-S-LBP, SVD-S-SLBP, SVD-S-DBC and SVD-S-LGBP descriptors in Fig. 7.11(a-b), over the PolyU-NIR and CASIA-NIR face databases respectively, in terms of F (%) and ANMRR (%). It can be observed that the performance of each descriptor improves (i.e. the F-score increases and the ANMRR decreases) at a higher level of SVD decomposition, except for SVD-S-LBP over the PolyU-NIR database. While we used L = 1 earlier in this chapter, the performance can be increased further at a higher level of SVD decomposition.

c) Retrieval Results

The top 10 retrieved faces are displayed for a query face from the CASIA-NIR face database in Fig. 7.12 and for a query face from the PolyU-NIR face database in Fig. 7.13. The retrieval results are obtained using the LBP, SVD-S-LBP, SLBP, SVD-S-SLBP, DBC, SVD-S-DBC, LGBP, and SVD-S-LGBP feature vectors in the 1st to 8th rows respectively of Fig. 7.12 and Fig. 7.13. Note that the incorrectly retrieved images are enclosed by rectangles. The precision over the CASIA-NIR and PolyU-NIR face databases is {60%, 100%} and {70%, 100%} respectively using the {LBP, SVD-S-LBP} feature vectors, as depicted in the 1st and 2nd rows. From the 3rd and 4th rows of Fig. 7.12, the numbers of correct matches are 8 and 10 using the SLBP and SVD-S-SLBP feature vectors respectively over the CASIA-NIR face database, whereas it is 10 for both over the PolyU-NIR face database, as shown in the 3rd and 4th rows of Fig. 7.13. The number of incorrect matches using the DBC descriptor is 2 (see the 5th row of Fig. 7.12) and 3 (see the 5th row of Fig. 7.13) over the CASIA-NIR and PolyU-NIR face databases respectively, whereas it is 1 (see the 6th row of Fig. 7.12) and 0 (see the 6th row of Fig. 7.13) respectively using the SVD-S-DBC descriptor. The precision using the LGBP and SVD-S-LGBP descriptors is 70% and 100% respectively over the CASIA-NIR face database, as displayed in the 7th and 8th rows of Fig. 7.12, while it is 50% and 100% respectively over the PolyU-NIR face database (see the 7th and 8th rows of Fig. 7.13). From these experimental results, it is found that the proposed technique of computing the local descriptors over the S sub-band of the SVD outperforms the local descriptors without SVD (i.e. the performance of a local descriptor is boosted over the S sub-band) for near-infrared face retrieval, which confirms its discriminative ability as well as its suitability. The SVD decomposition is able to cope with the illumination problem of near-infrared imaging.
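The precision figures above are precision-at-10 values, i.e. the fraction of correct faces among the 10 retrieved; a two-line sketch of the computation with an illustrative relevance mask:

    import numpy as np

    relevant = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 0])  # top-10 hits/misses (illustrative)
    precision_at_10 = 100.0 * relevant.mean()            # -> 60.0 (%)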

7.3. Summary

A bag-of-filters (BoF) based local binary pattern (LBP) coding scheme is presented in this chapter for content based image retrieval. The BoF explores the local features of the image in several ways, which are further captured by the LBP to generate a more discriminative BoF-LBP descriptor. The performance of the introduced descriptor is tested using an image retrieval framework over four benchmark databases, including two natural as well as two textural databases. The experimental results point out the superiority of the BoF-LBP descriptor over the existing descriptors on both natural and textural databases. It is observed that the Chi-square distance measure is better suited to the BoF-LBP descriptor. It is also found experimentally that low frequency information (such as smooth variations) is more important than high frequency information (such as corners, edges, etc.) for modeling a more discriminative descriptor. Another conclusion of this research is that BoF-LBP performs better over large databases. In this chapter, we also presented image feature descriptors for near-infrared face image retrieval by integrating the concepts of SVD and local descriptors. Four sub-bands, namely S, U, V and D, are created by taking the SVD factorization. The LBP, SLBP, DBC and LGBP features are extracted over these sub-bands to form the various descriptors in conjunction with the SVD. The discriminating power of the proposed local descriptors is examined using NIR face retrieval experiments over two benchmark NIR face databases, namely PolyU-NIR and CASIA-NIR. The performance of the local descriptors over the S sub-band of the SVD is preferable to that over the other sub-bands. The discriminative ability of the local descriptors is further improved at a higher level of SVD decomposition of the S sub-band. Through the experiments, it is confirmed that the performance of the local descriptors is boosted with SVD and outperforms the local descriptors without SVD over both NIR face databases.


Chapter 8
Conclusions and Future Directions

In this chapter, we present the outcomes of this thesis regarding the feasibility, distinctiveness, robustness and efficiency of the presented descriptors. We also discuss how the proposed solutions fill the research gaps identified in Chapter 2. Finally, we discuss the possibilities for further research in the area of robust image feature description, matching and applications. This chapter is organized as follows: Section 8.1 highlights the key observations from the research of this thesis and Section 8.2 suggests some future directions for this research.

8.1. Conclusions

From the experiments, results, analysis and discussions presented in the previous chapters, we have drawn the following conclusions:

• We have proposed a local descriptor based on the interleaved intensity order to reduce the dimension of local ordering based descriptors. By considering interleaved intensity orders, we can use a larger number of local neighbors in the construction of the descriptor, which improves its discriminative ability. It is evaluated over the Oxford and complex illumination change datasets using the region matching framework, with very promising performance.

• We have exploited the relationship of the center pixel with its neighbors, along with the relationships that exist among the local neighbors, and proposed four descriptors, namely LDEP, LBDP, LBDISP, and LWP. We used the diagonal neighbors in LDEP to reduce its dimension significantly. We considered the bit-planes in LBDP and LBDISP to improve the distinctiveness. We also utilized the wavelet decomposition of the local neighbors in LWP to make it more discriminative. The dimension of the proposed descriptors is very low as compared to the state-of-the-art descriptors. We tested these descriptors over five biomedical databases, including one MRI database and four CT databases, and observed very appealing results. It is noticed that the Chi-square distance is better suited to these kinds of descriptors.

• We have proposed two descriptors, namely LCOD and RSHD, based on the color quantization of color images. These descriptors are rotation and scale invariant in nature and have shown very encouraging retrieval results over the natural color databases. The dimension of these descriptors is also very reasonable. We also proposed two other descriptors based on color information, namely maLBP and mdLBP, by exploiting the concepts of the adder and the decoder respectively over the LBP patterns obtained from each color channel of the image. The dimension of these descriptors is relatively high, but still low as compared to some recently introduced descriptors. These descriptors have shown the greatest improvement in the retrieval performance over color texture databases.

• We have also proposed an illumination compensation mechanism to reduce the effect of illumination changes using the multichannel information of the color image. It has been tested over databases having uniform as well as non-uniform illumination differences between images. Very satisfying results are obtained using several methods in conjunction with the proposed technique. It is also noticed that the results of illumination invariant descriptors improve further when they are applied with this proposed method.

• We have also proposed to combine the local descriptors with the BoF and SVD. The performance of a local descriptor such as LBP is improved significantly when applied with the BoF in CBIR over grayscale natural and textural databases. It is also found that the S sub-band of the SVD enriches the local information of the image, and the performance of local descriptors is boosted when they are extracted over the S sub-band. The local descriptors with SVD have been evaluated over the NIR face databases under the image retrieval framework, with satisfactory performance. The use of SVD with local descriptors does not affect the size of the local descriptors.

8.2. Future Scope

We have observed the following future directions on the basis of the performance of the proposed descriptors:

• The region based descriptors are mostly designed for grayscale images; they can be explored for color images too.

• The performance of local descriptors can be further improved by utilizing the local neighborhood information more effectively.

• The color descriptors still suffer from the dimensionality problem, which can be tackled further by designing more efficient color descriptors.


• The sensitivity of color descriptors to the illumination problem can be explored to introduce more illumination robust descriptors.

• The performance of the descriptors proposed for one type of database can be explored with other kinds of databases.

• Deep learning is being used very actively nowadays to learn features at the intermediate layers. Integrating the existing descriptors with deep learning based approaches may also be one of the future works.


References

1. M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen, “Computer Vision Using Local Binary Patterns,” Computational Imaging and Vision, Springer, 2011.
2. D. Zhang, W. Wang, Q. Huang, S. Jiang, and W. Gao, “Matching images more efficiently with local descriptors,” In Proceedings of the 19th International Conference on Pattern Recognition, 2008, pp. 1-4.
3. B. Julesz, “Visual pattern discrimination,” IRE Transactions on Information Theory, Vol. 8, No. 2, pp. 84–92, 1962.
4. H.B. Yang and X. Hou, “A New Local Self-Similarity Descriptor Based on Structural Similarity Index,” Applied Mechanics and Materials, Vol. 519, pp. 615-622, 2014.
5. O. Shahar and G. Levi, “Image and video descriptors,” Weizmann Institute of Science, Israel, 2010.
6. Corel Photo Collection Color Image Database, http://wang.ist.psu.edu/docs/realted/.
7. S.R. Dubey, S.K. Singh, and R.K. Singh, “Rotation and scale invariant hybrid image descriptor and retrieval,” Computers & Electrical Engineering, Vol. 46, pp. 288-302, 2015.
8. V. Takala, T. Ahonen, and M. Pietikainen, “Block-Based Methods for Image Retrieval Using Local Binary Patterns,” In Proceedings of the 14th Scandinavian Conference on Image Analysis, 2005, pp. 882-891.
9. D.G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, Vol. 60, No. 2, pp. 91–110, 2004.
10. S. Zhang, Q. Tian, K. Lu, Q. Huang, and W. Gao, “Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile Search,” IEEE Transactions on Image Processing, Vol. 22, No. 7, pp. 2889-2902, 2013.
11. R. Gupta, H. Patil, and A. Mittal, “Robust order-based methods for feature description,” In Proceedings of the 23rd IEEE International Conference on Computer Vision and Pattern Recognition, 2010, pp. 334–341.
12. Z. Wang, B. Fan, and F. Wu, “Local Intensity Order Pattern for feature description,” In Proceedings of the 13th IEEE International Conference on Computer Vision, 2011, pp. 603-610.
13. B. Fan, F. Wu, and Z. Hu, “Rotationally Invariant Descriptors Using Intensity Order Pooling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 10, pp. 2031–2045, 2012.
14. B. Kim, H. Yoo, and K. Sohn, “Exact order based feature descriptor for illumination robust image matching,” Pattern Recognition, Vol. 46, No. 12, pp. 3268-3278, 2013.
15. B. Fan, F. Wu, and Z. Hu, “Aggregating Gradient Distributions into Intensity Orders: A Novel Local Image Descriptor,” In Proceedings of the 24th IEEE International Conference on Computer Vision and Pattern Recognition, 2011, pp. 2377–2384.
16. T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 971–987, 2002.
17. M. Heikkila, M. Pietikainen, and C. Schmid, “Description of interest regions with local binary patterns,” Pattern Recognition, Vol. 42, No. 3, pp. 425–436, 2009.
18. X. Tan and B. Triggs, “Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions,” IEEE Transactions on Image Processing, Vol. 19, No. 6, pp. 1635-1650, 2010.
19. B. Zhang, Y. Gao, S. Zhao, and J. Liu, “Local Derivative Pattern Versus Local Binary Pattern: Face Recognition With High-Order Local Pattern Descriptor,” IEEE Transactions on Image Processing, Vol. 19, No. 2, pp. 533-544, 2010.
20. S. Murala, R.P. Maheshwari, and R. Balasubramanian, “Local tetra patterns: a new feature descriptor for content-based image retrieval,” IEEE Transactions on Image Processing, Vol. 21, No. 5, pp. 2874–2886, 2012.
21. J. Sun, G. Fan, and X. Wu, “New local edge binary patterns for image retrieval,” In Proceedings of the 20th IEEE International Conference on Image Processing, 2013, pp. 4014-4018.
22. G.H. Liu, L. Zhang, Y.K. Hou, Z.Y. Li, and J.Y. Yang, “Image retrieval based on multi-texton histogram,” Pattern Recognition, Vol. 43, No. 7, pp. 2380–2389, 2010.
23. G.H. Liu, Z.Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure descriptor,” Pattern Recognition, Vol. 44, No. 9, pp. 2123–2133, 2011.
24. W. Xingyuan and W. Zongyu, “A novel method for image retrieval based on structure elements descriptor,” Journal of Visual Communication and Image Representation, Vol. 24, No. 1, pp. 63–74, 2013.

25. G. Zhao, T. Ahonen, J. Matas, and M. Pietikäinen, “Rotation-Invariant Image and Video Description With Local Binary Pattern Features,” IEEE Transactions on Image Processing, Vol. 21, No. 4, pp. 1465-1477, 2012.
26. Feature Descriptors, Detection and Matching, http://www.cs.toronto.edu/~kyros/courses/2503/Handouts/features.pdf.
27. T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: a survey,” Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3, pp. 177-280, 2008.
28. K.E.A. van de Sande, T. Gevers, and C.G.M. Snoek, “Evaluating color descriptors for object and scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, pp. 1582-1596, 2010.
29. M. Unser, “Sum and difference histograms for texture classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 1, pp. 118-125, 1986.
30. A. Amanatiadis, V.G. Kaburlasos, A. Gasteratos, and S.E. Papadakis, “Evaluation of shape descriptors for shape-based image retrieval,” IET Image Processing, Vol. 5, No. 5, pp. 493-499, 2011.
31. Euclidean Distance, https://en.wikipedia.org/wiki/Euclidean_distance.
32. S. Murala and Q.M.J. Wu, “Local ternary co-occurrence patterns: A new feature descriptor for MRI and CT image retrieval,” Neurocomputing, Vol. 119, pp. 399-412, 2013.
33. S. Murala and Q.M.J. Wu, “Local Mesh Patterns Versus Local Binary Patterns: Biomedical Image Indexing and Retrieval,” IEEE Journal of Biomedical and Health Informatics, Vol. 18, No. 3, pp. 929-938, 2014.
34. M.M. Rahman, P. Bhattacharya, and B.C. Desai, “A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback,” IEEE Transactions on Information Technology in Biomedicine, Vol. 11, No. 1, pp. 58–69, 2007.
35. Pairwise distance between two sets of observations, www.mathworks.com/help/stats/pdist2.html.
36. Y. Liu, D. Zhang, G. Lu, and W.Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, Vol. 40, No. 1, pp. 262-282, 2007.
37. D. Unay, A. Ekin, and R.S. Jasinschi, “Local structure-based region-of-interest retrieval in brain MR images,” IEEE Transactions on Information Technology in Biomedicine, Vol. 14, No. 4, pp. 897-903, 2010.
38. M.R. Hejazi and Y.S. Ho, “An efficient approach to texture-based image retrieval,” International Journal of Imaging Systems and Technology, Vol. 17, No. 5, pp. 295-302, 2007.
39. M. Kokare, P.K. Biswas, and B.N. Chatterji, “Texture image retrieval using new rotated complex wavelet filters,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 35, No. 6, pp. 1168-1178, 2005.
40. J. Han and K.K. Ma, “Rotation-invariant and scale-invariant Gabor features for texture image retrieval,” Image and Vision Computing, Vol. 25, No. 9, pp. 1474-1481, 2007.
41. S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, and R. Szeliski, “Building Rome in a day,” In Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 72–79.
42. M. Brown and D.G. Lowe, “Automatic panoramic image stitching using invariant features,” International Journal of Computer Vision, Vol. 74, No. 1, pp. 59–73, 2007.
43. N. Snavely, S.M. Seitz, and R. Szeliski, “Photo tourism: Exploring photo collections in 3D,” ACM Transactions on Graphics, Vol. 25, No. 3, pp. 835–846, 2006.
44. J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, “Local features and kernels for classification of texture and object categories: a comprehensive study,” International Journal of Computer Vision, Vol. 73, No. 2, pp. 213–238, 2007.
45. C. Shan, S. Gong, and P.W. McOwan, “Facial expression recognition based on Local Binary Patterns: A comprehensive study,” Image and Vision Computing, Vol. 27, No. 6, pp. 803–816, 2009.
46. J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” In Proceedings of the British Machine Vision Conference, 2002, pp. 384–393.
47. K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L.V. Gool, “A Comparison of Affine Region Detectors,” International Journal of Computer Vision, Vol. 65, No. 1-2, pp. 43–72, 2005.
48. Z. Wang, B. Fan, and F. Wu, “FRIF: Fast Robust Invariant Feature,” In Proceedings of the British Machine Vision Conference, 2013.
49. W. Freeman and E. Adelson, “The design and use of steerable filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 9, pp. 891–906, 1991.
50. S. Lazebnik, C. Schmid, and J. Ponce, “A sparse texture representation using local affine regions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, pp. 1265–1278, 2005.
51. S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 4, pp. 509–521, 2002.

52. K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1615–1630, 2005.
53. H. Bay, T. Tuytelaars, and L.V. Gool, “SURF: speeded up robust features,” In Proceedings of the European Conference on Computer Vision, 2006, pp. 404–417.
54. E.N. Mortensen, H. Deng, and L. Shapiro, “A SIFT descriptor with global context,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2005, pp. 184–190.
55. E. Tola, V. Lepetit, and P. Fua, “Daisy: An efficient dense descriptor applied to wide-baseline stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 5, pp. 815–830, 2010.
56. L. Gong, T. Wang, and F. Liu, “Shape of Gaussians as Feature Descriptors,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2009, pp. 2366–2371.
57. R. Gopalan, S. Taheri, P. Turaga, and R. Chellappa, “A Blur-Robust Descriptor with Applications to Face Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 6, pp. 1220–1226, 2012.
58. H.Y. Chen, Y.Y. Lin, and B.Y. Chen, “Robust Feature Matching with Alternate Hough and Inverted Hough Transforms,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2013, pp. 2762–2769.
59. C. Zhu, C.E. Bichot, and L. Chen, “Image region description using orthogonal combination of local binary patterns enhanced with color information,” Pattern Recognition, Vol. 46, No. 7, pp. 1949-1963, 2013.
60. F. Tang, S.H. Lim, N.L. Chang, and H. Tao, “A novel feature descriptor invariant to complex brightness changes,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2009, pp. 2631–2638.
61. F. Tang, S.H. Lim, and N.L. Chang, “An improved local feature descriptor via soft binning,” In Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 861-864.
62. T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture measures with classification based on feature distributions,” Pattern Recognition, Vol. 29, No. 1, pp. 51–59, 1996.
63. D.L. Rubin, H. Greenspan, and J.F. Brinkley, “Biomedical Imaging Informatics,” Biomedical Informatics, pp. 285-327, 2014.
64. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: The QBIC system,” Computer, Vol. 28, No. 9, pp. 23-32, 1995.
65. A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 12, pp. 1349–1380, 2000.
66. H. Muller, A. Rosset, J.P. Vallee, and A. Geisbuhler, “Comparing feature sets for content-based image retrieval in a medical case database,” In Proceedings of the SPIE Medical Imaging: PACS and Imaging Informatics, 2004, pp. 99-109.
67. L. Zheng, A.W. Wetzel, J. Gilbertson, and M.J. Becich, “Design and analysis of a content-based pathology image retrieval system,” IEEE Transactions on Information Technology in Biomedicine, Vol. 7, No. 4, pp. 249-255, 2003.
68. A. Quddus and O. Basir, “Semantic image retrieval in magnetic resonance brain volumes,” IEEE Transactions on Information Technology in Biomedicine, Vol. 16, No. 3, pp. 348-355, 2012.
69. X. Xu, D.J. Lee, S. Antani, and L.R. Long, “A Spine X-Ray image retrieval system using partial shape matching,” IEEE Transactions on Information Technology in Biomedicine, Vol. 12, No. 1, pp. 100-108, 2008.
70. H.C. Akakin and M.N. Gurcan, “Content-Based microscopic image retrieval system for multi-image queries,” IEEE Transactions on Information Technology in Biomedicine, Vol. 16, No. 4, pp. 758-769, 2012.
71. M.M. Rahman, S.K. Antani, and G.R. Thoma, “A learning-based similarity fusion and filtering approach for biomedical image retrieval using SVM classification and relevance feedback,” IEEE Transactions on Information Technology in Biomedicine, Vol. 15, No. 4, pp. 640-646, 2011.
72. G. Scott and C.R. Shyu, “Knowledge-Driven multidimensional indexing structure for biomedical media database retrieval,” IEEE Transactions on Information Technology in Biomedicine, Vol. 11, No. 3, pp. 320-331, 2007.
73. H. Muller, N. Michoux, D. Bandon, and A. Geisbuhler, “A review of content-based image retrieval systems in medical applications - Clinical benefits and future directions,” International Journal of Medical Informatics, Vol. 73, No. 1, pp. 1-23, 2004.
74. B.L. Larsen, J.S. Vestergaard, and R. Larsen, “HEp-2 cell classification using shape index histograms with donut-shaped spatial pooling,” IEEE Transactions on Medical Imaging, Vol. 33, No. 7, pp. 1573-1580, 2014.

75. F.S. Zakeri, H. Behnam, and N. Ahmadinejad, “Classification of benign and malignant breast masses based on shape and texture features in sonography images,” Journal of Medical Systems, Vol. 36, No. 3, pp. 1621-1627, 2012.
76. R. Rahmani, S.A. Goldman, H. Zhang, S.R. Cholleti, and J.E. Fritts, “Localized content-based image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, pp. 1902–1912, 2008.
77. V. Khanh, K.A. Hua, and W. Tavanapong, “Image retrieval based on regions of interest,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, pp. 1045–1049, 2003.
78. K. Konstantinidis, A. Gasteratos, and I. Andreadis, “Image retrieval based on fuzzy color histogram processing,” Optics Communications, Vol. 248, No. 4–6, pp. 375–386, 2005.
79. T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Applications to face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, pp. 2037-2041, 2006.
80. S. He, J.J. Soraghan, B.F. O'Reilly, and D. Xing, “Quantitative analysis of facial paralysis using local binary patterns in biomedical videos,” IEEE Transactions on Biomedical Engineering, Vol. 56, No. 7, pp. 1864-1870, 2009.
81. L. Sorensen, S.B. Shaker, and M. de Bruijne, “Quantitative analysis of pulmonary emphysema using local binary patterns,” IEEE Transactions on Medical Imaging, Vol. 29, No. 2, pp. 559-569, 2010.
82. S. Murala, A.B. Gonde, and R.P. Maheshwari, “Color and texture features for image indexing and retrieval,” In Proceedings of the IEEE International Advance Computing Conference, 2009, pp. 1411-1416.
83. S. Murala and Q.M.J. Wu, “Spherical symmetric 3D local ternary patterns for natural, texture and biomedical image indexing and retrieval,” Neurocomputing, Vol. 149, pp. 1502-1514, 2015.
84. Z. Guo, L. Zhang, and D. Zhang, “Rotation invariant texture classification using LBP variance (LBPV) with global matching,” Pattern Recognition, Vol. 43, No. 3, pp. 706-719, 2010.
85. S. Peng, D. Kim, S. Lee, and M. Lim, “Texture feature extraction based on a uniformity estimation method for local brightness and structure in chest CT images,” Computers in Biology and Medicine, Vol. 40, No. 11, pp. 931-942, 2010.
86. B. Li and M.Q.H. Meng, “Tumor recognition in wireless capsule endoscopy images using textural features and SVM-based feature selection,” IEEE Transactions on Information Technology in Biomedicine, Vol. 16, No. 3, pp. 323-329, 2012.
87. J.C. Felipe, A.J.M. Traina, and C. Traina Jr., “Retrieval by content of medical images using texture for tissue identification,” In Proceedings of the 16th IEEE Symposium on Computer-Based Medical Systems, 2003, pp. 175-180.
88. W. Cai, D.D. Feng, and R. Fulton, “Content-based retrieval of dynamic PET functional images,” IEEE Transactions on Information Technology in Biomedicine, Vol. 4, No. 2, pp. 152-158, 2000.
89. L. Yang, R. Jin, L. Mummert, R. Sukthankar, A. Goode, B. Zheng, S.C.H. Hoi, and M. Satyanarayanan, “A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 1, pp. 33-44, 2010.
90. I. El-Naqa, Y. Yang, N.P. Galatsanos, R.M. Nishikawa, and M.N. Wernick, “A similarity learning approach to content-based image retrieval: application to digital mammography,” IEEE Transactions on Medical Imaging, Vol. 23, No. 10, pp. 1233–1244, 2004.
91. B. André, T. Vercauteren, A.M. Buchner, M.B. Wallace, and N. Ayache, “Learning semantic and visual similarity for endomicroscopy video retrieval,” IEEE Transactions on Medical Imaging, Vol. 31, No. 6, pp. 1276–1288, 2012.
92. G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Wavelet optimization for content-based image retrieval in medical databases,” Medical Image Analysis, Vol. 14, pp. 227-241, 2010.
93. A.G.M. Traina, C.A. Castañón, and C. Traina Jr., “Multiwavemed: a system for medical image retrieval through wavelets transformations,” In Proceedings of the 16th IEEE Symposium on Computer-Based Medical Systems, 2003, pp. 150-155.
94. E. Stollnitz, T. DeRose, and D. Salesin, “Wavelets for computer graphics: Theory and applications,” Los Altos, CA: Morgan Kaufmann, 1996.
95. H. Jégou, M. Douze, and C. Schmid, “Improving bag-of-features for large scale image search,” International Journal of Computer Vision, Vol. 87, No. 3, pp. 316-336, 2010.
96. A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” International Journal of Computer Vision, Vol. 42, No. 3, pp. 145-175, 2001.
97. M. Brown and S. Süsstrunk, “Multi-spectral SIFT for scene category recognition,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 177-184.

98. M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid, “Evaluation of gist descriptors for web-scale image search,” In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009, pp. 19-27.
99. D. Huang, C. Shan, M. Ardabilian, Y. Wang, and L. Chen, “Local binary patterns and its application to facial image analysis: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 41, No. 6, pp. 765-781, 2011.
100. S. Liao, M.W.K. Law, and A.C.S. Chung, “Dominant local binary patterns for texture classification,” IEEE Transactions on Image Processing, Vol. 18, No. 5, pp. 1107-1118, 2009.
101. Z. Guo and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Transactions on Image Processing, Vol. 19, No. 6, pp. 1657-1663, 2010.
102. G.H. Liu and J.Y. Yang, “Content-based image retrieval using color difference histogram,” Pattern Recognition, Vol. 46, No. 1, pp. 188-198, 2013.
103. W.T. Chu, C.H. Chen, and H.N. Hsu, “Color CENTRIST: Embedding color information in scene categorization,” Journal of Visual Communication and Image Representation, Vol. 25, No. 5, pp. 840-854, 2014.
104. C.K. Heng, S. Yokomitsu, Y. Matsumoto, and H. Tamura, “Shrink boost for selecting multi-LBP histogram features in object detection,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2012, pp. 3250-3257.
105. J.Y. Choi, K.N. Plataniotis, and Y.M. Ro, “Using colour local binary pattern features for face recognition,” In Proceedings of the 17th IEEE International Conference on Image Processing, 2010, pp. 4541-4544.
106. C. Zhu, C.E. Bichot, and L. Chen, “Multi-scale Color Local Binary Patterns for Visual Object Classes Recognition,” In Proceedings of the IEEE International Conference on Pattern Recognition, 2010, pp. 3065-3068.
107. S. Banerji, A. Verma, and C. Liu, “Novel color LBP descriptors for scene and image texture classification,” In Proceedings of the 15th International Conference on Image Processing, Computer Vision, and Pattern Recognition, 2011, pp. 537-543.
108. S.H. Lee, J.Y. Choi, Y.M. Ro, and K.N. Plataniotis, “Local color vector binary patterns from multichannel face images for face recognition,” IEEE Transactions on Image Processing, Vol. 21, No. 4, pp. 2347-2353, 2012.
109. Y. Xiao, J. Wu, and J. Yuan, “mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene Categorization,” IEEE Transactions on Image Processing, Vol. 23, No. 2, pp. 823-836, 2014.
110. I. Daoudi and K. Idrissi, “A fast and efficient fuzzy approximation-based indexing for CBIR,” Multimedia Tools and Applications, Vol. 74, No. 13, pp. 4507-4533, 2015.
111. A. Irtaza, M.A. Jaffar, E. Aleisa, and T.S. Choi, “Embedding neural networks for semantic association in content based image retrieval,” Multimedia Tools and Applications, Vol. 72, No. 2, pp. 1911-1931, 2014.
112. N. Singh, S.R. Dubey, P. Dixit, and J.P. Gupta, “Semantic Image Retrieval by Combining Color, Texture and Shape Features,” In Proceedings of the International Conference on Computing Sciences, 2012, pp. 116-120.
113. Z. Wang, G. Liu, and Y. Yang, “A new ROI based image retrieval system using an auxiliary Gaussian weighting scheme,” Multimedia Tools and Applications, Vol. 67, No. 3, pp. 549-569, 2013.
114. J. Wu, H. Shen, Y.D. Li, Z.B. Xiao, M.Y. Lu, and C.L. Wang, “Learning a Hybrid Similarity Measure for Image Retrieval,” Pattern Recognition, Vol. 46, No. 11, pp. 2927-2939, 2013.
115. R.C. Gonzalez and R.E. Woods, “Digital Image Processing,” 3rd Edition, Prentice Hall, 2007.
116. G. Pass, R. Zabih, and J. Miller, “Comparing images using color coherence vectors,” In Proceedings of the 4th ACM International Conference on Multimedia, 1997, pp. 65-73.
117. X.Y. Wang, J.F. Wu, and H.Y. Yang, “Robust image retrieval based on color histogram of local feature regions,” Multimedia Tools and Applications, Vol. 49, No. 2, pp. 323–345, 2009.
118. Z. Shi, X. Liu, Q. Li, Q. He, and Z. Shi, “Extracting discriminative features for CBIR,” Multimedia Tools and Applications, Vol. 61, No. 2, pp. 263-279, 2012.
119. S.R. Dubey and A.S. Jalal, “Detection and Classification of Apple Fruit Diseases Using Complete Local Binary Patterns,” In Proceedings of the 3rd IEEE International Conference on Computer and Communication Technology, 2012, pp. 346-351.
120. R.O. Stehling, M.A. Nascimento, and A.X. Falcão, “A compact and efficient image retrieval approach based on border/interior pixel classification,” In Proceedings of the 11th International Conference on Information and Knowledge Management, 2002, pp. 102-109.
121. R. Hu, W. Jia, H. Ling, Y. Zhao, and J. Gui, “Angular Pattern and Binary Angular Pattern for Shape Retrieval,” IEEE Transactions on Image Processing, Vol. 23, No. 3, pp. 1118-1127, 2014.
122. K.M. Saipullah and D.H. Kim, “A robust texture feature extraction using the localized angular phase,” Multimedia Tools and Applications, Vol. 59, No. 3, pp. 717-747, 2012.


123. X.Y. Wang, B.B. Zhang, and H.Y. Yang, “Content-based image retrieval by integrating color and texture features,” Multimedia Tools and Applications, Vol. 68, No. 3, pp. 545-569, 2014.
124. C. Shahabi and M. Safar, “An experimental study of alternative shape-based image retrieval techniques,” Multimedia Tools and Applications, Vol. 32, No. 1, pp. 29-48, 2007.
125. J.M. Saavedra and B. Bustos, “Sketch-based image retrieval using keyshapes,” Multimedia Tools and Applications, Vol. 73, No. 3, pp. 2033-2062, 2014.
126. I. Andreou and N.M. Sgouros, “Utilizing shape retrieval in sketch synthesis,” Multimedia Tools and Applications, Vol. 32, No. 3, pp. 275-291, 2007.
127. H.H. Chen, J.J. Ding, and H.T. Sheu, “Image retrieval based on quadtree classified vector quantization,” Multimedia Tools and Applications, Vol. 72, No. 2, pp. 1961-1984, 2014.
128. C.A. Hernández-Gracidas, L.E. Sucar, and M. Montes-y-Gómez, “Improving image retrieval by using spatial relations,” Multimedia Tools and Applications, Vol. 62, No. 2, pp. 479-505, 2013.
129. A. Shamsi, H. Nezamabadi-pour, and S. Saryazdi, “A short-term learning approach based on similarity refinement in content-based image retrieval,” Multimedia Tools and Applications, Vol. 72, No. 2, pp. 2025-2039, 2014.
130. E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, “Information fusion between short term learning and long term learning in content based image retrieval systems,” Multimedia Tools and Applications, Vol. 74, No. 11, pp. 3799-3822, 2015.
131. M. Gevrekci and B.K. Gunturk, “Illumination robust interest point detection,” Computer Vision and Image Understanding, Vol. 113, No. 4, pp. 565-571, 2009.
132. S. Wang, J. Zheng, H.M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Transactions on Image Processing, Vol. 22, No. 9, pp. 3538-3548, 2013.
133. J. Zhu, “Logarithm Gradient Histogram: A General Illumination Invariant Descriptor for Face Recognition,” In Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013, pp. 1-8.
134. W. Chen, M.J. Er, and S. Wu, “Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 2, pp. 458-466, 2006.
135. P. Vacha and M. Haindl, “Image retrieval measures based on illumination invariant textural MRF features,” In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007, pp. 448-454.
136. A. Ranganathan, S. Matsumoto, and D. Ilstrup, “Towards illumination invariance for visual localization,” In Proceedings of the IEEE International Conference on Robotics and Automation, 2013, pp. 3791-3798.
137. F. Moreno-Noguer and R.I. De, “Deformation and Illumination Invariant Feature Point Descriptor,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1593-1600.
138. R. Gupta and A. Mittal, “SMD: A Locally Stable Monotonic Change Invariant Feature Descriptor,” In Proceedings of the European Conference on Computer Vision, 2008, pp. 265-277.
139. S. Zhao, Y. Gao, and B. Zhang, “SOBEL-LBP,” In Proceedings of the 15th IEEE International Conference on Image Processing, 2008, pp. 2144-2147.
140. K. Jeong, J. Choi, and G. Jang, “Semi-Local Structure Patterns for Robust Face Detection,” IEEE Signal Processing Letters, Vol. 22, No. 9, pp. 1400-1403, 2015.
141. A.P. James, “One-sample face recognition with local similarity decisions,” International Journal of Applied Pattern Recognition, Vol. 1, No. 1, pp. 61-80, 2013.
142. B. Zhang, L. Zhang, D. Zhang, and L. Shen, “Directional binary code with application to PolyU near-infrared face database,” Pattern Recognition Letters, Vol. 31, No. 14, pp. 2337-2344, 2010.
143. W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition,” In Proceedings of the 10th IEEE International Conference on Computer Vision, 2005, pp. 786-791.
144. D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “3-D face recognition using eLBP-based facial description and local feature hybrid matching,” IEEE Transactions on Information Forensics and Security, Vol. 7, No. 5, pp. 1551-1565, 2012.
145. H.T. Nguyen and A. Caplier, “Local Patterns of Gradients for Face Recognition,” IEEE Transactions on Information Forensics and Security, Vol. 10, No. 8, pp. 1739-1751, 2015.
146. N.S. Vu, “Exploring patterns of gradient orientations and magnitudes for face recognition,” IEEE Transactions on Information Forensics and Security, Vol. 8, No. 2, pp. 295-304, 2013.
147. M. Yang, L. Zhang, S.K. Shiu, and D. Zhang, “Monogenic binary coding: An efficient local feature extraction approach to face recognition,” IEEE Transactions on Information Forensics and Security, Vol. 7, No. 6, pp. 1738-1751, 2012.
148. S.Z. Li, R.F. Chu, S.C. Liao, and L. Zhang, “Illumination Invariant Face Recognition Using Near-infrared Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 4, pp. 627-639, 2007.


149. K.P. Hollingsworth, S.S. Darnell, P.E. Miller, D.L. Woodard, K.W. Bowyer, and P.J. Flynn, “Human and machine performance on periocular biometrics under near-infrared light and visible light,” IEEE Transactions on Information Forensics and Security, Vol. 7, No. 2, pp. 588-601, 2012.
150. J.Y. Zhu, W.S. Zheng, J.H. Lai, and S.Z. Li, “Matching NIR Face to VIS Face Using Transduction,” IEEE Transactions on Information Forensics and Security, Vol. 9, No. 3, pp. 501-514, 2014.
151. T. Konda and Y. Nakamura, “A new algorithm for singular value decomposition and its parallelization,” Parallel Computing, Vol. 35, No. 6, pp. 331-344, 2009.
152. H. Andrews and C. Patterson, “Singular value decompositions and digital image processing,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 24, No. 1, pp. 26-53, 1976.
153. K. Konstantinides, B. Natarajan, and G.S. Yovanof, “Noise estimation and filtering using block-based singular value decomposition,” IEEE Transactions on Image Processing, Vol. 6, No. 3, pp. 479-483, 1997.
154. R. Kakarala and P.O. Ogunbona, “Signal analysis using a multiresolution form of the singular value decomposition,” IEEE Transactions on Image Processing, Vol. 10, No. 5, pp. 724-735, 2001.
155. S. Wei, A. Nahapetian, M. Nelson, F. Koushanfar, and M. Potkonjak, “Gate characterization using singular value decomposition: Foundations and applications,” IEEE Transactions on Information Forensics and Security, Vol. 7, No. 2, pp. 765-773, 2012.
156. J.F. Yang and C.L. Lu, “Combined techniques of singular value decomposition and vector quantization for image coding,” IEEE Transactions on Image Processing, Vol. 4, No. 8, pp. 1141-1146, 1995.
157. R. Liu and T. Tan, “An SVD-based watermarking scheme for protecting rightful ownership,” IEEE Transactions on Multimedia, Vol. 4, No. 1, pp. 121-128, 2002.
158. G. Gul and F. Kurugollu, “SVD-based universal spatial domain image steganalysis,” IEEE Transactions on Information Forensics and Security, Vol. 5, No. 2, pp. 349-353, 2010.
159. S.K. Singh and S. Kumar, “A Framework to Design Novel SVD Based Color Image Compression,” In Proceedings of the 3rd UKSim European Symposium on Computer Modeling and Simulation, 2009, pp. 235-240.
160. G. Bhatnagar, A. Saha, Q.M.J. Wu, and P.K. Atrey, “Analysis and extension of multiresolution singular value decomposition,” Information Sciences, Vol. 277, pp. 247-262, 2014.
161. S.K. Singh and S. Kumar, “Singular value decomposition based sub-band decomposition and multiresolution (SVD-SBD-MRR) representation of digital colour images,” Pertanika Journal of Science and Technology, Vol. 19, No. 2, pp. 229-235, 2011.
162. W. Kim, S. Suh, W. Hwang, and J.J. Han, “SVD Face: Illumination-Invariant Face Representation,” IEEE Signal Processing Letters, Vol. 21, No. 11, pp. 1336-1340, 2014.
163. K.P. Chandar, M.M. Chandra, M.R. Kumar, and B. Swarnalatha, “Preprocessing using SVD towards illumination invariant face recognition,” In Proceedings of the Recent Advances in Intelligent Computational Systems, 2011, pp. 051-056.
164. K. Mikolajczyk and C. Schmid, “An Affine Invariant Interest Point Detector,” In Proceedings of the 7th European Conference on Computer Vision, 2002, pp. 128-142.
165. Y. Ke and R. Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors,” In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004, pp. 511-517.
166. Affine Covariant Features dataset - Robotics Research Group, http://www.robots.ox.ac.uk/~vgg/research/affine/.
167. Complex Illumination Datasets, http://vision.ia.ac.cn/Students/wzh/datasets/illumination/Illumination_Datasets.zip.
168. Image Database - Feature Detector Evaluation Sequences, http://lear.inrialpes.fr/people/mikolajczyk/.
169. G. Finlayson, S. Hordley, G. Schaefer, and G.Y. Tian, “Illuminant and device invariant colour using histogram equalization,” Pattern Recognition, Vol. 38, No. 2, pp. 179-190, 2005.
170. K. Lu, N. He, J. Xue, J. Dong, and L. Shao, “Learning View-Model Joint Relevance for 3D Object Retrieval,” IEEE Transactions on Image Processing, Vol. 24, No. 5, pp. 1449-1459, 2015.
171. D.S. Marcus, T.H. Wang, J. Parker, J.G. Csernansky, J.C. Morris, and R.L. Buckner, “Open access series of imaging studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults,” Journal of Cognitive Neuroscience, Vol. 19, No. 9, pp. 1498-1507, 2007.
172. NEMA-CT image database, ftp://medical.nema.org/medical/Dicom/Multiframe/.
173. K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior, “The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository,” Journal of Digital Imaging, Vol. 26, No. 6, pp. 1045-1057, 2013.
174. P. Lo, B.V. Ginneken, J.M. Reinhardt, T. Yavarna, P.A.D. Jong, B. Irving, C. Fetita, M. Ortner, R. Pinho, J. Sijbers, and M. Feuerstein, “Extraction of airways from CT (EXACT'09),” IEEE Transactions on Medical Imaging, Vol. 31, No. 11, pp. 2093-2107, 2012.


175. The COREL Database for CBIR, https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval.
176. MIT Vision and Modeling Group, Cambridge, “Vision Texture,” http://vismod.media.mit.edu/pub/.
177. Salzburg Texture Image Database, http://www.wavelab.at/sources/STex/.
178. Phos Dataset, http://robotics.pme.duth.gr/phos2.html.
179. V. Vonikakis, D. Chrysostomou, R. Kouskouridas, and A. Gasteratos, “A biologically inspired scale-space for illumination invariant feature selection,” Measurement Science and Technology, Vol. 24, No. 7, pp. 074024-074036, 2013.
180. V. Vonikakis, D. Chrysostomou, R. Kouskouridas, and A. Gasteratos, “Improving the Robustness in Feature Detection by Local Contrast Enhancement,” In Proceedings of the IEEE International Conference on Imaging Systems and Techniques, 2012, pp. 158-163.
181. H. Wang, S.Z. Li, and Y. Wang, “Face recognition under varying lighting conditions using self quotient image,” In Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 819-824.
182. J. Ruiz-del-Solar and J. Quinteros, “Illumination compensation and normalization in eigenspace-based face recognition: A comparative study of different pre-processing approaches,” Pattern Recognition Letters, Vol. 29, No. 14, pp. 1966-1979, 2008.
183. K.K. Sung and T. Poggio, “Example-based learning for view-based human face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 39-51, 1998.
184. J. Ruiz-del-Solar and P. Navarrete, “Eigenspace-based face recognition: a comparative study of different approaches,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 35, No. 3, pp. 315-325, 2005.
185. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 886-893.
186. LBP Matlab Code, http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab.
187. PolyU-NIR Face Database, http://www.comp.polyu.edu.hk/~biometrics/NIRFace/polyudb_face.htm.
188. CASIA-NIR Face Database, http://www.cbsr.ia.ac.cn/english/NIR_face%20Databases.asp.


Publications

Journals

1. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Multichannel Decoded Local Binary Patterns for Content Based Image Retrieval,” IEEE Transactions on Image Processing, Vol. 25, No. 9, pp. 4018-4032, 2016.

2. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Local Wavelet Pattern: A New Feature Descriptor for Image Retrieval in Medical CT Databases,” IEEE Transactions on Image Processing, Vol. 24, No. 12, pp. 5892-5903, 2015.

3. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Rotation and Illumination Invariant Interleaved Intensity Order Based Local Descriptor,” IEEE Transactions on Image Processing, Vol. 23, No. 12, pp. 5323-5333, 2014.

4. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Local Bit-plane Decoded Pattern: A Novel Feature Descriptor for Biomedical Image Retrieval,” IEEE Journal of Biomedical and Health Informatics, Vol. 20, No. 4, pp. 1139-1147, 2016.

5. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Local Diagonal Extrema Pattern: A New and Efficient Feature Descriptor for CT Image Retrieval,” IEEE Signal Processing Letters, Vol. 22, No. 9, pp. 1215-1219, 2015.

6. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Local neighbourhood-based robust colour occurrence descriptor for colour image retrieval,” IET Image Processing, Vol. 9, No. 7, pp. 578-586, 2015.

7. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “A Novel Local Bit-plane Dissimilarity Pattern for CT Image Retrieval,” IET Electronics Letters, Vol. 52, No. 15, pp. 1290-1292, 2016.

8. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Rotation and scale invariant hybrid image descriptor and retrieval,” Computers & Electrical Engineering, Vol. 46, pp. 288-302, 2015. (Elsevier)

9. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “A multi-channel based illumination compensation mechanism for brightness invariant image retrieval,” Multimedia Tools and Applications, Vol. 74, No. 24, pp. 11223-11253, 2015. (Springer)

10. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Boosting Performance of Local Descriptors with SVD Sub-band for Near-Infrared Face Retrieval,” IET Image Processing. (Submitted in the Revised Form)

Conferences

11. Shiv Ram Dubey, Satish Kumar Singh, Rajat Kumar Singh, “Boosting Local Binary Pattern with Bag-of-Filters for Content Based Image Retrieval,” IEEE UP Section Conference on Electrical, Computer and Electronics (UPCON), 2015. (Best Paper Award)
