RECONSTRUCTION OF 3-D OBJECTS FROM 2-D VIEWS: SIMULATION, APPLICATIONS & ERROR ANALYSIS

Ph.D. THESIS

by SANJEEV KUMAR

DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE – 247 667 (INDIA)
MARCH, 2008

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE

CANDIDATE'S DECLARATION

I hereby certify that the work which is being presented in the thesis entitled "RECONSTRUCTION OF 3-D OBJECTS FROM 2-D VIEWS: SIMULATION, APPLICATIONS & ERROR ANALYSIS", in fulfillment of the requirements for the award of the Degree of Doctor of Philosophy and submitted in the Department of Mathematics of the Indian Institute of Technology Roorkee, is an authentic record of my own work carried out during the period from July 2004 to March 2008 under the supervision of Dr. N. Sukavanam and Dr. R. Balasubramanian. The matter presented in this thesis has not been submitted by me for the award of any other degree of this or any other Institute.

Dated: 10th March, 2008

(SANJEEV KUMAR)

This is to certify that the above statement made by the candidate is correct to the best of our knowledge. Date: 10th March, 2008

(Dr. N. Sukavanam)
Associate Professor
Department of Mathematics
Indian Institute of Technology Roorkee
Roorkee – 247 667, India

(Dr. R. Balasubramanian)
Assistant Professor
Department of Mathematics
Indian Institute of Technology Roorkee
Roorkee – 247 667, India

The Ph.D. Viva-Voce examination of Mr. Sanjeev Kumar, Research Scholar, was held on 1st September, 2008.

Signature of Supervisors

Signature of External Examiner


Abstract

Recovering, reconstructing and recognizing 3D objects from a set of 2D images has been one of the core topics of interest for many researchers in the field of computer vision. Most researchers have concentrated their efforts on obtaining the structural parameters of 3D objects from one or more views. The methodologies proposed in this thesis involve mathematical models for the reconstruction of algebraic curves from arbitrary perspective images, together with an error analysis of this reconstruction in the absence of motion. The 2D image obtained from the projection of a 3D object depends on the calibration parameters of the corresponding cameras; a hybrid approach is therefore also presented for stereo camera calibration. This thesis, comprising eight chapters, is concerned with the formulation of appropriate mathematical models for the reconstruction of algebraic curves, camera calibration, the integration of reconstruction techniques, and some applications of these techniques in various fields.

The first chapter presents a brief description of various reconstruction problems in 3D space. This is followed by the motivation for the studies and a brief review of some salient work in the related field. The second chapter contains some necessary concepts, definitions and algorithms from stereo reconstruction, camera calibration, artificial neural networks (ANN) and genetic algorithms (GA) that are used in subsequent chapters.

In chapter 3, a novel approach is introduced for the reconstruction of algebraic curves in 3D space from arbitrary perspective views. This approach addresses uniqueness, generalization and robustness to noise. The main advantage of the proposed technique is that it overcomes the matching problem that arises between the pair of projections of the curve. This chapter also contains an estimation of the error in this reconstruction approach. Simulation results are presented to evaluate and demonstrate the reconstruction methodology using synthetic as well as real data.

In chapter 4, the camera calibration problem is modeled as a nonlinear optimization problem and solved using a Laplace crossover and power mutation (LX-PM) based real-coded GA. The results obtained from the GA are used to seed the weight vectors of a feed-forward neural network. It can be seen from the simulation results that the proposed hybridization of GA and ANN is more accurate and robust in solving the camera calibration problem.

In chapter 5, a neural network based integration of shape from shading (SFS) and stereo is proposed. Some of the existing systems that integrate SFS and stereo vision use stereo vision for the initialization and SFS for the boundary conditions; however, these approaches may allow the propagation of errors from stereo vision to the solution of SFS. In this chapter, stereo vision and shape from shading are used simultaneously as constraints on the depth map information. A feed-forward neural network is used for the integration, and a divide and conquer technique is used in the network training process to reduce the computational cost of the proposed integration algorithm.

In chapter 6, an application of 3D reconstruction in stereo image coding via digital watermarking is presented. The original (left) image is degraded by means of a ZIGZAG sequence and transformed into the fractional Fourier transform (FrFT) domain. Singular value decomposition (SVD) is performed on the transformed degraded image as well as on the watermark; the right disparity map is used as the watermark. The effects of various watermark attacks have been studied.

In chapter 7, an application of stereo vision in robot control is presented. The task under consideration is to control a manipulator such that the tip of a tool grasped by the end-effector follows an unknown path on a surface, with the help of a pair of stereo cameras mounted on the manipulator and a pressure sensor at the end-effector. The control algorithm utilizes the pressure information and the visual sensors simultaneously. Inverse kinematics is solved for a redundant manipulator for tracking the resultant path. An optimization based approach is presented to solve the inverse kinematics by converting it into a nonlinear optimization problem. An improved energy function is defined to solve the optimization problem even in the case when the matrix associated with the objective function is not positive definite. A stability analysis of the proposed algorithm has been carried out using the Lyapunov method.

The results are illustrated through simulation of the inverse kinematics solution for a seven-degree-of-freedom redundant manipulator. In chapter 8, the salient contributions of the work described in this thesis are summarized, along with the future scope of work in this field.

Acknowledgements

Starting with my thanking note, I first of all thank GOD for providing me the opportunity to pursue higher studies under the guidance of Dr. N. Sukavanam, Associate Professor, and Dr. R. Balasubramanian, Assistant Professor, Department of Mathematics, I.I.T. Roorkee. I feel privileged to express my sincere regards and gratitude to my supervisors for their valuable guidance and constant encouragement throughout the course of my research work. The critical comments rendered by them during the discussions are deeply appreciated.

I pay my heartfelt and deep tributes to all the researchers around the world working for the development of Science and Technology for the betterment and enlightenment of society, and the feeling of being part of that community gives me great pride and pleasure.

I am highly obliged to Prof. S. P. Sharma, Head, Department of Mathematics, I.I.T. Roorkee, all the members of the Student Research Committee and the Department Research Committee, and the faculty members for providing me encouragement and the necessary facilities for carrying out my research work. My sincere thanks are due to Dr. Vikas Panwar, Department of Mathematics, CDLU Sirsa, and Dr. Manoj Thakur, Research Analyst, Marketopper, Gurgaon, India, for their valuable suggestions and generous help.

My heartfelt thanks go to my wife Asha, without whose support this would not have been possible. She supported me unconditionally all through my research. It is difficult to find adequate words to express my appreciation for the help given by her.

I am highly thankful for the facilities that I availed of at the Computer Vision, Graphics & Image Processing Laboratory, Department of Mathematics, IIT Roorkee, during my research work. I am also grateful to my friends Nutan, K.P. Singh, Ajay, Navin, Rishi, Satish, Amit, Vipin, Harendra, Jai, Manoj, Gaurav Bhatnagar, Gaurav Gupta, Jugmendra, Alok and many more for their timely help and the moral support they provided me during my research work. I wholeheartedly thank my parents and all other family members for their enduring patience and for providing moral support during the course of this work.

I thank the Council of Scientific and Industrial Research (CSIR), New Delhi, India for the financial support to carry out this research work.

Roorkee
March, 2008

(Sanjeev Kumar)

Table of Contents

Abstract
Acknowledgements
Table of Contents

1 INTRODUCTION
  1.1 General Introduction
  1.2 Problem of reconstruction in 3D space
  1.3 Motivation for the Study
  1.4 Brief Literature Review
  1.5 Outline of the Thesis

2 PRELIMINARIES
  2.1 Binocular Stereo
    2.1.1 Correspondence Analysis
    2.1.2 Epipolar Geometry
    2.1.3 The Pinhole Camera Model
    2.1.4 Camera Parameters
    2.1.5 Epipolar Constraints
    2.1.6 Rectification Process
    2.1.7 Disparity Map
  2.2 Shape from Shading
  2.3 Artificial Neural Networks
    2.3.1 Characteristics of Neural Networks
    2.3.2 Functional Approximation Capabilities of Neural Networks
    2.3.3 Feed-Forward Neural Networks
    2.3.4 Function Approximation Property
    2.3.5 Error Backpropagation Algorithm
    2.3.6 Gradient Descent
  2.4 Genetic Algorithm
    2.4.1 Distinguishing Characteristics

3 RECONSTRUCTION OF ALGEBRAIC CURVES IN 3D SPACE FROM 2D ARBITRARY PERSPECTIVE VIEWS
  3.1 Introduction
  3.2 Imaging Setup
  3.3 Reconstruction Methodology
  3.4 Case of Quadratic Curves
    3.4.1 Mathematical Formulation
    3.4.2 Reconstruction Results
    3.4.3 Error Analysis
  3.5 Case of Cubic Curves
    3.5.1 Mathematical Formulations
    3.5.2 Reconstruction Results
    3.5.3 Error Analysis
  3.6 Reconstruction Using Real Data
    3.6.1 Algorithm
  3.7 Conclusions

4 A HYBRID APPROACH FOR CAMERA CALIBRATION IN STEREO VISION SYSTEM
  4.1 Introduction
  4.2 Stereo Imaging Model
  4.3 Objective Function
  4.4 Genetic Algorithm
    4.4.1 Encoding of Chromosomes
    4.4.2 Laplace Crossover
    4.4.3 Power Mutation Operator
  4.5 Neural Network
  4.6 Experimental Results
    4.6.1 Generation of Synthetic Data
  4.7 Conclusions

5 A NEURAL NETWORK BASED INTEGRATION OF SHAPE FROM SHADING AND STEREO
  5.1 Introduction
  5.2 Stereo Reconstruction
    5.2.1 Camera Model
    5.2.2 Stereo Imaging System
    5.2.3 Camera Calibration
    5.2.4 Rectification
    5.2.5 Disparity Estimation
  5.3 Shape from Shading Algorithm
    5.3.1 Reflectance Maps
    5.3.2 Algorithm
  5.4 Integration of the Shape from Shading and Stereo Data
  5.5 Neural Network Architecture
  5.6 A Brief Overview of the Whole Process
  5.7 Results and Discussions
    5.7.1 Experiments with synthetic images
    5.7.2 Experiments with real images
    5.7.3 Error Analysis
  5.8 Conclusions

6 WATERMARKING BASED STEREO IMAGE CODING
  6.1 Introduction
  6.2 Disparity Estimation
    6.2.1 Normalization of Stereo Pairs
    6.2.2 Sum of Square Differences (SSD) Algorithm
  6.3 Fractional Fourier Transform
    6.3.1 Computation of the Fractional Fourier Transform
    6.3.2 The Discrete Fractional Fourier Transform
  6.4 Singular Value Decomposition
  6.5 Proposed Algorithm
    6.5.1 Image Coding
    6.5.2 Image Decoding
  6.6 Results and Discussions
  6.7 Conclusions

7 VISION BASED KINEMATIC CONTROL OF A REDUNDANT MANIPULATOR
  7.1 Introduction
  7.2 Robot Kinematics and Dynamics
    7.2.1 Inverse Kinematics Problem
    7.2.2 Robot Dynamics
  7.3 Constrained Motion and Control Algorithm
    7.3.1 Constrained Motion
    7.3.2 Setup and Control Design
  7.4 Solution of Inverse Kinematics
    7.4.1 PA-10 Manipulator
    7.4.2 Proposed Algorithm to Solve Inverse Kinematics
    7.4.3 Convergence and Stability Analysis
  7.5 A Brief Overview of the Overall Control Process
  7.6 Results and Discussions
    7.6.1 Results for Tracking
    7.6.2 Results for Inverse Kinematics
  7.7 Conclusions

8 CONCLUSIONS AND FUTURE SCOPE
  8.1 Contributions
  8.2 Future Scope of Work

Bibliography

List of Tables

4.1 Left Camera Parameters (Ground Truth [80] and Bounds)
4.2 Right Camera Parameters (Ground Truth and Bounds)
4.3 Results for Left and Right Camera Parameters using the LX-PM GA and the GA-ANN hybrid approach
5.1 Average depth map error for synthetic images
5.2 Standard deviation of depth map error for synthetic images
6.1 Fractional Fourier Transform of Some Simple Signals
6.2 PSNR between original, original degraded, watermarked degraded and watermarked images
6.3 Correlation coefficient between original and extracted watermark
7.1 Root mean square error between simulated and desired trajectories

List of Figures

2.1 Binocular stereo image formation
2.2 Epipolar geometry: (a) General, (b) Standard
2.3 (a) Pinhole Camera Model, (b) Disparity
2.4 Intrinsic and Extrinsic parameters of the Camera
2.5 A Simple Shape from Shading Process
2.6 example
2.7 example
2.8 Flow Chart for Genetic Algorithm
3.1 example
3.2 Reconstruction of a parabola in 3D (top) in the case of perfect stereo and in the absence of noise. Projection of the original and reconstructed points in the third view (bottom). The equation of the parabola is $x_w(t) = 16t^2$, $y_w(t) = 4t$ and $z_w(t) = 40$; the parameters are $f_1 = f_2 = 1$, $x_d = 100$, $y_d = 0.0$, $z_d = 0.0$.
3.3 Reconstruction of a parabola in 3D in the case of perfect stereo and in the presence of Gaussian noise of variance 1.0 (top) and variance 5.0 (bottom). The equation of the parabola is as in figure 3.2.
3.4 Reconstruction of an ellipse in 3D (top) in the case of perfect stereo and in the absence of noise. Projection of the original and reconstructed points in the third view (bottom). The equation of the ellipse is $x_w(t) = 5 + 3\cos(t) - 2\sin(t)$, $y_w(t) = 4 - 2\cos(t) + 3\sin(t)$ and $z_w(t) = 0.82 - 0.02\cos(t) - 0.02\sin(t)$; the parameters are $f_1 = f_2 = 1$, $x_d = 15$, $y_d = 0.0$, $z_d = 0.0$.
3.5 Reconstruction of an ellipse in 3D in the case of perfect stereo and in the presence of Gaussian noise of variance 1.0 (top) and variance 5.0 (bottom). The equation of the ellipse is as in figure 3.4.
3.6 Reconstruction of a pair of straight lines in 3D (top) in the case of perfect stereo and in the absence of noise. Projection of the original and reconstructed points in the third view (bottom). The equations of the pair of straight lines are $x_w(t) = 4 + 5t$, $y_w(t) = 5 + t$, $z_w(t) = 10$ and $x_w(t_1) = 4 + 4t_1$, $y_w(t_1) = 5 + 5t_1$, $z_w(t_1) = 10$; the parameters are $f_1 = f_2 = 1$, $0 \le t \le 1$, $x_d = 20$, $y_d = 0.0$, $z_d = 0.0$.
3.7 Reconstruction of a pair of straight lines in 3D in the case of perfect stereo and in the presence of Gaussian noise of variance 1.0 (top) and variance 5.0 (bottom). The equations of the pair of straight lines are as in figure 3.6.
3.8 Reconstruction of a parabola in 3D (top) in the general case and in the absence of noise. Projection of the original and reconstructed points in the third view (bottom). The equation of the parabola is $x_w(t) = 40 + 9.96195t - 0.31751t^2$, $y_w(t) = 70 + 0.59696t + 4.980975t^2$ and $z_w(t) = 100 + 0.63502t + 0.29848t^2$; the parameters are $f_1 = f_2 = 1$, $-4 \le t \le 4$, $x_d = 30$, $y_d = 80$, $z_d = 30$, $\theta = \pi/4$, $\phi = \pi/4$, $\psi = \pi/4$.
3.9 Reconstruction of a parabola in 3D in the general case and in the presence of Gaussian noise of variance 0.2 (top) and variance 1.0 (bottom). The equation of the parabola is as in figure 3.8.
3.10 Error analysis in the reconstruction of a parabola in perfect stereo (top), an ellipse in perfect stereo (middle) and a parabola in the general case (bottom). The equation of the parabola is $x_w(t) = 16t^2$, $y_w(t) = 4t$ and $z_w(t) = 40$; the parameters are $f_1 = f_2 = 1$, $x_d = 100$, $y_d = 0.0$, $z_d = 0.0$.
3.11 Reconstruction of a cubic in the perfect stereo case. The threshold $\epsilon$ is taken as $1 \times 10^{-5}$ in the reconstruction process.
3.12 Reconstruction of a cubic curve in the perfect stereo case in the presence of Gaussian noise: $\sigma = 0.001$ (top) and $\sigma = 1.0$ (bottom). The threshold $\epsilon$ is taken as 0.001 (top) and 0.1 (bottom).
3.13 Reconstruction of a cubic curve in the general stereo case with $\epsilon = 0.001$ and in the absence of noise.
3.14 Reconstruction in the general stereo case and in the presence of Gaussian noise: $\sigma = 0.0001$ (top) and $\sigma = 1.0$ (bottom). The threshold $\epsilon$ is taken as 0.001 (top) and 0.1 (bottom).
3.15 Effect of Gaussian noise in the reconstruction of cubic curves.
3.16 Two-dimensional images containing a circular object
3.17 Detected circle from the two-dimensional images using the Canny edge detection algorithm
3.18 Results for the reconstruction of a circle in 3D space from a set of 2D images: third view (top) and 3D circle (bottom).
4.1 Model for stereo imaging setup
4.2 Calibration Chart
4.3 Behavior of camera intrinsic parameters in the presence of image pixel noise ($\sigma$). Estimated error in focal lengths (top); estimated error in lens distortion coefficient (bottom).
4.4 Behavior of camera intrinsic parameters in the presence of image pixel noise ($\sigma$). Estimated error in $N_x$ (top); estimated error in $N_y$ (bottom).
4.5 Behavior of camera intrinsic parameters in the presence of image pixel noise ($\sigma$). Estimated error in $u_0$ (top); estimated error in $v_0$ (bottom).
4.6 Behavior of camera extrinsic parameters in the presence of image pixel noise ($\sigma$). Estimated error in the left camera's translation parameters (top); estimated error in the right camera's translation parameters (bottom).
4.7 Behavior of camera extrinsic parameters in the presence of image pixel noise ($\sigma$). Estimated error in the left camera's rotation parameters (top); estimated error in the right camera's rotation parameters (bottom).
5.1 (a) A general stereo imaging system, (b) A rectified stereo imaging system
5.2 Block diagram of the overall integration process
5.3 Architecture of the artificial neural network
5.4 Results for the Mozart: (a) left image, (b) right image, (c) 3D sparse stereo depth, (d) depth obtained using SFS alone, (e) depth obtained using the integration of stereo and SFS
5.5 Results for the Vase: (a) left image, (b) right image, (c) 3D sparse stereo depth, (d) depth obtained using SFS alone, (e) depth obtained using the integration of stereo and SFS
5.6 Results for the Pentagon: (a) left image, (b) right image, (c) 3D sparse stereo depth, (d) depth obtained using SFS alone, (e) depth obtained using the integration of stereo and SFS
5.7 Results for the Tsukuba: (a) left image, (b) right image, (c) 3D sparse stereo depth, (d) depth obtained using SFS alone, (e) depth obtained using the integration of stereo and SFS
6.1 Block diagram of the proposed method
6.2 Cone Image: (a) Left Stereo Image, (b) Right Stereo Image, (c) Right Disparity Map, (d) Original Degraded Image, (e) Watermarked Degraded Image, (f) Watermarked Image, (g) Extracted Disparity, (h) 3D view from original Disparity, (i) 3D view from extracted Disparity
6.3 Tsukuba Image: (a) Left Stereo Image, (b) Right Stereo Image, (c) Right Disparity Map, (d) Original Degraded Image, (e) Watermarked Degraded Image, (f) Watermarked Image, (g) Extracted Disparity, (h) 3D view from original Disparity, (i) 3D view from extracted Disparity
6.4 Pentagon Image: (a) Left Stereo Image, (b) Right Stereo Image, (c) Right Disparity Map, (d) Original Degraded Image, (e) Watermarked Degraded Image, (f) Watermarked Image, (g) Extracted Disparity, (h) 3D view from original Disparity, (i) 3D view from extracted Disparity
6.5 $3 \times 3$ average filtering results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.6 $3 \times 3$ median filtering results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.7 50% additive Gaussian noise results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.8 Results for the resizing attack: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.9 $10^{\circ}$ rotation attack results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.10 50:1 JPEG compression results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
6.11 Cropping results: (a) Cone Image, (b) Tsukuba Image, (c) Pentagon Image
7.1 Setup for image based servo control
7.2 Tracking between points P and Q
7.3 Coordinate system of the PA-10 manipulator
7.4 Surface $S_1$ and marker points on it (Ground Truth)
7.5 Surface $S_2$ and marker points on it (Ground Truth)
7.6 Surface $S_3$ and marker points on it (Ground Truth)
7.7 The angle $\phi$ that the direction cosine of the end effector makes with the normal to the surface $S_1$
7.8 The angle $\phi$ that the direction cosine of the end effector makes with the normal to the surface $S_2$
7.9 The angle $\phi$ that the direction cosine of the end effector makes with the normal to the surface $S_3$
7.10 The tracked trajectory on surface $S_1$ by the end effector
7.11 The tracked trajectory on surface $S_2$ by the end effector
7.12 The tracked trajectory on surface $S_3$ by the end effector
7.13 Representation of the desired trajectory $\Gamma_2$ in 3D space
7.14 Desired cartesian coordinates $r_d(t)$ versus time during the simulation on trajectory $\Gamma_1$
7.15 Desired cartesian coordinates $r_d(t)$ versus time during the simulation on trajectory $\Gamma_2$
7.16 Desired velocities $\dot{r}_d(t)$ versus time during the simulation on trajectory $\Gamma_1$
7.17 Desired velocities $\dot{r}_d(t)$ versus time during the simulation on trajectory $\Gamma_2$
7.18 Simulated joint velocities $\dot{q}(t)$ versus time during the simulation on trajectory $\Gamma_1$
7.19 Simulated joint velocities $\dot{q}(t)$ versus time during the simulation on trajectory $\Gamma_2$
7.20 Simulated joint angles $q(t)$ versus time during the simulation on trajectory $\Gamma_1$
7.21 Simulated joint angles $q(t)$ versus time during the simulation on trajectory $\Gamma_2$

Chapter 1

INTRODUCTION

This chapter describes the motivation leading to this thesis, gives a brief literature review, and presents an outline of the thesis.

1.1 General Introduction

There are mainly five senses: vision, taste, smell, touch and hearing. They keep us in touch with our surroundings, allowing us to perceive and interact with our environment. Albert Einstein defined reality in the following way: "Reality is nothing more than the capability of our senses to be wrong." Nevertheless, the human senses provide the only way of acquiring information from reality, and of these five, vision is considered the most developed human sense.

How does vision work? How do we see a 3D world? For nearly two millennia, the generally accepted theory was that the eyes send out probing rays which feel the world. This notion persisted until the early 17th century, when Kepler published the first theoretical explanation of the optics of the eye in 1604 and Scheiner experimentally observed the existence of images formed at the rear of the eyeball in 1625. Those discoveries raised further questions: How does depth perception work? How, from the 2D images projected on the retina, do we perceive three dimensions in the world?

The advent of computers in the mid-20th century created interest in vision, and researchers later introduced the concepts of computer vision. Computer vision tries to copy the way human beings perceive visual information, by using cameras as eyes and computers to process image information in an intelligent way. Some of the applications of computer vision are vehicle guidance, visual surveying, industrial manufacturing, robot navigation, sports (hawk-eye) and so on. In robot navigation, real-time performance is often more important than measurement accuracy. On the other hand, manufacturing or assembling sophisticated mechanical parts may require high-precision 3D measurement even at the cost of production speed. Thus the model of a vision system is designed according to the application requirements.

The extracted 3D information of an object in space, or of a scene in the real world, is typically a range map or a set of 3D coordinates. The methods of obtaining the depth information of a 3D scene in the real world without affecting the scene (non-contact depth acquisition) are classified into two categories: passive and active methods. The former are generally based on solving an inverse problem of the process of projecting a 3D scene or object onto 2D image planes. The latter are accomplished by emitting some radio or light energy toward a target scene and receiving its reflections. Passive techniques are often referred to as shape from X, where X can be shading, texture, contour, focus, stereo, motion, and so on. These passive techniques are roughly classified into two approaches: one is based on solving a photometric inverse problem using one or more images taken mainly from a single viewpoint, and the other is based on solving geometric inverse problems using multiple images taken from different viewpoints.

1.2 Problem of reconstruction in 3D space

Obtaining information about a 3D object from a given set of (one or more) two-dimensional images of it is called the problem of reconstruction. The images may be either orthographic or perspective projections. A set of 2D digital images and the parameters of the viewing geometry are the basic inputs to this problem; the structural parameters of the model representing the 3D object are the corresponding output. The following are the steps involved in the process of reconstructing a 3D object (a short sketch of the triangulation step is given after this list):

• Identify corresponding 2D features of the projection of the object on the image planes.
• Obtain values of the parameters of the 2D geometrical features of the projection of the object on the image planes.
• Establish correspondence between the 2D features projected on the image planes.
• Use inverse projective equations to compute the parameters of the features of the structure of the 3D object.
• Visualize the 3D object/model using line drawings or a shading model.

The problem of reconstruction of a 3D object from 2D images is a very complex process, as the available images are only 2D projections (perspective/orthographic) of the object. There are two major difficulties to be dealt with in the reconstruction process. The first is to obtain the corresponding points/lines/curves for reconstruction, and the second is to obtain inverse mathematical functions based on appropriate camera models, e.g., perspective (calibrated/uncalibrated), weak perspective, affine, etc. The 2D images obtained from the projection of a 3D object depend on the relative position of the object with respect to the cameras, the line of sight and other parameters of the viewing geometry of the cameras. The problem generally becomes ill-posed in the presence of a large amount of noise.
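To make the fourth step concrete, the following is a minimal sketch of standard linear (DLT) two-view triangulation. It is illustrative only: the function name and the assumption that calibrated 3 x 4 projection matrices are available are ours, not the specific formulation developed later in this thesis.

import numpy as np

def triangulate_point(P1, P2, x1, x2):
    # P1, P2: 3x4 camera projection matrices (assumed known from calibration).
    # x1, x2: (u, v) pixel coordinates of one matched point in each view.
    # Each view contributes two linear constraints on the homogeneous point X:
    #   u * (P[2] . X) - (P[0] . X) = 0  and  v * (P[2] . X) - (P[1] . X) = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution of A X = 0 is the right singular vector
    # associated with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to (x, y, z)

In the noise-free case the four constraints are consistent and the recovered point is exact; with pixel noise the SVD returns an algebraic least-squares compromise, which is one reason the error analysis of reconstruction undertaken in this thesis matters.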

1.3 Motivation for the Study

The problem of reconstruction of a 3D object from a set of 2D images can be divided into two subproblems:

1. Correspondence, and
2. Triangulation.

A lot of research has been done on 3D reconstruction following this pattern. A few methods have been reported on the reconstruction of algebraic curves without solving the correspondence problem. It is also necessary to evaluate the performance of the reconstruction process in the presence of noise. Another issue in 3D reconstruction is camera calibration: it is impossible to reconstruct a 3D object from 2D images without knowledge of the intrinsic and extrinsic parameters of the camera. This has motivated the study of appropriate techniques for the reconstruction of algebraic curves without establishing correspondence, and for camera calibration in a stereo imaging system.

Lee et al., (1994) have shown that the assumption of orthographic projection is not valid when an object is not far away from the camera, as it causes severe reconstruction errors in many real applications. Hence, in this thesis, only perspective projections are considered for the reconstruction of 3D object primitives.

It is well known that the individual vision modules (stereo or shading) cannot accurately reconstruct the surface shape on their own: stereo determines depth only at surface features, which are often sparsely distributed, while shading is not able to obtain accurate depth values on the boundary. This has motivated the integration of the two modules for the best possible reconstruction of 3D objects.

The development of a method is of limited use without applications in real life. In this thesis, we have therefore applied the 3D reconstruction techniques in some application domains, such as stereo image coding and robot control.

1.4 Brief Literature Review

In the earlier days, the main thrust of research in the field of image processing was concentrated on improving the quality of transmitted images, Gonzalez, (1987), Schalkoff, (1989). However, with technological advancements in hardware, the interest shifted towards a better understanding of the scene. Recovering, reconstructing and recognizing 3D objects from a set of 2D images has been one of the core topics of interest for many researchers in the field of computer vision, Longuet-Higgins, (1981), Blostein et al., (1987), Faugeras, (1987), Nagendra et al., (1988), Rodriguez et al., (1988), Weng et al., (1988), Alevertos, (1989), Aloimonos, (1990), Safaee-Rad et al., (1992), Xie et al., (1992), Hartley, (1994), Shahsua et al., (1994), Xie, (1994), Andersson et al., (1995), Quan, (1996), Hartley, (1997), Dornaika et al., (2000), Grossoman et al., (2005), Quan et al., (2007). A more extensive literature review on 3D reconstruction can be found in the PhD thesis of Balasubramanian, (2001).

The problem of generating a 3D solid object from given 2D orthographic views has been a subject of study for a number of years. Nagendra et al., (1988), have given a brief review of a collection of eleven papers published between 1973 and 1984 on this topic, discussing the relevant features of these algorithms in comprehensive tabular form. Some of these algorithms are very interesting from a mathematical point of view, and they have presented a formal mathematical treatment of the geometry involved in the construction of three-dimensional objects. Later, several researchers studied shape from shading problems using orthographic projections, Bichsel et al., (1992), Lee et al., (1994), Hougen et al., (1996), Cho et al., (1997), Birkbeck et al., (2006).

Most of the research on the problem of obtaining the structural parameters of 3D objects from one or more perspective views covers both the presence and the absence of motion, and many of the researchers have considered noisy images. Longuet-Higgins, (1981), has given a 2-stage algorithm for finding eight or more 3D spatial points and estimating the rigid motion parameters from the corresponding 2D points of different image frames. Weng et al., (1988), have presented a robust approach for estimating motion and structure from image sequences. Weng et al., (1989) have given an algorithm for the estimation of motion parameters and the structure of the scene from point correspondences between two perspective views. These algorithms have been tested on images of real-world scenes with automatic computation of point correspondences. An algorithm for the reconstruction of a 3D object using stereo and motion has been presented by Grosso et al., (1989). They have shown a method to integrate the range data computed from stereo matching and optical flow, which ends with a 3D representation of the solid in view. Oliensis et al., (1999) have demonstrated the existence of a new, approximate and intrinsic ambiguity in Euclidean structure from motion (SFM). Dornaika et al., (2000) have introduced a new general framework of cooperation between stereo and motion. This framework combines the advantages of both cases: (1) easy correspondence from motion and (2) accurate 3D reconstruction from stereo. Further works on motion analysis have been reported, part of which also involve the problems of reconstruction of lines and curves.

Lhuillier et al., (2003), have proposed a new image-based rendering system that tackles the two most difficult problems of image-based modeling: pixel matching and visibility handling. They first introduced the joint view triangulation (JVT), a novel representation for pairs of images that handles the visibility and occlusion problems created by the parallaxes between the images. The JVT is built from matched planar patches regularized by local smoothness constraints encoded by plane homographies. They then gave an incremental edge-constrained construction algorithm and, finally, a pseudo-painter's rendering algorithm for the JVT. They have demonstrated the performance of these methods experimentally. An arithmetic interpretation of projective schemes is given by Cutkosky et al., (2005). Grossmann et al., (2005), have presented a method to reconstruct, from one or more images, a scene that is rich in planes, alignments, symmetries, orthogonalities, and other forms of geometrical regularity. Given image points of interest and some geometric information, their method is able to recover least-squares estimates of the 3D points, camera position(s), orientation(s), and eventually calibration parameters. Recently, Quan et al., (2007), have proposed image based modeling using joint segmentation. The key advantage of their method is that it not only delivers the structure from motion in a robust manner for practical modeling purposes, but also provides a cloud of sufficiently dense 3D points that allows the objects to be explicitly modeled. They have validated it on real data sets.

Correspondence or matching being a subproblem of the reconstruction problem, a brief literature review of the same is presented here. Medioni et al., (1985), have proposed a segmentation based approach for stereo matching.

Alvertos et al., (1989) have given a methodology for determining relevant camera geometries for image matching in 3D machine vision. Barnard, (1989), has presented a stochastic stereo matching algorithm based on scale. Nasrabadi et al., (1992), have proposed a neural network based stereo correspondence algorithm, using a Hopfield network in their approach. Sakaue, (1994), has presented a method for stereo matching using a combination of a genetic algorithm and a neural network. Luong et al., (1996), have presented an article on the theoretical algorithms for solving for the fundamental matrix, together with a stability analysis of the algorithms.
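The object these fundamental-matrix methods estimate can be summarized compactly. In standard multi-view geometry notation (not the notation of any particular paper cited above), corresponding homogeneous image points $\mathbf{x} \leftrightarrow \mathbf{x}'$ in two views satisfy the epipolar constraint

\[ \mathbf{x}'^{\top} F\, \mathbf{x} = 0, \qquad F \in \mathbb{R}^{3 \times 3}, \quad \operatorname{rank}(F) = 2. \]

Each correspondence thus gives one linear equation in the nine entries of $F$, which is why eight or more matched points allow $F$, and hence the epipolar geometry, to be estimated linearly or by (constrained) least squares.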

10

work they have assumed that (i) the location of the optical center and orientation of the axes of N cameras are known, (ii) N images of point P belonging to <3 have been identified and matched and (iii) the error of every image point is normally distributed with zero mean and known covariance matrix. Under the above assumptions they have computed the least square estimate of the point P and estimated its covariance matrix. Hartley et al., (2000) have presented an algorithm for computing projective structure from a set of six points seen in a sequence of many images. Solem et al., (2004), have presented a method to reconstruct surface models based on points, lines and curves reconstruction. The approach relies solely on images, and the output of the system is a geometric surface model consistent with the input images. Vidal et al., (2006), have proposed an approach for nonrigid shape and motion recovery from point correspondences in multiple perspective views. They have demonstrated the existence of intrinsic ambiguities in the reconstruction of camera translation, shape coefficients and shape bases. Other than points, efforts have also been made in the use of other features like lines, conics, general curves and surfaces for reconstruction. Xie et al., (1992) have reconstructed a line by obtaining two end points of the line from their corresponding pair of projections. This concept is based on triangulation. Hartley, (1995), has given a linear method for reconstruction of points and lines. He has discussed the basic role of trifocal tensor in scene reconstruction using line correspondences. Again,

Again, Hartley, (1997), has given a fast algorithm for the projective reconstruction of a scene consisting of a set of lines, from three or more images with uncalibrated cameras. The algorithm is rapid and quite reliable, provided the degree of error in the image-to-image correspondence is not excessive. Balasubramanian et al., (2000), have analyzed the performance of the reconstruction of a line using stereoscopic projections; the reconstruction of a line in 3D space was based on the plane and line intersection method. Again, Balasubramanian et al., (2003), have studied the performance of the reconstruction of a line in 3D from two arbitrary perspective views using the two-plane intersection method. More details about geometric intersection can be seen in Mount, (2004). Bartoli et al., (2004), have studied the problem of aligning two 3D line reconstructions in projective, affine and metric space based on their algebraic properties. They have introduced a 6 x 6 3D line motion matrix that acts on Plücker coordinates, characterized its algebraic properties and its relation to the usual 4 x 4 point motion matrix, and proposed various methods to estimate 3D motion from line correspondences, based on cost functions defined in the images or in 3D space. Finally, they have analyzed the quality of the different estimation methods using simulated data and real images.

The next case in this literature review is the reconstruction of algebraic curves in 3D space from arbitrary perspective views. Berthilsson et al., (1997), have proposed an algorithm for the reconstruction of general 3D curves from a number of 2D images taken by uncalibrated cameras. In this problem, no point correspondences between the images are assumed; the curve and the viewpoints are uniquely reconstructed, and this method is called the affine shape method. Again, Berthilsson et al., (2001) have suggested a method for the reconstruction of 3D curves from their 2D projections using factorization and bundle adjustment techniques, and have provided experimental verification on synthetic and real data. Kaminski et al., (2004), have introduced a number of results in the context of multi-view geometry from general algebraic curves. They have started with the recovery of camera geometry from matching curves, and have shown the computation of the homography induced by a single planar curve without any knowledge of the camera. They derived the extended Kruppa's equations, which describe the epipolar constraint of two projections of a general algebraic curve. Kpalma et al., (2006), have applied multi-scale contour description to the problem of pattern recognition. Martinsson et al., (2007), have presented an algorithm for the reconstruction of 3D parametric curves, based on a fixed complexity model, embedded in an iterative framework of control point insertion. The successive increase of the degrees of freedom provides good precision while avoiding over-parameterization of the model. The curve is reconstructed by adapting the projections of a 3D NURBS snake to the observed curves in a multi-view setting.

Conics are widely accepted as one of the fundamental image features, together with points and line segments. Conics have more compact and more global features than points and lines, and, like points and lines, they are invariant under projective transformation. Rothwell et al., (1992) have developed a methodology to recover the position and orientation of a pair of known coplanar conics from a single perspective image. They have described and compared a number of methods for determining this relationship: in one method they have used a simple four-point back projection model, and in another they have utilized transformation invariants. These results can be extended in a number of ways. Firstly, they can be applied to arbitrary plane curves using an invariant fitting technique. Secondly, the recovery methods are applicable to higher order algebraic curves. They have shown examples of pose determination for both synthetic data and real images of conic and non-conic objects. Xie et al., (1992) have given the analytical formulation for the reconstruction of quadratic curves, but did not determine the unique solution from the roots of a quadratic equation. In the case of planar curves they have assumed the existence of a point-to-point correspondence between the sets of contour points on the pair of projected curve segments. Ma, (1993), has proposed a method to obtain the closed form solution of the correspondence problem and the global reconstruction of conics in 3D space; the limitation of this method was the reconstruction of planar curves only. Xie, (1994), has used the planarity constraint and the formulations given by Xie et al., (1992), to obtain the unique solution in the case of conics using point-to-point correspondence, and used it to improve the quality of 3D data related to quadratic curves. Quan, (1996), has solved the correspondence in the case of conics from two views, and has also resolved the ambiguity (non-uniqueness) in the solutions with the use of a non-transparency constraint of a conic section. Kahl et al., (1998), have shown how corresponding conics in two images can be used to estimate the epipolar geometry in terms of the fundamental or essential matrix. Again, Kahl et al., (1999), have proposed several new methods for estimating structure and motion from points, lines and conics. They have tested their algorithms on both simulated and real data, and have made comparative studies with other methods. Balasubramanian et al., (2002), have presented a methodology for the reconstruction of quadratic curves in 3D space from arbitrary perspective views, using the intersection of a line and a conical surface to obtain the parameters of the required curve in 3D. Shukla et al., (2005), have presented an approach to reconstruct a parabola in 3D space from arbitrary perspective images, and have analyzed the performance of the reconstruction algorithm in the presence of Gaussian noise in the image pixels.

Camera calibration is an essential step in computer vision: it is very difficult to obtain the 3D structure of an object without information about the camera parameters. A lot of work has been done to solve the calibration problem using various techniques. These techniques are mainly classified into two classes: analytical and computational. The analytical methods comprise linear methods, Sobel et al., (1974), Faig et al., (1975), Paquette et al., (1990), and nonlinear methods, Wong et al., (1975), Gennery et al., (1979) and Okamoto et al., (1981). Later, Weng et al., (1992), introduced a two-step method: the first step generates an approximate solution using a linear approach, and the second step improves it using a nonlinear iterative process. The computational methods for camera calibration are based on soft computing techniques such as artificial neural networks (ANN), genetic algorithms (GA) and so on: first, the camera calibration problem is converted into a nonlinear optimization problem, and then a GA is applied to it. More details about nonlinear optimization problems can be seen in Joshi et al., (2004). A lot of work has been done following this pattern to solve the calibration problem, Ahmed et al., (1999), Jun et al., (1999), Hati et al., (2001), Ji et al., (2001), Memon et al., (2003) and Xing et al., (2007). Heikkila et al., (1997), have introduced a four-step camera calibration process with implicit image correction; this method is an improved version of the two-step method. Quan, (1996), has proposed a self-calibration technique for an affine camera with the help of multiple images. Tsai, (1986), has proposed an efficient camera calibration technique for machine vision applications. Zhang, (2000), has presented a flexible approach to the camera calibration procedure. Hati et al., (2001), have proposed a genetic algorithm based approach to determine the external parameters of the camera from the knowledge of a given set of points in object space. They have studied the effect of noise, the presence of outliers, and the mismatch resulting from incorrect correspondences between the object space points and the image space points, on the estimation of the three translation parameters and three rotation parameters of a camera.
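The nonlinear optimization problem referred to above is conventionally the minimization of the reprojection error. In a generic form (our summary notation, not the specific objective of any one cited method), calibration seeks

\[ \min_{K,\, R,\, \mathbf{t}} \; \sum_{j=1}^{N} \left\| \mathbf{x}_j - \pi\!\left( K \left[ R \mid \mathbf{t} \right] \mathbf{X}_j \right) \right\|^2 , \]

where $\mathbf{X}_j$ are known 3D points on a calibration object, $\mathbf{x}_j$ their measured image projections, $K$ the matrix of intrinsic parameters, $(R, \mathbf{t})$ the extrinsic parameters, and $\pi(\cdot)$ the perspective division. The objective is non-convex, which is what makes global search schemes such as GAs attractive as the first stage of the hybrid GA-ANN approaches discussed here.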


Memon et al., (2003), have proposed an artificial neural network based approach to reconstruct 3D objects in the real world from perspective images. The advantage of their method is reconstruction without knowledge of the camera parameters, and they have applied their approach to robot navigation. Ahmed et al., (2003), have proposed an elegant approach for camera calibration using a neural network: their calibrating network can reveal the perspective transformation matrix between the 3D points and the corresponding 2D image pixels. Xing et al., (2007), have proposed a method to improve the accuracy of neural network based calibration methods. They claim that their method maintains the major advantages of linear methods and obtains improved accuracy without any complicated mathematical modeling process; the main contribution of their work is the introduction of the p-variance error (PVE) in the learning process of the neural network. Some more advanced applications of ANN can be seen in Ramamurthy et al., (2005, 2007).

Stereo vision is an effective method to estimate the depth and structural parameters of 3D objects from a pair of 2D images, Schalkoff (1989) and Shirai (1985). This technique has been extensively used for the purposes of object reconstruction, object recognition, estimation of motion parameters and structure, path planning, etc. Barnard, (1982), has reported a survey of stereo techniques. Hara et al., (2007), have presented a geometrical consistency based robust alignment between two-dimensional and three-dimensional space. Kanade et al., (1997), have presented an approach related to virtual reality, reconstructing a virtual world from real images. Mulligan et al., (2002) have presented a real-time reconstruction algorithm for trinocular stereo. Hailin et al., (2005), have addressed the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination. Their approach relies on a rank condition that must be satisfied when the scene exhibits specular and diffuse reflectance characteristics. This constraint is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene, rather than attempting to match image-to-image directly. Lhuillier et al., (2005) have proposed a quasi-dense approach to 3D surface model acquisition from uncalibrated images. First, correspondence information and geometry are computed based on new quasi-dense point features that are re-sampled subpixel points from a disparity map. The quasi-dense approach gives more robust and accurate geometry estimations than the standard sparse approach: the robustness is measured as the success rate of fully automatic geometry estimation with all involved parameters fixed, and the accuracy is measured by a fast gauge-free uncertainty estimation algorithm. Second, the quasi-dense approach also works for more widely separated images than the sparse approach, and therefore requires fewer images for modeling. More importantly, the quasi-dense approach delivers a high density of reconstructed 3D points on which a surface representation can be reconstructed.
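Several of the methods above, as well as the SSD matching used later in this thesis (section 6.2.2), rest on simple block matching over rectified images to produce a disparity map. The following is a minimal illustrative sketch, assuming a rectified grayscale stereo pair stored as NumPy arrays; the function name and parameter choices are ours.

import numpy as np

def ssd_disparity(left, right, max_disp=32, win=5):
    # Brute-force SSD block matching on a rectified stereo pair:
    # for every pixel of the left image, slide a window along the same
    # scanline of the right image and keep the offset with the smallest
    # sum of squared differences.
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.float64)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y-half:y+half+1, x-d-half:x-d+half+1].astype(np.float64)
                cost = np.sum((patch - cand) ** 2)  # SSD cost
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

Real systems replace the brute-force loops with incremental cost computation and add occlusion handling, but the core idea of minimizing a window-wise SSD cost along the epipolar scanline is the same.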


Apart from stereo methods, shape from shading is also a popular technique among researchers for obtaining the 3D shape information of an object from a set of its 2D images; hence, a brief literature review of some SFS techniques is presented here. In order to obtain the shape and the depth map of a 3D object, shape from shading techniques have been used by many researchers, considering perspective projections (two images with known relative camera orientations) as well as orthographic projections, with a single or many light sources. Shape from shading (SFS) deals with the recovery of shape from the gradual variation of shading in the image, so in SFS it is important to study the process of image formation. A simple model of image formation is the Lambertian model, in which the gray level at a pixel in the image depends on the light source location and the surface normal. In SFS, given a gray level image, the aim is to find the depth and orientation of the 3D object.
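For reference, the Lambertian image irradiance equation just mentioned can be written in the standard SFS notation (a textbook summary, not a formula taken from the works cited here):

\[ I(x, y) \;=\; \rho\, \mathbf{n}(x,y) \cdot \mathbf{s} \;=\; \rho\, \frac{1 + p\,p_s + q\,q_s}{\sqrt{1 + p^2 + q^2}\; \sqrt{1 + p_s^2 + q_s^2}}, \]

where $\rho$ is the albedo, $\mathbf{n}$ the unit surface normal, $\mathbf{s}$ the unit vector toward the light source, $p = \partial Z / \partial x$ and $q = \partial Z / \partial y$ the surface gradients, and $(p_s, q_s)$ the corresponding parameters of the light source direction. Recovering the depth $Z$ from a single image $I$ under this model is an ill-posed inverse problem, which is why the various regularizing approaches below exist.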


been illustrated. Lee et al., (1994), have solved this problem using perspective projections and a fixed camera with different light intensities. Saito et al., (1994), have proposed the use of a genetic algorithm for 3D shape recovery from a shading image. Hougen et al., (1996), have dealt with the estimation of the light source distribution and its use in surface shape estimation. Zhang et al., (1999), have given a brief review of work on shape from shading problems. Recently, Gao et al., (2007), have presented an integrated algorithm for 3D reconstruction, which includes the estimation of parameters and shape from shading. The parameters, namely the illuminant direction and albedo, are estimated from the statistics of image intensities. The shape from shading algorithm is implemented with a minimum downhill principle on the Lambertian reflectance model. Bhargava et al., (2007), have used the linear approach of Tsai et al., (1994), together with the generalized Lambertian reflectance map of Oren et al., (1993), instead of the Lambertian reflectance map. Some other algorithms for SFS are given by Daum et al., (1998), and Horn, (1975). Integration of SFS and stereo has many advantages for reconstructing both the shape and reflectance properties of surfaces. Earlier, Bulthoff et al., (1988), had proposed an integration of SFS and stereo. In the 1990's, several algorithms were proposed for the integration of SFS and stereo modules. Hougen et al., (1993), have presented the integration of SFS and stereo based on the light source distribution. Cho et al., (1997), have implemented a divide and conquer approach in the SFS process. Fua et


al., (1995), have presented a method to reconstruct a surface using a combination of SFS and stereo; the integration was based on multi-image information. Pankanti et al., (1995), have integrated stereo, shading, grouping and line labeling for the best possible reconstruction of the object. Bae et al., (2003), have used the integration of stereo matching and SFS for spatial object recognition. Birkbeck et al., (2006), have presented a variational method implemented as a PDE-driven surface evolution interleaved with reflectance estimation. The surface is represented on an adaptive mesh allowing topological change. To provide the input data, they designed a capture setup that simultaneously acquires both viewpoint and light variation while minimizing self-shadowing. Recently, applications of stereo vision have become popular in image coding and security. Most image coding algorithms are based on a single image; very little work has been done on stereo image coding. Some image coding algorithms have been developed using digital watermarking, Djurovic et al., (2001), Liu et al., (2002), Ganic et al., (2005). Tripathi et al., (2006), have presented a novel approach to watermarking for digital images. They transformed the original image into the frequency domain by using the discrete cosine transform and the discrete wavelet transform. Yu et al., (2006), have proposed a watermarking scheme for digital images using the fractional Fourier transform (FrFT); more details of the FrFT can be seen in Ozaktas et al., (2001). Again, Tripathi et al., (2007), have proposed a quantization based blind watermarking scheme using the discrete wavelet transform.


Here, we also present a review of stereo image coding. Aydinoglu et al., (1995), have proposed a region-based stereo image coding algorithm. They considered three types of regions: occlusion, edge and smooth regions. The non-occluded region is segmented into edge and smooth regions, and each region is composed of fixed-size blocks. The disparity for each block in a non-occluded region is estimated using a block-based approach. The estimated disparity field is encoded by employing a lossy residual uniform scalar quantizer and an adaptive arithmetic coder based on segmentation. Jiang et al., (1999), have proposed a wavelet based stereo image pair coding algorithm. The wavelet transform is used to decompose the image into approximation and detail images. A new disparity estimation technique is developed for the estimation of the disparity field using both approximation and edge images. Duarte et al., (2002), have proposed an algorithm that relies on the matching of recurrent patterns. The input image is segmented into variable-sized blocks and coded based on contraction, expansion and displacement of elements of a dictionary. The segmentation is ruled by a distortion criterion, and the dictionary is updated with the concatenation of previously coded elements. The main feature of their work was the absence of a disparity map in coding. Frajka et al., (2003), have proposed a progressive coding technique for the compression of stereo images; the main emphasis of their work was on the coding of the residual image. Hwang et al., (2003), have proposed a stereo image watermarking scheme using the discrete cosine transform (DCT) and a disparity map. A watermark image is embedded


into the right image of a stereo image pair in the frequency domain through the conventional DCT operation, and then disparity information is extracted from the watermarked right image. Coltuc et al., (2007), have proposed a stereo embedding technique using reversible watermarking. Their scheme investigates the reduction of storage and bandwidth requirements for stereo images. Instead of compressing the stereo pair, they rely on embedding via reversible watermarking. Stereo vision also plays an important role in the control and tracking of a robot manipulator; a comprehensive review of this application of stereo vision is presented here. Early studies in this domain, in the 1970's, were mainly based on heuristic approaches. More formalized approaches developed around 1982, with the choice and extraction of visual features. Papanikalopoulos et al., (1993), have presented an algorithm for robotic (eye-in-hand configuration) real-time visual tracking of arbitrary 3D objects traveling at unknown velocities in 2D space (depth is known). They presented a mathematical formulation of a control problem that includes the sensory information of a visual sensor. Ghosh et al., (1995), introduced two modular approaches to motion and shape estimation, either by observing dynamically moving intensity or by observing dynamically moving feature points, lines or curves. When restricted to a planar surface undergoing affine motion, the problem can be tackled by estimating an intermediate set of parameters known as essential parameters.


Rastogi et al., (1997), have described the design of a real-time camera based tracking system for fast moving objects using image processing techniques. A tracking window is placed over the appropriate location of the target, either automatically or by a human operator. The target image is segmented from the background using a new adaptive co-operative segmentation technique that utilizes the background histogram in the immediate vicinity of the target image and the edge strengths of the pixels in the tracking window. The segmented target image is then processed to estimate tracking errors and to compute a confidence measure. Malis et al., (1999), have proposed a new approach to vision based robot control which presents many advantages with respect to classical position-based and image-based visual servoing. This new method does not need any 3D target model or a precise camera calibration, and presents very interesting decoupling and stability properties. One drawback of their method is that, for a non-planar target, at least eight points are necessary to estimate the homography matrix, while at least four points are theoretically needed in the other schemes. Another drawback is that their method is more sensitive to image noise than 2D visual servoing, since the scheme directly uses visual features as input to the control law, without any supplementary estimation step. Xiao et al., (2000), proposed a multisensor based control strategy to enable the end-effector of a robot manipulator to track a class of constrained motions. Their task was to control the tip of a tool grasped by the end-effector of a robot to follow a curve


on an unknown surface. Sunita et al., (2003), have designed a controller based on an artificial neural network for visual servoing. The task under consideration was to intercept a moving object with a robot manipulator with the help of two fixed cameras mounted in the workspace. Murugesan et al., (2004), have proposed a numerical solution of the robot arm control problem using the RK-Butcher algorithm. Sukavanam et al., (2007), have proposed a neural network based controller for the visual servoing of a robot eye system. A more extensive review of visual servoing can be found in the Ph.D. thesis of Sunita, (2005).

1.5

Outline of the Thesis

Chapter 2 contains some necessary concepts, definitions and algorithms from stereo reconstruction, camera calibration, artificial neural networks, robot kinematics and other related areas that will be used in subsequent chapters. In chapter 3, a novel approach is introduced for the reconstruction of algebraic curves in 3D space from arbitrary perspective views. This chapter also contains the estimation of error in this reconstruction approach. Simulation results are presented to evaluate and demonstrate the reconstruction methodology using synthetic as well as real data. In chapter 4, the camera calibration problem is modeled as a nonlinear optimization problem and solved using a Laplace crossover and power mutation (LX-PM) based real coded GA. The results obtained from the GA are used to seed the weight vectors of a feed-forward neural network. It can be seen from the simulation results that the proposed hybridization of GA and ANN is more accurate and robust in solving


the camera calibration problem. In chapter 5, a neural network based integration of SFS and stereo is proposed. The results show the performance of the proposed scheme in integrating the SFS and stereo vision modules. In chapter 6, an application of 3D reconstruction to stereo image coding via digital watermarking is presented. The effects of various watermark attacks have been studied to analyze the robustness of the proposed algorithm. In chapter 7, an application of stereo vision to robot control is presented. Inverse kinematics is solved for a redundant manipulator tracking the resultant path. A stability analysis of the proposed algorithm has been carried out using the Lyapunov method. The result is illustrated through simulation of the inverse kinematics solution for a seven-arm redundant manipulator. In the next chapter, some necessary concepts, definitions and algorithms are presented that will be used in subsequent chapters.


Chapter 2 PRELIMINARIES

Acquisition of three-dimensional (3D) information of a real world scene from two-dimensional images has been one of the most important issues in computer vision over the last two decades. Non-contact range acquisition techniques are essentially classified into two categories: passive and active methods. The first is generally based on solving an inverse problem of the process of projecting a 3D scene onto a 2D image plane, and has the advantage that 3D information can be obtained without affecting the scene. The second is accomplished by emitting radio or light energy from a source and receiving its reflections. Passive range sensing techniques are often referred to as shape from x, where x is one of the visual clues such as shading, texture, contour, focus, stereo, motion, and so on. As mentioned above, these types of techniques require solving inverse problems of the image formation process. Some may be referred to as optical or photometric inverse problems and others as geometric inverse problems. The passive techniques are roughly classified into two approaches: one is based on solving photometric inverse


problems using one or more images taken mainly from a single viewpoint, like shape from shading, and the other is based on solving geometric inverse problems using multiple images taken from different viewpoints, like shape from stereo. In this chapter, we introduce some necessary concepts, definitions and models from passive methods (stereo and shading), artificial neural networks and genetic algorithms, which will be used in subsequent chapters.

2.1

Binocular Stereo

Figure 2.1: Binocular stereo image formation

Binocular stereo or two-view stereo imitates human stereo vision and is a typical non-contact passive range sensing method. A pair of images of a 3D object (scene) is obtained from two different viewpoints under perspective projection, as illustrated in figure 2.1. To obtain a 3D object from these perspective images, a distance measurement (depth) from a known reference coordinate system is computed


based on triangulation. The main problem in binocular stereo is to find the correspondences in the stereo pair, called the stereo correspondence problem. In this section, some concepts (camera model, epipolar geometry, rectification and disparity map) related to the stereo correspondence problem are introduced.

2.1.1

Correspondence Analysis

Correspondence analysis tries to solve the problem of finding which pixels or objects in one image correspond to which pixels or objects in the other. The algorithms can roughly be divided into feature based and area based, the latter also known as region based or intensity based. Area based algorithms solve the correspondence problem for every single pixel in the image. These algorithms therefore take color values and/or intensities into account, as well as a certain pixel neighborhood. A block consisting of the middle pixel and its surrounding neighbors is then matched to the best corresponding block in the second image. These algorithms produce dense depth maps, as the depth is known for each pixel. However, selecting the right block size is difficult: a small neighborhood leads to less correct maps but short run times, whereas a large neighborhood leads to more exact maps at the expense of long run times. On the other hand, feature based correspondence algorithms extract features from the first image and then try to detect these features in the second image. These features should be unique within the images, like edges, corners, geometric figures, whole objects or parts of objects. The resulting maps will be less detailed, as the depth is


not calculated for every pixel. There is less chance of matching a feature incorrectly due to its detailed description, so feature based algorithms are less error sensitive and result in very exact depth values. Besides the major correspondence algorithms, area based and feature based, there are also phase based algorithms, which transform the images using the fast Fourier transform (FFT); the depth is then proportional to the phase displacement. Wavelet based algorithms are a subcategory of phase based algorithms and use the wavelet transform. There are a number of subproblems of the correspondence problem. An object seen by one of the cameras could be occluded in the other camera, which has a slightly different point of view; such an object will cause wrong correspondences when trying to match the images. The cameras themselves may produce distorted images due to lens distortion, which will lead to wrong correspondences, especially in the outer regions of the image. Some more problems are caused by the objects themselves. A large number of small objects, or a special pattern that repeats, quite often makes it hard to find the matching object, as there is more than one possible match; this is known as the aperture problem. Another big problem is homogeneity: large homogeneous regions are difficult to match when seen through a small window. The same texture at different positions in the image causes similar problems. There are a number of constraints which ease the correspondence problem and improve the results, like similarity, uniqueness, continuity and ordering. Another possibility for improving these


correspondence algorithms is some pre-processing steps, like reduction of noise with a low-pass filter, the adjustment of different illuminations, or a white balance of each camera; but the most effective pre-processing steps are the calibration of the cameras and the use of the epipolar constraint. These two are described briefly in the following subsections.

2.1.2

Epipolar Geometry

With two cameras arranged arbitrarily, the general epipolar geometry is shown in figure 2.2(a). The relative positions of the two cameras, given by their optical centers C1 and C2, are known. The straight line that connects the two optical centers is called the baseline. Each point W observed by the two cameras at the same time, along with the two corresponding light rays through the optical centers C1 and C2, forms an epipolar plane. The epipole e is the intersection of the baseline with the image plane. The epipolar line is then defined as the straight line g through e and w, where w is the intersection of the line through W and the optical center with the respective image plane. The point W in figure 2.2(b) is projected as wl in the left image plane. The corresponding point in the right image therefore lies on the previously described epipolar line g. This reduces the search space from two dimensions, which would be the whole image, to one dimension, a straight line only. A simplification of the general epipolar geometry is shown in figure 2.2(b). Both cameras are arranged in parallel, their focal lengths are identical and the two retinal planes coincide. Under these conditions, all epipolar lines are horizontal


within the retinal planes, and the projected images wl and wr of a point W will have the same vertical coordinate. Therefore the corresponding point of wl lies on the same horizontal line in the right image.

Figure 2.2: Epipolar geometry: (a) General, (b) Standard

According to the stereo epipolar geometry, the disparity, as seen in figure 2.3(b), is defined as d = C2 − C1. The depth Z therefore


is calculated by triangulation as

$$ Z = b\,\frac{f}{d} \qquad (2.1.1) $$

where b is the distance between the two optical centers and f is the focal length. A disparity of zero indicates that the depth of the corresponding point equals infinity. In order to assure the stereo epipolar geometry, rectification of both images is necessary; for this, both cameras first need to be calibrated in order to get the camera parameters needed for the rectification. The next sections deal with the problems of camera calibration and image rectification.
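As an illustration of equation (2.1.1), the following minimal Python sketch converts a disparity map into a depth map; the function name and the numeric values are hypothetical, chosen only for the example.

import numpy as np

def depth_from_disparity(disparity, baseline, focal_length):
    # Depth from eq. (2.1.1): Z = b * f / d for each pixel of a rectified pair.
    # Pixels with zero disparity correspond to points at infinity (np.inf).
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = baseline * focal_length / disparity[valid]
    return depth

# Hypothetical rig: 120 mm baseline, 6 mm focal length, disparities on the
# image plane expressed in mm.
d = np.array([[4.0, 2.0],
              [0.0, 8.0]])
print(depth_from_disparity(d, baseline=120.0, focal_length=6.0))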

2.1.3

The Pinhole Camera Model

Every camera maps the points of the three dimensional environment to a two dimensional image. The simplest camera model that describes this mapping is the pinhole camera model. As shown in figure 2.3(a), the pinhole camera model consists of two planes: Re is the retinal plane, where the two dimensional image is formed, and F is the focal plane, with the optical center C in the middle. The two planes are parallel and separated by the distance f, the focal length. Via perspective projection, a point W of the 3D world is mapped onto the two dimensional image along the straight line joining W and C. Since the focal plane is parallel to the retinal plane, points that lie on the focal plane have no image on the retinal plane.


Figure 2.3: (a) Pinhole Camera Model, (b) Disparity

2.1.4

Camera Parameters

In order to transform a point of the 3D world into a 2D point of the image plane, knowledge of special camera parameters is necessary. There are two kinds of camera parameters: the intrinsic or internal parameters, which describe the internal geometric and optical characteristics of the camera, and the extrinsic or external parameters, defining the position and orientation of the camera in a world reference system.


Figure 2.4: Intrinsic and Extrinsic parameters of the Camera

As seen in figure 2.4, the system for modeling two or more cameras consists of three different coordinate systems: the world reference frame (xw, yw, zw), the camera frame (xc, yc, zc) with the optical center as origin, and the image frame (X, Y). A three dimensional point, given in homogeneous world coordinates, can be converted into the camera frame by a rotation rij and a translation tj, which is expressed by the extrinsic parameters as

$$ \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = T_l \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}, \quad \text{where} \quad T_l = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}. \qquad (2.1.2) $$

Then this point is converted to the two dimensional image plane using the intrinsic parameters. These are in particular the focal length f, the principal point (u0, v0), which is the center of the image plane, and (k0, k1), the pixel size in mm, or α = f/k0 and β = f/k1. The transformation using the intrinsic parameters is as follows:

$$ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = T_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}, \quad \text{where} \quad T_i = \begin{bmatrix} \alpha & 0 & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (2.1.3) $$

Since (X, Y, Z) is homogeneous, all three variables are divided by Z in order to get the pixel coordinates X′ = X/Z and Y′ = Y/Z. Points on the focal plane, where Z = 0 and zc = 0 respectively, cannot be transformed to image plane coordinates, because division by zero is not defined and the straight line joining such a point and the optical center does not intersect the image plane, being parallel to it. In summary, a point given in world coordinates is transformed onto the two dimensional image plane using the following equation:

$$ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = T_i\, T_l \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (2.1.4) $$

Knowledge of the intrinsic and extrinsic parameters of the camera allows for the rectification of images and ensures the epipolar constraint. The calculation of these parameters is the aim of camera calibration and will be discussed in chapter 4.
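The chain of transformations (2.1.2)-(2.1.4) can be made concrete with a short sketch. The Python fragment below projects a world point to pixel coordinates; the numeric values for α, β, the principal point and the camera pose are illustrative assumptions.

import numpy as np

def project_point(world_point, Ti, Tl):
    # Homogeneous world coordinates (x_w, y_w, z_w, 1); eq. (2.1.4):
    # (X, Y, Z)^T = Ti Tl (x_w, y_w, z_w, 1)^T
    W = np.append(np.asarray(world_point, dtype=float), 1.0)
    X, Y, Z = Ti @ Tl @ W
    if np.isclose(Z, 0.0):
        raise ValueError("point lies on the focal plane; it has no image")
    return X / Z, Y / Z  # pixel coordinates X', Y'

# Hypothetical intrinsics: alpha = beta = 800, principal point (320, 240).
Ti = np.array([[800.0,   0.0, 320.0],
               [  0.0, 800.0, 240.0],
               [  0.0,   0.0,   1.0]])
# Hypothetical extrinsics: identity rotation, zero translation.
Tl = np.hstack([np.eye(3), np.zeros((3, 1))])
print(project_point([0.1, -0.2, 2.0], Ti, Tl))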

2.1.5

Epipolar Constraints

Knowledge of the intrinsic and extrinsic parameters of each camera separately is sufficient to undistort the respective images, but it is not sufficient to assure the epipolar constraint. The epipolar constraint ensures that the epipolar lines coincide with the horizontal scan lines, and therefore corresponding points in the two images are only


horizontally shifted, which reduces the search space from two-dimensional to one-dimensional. Given two cameras, every point W = [x_w  y_w  z_w  1]^T of the world reference frame can be projected to the appropriate image frame (w_l and w_r) using the transformation matrix P = T_i T_l known from section 2.1.4, as follows:

$$ w_l = P_l\, W, \qquad w_r = P_r\, W \qquad (2.1.5) $$

In order to rectify the images according to the epipolar constraint, the projection matrices Pl and Pr have to satisfy the following special conditions:

• both camera systems need to have equal focal lengths
• both camera systems need to have the same focal plane
• the optical centers need to remain unchanged
• the vertical coordinates of corresponding points in the left and the right images need to be the same.

Using these and further conditions, Pl and Pr can be calculated, and the captured images can be transformed according to the epipolar constraint.

2.1.6

Rectification Process

Given a pair of stereo images, rectification determines a transformation of each image plane such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes (usually the horizontal one). The rectified images can be


thought of as acquired from a new stereo rig, obtained by rotating the original cameras. The main advantage of rectification is that the computation of correspondences becomes very simple. We assume that the stereo rig is calibrated, i.e., the old perspective projection matrices (PPMs) Pol and Por are known. The idea behind rectification is to define two new perspective matrices Pnl and Pnr which preserve the optical centers and have the baseline contained in the focal planes. This ensures that the epipoles are at infinity, and hence the epipolar lines are parallel. In addition, to have a proper rectification, it is required that the epipolar lines are horizontal and that corresponding points have the same vertical coordinates. In this section, analytical requirements and formulations are given for the rectification process. The new PPMs will have the same orientation but different positions. The positions are the same as those of the old cameras, whereas the orientation changes because we rotate both cameras around their optical centers in such a way that the focal planes become coplanar and contain the baseline. The intrinsic parameters of the two cameras will be the same. The new PPMs, in terms of their factorization into the intrinsic and extrinsic parameter matrices, are given as

$$ P_{nl} = A\,[R \mid -R\,C_1], \qquad P_{nr} = A\,[R \mid -R\,C_2] \qquad (2.1.6) $$

where C1 and C2 are the old optical centers. The rotation matrix R is the same for both PPMs and is computed as detailed below. The intrinsic parameter matrix A is


also the same for both PPMs. We will specify R by means of its row vectors,

$$ R = \begin{bmatrix} r_1^T \\ r_2^T \\ r_3^T \end{bmatrix} \qquad (2.1.7) $$

which are the x, y and z axes, respectively, of the camera standard reference frame, expressed in world coordinates. The rectification process is then as follows:

1. The new x axis is parallel to the baseline: r1 = (C1 − C2) / ‖C1 − C2‖.
2. The new y axis is orthogonal to the x axis and to k: r2 = k ∧ r1.
3. The new z axis is orthogonal to the xy plane: r3 = r1 ∧ r2,

where k is an arbitrary unit vector that fixes the position of the new y axis in the plane orthogonal to x.
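The three steps above translate directly into code. The following sketch builds the rotation matrix R of equation (2.1.7) from the old optical centers; the choice of k as the old z axis and the numeric values are assumptions made for illustration.

import numpy as np

def rectifying_rotation(C1, C2, k):
    # Row vectors r1, r2, r3 of R, eq. (2.1.7), built from steps 1-3 above.
    r1 = (C1 - C2) / np.linalg.norm(C1 - C2)  # new x axis, along the baseline
    r2 = np.cross(k, r1)                      # new y axis, orthogonal to x and k
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)                     # new z axis, orthogonal to xy plane
    return np.vstack([r1, r2, r3])

# Hypothetical optical centers 120 mm apart; k taken as the old z axis.
C1 = np.array([0.0, 0.0, 0.0])
C2 = np.array([-0.12, 0.0, 0.0])
print(rectifying_rotation(C1, C2, k=np.array([0.0, 0.0, 1.0])))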

2.1.7

Disparity Map

Once both stereo images are rectified, the next step is to find the correspondences between them. For this, similarity or difference measures are defined between the left and right images. Typical area-based measures are the sum of squared differences (SSD), the sum of absolute differences (SAD), the normalized cross correlation (NCC), and the census. For the standard geometry, these measures are defined as follows:

$$ SSD(d) = \sum_i \sum_j \left[ I_l(x+d+i,\, y+j) - I_r(x+i,\, y+j) \right]^2 \qquad (2.1.8) $$

$$ SAD(d) = \sum_i \sum_j \left| I_l(x+d+i,\, y+j) - I_r(x+i,\, y+j) \right| \qquad (2.1.9) $$

$$ NCC(d) = \frac{C(I_l, I_r) - \sum_i \sum_j \mu_l\, \mu_r}{\sum_i \sum_j \sigma_l\, \sigma_r} \qquad (2.1.10) $$

where Il and Ir represent the left and right images of the stereo pair and d denotes the disparity at a point (x, y) in the right image. µl and µr represent the mean intensities in the corresponding windows of the left and right images, respectively, and σl and σr are the standard deviations in those windows. C(Il, Ir) is the cross correlation between the corresponding windows:

$$ C(I_l, I_r) = \sum_i \sum_j I_l(x+d+i,\, y+j)\; I_r(x+i,\, y+j) \qquad (2.1.11) $$

The disparity d can be found by minimizing the difference measures or by maximizing the similarity measures, but due to the ill-posed nature of the problem a unique correspondence cannot always be found.
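As a minimal sketch of area-based matching with the SSD measure (2.1.8), the Python fragment below searches, for one pixel of the right image, the horizontal shift that minimizes the block difference; the window half-size and disparity search range are illustrative assumptions.

import numpy as np

def ssd_disparity(left, right, x, y, window=2, max_disp=16):
    # Disparity at right-image pixel (x, y) minimizing SSD, eq. (2.1.8).
    # Assumes rectified images and (x, y) far enough from the borders.
    ref = right[y - window:y + window + 1, x - window:x + window + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(max_disp + 1):
        cand = left[y - window:y + window + 1,
                    x + d - window:x + d + window + 1].astype(float)
        if cand.shape != ref.shape:  # shifted block would leave the image
            break
        ssd = np.sum((cand - ref) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

As noted above, the minimization alone does not guarantee a unique correspondence; in practice the constraints of section 2.1.1 (similarity, uniqueness, continuity, ordering) are imposed on top of such a raw measure.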

2.2

Shape from Shading

Surfaces are bright or dark for two main reasons: their albedo and the amount of light they are receiving. A model describing the brightness of a surface is usually called a shading model. Recovering information about the 3D structure of a surface from the shading model is called shape from shading (SFS). It has been discussed in a wide variety of contexts since 1975. In the ideal case, the simplest shape from shading problem is formulated as

$$ I(x) = \rho\, I_s \cos\phi(x) \qquad (2.2.1) $$

where x denotes a point in the image, I(x) the reflected light intensity observed at x, Is the illumination intensity, ρ the constant albedo on the surface, and φ(x) the


angle between the light source direction and the surface normal at the 3D point on the object surface corresponding to x. A model of the SFS process is shown in figure 2.5. It is well known that φ(x) can be calculated for a given ρIs. Thus, the shape of the object surface can be reconstructed from φ(x) by incorporating additional constraints such as photometric stereo, a smoothness assumption on the surface, and so on.
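As a sketch of the Lambertian model (2.2.1), the following fragment computes the intensity of a surface patch from its normal and the light direction; the albedo, illumination and geometry values are hypothetical.

import numpy as np

def lambertian_intensity(normal, light_dir, albedo=0.8, source_intensity=1.0):
    # I(x) = rho * Is * cos(phi), eq. (2.2.1), with cos(phi) = n . s for
    # unit vectors; negative values are clipped: the patch is in shadow.
    n = normal / np.linalg.norm(normal)
    s = light_dir / np.linalg.norm(light_dir)
    return albedo * source_intensity * max(float(np.dot(n, s)), 0.0)

# A patch tilted 45 degrees away from an overhead light source.
print(lambertian_intensity(np.array([0.0, 1.0, 1.0]),
                           np.array([0.0, 0.0, 1.0])))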

Figure 2.5: A simple shape from shading process

Integration of SFS and stereo has many advantages for reconstructing both the shape and the reflectance properties of surfaces. Since 1990, many researchers have been working on the problem of integrating SFS and stereo.

2.3

Artificial Neural Networks

Artificial neural networks (ANN) are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. This is natural proof that some problems beyond the scope of current computers are indeed solvable by the human brain, owing to its vast network of computing elements.


This brain modeling also promises a less technical way to develop machine solutions. This new approach to computing also provides more graceful degradation during system overload than its more traditional counterparts. We may offer the following definition of a neural network. An artificial neural network is a massively parallel, distributed processor made up of simple processing units called neurons, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

1. The network acquires knowledge from its environment through a teaching/learning process.

2. Inter-neuron connection strengths, known as synaptic weights are used to store the acquired knowledge.

The procedure used to perform the teaching process is called a teaching algorithm or learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion to attain a desired design objective. A simple mathematical model of the basic processing unit of an artificial neural network is depicted in figure 2.6. Neural networks are also referred to in the literature as neurocomputers, connectionist networks, parallel-distributed processors, etc.


Figure 2.6: A simple processing unit (Neuron).

2.3.1

Characteristics of Neural Networks

Neural networks possess the following attractive properties and capabilities.

1. Nonlinearity: An artificial neural network can be linear or nonlinear. A neural network composed of an interconnection of nonlinear neurons is well suited to nonlinear modeling of various processes and systems.

2. Adaptive Learning: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. The paradigm of modification of synaptic weights is called learning. Learning can be either


supervised or unsupervised. Due to the inherent flexibility of the neural network architecture, the network can be designed to change its synaptic weights in real time, even when operating in a non-stationary environment.

3. Self-Organization: An artificial neural network can create its own organization or representation of the information it receives during learning. This property adds robustness to neural network designs operating under changing environments.

4. Real Time Operation: Artificial neural network computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.

2.3.2

Functional Approximation Capabilities of Neural Networks

The following theorem by Kolmogorov [70] is the basis for the approximation capabilities of neural networks for continuous functions.

Theorem 2.1: Any continuous real-valued function f(x_1, x_2, ..., x_n) defined on [0, 1]^n, n ≥ 2, can be represented in the form

$$ f(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{2n+1} g_j\!\left[\sum_{i=1}^{n} \phi_{ij}(x_i)\right] $$

where the g_j are properly chosen continuous functions of one variable, and the φ_ij are continuous, monotonically increasing functions independent of f.


In other words, the above theorem states that one can express a continuous multivariate function on a compact set in terms of sums and compositions of a finite number of single-variable functions. Cybenko [70] stated more significant features of the approximation capabilities of neural networks in the following theorem.

Theorem 2.2: Let φ be any continuous sigmoid-type function, e.g., φ(ξ) = 1/(1 + e^{−ξ}). Then, given any continuous real-valued function f on [0, 1]^n (or any other compact subset of ℜ^n) and ε > 0, there exist vectors w_1, w_2, ..., w_N, α and θ, and a parameterized function G(·, w, α, θ): [0, 1]^n → ℜ such that

$$ | G(x, w, \alpha, \theta) - f(x) | < \varepsilon \quad \text{for all } x \in [0, 1]^n, $$

where

$$ G(x, w, \alpha, \theta) = \sum_{j=1}^{N} \alpha_j\, \phi(w_j^T x + \theta_j) $$

and w_j ∈ ℜ^n, α_j, θ_j ∈ ℜ.
2.3.3

Feed-Forward Neural Networks

A two-layer feed-forward neural network (FFNN) with n input units, m output units and N units in the hidden layer is shown in figure 2.7. The output vector y is determined in terms of the input vector x by the formula

$$ y_i = \sum_{j=1}^{N} \left[ w_{ij}\, \sigma\!\left( \sum_{k=1}^{n} v_{jk}\, x_k + \theta_{vj} \right) \right] + \theta_{wi}, \qquad i = 1, \ldots, m \qquad (2.3.1) $$


Figure 2.7: Model of a feed-forward neural network.

where σ(·) is the activation function of the hidden-layer neurons. The input-to-hidden-layer interconnection weights are denoted by v_jk and the hidden-layer-to-output interconnection weights by w_ij. The bias weights are denoted by θ_vj and θ_wi. There are many classes of activation functions, e.g., sigmoid, hyperbolic tangent and Gaussian. The sigmoid activation function, used in our work, is given by

$$ \sigma(x) = \frac{1}{1 + e^{-x}} \qquad (2.3.2) $$


By collecting all the NN weights v_jk and w_ij into weight matrices V^T and W^T, we can write the ANN equation in vector form as

$$ y = W^T \sigma(V^T x) \qquad (2.3.3) $$

with the vector of activation functions defined by σ(z) = [σ(z_1) · · · σ(z_n)]^T for a vector z ∈ ℜ^n.
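A minimal sketch of the forward pass (2.3.1)-(2.3.3) follows; here the bias terms are kept explicit rather than absorbed into the weight matrices, and the layer sizes and random weights are purely illustrative.

import numpy as np

def sigmoid(x):
    # Activation function, eq. (2.3.2).
    return 1.0 / (1.0 + np.exp(-x))

def ffnn_forward(x, V, theta_v, W, theta_w):
    # Two-layer feed-forward network, eq. (2.3.1):
    # y_i = sum_j w_ij * sigma(sum_k v_jk x_k + theta_vj) + theta_wi
    hidden = sigmoid(V.T @ x + theta_v)  # N hidden-unit activations
    return W.T @ hidden + theta_w        # m network outputs

# Illustrative sizes: n = 3 inputs, N = 5 hidden units, m = 2 outputs.
rng = np.random.default_rng(0)
V, theta_v = rng.normal(size=(3, 5)), rng.normal(size=5)
W, theta_w = rng.normal(size=(5, 2)), rng.normal(size=2)
print(ffnn_forward(np.array([0.5, -1.0, 2.0]), V, theta_v, W, theta_w))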
2.3.4

Function Approximation Property

Let f(x) be a smooth function from ℜ^n to ℜ^m.
