Applicant’s name: Zhen, Yi

Proposed Topic/Title of Research: Binocular Stereo Vision Based Gesture Segmentation

Background: Gesture recognition and gesture segmentation

Hand gesture recognition has become an important part of human-computer interaction (HCI) and computer virtual reality. Applications of gesture recognition can be found in many fields, such as motion-sensing entertainment equipment [1], robots manipulated by human gestures [2], and gesture detection systems that help deaf-mute users communicate. At present, gesture recognition methods fall into two main categories: those based on “data gloves” and those based on vision. Compared with expensive and inconvenient data gloves, vision-based recognition merely uses one or more cameras [3] to realize HCI. Moreover, an artificial vision system can both recognize gestures and track hand motion. The most critical part of vision-based gesture recognition is gesture segmentation [4]. Gesture segmentation refers to locating and separating the region of interest (ROI), namely the user’s gestures or the region containing them, from an image or video. Only on the basis of correctly locating and tracking gestures can feature extraction and recognition be completed accurately.

Stereoscopic vision based gesture segmentation

Stereoscopic vision represents objects in three spatial dimensions: width, height and depth. Compared with two-dimensional vision, it adds the perception of depth and thus offers more complete image information. To avoid the loss of image information and the detrimental effect of viewing angle, stereoscopic-vision-based gesture segmentation has been put forward to replace appearance-based segmentation. Stereoscopic vision can restore gesture information (e.g. finger joint angles, palm position) that appearance-based segmentation methods neglect. In addition, stereoscopic vision chiefly studies how to recover the distance (depth) of objects in a scene from multiple images of it. Stereoscopic vision is therefore a natural fit for gesture segmentation, and it performs much better in gesture segmentation and recognition.

Brief Literature Review:

Because of its high computational complexity, stereo vision has been introduced into gesture segmentation only in the past ten years. With the rapid development of integrated circuits, hardware processing technology and computer vision, stereo vision can now be implemented in low-cost, real-time gesture recognition. Michael Van den Bergh and Luc Van Gool [5] applied a ToF camera and an ordinary RGB camera to gesture segmentation: the two cameras were used to obtain, respectively, depth information and color information for each pixel in the scene image, and the hand was then located by combining the two. Mohamad El-Jaber, Khaled Assaleh and Tamer Shanableh [6] developed a binocular stereo vision system with two Bumblebee XB3 cameras, used K-means clustering on the per-pixel depth feature to segment the user’s body, and then separated the gesture with a frame-difference method. Jamie Shotton, Andrew Fitzgibbon, Mat Cook and Toby Sharp [7] extracted depth features of neighborhood pixels via depth cameras based on PrimeSense light-coded depth imaging, adopted random forests to classify and recognize each body joint, and finally located the hand clearly and accurately. This method was also adopted by Kinect, Microsoft’s motion-sensing entertainment device for the Xbox 360.

Motivated by the research above, improvements to the accuracy and robustness of gesture segmentation via binocular stereo vision should be proposed and investigated to address the following problems:
i. the limited accuracy of depth computed by triangulation from pixel disparity;
ii. the heavy computation of stereo matching, which is easily affected by optical distortion and noise, specular reflection from smooth surfaces, perspective distortion, low or repetitive texture, overlap and discontinuity, etc.;
iii. the weak robustness of locating and detecting the ROI in a depth-information image.

Methodology: The gesture segmentation method via binocular stereo vision (BSV) consists of two processes: generating a depth image by BSV, and segmenting the gesture using the depth information of each pixel in that image.
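Problem (i) above follows directly from the triangulation geometry: depth is recovered as Z = f·B/d (focal length times baseline over disparity), so the depth step between adjacent integer disparities grows roughly as Z²/(f·B). A minimal sketch with hypothetical camera parameters (the 500 px focal length and 6 cm baseline are illustrative, not values from this proposal):

```python
# Depth from disparity by similar triangles: Z = f * B / d.
# f: focal length in pixels; B: baseline in metres; d: disparity in pixels.
def depth_from_disparity(f_px, baseline_m, d_px):
    return f_px * baseline_m / d_px

f, B = 500.0, 0.06  # hypothetical focal length and 6 cm baseline

# Depth quantisation: the gap between adjacent integer disparities
# grows roughly as Z^2 / (f * B), so accuracy degrades with distance.
step_near = depth_from_disparity(f, B, 60) - depth_from_disparity(f, B, 61)
step_far = depth_from_disparity(f, B, 10) - depth_from_disparity(f, B, 11)
assert step_far > step_near  # coarser depth resolution for distant objects
```

This is why sub-pixel disparity refinement (step 4 of the methodology below) matters for accuracy.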

i. The process of generating a depth image by BSV includes five steps:

Step 1. Binocular stereo calibration: computing the geometric relationship (rotation and translation parameters, RTP) between the spatial positions of the two cameras. Using a checkerboard, the RTP between the two cameras can be computed from the RTP obtained in each single-camera calibration. To allow for image noise, the mid-value is chosen as the initial RTP estimate; the projection error of the checkerboard corners is then minimized with the Levenberg-Marquardt iterative algorithm, and the refined RTP are returned [8-9].

Step 2. Binocular stereo rectification: re-projecting the two camera images so that they lie accurately in the same plane and the two projected images are aligned.

Step 3. Binocular stereo matching: the most crucial step in BSV, which determines the correspondence between points in the two views so that the disparity of each pixel can be computed in real time. Improving the stereo matching process will be specified in the continued research.

Step 4. Disparity refinement: refining the disparity image produced by stereo matching.

Step 5. Triangulation: computing the depth of each pixel from its disparity and the baseline length by the similar-triangles method.

In the proposed method, the stereo matching step will adopt shiftable windows [10] as the support window: for each pixel, nine different windows are evaluated, and the one with the lowest matching cost is selected as the support window. A box-filtering optimization strategy is then utilized to reduce direct match-cost computation.

ii. The process of gesture segmentation

In this research, gesture segmentation is achieved by using the depth information, extracting suitable features, and selecting a suitable classifier (a Boosting classifier) [11-14]. The extracted features should satisfy the following constraints: a) scale invariance; b) rotation and shift invariance; c) invariance to a certain degree of illumination change; d) invariance to a certain degree of perspective change. These four properties help reduce interference from noise, overlap and complex backgrounds when extracting image features, and thus improve gesture segmentation performance. Together with the depth information, the research chooses a proposed image depth feature fθ(I, x) [15]:

fθ(I, x) = dI(x + u/dI(x)) − dI(x + v/dI(x))

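A minimal sketch of the depth feature fθ(I, x) introduced above (assumptions: the depth image is a plain 2D list in metres, and probes that fall outside the image return a large constant so that they compare as distant background; the function and parameter names are hypothetical):

```python
# Sketch of the depth feature f_theta(I, x): the difference between the
# depths at two probe pixels offset from x by u and v.
def f_theta(depth, x, y, u, v, far=10.0):
    d0 = depth[y][x]  # depth at the reference pixel x = (x, y)
    def probe(ox, oy):
        # Offsets are divided by d0, which makes the feature depth invariant.
        qx, qy = x + int(round(ox / d0)), y + int(round(oy / d0))
        if 0 <= qy < len(depth) and 0 <= qx < len(depth[0]):
            return depth[qy][qx]
        return far  # out-of-image probes count as far background
    return probe(*u) - probe(*v)

# Hypothetical 3x3 depth patch (metres); d0 = 4.0 scales the offsets down.
patch = [[1.0, 1.0, 2.0],
         [1.0, 4.0, 2.0],
         [1.0, 1.0, 1.0]]
val = f_theta(patch, 1, 1, (4, 0), (0, 4))  # probes land at (2,1) and (1,2)
```

The same reference offsets probe nearer pixels for close hands and farther pixels for distant ones, which is exactly the depth invariance the normalization below provides.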
Notation: I denotes the image; x denotes a pixel in I; the parameter θ = (u, v) describes the offsets u and v. The factor 1/dI(x) normalizes u and v, ensuring that the feature is depth invariant.

Outcomes and Value:

The proposed method is intended to achieve a good trade-off between the speed and the effectiveness of stereo matching algorithms by improving existing ones. Furthermore, the improved algorithm should run fast enough, with good performance, for real-time gesture application scenes. Specifically, on the image dataset provided by Middlebury College, the error rate should stay below 10%, and a Pentium Dual-Core E5400 machine with 2 GB of memory should process more than 20 frames per second at a frame size of 320×240 with a search-area radius of 64 pixels. Based on the improved stereo matching algorithm, the research will implement a new gesture segmentation algorithm that combines both luminance and depth information; the expected accuracies of hand detection and hand location are both above 90%.

References:

[1] Heng-Tze Cheng, An Mei Chen, Ashu Razdan, Elliot Bulle, "Contactless Gesture Recognition System Using Proximity Sensors", in Proc. IEEE International Conference on Consumer Electronics (ICCE), pp. 149-150, 2011.
[2] Jagdish Lal Raheja, Radhey Shyam, Umesh Kumar, P. Bhanu Prasad, "Real-Time Robotic Hand Control Using Hand Gestures", in Proc. Second International Conference on Machine Learning and Computing (ICMLC), pp. 12-16, 2010.
[3] F. Quek, "Towards a Vision Based Hand Gesture Interface", in Proc. Virtual Reality Software and Technology (VRST), Singapore, pp. 17-31, 1994.
[4] Cao Xin-yan, Liu Hong-fei, Zou Ying-yong, "Gesture Segmentation Based on Monocular Vision Using Skin Color and Motion Cues", in Proc. International Conference on Image Analysis and Signal Processing (IASP), pp. 358-362, 2010.
[5] Michael Van den Bergh, Luc Van Gool, "Combining RGB and ToF Cameras for Real-time 3D Hand Gesture Interaction", in Proc. IEEE Workshop on Applications of Computer Vision (WACV), pp. 66-72, 2011.
[6] Mohamad El-Jaber, Khaled Assaleh, Tamer Shanableh, "Enhanced User-Dependent Recognition of Arabic Sign Language via Disparity Images", in Proc. 7th International Symposium on Mechatronics and its Applications (ISMA), pp. 1-4, 2010.
[7] Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake, "Real-Time Human Pose Recognition in Parts from Single Depth Images", in Proc. CVPR, 2011 (Best Paper).
[8] R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE Journal of Robotics and Automation, 3:323-344, 1987.
[9] W. Ouyang, F. Tombari, S. Mattoccia, L. Di Stefano, W. Cham, "Performance Evaluation of Full Search Equivalent Pattern Matching Algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2011.
[10] D. Scharstein, R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms", International Journal of Computer Vision, 47(1/2/3):7-42, 2002.
[11] P. Viola, M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", in Proc. Computer Vision and Pattern Recognition (CVPR), 2001.
[12] Y. Freund, R. Schapire, "A Short Introduction to Boosting", Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[13] J. H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine", February 1999.
[14] L. Breiman, "Random Forests", Machine Learning, 45(1):5-32, 2001. doi:10.1023/A:1010933404324
[15] Y. Chu, L. Li, D. B. Goldgof, Y. Qui, R. A. Clark, "Classification of Masses on Mammograms Using Support Vector Machine", Proc. SPIE, vol. 5032, pp. 940-948, 2003.
