Multi-instance Object Segmentation with Occlusion Handling
Yi-Ting Chen, Xiaokai Liu, Ming-Hsuan Yang
IEEE 2015 Conference on Computer Vision and Pattern Recognition
page: http://faculty.ucmerced.edu/mhyang/code
[Figure panel labels: (a) Occluding region; (d) Horse]
• For each superpixel covered by a categorized proposal, we record the corresponding category and score, and form a list $\{s_{sp}^{c_j}\}_{sp \in I}$, $c_j \in C$
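The bookkeeping described in this bullet could be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the dictionary layout, and the rule that a superpixel counts as "covered" when a proposal mask overlaps it in at least one pixel are all assumptions, and the scores are toy values.

```python
from collections import defaultdict


def record_superpixel_scores(superpixel_labels, proposals):
    """Collect the category and score of every proposal covering a superpixel.

    superpixel_labels: 2D list holding one superpixel id per pixel.
    proposals: list of dicts with a binary 2D 'mask', a 'category' and a 'score'
               (hypothetical layout chosen for this sketch).
    Returns {superpixel_id: {category: [scores, ...]}}.
    """
    records = defaultdict(lambda: defaultdict(list))
    for proposal in proposals:
        # Assumption: one overlapping pixel is enough to count as "covered".
        covered = set()
        for r, row in enumerate(proposal["mask"]):
            for c, inside in enumerate(row):
                if inside:
                    covered.add(superpixel_labels[r][c])
        for sp in covered:
            records[sp][proposal["category"]].append(proposal["score"])
    return {sp: dict(categories) for sp, categories in records.items()}


# Toy input: three superpixels (ids 0, 1, 2) and two categorized proposals.
labels = [[0, 0, 1],
          [2, 2, 1]]
proposals = [
    {"mask": [[1, 1, 0], [0, 0, 0]], "category": "horse", "score": 0.9},
    {"mask": [[0, 0, 1], [0, 0, 1]], "category": "person", "score": 0.7},
]
scores = record_superpixel_scores(labels, proposals)
```

Superpixel 2 is not covered by any proposal, so it has no entry in the result.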
[Figure: framework diagram. Recoverable labels: categorized object hypotheses; occluding regions; SDS CNN; feature extraction (Box CNN, Region CNN) on cropped images and foreground images; shape predictions; inferred masks (Mm); shape prior (Sn); class-specific likelihood map; Grabcut with occlusion handling; output]
• The n-th shape prior $S_n$ is defined as the weighted mean of the inferred masks in the cluster cls(n):
$$S_n = \frac{1}{|cls(n)|} \sum_{M_m \in cls(n)} s_{h_k}^{c_j} \cdot s_{h_k}^{CM} \cdot M_m$$
• $s_{h_k}^{c_j}$ is the classification score of the proposal $h_k$ for the class $c_j$
• $s_{h_k}^{CM}$ is the chamfer matching score between the contour of an exemplar and the contour of a proposal
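The weighted mean above can be illustrated with a short sketch. It assumes that each inferred mask is weighted by the product of its classification score and its chamfer matching score, and that the sum is divided by the cluster size, as the formula suggests. The function name, argument layout, and toy values are hypothetical.

```python
def shape_prior(masks, cls_scores, cm_scores):
    """Weighted mean of the inferred masks in one cluster.

    masks: list of equally sized 2D lists with values in [0, 1].
    cls_scores: classification score for each mask.
    cm_scores: chamfer matching score for each mask.
    """
    if not masks:
        raise ValueError("cluster is empty")
    height, width = len(masks[0]), len(masks[0][0])
    prior = [[0.0] * width for _ in range(height)]
    for mask, s_cls, s_cm in zip(masks, cls_scores, cm_scores):
        weight = s_cls * s_cm  # product of the two scores
        for r in range(height):
            for c in range(width):
                prior[r][c] += weight * mask[r][c]
    n = len(masks)  # cluster size, used as the normalizer
    return [[value / n for value in row] for row in prior]


# Toy cluster with two 2x2 masks; both weights evaluate to 0.5.
prior = shape_prior(
    masks=[[[1, 0], [1, 1]], [[1, 1], [0, 0]]],
    cls_scores=[0.5, 1.0],
    cm_scores=[1.0, 0.5],
)
```

Pixels supported by both masks receive the highest prior value, while pixels supported by only one mask receive less.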
Grabcut with Occlusion Handling
[Figure labels: feature vectors; class-specific classifiers]
[Table 1 column labels: dtable, dog, horse, mbike, person, plant, sheep, sofa, train, TV, avg]
Table 2: Results of the joint detection and segmentation task using the AP^r metric at different IoU thresholds on the PASCAL VOC 2012 segmentation validation set. The top two rows show AP^r results using all validation images. The bottom two rows show AP^r using only the images with occlusions between instances.
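The IoU thresholds in this caption refer to the overlap between a predicted instance mask and a ground-truth mask. A minimal sketch of that overlap test is shown below. The helper names are hypothetical, and treating a detection as correct when the IoU strictly exceeds the threshold is an assumption about the evaluation protocol.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks given as 2D lists."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 0.0


def is_correct(pred, truth, threshold=0.5):
    """Assumed rule: a detection counts when its mask IoU exceeds the threshold."""
    return mask_iou(pred, truth) > threshold


# Toy masks: the prediction misses one of the six ground-truth pixels.
pred = [[1, 1, 1], [1, 1, 0]]
truth = [[1, 1, 1], [1, 1, 1]]
iou = mask_iou(pred, truth)
hits = [is_correct(pred, truth, t) for t in (0.5, 0.6, 0.7, 0.8, 0.9)]
```

Here the IoU is 5/6, so the detection passes every threshold up to 0.8 and fails at 0.9, which shows why scores fall as the threshold rises.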
[Figure: exemplar-based shape prediction. Recoverable labels: SDS CNN object proposals and foreground masks; object hypotheses; corner detector; exemplar templates; exemplars; chamfer matching; matched points; mask M1; n-th shape prior Sn]
Overall Framework
[Figure label: input]
Table 1: Per-class results of the joint detection and segmentation task using the AP^r metric over 20 classes at 0.5 IoU on the PASCAL VOC 2012 segmentation validation set. All numbers are %.
Class    SDS   Ours
aero     58.8  63.6
bike      0.5   0.3
bird     60.1  61.5
boat     34.4  43.9
bottle   29.5  33.8
bus      60.6  67.3
car      40.0  46.9
cat      73.6  74.4
chair     6.5   8.6
cow      52.4  52.3
dtable   31.7  31.3
dog      62.0  63.5
horse    49.1  48.8
mbike    45.6  47.9
person   47.9  48.3
plant    22.6  26.3
sheep    43.5  40.1
sofa     26.9  33.5
train    66.2  66.7
TV       66.1  67.8
avg      43.8  46.3
(c) Segmentation w/ subtracting occluding region
Quantitative Results – Detection
(d) Our result
• Occlusion cannot be handled by bottom-up segmentation
• Exemplar-based shape prediction and occlusion regularization are introduced
Qualitative Results – Detection
• The classification scores of the segmentations in Figures (b) and (c) determine the energy assigned to the occluding region
Exemplar-based Shape Prediction
(c) MCG result
[Figure panel labels: (b) Segmentation w/o subtracting occluding region; (c) Person; (b) Superpixel; (a) Input; (b) Ground truth segmentation; (a) Input]
[Table 1 column labels: bird, boat, bottle, bus, car, cat]
Class-specific Likelihood Map
Motivation and Objective
• Grabcut initialization: segmentation proposals + thresholded shape priors
• Energy function: $E = E_{\text{appearance}} + E_{\text{occlusion}} + E_{\text{likelihood}} + E_{\text{smoothness}}$
§ $E_{\text{appearance}}$ models the foreground and background appearances using GMMs
§ $E_{\text{occlusion}}$ is the occlusion regularization
§ $E_{\text{likelihood}}$ is based on the class-specific likelihood map
§ $E_{\text{smoothness}}$ is the same as in Grabcut
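The initialization step can be pictured with the sketch below. The poster does not say how the segmentation proposal and the thresholded shape prior are combined or which threshold is used, so the pixel-wise union and the default of 0.5 are assumptions made only for illustration, and the function name is hypothetical.

```python
def init_foreground(proposal_mask, shape_prior, threshold=0.5):
    """Initial foreground estimate for a Grabcut-style segmentation.

    proposal_mask: binary 2D list from the segmentation proposal.
    shape_prior: 2D list of prior values in [0, 1].
    threshold: assumed cut-off applied to the shape prior.
    """
    return [
        # Assumption: a pixel starts as foreground if either source marks it.
        [1 if (p or s >= threshold) else 0 for p, s in zip(p_row, s_row)]
        for p_row, s_row in zip(proposal_mask, shape_prior)
    ]


# Toy example: the prior extends the proposal by one column.
proposal = [[1, 0, 0],
            [1, 0, 0]]
prior = [[0.9, 0.6, 0.1],
         [0.2, 0.7, 0.4]]
initial = init_foreground(proposal, prior)
```

Under these assumptions the shape prior can recover pixels that the bottom-up proposal missed before the energy is minimized.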
                       IoU Score
Method  # of images    0.5   0.6   0.7   0.8   0.9
SDS     1449          43.8  34.5  21.3   8.7   0.9
Ours    1449          46.3  38.2  27.0  13.5   2.6
SDS      309          27.2  19.6  12.5   5.7   1.0
Ours     309          38.4  28.0  19.0  10.1   2.1
Qualitative Results – Segmentation
Discussion
• When exemplar-based shape prediction is disabled, the detection performance drops from 46.3% to 39.3%; when occlusion regularization is disabled, it drops from 46.3% to 46%.
• A better estimate of object shape helps detection significantly.
References
1. P. Arbelaez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik, “Multiscale combinatorial grouping,” in CVPR, 2014.
2. B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik, “Simultaneous detection and segmentation,” in ECCV, 2014.
3. C. Rother, V. Kolmogorov, and A. Blake, “Grabcut: Interactive foreground extraction using iterated graph cuts,” in SIGGRAPH, 2004.