FAST SVM-BASED KIDNEY DETECTION AND SEGMENTATION IN 3D ULTRASOUND IMAGES
Roberto Ardon, Remi Cuingnet, Ketan Bacchuwar, Vincent Auvray
Medisys Research Lab, Philips Healthcare, Suresnes, France
ABSTRACT

We present a method to segment kidneys in 3D ultrasound images. The main challenges are the high variability in kidney appearance, the frequent presence of artifacts (shadows, speckle noise, etc.) and a strong constraint on computation time for clinical acceptance (less than 10 seconds). Our algorithm leverages a database of 480 3D images through a support vector machine (SVM)-based detection algorithm followed by a model-based deformation technique. Since severe pathologies induce strong deformations of kidneys, the proposed method encompasses intuitive interaction functions allowing the user to refine the result with a few clicks. Validation has been performed by learning on 120 cases and testing on 360: a perfect segmentation was reached automatically in 50% of the cases, and in 90% of the cases with at most 3 clicks.

Index Terms— Kidney, Detection, Segmentation, 3D Ultrasound, Support Vector Machine, Convolution, Template Deformation

1. INTRODUCTION

Ultrasound is the preferred imaging modality for kidney examinations. Its low cost, ease of access and lack of radiation are among the main reasons for this preference. Currently, 2D B-mode is the most used ultrasound modality for these examinations, which include measurements of the long and short axes of the kidney. The kidney volume can also be estimated, and has, for instance, strong diagnostic value in pediatrics and in transplantation situations [1, 2, 3]. Unfortunately, these measurements have low intra- and inter-practitioner repeatability due to differences in preferred probe positioning. Recently, 3D ultrasound has proven to provide increased precision and repeatability for distance and volume estimation [4], but additional software tools are often needed to apprehend the three dimensions. Such tools generally require a time investment that exceeds the duration of manual 2D measurements. This significantly impedes the systematic use of 3D ultrasound in such quantification tasks.
In this paper we present an automatic and fast solution to the problem of kidney segmentation in 3D US images. The main algorithmic challenges are the low image quality, the possible presence of artifacts (acoustic shadows), the variability in the volume appearance (in particular in kidney contrast and orientation) and the constraint on computation time (see Fig. 5). Our algorithm is composed of two steps: detection and segmentation. Detection is performed using an image convolution approach where the convolution kernel is learned, as detailed in Section 2. The outcome of this first phase provides a favorable initialization to the template deformation algorithm described in Section 3. While constrained by a learned shape, the segmentation phase maximizes the image gradient flux through the segmentation border. In Section 4 we present the results of an extensive validation on 360 images. Finally, Section 5 concludes this paper and provides material for future work.
2. KIDNEY DETECTION BY SVM-LEARNED KERNEL CONVOLUTION

Previously proposed techniques for kidney segmentation in 3D US [1, 2] ask the user to initialize the segmentation process (e.g., by providing the two extremities of the main axis). In contrast, we propose to automatically extract the bounding box of the kidney.

Principle. We learn from a ground-truth database a convolution kernel (W) characterizing the expected kidney appearance. At test time, it is rotated and scaled to produce a series of convolutions with the image. The kidney bounding box is obtained through a max operation over all convolution results.

Kernel design. The convolution kernel is learned from parallelepiped patches of different orientations and scales, either extracted close to the kidney, or from the background (Fig. 1). More precisely, we consider transformations of the form Ψ_{q,s,t} = R_q ∘ D_s + T_t mapping parallelepiped regions of the image into a reference domain Ω_r. Ψ_{q,s,t} incorporates a rotation R_q (parameterized by quaternion q), a scaling D_s (3D vector s) and a translation T_t (3D vector t). Taking the Laplacian of the image as the working feature, a patch is then defined as the restriction to Ω_r of the function

P_{q,s,t} = ∆I ∘ Ψ⁻¹_{q,s,t}.    (1)
Fig. 1. Illustration of our learning process: for each image b) of the database a), we generate positive kidney patches (green) and negative background patches (blue), as shown on column c). These are used as label inputs to the learning process that outputs the convolution kernel d) (top central slice, bottom vizualisation in space). Though illustrated on the center slice, the computations are performed on 3D volumes.
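For illustration, the patch-extraction step above can be sketched in a few lines of NumPy/SciPy. This is a minimal sketch, not the authors' implementation: the grid normalization, the linear interpolation and the convention Ψ⁻¹(x) = R_q D_s x + t (in voxel units) are all assumptions of this example.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, map_coordinates

def quat_to_matrix(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix R_q."""
    w, x, y, z = np.asarray(q, float) / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def extract_patch(lap, q, s, t, ref_shape=(21, 21, 21)):
    """Sample the Laplacian volume `lap` on the reference grid Omega_r,
    i.e. evaluate (Delta I) o Psi^{-1}, with the assumed convention
    Psi^{-1}(x) = R_q D_s x + t (voxel units)."""
    R = quat_to_matrix(q)
    axes = [np.linspace(-0.5, 0.5, n) for n in ref_shape]          # Omega_r
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)    # (..., 3)
    coords = grid @ (R * np.asarray(s, float)).T + np.asarray(t)   # R_q D_s x + t
    patch = map_coordinates(lap, coords.reshape(-1, 3).T,
                            order=1, mode="constant").reshape(ref_shape)
    return patch / (np.linalg.norm(patch) + 1e-12)                 # energy normalization

# toy usage on a synthetic volume
vol = np.random.default_rng(0).random((64, 64, 64))
lap = gaussian_laplace(vol, sigma=2.0)
p = extract_patch(lap, q=(1, 0, 0, 0), s=(20, 30, 20), t=(32, 32, 32))
```

Positive and negative training patches would then be generated by perturbing (q, s, t) around a ground-truth bounding box, as described below.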
Then, extending to a set {I_k}_{k∈[1···N]} of images, the space of patches is the linear set of functions defined on Ω_r: P = { P^k_{q,s,t} | k, q, s, t }.
Acting on P, we now consider the linear function

f(P_{q,s,t}) = ⟨P_{q,s,t}, W⟩ + b_0    (2)
             = ∫_{Ω_r} P_{q,s,t}(x)·W(x) dx + b_0,    (3)

where (W, b_0) are originally unknowns. Using a learning database where patches corresponding to kidney bounding boxes are labeled +1 and those corresponding to the background are labeled −1, we look for (W, b_0) such that f performs an optimal prediction on this database. We perform this optimization using a linear support vector machine because of its good generalization behavior.

Database. The learning database is generated from 3D ultrasound images where a clinician has manually provided a segmentation of the kidney. For each ground-truth image, we apply small (resp. large) random transformations to the patch delimited by the kidney's bounding box (as in eq. (1)) to generate patches labeled +1 (resp. −1). In practice, per image, we extract a fixed number (30) of positive and negative patches, which are normalized by their energy. The final kernel W is shown in Fig. 1. It matches intuition, as it contains regions corresponding to the expected white sinus, the dark parenchyma and the kidney-shaped boundary.

Rotations and scales. The kidney can present different orientations in the volume (depending on the position of the probe) and different scales (depending on the anatomy). We have leveraged our database to learn the expected orientations (Q) and scales (S) by k-means clustering (adapted to quaternions in the case of the orientations), yielding the set of possible transformations Ψ.

Test phase. Function f can be rewritten as

f(P_{q,s,t}) = ∫_{Ψ_{q,s,t}(Ω_r)} |J_Ψ| ∆I · W ∘ Ψ_{q,s,t} + b_0
             = |J_Ψ| (∆I ∗ (W ∘ Ψ_{q,s,0}))(t) + b_0,    (4)

where |J_Ψ| is the Jacobian of the transformation. This expression reflects the detection procedure at test time. From the finite set of parameters Q × S, we build a series of filters W ∘ Ψ_{q,s,0} and convolve ∆I with each of them. At different positions t, the sign of the convolution images informs on whether the box B = Ψ_{q,s,t}(Ω_r) qualifies as a kidney bounding box or not. Finally, among all well-classified patches, we choose the one that is the furthest away from the separating hyperplane (Fig. 2):

B* = Ψ*(Ω_r),  Ψ* = argmax_{q∈Q, s∈S, t∈Ω} f(∆I ∘ Ψ⁻¹_{q,s,t}).    (5)
It can also be interpreted as the highest response to the (scaled and rotated) convolution kernel shown in Fig. 1.
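In practice, the learning stage amounts to fitting a linear SVM on vectorized patches and reshaping the weight vector into the kernel W. A toy sketch, assuming scikit-learn and random stand-in data in place of the real patches (not the authors' code):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d = 60, 21**3                       # 30 positive + 30 negative patches, 21^3 voxels
X = rng.standard_normal((n, d)).astype(np.float32)
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
X[y > 0] += 0.05                       # crude class offset, for the toy example only

svm = LinearSVC(C=1.0).fit(X, y)       # linear SVM: f(P) = <P, W> + b0
W = svm.coef_.reshape(21, 21, 21)      # weight vector reshaped into the kernel W
b0 = float(svm.intercept_[0])
```

The key point is that the learned weight vector of a linear SVM lives in the same space as the patches, so it can be reinterpreted directly as a convolution kernel.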
Fig. 2. SVM Separation. The positive patch with the highest margin is selected.
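The orientation set Q mentioned above is obtained by clustering quaternions. The paper does not detail its adaptation of k-means; the following naive sketch is one plausible variant, in which the sign-alignment trick (q and −q encode the same rotation) is an assumption of this example:

```python
import numpy as np

def quat_kmeans(quats, k, iters=50, seed=0):
    """Naive k-means on unit quaternions. Since q and -q encode the same
    rotation, samples are sign-aligned to their centroid before averaging."""
    rng = np.random.default_rng(seed)
    Q = np.asarray(quats, float)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    C = Q[rng.choice(len(Q), size=k, replace=False)].copy()
    lab = np.zeros(len(Q), dtype=int)
    for _ in range(iters):
        lab = (1.0 - np.abs(Q @ C.T)).argmin(axis=1)   # distance insensitive to q ~ -q
        for j in range(k):
            pts = Q[lab == j]
            if len(pts) == 0:
                continue
            signs = np.sign(pts @ C[j])
            signs[signs == 0] = 1.0
            m = (signs[:, None] * pts).mean(axis=0)
            C[j] = m / np.linalg.norm(m)
    return C, lab

# toy usage: two tight orientation clusters
rng = np.random.default_rng(1)
qa = np.array([1.0, 0.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((10, 4))
qb = np.array([0.0, 1.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((10, 4))
centers, labels = quat_kmeans(np.vstack([qa, qb]), k=2)
```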
Numerical settings. Detection complexity is directly impacted by various discretization choices. The following settings proved to be a good compromise between speed and precision: input image sub-sampled at 3 mm per voxel, convolution kernel (W) of size 21³ voxels, 15 quaternions (Q) and 4 scales (S).
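At test time, the whole detection thus reduces to a filter-bank convolution followed by an arg-max. A simplified sketch with FFT-based convolution via SciPy, on toy data; the Jacobian factor |J_Ψ| is dropped here for brevity, and this is not the production implementation:

```python
import numpy as np
from scipy.signal import fftconvolve

def detect(lap, kernels, b0):
    """Evaluate the convolution response for every filter W o Psi_{q,s,0}
    and keep the arg-max over filters and positions t."""
    best = (-np.inf, None, None)
    for k, w in enumerate(kernels):                 # one filter per (q, s) in Q x S
        resp = fftconvolve(lap, w, mode="same") + b0
        t = np.unravel_index(np.argmax(resp), resp.shape)
        if resp[t] > best[0]:
            best = (resp[t], k, t)
    return best                                     # (score, filter index, position t)

# toy usage: one symmetric kernel, one bright cube centred at (20, 20, 20)
lap = np.zeros((40, 40, 40))
lap[18:23, 18:23, 18:23] = 1.0
score, k, t = detect(lap, [np.ones((5, 5, 5))], b0=0.0)
```

With 15 quaternions and 4 scales, this loop runs over 60 filters; sub-sampling the volume to 3 mm per voxel is what keeps the total cost within the clinical time budget.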
3. KIDNEY SEGMENTATION

Segmentation via template deformation. The detection performed in the previous section allows us to position an initial kidney model, attached to the kernel, at the proper position, scale and orientation (the reference domain Ω_r is chosen attached to this model). During the segmentation phase, it is deformed (Fig. 3) following the framework described in [5], also used in [6] to solve a similar problem. From the clinical application perspective, this framework has the advantage of offering a model-based segmentation tool (the kidney shape is kept) with the possibility to easily interact with the result in real time. Denoting by φ the distance function to the mean kidney model, the optimization searches for the transformation ψ minimizing

E_s(ψ) = − ∫_Ω H(φ ∘ ψ) ∆I + λ ∫_Ω ‖u ∗ K_σ‖²,  with ψ = (Id + u ∗ K_σ) ∘ G,    (6)

where H is the Heaviside function. G is a global transformation, which may correct or adjust the global pose and scale of the kidney model. u is a displacement field and K_σ is a Gaussian kernel that provides built-in smoothness. The first term of (6) is the data term and aims to maximize the image gradient flux (see [6]). As for the second term, it limits the local deformations so that the model stays close to a kidney shape.

Fig. 3. Illustration of the two steps of the algorithm: the red contour shows the detection result and the orange the final segmentation. Note the correction performed in the orthogonal view (bottom right).

User interactions. The implicit deformation framework [5] allows the user to interact by indicating points situated inside or outside the kidney. These points are introduced in the optimization process as constraints on the sign of the deformed function φ ∘ ψ. In practice, we use a multi-threaded environment where, as the optimization iterations are performed, visual feedback is provided to the user, allowing real-time interactions.

4. EXPERIMENTS AND RESULTS
Fig. 4. Displaying: mean (point), median (horizontal bar), standard deviation (blue box) and range (10th and 90th percentiles) of the Dice coefficient between the segmentation and the ground truth after a given number of clicks, over the test database of 360 volumes.
The validation of our method has been performed on a clinical dataset of 480 volumes acquired at 4 different clinical sites on iU22 ultrasound systems (Philips, The Netherlands) with different 3D probes (V6-2 and X6-1), spatial resolutions and fields of view. The patients are adults who may present diseased kidneys deviating from the generally expected appearance. As it is difficult to acquire images without artifacts while only three orthogonal slices are displayed during the acquisition, many volumes are corrupted by acoustic shadows. Depending on the probe positioning and settings, the kidneys present different appearances. The volumes are composed on average of 512 × 320 × 256 voxels. For each case, we have a ground-truth kidney segmentation. The results presented here were obtained after learning on a database of 120 cases and testing on 360. We compared our results to the ground truths using the Dice coefficient, defined as D(A, B) = 2|A∩B| / (|A|+|B|). Figure 4 shows the score for the different results as a function of the number of clicks. With a median Dice of 0.91, the automatic segmentation (obtained in less than 5 seconds on a personal computer) often provides a satisfactory segmentation. Note that the ground truth is very seldom exactly matched because of the high intra-operator variability. The following table displays the percentage of images satisfying a Dice condition with respect to the ground truth. We see that Dice scores above 0.75 are automatically obtained in about 80% of the cases, while excellent segmentations (Dice over 0.9) are reached automatically in more than half the cases, and in 3 clicks in 90% of the cases. Each click
Fig. 5. Results. The final segmentation is plotted in orange, the ground truth in light blue. a) Three volumes with correct segmentation (orthogonal views). Left: a favourable case (high contrast, no artifact); middle: acoustic shadow; right: low contrast, in particular with the liver. b) Illustration of the interactive correction. Left: two orthogonal slices; middle: the automatic segmentation, which is only partial; right: the correction in two clicks (first the blue point is imposed to be outside the segmentation, then the red point is imposed to be inside it). c) Three cases of failure of the automatic segmentation. Left: very low contrast; middle: major cysts; right: abnormally small kidney deprived of parenchyma (chronic kidney disease).
adds about 1 s to the computation time.

Dice     Auto   1 click   2 clicks   3 clicks
> 0.9    53%    72%       85%        90%
> 0.75   77%    88%       94%        97%
< 0.4    14%    6%        2%         1%
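The Dice coefficient used throughout this evaluation is straightforward to compute on binary voxel masks; for reference, a short NumPy sketch:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient D(A, B) = 2|A ∩ B| / (|A| + |B|) on boolean masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# toy usage: two overlapping 6^3-voxel cubes, whose intersection is 5^3 voxels
a = np.zeros((10, 10, 10), bool); a[2:8, 2:8, 2:8] = True
b = np.zeros((10, 10, 10), bool); b[3:9, 3:9, 3:9] = True
```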
Fig. 5 shows successful and failed segmentations in a variety of configurations. Once the segmentation is obtained, measurements such as long axis, short axis and volume, as well as automatic MPR captures, are immediately available in order to generate automatic reports.

5. CONCLUSION AND PERSPECTIVES

We proposed in this paper a novel and efficient approach to segment the kidney in 3D US images. This task is particularly challenging because of the high variability, noise, artifacts and partial occlusions of the kidney. Our method computes in 5 to 8 s the most relevant measurements a kidney examination should provide. A kernel-convolution-based kidney detector has been introduced, where a representative database has been used to learn the kernel. We showed that a satisfactory segmentation is often obtained automatically, and with minimal user interaction when manual intervention is needed. There is ground for improvement: better results could be obtained with more advanced features and/or more powerful classification techniques, such as structured support vector machines.

6. REFERENCES

[1] C.S. Mendoza, X. Kang, N.M. Safdar, E. Myers, A.D. Martin, E. Grisan, C.A. Peters, and M.G. Linguraru, "Automatic analysis of pediatric renal ultrasound using shape, anatomical and image acquisition priors," in MICCAI 2013, 2013, pp. 259–266.
[2] J.J. Cerrolaza, N.M. Safdar, C.A. Peters, E. Myers, J. Jago, and M.G. Linguraru, "Segmentation of kidney in 3D ultrasound images using Gabor-based appearance models," in ISBI, 2014, pp. 633–636.
[3] M. Martin-Fernandez and C. Alberola-Lopez, "An approach for contour detection of human kidneys from ultrasound images using Markov random fields and active contours," MedIA, vol. 9, no. 1, pp. 1–23, 2005.
[4] K. Bredahl, M. Taudorf, A. Long, L. Lönn, L. Rouet, R. Ardon, H. Sillesen, and J.P. Eiberg, "3D ultrasound improves the accuracy of diameter measurement of the residual sac in EVAR patients," European Journal of Vascular and Endovascular Surgery, vol. 46, no. 5, pp. 525–532, 2013.
[5] B. Mory, O. Somphone, R. Prevost, and R. Ardon, "Template deformation with user constraints for live 3D interactive surface extraction," in Proceedings of MICCAI '11 Workshop MeshMed, 2011.
[6] R. Prevost, B. Mory, J.M. Correas, L.D. Cohen, and R. Ardon, "Kidney detection and real-time segmentation in 3D contrast-enhanced ultrasound images," in ISBI 2012, 2012, pp. 1559–1562.