Mode Estimation using Pessimistic Scale Space Tracking

Lewis D Griffin¹ and Martin Lillholm²

¹ Radiological Sciences, 5th Floor Thomas Guy House, Guy's Campus, London SE1 9RT, UK
[email protected]

² IT University of Copenhagen, Glentevej 67-69, DK-2400 Copenhagen, Denmark
[email protected]

Abstract. Estimation of the mode of a distribution over ℝⁿ from discrete samples is introduced and three methods for its solution are developed and evaluated. The first solution is based on Fréchet's definition of central tendencies. We show that algorithms based on this approach have only limited success due to the non-differentiability of the Fréchet measures. The second solution is based on tracking maxima through a Scale Space built from the samples. We show that this is more accurate than the Fréchet approach, but that tracking to very fine scales is unwarranted and undesirable. For our third method we analyze the reliability of the information across scale using an exact bootstrap analysis. This leads to a modified version of the Scale Space approach where unreliable information is downgraded (pessimistically) so that tracking into such regions does not occur. This modification improves the accuracy of mode estimation. We conclude with demonstrations on high-dimensional real and synthetic data, which confirm the technique's accuracy and utility.

1 Introduction

For a distribution D over a domain ℝⁿ the mode, like the mean and median, is simple to define¹: it is the element of the domain that maximizes D(x); but unlike the mean and median, estimation of the mode from samples generated by the distribution is difficult. It is this estimation problem that concerns us here. In the remainder of the introduction we review previous approaches to this problem; then in sections 2-4 we describe and evaluate three (progressively better) approaches. In sections 5-6 we present results of our third technique, which we call Pessimistic Scale Space Tracking, when it is successfully applied to high-dimensional mode estimation problems. In section 7 we draw conclusions.

¹ For some non-generic distributions, this definition fails to define a unique mode, but this possible complication will be ignored in the remainder.

1.1 Previous Methods for Mode Estimation

Mode estimation is straightforward if one has a prior expectation of the form of the density: one simply finds the MAP estimate of the density and reads off the mode. A more flexible version of this is the Gaussian mixture method (Everitt & Hand, 1981), which models a distribution as a sum of Gaussians of varying center and width. Again, the model is fitted to make the observed data as likely as possible, and the mode of the Gaussian mixture can then be calculated by hill climbing. The power of this technique is in applications where the densities can be expected to be multi-modal; it is not a panacea for density estimation.

If no parametric form for the density is available then the naïve way to proceed is to bin the data, plot a histogram and read off its mode. Obvious problems with this are a dependency on the placement of the bins (shift-dependency) and a dependency on the bin width. The shift-dependency problem was solved by kernel methods (Parzen, 1962), in which the discrete data (modeled as a collection of delta functions) is convolved with a smoothing kernel (see figure 1). As with the filtering of images (Griffin, 1995), much discussion of the optimal kernel shape followed Parzen's work, but eventually the Gaussian was agreed on and this approach was popularized (Silverman, 1986).

Figure 1 – Shows at top-left a log-normal distribution (µ = σ = 1) and 64 random samples from the distribution. This density and the samples are used in sections 2-4. All three histograms (top) are of these samples, but they have different bin widths. The kernel density estimates (bottom) are created using Gaussian windows of scale equated with the corresponding histogram bin widths.

To solve the question of the optimal bin width, Parzen proposed that the optimal kernel should minimize the L₂ difference between the estimated and true densities. He further showed that this difference depends on (i) the kernel shape & width and (ii) the L₂-norm of the true density's 2nd derivative; but since we don't know the true density this fails to solve the problem. Numerous methods (Wand & Jones, 1995) have been proposed for finding a globally optimal kernel width, such as (i) using a standard form for the true density (e.g. Gaussian (Fukunaga, 1990)), (ii) 'plug-in methods' that estimate the density's 2nd derivative by assuming normality of its 4th derivative (and so on), and (iii) iterative plug-in (Scott, Tapia & Thompson, 1977). More recently, methods for estimating locally optimal kernel widths have been developed, including: (i) use of a 'pilot estimate' of the density obtained by filtering the data (raising the problem of what kernel width to use for that) (Hazelton, 1999), and (ii) iteratively estimating the density, thus the bandwidths, thus the density, etc. (Katkovnik & Shmulevich, 2002). The case of multivariate data is rarely tackled, an exception being (Gasser, Hall & Presnell, 1998); in order to use a pilot estimate technique, they make the strong assumption that the unknown density is the product of its marginal densities, i.e. they assume independence between the different dimensions of the space.

All the methods reviewed above either (i) are derived by asymptotic analysis, (ii) use heuristics, or (iii) make strong assumptions about the true density. Very often more than one of these criticisms applies.
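As a concrete illustration of the kernel approach just reviewed, the following sketch (ours, not from any of the cited works) forms a Gaussian kernel density estimate at a single fixed width; the `width` parameter plays the role of the histogram bin width:

```python
import numpy as np

def kde_gaussian(samples, xs, width):
    """Parzen density estimate with a Gaussian kernel of standard deviation `width`."""
    diffs = xs[:, None] - samples[None, :]
    bumps = np.exp(-diffs ** 2 / (2.0 * width ** 2)) / (width * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)  # average of one normalised bump per sample

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=1.0, sigma=1.0, size=64)  # as in figure 1
xs = np.linspace(0.01, 25.0, 500)
density = kde_gaussian(samples, xs, width=0.5)
print(xs[np.argmax(density)])  # mode of the estimate at this one fixed width
```

The point of the sections that follow is precisely that no single fixed `width` is satisfactory, which motivates working across the whole continuum of widths instead.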

2 Mode Estimation using the Fréchet Definition

Our first method of mode estimation is based on Fréchet's definition of central location (Fréchet, 1948, Griffin, 1997). Given a distribution D(x): ℝⁿ → ℝ, he defines a family of central locations µ_r, r ≥ 0, as the minimizers of

(|x|ʳ ∗ D)^(1/r)

(we will refer to this as Fréchet's measure). For r = 2 one obtains the mean, for r = 1 the median, and in the limit as r → 0 the mode. The definition is elegant in that it produces the mean, median and mode in a single framework that applies to a distribution of any dimension, but there is a concealed subtlety in the definition of the mode of which one should be aware. The subtlety is that in general there will be multiple local minimizers for 0 ≤ r < 1, and these minimizers change their locations continuously with r. To use the Fréchet definition operationally to define the mode, one must make the additional assumption or requirement that the entire family of central locations µ_r should be continuous with respect to r. Using this assumption, one proceeds by first locating the median µ₁, then tracking down from r = 1 to r = 0, following a continuous path of local minimizers of Fréchet's measure as one does so; eventually at r = 0 one arrives at the mode.

Fréchet's definition is normally used with explicitly given distributions, but we wondered if it has use as an estimator. Our best effort to this end is shown in figure 2, where results of estimating the mode of a log-normal distribution from samples are presented. The contour plot shows Fréchet's measure (logged and negated for clarity), on which is superimposed the path that we have tracked as r was reduced. Although the result is clearly close to the mode of the distribution from which the samples came (cf. figure 1), this is partly good fortune. If one looks closely, one notices that for r < 1 there is almost no lateral movement of the tracked minimizer, and because of this, at r near 0, the global minimizer of Fréchet's measure is not found.

The quantity contoured in figure 2 is the sample version of Fréchet's measure,

( (1/n) Σᵢ₌₁ⁿ |x − pᵢ|ʳ )^(1/r),

with r running from 2 (top) down to 0 (bottom) on the vertical axis.

Figure 2 – Shows an approach to mode estimation using Fréchet's definition of central tendency. The variable x ranges across the domain of the distribution and is the horizontal axis of all 3 plots. The pᵢ are the samples from the distribution; r is the parameter that indexes the family of central locations. See text (section 2) for more details.

The explanation for the failure of tracking is not difficult to discover and is illustrated by the panels in the center and bottom right of figure 2. These show the Fréchet measure for r = 0.37 which corresponds to the level marked by the horizontal ticks adjacent to the contour plot. In the expanded view of the measure one can clearly see its non-differentiable nature, which is why tracking has failed — gradient ascent algorithms find it difficult to stay on such singularly narrow ridges.
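To make the definition concrete, the sample Fréchet measure can be evaluated on a grid; this is an illustrative sketch of our own, not the tracking algorithm of figure 2. The grid minimizer recovers the mean at r = 2 and the median at r = 1, while for small r the measure becomes spiky and non-differentiable at every sample point, which is exactly what defeats gradient-based tracking:

```python
import numpy as np

def frechet_measure(x, samples, r):
    # ((1/n) * sum_i |x - p_i|^r)^(1/r): minimised by the mean at r = 2
    # and the median at r = 1; tends towards the mode as r -> 0.
    return np.mean(np.abs(x - samples) ** r) ** (1.0 / r)

rng = np.random.default_rng(1)
samples = rng.lognormal(1.0, 1.0, size=64)
xs = np.linspace(samples.min(), samples.max(), 5001)

for r in (2.0, 1.0, 0.2):
    values = [frechet_measure(x, samples, r) for x in xs]
    print(r, xs[np.argmin(values)])  # grid minimizer of the measure
```

At r = 0.2 the grid minimizer sits at one of the sample points; a gradient method started nearby would have to negotiate a cusp at every sample, which is the failure mode described above.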

3 Mode Estimation using Scale Space Tracking (SST)

Our second method is based on the more familiar kernel density approach to mode estimation (using Gaussians as our kernels), but rather than searching for a globally or locally optimal kernel width we follow previous authors (Marchette & Wegman, 1997, Minnotte, Marchette & Wegman, 1998, Minnotte & Scott, 1993, Silverman, 1981) in considering the full continuum of widths (or scales). These authors have noted that one may track modes as they vary continuously in position and annihilate with anti-modes as scale is increased from fine to coarse. Indeed, much of early Scale Space work was duplicated (or possibly preceded) in the field of non-parametric density estimation, including the observation that in 1-D, modes are never created as scale increases (Silverman, 1981). While these authors have concentrated on using the pattern of modes over scale (the 'mode tree') to understand and visualize multimodality, one may also use it as a numerical method of mode estimation. In 1-D the idea is as follows: at sufficiently coarse scale there is exactly one mode; since modes are never created with increasing scale, this mode may be tracked through decreasing scale to one of the original sample points; and this sample point may be taken as an estimate of the mode of the true distribution². An example of this is shown in figure 3.
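Silverman's observation can be checked numerically: counting the local maxima of the kernel density estimate at a sequence of decreasing scales never shows a decrease in the number of modes. A small check of our own (the grid resolution is illustrative):

```python
import numpy as np

def n_modes(samples, s, grid):
    """Count interior local maxima of the Gaussian-kernel density at scale s
    (kernel exp(-(x - p)^2 / 4s), matching the formalism of this section)."""
    d = np.exp(-(grid[:, None] - samples[None, :]) ** 2 / (4.0 * s)).sum(axis=1)
    mid = d[1:-1]
    return int(np.sum((mid > d[:-2]) & (mid > d[2:])))

rng = np.random.default_rng(0)
samples = rng.lognormal(1.0, 1.0, size=64)
grid = np.linspace(samples.min() - 1.0, samples.max() + 1.0, 4000)

counts = [n_modes(samples, s, grid) for s in (1000.0, 1.0, 0.1, 0.01)]
print(counts)  # modes at successively finer scales; the count never decreases
```

At the coarsest scale there is a single mode, and each halving-like reduction of scale can only split modes, never merge them, which is what licenses the tracking strategy.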

Figure 3 – Shows an approach to mode estimation using Scale Space Tracking (SST). The horizontal axis is the domain of the density from which the sample points (same as figure 1) along the bottom have been generated. The vertical axis is log scale. The jagged polyline is the path of the mode tracking, and its lower endpoint is the estimate of the mode that results.

² We are unaware of any proof that the equivalent result holds in two or more dimensions, i.e. that the final mode that exists at coarse scale can be traced back to a sample point at zero scale. This does not have any impact on our algorithms though.

At this point we will introduce formalism for the general case of SST in a D-dimensional space. Given n sample points p₁, …, p_n in D-dimensional space, define the number of sample points in a Gaussian bin, centred at x and of scale s, to be

w(x, s) = Σᵢ₌₁ⁿ exp(−|pᵢ − x|² / 4s).

The volume of such a bin is v(s) = (4πs)^(D/2), so the kernel-estimated density is d(x, s) = w(x, s) / v(s). Figure 3 shows the logarithm (for improved visualization) of such a density created from the 64 sample points introduced in figure 1. To use this density for mode estimation, we start tracking at

x* = ( µ₁{p₁₁, …, p_n₁}, …, µ₁{p₁D, …, p_nD} )ᵀ,

which is calculated by taking, separately for each dimension, the median of the sample point values; and at the scale

s* = ( µ₂[ iqr{p₁₁, …, p_n₁}, …, iqr{p₁D, …, p_nD} ] )²,

which is the square of the mean (over dimensions) of the inter-quartile ranges of the sample points. We use these robust measures to produce a starting point, rather than the more obvious measures based on the sample mean and variance, because we often deal with highly kurtosed data (e.g. in section 6) for which the mean and variance are very unstable measures.

To track from our starting point we could simply use a gradient ascent method to move through x and s, but this is too uncontrolled. Consider figure 3. The starting point is at the top of the figure. If we gradient ascend the density function shown in the figure, we will immediately rush towards finer scales where higher kernel-estimated densities may be found. This charge to finer scales happens so quickly that the algorithm fails to track the path of the mode that exists at all scales. To prevent this it is not sufficient simply to remap (for example, logarithmically) the scale axis. Rather, we must carefully control the rate of reduction of s, as is done in graduated non-convexity methods (Blake & Zisserman, 1987). We use a reduction rate γ = 0.5. First we track from s* to γs*, then from γs* to γ²s*, and so on. This is visible in figure 3, in which the intervals are of equal size because the vertical axis is log scale. In each interval we reparameterize scale logarithmically; for example, in the first interval we gradient ascend on the parameters x and t₁, where s = γs* + exp(t₁) and t₁ is initialized to t₁ = ln(s*) + ln(1 − γ). We terminate the scheme when s is a small fraction of its starting value.

Three details of our implementation are worth mentioning. First, we gradient ascend the log-density rather than the density. This is forced on us when we are dealing with high-dimensional problems (e.g. D = 64 in section 6) because the unlogged bin volume v(s) = (4πs)^(D/2) is so large that it causes numerical overflow. The second detail is that we use Polak-Ribière conjugate-gradient ascent (Press, Teukolsky, Vetterling & Flannery, 2002). This requires explicit gradient calculations, but these are fast, simple and stable when using Gaussian kernels. The final detail is that when calculating the density and its derivatives we ignore points that are more than 8√(2s) from x, as the Gaussian bin has negligible weight at this distance. A consequence of this is that the ascent runs faster and faster as s reduces, so, in terms of computation time, it matters little if we track to unnecessarily fine scales.

We have evaluated the performance of this scheme when estimating the mode of the log-normal distribution in figure 1. For each sample size in the range n = 2², …, 2¹⁰ we generated that number of samples and used scale-space mode tracking to estimate the mode. This was repeated 10⁴ times for each sample size to get an accurate figure for the root-mean-squared estimation error. For comparison we performed an identical experiment using the Fréchet scheme of the previous section. The results are shown in figure 4, where it can be seen that Fréchet is superior up to sample sizes of 90, but SST is superior for larger samples. We hypothesize that this is because for small sample sizes the Scale Space method is biased towards the mean, while the Fréchet method is biased towards the median.
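The whole scheme can be sketched compactly. The version below is a simplification of our own: it uses the robust starting point and the γ = 0.5 scale-reduction schedule described above, but replaces the Polak-Ribière conjugate-gradient ascent with fixed-point (mean-shift) iteration, which also ascends the kernel density at each fixed scale; the function name and iteration counts are illustrative:

```python
import numpy as np

def track_mode(points, gamma=0.5, n_levels=12, iters=50):
    """Simplified scale-space mode tracking for (n, D)-shaped samples."""
    # Robust starting point: per-dimension median, and s* = (mean IQR)^2.
    x = np.median(points, axis=0)
    iqr = np.percentile(points, 75, axis=0) - np.percentile(points, 25, axis=0)
    s = float(np.mean(iqr)) ** 2
    for _ in range(n_levels):          # graduated reduction: s*, gamma*s*, ...
        for _ in range(iters):         # ascend the density at this fixed scale
            w = np.exp(-np.sum((points - x) ** 2, axis=1) / (4.0 * s))
            x = (w[:, None] * points).sum(axis=0) / w.sum()
        s *= gamma
    return x

rng = np.random.default_rng(0)
pts = rng.lognormal(1.0, 1.0, size=(1024, 1))
print(track_mode(pts))  # true mode of this log-normal is exp(mu - sigma^2) = 1
```

By the time the scale has shrunk to γ¹² of its starting value, the tracked position has settled into the dense region near the true mode, rather than charging straight to the nearest fine-scale spike.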

Figure 4 – Compares the performance of the two approaches to mode estimation presented so far: RMS error of the mode estimate (vertical axis) against sample size (horizontal axis, 4 to 1024), for ordinary scale-space tracking and for the Fréchet method.

4 Mode Estimation using Pessimistic Scale Space Tracking (PSST)

The Scale Space tracking of section 3 has a flaw that became very apparent when we used it on high-dimensional problems. Inevitably the mode estimate that the method returns will be one of the original samples, and for a very high-dimensional problem it is tremendously unlikely that any one of the samples will correspond to the mode of the true distribution. This seems to limit performance. To fix this it seems desirable to prevent the tracking reaching very fine scales, but how is one to define 'very fine'? We have already noted the profusion of literature on selecting the optimal kernel width in the Parzen density estimation approach. Figure 5 gives a clue to an alternative approach.

Figure 5 – Histograms of the 64 samples introduced in figure 1 with different bin widths.

The most notable aspect of the histograms of figure 5 is the increase in the number of modes with smaller bin width, but there is another interesting aspect: the bin heights become noticeably quantized for small bin widths, because the bins contain so few samples. Having very few samples in a bin means that its height is an unreliable estimate of the true density. We can quantify this unreliability through a thought experiment. Suppose that we repeatedly run a bootstrap process of sampling-with-replacement from our data to make synthetic data (Efron & Tibshirani, 1993). Say we have a hard-edged bin containing m samples out of a total of n. If we bootstrap the data, the number of samples that we will see in the bin will be binomially distributed, with mean m and standard deviation √(m(n − m)/n). Our idea is to apply this same analysis to

fuzzy Gaussian bins/kernels rather than hard-edged bins. We continue the formalism introduced in the previous section, but now consider a fixed Gaussian kernel centred at x and of scale s. On the original data, this bin 'sees' w(x, s) samples. Now consider a single point q picked (during bootstrap) uniformly from the pᵢ. This random process induces a random bin-count process w₁. The mean and variance of this process will be:

µ[w₁(x, s)] = (1/n) Σᵢ₌₁ⁿ exp(−|pᵢ − x|²/4s) = (1/n) w(x, s)

V[w₁(x, s)] = (1/n) Σᵢ₌₁ⁿ ( exp(−|pᵢ − x|²/4s) )² − µ[w₁(x, s)]² = (1/n) w(x, s/2) − (1/n²) w(x, s)²

Now consider n points q₁, …, q_n generated by a full bootstrap process from the pᵢ. Because each qᵢ is independent and bootstrap is sampling with replacement, this will induce a random bin-count process w_n with mean and variance:

µ[w_n(x, s)] = n µ[w₁(x, s)] = w(x, s)

V[w_n(x, s)] = n V[w₁(x, s)] = w(x, s/2) − (1/n) w(x, s)²

Note how the variance formula is a function of the bin weights at scales s and s/2. We illustrate this in figure 6, where the black kernel is of scale s and the grey kernel of scale s/2. This makes clear that the bootstrap bin-count process for Gaussian bins, by depending on two measurements of the data, is more geometrically sensitive than it is for hard-edged histogram bins, which depends on only one measurement. We also note that it is the nice property of the Gaussian that when squared it gives another Gaussian which leads to this simple formula for the variance.
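The variance formula can be verified by brute force: Monte-Carlo bootstrap replications of the fuzzy bin count should have variance close to w(x, s/2) − (1/n) w(x, s)². A check of our own (the values of x and s are arbitrary):

```python
import numpy as np

def w(x, s, points):
    """Fuzzy Gaussian bin count w(x, s) = sum_i exp(-|p_i - x|^2 / 4s)."""
    return np.exp(-(points - x) ** 2 / (4.0 * s)).sum()

rng = np.random.default_rng(0)
points = rng.lognormal(1.0, 1.0, size=64)
x, s, n = 2.0, 0.5, 64

# Monte-Carlo bootstrap of the fuzzy bin count w_n(x, s).
boot = np.array([w(x, s, points[rng.integers(0, n, n)]) for _ in range(20000)])

analytic = w(x, s / 2.0, points) - w(x, s, points) ** 2 / n
print(boot.var(), analytic)  # the two variances should agree closely
```

Note how the second bin count is taken at scale s/2, exactly the grey-kernel measurement of figure 6.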

mean = w,  sd = √( w* − (1/n) w² )

where w = number of samples seen by the black window (scale s), w* = number seen by the grey window (scale s/2), and n = total number of samples.

Figure 6 – Illustrates the bin count of the black aperture during a bootstrap process.

At this point in the development of our method we take a step that, though plausible, has no rigorous justification. To prevent tracking into bins that have high density but small numbers of samples, we reduce the number seen by a bin by some number of standard deviations, where the standard deviation comes from the preceding bootstrap analysis. The 'logic' here is that if we reduce by (say) 1.645 sds we get approximately the bin count that would be exceeded on 95% of bootstrap replications. The 'approximately' arises because we make the assumption that the bin counts would be normally distributed across bootstrap replications. Figure 7 shows how the Scale Space changes with 2.23 sds of downgrading. We call such a downgraded Scale Space a 'Pessimistic Scale Space', as it is in a sense sceptical about the number of samples in a bin.
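The 1.645-sd figure comes from the normal approximation: if bootstrap bin counts were exactly normal, the value 1.645 standard deviations below the mean would be exceeded in about 95% of replications. A quick check of that quantile, with purely illustrative numbers of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend bootstrap bin counts are normal with mean 30 and sd 5.
counts = rng.normal(loc=30.0, scale=5.0, size=100000)
threshold = 30.0 - 1.645 * 5.0
print((counts > threshold).mean())  # close to 0.95
```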

Figure 7 – Shows the standard Scale Space of the samples at the left (same as figure 3) and the 'Pessimistic Scale Space' on the right, where 2.23 sds of pessimism have been used. The jagged polylines show tracking in these Scale Spaces, and the dark discs mark the resulting mode estimates.

We are able to track in Pessimistic Scale Space in a very similar way as in standard Scale Space. The following changes should be noted:

(1) rather than d(x, s) = w(x, s) v(s)⁻¹, we use

d(x, s) = ⌈ w(x, s) − k √( w(x, s/2) − (1/n) w(x, s)² ) ⌉ v(s)⁻¹,

where k is the number of sds of pessimism, and the operator ⌈·⌉ returns its argument unchanged if it is positive and 0 otherwise;

(2) similarly to before, we actually ascend ln(1 + d) rather than d;

(3) explicitly computed gradients are still used, but of ln(1 + d) rather than ln(d); the computation of these is more complicated, but not excessively so;

(4) we stop the process when tracking is no longer decreasing the scale, i.e. when we reach a maximum relative to x and s.

We have evaluated PSST using the methodology we used to compare the Fréchet and standard SST approaches. The results are shown in figure 8. Because we are unclear about how many sds of pessimism (k in the equation for d above) to use, we have tried k = 1, 2, 3 and 4. What the figure shows is that the optimal k increases with the size of the sample. For samples of n ≤ 12, k = 1 is best; for 12 < n ≤ 50, k = 2 is best; for 50 < n ≤ 800, k = 3; and for n > 800, k = 4. Using this result and others not shown, we have fitted a function to give the optimal k for n samples: k_opt = −0.083 + 0.557 ln n. Figure 8 shows the accuracy of mode estimation using this scheme, and it can be seen that it is certainly always better than ordinary Scale Space or Fréchet tracking, and generally better than any of the pessimistic schemes with fixed k. We will return to this point in the conclusion, but we note now that the fitting of our k_opt formula has been done for a single 1-D distribution, so it may not be general. Nevertheless it is the method that we will use for the high-dimensional mode estimation problems presented in sections 5 and 6. We also note that the value k = 2.23 used to generate figure 7 is the optimal value (according to our formula) for 64 samples.
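Putting the pieces together, the pessimistically downgraded density and the fitted k_opt rule look as follows; this is a sketch of ours, with the clamp ⌈·⌉ implemented as max(·, 0):

```python
import math
import numpy as np

def w(x, s, points):
    return np.exp(-np.sum((points - x) ** 2, axis=1) / (4.0 * s)).sum()

def pessimistic_density(x, s, points, k):
    """d(x,s) = clamp(w(x,s) - k*sqrt(w(x,s/2) - w(x,s)^2/n), 0) / v(s)."""
    n, D = points.shape
    wv = w(x, s, points)
    var = max(w(x, s / 2.0, points) - wv ** 2 / n, 0.0)
    downgraded = max(wv - k * math.sqrt(var), 0.0)
    return downgraded / (4.0 * math.pi * s) ** (D / 2.0)

def k_opt(n):
    # Empirical fit from the paper's 1-D experiments.
    return -0.083 + 0.557 * math.log(n)

print(round(k_opt(64), 2), round(k_opt(1024), 2))  # 2.23 3.78
```

The pessimistic density is never larger than the ordinary one, which is what stops the tracker descending into sparsely supported fine-scale spikes.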

Figure 8 – Compares all three mode-estimation techniques presented: RMS error of the mode estimate against sample size (4 to 1024). The coloured curves are from PSST with a fixed number of sds of pessimism (1.0, 2.0, 3.0 and 4.0). The black curve, which is approximately their envelope, is from the scheme where the number of sds depends on the sample size (# sds = −0.083 + 0.557 ln N); curves for ordinary scale-space and Fréchet tracking are shown for comparison.

We conclude this section with figure 9, which is designed to give the reader a visual handle on the accuracy of mode estimation by PSST. We have generated 10⁴ sets of 1024 samples from the log-normal distribution that we use throughout. The figure shows a histogram of one such set of samples. For each set we have calculated estimates of the mean and median in the usual way, and of the mode using pessimistic tracking with 3.78 sds of pessimism, which is the value that comes from our k_opt formula. We have found the 95% confidence limits of these estimates and displayed them in figure 9, along with the true values of the mean, median and mode.

Figure 9 – A typical histogram of 1024 samples from the distribution shown by the black curve. Below this are 95% confidence intervals for mean estimation (right), median (middle) and mode (left) based on this number of samples i.e. not for this particular set of samples but for sets of samples of this size. The central ticks of each interval mark the true values. The mode estimation method used is pessimistic tracking with 3.78 sds of pessimism.

5 Results on Handwriting Data

Our first example of high-dimensional mode estimation uses handwriting data (Alimoglu & Alpaydin, 1996). The data consists of poly-line representations of the digits 0-9, captured using a stylus writing within a box on a pressure-sensitive tablet. For each digit there are 1100 instances written by 44 subjects (i.e. 25 per subject). The data was captured at high temporal resolution but has been sub-sampled down to 8 points equally spaced along the stylus tip trajectory, so each sample is a point in ℝ¹⁶. Estimating the mode by building hard-binned histograms is out of the question for such data: even if we coarsely quantized each dimension into 8 ranges, the histogram would have 8¹⁶ = 2⁴⁸ bins. Even if one adopted some efficient coding scheme to avoid memory problems, this approach could not succeed, as the data is so sparse: consider that 1.54913¹⁶ ≈ 1100. We have used the PSST method described in section 4 to estimate modes for each of the digits. For this we used k = 3.82 sds of pessimism, as suggested by our fitted formula k_opt = −0.083 + 0.557 ln n. Each mode estimation computation took less than 1 sec on a 1.1 GHz PC. We show results in figure 10, along with a depiction of the raw data. We also show for comparison the mean digits (computed by straightforward averaging) and the dimension-wise-modal digits. The dimension-wise-modal digits are computed by calculating the mode, using PSST, of each of the 16 dimensions separately.
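The combinatorial claims in the paragraph above are easy to check directly:

```python
# 16 dimensions quantised into 8 ranges gives 8^16 = 2^48 bins; and at
# roughly 1.55 quantisation levels per axis there would already be one
# bin per sample: 1.54913^16 is about 1100, the number of instances per digit.
print(8 ** 16 == 2 ** 48)       # True
print(round(1.54913 ** 16))     # 1100
```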

Several points are notable about these results:
1. Several of the mean digits (particularly 0, 5 & 8) are smaller than the raw digits; this is because of the sensitivity of the mean to outliers.
2. Most of the dimension-wise-modal digits have five or more of their eight points on the border of the square; this is because each digit has been translated and scaled so that it has two of its points on the border.
3. By informal inspection, for all ten digits, the modal digit is either the best example of the digit or equal best with the mean or dimension-wise-modal digit.
While we have no way of knowing what the true modes of these data are, our estimates are certainly plausible.

Figure 10 – On the top row the 1100 poly-lines for each digit have been combined. The 2nd row shows the mean digits, the 3rd row the dimension-wise-modal digits and the 4th row shows the estimated modal digits.

6 Results on Image Profiles

Our final example involves finding the mode of a distribution over a space of 1-D functions. We represent these functions with 64 samples, so this is a problem of finding the mode of a distribution over ℝ⁶⁴. We are interested in this problem as it is part of a program of research aimed at discovering feature classes by investigating natural image statistics (Griffin, Lillholm & Nielsen, 2002, Tagliati & Griffin, 2001). We will consider 1-D profiles extracted at random position and orientation from 3 classes of images: Gaussian noise, Brownian noise (Pedersen, submitted) and natural images (van Hateren & van der Schaaf, 1998). Examples are shown in figure 11. The figure also shows how we process these samples: we measure them with 0th and 1st order Gaussians (σ = 7) and then affinely scale them so that they measure 0 and 1 with these two filters, i.e. we bring them into the same metamery class (Koenderink, 1993) for these two filters. Although our primary interest is in the natural images, we also use the noise images since we can prove what the mode is in these two cases (Griffin et al., 2002): for Gaussian images the mode is a Gaussian first derivative, and for Brownian images the mode is an error function.

Figure 11 – Shows at top examples from the three classes of image used in the experiment of section 6 (natural images, Gaussian noise images and Brownian noise images). The panel at bottom left shows how profiles are extracted from these images; at right are the 0th & 1st order 1-D filters that the profiles are measured with, and 10 typical profiles as they look after scaling to bring them into the same metamery class for these filters.

We collect 2.4×10⁶ of each class of scaled profiles (each of which is represented as a 64-dimensional vector) and use PSST to estimate the mode of each class. Each mode estimation takes approximately 40 min on a 1.1 GHz PC. We repeat this four times (with new profiles each time) for each class of images so that we can calculate confidence intervals on our estimated modes. The results are shown in figure 12. The figure shows that, within the limits of the confidence intervals, the estimated modes of the two classes of noise image are correct, i.e. the mode of the Gaussian image profiles is a Gaussian 1st derivative and the mode of the Brownian image profiles is an error function. The mode of the natural image profiles is close to being a step edge; the correctness of the mode estimates for the noise images makes us confident of the correctness of this result.

Figure 12 – Estimated modes (solid with error bars) for the Gaussian noise, Brownian noise and natural images, each based on 2.4×10⁶ samples. Four independent estimates of each mode were calculated to derive the confidence intervals. Also shown in each plot is (i) the true mode of the Gaussian noise images, a Gaussian 1st derivative of the same scale as the filters that define the metamery class, (ii) the true mode of the Brownian noise images, the integral of a Gaussian of the same scale as the filters that define the metamery class, and (iii) a centred step edge within the metamery class; the mode of natural image profiles is similar to this curve.

7 Conclusions

We have presented three methods of mode estimation, but recommend the third, Pessimistic Scale Space Tracking, as the most effective. We have quantified its performance on a test 1-D log-normal distribution, which will allow ready comparison with any methods proposed in the future. We have also demonstrated, with the Gaussian and Brownian noise image profiles in section 6, that it recovers even high-dimensional modes correctly. We are aware of two weak points in the derivation of our algorithm. The first is that there should be a better explanation of why pessimistic discounting should work at all. The second concerns our fitted equation that provides the optimum number of sds of pessimism to use as a function of the sample size. This equation is unjustified, being based only on experiments with 1-D distributions and sample sizes up to 2¹⁰. However, the fact that the method is successful on 64-D distributions with very large sample sizes (2.4×10⁶) suggests that it is not too bad a guess.

References

Alimoglu, F., & Alpaydin, E. (1996). Methods of combining multiple classifiers based on different representations for pen-based handwriting recognition. 5th Turkish AI & ANN Symposium, Istanbul.
Blake, A., & Zisserman, A. (1987). Visual Reconstruction. MIT Press.
Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York, London: Chapman and Hall.
Everitt, B.S., & Hand, D.J. (1981). Finite Mixture Distributions. London: Chapman and Hall.
Fréchet, M. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'Institut Henri Poincaré, X, 215-308.
Fukunaga, K. (1990). Statistical Pattern Recognition. New York: Academic Press.
Gasser, T., Hall, P., & Presnell, B. (1998). Nonparametric estimation of the mode of a distribution of random curves. Journal of the Royal Statistical Society Series B, 60, 681-691.
Griffin, L.D. (1995). Descriptions of Image Structure. PhD thesis, University of London.
Griffin, L.D. (1997). Scale-imprecision space. Image and Vision Computing, 15(5), 369-398.
Griffin, L.D., Lillholm, M., & Nielsen, M. (2002). Natural image profiles are most likely to be step edges. Vision Research, submitted.
Hazelton, M.L. (1999). An optimal local bandwidth selector for kernel density estimation. Journal of Statistical Planning and Inference, 77(1), 37-50.
Katkovnik, V., & Shmulevich, I. (2002). Kernel density estimation with adaptive varying window size. Pattern Recognition Letters, 23(14), 1641-1648.
Koenderink, J.J. (1993). What is a feature? Journal of Intelligent Systems, 3(1), 49-82.
Marchette, D.J., & Wegman, E.J. (1997). The filtered mode tree. Journal of Computational and Graphical Statistics, 6(2), 143-159.
Minnotte, M.C., Marchette, D.J., & Wegman, E.J. (1998). The bumpy road to the mode forest. Journal of Computational and Graphical Statistics, 7(2), 239-251.
Minnotte, M.C., & Scott, D.W. (1993). The mode tree: a tool for visualization of nonparametric density features. Journal of Computational and Graphical Statistics, 2, 51-68.
Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 520-531.
Pedersen, K.S. (submitted). Properties of Brownian image models in scale-space. In: L.D. Griffin (Ed.), Proc. Scale Space 2003. Springer.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (2002). Numerical Recipes in C++. Cambridge: Cambridge University Press.
Scott, D.W., Tapia, R.A., & Thompson, J.R. (1977). Kernel density estimation revisited. Nonlinear Analysis: Theory, Methods & Applications, 1, 339-372.
Silverman, B.W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society B, 43, 97-99.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Tagliati, E., & Griffin, L.D. (2001). Features in scale space: progress on the 2D 2nd order jet. In: M. Kerckhove (Ed.), LNCS 2106 (pp. 51-62). Springer.
van Hateren, J.H., & van der Schaaf, A. (1998). Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society of London Series B, 265(1394), 359-366.
Wand, M.P., & Jones, M.C. (1995). Kernel Smoothing. London: Chapman and Hall.
