Notes on the Spectral Aspects of Linear Prediction of Speech Based on the Absolute Error Minimization Criterion

Daniele Giacobello

February 10, 2008

Abstract

The standard linear prediction method exhibits spectral matching properties in the frequency domain due to Parseval's theorem [1]:

    \sum_{n=-\infty}^{\infty} |e(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |E(e^{j\omega})|^2 \, d\omega.    (1)

It is also interesting to note that minimizing the squared error in the time domain and in the frequency domain leads to the same set of equations, namely the Yule-Walker equations [3]. To the best of our knowledge, the only relation between the time domain and frequency domain errors using the 1-norm is the trivial Hausdorff-Young inequality [2]:

    \sum_{n=-\infty}^{\infty} |e(n)| \geq \frac{1}{2\pi} \int_{-\pi}^{\pi} |E(e^{j\omega})| \, d\omega,    (2)

which implies that time domain minimization does not correspond to frequency domain minimization. It is therefore difficult to say whether the 1-norm based approach is always advantageous compared to the 2-norm based approach for spectral modeling, since the statistical character of the frequency errors is not clear. In these notes, we provide a proof sketch for a possible spectral interpretation of linear prediction based on the 1-norm error minimization criterion.

1 Linear Prediction of Speech

Linear prediction of speech assumes that a sample of the time series x(n), assumed to be redundant and stationary, obtained by sampling a continuous speech signal x(t), can be represented as a linear combination of the previous samples plus an error signal e(n) [4, 1]:

    x(n) = \sum_{k=1}^{K} a_k x(n-k) + e(n).    (3)

In other words, we can consider the time series x(n) as generated by all-pole filtering an excitation signal e(n) through the filter:

    H(z) = \frac{1}{1 - \sum_{k=1}^{K} a_k z^{-k}} = \frac{1}{A(z)}.    (4)
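As a small numerical sketch (not part of the note itself), the all-pole model of (3)-(4) can be simulated by direct recursion; the coefficients and excitation below are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical predictor coefficients for K = 2, i.e. A(z) = 1 - 1.5 z^-1 + 0.7 z^-2
a = np.array([1.5, -0.7])
K = len(a)

rng = np.random.default_rng(0)
e = rng.standard_normal(1024)            # white excitation e(n)

# Generate x(n) = sum_k a_k x(n-k) + e(n) by direct recursion (zero initial state)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] + sum(a[k - 1] * x[n - k] for k in range(1, K + 1) if n - k >= 0)
```

Each sample then satisfies the recursion (3) exactly, with 1/A(z) acting as the synthesis filter of (4).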

Given the signal x(n), the problem is to determine the prediction coefficient vector a = [a_1, a_2, \ldots, a_K]; this is usually done by minimizing the error according to some criterion. We can construct the cost function as a function of the coefficient vector:

    e(n) = x(n) - \sum_{k=1}^{K} a_k x(n-k), \quad \text{for } n = N_1, \ldots, N_2;    (5)

therefore the problem in (5) can be rewritten as a minimization problem:

    \min_a \|e\|_p^p = \min_a \|x - Xa\|_p^p,    (6)

having:

    x = \begin{bmatrix} x(N_1) \\ \vdots \\ x(N_2) \end{bmatrix}, \quad
    X = \begin{bmatrix} x(N_1-1) & \cdots & x(N_1-K) \\ \vdots & \ddots & \vdots \\ x(N_2-1) & \cdots & x(N_2-K) \end{bmatrix},    (7)

and \|\cdot\|_p is the p-norm defined as \|x\|_p = \left(\sum_{n=1}^{N} |x(n)|^p\right)^{1/p} for p \geq 1. Even though we did not make any statistical assumption about the signal, by doing this we have actually assumed that the error vector has a generalized Gaussian distribution [5] with independent and identically distributed variables:

    p(e) \propto \exp\left(-(\lambda \|e\|_p)^p\right).    (8)
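To make (6)-(7) concrete, the following sketch (an illustration with hypothetical AR(2) coefficients, not from the note) builds x and X and solves the problem for p = 2 via least squares and for p = 1 via iteratively reweighted least squares, one common way to approximate the 1-norm solution; a linear program would solve the p = 1 case exactly.

```python
import numpy as np

def lp_matrices(s, K, N1, N2):
    """Build the target vector x and regressor matrix X of (7) from a signal s."""
    x = s[N1:N2 + 1]
    X = np.column_stack([s[N1 - k:N2 + 1 - k] for k in range(1, K + 1)])
    return x, X

# Synthetic AR(2) signal with hypothetical coefficients [1.2, -0.6]
rng = np.random.default_rng(1)
s = rng.standard_normal(512)
for n in range(2, len(s)):
    s[n] += 1.2 * s[n - 1] - 0.6 * s[n - 2]

K = 2
x, X = lp_matrices(s, K, N1=K, N2=len(s) - 1)

# p = 2: ordinary least squares
a2, *_ = np.linalg.lstsq(X, x, rcond=None)

# p = 1: iteratively reweighted least squares (IRLS), approximating
# min_a ||x - Xa||_1 by successive weighted 2-norm problems
a1 = a2.copy()
for _ in range(50):
    w = 1.0 / np.maximum(np.abs(x - X @ a1), 1e-8)
    a1 = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * x))
```

With a white Gaussian excitation both estimates land near the true coefficients, consistent with (8): for such an excitation there is no reason to prefer p = 1 over p = 2.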

We can see this clearly by approaching the linear prediction problem as a maximum-likelihood (ML) estimation of the parameters in (8):

    \max_a p(e) = \max_a p(x|a) = \min_a \left[-\ln p(x|a)\right] = \min_a \|x - Xa\|_p^p = \min_a \|e\|_p^p,    (9)

the same conclusion as in (6).

2 Spectral Matching Properties of 2-norm Based Linear Prediction of Speech

Having considered the signal x(n) as generated by an auto-regressive process, we can rewrite (5) in the z-transform domain:

    E(z) = \left(1 - \sum_{k=1}^{K} a_k z^{-k}\right) X(z) = A(z) X(z).    (10)

Assuming x(n) deterministic, we can apply Parseval's theorem; the total error to be minimized is then given by:

    E = \sum_{n=-\infty}^{\infty} e^2(n) = \int_{-1/2}^{1/2} |E(e^{j2\pi f})|^2 \, df,    (11)

where E(e^{j2\pi f}) is obtained by evaluating E(z) on the unit circle z = e^{j2\pi f}. Denoting the power spectrum of the signal as:

    \hat{S}_{xx}(f, x) = \frac{|E(e^{j2\pi f})|^2}{|A(e^{j2\pi f})|^2},    (12)

and its approximation as:

    S_{xx}(f) = \frac{\sigma^2}{|A(e^{j2\pi f})|^2}.    (13)

We can easily see that the spectrum |E(e^{j2\pi f})|^2 is modeled by a flat spectrum with magnitude \sigma^2; this means that the error signal obtained with 2-norm minimization is an approximation of white noise, which is why A(z) is sometimes known as the "whitening filter". From (11), (12), and (13) we obtain that the total error can be rewritten as:

    E = \sigma^2 \int_{-1/2}^{1/2} \frac{\hat{S}_{xx}(f, x)}{S_{xx}(f)} \, df.    (14)

Thus, minimizing the total error E is equivalent to minimizing the integrated ratio of the signal spectrum \hat{S}_{xx}(f, x) to its approximation S_{xx}(f). The way the spectrum \hat{S}_{xx}(f, x) is approximated by S_{xx}(f) is largely reflected in the relation between the corresponding autocorrelation functions. Knowing that r(k) = \hat{r}(k) [1] for k = 1, \ldots, K and that the autocorrelation of x(n) is the Fourier transform of its spectrum:

    \hat{r}(k) = \int_{-1/2}^{1/2} \hat{S}_{xx}(f, x) \, e^{j2\pi f k} \, df,    (15)

and r(k) is the autocorrelation of the impulse response of (4) and also the Fourier transform of S_{xx}(f), it follows that increasing the model order K increases the range over which \hat{r}(k) and r(k) are equal, resulting in a better fit of S_{xx}(f) to \hat{S}_{xx}(f, x). Hence, for K → ∞ the two spectra become identical:

    S_{xx}(f) = \hat{S}_{xx}(f, x) \quad \text{as } K \to \infty.    (16)
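This order-dependence can be checked numerically. The sketch below (an illustration with hypothetical AR(3) coefficients, not taken from the note) fits models of increasing order K from the biased sample autocorrelation and watches the prediction-error power shrink as the model spectrum matches the periodogram over a wider autocorrelation range.

```python
import numpy as np

def yule_walker(x, K):
    """AR(K) fit from the biased sample autocorrelation (Yule-Walker).
    Sign convention of (3): x(n) = sum_k a_k x(n-k) + e(n)."""
    N = len(x)
    r = np.array([x[:N - k] @ x[k:] for k in range(K + 1)]) / N
    R = np.array([[r[abs(i - j)] for j in range(K)] for i in range(K)])
    a = np.linalg.solve(R, r[1:K + 1])
    sigma2 = r[0] - a @ r[1:K + 1]       # prediction-error power
    return a, sigma2

# Synthetic AR(3) signal with hypothetical coefficients
rng = np.random.default_rng(2)
e = rng.standard_normal(4096)
c = [0.9, -0.5, 0.3]
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] + sum(ck * x[n - k - 1] for k, ck in enumerate(c) if n - k - 1 >= 0)

# As K grows, the prediction-error power is non-increasing: S_xx(f) matches
# the periodogram over an ever wider autocorrelation range, illustrating (16)
powers = [yule_walker(x, K)[1] for K in (1, 2, 3, 6)]
```

Once K reaches the true order, the error power levels off near the excitation variance and larger K buys little, which is the finite-order reading of (16).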

3 Linear Prediction Based on the Least Square Error

The most used error minimization criterion is the method of least squares (p = 2 in (6)); this method corresponds to the maximum likelihood approach when the error signal (i.e., the excitation of the filter in (4)) is considered to be a set of i.i.d. Gaussian variables:

    e \sim N(0, C_e),    (17)

where C_e = \sigma^2 I is the identity matrix multiplied by a constant that corresponds to the variance of the error. One of the reasons for the Gaussian assumption lies in the maximum entropy principle, which states that, for known values of the first and second moments of a random process, the joint probability density with the largest entropy is the Gaussian probability density. From the definition, the log-pdf is:

    \ln p(e) = -\frac{N}{2}\ln 2\pi - \frac{1}{2}\ln|C_e| - \frac{1}{2} e^T C_e^{-1} e.    (18)

If we solve (18) by maximizing \ln p(e), considering that e = x - Xa, we obtain:

    a_{ML} = \arg\min_a \left\{ (x - Xa)^T C_e^{-1} (x - Xa) \right\},    (19)

which has a closed-form unique solution:

    a_{ML} = \left(X^T C_e^{-1} X\right)^{-1} X^T C_e^{-1} x.    (20)

Considering C_e = \sigma^2 I, this becomes:

    a_{ML} = \left(X^T X\right)^{-1} X^T x.    (21)
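The closed form (21) is just the normal equations; as a quick sketch (with hypothetical coefficients and data, purely for illustration), it can be verified against a generic least-squares solver.

```python
import numpy as np

# Synthetic regression with hypothetical coefficients; X plays the role of (7)
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 4))
a_true = np.array([0.8, -0.4, 0.2, 0.1])
x = X @ a_true + 0.1 * rng.standard_normal(200)

# (21): the normal equations for C_e = sigma^2 I
a_ml = np.linalg.solve(X.T @ X, X.T @ x)

# The same estimate obtained by a generic least-squares solver
a_ls, *_ = np.linalg.lstsq(X, x, rcond=None)
```

Both routes give the same estimate, since minimizing the 2-norm of the residual and solving (21) are algebraically equivalent.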

We would like to express the probability density function (pdf) as a function of the power spectral density (PSD). Knowing that linearly filtering a white Gaussian process outputs a signal that is still a Gaussian process, but not necessarily white, we can model the signal pdf as:

    x \sim N(0, C_{xx}),    (22)

where C_{xx} is no longer a diagonal matrix (the variables are correlated and therefore not independent). The log-pdf is:

    \ln p(x) = -\frac{N}{2}\ln 2\pi - \frac{1}{2}\ln|C_{xx}| - \frac{1}{2} x^T C_{xx}^{-1} x,    (23)

and each term can be made dependent on the PSD thanks to the asymptotic relations (for N → ∞) [6]:

    |C_{xx}| = \prod_{k=1}^{N} \lambda_k(C_{xx}) \simeq \prod_{k=0}^{N-1} S_{xx}\!\left(\frac{2\pi}{N}k\right)    (24)

and:

    C_{xx}^{-1} = \sum_{k=1}^{N} \frac{1}{\lambda_k(C_{xx})}\, q_k q_k^H \simeq \sum_{k=0}^{N-1} \frac{1}{S_{xx}\!\left(\frac{2\pi}{N}k\right)}\, v_k v_k^H,    (25)

with v_k being a sinusoid that makes k cycles in N samples:

    v_k = \frac{1}{\sqrt{N}}\left[1,\ \exp\!\left(j\frac{2\pi k}{N}\right),\ \ldots,\ \exp\!\left(j\frac{2\pi k(N-1)}{N}\right)\right]^T.    (26)
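The relations (24)-(26) are exact for circulant covariances, which is the intuition behind the asymptotics. As a sketch (with an arbitrary illustrative PSD, not from the note), one can build a circulant covariance whose eigenvalues are PSD samples and check that the v_k of (26) are its eigenvectors and that the determinant is the product of the PSD samples.

```python
import numpy as np

N = 64
k = np.arange(N)
# An arbitrary positive, symmetric "PSD" sampled at 2*pi*k/N (illustrative)
S = 1.0 + 0.5 * np.cos(2 * np.pi * k / N)

# Circulant covariance with eigenvalues S(2*pi*k/N): C = F^H diag(S) F,
# where the conjugated rows of the DFT matrix F are the v_k of (26)
F = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
C = ((F.conj().T * S) @ F).real

# (24): log det C = sum_k log S(2*pi*k/N)
sign, logdet = np.linalg.slogdet(C)

# (25)-(26): v_1 is an eigenvector of C with eigenvalue S(2*pi/N)
v1 = np.exp(2j * np.pi * k / N) / np.sqrt(N)
```

For a Toeplitz (rather than circulant) covariance these identities hold only asymptotically, which is exactly the N → ∞ caveat in the text.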

Substituting these relations into (23), we obtain:

    \ln p(x) = -\frac{N}{2}\ln 2\pi - \frac{1}{2}\sum_{k=0}^{N-1}\left[\ln S_{xx}\!\left(\frac{2\pi}{N}k\right) + \frac{|v_k^H x|^2}{S_{xx}\!\left(\frac{2\pi}{N}k\right)}\right].    (27)

Noting that:

    v_k^H x = \mathrm{DFT}_N(x)\big|_{\omega_k} = \frac{1}{\sqrt{N}} X(\omega_k)    (28)

and

    |v_k^H x|^2 = \frac{1}{N}|X(\omega_k)|^2 = \hat{S}_{xx}(\omega_k, x)    (29)

represent a transformation of the observations (the DFT) that corresponds to the periodogram, we can rewrite (27) as:

    \ln p(x) = -\frac{N}{2}\ln 2\pi - \frac{1}{2}\sum_{k=0}^{N-1}\left[\ln S_{xx}\!\left(\frac{2\pi}{N}k\right) + \frac{\hat{S}_{xx}(\omega_k, x)}{S_{xx}\!\left(\frac{2\pi}{N}k\right)}\right].    (30)

In this form it can be hard to interpret, so, multiplying and dividing the second term by the band unit 1/N, we have:

    \ln p(x) = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\sum_{k=0}^{N-1}\frac{1}{N}\left[\ln S_{xx}\!\left(\frac{2\pi}{N}k\right) + \frac{\hat{S}_{xx}(\omega_k, x)}{S_{xx}\!\left(\frac{2\pi}{N}k\right)}\right],    (31)

which for N → ∞ becomes:

    \ln p(x) \simeq -\frac{N}{2}\ln 2\pi - \frac{N}{2}\int_{-1/2}^{1/2}\left[\ln S_{xx}(f) + \frac{\hat{S}_{xx}(f, x)}{S_{xx}(f)}\right] df.    (32)

This asymptotic relation holds when N is sufficiently large (ideally N → ∞). In the case of auto-regressive (AR) parametric spectral estimation, the PSD depends on a set of deterministic parameters \theta, namely the recursive component of the filter a = [a(1), \ldots, a(K)]^T and the scaling factor \sigma^2:

    S_{xx}(f|\theta) = \frac{\sigma^2}{|A(f|a)|^2}, \quad \theta = [a^T, \sigma^2]^T \in \mathbb{R}^{K+1}.    (33)
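The quality of the approximation (30)-(32), often called the Whittle likelihood, can be checked numerically. The sketch below (an illustration with a hypothetical AR(1) process, not from the note) compares the exact Gaussian log-pdf (23) with the periodogram-based sum (30); the two agree up to a term that is small relative to N.

```python
import numpy as np

N = 512
rho, s2 = 0.6, 1.0
n = np.arange(N)

# Exact AR(1) covariance c(k) = s2 * rho^|k| / (1 - rho^2)
C = s2 * rho ** np.abs(n[:, None] - n[None, :]) / (1 - rho ** 2)

rng = np.random.default_rng(4)
x = np.linalg.cholesky(C) @ rng.standard_normal(N)

# Exact Gaussian log-pdf (23)
sign, logdet = np.linalg.slogdet(C)
exact = -0.5 * N * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * x @ np.linalg.solve(C, x)

# Asymptotic form (30): periodogram against the PSD at the DFT frequencies
f = n / N
S = s2 / np.abs(1 - rho * np.exp(-2j * np.pi * f)) ** 2
P = np.abs(np.fft.fft(x)) ** 2 / N              # periodogram S_hat(omega_k, x)
whittle = -0.5 * N * np.log(2 * np.pi) - 0.5 * np.sum(np.log(S) + P / S)
```

The discrepancy comes from replacing the Toeplitz covariance by its circulant approximation, and it vanishes per sample as N grows.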

Substituting (33) in (32), the log-likelihood for the ML estimation becomes:

    \ln p(x|\theta) \simeq -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln\sigma^2 + \frac{N}{2}\int_{-1/2}^{1/2}\ln|A(f|a)|^2\,df - \frac{N}{2\sigma^2}\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df.    (34)

For monic polynomials (with a(0) = 1) we have \int_{-1/2}^{1/2}\ln|A(f|a)|^2\,df = 0; (34) therefore becomes:

    \ln p(x|\theta) \simeq -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln\sigma^2 - \frac{N}{2\sigma^2}\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df.    (35)

Setting the gradient with respect to \sigma^2 to zero:

    \frac{\partial \ln p(x|\theta)}{\partial \sigma^2} = 0 \;\Rightarrow\; -\frac{N}{2\sigma^2} + \frac{N}{2\sigma^4}\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df = 0,    (36)

and therefore:

    \hat{\sigma}^2 = \hat{\sigma}^2(a) = \int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df,    (37)
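The estimate (37) has a direct time-domain reading: by Parseval, integrating |A|^2 times the periodogram equals the mean squared prediction error. A short sketch (with hypothetical coefficients, evaluating the integral at the DFT frequencies and using circular filtering so the identity is exact) makes this concrete.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 256
x = rng.standard_normal(N)

# Hypothetical predictor coefficients; A(z) = 1 - 0.7 z^-1 + 0.2 z^-2
a = np.array([0.7, -0.2])
A_coef = np.concatenate(([1.0], -a))

# (37) discretized at the DFT frequencies: integrate |A|^2 times the periodogram
Af = np.fft.fft(A_coef, N)                # A(omega_k)
P = np.abs(np.fft.fft(x)) ** 2 / N        # periodogram S_hat(omega_k, x)
sigma2_freq = np.mean(np.abs(Af) ** 2 * P)

# Time-domain counterpart: mean squared (circular) prediction error
e = np.fft.ifft(Af * np.fft.fft(x)).real
sigma2_time = np.mean(e ** 2)
```

The two quantities coincide, which is why minimizing \hat{\sigma}^2(a) over a is the frequency-domain face of least-squares linear prediction.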

so the power depends on the recursive part of the filter a. Substituting into the log-likelihood function (35):

    \ln p(x|a, \hat{\sigma}^2(a)) \simeq -\frac{N}{2}(1 + \ln 2\pi) - \frac{N}{2}\ln\hat{\sigma}^2(a);    (38)

this means that maximizing \ln p(x|a, \hat{\sigma}^2(a)) corresponds to minimizing \hat{\sigma}^2(a). It is now clear that the Gaussian maximum-likelihood estimation of the parameters that generated the signal x(n) corresponds to minimizing the integrated ratio of the signal spectrum \hat{S}_{xx}(f, x) to its approximation S_{xx}(f|\theta) (33). Proceeding with the calculation of the gradients (always assuming a \in \mathbb{R}^K):

    \frac{\partial \hat{\sigma}^2(a)}{\partial a(k)} = \frac{\partial}{\partial a(k)}\int_{-1/2}^{1/2} A(f|a)\,A^*(f|a)\,\hat{S}_{xx}(f, x)\,df;    (39)

applying the product rule and developing the calculations, knowing that \hat{S}_{xx} is real, we obtain that solving (39) is equivalent to solving:

    \int_{-1/2}^{1/2} A(f|a)\,\hat{S}_{xx}(f, x)\,e^{j2\pi f n}\,df = 0;    (40)

developing the calculations:

    \int_{-1/2}^{1/2}\hat{S}_{xx}(f, x)\,e^{j2\pi f n}\,df + \sum_{k=1}^{K} a(k)\int_{-1/2}^{1/2}\hat{S}_{xx}(f, x)\,e^{j2\pi f (n-k)}\,df = 0.    (41)

The periodogram \hat{S}_{xx}(f, x) is the Fourier transform of the (biased) sample autocorrelation function; therefore, through (41), we obtain the Yule-Walker equations, written now with respect to the autocorrelation function [7]:

    \hat{r}(n) + \sum_{k=1}^{K} a(k)\,\hat{r}(n-k) = 0 \quad \text{for } n = 1, \ldots, K.    (42)

We can also see that:

    \hat{\sigma}^2(\hat{a}) = \hat{r}(0) + \sum_{k=1}^{K} a(k)\,\hat{r}(k).    (43)
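The step from (41) to (42) rests on the fact that inverting the periodogram gives back the biased sample autocorrelation. This can be verified directly (a sketch with arbitrary data; the zero-padding avoids the circular wrap-around of the DFT):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 128
x = rng.standard_normal(N)

# Biased sample autocorrelation r_hat(k) = (1/N) sum_n x(n) x(n-k)
r_hat = np.array([x[:N - k] @ x[k:] for k in range(N)]) / N

# The periodogram is its Fourier transform: inverting a zero-padded
# periodogram recovers r_hat exactly
P = np.abs(np.fft.fft(x, 2 * N)) ** 2 / N
r_from_P = np.fft.ifft(P).real[:N]
```

Because the correspondence is exact, plugging the periodogram into (41) yields precisely the Yule-Walker system (42) with the biased estimates \hat{r}(k).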

4 Linear Prediction Based on the Least Absolute Error

The Gaussian assumption of the previous section rests on the fact that it is often sufficient for tractable mathematics, but also on a very liberal view of the central limit theorem, which may be loosely stated as: "almost any random process put into almost any linear system will come out almost Gaussian." The linear prediction method based on the least absolute error has only recently started to be used, as it does not have a closed-form solution like the least squares method (20), which can be solved easily. Nevertheless, it seems to be really interesting when dealing with the representation of voiced speech, where the excitation can be better represented by a sparse impulsive signal.
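The appeal of the 1-norm for voiced speech can be sketched numerically. In the illustration below (hypothetical coefficients and a crude periodic-pulse excitation standing in for glottal pulses; IRLS is used as one common approximation to the 1-norm solution), the 1-norm fit can drive the residual to zero between pulses and recover the coefficients almost exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 400, 2
a_true = np.array([1.4, -0.72])          # hypothetical AR(2) coefficients

# Sparse, impulsive excitation: a crude stand-in for voiced-speech pulses
e = np.zeros(N)
e[::40] = 5.0 * rng.standard_normal(10)

x = np.zeros(N)
for n in range(N):
    x[n] = e[n] + sum(a_true[k] * x[n - k - 1] for k in range(K) if n > k)

X = np.column_stack([x[K - 1 - k:N - 1 - k] for k in range(K)])
t = x[K:]

# 2-norm fit (least squares)
a2, *_ = np.linalg.lstsq(X, t, rcond=None)

# 1-norm fit, approximated by iteratively reweighted least squares: the
# residual stays nonzero only at the pulse locations, i.e. it is sparse
a1 = a2.copy()
for _ in range(100):
    w = 1.0 / np.maximum(np.abs(t - X @ a1), 1e-8)
    a1 = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * t))
```

Since most prediction equations are satisfied exactly between pulses, the 1-norm criterion concentrates the whole error on the few pulse instants, which is exactly the sparse impulsive excitation the text refers to.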

The method introduced in this section corresponds to the assumption that the error signal has a Laplacian probability density function; the analyzed speech signal will then still have a Laplacian distribution [8]:

    x \sim L(0, C_{xx}).    (44)

The Laplacian pdf, differently from the Gaussian, does not have a simple closed form that includes the covariance matrix of the analyzed signal, although many studies have been made to fill this gap. According to [9]:

    p(x) = \frac{2}{(2\pi)^{N/2}|C_{xx}|^{1/2}}\left(\frac{x^T C_{xx}^{-1} x}{2}\right)^{-N/4+1/2} K_{N/2-1}\!\left(\sqrt{2\, x^T C_{xx}^{-1} x}\right),    (45)

where K_{N/2-1}(\sqrt{2\, x^T C_{xx}^{-1} x}) denotes the modified Bessel function of the second kind and order N/2 - 1, evaluated at \sqrt{2\, x^T C_{xx}^{-1} x}. Noting that the Bessel function, for \sqrt{2\, x^T C_{xx}^{-1} x} sufficiently large, behaves like:

    K_{N/2-1}\!\left(\sqrt{2\, x^T C_{xx}^{-1} x}\right) \simeq \sqrt{\frac{\pi}{2\sqrt{2\, x^T C_{xx}^{-1} x}}}\;\exp\!\left(-\sqrt{2\, x^T C_{xx}^{-1} x}\right),    (46)

we can rewrite the pdf as:

    p(x) \simeq \frac{2}{(2\pi)^{N/2}|C_{xx}|^{1/2}}\left(\frac{x^T C_{xx}^{-1} x}{2}\right)^{-N/4+1/2}\sqrt{\frac{\pi}{2\sqrt{2\, x^T C_{xx}^{-1} x}}}\;\exp\!\left(-\sqrt{2\, x^T C_{xx}^{-1} x}\right);    (47)

to make it clearer, we can set G = 1/\sqrt{2\pi}^{\,N-1} and rewrite:

    p(x) \simeq G\,\frac{\left(2\, x^T C_{xx}^{-1} x\right)^{-N/4+1/4}}{|C_{xx}|^{1/2}}\,\exp\!\left(-\sqrt{2\, x^T C_{xx}^{-1} x}\right).    (48)

The log-likelihood function becomes:

    \ln p(x) = \ln G - \frac{N-1}{4}\ln\!\left(2\, x^T C_{xx}^{-1} x\right) - \sqrt{2\, x^T C_{xx}^{-1} x} - \frac{1}{2}\ln|C_{xx}|,    (49)
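The large-argument behavior (46) can be sanity-checked numerically. The sketch below evaluates K_nu through its integral representation with plain numpy quadrature (scipy.special.kv would be the library route) and compares it with the leading-order asymptotic sqrt(pi/(2z)) exp(-z); the order and argument are arbitrary illustrative values.

```python
import numpy as np

def bessel_k(nu, z, T=30.0, M=200001):
    """K_nu(z) via the integral representation
    K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt (trapezoidal quadrature)."""
    t = np.linspace(0.0, T, M)
    f = np.exp(-z * np.cosh(t)) * np.cosh(nu * t)
    dt = t[1] - t[0]
    return (f.sum() - 0.5 * (f[0] + f[-1])) * dt

# (46): for large argument, K_nu(z) ~ sqrt(pi / (2 z)) * exp(-z)
nu, z = 3.0, 100.0                       # e.g. N = 8 gives order N/2 - 1 = 3
exact = bessel_k(nu, z)
approx = np.sqrt(np.pi / (2.0 * z)) * np.exp(-z)
```

The relative error of the leading term scales like (4 nu^2 - 1)/(8 z), so the approximation in (46) is good precisely when the Mahalanobis term 2 x^T C^{-1} x is large compared with the squared order, which is the regime of interest for large N and typical data.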

using the asymptotic relations in (24) and (25), and multiplying and dividing by the band unit 1/N, we can rewrite it as:

    \ln p(x) \simeq \ln G - \frac{N-1}{4}\ln\!\left(2N\int_{-1/2}^{1/2}\frac{\hat{S}_{xx}(f, x)}{S_{xx}(f)}\,df\right) - \sqrt{2N\int_{-1/2}^{1/2}\frac{\hat{S}_{xx}(f, x)}{S_{xx}(f)}\,df} - \frac{N}{2}\int_{-1/2}^{1/2}\ln S_{xx}(f)\,df.    (50)

Substituting the relations of (33), and remembering that for monic polynomials \int_{-1/2}^{1/2}\ln|A(f|a)|^2\,df = 0, we obtain:

    \ln p(x|\theta) \simeq \ln G - \frac{N-1}{4}\ln\!\left(\frac{2N}{\sigma^2}\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df\right) - \sqrt{\frac{2N}{\sigma^2}\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df} - \frac{N}{2}\ln\sigma^2.    (51)

Setting the first derivative of (51) with respect to \sigma^2 to zero leads to the following result:

    \hat{\sigma}^2 = \left(\frac{2}{N-1}\right)^2 2N\int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df,    (52)

which means that the optimal \sigma^2 is again proportional to the integrated ratio \int_{-1/2}^{1/2}|A(f|a)|^2\,\hat{S}_{xx}(f, x)\,df, so the spectral flatness measure for K → ∞ is identical for both the 2-norm and the 1-norm error minimization criteria.

References

[1] J. Makhoul, "Linear Prediction: A Tutorial Review", Proc. IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975.

[2] M. Reed and B. Simon, Methods of Modern Mathematical Physics II: Fourier Analysis, Self-adjointness, Academic Press, 1975.

[3] P. Stoica and R. Moses, Spectral Analysis of Signals, Pearson Prentice Hall, 2005.

[4] J. H. L. Hansen, J. G. Proakis, and J. R. Deller, Jr., Discrete-Time Processing of Speech Signals, Prentice-Hall, 1987.

[5] J.-R. Ohm, Multimedia Communication Technology: Representation, Transmission, and Identification of Multimedia Signals, Springer-Verlag, 2004.

[6] U. Spagnolini, Statistical Signal Processing for Telecommunications, Politecnico di Milano Press, 2004. (In Italian)

[7] J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer-Verlag, 1976.

[8] T. Eltoft, T. Kim, and T. Lee, "On the Multivariate Laplace Distribution", IEEE Signal Processing Letters, vol. 13, no. 5, pp. 300–303, 2006.

[9] S. Kotz, N. Balakrishnan, and N. L. Johnson, Continuous Multivariate Distributions, Volume 1: Models and Applications, 2nd edition, Wiley, 2000.
