Multi-Speakers Direction-Of-Arrival Finding Using ICA ...


Multi-Speakers Direction-Of-Arrival Finding Using ICA

M. Atashbar and M. H. Kahaei

Department of Electrical Engineering, Iran Univ. of Science & Technology, Tehran, 16844 Iran
Email: atashbar@ee.iust.ac.ir, kahaei@iust.ac.ir

Abstract: Multi-speaker direction finding and localization play important roles in a variety of applications. In this paper, a new quantitative measure of separation ratio is defined, and Independent Component Analysis (ICA) is then used to detect the signal frequencies of different speakers. We then apply the single-source direction finding technique to each separated signal. Using this technique, we expect to obtain a higher angular resolution for more than two speakers.

I. Introduction

Speaker direction finding and localization can play important roles in wireless applications. One example is automatic video conferencing, in which the system estimates the speaker direction and steers a camera to track the speaker. This can be used widely in wireless applications such as distance learning or sensor networks. In automatic voice recognition, a beam is directed towards a speaker in order to improve the recognition rate. In hearing aid devices, speech intelligibility is improved by aiming an optimal audio beam in the desired direction.

The problem of estimating a source direction using a multi-sensor array has been studied extensively. The most commonly used strategy for a single source estimates the time delay between two sensors. This strategy, however, is problematic when two or more speakers speak simultaneously in the presence of background noise. Estimating the speaker's direction is especially difficult when the bearing separation between the desired speaker and the competing voice source is small and the amplitudes of the two voice signals are similar.

In the case of speech signals, certain characteristics can assist in distinguishing between the sources. One such feature is the spectral signature of each speaker. Baruch Berdugo used the frequency domain representation to separate two simultaneous speakers even during time intervals where the speech signals overlap. This was based on the fact that the spectral signature differs from one person to another, even for the same utterance. That work used quantitative spectral differences between individual speakers and showed that some 40% of the spectra in the 0-5 kHz frequency band differ by at least 10 dB, even when the two speakers make the same utterance at the same time with similar intensities.
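The two-sensor time-delay strategy for a single source, mentioned above, can be illustrated with a minimal sketch. It is not the TDDF algorithm of the cited work; it only assumes a far-field plane wave, a cross-correlation peak as the delay estimate, and hypothetical parameter names (fs for the sampling rate, d for the sensor spacing in metres, c for the speed of sound).

```python
import numpy as np


def estimate_doa_two_sensors(x1, x2, fs, d, c=343.0):
    """Estimate a single source's DOA (degrees from broadside) from two sensors.

    Assumptions (not from the paper): far-field plane wave, sensor spacing d
    in metres, speed of sound c in m/s, delay taken at the cross-correlation peak.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Full cross-correlation; output index len(x1) - 1 corresponds to zero lag.
    corr = np.correlate(x2, x1, mode="full")
    lag = int(np.argmax(corr)) - (len(x1) - 1)  # samples by which x2 lags x1
    tau = lag / fs                              # time delay in seconds
    # Plane-wave geometry: tau = d * sin(theta) / c.
    # Clipping guards against delays slightly beyond the physical limit d / c.
    sin_theta = np.clip(c * tau / d, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

A positive angle here means the wavefront reaches the first sensor before the second. The resolution of this sketch is limited to whole-sample delays, and, as noted above, a single correlation peak becomes unreliable once several speakers are active at the same time.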
In Berdugo's approach, the Time Delay Direction Finding (TDDF) algorithm was then applied separately to each frequency to obtain a set of estimated DOA vectors.

II. Our Proposition

In this paper, to quantify the spectral separation between several speakers (more than the two considered by Baruch Berdugo), a new quantitative measure of the separation ratio is defined. Let X_i(k) (i = 1, 2, ..., L; k = 1, 2, ..., N) be the N-point DFTs of the L speakers' signals. The spectral separation ratio (SSR) between the L signals is given

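by

$$\mathrm{SSR}_L = \frac{\operatorname{length}_k\{AR_L(k) > TH\}}{N}, \qquad AR_L(k) = \max_i \left[ 20\log |X_i(k)| - 20\log\Big( \sum_{j=1,\, j \neq i}^{L} |X_j(k)| \Big) \right],$$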

where AR (Amplitude Ratio) is the ratio of one signal's amplitude to the other signals' amplitudes at each frequency, the length operator counts the number of elements in a vector, and TH indicates the separation threshold. When L increases,
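As a rough illustration of the SSR definition, the following sketch computes it for L time-domain signals. Several details are assumptions rather than statements from the paper: the logarithm is taken as base 10 (so AR is in dB), AR at each frequency is the maximum over speakers of one signal's level against the summed amplitudes of the others, the default threshold of 10 dB is only a placeholder, and the small constant eps is added purely to avoid log(0).

```python
import numpy as np


def spectral_separation_ratio(signals, th_db=10.0, n_fft=None, eps=1e-12):
    """Fraction of DFT bins in which one signal dominates all others by th_db.

    signals: array-like of shape (L, samples), one row per speaker.
    th_db, n_fft and eps are illustrative parameters, not from the paper.
    """
    x = np.asarray(signals, dtype=float)
    mag = np.abs(np.fft.fft(x, n=n_fft, axis=1))        # |X_i(k)|, shape (L, N)
    total = mag.sum(axis=0, keepdims=True)              # sum over all speakers
    others = np.maximum(total - mag, 0.0)               # sum over j != i of |X_j(k)|
    # Amplitude ratio in dB of each speaker against the rest, per frequency bin.
    ar_per_speaker = 20.0 * np.log10(mag + eps) - 20.0 * np.log10(others + eps)
    ar = ar_per_speaker.max(axis=0)                     # AR_L(k)
    # length{AR_L(k) > TH} / N
    return np.count_nonzero(ar > th_db) / ar.size
```

With two identical signals the ratio is 0 dB in every bin, so the SSR is zero; with signals that occupy disjoint frequency bins, the SSR equals the fraction of bins that carry energy from any one speaker.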

0-7803-9521-2/06/$20.00 ©2006 IEEE.
