
A Novel Method for Objective Evaluation of Converted Voice and Correlation with Subjective Score

Dr. Arun Kumar, Dr. Ashish Verma, Daya Shanker Khudia, Rajat Agarwal

Abstract—This paper describes a novel method for objective evaluation of transformed voices. We implemented a likelihood-ratio-based speaker verification system and used it to evaluate transformed voices objectively. We also performed subjective tests, MOS and hearing tests, to judge the proposed method. After normalizing the log-likelihood ratio obtained from speaker verification, we calculated its correlation with the subjective scores. The correlation results show that this method can be used to evaluate converted voices objectively, so that the tedium of subjective tests can be avoided.

Index Terms—Speaker Verification, Likelihood ratio, Objective, Subjective

I. INTRODUCTION

Subjective tests for evaluating voice transformation systems exist in the literature. Voice transformation refers to the process of modifying the speech signal of one person's voice so that it sounds as if it were spoken by another person. However, no objective test for evaluating the quality of a voice transformation system exists so far. Subjective tests are tedious: they require a large number of listeners, each of whom ranks the system after listening to approximately 100 sentences, and only then can a reliable estimate of the quality of the voice transformation system be obtained. The listeners must also be trained with suitable instructions on how to rate the converted sentences. In addition, ratings are likely to be biased towards the source speaker if the text of the converted sentence is similar to that spoken by the source speaker, so selecting the test sentences is itself tedious and requires careful attention. Finally, the tests need to be conducted in a laboratory under carefully controlled, noise-free conditions. These difficulties motivate the development of an objective method for evaluating voice conversion systems that is not tedious and avoids the biases and errors of the subjective methods.

The proposed objective method uses a likelihood-ratio-based speaker verification system [1], and we calculate the correlation between the scores given by the speaker verification system and those given by subjective tests. To generate the subjective scores we developed our own subjective test, a slight modification of the DCR and XAB tests that already exist in the literature. The correlation values indicate that the proposed objective method can be used in place of the existing subjective methods.

II. SPEAKER VERIFICATION SYSTEM

A. System description

The area of speaker recognition is concerned with extracting the identity of the person speaking an utterance. It is divided into two specific tasks: verification and identification. In verification, the goal is to determine from a voice sample whether a person is who he or she claims to be. In identification, the goal is to determine which one of a group of known voices best matches the input voice sample. We have developed a text-independent speaker verification system.

Any speaker recognition system has two phases, training and testing. The training phase has two main parts: feature extraction and statistical modeling. Feature extraction [3] is the front end of the speaker verification system. We use 39-dimensional Mel Frequency Cepstral Coefficient (MFCC) vectors [2] as the feature vectors for building the speaker models. Each vector consists of one energy coefficient and 12 MFCCs, with the corresponding delta and delta-delta coefficients appended (13 × 3 = 39 components).
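As an illustration of how a 39-dimensional vector can be assembled from 13 static coefficients, the following is a minimal sketch. The regression formula for the derivatives, the window of two frames and the edge padding are assumptions, since the paper does not state how the delta coefficients were computed. The function names `deltas` and `stack_39` are hypothetical, and the static coefficients are toy values rather than real MFCCs.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative of a feature sequence.

    Assumed formula: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2).
    Frames requested beyond either end are replaced by the nearest edge frame.
    """
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, n_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_39(static: List[Frame]) -> List[Frame]:
    """Append delta and delta-delta to 13-dim static frames (energy + 12 MFCCs)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 5 frames of 13 static coefficients that grow linearly in time.
static = [[float(t)] * 13 for t in range(5)]
features = stack_39(static)
print(len(features), len(features[0]))
```

Each output frame holds the 13 static values first, then the 13 deltas, then the 13 delta-deltas.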

Figure 1. Training phase of speaker verification system

The second step consists of obtaining a statistical model from these parameters. Gaussian Mixture Models (GMMs) are representative parametric models and are widely used in speaker verification tasks. The same training scheme is applied to the training of a background model. Figure 1 shows the general block diagram of the training phase of the system.
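To make the role of the GMM concrete, the sketch below scores feature frames against a trained mixture. It assumes diagonal covariance matrices and a model stored as a plain list of (weight, mean, variance) tuples; neither detail is given in the paper, and the function names are hypothetical. Training itself (for example by expectation-maximization) is not shown.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, diagonal variance vector).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x | GMM), computed with the log-sum-exp trick for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def gmm_avg_loglik(frames: Sequence[Sequence[float]],
                   gmm: List[Component]) -> float:
    """Average per-frame log-likelihood of an utterance under the model."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)


# Toy two-component model in two dimensions (illustrative values only).
toy_gmm = [
    (0.6, [0.0, 0.0], [1.0, 1.0]),
    (0.4, [3.0, 3.0], [0.5, 0.5]),
]
print(gmm_avg_loglik([[0.1, -0.2], [2.9, 3.1]], toy_gmm))
```

The same scoring function can be applied to a speaker model and to the background model, which is what a likelihood-ratio test requires.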

Figure 2 shows a block diagram of the test phase of a speaker verification system. The inputs to the system are a claimed identity and the speech samples spoken by an unknown speaker.

Figure 2. Test phase of a speaker verification system

The purpose of a speaker verification system is to verify whether the speech samples correspond to the claimed identity. First, speech parameters are extracted from the speech signal using exactly the same module as in the training phase. Then the speaker model corresponding to the claimed identity and a background model are retrieved from the set of statistical models calculated during the training phase. Finally, using the extracted speech parameters and the two statistical models, the last module computes scores, normalizes them, and makes an acceptance or rejection decision. The normalization step requires some score distributions to be estimated during the training phase, the test phase, or both.

B. Database

Training data: The speech utterances in our database consist of 20 continuous Hindi sentences from each speaker, sampled at 16 kHz. There are 12 speakers in total, 8 male and 4 female.

Test data for the speaker verification system: The test data consist of about 10 sentences different from those used for training the GMMs. Only the speakers ak, ash, axs, dxh, nit, pxk, vpg and vxt are used for testing. In addition, we have speech utterances from two speakers who appear in neither the training data nor the test data.

C. Experiments

We took 2, 3 and 4 seconds of speech from all sentences of all test speakers and calculated the likelihood ratio. Based on these ratios, false-alarm rates and miss rates were plotted as an ROC curve in order to choose the optimum likelihood-ratio threshold for the speaker verification system.
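The scoring and threshold selection described above can be sketched as follows. The paper does not say which operating point on the ROC curve was taken as optimum, so this sketch assumes the threshold at which the miss rate and the false-alarm rate are closest to each other. All function names and all score values are hypothetical.

```python
from typing import List, Sequence, Tuple


def llr(target_loglik: float, background_loglik: float) -> float:
    """Log-likelihood ratio: claimed-speaker model versus background model."""
    return target_loglik - background_loglik


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (miss rate, false-alarm rate) when accepting score >= threshold."""
    miss = sum(s < threshold for s in genuine) / len(genuine)
    false_alarm = sum(s >= threshold for s in impostor) / len(impostor)
    return miss, false_alarm


def pick_threshold(genuine: Sequence[float],
                   impostor: Sequence[float]) -> float:
    """Assumed criterion: the candidate where the two error rates are closest."""
    candidates: List[float] = sorted(set(genuine) | set(impostor))

    def gap(th: float) -> float:
        miss, false_alarm = error_rates(genuine, impostor, th)
        return abs(miss - false_alarm)

    return min(candidates, key=gap)


# Toy scores: genuine trials (true speaker) and impostor trials.
genuine_scores = [llr(-40.0, -42.0), llr(-41.0, -42.5),
                  llr(-43.0, -43.2), llr(-40.5, -41.5)]
impostor_scores = [-1.0, 0.5, -0.3, 0.1]

threshold = pick_threshold(genuine_scores, impostor_scores)
print(threshold, error_rates(genuine_scores, impostor_scores, threshold))
```

Sweeping the threshold over all candidates and recording both rates at each step yields the points of the ROC curve itself.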

III. OBJECTIVE AND SUBJECTIVE TESTS

A. Objective tests

The objective experiments were performed using the likelihood-ratio-based speaker verification system. The likelihood scores generated by the system need to be normalized before they can be correlated with the subjective scores. Normalization is essential because some speakers are inherently easy to distinguish from the background population: they have strong individual traits that make them easy to verify. Other speakers are inherently difficult to verify because they more closely resemble the background population. To take these factors into account we perform different kinds of normalization, one of which is initial-distance normalization. We calculate the distance between the likelihood score of the transformed speech and that of the source speaker, and normalize this improvement by the initial distance between the likelihood scores of the source and target speakers. The result is an estimate of how close the transformed voice is to that of the target speaker.

B. Subjective tests

The hearing test was used to assess how close the perceived individuality of the transformed speech is to that of the actual target speaker. Scores are given on a scale of 1 to 5, where the individual scores represent the following perceptual scenarios:

5 Similar
4 Slightly similar
3 Difficult to decide
2 Slightly dissimilar
1 Dissimilar

In the hearing test, the sentences played to a subject in the target speaker's voice were different from the transformed sentence, so that the experiment was not biased by the speaker's reading style for a particular sentence. Three to four sentences in the target speaker's voice were played back to the subject so that they could form a subjective opinion of the target speaker's overall speaking rate and speaking style (frequency of pauses, duration of pauses, etc.). The transformed sentences, produced by the different transformation techniques, were then played back to the subjects in random order for rating. The subjects were asked to rate on the basis of perceived similarity to the target speaker rather than on the basis of perceived degradation in speech quality relative to the target speaker.
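The initial-distance normalization of Section III-A and its correlation with listener ratings can be sketched as below. Two assumptions are made: all three scores are log-likelihood ratios computed against the target speaker's model, and the correlation is the Pearson coefficient (the paper does not name the coefficient used). The function names, scores and ratings are hypothetical and do not reproduce the paper's data.

```python
import math
from typing import Sequence


def normalized_improvement(src: float, conv: float, tgt: float) -> float:
    """Initial-distance normalization of a verification score.

    src, conv and tgt are scores of the source, transformed and genuine
    target utterances against the target model (assumption). A value of 0.0
    means no movement towards the target; 1.0 means the transformed voice
    scores like the target itself.
    """
    initial_distance = tgt - src
    if initial_distance == 0.0:
        raise ValueError("source and target scores coincide")
    return (conv - src) / initial_distance


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy example: four transformed sentences for one source-target pair.
objective = [normalized_improvement(-2.0, c, 4.0) for c in (-2.0, 1.0, 2.5, 4.0)]
ratings = [2, 3, 3, 5]  # hypothetical 1-to-5 hearing-test scores
print(objective, pearson(objective, ratings))
```

Dividing by the initial source-target distance puts speaker pairs that start far apart and pairs that start close together on a comparable scale before the scores are correlated.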

IV. RESULTS

The objective tests were performed for four source speakers (ak, ash, pxk and nit), with each source speaker's voice transformed to four target speakers (ak, ash, pxk and vpg). These speakers were selected randomly from the 12 speakers used for modeling the speaker verification system. For each source-target pair we have 10 sentences, so in all we have 160 transformed sentences on which the objective experiments were performed. From each of these 160 transformed sentences we extracted 2, 3 and 4 seconds of speech and performed the experiments separately on each. Table 1 gives the percentage verification rates for source, converted and target speech when 2, 3 and 4 seconds of speech are used.

                                       2 sec of speech   3 sec of speech   4 sec of speech
Source verified as target              9.2%              6.9%              6.9%
Converted file verified as target      53%               57%               59.2%
Actual target verified as target       79%               85%               86%

Table 1: Verification rate for source sentences and converted sentences against target model

The hearing tests were performed with 5 subjects. The correlation values are given in Table 2 and Table 3.

Subject    Correlation with objective scores
Gaurav     0.50
Harish     0.53
Brijesh    0.52
Kapil      0.24
Anshul     0.25

Table 2: Correlation of different subjective scores with objective scores.

Subject    Correlation with Gaurav
Harish     0.65
Brijesh    0.31
Kapil      0.48
Anshul     0.50

Table 3: Correlation between scores given by different subjects.

V. CONCLUSIONS

The correlation between the scores of different subjects does not exceed 0.65 (Table 3), so the correlation between objective and subjective scores cannot normally be expected to exceed this value. We obtain a correlation of around 0.50 for three of the five subjects (Table 2), which is a good value given that the subjective scores themselves correlate at no more than 0.65. Hence, the proposed method can be used to evaluate converted voices objectively.

REFERENCES

[1] D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, Jan. 1995.
[2] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals," Pearson Education, 2005.
[3] L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition," Pearson Education, 2003.
