Response to the discussion of “Gaussian Process Regression for Multivariate Spectroscopic Calibration”

Tao Chen, Julian Morris and Elaine Martin*

School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K.

* E-mail: [email protected]; Tel.: +44 191 222 6231; Fax: +44 191 222 5748.

We are grateful to the discussants for their comments which highlight some interesting points, and raise issues to be addressed in future work. Our response to the discussion is given below.

1. Response to Dr. Naijun Sha’s comments

Dr. Sha [1] has proposed an extension to our work that addresses the development of a Gaussian process regression model for the calibration of multiple response variables, formulating a covariance matrix function for multi-response calibration. The approach is analogous to that described in our paper. Before receiving Dr. Sha's comments, we had derived a similar form of covariance function for multi-response calibration; the details are given below.

Let $y_i$, a column vector, be the $q$-dimensional response for sample $i$, and let $x_i$ be the corresponding covariate vector. The multi-response linear regression model is:

$$ y_i = K_i b + \varepsilon_i \qquad (1) $$

where $K_i = I_q \otimes x_i^T$ is the Kronecker product of the $q \times q$ identity matrix, $I_q$, and $x_i^T$, and $b$ is the concatenation of the regression vectors for the $q$ response variables: $b = [b_1^T, \ldots, b_q^T]^T$. In a Bayesian framework, the prior for the $q$-dimensional vector $\varepsilon_i$ is a zero-mean Gaussian distribution, $\varepsilon_i \sim G(0, \Sigma)$, where the full matrix $\Sigma$ reflects the correlation between the multiple responses. The prior for the regression vectors is defined as follows:

$$ p(b_g) = G(0, \sigma_b^2 I), \quad g = 1, \ldots, q \qquad (2) $$

The prior distributions for $\varepsilon_i$ and $b_g$ can be regarded as a simplified form of those presented in Sha [1]. More specifically, the correlation between $\varepsilon_i$ and $\varepsilon_j$, $i \neq j$, and between $b_g$ and $b_h$, $g \neq h$, is ignored. To account for this correlation, matrix variates and the corresponding matrix normal distributions need to be introduced [1]. Following the approach described in Boyle & Frean [2], the response variables are organized in a long vector as follows:

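$$ y = [y_{11}, \ldots, y_{1n}, \ldots, y_{q1}, \ldots, y_{qn}]^T \qquad (3) $$

where $y_{gi}$ denotes the $g$-th response of the $i$-th sample and $n$ is the number of training samples. The Gaussian process covariance function between $y_{gi}$ and $y_{hj}$ can be obtained as follows: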

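$$ C_{ij}^{gh} = \mathrm{Cov}(y_{gi}, y_{hj}) = E(y_{gi} y_{hj}) = E(x_i^T b_g b_h^T x_j + \varepsilon_{gi} \varepsilon_{hj}) = x_i^T E(b_g b_h^T) x_j + E(\varepsilon_{gi} \varepsilon_{hj}) = \sigma_b^2 x_i^T x_j \delta_{gh} + \Sigma_{gh} \delta_{ij} \qquad (4) $$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise, and $\Sigma_{gh}$ is the element in the $g$-th row and $h$-th column of $\Sigma$. The cross terms such as $x_i^T b_g \varepsilon_{hj}$ vanish in expectation because $b_g$ and $\varepsilon_{hj}$ are a priori independent and have zero mean. Equation (4) implies that the covariance function comprises two parts: one that models the correlation between two data points, and one that reflects the correlation between two response variables.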
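As a numerical sanity check of Eq. (4), the sketch below fills the covariance of the stacked vector of Eq. (3) entry by entry and compares it with the equivalent Kronecker form $\sigma_b^2 (I_q \otimes X X^T) + \Sigma \otimes I_n$, where $X$ is the $n \times p$ matrix whose $i$-th row is $x_i^T$. It assumes NumPy; the function name, the random covariates and the values of $\Sigma$ and $\sigma_b^2$ are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_multi_response_cov(X, Sigma, sigma_b2):
    """Covariance of the stacked response vector of Eq. (3), filled entry by entry from Eq. (4).

    X        : (n, p) array whose i-th row is x_i^T
    Sigma    : (q, q) covariance between the response noises
    sigma_b2 : prior variance of each regression coefficient
    """
    n, q = X.shape[0], Sigma.shape[0]
    C = np.zeros((q * n, q * n))
    for g in range(q):
        for i in range(n):
            for h in range(q):
                for j in range(n):
                    # Response-major ordering of Eq. (3): y_gi sits at position g*n + i.
                    value = Sigma[g, h] if i == j else 0.0
                    if g == h:
                        value += sigma_b2 * (X[i] @ X[j])
                    C[g * n + i, h * n + j] = value
    return C


# Hypothetical example: n = 4 samples, p = 3 predictors, q = 2 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma_b2 = 2.0

C = linear_multi_response_cov(X, Sigma, sigma_b2)
C_kron = np.kron(np.eye(2), sigma_b2 * X @ X.T) + np.kron(Sigma, np.eye(4))
print(np.allclose(C, C_kron))  # True
```

The agreement indicates that, for the linear model, Eq. (4) has the block structure of Eq. (6) with $Q = \sigma_b^2 X X^T$, i.e. Eq. (7) with $a_0 = 0$, $a_1 = \sigma_b^2$ and $v_0 = 0$.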

Furthermore, the covariance function can be extended to more complex forms, such as that proposed in our paper:

$$ C_{ij}^{gh} = \left[ a_0 + a_1 \sum_{d=1}^{p} x_{id} x_{jd} + v_0 \exp\left( -\sum_{d=1}^{p} w_d (x_{id} - x_{jd})^2 \right) \right] \delta_{gh} + \Sigma_{gh} \delta_{ij} \qquad (5) $$

Hence the covariance matrix for $y$ can be written as:

$$ C = I_q \otimes Q + \Sigma \otimes I_n \qquad (6) $$

where the element in the $i$-th row and $j$-th column of $Q$ is given by:

$$ Q_{ij} = a_0 + a_1 \sum_{d=1}^{p} x_{id} x_{jd} + v_0 \exp\left( -\sum_{d=1}^{p} w_d (x_{id} - x_{jd})^2 \right) \qquad (7) $$
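The construction of Eqs. (6) and (7) can be sketched in a few lines of NumPy. The helper names, the hyper-parameter values and the toy covariates below are hypothetical; positive definiteness is checked by attempting a Cholesky factorization, which is one common numerical test rather than necessarily the one used in the implementation described here.

```python
import numpy as np


def q_matrix(X, a0, a1, v0, w):
    """Q of Eq. (7): constant + linear + ARD squared-exponential terms."""
    linear = X @ X.T
    diff = X[:, None, :] - X[None, :, :]           # (n, n, p) differences x_id - x_jd
    sq_dist = np.einsum("ijd,d->ij", diff**2, w)   # sum_d w_d (x_id - x_jd)^2
    return a0 + a1 * linear + v0 * np.exp(-sq_dist)


def multi_response_cov(X, Sigma, a0, a1, v0, w):
    """C = I_q kron Q + Sigma kron I_n, Eq. (6), for the response-major vector of Eq. (3)."""
    n, q = X.shape[0], Sigma.shape[0]
    Q = q_matrix(X, a0, a1, v0, w)
    return np.kron(np.eye(q), Q) + np.kron(Sigma, np.eye(n))


def is_positive_definite(C):
    """True if a Cholesky factorization of C succeeds."""
    try:
        np.linalg.cholesky(C)
        return True
    except np.linalg.LinAlgError:
        return False


# Hypothetical toy data: n = 4 samples, p = 3 predictors, q = 2 responses.
X = np.linspace(0.0, 1.0, 12).reshape(4, 3)
w = np.array([1.0, 0.5, 0.1])
Sigma_valid = np.array([[1.0, 0.4], [0.4, 0.8]])    # positive definite
Sigma_invalid = np.array([[1.0, 3.0], [3.0, 1.0]])  # eigenvalues 4 and -2

C_valid = multi_response_cov(X, Sigma_valid, a0=0.1, a1=0.5, v0=1.0, w=w)
C_invalid = multi_response_cov(X, Sigma_invalid, a0=0.1, a1=0.5, v0=1.0, w=w)
print(is_positive_definite(C_valid), is_positive_definite(C_invalid))  # True False
```

Because $I_q \otimes Q$ and $\Sigma \otimes I_n$ commute, the eigenvalues of $C$ are the pairwise sums of the eigenvalues of $Q$ and of $\Sigma$. In this toy example the invalid $\Sigma$ has an eigenvalue of $-2$, which exceeds the smallest eigenvalue of $Q$ in magnitude, so the factorization fails.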

The covariance matrix in Eq. (6) is a simplified form of that given in Sha [1], but it still accounts for the effect of the multiple responses through the hyper-parameter $\Sigma$. However, implementing the multi-response Gaussian process with the covariance matrix defined in Eq. (6), or with the more general form given in Sha [1], is not numerically straightforward. In particular, we have observed in our simulation studies that it is difficult to ensure that the covariance matrix is positive definite. This is partly due to the increase in the size of the covariance matrix. However, we hypothesize that there is a more serious cause: the increased complexity of the covariance matrix that results from introducing additional hyper-parameters to account for the correlation between the multiple responses. It is not clear how this issue can be addressed. In our implementation using hybrid Monte Carlo [3], one simple approach we have considered is to reject those Monte Carlo samples that result in a non-positive definite covariance matrix, which is equivalent to assigning a zero prior probability to these samples. However, our simulations showed that this approach dramatically reduces the acceptance rate of the hybrid Monte Carlo algorithm and is therefore not efficient. Alternative solutions thus need to be investigated to address this numerical issue.
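The rejection strategy described above can be sketched as a log-density that returns minus infinity whenever the Cholesky factorization fails, i.e. zero prior probability for that hyper-parameter setting. For brevity, the accept/reject step is written as a plain Metropolis test; in hybrid Monte Carlo the same comparison is made on the Hamiltonian, which also includes a kinetic-energy term. The function names are hypothetical and NumPy is assumed.

```python
import numpy as np


def log_marginal_likelihood(y, C):
    """log N(y | 0, C), or -inf if C is not positive definite.

    Returning -inf is equivalent to giving the hyper-parameters that produced C
    zero prior probability, so the sampler can never accept them.
    """
    try:
        L = np.linalg.cholesky(C)
    except np.linalg.LinAlgError:
        return -np.inf
    alpha = np.linalg.solve(L, y)  # L^{-1} y, so alpha @ alpha = y^T C^{-1} y
    n = y.shape[0]
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()   # equals -0.5 * log det C
            - 0.5 * n * np.log(2.0 * np.pi))


def accept(log_post_current, log_post_proposed, rng):
    """Metropolis accept/reject; a proposal with log density -inf is always rejected."""
    if np.isneginf(log_post_proposed):
        return False
    return np.log(rng.uniform()) < log_post_proposed - log_post_current


rng = np.random.default_rng(0)
y = np.array([1.0, -0.5])
print(log_marginal_likelihood(y, np.array([[1.0, 2.0], [2.0, 1.0]])))  # -inf
```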

2. Response to Prof. Philip J. Brown’s comments

Professor Brown [4] has raised a number of important issues which are addressed under the following headings: 1) model interpretability, 2) covariance function and the prior assumptions, 3) computational cost, 4) hierarchical structure and 5) variable selection strategy. The last point has also been raised by Dr. Sha [1].


Model interpretability. Model interpretability, i.e. how the transmittance at different wavelengths is weighted in the Gaussian process model, can be achieved through the approach of “automatic relevance determination” (ARD) [3][5]. The Gaussian process with the covariance function defined in Eq. (10) of our paper belongs to the family of ARD models: each predictor is associated with a hyper-parameter, $w_d$, which enables the relevance of the corresponding variable in the predictive model to be determined. More specifically, the $w_d$'s serve as the “weights” for the predictor variables at different wavelengths.
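The sketch below illustrates how the ARD hyper-parameters act in the squared-exponential term of Eq. (7): a predictor with $w_d = 0$ does not affect that term at all, while a larger $w_d$ makes the covariance more sensitive to differences in that predictor. The function name, the three hypothetical predictors and the values of $w_d$ are illustrative assumptions, and NumPy is assumed. Note that in Eq. (7) the linear term uses a single coefficient $a_1$ for all predictors, so the per-predictor interpretation applies to the exponential term.

```python
import numpy as np


def ard_se(x_i, x_j, v0, w):
    """Squared-exponential ARD term of Eq. (7): v0 * exp(-sum_d w_d (x_id - x_jd)^2)."""
    return v0 * np.exp(-np.sum(w * (x_i - x_j) ** 2))


# Hypothetical weights for three predictors (e.g. three wavelengths).
w = np.array([2.0, 0.0, 0.05])
x_a = np.array([0.3, 0.0, 1.0])
x_b = np.array([0.1, 5.0, 1.0])   # differs from x_a in predictors 1 and 2
x_c = np.array([0.1, -3.0, 1.0])  # same as x_b except predictor 2

# Predictor 2 has w = 0, so changing it leaves the covariance unchanged.
print(np.isclose(ard_se(x_a, x_b, 1.0, w), ard_se(x_a, x_c, 1.0, w)))  # True

# Ranking predictors by w_d gives a simple relevance ordering (1-based indices).
print(np.argsort(w)[::-1] + 1)  # [1 3 2]
```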

Covariance function and the prior assumptions. The rationale for specifying the covariance function and the corresponding prior distributions has previously been discussed in Rasmussen's thesis [5]. More specifically, the hyper-parameter $w_d$ determines the relative relevance of each predictor to the prediction, as discussed above under model interpretability. Consequently, the mean of the inverse Gamma prior for $w_d$ is scaled so that, a priori, the $w_d$'s decrease as $p$ increases, so that only a small number of predictors are expected to be relevant when $p$ is large. Empirically, the use of the chosen covariance function with this specific prior has been reported in a number of applications, for example Shi et al. [6]. We would therefore conclude that these choices are not specific to the present calibration problem, but are applicable to a wide variety of problems.

Computational cost. We have mentioned in our paper that the computational time scales cubically with sample size. For the two case studies presented in the paper, the CPU time for the training stage, where 1000 MCMC iterations were required, was 27.3 seconds and 558.5 seconds for the “Tablet” (60 training data points) and “Meat” (173 training data points) data sets, respectively. The Gaussian process was implemented using C++, and the program was executed on a Pentium-4 3.0 GHz computer running under Windows XP. For applications with more than 1000 training data points (a rare situation in spectroscopic calibration), sparse training strategies may be required to reduce the overall computational burden (see the references cited in our paper).
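As a rough consistency check of the cubic scaling, the reported timings can be compared with the ratio predicted by $O(n^3)$ growth. The extrapolation to 1000 training points is a hypothetical back-of-the-envelope estimate under pure cubic scaling on the same machine, not a measured timing; per-iteration costs that grow more slowly than $n^3$, and differences between the two data sets (such as the number of predictors), mean that only approximate agreement should be expected.

```python
# Training times reported above for 1000 MCMC iterations.
n_tablet, t_tablet = 60, 27.3    # "Tablet": training points, CPU seconds
n_meat, t_meat = 173, 558.5      # "Meat": training points, CPU seconds

predicted_ratio = (n_meat / n_tablet) ** 3   # ratio expected under pure O(n^3) scaling
observed_ratio = t_meat / t_tablet

# Hypothetical extrapolation from the "Meat" timing to n = 1000 under pure cubic scaling.
t_1000_hours = t_meat * (1000 / n_meat) ** 3 / 3600

print(f"predicted {predicted_ratio:.2f}, observed {observed_ratio:.2f}")
print(f"extrapolated time for n = 1000: {t_1000_hours:.1f} h")
```

This prints a predicted ratio of 23.97 against an observed ratio of 20.46, and an extrapolated training time of about 30 hours, which is in line with the suggestion that sparse training strategies may be required for larger training sets.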

Hierarchical structure. To model observations that exhibit cluster structures, a mixture of Gaussian process models can be implemented [6][7]. However, limited research has been undertaken into the block effects of covariates (e.g. one set of near infrared spectra plus one set of Raman spectra) in the context of Gaussian processes. One possible approach is to assign separate prior distributions to the ARD parameters of each block, with a hyper-prior across the blocks. This Bayesian hierarchical approach appears attractive, but its effectiveness and applicability need to be assessed through further case studies.

Variable selection strategy. We agree with Dr. Sha and Prof. Brown that variable selection is a powerful approach in regression problems, with regard to improving predictive performance, reducing future measurement costs and alleviating the computational burden. We are currently implementing a Markov chain Monte Carlo approach with variable selection within a Gaussian process regression model.

Acknowledgments

We appreciate Professor Cliff Spiegelman's effort in making this discussion happen. T. Chen would like to acknowledge the financial support of the EPSRC KNOW-HOW (GR/R19366/01) and Chemicals Behaving Badly II (GR/R43853/01) grants, and the UK ORS Award for his PhD study.

References

[1] N. Sha. Discussion of “Gaussian process regression for multivariate spectroscopic calibration”. Chemometrics and Intelligent Laboratory Systems, in press, 2006.
[2] P. K. Boyle, M. R. Frean. Dependent Gaussian processes. In: Advances in Neural Information Processing Systems 17, MIT Press, 2005.
[3] R. M. Neal. Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report No. 9702, Department of Statistics, University of Toronto, Canada, 1997.
[4] P. J. Brown. Discussion of the paper by Chen, Morris and Martin. Chemometrics and Intelligent Laboratory Systems, in press, 2006.
[5] C. E. Rasmussen. Evaluation of Gaussian processes and other methods for non-linear regression. Ph.D. thesis, University of Toronto, Canada, 1996.
[6] J. Q. Shi, R. Murray-Smith, D. M. Titterington. Hierarchical Gaussian process mixtures for regression. Statistics and Computing 15 (2005) 31-41.
[7] C. E. Rasmussen, Z. Ghahramani. Infinite mixtures of Gaussian process experts. In: Advances in Neural Information Processing Systems 14, MIT Press, 2002.
